We describe a system for volume rendering via ray casting, targeted at medical data and clinicians. We discuss the relative benefits of server versus client rendering and of GPU versus CPU rendering, and show how we combine server-side and GPU rendering using nVidia’s Tesla hardware and CUDA toolkit. The resulting system allows hospital-acquired data to be visualized on demand and in real time by multiple simultaneous users, with low latency even on low-bandwidth networks and thin clients. Each GPU serves multiple clients, and the system scales to many GPUs through data distribution and load balancing, making it suitable for commercial deployment. To optimize rendering performance, we present a novel solution for empty space skipping that improves on previous techniques used with CUDA. To demonstrate the flexibility of our system, we show several new visualization techniques, including assisted interaction through automatic organ detection and the ability to toggle the visibility of pre-segmented organs. Clinicians have deemed these visualizations highly useful for diagnostic purposes. Our performance results indicate that our system may be the best-value option for hospitals to provide ubiquitous access to state-of-the-art 3D visualizations.