As a world leader in research on protein behavior, the Daggett Research Group at the University of Washington is always in need of more computing power. The group augmented its existing Linux-based high-performance computing (HPC) resources with two new HPC systems based on Windows® Compute Cluster Server 2003, which have delivered better performance than its Linux cluster and enabled the group to make fundamental breakthroughs in how it analyzes simulation results.
The Daggett Research Group within the University of Washington Department of Bioengineering is world-renowned for its study of protein stability, function, and folding, which are among the fundamental unsolved problems in molecular biology. Although much is known about the native folded conformations of proteins, very little is known about the folding process itself. Understanding this process has important implications for research into all biological processes, including aging and human diseases.
Because experimental approaches provide only limited information, the Daggett Research Group relies on computer simulations that require massive amounts of computation. Today, the bulk of that computing is done at the U.S. Department of Energy’s National Energy Research Scientific Computing Center (NERSC) in Berkeley, California. However, shared computer time does not meet all of the group’s needs.
“We’re always in need of more processing power and never know from year to year how much shared time we’ll be allocated,” says Valerie Daggett, Professor of Bioengineering at the University of Washington. “Besides, we compete with lots of other organizations for those shared resources, and sometimes we need full control. Therefore, we need local high-performance computing resources as well. In the past, our answer to that need was a hodgepodge of hardware running Linux, assembled over the years.”
The Daggett Research Group augmented its local computing capabilities with high-performance computing (HPC) clusters running Windows® Compute Cluster Server 2003. “We were quite surprised when, without any optimization, the new Windows-based HPC system outperformed our highly optimized Linux cluster,” says Daggett. “In fact, we’ve been so happy with Windows Compute Cluster Server 2003 that we purchased a second cluster.”
The group’s first Windows-based HPC cluster, purchased in 2006, has 20 nodes, each configured with two 2-gigahertz (GHz) Intel Xeon 5130 dual-core processors, 4 gigabytes (GB) of RAM, and Intel PRO/1000 gigabit Ethernet adapters. The second system, purchased in 2007, has 10 nodes, each configured with two 2-GHz Intel Xeon E5335 quad-core processors, 4 GB of RAM, and Intel PRO/1000 gigabit Ethernet adapters. Both systems are built on Silicon Mechanics hardware.
Both clusters run in lucem Molecular Mechanics, the group’s internally developed simulation software, which was originally written for Linux. David Beck, PhD, a researcher in the Daggett Research Group, used the Microsoft® Visual Studio® 2005 development system to port the software to Windows Compute Cluster Server 2003. “At first, I was apprehensive about what it would take,” says Beck. “But all it required was mapping the POSIX thread model to the Windows thread model, which only took 80 lines of code.”
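Beck’s 80-line mapping layer is not published here. As a rough illustration of what mapping the POSIX thread model to the Windows thread model can involve, the following minimal C sketch puts `pthread_create`/`pthread_join` and the Win32 `CreateThread`/`WaitForSingleObject` calls behind one interface. The main wrinkle it handles is that a Win32 thread entry point returns a `DWORD` rather than a `void *`, so a small trampoline stores the POSIX-style return value. The names `thread_t`, `thread_create`, `thread_join`, and the `parallel_sum` workload are hypothetical and are not taken from in lucem Molecular Mechanics.

```c
/* Hypothetical portable thread shim: thread_t, thread_create, thread_join and
   parallel_sum are illustrative names, not code from in lucem Molecular Mechanics. */
#include <stddef.h>

/* POSIX-style thread entry point: takes and returns a void pointer. */
typedef void *(*thread_fn)(void *);

#ifdef _WIN32
#include <windows.h>

typedef struct {
    HANDLE handle;
    thread_fn fn;
    void *arg;
    void *result;
} thread_t;

/* Win32 entry points return DWORD, so this trampoline calls the POSIX-style
   function and keeps its void * result for thread_join. */
static DWORD WINAPI trampoline(LPVOID p) {
    thread_t *t = (thread_t *)p;
    t->result = t->fn(t->arg);
    return 0;
}

/* The thread_t must outlive the thread, because the trampoline reads it. */
static int thread_create(thread_t *t, thread_fn fn, void *arg) {
    t->fn = fn;
    t->arg = arg;
    t->result = NULL;
    t->handle = CreateThread(NULL, 0, trampoline, t, 0, NULL);
    return t->handle == NULL ? -1 : 0;
}

/* Wait for the thread, release its handle, and hand back its result. */
static int thread_join(thread_t *t, void **result) {
    if (WaitForSingleObject(t->handle, INFINITE) != WAIT_OBJECT_0) return -1;
    CloseHandle(t->handle);
    if (result != NULL) *result = t->result;
    return 0;
}
#else
#include <pthread.h>

typedef struct { pthread_t handle; } thread_t;

static int thread_create(thread_t *t, thread_fn fn, void *arg) {
    return pthread_create(&t->handle, NULL, fn, arg) == 0 ? 0 : -1;
}

static int thread_join(thread_t *t, void **result) {
    return pthread_join(t->handle, result) == 0 ? 0 : -1;
}
#endif

/* Example workload: each thread sums one contiguous slice of an array. */
typedef struct { const double *data; size_t n; double sum; } slice_t;

static void *sum_slice(void *p) {
    slice_t *s = (slice_t *)p;
    s->sum = 0.0;
    for (size_t i = 0; i < s->n; i++) s->sum += s->data[i];
    return s;
}

#define MAX_THREADS 64

double parallel_sum(const double *data, size_t n, int nthreads) {
    thread_t threads[MAX_THREADS];
    slice_t slices[MAX_THREADS];
    int started[MAX_THREADS];
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    size_t chunk = n / (size_t)nthreads;
    for (int i = 0; i < nthreads; i++) {
        size_t begin = (size_t)i * chunk;
        /* The last thread also takes any remainder. */
        size_t end = (i == nthreads - 1) ? n : begin + chunk;
        slices[i].data = data + begin;
        slices[i].n = end - begin;
        started[i] = thread_create(&threads[i], sum_slice, &slices[i]) == 0;
        if (!started[i]) sum_slice(&slices[i]); /* fall back to running inline */
    }
    double total = 0.0;
    for (int i = 0; i < nthreads; i++) {
        if (started[i]) thread_join(&threads[i], NULL);
        total += slices[i].sum;
    }
    return total;
}
```

Calling code such as `parallel_sum(data, n, 4)` is then identical on both platforms; only the shim differs. Keeping the platform split inside a handful of wrapper functions is one plausible reason such a port can stay small.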
The group also runs Microsoft SQL Server® 2005 data management software on each cluster node to analyze the terabytes of results from its protein-folding simulations. “Windows Compute Cluster Server 2003 enables us to double-task the cluster, using it for both simulation and analysis,” says Daggett. “This is something that we just can’t do with Linux, and it provides a huge advantage.”
The Daggett Research Group also used SQL Server 2005 as a back end to develop its own queuing service, which enables researchers to seamlessly dispatch jobs to either the Windows-based or Linux-based systems. “In computational chemistry, there’s a strong Linux/UNIX mindset,” says Daggett. “So people in the lab were reluctant to use Windows at first. Today, now that we’ve made the process of dispatching jobs transparent, our Windows-based clusters are given more work and we’re very much a mixed lab.”
With Windows Compute Cluster Server 2003, the Daggett Research Group supplemented its local HPC resources in a way that offers several advantages over a Linux-based solution, including:
- Better performance. Even without optimization, the group’s simulation software performed 5 percent better on Windows Compute Cluster Server 2003 than the heavily optimized, Linux-based version running on identical hardware.
- Rapid development. Beck adapted the group’s simulation software to run on Windows Compute Cluster Server 2003 in only two weeks, including the time it took him to learn the Visual Studio 2005 development system.
- Simplified cluster deployment. The group now uses Windows Deployment Services and Sysprep to deploy new cluster nodes, which, according to Beck, makes the process “relatively simple compared to Linux.”
- Ease of management. The Daggett Research Group can easily manage its Windows-based HPC clusters using the same Active Directory® service infrastructure it uses to manage other lab resources, including its five database and development servers and 10 Windows-based desktop computers.
- Ease of integration. Using Microsoft’s Client for Network File System (NFS), the group’s Windows-based systems can mount shares from its Linux-based file servers directly, without Samba open source software, resulting in significantly better performance.
- Entirely new capabilities. The ability to run SQL Server 2005 on each cluster node has fundamentally transformed the way that the group analyzes simulation results. “With the combination of Windows Compute Cluster Server and SQL Server 2005, we can attack problems in new ways and at new magnitudes,” says Daggett. “We can examine 100 times more data because tasks that used to take hours are now reduced to fractions of a second. It’s for this very reason that any new systems we purchase are based on Windows Compute Cluster Server 2003.”