As one of the world’s leading research institutions, Cornell University has been engaged in high-performance computing since that term referred to mainframe computing. More than 10 years ago, Cornell helped to pioneer the use of PC clusters. Today, the need for raw computing power is complemented by a need to manage massive data sets in real time. To give researchers—both at Cornell and at other institutions around the world—access to the most cost-effective platform for high-performance computing, the Cornell Center for Advanced Computing has migrated several installations to Windows® HPC Server 2008. The result is that researchers are more productive, because they can focus on their research rather than on the underlying technology; researchers have access to a broader range of commercial and custom applications; and the computing platform is as cost-effective to administer as it is to use.
The more scientists learn, the more they want to learn. This thirst for knowledge increases the demands they make on the tools—especially the computer systems—that they use to help further that knowledge.
At Cornell University, a major academic research institution, this challenge is evident across the spectrum of the university’s approximately U.S.$650 million annual research portfolio. For example, Dr. Stephen Pope, Professor of Mechanical Engineering, is a pioneer in the field of turbulent combustion simulation, research that is essential to the effort to produce cleaner, more efficient combustion systems for manufacturing, power generation, and transportation.
The typical design and development procedures for turbulent combustion systems involve experimental testing, which is an extremely expensive and time-consuming process. “For some space applications, the appropriate conditions cannot be achieved at a reasonable cost in ground tests,” says Pope. “Because of this, reliable and accurate computer models are continually being sought to increase combustor performance and to reduce both the development costs and design cycle time. Computer models are currently used in the design of combustors, but substantial improvements are needed in their accuracy, reliability, and computational efficiency.”
Pope’s work involves testing massive algorithms in an iterative process that could take days or weeks on a conventional PC system. The focus of his research, therefore, has been to create easy-to-use engineering tools that more accurately predict the behavior of a given combustion design. For example, he and his team are interested in using ANSYS FLUENT, the leading commercial application for computational fluid dynamics, as a front end to detailed flame calculations.
At what might be considered the other end of the research spectrum, Dr. Jaroslaw Pillardy, Director of the Computational Biology Service Unit (CBSU) at Cornell, is interested in helping researchers compile and analyze the equally massive amounts of data needed for breakthroughs in the life sciences—breakthroughs such as mapping genomes and predicting the structure of proteins. Here, as in the field of combustion systems, the pace of discovery has increased, new information has multiplied at exponential rates, and scientists face ever-greater challenges in managing and making sense of their data.
To help address these challenges, Pillardy has been developing an array of powerful but easy-to-use bioinformatics tools that can be accessed remotely over the Web or downloaded for local use in labs elsewhere. Ease of use and powerful compute capabilities are equally important attributes in these tools, according to Pillardy. He says that without them, researchers face significant obstacles. For instance, Pillardy describes a typical scenario in which a researcher finds the application he needs, spends a long time learning how to use it, and then has to wait weeks while running a single job: “When you go to the lab, you see a desktop PC off in the corner with a sign taped on the screen that says, ‘Please do not touch this computer for the next month.’”
Pillardy also claims he has too often seen researchers and biology students wasting valuable time and energy learning computer science. “They should be focusing on their core research, not on learning and using computer systems,” he says.

Solution
Cornell has been researching and providing high-performance computing (HPC) solutions to address these and other subjects, such as plasma physics and high-end visualization, for more than 20 years. Its high-performance computing center was founded for this purpose in 1985, relying on the supercomputers of the day: parallel mainframe machines.
History of Cluster Computing at Cornell
In 1997, the center became a forerunner in the exploration of a new platform for supercomputing: low-cost PC hardware and software from Dell, Intel, and Microsoft. These PCs were used for cluster computing, in which an array of computers provides far more computational power than could be obtained from the single-machine systems of a generation ago. The university’s earliest work with cluster computing was based on the Windows NT® Server 4.0 operating system; its latest is based on the Windows® HPC Server 2008 operating system, the high-performance computing solution from Microsoft and successor to Windows Compute Cluster Server 2003. The move to Windows HPC Server has been facilitated by software, support, and expertise from Microsoft Research.
As the university’s technology has changed, its way of looking at that technology has also changed. “In the past, we looked to provide raw computing power,” says Dr. Steven R. Lantz, Senior Research Associate at the Center for Advanced Computing (CAC), which is dedicated to large-scale computing at Cornell. “Now, the challenge is managing data-intensive computing, problems with many terabytes of data, and being able to store, access, and move that amount of information at sufficiently high rates. That has called for expanding and optimizing our work with cluster computing.”

Cluster Computing as Applied to Combustion Research
For Pope’s research in turbulent combustion, CAC hosts and manages a compute cluster running on Windows HPC Server. The cluster consists of 36 Dell PowerEdge 1950 computers, each with two dual-core 3-GHz Xeon processors and 8 GB of RAM, for a total of 144 cores. Nearly all of the cores—140—are available for parallel computations; the remaining 4 are dedicated to cluster management. The nodes communicate over a high-performance QLogic SDR 4x InfiniBand interconnect. The system is rounded out by two Dell PowerEdge 2950 servers that support interactive work, and 2 terabytes of RAID file storage.
In addition to the ANSYS FLUENT front-end software mentioned earlier, Pope’s research calls for the use of a variety of commercial software, including Tecplot 360 for visualization, and The MathWorks MATLAB technical computing language for creating the algorithms that are used in the large-scale combustion simulations. The researchers also create custom software for the simulations using the Microsoft Visual Studio® 2008 development system—particularly the high-performance computing features of that system such as the parallel debugger—along with Process Explorer, which enables them to see the active states of their processes as those processes are running. In addition to direct access to the cluster through the university’s local network, Pope and his colleagues can access the cluster remotely using Windows Remote Desktop.

Cluster Computing as Applied to Life Sciences Research
To address researchers’ needs for high-performance computing in the life sciences, Pillardy and his colleagues at the CBSU created BioHPC, a suite of bioinformatics applications covering all major areas of computational biology, including population genetics, sequence analysis and data mining, and protein structure prediction. To provide the performance that life sciences researchers require, the applications—35 so far and growing—operate on a cluster of computers running Windows HPC Server 2008 (BioHPC was originally deployed on Windows Compute Cluster Server 2003). The cluster, separate from the one used by CAC for the combustion research, consists of a total of 92 nodes with 376 cores, a mix of Dell PowerEdge 1U servers and Dell PowerEdge M600 blade servers.
Because Pillardy and his colleagues intend BioHPC to be used by the broadest possible research community, they made it accessible primarily over the Web, through an interface written in Microsoft ASP.NET code. Since BioHPC was first deployed in 2003, it has processed about 100,000 computationally intensive data-processing jobs submitted by more than 8,700 researchers from 80 countries. Four universities are working with the CBSU to set up local BioHPC installations using the downloadable, open-source version of the software, and Pillardy expects three or more institutions to set up their own installations each year.

Benefits
Cornell researchers using Windows HPC Server 2008 are now able to focus more on their research than on the technology that supports it. They also have a broader array of third-party applications to use and can more easily create custom applications. In addition, administration of this platform is easier and more cost-effective.

Gives Researchers the Ability to Focus on Their Work, Not the Technology
Pope, Lantz, and Pillardy agree that the unique combination of the ease of working with Windows and the high-performance computing enabled by Windows HPC Server 2008 speeds research by making researchers more productive.
“Microsoft is our platform of choice for life sciences high-performance computing,” says Pillardy. “It provides access to complicated applications in an easy, familiar way, because it’s Windows. We have the hardware resources to run it. And the integration capabilities that are a key feature of Windows make it easy to add new applications as needed. We like it; we know it.”
“The result is that researchers can focus on their research because of Windows HPC Server,” adds Lantz. “You don’t want the platform to be the center of attention. That’s important for grad students, for example, who are juggling research, classes, and other responsibilities. They don’t have time to spare, and Windows HPC Server doesn’t waste their time.”
The researchers also point to the ways that Windows HPC Server, like the rest of the Windows family, makes applications highly accessible. For instance, the ability to use Windows Remote Desktop means that researchers don’t need to be in their lab to access their data and applications—they can work from home, from conferences, or from other sites. That means they can work when and as they want to, outside of regular working hours. “Work can be conducted more effectively, more productively,” says Pope. “We can get to the insights we need to further our research.”

Enables Broad Use of Commercial and Custom Software
The researchers are using Windows HPC Server to run just about any commercial and custom applications they want. “There are strong relationships between Microsoft and the software companies whose tools we need to use, such as ANSYS FLUENT, Tecplot 360, and The MathWorks MATLAB,” says Lantz. “That translates into software that we know will work together, enabling us to get our work done.”
When it comes to creating custom applications, the researchers praise Visual Studio. “There’s nothing like Visual Studio in the Linux world,” says Pillardy. “This is an incredibly rich environment with powerful tools, such as the parallel debugger and context-sensitive Help, that make development faster and more successful. And researchers and developers who know Visual Studio from the more traditional Windows world can immediately put their knowledge to use creating applications for high-performance computing. Even the database software has more tools than you find in other environments.”
Pillardy notes that the ability to rapidly create high-performance computing applications for Windows HPC Server does more than make researchers more productive; it also changes the ways that researchers operate. For example, Pillardy points to the hiring process. “When we hire, we can focus on hiring subject matter experts, rather than technology experts,” he says. “You can take a subject matter expert and easily teach him Windows. You don’t have to worry about whether he has the Windows and development skills coming in. That gives you more flexibility to hire the best people possible.”

Reduces Time and Cost of HPC Administration
CAC manages its Windows HPC Server cluster using the same Windows management tools that it uses to manage its other Windows-based installations, further reducing the time and expense of supporting the high-performance computing cluster. “We use the same Windows deployment services to push out images that we use elsewhere,” says Dr. David Lifka, Director of CAC. “It’s a great timesaver when we want to bring new hardware online.”

For More Information
Microsoft Server Product Portfolio
For more information about the Microsoft server product portfolio, go to:
For more information about Microsoft products and services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (877) 568-2495. Customers who are deaf or hard-of-hearing can reach Microsoft text telephone (TTY/TDD) services at (800) 892-5234 in the United States or (905) 568-9641 in Canada. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information using the World Wide Web, go to:
For more information about Cornell University, call (607) 227-1865 or visit the Web site at: