4-page Case Study - Posted 10/12/2006

University of Cincinnati Genome Research Institute

Supercomputing Solution Reduces IT Administration Needs at Research Institute

The University of Cincinnati Genome Research Institute (GRI) is a global hub for biomedical research, focusing on new therapies for cancer and metabolic diseases such as obesity and diabetes. Founded in 2002, GRI brings together a range of academic, nonprofit, and industry partners. Drug-development technologies require large amounts of computing power, but the Linux- and UNIX-based high-performance computing (HPC) clusters at GRI were too technically challenging for most researchers to use. In addition, supporting the mixed computing environment was difficult and time-consuming. The institute deployed Windows® Compute Cluster Server 2003 so that all researchers could use HPC without relying on super users to run jobs for them. GRI reports that the new HPC solution is easy to deploy and maintain, and that its job scheduler lets users submit and monitor jobs from their own workstations.

Situation

The University of Cincinnati Genome Research Institute (GRI) began in 2002 when a pharmaceutical company donated its campus in Cincinnati, Ohio, to the university. The university renovated the campus to create an advanced interdisciplinary academic research institute with a focus on discovering new drugs to treat disease. The campus also houses the institute’s pharmaceutical industry partners.

Matt Wortman, Director of Computational Biology and Information Technology at GRI, says, “Drug discovery is traditionally a commercial venture, whereas academia traditionally focuses on pure research. But as basic academic research advances toward an obvious therapeutic value, independent researchers are becoming more interested in coming up with therapeutic treatments for diseases.

“However, we don’t have the massive research and development budgets of a drug company,” Wortman continues. “Scientists and academic institutes are responsible for bringing in their own research dollars, and there are not a lot of drug discovery dollars out there right now. So we’ve created a commercial and academic partnership for drug discovery here at GRI.”

About 95 percent of the institute’s computers run the Windows® XP Professional operating system, with identity management and authentication managed through the Active Directory® service on Microsoft Windows Server® 2003 Standard Edition. The remaining five percent are UNIX- or Linux-based high-performance computing (HPC) clusters. These clusters are used for “in silico” experiments, meaning experiments performed using computer simulations.

Wortman estimates that only 15 people at GRI actively used the UNIX and Linux computer clusters. However, that group of core users regularly ran computational jobs on behalf of approximately 60 other researchers who lacked the technical skills to use UNIX and Linux.

“We were trying to engage basic research biologists with in silico drug discovery, but the tools we had were far too technically complex for many of them,” says Wortman. “Users had to authenticate to various computers using Secure Shell and run their jobs from the command line. It was pretty difficult, and so we would run jobs for them. We wanted to move away from the core facility approach and enable people to work directly with the computer clusters.”

The heavy reliance on Linux for high-performance computing required GRI to have a skilled Linux administrator on call to maintain the systems, typically someone who did that job in addition to normal duties at the institute. “The mixed computing environment was complicated to manage because identity management wasn’t consistent across all computers—users had separate accounts in Linux, UNIX, and Windows[-based] environments,” says Wortman. “We wanted to make IT management easier by bringing all of our accounts into Active Directory and standardizing on an operating system that everyone was familiar with.”

Solution

In early 2006, the Genome Research Institute received an invitation to evaluate Windows® Compute Cluster Server 2003. “I’d had my eye on Windows Compute Cluster Server for some time, reading blogs and technical bulletins,” says Wortman. “I saw an opportunity to try it out and learn whether it would address our needs.”

In March 2006, GRI and Microsoft began to discuss the institute’s computing goals and the applications that it would need to run on Windows Compute Cluster Server. GRI had created a Linux-based drug discovery application that would need to be migrated to Windows. This application models the interactions between molecules to help researchers predict which of nearly 4 million drug-like compounds may interact with disease targets and warrant testing in the high-throughput screening facility. Because the molecules continually rotate and change shape, the computational process is extremely intensive.

Dan Rogers, Research Associate at GRI, used the Microsoft Visual Studio® 2005 development system to migrate the software code to Windows. Meanwhile, GRI installed Windows Compute Cluster Server on a seven-node cluster of dual-processor AMD Opteron server computers. Wortman tested the deployment to make sure that the nodes were communicating properly, using scripts written by Rogers and other scripts that he downloaded from Windows Compute Cluster Server message forums. Then Wortman and Rogers installed the migrated drug discovery application on the cluster.

GRI plans to add another five nodes to the cluster in the near future, and expects it to grow to 14 nodes total. Rogers has been working since the deployment to finish the migration of several other 64-bit bioinformatics software programs to Windows Compute Cluster Server. “I’ve compiled six or seven of them, fostering access to the job scheduler and taking advantage of the Microsoft Message Passing Interface and the capabilities of computer clustering,” says Rogers. GRI is also linking its HPC system to the Ohio Supercomputer Center in Columbus, Ohio, to share resources across the ultra-high-speed Third Frontier Network.
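
As a rough illustration of why node count matters for a screening workload of this size, the sketch below splits a compound library into contiguous, near-equal ranges, one per node. The static block partitioning, the function name `partition_library`, and the round figure of 4,000,000 compounds are assumptions made for illustration; the case study does not describe how GRI's application actually divides its work.

```python
def partition_library(total_compounds: int, node_count: int) -> list[range]:
    """Split compound indices 0..total-1 into near-equal contiguous ranges.

    Hypothetical helper: static block partitioning is assumed here and is
    not taken from the case study.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Every node gets `base` compounds; the first `extra` nodes get one more.
    base, extra = divmod(total_compounds, node_count)
    ranges = []
    start = 0
    for node in range(node_count):
        size = base + (1 if node < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Node counts mentioned in the text: 7 today, 12 after the planned
    # addition of five nodes, and 14 eventually.
    for nodes in (7, 12, 14):
        largest = max(len(r) for r in partition_library(4_000_000, nodes))
        print(f"{nodes} nodes: at most {largest} compounds per node")
```

In practice a job scheduler may balance work dynamically rather than in fixed blocks, so this should be read only as a back-of-the-envelope view of per-node load.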

Benefits

The Genome Research Institute has seen some notable benefits from its new high-performance computing solution, including easy installation, simplified job scheduling, and a lighter administrative burden. GRI expects that its less technical users will quickly adopt HPC as a research tool because of the familiar Windows operating system. In September 2006, GRI plans to make its Windows Compute Cluster Server deployment available for general use as part of the Genome Research Institute Discovery Platform project. Once launched, the project will provide Web-based access to bioinformatics software to researchers and institutions throughout Ohio.

Easy, Fast Installation
“The initial deployment of Windows Compute Cluster Server took only a couple of hours with the Remote Installation [Service],” says Wortman. “Later, we received additional hardware while I was out of town. Our Windows technician, who had no experience with high-performance computing, was able to set the new nodes up by himself in a matter of hours. With Linux, it would take at least a full day—if not several days—to set up an equivalent cluster. If you purchase a cluster operating system you can cut that time down considerably, but configuring the network protocols and getting all of your nodes and the job schedulers running is really a pain.”

Simplified Job Scheduling
The user experience is much better with Windows Compute Cluster Server, according to Wortman: “Previously to run a job, I had to connect using [Secure Shell] and type unbelievable amounts of text. Now I can launch and monitor jobs from my workstation using the job scheduler with XML templates that we’ve created.”

“The XML templates are easy to create and modify,” says Rogers. “I imagine we’ll make a lot of templates to begin with, and then individuals will customize them for their own uses. We’ll probably start out with 40 or maybe 50 templates, but I expect that there will be hundreds soon.”

Wortman adds, “Eventually we’ll move toward a Web browser interface. Researchers work a lot from home and on the road. Providing a hosted Web-based application is a way of uniting resources and making them easy to access and use.”
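
The template workflow Rogers describes, in which shared templates are later customized by individuals, might look something like the following sketch. The element and attribute names (`Job`, `Task`, `CommandLine`, `Name`, `Processors`) and the command `dock.exe input.dat` are hypothetical placeholders and are not the actual Windows Compute Cluster Server job schema; the point is only to show how a shared XML template can be copied and adjusted without altering the original.

```python
import copy
import xml.etree.ElementTree as ET


def make_template(name: str, command: str, processors: int) -> ET.Element:
    """Build a shared job template (hypothetical schema, not the real one)."""
    job = ET.Element("Job", {"Name": name, "Processors": str(processors)})
    task = ET.SubElement(job, "Task")
    ET.SubElement(task, "CommandLine").text = command
    return job


def customize(template: ET.Element, **overrides: str) -> ET.Element:
    """Return a copy of the template with selected job attributes replaced."""
    job = copy.deepcopy(template)  # leave the shared template untouched
    for key, value in overrides.items():
        job.set(key, str(value))
    return job


if __name__ == "__main__":
    # A shared template that a core user might publish for everyone.
    shared = make_template("docking-run", "dock.exe input.dat", 4)
    # An individual researcher's variant with a new name and more processors.
    mine = customize(shared, Name="docking-run-large", Processors="8")
    print(ET.tostring(mine, encoding="unicode"))
```

Copying before modifying keeps the shared template stable, which matters once dozens or hundreds of personal variants are derived from it.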

Wider Access to High-Performance Computing
Many of the researchers at GRI lacked the background needed to efficiently use computers running the UNIX and Linux operating systems, and they were unable to take time away from their research to learn these systems. For them, access to high-performance computing was possible only when the small core group of users who were proficient in these technologies had the time to run jobs on behalf of other researchers.

“To turn research biologists into supercomputer users, you have to make it easy for them, by providing systems that are familiar and easy to learn,” says Wortman. “Our vision is to make high-performance computing into the resource that our users didn’t even know they needed until now. Once people start using the new HPC system, I expect that we’ll see a cascading effect where others will see the benefits, and start using the new technology. That will lead to more applications being developed, which will lead to even more users.”

Less Administrative Work
One of the challenges of the mixed computing environment at GRI was the difficulty in keeping user accounts consistent on multiple systems. Deploying Windows Compute Cluster Server alleviates this problem because Active Directory provides a single point of administration for account creation, updates, and management for HPC.

By reducing the amount of work needed to manage its high-performance computing environment, GRI enables its staff to turn their valuable skills and expertise to other, more important areas. “One of our scientific programmers had to spend a large portion of his time being ‘the Linux guy,’” says Wortman. “Now he can focus on creating chemistry applications instead of on cluster maintenance.”

For More Information

For more information about Microsoft products and services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (877) 568-2495. Customers who are deaf or hard-of-hearing can reach Microsoft text telephone (TTY/TDD) services at (800) 892-5234 in the United States or (905) 568-9641 in Canada. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information using the World Wide Web, go to:
www.microsoft.com

For more information about University of Cincinnati Genome Research Institute services, call (513) 558-5473 or visit the Web site at:
gri.uc.edu

Microsoft Server Product Portfolio

For more information about the Microsoft server product portfolio, go to:
www.microsoft.com/servers/default.mspx

For more information about Microsoft high-performance computing solutions, go to:
www.microsoft.com/hpc

© 2006 Microsoft Corporation. All rights reserved. This case study is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, Active Directory, Visual Studio, Windows, the Windows logo, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are property of their respective owners.

Solution Overview

Organization Size: 400 employees

Organization Profile

The University of Cincinnati Genome Research Institute (GRI) employs approximately 400 researchers, laboratory personnel, and support staff at its 24-acre campus in Cincinnati, Ohio.


Business Situation

GRI wanted to make high-performance computing (HPC) available to researchers who had difficulty using the institute’s technically challenging Linux- and UNIX-based computer clusters.


Solution

The institute deployed Windows® Compute Cluster Server 2003 as an easy-to-use HPC environment.


Benefits

  • Easy, fast installation
  • Simplified job scheduling
  • Wider access to high-performance computing
  • Less administrative work


Hardware

  • AMD Opteron dual-processor server computers


Software and Services
  • Microsoft Visual Studio Team System
  • Microsoft Windows Compute Cluster Server 2003
  • Microsoft Windows Server 2003 R2

Vertical Industries
  • Biopharmaceutical Industry
  • Biotechnology Industry
  • Life Sciences

Country/Region
United States