4-page Partner Case Study
Posted: 4/27/2011

Microsoft Technical Computing Institute Makes Strides Toward Solving Genetic Mysteries With Parallel Development

Situation
Instituto Casa Sollievo della Sofferenza, an Italian reference hospital, and its affiliate CSS-Mendel conduct genetic research and make genetic diagnosis available to the public. The hospital comprises several research groups that specialize in such areas as cardiovascular diseases, gastrointestinal diseases, diabetes, cancer, and neurogenetics. One of these groups, bioinformatics, contributes to the efforts of several others by applying computer science to the medical field to make advancements in research.

Tommaso Mazza, Coordinator of the Computational Biology Unit at Casa Sollievo della Sofferenza and CSS-Mendel, has made a career of developing ways to study biological modeling using complex algorithms that identify differences in individuals’ DNA. Mazza and his colleagues use bioinformatics solutions to aid in research regarding neurological diseases, diabetes, and cancer. “Our work involves comparing genetic codes from affected patients with reference genetic material and looking for significant differences,” explains Mazza. “Such differences might trigger the chain of causation for those diseases, so we want to understand more about them. A key aspect of this research is that we exchange information directly with hospitals—it’s ‘bench-to-bed’ research in which the patient is the subject and object of our efforts.”

One of the challenges that researchers faced in the past was processing the entire human genome at once. Available algorithms and hardware did not allow this; they offered only a narrow view of the genome. Today, high-throughput instruments can sequence entire genomes at low cost and in less time. Casa Sollievo della Sofferenza recently acquired a next-generation sequencer to speed up the process of identifying new genes, mutations, and genetic variants. “This task can result in errors, which have a negative impact on research veracity,” says Mazza. “Finding a true DNA mutation of any sort is the target of many genetics studies and the hope of any researcher, but a false positive is a curse that might result in a waste of resources and misleading claims.”

Figure 1. These charts depict biological networks of real systems in nature.

Mazza and his colleagues wanted to be able to perform algorithm-based simulations on larger, more complex models to derive more accurate results. “I had developed several sequential algorithms, but I realized that biological algorithms must first be made parallel, because nature itself behaves in a parallel manner to some extent,” says Mazza. (See Figure 1 for data visualizations of natural systems.) “Indeed, if a chemical transformation occurs, it does not take place for two molecules only, but, rather, for all molecules, and it does so simultaneously rather than in a sequential manner.”

The researchers further realized that parallelism must also be coupled with scalability because increases in biological knowledge will result in a significant growth in the size of the target systems of study. “For algorithms to complete their computations in a reasonable amount of time, they need to be highly scalable and efficient, so I sought to develop parallelized solutions to help researchers investigate larger sets of data,” says Mazza.

Solution
Mazza had a range of choices when it came to tools for developing his solutions, but he opted to use Microsoft technologies, particularly the C# programming language, Microsoft .NET Framework 4, and the Microsoft Visual Studio 2010 Ultimate development system. “I rely on Microsoft solutions to handle my high-performance computing needs, and I found the support for Parallel Programming in Microsoft .NET Framework 4 and Visual Studio 2010 Ultimate extremely useful and well implemented,” says Mazza.

Using the Microsoft development tools, Mazza creates research solutions such as those that run parallel stochastic simulation (see Figure 2), conduct pedigree analyses of cell cycles, and work with biological networks to discover the importance of biological components and groups of them. He also has developed ways of changing the quantitative part of genetic models to better understand the way that they behave following any modification.

Mazza has found some parallel programming features of .NET 4 particularly useful, including:

• Parallel Language-Integrated Query (PLINQ), a parallel implementation of LINQ to Objects that implements the full set of LINQ standard query operators as extension methods in the System.Linq namespace and adds operators for parallel operations. PLINQ queries scale their degree of concurrency to the capabilities of the host computer. In many scenarios, PLINQ can significantly speed up LINQ to Objects queries by making fuller use of the cores available on the host computer.

• Parallel Loops, which can be used to perform the same independent operation for each element of a collection or for a fixed number of iterations. The .NET Framework 4 includes both parallel For and parallel ForEach loops.

• BlockingCollection, a thread-safe collection class that provides an implementation of the Producer-Consumer pattern, concurrent adding and taking of items from multiple threads, and many other features that support bounding and blocking.
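
To make the first two features in the list above concrete, here is a minimal C# sketch. It is not code from the institute: the class name, method names, and the short sample sequences are hypothetical and chosen only for illustration. The first method uses PLINQ to count the positions at which a sample sequence differs from a reference. The second uses a parallel For loop with per-thread subtotals.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration; names and data are assumptions, not the institute's code.
public static class ParallelBasics
{
    // PLINQ: count positions where the sample differs from the reference.
    // ParallelEnumerable.Range produces a parallel query over the indices.
    public static int CountMismatches(string reference, string sample)
    {
        int length = Math.Min(reference.Length, sample.Length);
        return ParallelEnumerable.Range(0, length)
                                 .Count(i => reference[i] != sample[i]);
    }

    // Parallel.For with thread-local state: each worker accumulates its own
    // subtotal, and the subtotals are combined once per worker at the end.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For<long>(0, n,
            () => 0L,                                           // initial subtotal per worker
            (i, loopState, subtotal) => subtotal + (long)i * i, // loop body, no shared writes
            subtotal => Interlocked.Add(ref total, subtotal));  // thread-safe merge
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(CountMismatches("GATTACA", "GACTATA")); // prints 2
        Console.WriteLine(SumOfSquares(10));                       // prints 285
    }
}
```

The thread-local overload of Parallel.For avoids taking a lock on every iteration, which matters when the loop body is as cheap as it is here.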

In addition, Mazza and his colleagues appreciate several of the specialized features in Visual Studio 2010 Ultimate, such as the Parallel Debugger and Profiler, which they use to find bottlenecks and bugs in the simulation solutions that they develop.

Figure 2. In parallel stochastic simulation, many processes are spawned and run as independently as possible.

Researchers from Casa Sollievo della Sofferenza also put the features to work in parallelizing ways of browsing a biological network. “We have a prototype that generates a combinatorial number of groups of vertices and computes some topological indices for them,” says Mazza. “The procedure is embarrassingly parallel, and parallel loops accelerate its performance.” They also use PLINQ and other features to control access to shared data structures and to implement a Producer-Consumer protocol (see Figure 3) in spPeAn, which is a tool that Mazza and colleagues designed for stochastic parallel pedigree analysis.
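
The Producer-Consumer idea described above can be sketched with BlockingCollection. The code below is an assumption-laden illustration, not the spPeAn implementation or the institute's prototype. The class and method names are hypothetical, the groups are limited to pairs of vertices, and the "topological index" is simply the summed degree of the vertices in a group. One producer task enumerates the groups into a bounded queue, and several consumer tasks score them concurrently.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical illustration; not the authors' ranking procedure.
public static class GroupRanking
{
    // Builds a symmetric adjacency matrix from an undirected edge list.
    public static bool[,] BuildAdjacency(int vertexCount, int[][] edges)
    {
        var adjacency = new bool[vertexCount, vertexCount];
        foreach (int[] e in edges)
        {
            adjacency[e[0], e[1]] = true;
            adjacency[e[1], e[0]] = true;
        }
        return adjacency;
    }

    // Assumed index: the sum of the degrees of the vertices in the group.
    public static int GroupDegree(bool[,] adjacency, int[] group)
    {
        int n = adjacency.GetLength(0);
        int total = 0;
        foreach (int v in group)
            for (int u = 0; u < n; u++)
                if (adjacency[v, u]) total++;
        return total;
    }

    public static KeyValuePair<string, int>[] RankPairs(bool[,] adjacency, int consumerCount)
    {
        int n = adjacency.GetLength(0);
        // Bounded queue: the producer blocks when 64 groups are waiting.
        var queue = new BlockingCollection<int[]>(64);
        var scores = new ConcurrentBag<KeyValuePair<string, int>>();

        Task producer = Task.Factory.StartNew(() =>
        {
            try
            {
                for (int a = 0; a < n; a++)
                    for (int b = a + 1; b < n; b++)
                        queue.Add(new[] { a, b });
            }
            finally
            {
                queue.CompleteAdding(); // lets the consumers finish their loops
            }
        });

        Task[] consumers = Enumerable.Range(0, consumerCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                // Blocks while the queue is empty; ends after CompleteAdding.
                foreach (int[] group in queue.GetConsumingEnumerable())
                    scores.Add(new KeyValuePair<string, int>(
                        group[0] + "-" + group[1], GroupDegree(adjacency, group)));
            }))
            .ToArray();

        Task.WaitAll(consumers);
        producer.Wait();
        return scores.OrderByDescending(s => s.Value)
                     .ThenBy(s => s.Key, StringComparer.Ordinal)
                     .ToArray();
    }

    public static void Main()
    {
        int[][] edges = { new[] { 0, 1 }, new[] { 0, 2 }, new[] { 0, 3 }, new[] { 1, 2 } };
        bool[,] adjacency = BuildAdjacency(4, edges);
        foreach (var entry in RankPairs(adjacency, 3))
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value);
    }
}
```

For the four-vertex sample graph, the pairs 0-1 and 0-2 rank first with a score of 5. Bounding the queue keeps memory use flat even when the number of groups grows combinatorially.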

One of the critical solutions that Mazza helped develop using Microsoft parallel development tools is known as Ocean Workbench, which is an open framework that was designed to analyze biological models through high-performance computing (HPC) techniques. Ocean’s plug-in components share a modular HPC software architecture that is capable of exploiting available hardware, including a multi-core computer, a cluster, or a cloud-based environment.

Mazza plans to use specialized Ocean components to conduct large-scale simulations of heterogeneous biological systems. One such component, Sweeper, makes it possible to replicate the same simulation while slightly changing some of the parameters of the model each time, to monitor the overall behavior of the system. The theory behind this computational process is called Multiple Replication in Parallel and involves high levels of parallelism.
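
The replicate-and-vary idea can be sketched in a few lines of C#. This is not Sweeper's code, which the case study does not show. The toy decay model, the class name, and every parameter name below are assumptions made for illustration. Each replica runs the same simulation with a different rate, owns its random number generator, and writes only to its own slot of the result array, so the replicas share no mutable state.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical illustration of a parallel parameter sweep; not Ocean's Sweeper.
public static class ParameterSweep
{
    // Toy stochastic model (an assumption): at every step each surviving
    // molecule decays independently with probability `rate`.
    public static int Simulate(double rate, int molecules, int steps, int seed)
    {
        var rng = new Random(seed); // one generator per replica, never shared
        int alive = molecules;
        for (int s = 0; s < steps && alive > 0; s++)
        {
            int decayed = 0;
            for (int m = 0; m < alive; m++)
                if (rng.NextDouble() < rate) decayed++;
            alive -= decayed;
        }
        return alive;
    }

    // Runs one replica per rate in parallel. Seeding by index makes the
    // results reproducible regardless of how iterations are scheduled.
    public static int[] Sweep(double[] rates, int molecules, int steps, int baseSeed)
    {
        var survivors = new int[rates.Length];
        Parallel.For(0, rates.Length, i =>
        {
            survivors[i] = Simulate(rates[i], molecules, steps, baseSeed + i);
        });
        return survivors;
    }

    public static void Main()
    {
        double[] rates = { 0.0, 0.1, 0.5, 1.0 };
        int[] survivors = Sweep(rates, 1000, 10, 42);
        for (int i = 0; i < rates.Length; i++)
            Console.WriteLine("rate {0}: {1} molecules left", rates[i], survivors[i]);
    }
}
```

Because System.Random is not thread-safe, giving each replica its own instance is the simplest way to keep the replicas independent.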

Figure 3. Producer-Consumer-based strategy to rank a biological graph

The hardware that the researchers use is an HPC workstation that consists of four AMD Opteron dual processors, each with 12 cores, for a total of 96 cores and 422.4 gigaflops, plus two NVIDIA Tesla GPUs. Of particular help to Mazza and his colleagues is the Microsoft Compute Cluster Pack, part of the Windows HPC Server 2008 operating system. “By distributing data across the nodes of a cluster, we can simulate parallel handling of time and other such issues,” explains Mazza. “Plus, we can use the Microsoft Compute Cluster Pack to interact directly with the Windows HPC Server 2008 Job Scheduler to launch and manage jobs.”

In the future, the institute plans to extend the bioinformatics group’s research to the cloud by using Windows Azure to make various Ocean components available to colleagues regardless of where they happen to be. “The institute has a big, dispersed team that is working on genetic research projects, and we want to take advantage of Windows Azure to expand the use of Ocean to the widest possible scientific community,” says Mazza.

Benefits
For Casa Sollievo della Sofferenza, using Microsoft development tools to create biological modeling solutions means expanding genetic research into new realms with greater scopes. Researchers hope that the ability to analyze larger data sets—and to do so quickly with easy-to-use tools—will result in scientific innovation that will have a positive effect on the lives of patients.

Enhanced Ability to Conduct Breakthrough Research
Casa Sollievo della Sofferenza can make strides toward solving the mysteries of many diseases, due in large part to the bioinformatics capabilities that aid the institute’s researchers. Mazza credits the ability to make such advancements to the Microsoft parallel computing platform. “With other tools, it’s not as easy to parallelize code, and there are some simulations that we just wouldn’t be able to do had I not had the Microsoft development tools to help me design my own solution framework,” says Mazza. “Thanks to the support for parallel programming in .NET Framework 4, complex biological tasks can be distributed in a meaningful way so that the memory is kept balanced over the compute nodes.”

Greater Researcher Efficiency
Because of their use of parallel development tools from Microsoft, Mazza and his colleagues at Casa Sollievo della Sofferenza can both develop and use biological modeling solutions more quickly and efficiently, thereby speeding the time-to-insight for research projects. “We’re able to deliver prototypes and research results on time for conferences and meetings because Microsoft tools are tightly integrated and confer high usability,” says Mazza. “That’s of great help because they map to all the computational and organizational steps of our research very closely. Also, by using the support for parallel programming in .NET Framework 4, we can maximize research efficiency because we launch simulations to run concurrently. We could actually launch 200 simulations at once and exploit the multi-core architecture of each processor for each simulation.”

Ease of Use
Researchers at Casa Sollievo della Sofferenza point to .NET 4 and Visual Studio 2010 Ultimate as providing them with an easy way to handle parallel programming. “We’re able to manage the concept of asynchronous tasks because the support for parallel programming in .NET Framework 4 offers a straightforward process that makes it easy to order the execution of many different tasks,” says Mazza. “We have found the system to be an easy-to-use, intuitive, and incredibly efficient collection of constructs that significantly improves the practicability of our algorithms.”
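
As an illustration of ordering the execution of tasks, the following C# sketch uses task continuations from the .NET Framework 4 Task Parallel Library. The workload is a placeholder and the names are hypothetical; nothing here is taken from the institute's code. Stage 1 starts several independent tasks, stage 2 runs only after all of them have finished, and stage 3 runs only after stage 2.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical illustration of task ordering with continuations.
public static class TaskOrdering
{
    // Placeholder for one simulation run (an assumption): returns input squared.
    public static double RunOne(int input)
    {
        return (double)input * input;
    }

    public static double RunPipeline(int[] inputs)
    {
        // Stage 1: independent tasks that may run concurrently.
        Task<double>[] runs = inputs
            .Select(input => Task.Factory.StartNew(() => RunOne(input)))
            .ToArray();

        // Stage 2: scheduled only once every stage 1 task has completed.
        Task<double> mean = Task.Factory.ContinueWhenAll<double, double>(
            runs, finished => finished.Average(t => t.Result));

        // Stage 3: scheduled only once stage 2 has completed.
        Task<double> rounded = mean.ContinueWith(t => Math.Round(t.Result, 3));

        return rounded.Result; // blocks until the whole chain has finished
    }

    public static void Main()
    {
        // Mean of 1, 4, 9, and 16 is 7.5.
        Console.WriteLine(RunPipeline(new[] { 1, 2, 3, 4 }));
    }
}
```

Expressing the dependencies as continuations leaves the scheduling to the runtime, so no thread sits idle waiting for an earlier stage.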

More Trustworthy Research Results
When trying to simulate extremely large molecular quantities without parallelization, Mazza and other researchers struggled with the nature of chemical reactions. When applied to highly heterogeneous reactions, classical simulation algorithms often give erroneous results spread in a warped time scale. “By using the support for parallel programming in .NET Framework 4 to partition complex networks, manage shared queues of messages, and generate tasks within the resilient and reliable .NET 4 environment, we safely simulate biological networks in parallel,” says Mazza. “I found it to be extremely handy and powerful for my purposes.”

Microsoft .NET
Microsoft .NET is software that connects people, information, systems, and devices through the use of web services. Web services are a combination of protocols that enable computers to work together by exchanging messages. Web services are based on the standard protocols of XML, SOAP, and WSDL, which allow them to interoperate across platforms and programming languages.

.NET is integrated across Microsoft products and services, providing the ability to quickly build, deploy, manage, and use connected, secure solutions with web services. These solutions provide agile business integration and the promise of information anytime, anywhere, on any device.

For more information about Microsoft .NET and web services, please visit these websites:
www.microsoft.com/net20
msdn.microsoft.com/webservices

For more information about Microsoft products and services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (877) 568-2495. Customers in the United States and Canada who are deaf or hard-of-hearing can reach Microsoft text telephone (TTY/TDD) services at (800) 892-5234. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information using the World Wide Web, go to:
www.microsoft.com

For more information about Casa Sollievo della Sofferenza and CSS-Mendel, call (39) (06) 4416 0503, or visit the websites at:
www.operapadrepio.it
www.css-mendel.it

Solution Overview




Partner Profile

Casa Sollievo della Sofferenza is an institute for hospitalization, care, and scientific research located in San Giovanni Rotondo, Italy. It has an affiliate institute, CSS-Mendel, located in Rome. Between the two sites, it employs 100 researchers who focus on genetics.


Business Situation

The institute needed a system that was capable of quickly processing huge quantities of genetic data to compare the genomes of patients who had certain diseases with reference genomes.


Solution

Researchers used parallel development tools in Microsoft .NET Framework 4 to build parallelized biological solutions to expand their research.


Benefits

• Enhanced ability to conduct breakthrough research

• Greater researcher efficiency

• Ease of use


Software and Services
  • Microsoft Visual Studio 2010 Ultimate
  • Microsoft .NET Framework 4
  • Windows HPC Server 2008 R2

Vertical Industries
Life Sciences

Country/Region
Italy

Languages
English

Partner(s)
Casa Sollievo della Sofferenza
