Windows HPC Server 2008

GPGPU Computing Horizons: Developing and Deploying for Microsoft Windows

  • Acceleware’s cluster solution helps Kodak develop market-leading, high-performance image sensor products for consumer, professional and advanced applications; Kodak has already seen productivity improvements of more than 10x.

  • ANSYS leverages the power of GPGPUs to reduce overall engineering simulation processing time by as much as half.

  • Combining the Impetus AFEA Solver with GPGPU technology yields more accurate solutions in far less time: a 12 million degree-of-freedom Fluid Structure Interaction (FSI) model runs 20 times faster than with a standard 4-core, CPU-only solver.

  • Using the Manifold System GIS and GPGPU technology, Associated Engineering, on behalf of the City of Calgary, Alberta, was able to run multiple large-scale surface models across a wide range of potential conditions to analyze a city-wide drainage system in just a few hours instead of a few days.

  • By harnessing the power of GPGPUs, SciFinance® parallel codes for derivatives pricing models run 50x-300x faster than serial code.

  • Quantifi has significantly increased productivity across a wide range of financial computations by incorporating GPGPU technology, achieving groundbreaking price/performance levels with gains of up to 100x versus its baseline performance metrics.

Increase performance from 10x to 100x or more by harnessing the power of GPGPU acceleration

Microsoft Windows and Microsoft Visual Studio provide the most robust platform and functionality throughout the software development lifecycle

The coupling of general purpose graphics processing units (GPGPUs) and CPUs is a great way to use the enormous power and opportunity of multi-core processing. In this co-processing model, the compute intensive portions of an application use the parallel computing capabilities of the GPU, while the sequential part of an application’s code runs on the CPU.

As a result, a wide range of applications such as options pricing and risk modeling, molecular dynamics, and seismic exploration can take advantage of the massive computational power of GPGPUs to achieve significantly higher levels of performance at lower cost and power levels than traditional general purpose microprocessors. GPGPUs can dramatically increase the performance per watt of workstations and can enable large increases in performance in clusters and servers, with the added benefit of lower power consumption and lower cooling requirements.
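The 10x to 100x figures on this page describe the accelerated part of a workload. The sequential part still runs on the CPU, so the end-to-end gain of this co-processing model is bounded by Amdahl's law. The minimal Python sketch below applies that standard model; it is not taken from any vendor on this page. The fractions and the 100x kernel speedup are illustrative assumptions, and host-to-GPU transfer time is ignored.

```python
def amdahl_speedup(parallel_fraction: float, accel_speedup: float) -> float:
    """Overall speedup when only `parallel_fraction` of the original runtime
    is accelerated by a factor of `accel_speedup` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if accel_speedup <= 0.0:
        raise ValueError("accel_speedup must be positive")
    serial_time = 1.0 - parallel_fraction               # part left on the CPU
    offloaded_time = parallel_fraction / accel_speedup  # part run on the GPU
    return 1.0 / (serial_time + offloaded_time)


if __name__ == "__main__":
    # Illustrative values only: a 100x kernel speedup applied to 90%, 99%
    # and 100% of the original runtime.
    for p in (0.90, 0.99, 1.0):
        print(f"p={p:.2f}: {amdahl_speedup(p, 100.0):.1f}x overall")
```

With these numbers, offloading 90% of the runtime gives only about 9.2x overall, while offloading 99% gives about 50x. The share of time spent in accelerated kernels therefore matters as much as the kernel speedup itself.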

Highly productive development tools help you effectively accelerate applications now

With the rapid evolution of development tools, environments and libraries, developers around the world are able to take full advantage of the significant power of GPU acceleration. No longer is parallel programming expertise or in-depth GPU knowledge required. Developers can continue using their preferred programming languages, building upon their skill sets, to realize the immediate benefits that low cost and high performance GPUs offer today.

Integrated Development Environments

  • Visual Studio 2010

    Microsoft’s Visual Studio is a powerful IDE that helps ensure code quality throughout the entire application lifecycle, from design to deployment. With market-leading features for parallel development, it is a great tool not only for GPGPU programming but also for creating solutions that reach Windows, the web and beyond. Visual Studio is your ultimate all-in-one solution.

    Microsoft Visual Studio also delivers a highly extensible environment, and many parallel programming extensions (DirectCompute, CUDA, AMD Stream, MATLAB, Mathematica, and OpenCL) have either native or third-party support in Visual Studio. Visual Studio can be used to develop multi-core and parallel applications with DirectCompute, or with CUDA and AMD's APP SDK to target GPGPUs directly.

    Learn more and try Visual Studio »
  • NVIDIA Parallel Nsight

    Parallel Nsight integrates GPU acceleration and debugging into Microsoft Visual Studio 2010, providing developers with a familiar, well-supported set of development tools for CUDA C/C++ and DirectCompute. Parallel Nsight supports a variety of performance tuning tools for CUDA C/C++, OpenCL and DirectCompute to improve overall acceleration and maximize the use of available GPU power.

    Learn more about Parallel Nsight » Start working with Parallel Nsight »

Programming Environments

  • DirectCompute

    Microsoft’s DirectCompute is a rich set of APIs that enable general purpose computing on GPUs. Because it is fully integrated with Microsoft DirectX Graphics APIs, a key benefit of DirectCompute is its ability to seamlessly produce immediate graphical visualizations of computed data. Additionally, DirectCompute offers high performance implementations of common algorithms such as Fast Fourier Transform through the Compute Shader Extensions library included in the DirectX SDK. DirectCompute is natively supported in Visual Studio, and runs on both DirectX 10 and DirectX 11 class GPUs.
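    The DirectX FFT API is not reproduced here. As a CPU reference for what such a library computes, the Python sketch below implements the textbook recursive radix-2 Cooley-Tukey FFT and checks it against a direct O(n²) DFT. It illustrates the algorithm only; it is not the DirectX implementation, and it assumes the input length is a power of two.

```python
import cmath


def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dft(x: list[complex]) -> list[complex]:
    """Direct O(n^2) discrete Fourier transform, used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

    The butterflies within each level are independent of one another. That independence is what makes the FFT a good fit for data-parallel hardware such as a GPU.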

    Learn more about DirectCompute:

    DirectCompute Lecture Series » DirectX SDK » DirectCompute Hands-on Lab »
  • CUDA

    The CUDA™ architecture enables developers to leverage the massively parallel processing power of over 250 million NVIDIA GPUs, bringing the performance of NVIDIA’s world-renowned graphics processor technology to general-purpose GPU computing.

    CUDA enables unprecedented performance using high level programming languages such as C/C++, FORTRAN, and the Microsoft .NET framework, as well as via standard APIs such as DirectCompute and OpenCL.

    With the CUDA architecture, tools and libraries, developers are achieving dramatic speedups in fields such as medical imaging and natural resource exploration, and are creating breakthrough applications in areas such as image recognition.
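    In CUDA C/C++, a kernel is written from the point of view of a single thread. Each thread typically computes its global element index as blockIdx.x * blockDim.x + threadIdx.x. The Python sketch below emulates a one-dimensional launch of a SAXPY kernel (y = a*x + y) sequentially, so the indexing and bounds check can be inspected without a GPU. The function names are hypothetical. The nested loops stand in for what the hardware runs in parallel when a real kernel is launched with the <<<grid, block>>> syntax.

```python
def saxpy_thread(block_idx: int, block_dim: int, thread_idx: int,
                 a: float, x: list[float], y: list[float]) -> None:
    """Body of one emulated CUDA thread computing y[i] = a * x[i] + y[i]."""
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA C
    if i < len(x):                          # guard: the last block may be partial
        y[i] = a * x[i] + y[i]


def launch_saxpy(a: float, x: list[float], y: list[float],
                 block_dim: int = 256) -> None:
    """Sequentially emulate a 1-D grid launch that covers every element."""
    n = len(x)
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            saxpy_thread(block_idx, block_dim, thread_idx, a, x, y)
```

    The grid is rounded up to whole blocks, so the last block can contain idle threads. That is why the thread body needs the bounds check.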

    Learn more about CUDA »

    CUDA Libraries

    A broad set of special-purpose math libraries has been ported to CUDA. Highlights include:

    NVIDIA provided:

    • CUDA BLAS library (CUBLAS), Levels 1/2/3

    • CUDA FFT library 2D/3D

    • Sparse matrix-vector multiplication (SpMV)

    • NPP: NVIDIA Performance Primitives

    Third-party provided:

    • CUDA data parallel primitives library (cuDPP)

    • MAGMA: LAPACK on CUDA GPUs and Multi-core CPUs from Dongarra’s Group

    • CULA Tools: LAPACK on CUDA GPUs from EM Photonics

    • Jacobi-preconditioned Conjugate Gradient solver

    • GPULib: Library of mathematical functions for IDL and MATLAB

    • GPU VSIPL signal processing library

    • Computer vision and imaging library (OpenCV)

    • OpenCurrent: open source library of CUDA-accelerated PDE (partial differential equation) solvers
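    The Jacobi-preconditioned conjugate gradient method in the list above is compact enough to sketch. The pure-Python version below is a serial reference for a dense symmetric positive-definite matrix, not the CUDA library itself. The function name and tolerances are assumptions for illustration. On a GPU, the matrix-vector product, dot products and vector updates are the data-parallel pieces that get offloaded. The Jacobi preconditioner suits that setting because applying it is just an element-wise multiply by 1/diag(A).

```python
import math


def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A (list of rows)
    using conjugate gradient with a Jacobi preconditioner M = diag(A)."""
    n = len(b)

    def matvec(v):
        # Dense matrix-vector product: one independent dot product per row.
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # applying M^-1 is element-wise
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0

    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) / b_norm < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions A-conjugate
        p = [zi + beta * pj for zi, pj in zip(z, p)]
        rz = rz_new
    return x
```

    Every step inside the loop is a matrix-vector product, a dot product (a reduction) or an element-wise update. That is why this solver maps well onto GPU libraries like those listed above.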

    Learn more about CUDA libraries and sample code »

    CUDA Education

    CUDA is being taught at over 350 universities throughout the world, enabling innovative solutions to some of the most complex computation-intensive challenges.

    Learn more about courses »
  • OpenCL

    OpenCL is an open standard for cross-platform, parallel programming of processors from a variety of vendors. OpenCL provides an efficient, close-to-the-metal programming interface that supports a variety of platform-independent tools, middleware and applications. OpenCL does not ship with its own tool set, but it is supported by NVIDIA’s Parallel Nsight and AMD’s APP SDK.

    Learn more about OpenCL »

Software Tools and Libraries

  • AMD Accelerated Parallel Processing

    AMD Accelerated Parallel Processing technology is a set of hardware and software features and tools that enable AMD GPUs and APUs to accelerate compute-intensive applications. The AMD Accelerated Parallel Processing Software Development Kit (SDK) is designed to enable developers to exploit these capabilities and deliver optimal performance on their applications. The SDK supports Microsoft’s Visual Studio 2010.

    The APP SDK preserves developer investments by supporting key industry standards such as OpenCL and Microsoft’s DirectCompute. Applications can be targeted for discrete GPUs, CPUs or APUs to maximize flexibility for current and future deployments.

    The newest release, APP SDK v2.3, supports AMD’s latest CPU, GPU and APU solutions and is compliant with OpenCL v1.1. The SDK includes a profiler, a stream kernel analyzer, BLAS and FFT libraries, and other advanced features that enable the development of high-performance applications.

    Learn more about the Accelerated Parallel Processing SDK »
  • CAPS

    Based on C and FORTRAN directives, CAPS’s HMPP offers a high-level abstraction of hybrid programming that fully leverages the computing power of stream processors without the complexity associated with GPU programming. The HMPP compiler integrates powerful data-parallel backends for NVIDIA CUDA and AMD CAL/IL that drastically reduce development time, and the HMPP runtime supports application deployment on multi-GPU systems. Software assets are kept independent of both hardware platforms and commercial software. While preserving portability and hardware interoperability, HMPP increases application performance and development productivity.

    HMPP Workbench includes a C and FORTRAN compiler, code generators and a runtime that integrate seamlessly into your environment and make use of the CUDA/OpenCL development tools and drivers. Based on a set of OpenMP™-like directives that preserve legacy codes, HMPP fully leverages the performance offered by most of today’s stream processors and vector units. HMPP offers a standardized interface between your scientific algorithm and fast-evolving target code by insulating hardware-specific implementations of functions from your legacy code. Complementary to OpenMP and MPI, HMPP lets you develop parallel hybrid applications that mix the best of today’s available parallel tools.

    For more information: www.caps-entreprise.com/ »
  • CULAtools by EM Photonics

    CULA is a GPU-accelerated linear algebra library that uses NVIDIA’s massively parallel CUDA architecture to dramatically increase the computation speed of sophisticated mathematics. CULA allows developers of a wide range of computationally intensive applications, including computational fluid dynamics, electronic design automation, image processing, and electromagnetic simulations, to take advantage of the performance boost of the GPU.

    CULA was developed, debugged, and tested with Microsoft Visual Studio and developer tools such as NVIDIA’s Parallel Nsight. CULA is available for Windows and other platforms through a variety of interfaces that integrate directly into the user’s existing code. Programmers can easily call CULA from their C/C++, FORTRAN, MATLAB, or Python codes, with no CUDA programming experience required.

    For more information: www.culatools.com »
  • Numerical Algorithms Group (NAG)

    For over four decades, NAG has been at the forefront of the development and supply of numerical algorithm libraries. The NAG Libraries contain over 1,600 mathematical and statistical routines and are embedded into thousands of applications all over the world in a wide range of markets from finance to pharmaceuticals. They are also used by many of the world’s most prestigious universities, research organizations and supercomputing centers.

    The NAG Libraries are highly flexible across many different programming environments, languages and packages. While developing the numerical libraries, NAG experts use a range of development desktops. For example, when writing code for NVIDIA's Tesla and Fermi architectures on Windows, they use the Microsoft Visual Studio development environment together with NVIDIA CUDA to create numerical code for GPGPU platforms. Specifically, Microsoft Visual Studio 2008 serves as the code editor and Parallel Nsight 1.5 as the CUDA debugger when producing NAG software for NVIDIA hardware.

    For more information: www.nag.com »
  • The Portland Group (PGI)

    PGI® parallel Fortran 2003, C++ and ANSI C compilers and tools for Microsoft Windows operating systems harness the full power of today's high-performance parallel workstations, servers and clusters based on 64-bit multi-core x64 processors from AMD and Intel and CUDA-enabled GPGPUs from NVIDIA for science and engineering applications.

    PGI Workstation™ for Windows and PGI Visual Fortran® for Visual Studio both include PGI's two models for programming GPU accelerators. The PGI Accelerator™ programming model is a high-level implicit model, similar to OpenMP for multi-core x64 systems. PGI Accelerator compilers enable the incremental offloading of compute-intensive loops and code regions from a host CPU to a GPU accelerator using simple compiler directives or pragmas. PGI Accelerator directives are treated as comments by other compilers, so programs incorporating them remain 100% standard-compliant and portable. Developed in cooperation with NVIDIA, the CUDA Fortran programming model is an analog of the NVIDIA CUDA C compiler. CUDA Fortran gives expert programmers direct control over all aspects of GPU accelerator programming.

    PGI Unified Binary™ technology, included with all PGI products for Windows, provides the ability to generate a single executable file with code sequences optimized for multiple AMD, Intel and NVIDIA processors. The PGI Unified Binary technology enables Independent Software Vendors (ISVs) and custom applications developers to leverage the latest processor innovations while treating x64 and x64+GPU as a single platform, maximizing flexibility and eliminating the need to target and optimize for separate processors.

    For more information: www.pgroup.com »
  • Rogue Wave

    The IMSL Fortran Numerical Library version 7.0 is the first product in the IMSL family to support GPGPU computing. Windows x64 versions are fully supported using the Intel C++ compiler and Visual Studio. On NVIDIA GPU hardware, the library links to CUBLAS to provide hardware acceleration for a broad range of linear algebra functions. Most Level 3 BLAS functions and some Level 2 BLAS functions are supported, and their use is largely transparent: end users call their IMSL functions as they always have, while the matrix math is handed off to the GPU. For large matrices (8000 x 8000), using the GPU is over 16 times faster than using a single CPU core; even compared with a quad-core CPU, the GPU delivers a performance increase of more than 4x.
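    Level 3 BLAS routines such as GEMM (C = alpha*A*B + beta*C) suit this kind of transparent offload because they perform O(n³) arithmetic on O(n²) data. The Python sketch below is a plain reference for GEMM's semantics plus the conventional 2mnk operation count. It is not the IMSL or CUBLAS interface, and the helper names are hypothetical. Assuming double precision (8 bytes per element), the 8000 x 8000 case above involves about 1.024 x 10^12 floating-point operations on matrices of 512 MB each.

```python
def gemm(alpha, A, B, beta, C):
    """Reference Level 3 BLAS GEMM: returns alpha * A @ B + beta * C.
    A is m x k, B is k x n, C is m x n (all stored as lists of rows)."""
    m, k, n = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(n)]
            for i in range(m)]


def gemm_flops(m, n, k):
    """Conventional operation count: one multiply and one add per
    inner-product term, i.e. 2 * m * n * k."""
    return 2 * m * n * k


def matrix_bytes(rows, cols, bytes_per_element=8):
    """Memory footprint of a dense matrix (8 bytes per double by default)."""
    return rows * cols * bytes_per_element


if __name__ == "__main__":
    n = 8000  # the "8000 square" case quoted for IMSL on the GPU
    print(f"GEMM work: {gemm_flops(n, n, n) / 1e12:.3f} TFLOP")
    print(f"one {n}x{n} double matrix: {matrix_bytes(n, n) / 1e6:.0f} MB")
```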

    For more information: www.roguewave.com »
  • Wolfram Mathematica

    Mathematica is an environment for computing, developing, and deploying technology solutions. It combines a flexible and free-form programming language with a wide range of symbolic and numeric computational capabilities, production of high-quality visualizations, and a range of immediate deployment options.

    NVIDIA's CUDA architecture can be used from within Mathematica, increasing performance for operations ranging from computing to modeling and simulation. Mathematica's intuitive CUDA GPU programming features, along with its built-in, ready-to-use examples for common application areas such as image processing, medical imaging, statistics, and finance, make these performance gains accessible. By increasing the performance of core algorithms, users can speed up their programs by up to a factor of 100.

    With Microsoft Windows’ large user base, suite of development and debugging tools like Visual Studio 2010, management tools, and clustering technologies, CUDALink allows for a high degree of automation and control. And, by transparently scaling from desktop development using Microsoft Windows 7 to large clusters using Windows HPC Server 2008 R2 with Wolfram Lightweight Grid Manager, users can dramatically increase application performance and productivity across industry, research, and education.

    For more information: www.wolfram.com/mathematica »
  • Open Source Tools

    CUDA.NET

    CUDA.NET is an open source project to provide access to CUDA functionality through the .NET framework.

    OPlib

    This open source library offers high performance options pricing that can be used to perform Monte Carlo simulations.

    QuantLib

    This open source library for quantitative finance is for developers to use in modeling, trading, and risk management.

Solutions from Software Partners

  • Acceleware

    Acceleware was founded in 2004 to provide solutions that use GPUs to accelerate scientific computing applications. Its initial products found great success in the electromagnetics market, where 9 of the top 10 cell phone manufacturers use Acceleware to speed up their antenna design simulations by 30x to 40x with GPUs. The time required to run these simulations dropped from 12 hours to approximately 15 minutes because engineers running Windows on their workstations had the power of a GPU supercomputer available locally. The Acceleware EM solution is widely used by manufacturers in fields including medical imaging devices, automotive electronics, aerospace, and consumer electronics.

    In 2008, Acceleware expanded into the oil and gas market with accelerated software libraries that power seismic migrations using a GPU-accelerated Kirchhoff Time Migration (KTM). Next came Reverse Time Migration (RTM), a more advanced technique whose massive compute requirements can occupy thousands of cores for many months. Acceleware released the industry’s first accelerated RTM, which supports many advanced features made practical by GPU acceleration. The Acceleware RTM library is also available through OEM partners such as Paradigm Geophysical and Tsunami Development.

    Acceleware also offers a range of training and consulting services. These services include advanced multi-core programming training for CUDA, OpenCL and Windows HPC Server. Acceleware also specializes in consulting services for companies who need to port legacy codes to run on multi-core systems or to optimize existing code to gain additional speed.

    For more information: info@acceleware.com »
  • ANSYS

    Technology is the lifeblood of ANSYS, and for more than 40 years its main focus has been the application of computational methods to engineering design challenges. Users can analyze designs directly on the desktop or on computing clusters, providing a common platform for fast, efficient, cost-conscious product development, from design concept to final-stage testing and validation.

    With power-efficient cores and increasingly fast access to memory, GPUs are well suited to accelerating many ANSYS simulations. ANSYS Mechanical can dramatically reduce overall engineering simulation processing time by exploiting the power of GPUs. Performance benchmarks show that, when GPUs are used in conjunction with a quad-core processor, double-precision computations on typical workloads can be completed in half the normal turnaround time. The key benefit is that customers can gain insight into product behavior faster than ever before and deliver innovative products that exceed market expectations.

    For more information: www.ansys.com »
  • Impetus AFEA

    Impetus AFEA has developed the Solver, which is based on non-linear finite element simulation technology applicable to a wide range of industrial and manufacturing processes that require simulation of the transient dynamic response of structures. The Solver is a robust engineering tool that helps reduce development cost through accurate modeling of the design. It is a true 3D simulation tool: its finite element library includes linear, quadratic and cubic order solid elements, SPH (discrete particle) elements for fluid structure interaction, and a special particle method for blast simulations.

    The Solver has unique capabilities that allow the simulation to represent fully coupled loading from land mines that are buried in sand interacting with a realistic full-scale vehicle model. The Solver includes several sand models, explosive types and material models that will enable the analyst to accurately model the physics and lead to better designs, while reducing the number of full-scale field tests.

    The Impetus AFEA Solver is the first commercial finite element software to fully utilize GPU technology. For the analyst, this means that a low cost desktop computer equipped with a GPU can provide HPC performance previously found only with large, expensive clusters.

    For more information: www.impetus-afea.com »
  • Manifold

    Manifold® System was the first commercial product to ship with GPGPU acceleration for Microsoft Windows. Manifold combines visualization, database and analytics to deliver GIS, CAD, spatial analytics, remote sensing, image editing, web serving, graphics editing, data management, and custom development for individuals as well as the largest enterprises, on a local desktop or over the web. Manifold is fully 64-bit, is localized in national languages, and works either as a standalone package on desktops or servers or as a team player alongside other products such as Microsoft SQL Server, IIS, and Excel, or with Visual Studio for custom development.

    Manifold utilizes third generation, heterogeneous parallel CPU and GPGPU technology for maximum GPU performance, often accelerating tasks in desktop Windows machines to run many hundreds of times faster than non-GPGPU accelerated software, and providing a decisive performance advantage in Windows server applications as well. To work with tens or even hundreds of gigabytes of data, Manifold parallelizes data access, automatically using as many CPU cores as are available to fetch data and to dispatch parallelized computations into potentially thousands of GPU cores. Parallelization is transparent, fully automatic, requires no customization and is built into Manifold “point and click” commands as well as tools like Manifold’s GPGPU-enabled SQL, which automatically runs with parallel GPGPU acceleration when GPUs are available and runs in multiple CPU cores if GPUs aren’t available.
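    The "GPU if available, all CPU cores otherwise" behavior described above is a common dispatch pattern. Manifold's internals are not described here, so the Python sketch below is purely hypothetical. The gpu_backend parameter stands in for whatever GPU execution layer an application might have. When no backend is given, the work is spread over a thread pool sized to os.cpu_count(). In CPython, threads illustrate the structure but do not speed up pure-Python arithmetic because of the global interpreter lock; a real implementation would hand the work to native code.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

ElementFn = Callable[[float], float]
# Hypothetical GPU backend: receives the per-element function and the data.
GpuBackend = Callable[[ElementFn, Sequence[float]], List[float]]


def run_parallel(func: ElementFn, data: Sequence[float],
                 gpu_backend: Optional[GpuBackend] = None) -> List[float]:
    """Apply func to every element, using the GPU backend when one is
    available and otherwise spreading the work over all CPU cores."""
    if gpu_backend is not None:
        return gpu_backend(func, data)
    workers = os.cpu_count() or 1  # os.cpu_count() may return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, data))  # map() preserves input order
```

    The caller's code stays the same on either path. That mirrors the point above that parallelization requires no customization from the user.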

    When used together with Microsoft SQL Server, Manifold transparently gives SQL Server applications the speed and power of parallel CPU computation and GPGPU acceleration, as if they were built in. Manifold’s transparently GPGPU-enabled SQL enables database professionals to tap into the power of GPGPU acceleration using the same SQL they already know.

    For more information: www.manifold.net/index.shtml »
  • Quantifi

    Quantifi is a leading provider of analytics, trading and risk management software for the global capital markets. The suite of integrated pre- and post-trade solutions allows market participants to better value, trade, and risk manage their exposures and respond more effectively to changing market conditions. Founded in 2002, Quantifi has more than 120 top-tier clients including five of the six largest global banks, two of the three largest asset managers, leading hedge funds, insurance companies, pension funds and other financial institutions across 15 countries. Renowned for its client focus, depth of experience, and commitment to innovation, Quantifi is consistently first-to-market with intuitive, award-winning solutions.

    Quantifi solutions are based on the Microsoft Windows platform. The GPU technology implementations provide enhanced computational power at groundbreaking price/performance levels with gains of up to 100X versus Quantifi’s baseline performance metrics. Combined with integrated grid support, Quantifi solutions deliver stable, timely and accurate results for even the most complex OTC products. Quantifi is recognized for providing the most flexible and powerful pricing and risk management tools available today.

    For more information: www.quantifisolutions.com »
  • SciComp

    SciComp helps customers take advantage of GPU computing without having to become experts in the field by enhancing SciFinance®, its expert system for building derivatives pricing and risk models, to include practical rules for automatically generating GPU-enabled codes.

    SciFinance automatically generates C/C++/CUDA-enabled pricing and risk model source code from concise, high level, keyword-rich model specifications. A developer simply needs to add the keyword “CUDA” to a model specification and SciFinance automatically produces optimized GPU-enabled pricing model source code. No CUDA or parallel computing expertise is required and there is no hand-coding.

    Monte Carlo simulations enjoy some of the largest speed-ups on GPUs, with SciFinance-generated CUDA code typically running between 70X and 300X faster than serial code. The SciFinance-generated code is compiler-ready (Microsoft Visual Studio or another compatible compiler), with a pricing function argument list identical to that of the serial version and numerical results within the Monte Carlo variance. Complex derivatives pricing code runtimes can be reduced from minutes to seconds. This speed-up lets users generate many more scenarios faster and explore alternative or multiple risk assessment methodologies.
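    SciFinance's generated code is not reproduced here. As a hedged illustration of why Monte Carlo pricing parallelizes so well, the Python sketch below prices a European call both by simulation and in closed form. It also shows what agreement within Monte Carlo variance looks like. It assumes Black-Scholes dynamics; the page does not say which models were benchmarked, and the function names are hypothetical. The simulated price should fall within a few standard errors of the closed-form value.

```python
import math
import random


def bs_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call option."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)

    def norm_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def mc_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Monte Carlo price of the same call and its standard error.
    Paths are independent, so on a GPU each thread can simulate its own."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    disc = math.exp(-r * t)
    total = total_sq = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))  # terminal price
        payoff = disc * max(s_t - k, 0.0)                      # discounted payoff
        total += payoff
        total_sq += payoff * payoff
    mean = total / n_paths
    var = max((total_sq - n_paths * mean * mean) / (n_paths - 1), 0.0)
    return mean, math.sqrt(var / n_paths)
```

    For example, bs_call(100.0, 100.0, 0.05, 0.2, 1.0) is about 10.45. With 200,000 paths, mc_call on the same inputs should land within a few standard errors of it.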

    For more information: www.scicomp.com/parallel_computing/GPU_OpenMP »

Solutions from OEMs

  • Dell

    Dell’s HPC portfolio encompasses a broad range of performance optimized technologies including GPU platforms designed to accelerate your time to results. From blazing fast Precision Workstations to uncompromised performance available in the modular PowerEdge Blade and PowerEdge C systems through to our highly innovative PCIe Expansion Chassis – Dell GPU solution combos are designed for rapid integration and up to 16 TFLOPS of delivered performance.

    The advantages of a Dell and Microsoft-based HPC solution stem from our longstanding collaborations, rigorous validation and ongoing commitment to providing complete solutions at every scale. Desktop, workgroup and large-scale users can derive immediate results from our breadth of solutions with confidence that the entire solution has been integrated, tested and validated and is production-ready, right out of the box. From performance gains delivered with our GPU systems to the rich new features designed into the Microsoft Windows HPC Server 2008 R2 platform, together we are delivering complete and simplified infrastructures.

    For our users, HPC and technical workloads have never been this easy to execute, and available compute has never been as abundant! With the advances designed into Microsoft Windows HPC Server 2008 R2, combined with Dell’s GPU-enabled technologies and our extended collaborations across platforms and into the cloud, together we are able to provide instant accessibility and improved collaboration. From individual developers through to hyperscale platforms, we are committed to delivering to the entire ecosystem of researchers a much broader usage model than ever before, and we invite you to spend some time with us and learn more about how Dell can help enable your discovery!

    For more information, please visit: www.dell.com/hpc »
  • HP

    HP was one of the first partners to support Microsoft in High Performance Computing worldwide. Our Front Line Partnership, technical, sales and marketing teams have worked together to ensure that companies can take full advantage of the Microsoft HPC operating system environment and HP’s HPC technology leadership. Microsoft’s HPC Server 2008 R2 is available worldwide on HP Cluster platforms. Together, the two companies help customers achieve faster time to insight, analysis and solution.

    At the Tokyo Institute of Technology, a collaboration between HP and Microsoft demonstrated Windows HPC Server 2008 R2 running in a GPU-enabled environment (HP ProLiant SL390s) with performance that exceeded 1 PetaFLOP.

    As a leader in GPU computing, HP shipped its first GPU-enabled system to customers more than three years ago and was:

    • First to design a standard server as a host for NVIDIA GPUs

    • First to integrate GPUs with cluster technology

    • First to deliver 1 TFLOP LINPACK in a 1U chassis

    HP is currently in its second generation of GPU-enabled systems and ships more than 13 different GPU-enabled servers in multiple configurations. The most recent GPU-enabled system, the HP ProLiant SL390, provides excellent density in both its 2U half-width and 4U half-width servers. Each 2U half-width server holds 1 server node and 3 GPUs, for a total of 4 servers and 12 GPUs in 4U; just one of these GPU-enabled servers delivers 1 TFLOP LINPACK in 1U. With the 4U half-width SL390, 4U of rack space holds 2 servers and 16 GPUs.

    For more information, please visit: www.hp.com/go/accelerators/ »
  • IBM

    IBM offers expansion options for existing systems built around NVIDIA’s next generation GPU Computing Modules. GPUs are now available for intensive, highly parallel floating point tasks. These tasks – similar to pure graphics processing – can take full advantage of the compute architecture of the GPU. And because these options can be added to your existing systems, you can protect your current investment and lower your TCO while getting the additional performance you need for your demanding applications.

    For more information, please visit: http://www-03.ibm.com/systems/info/x86servers/optimized/hpc/index.html »

Research Initiatives

  • CASPUR Advanced Research Center, Rome, Italy

    At present, we deploy to end users about a dozen Tesla C1060/M1060 GPUs and a fully dedicated cluster of 64 Tesla M2050 GPUs (16 S2050 systems), all integrated via an InfiniBand QDR interconnect and running Microsoft Windows HPC Server 2008. The Fermi cluster, named “Jazz”, has 32 dual-socket X5650 front-end servers. As the first Fermi cluster shipped in Italy (and among the few assembled in Europe), it has reached a remarkable 23 TFLOPS and an outstanding 886 MFLOPS/Watt. For this performance, the Jazz cluster is ranked #5 (1st in Europe) among the most “sustainable” supercomputers in the Little Green 500 list of November 2010. For its contributions to GPU computing, CASPUR has been named a CUDA Research Center by NVIDIA for 2010-2011.
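    The two Jazz figures also imply the cluster's power draw during the measured run, since dividing throughput by efficiency gives watts. The Python check below assumes both numbers come from the same measurement; the page does not state this explicitly. The function name is hypothetical.

```python
def implied_power_watts(flops: float, flops_per_watt: float) -> float:
    """Power implied by a sustained performance figure and an efficiency figure."""
    return flops / flops_per_watt


if __name__ == "__main__":
    # Figures quoted for the "Jazz" cluster (assumed to come from the same run).
    watts = implied_power_watts(23e12, 886e6)  # 23 TFLOPS at 886 MFLOPS/W
    print(f"about {watts / 1e3:.1f} kW")
```

    This works out to roughly 26 kW.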

    Little Green 500 website: http://www.green500.org/lists/2010/11/little/list.php » NVIDIA CUDA Research Center: http://research.nvidia.com/content/cuda-research-centers »

    Research on New Technologies

    Together with Microsoft, we started CASPUR@XLrate at CASPUR, a many-core environment on HPC Server 2008 dedicated to biomedical applications and the first of its kind in Europe to offer GPU technologies for HPC under the Windows operating system. Microsoft and CASPUR have also carried out a comprehensive benchmarking effort, with excellent results presented at various international conferences over the last two years or so. Among other things, CASPUR is currently a beta tester of the Parallel Nsight tool, in contact with NVIDIA's US development team. Dynamic scheduling of computational kernels across host CPUs and GPUs is under investigation in a master's thesis project hosted at CASPUR. The scheduling is based on performance models of the kernels and data transfers, together with dynamic placement of data in host and GPU memories.
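    The dynamic-scheduling idea under investigation (choosing CPU or GPU per kernel from performance models of compute and data transfer) can be sketched as a greedy earliest-finish-time rule. Everything in the Python sketch below is hypothetical and is not the thesis work's actual method. The cost model is simple and linear: transfer time is bytes divided by bandwidth, and compute time is work divided by throughput. Kernels are assumed independent, and the device figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    gflops: float            # sustained throughput assumed for this device
    transfer_gb_s: float     # host-to-device bandwidth; inf for the host CPU
    busy_until: float = 0.0  # time (s) at which the device becomes free


@dataclass
class Kernel:
    name: str
    gflop: float             # work performed by the kernel
    input_gb: float          # data that must reach the device first


def predicted_time(k: Kernel, d: Device) -> float:
    """Linear performance model: data transfer time plus compute time."""
    return k.input_gb / d.transfer_gb_s + k.gflop / d.gflops


def schedule(kernels: list[Kernel], devices: list[Device]) -> list[tuple[str, str, float]]:
    """Assign each kernel, in order, to the device predicted to finish it first.
    Returns (kernel, device, finish time) and updates each device's busy_until."""
    plan = []
    for k in kernels:
        best = min(devices, key=lambda d: d.busy_until + predicted_time(k, d))
        best.busy_until += predicted_time(k, best)
        plan.append((k.name, best.name, best.busy_until))
    return plan
```

    Take a 50 GFLOPS CPU and a 500 GFLOPS GPU behind a 5 GB/s link. A large kernel goes to the GPU, while a small kernel with the same input size stays on the CPU, because for it the transfer cost dominates.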

    CASPUR@XLrate project website: www.caspur.it/xlrate

    ISC websites: http://www.supercomp.de/isc09/ and http://www.supercomp.de/isc10/
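
    A simple greedy heuristic can illustrate the kind of dynamic CPU/GPU scheduling described above. The sketch below is hypothetical and is not the scheme being studied in the thesis. The `Kernel` class, the `schedule` function, the per-kernel CPU and GPU run times and the host-GPU bandwidth are all invented model inputs. Each kernel goes to whichever device would finish it first. A transfer cost is added when the kernel's data buffer currently resides on the other device, and the buffer's location is updated after each assignment.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    buffer: str       # name of the data buffer the kernel reads and updates
    cpu_time: float   # modeled run time on the host CPU (seconds), hypothetical
    gpu_time: float   # modeled run time on the GPU (seconds), hypothetical


def schedule(kernels, buffer_bytes, bandwidth=5e9):
    """Greedily assign each kernel, in list order, to "cpu" or "gpu".

    A kernel may start once its device is free and the previous kernel that
    used the same buffer has finished. If the buffer currently lives on the
    other device, a transfer of buffer_bytes[buffer] / bandwidth seconds is
    added. The device with the earliest modeled finish time wins.
    Returns a list of (kernel name, device, finish time).
    """
    location = {b: "cpu" for b in buffer_bytes}   # all data starts in host memory
    available = {b: 0.0 for b in buffer_bytes}    # when each buffer is next usable
    ready = {"cpu": 0.0, "gpu": 0.0}              # when each device is next free
    plan = []
    for k in kernels:
        finish = {}
        for device, run_time in (("cpu", k.cpu_time), ("gpu", k.gpu_time)):
            start = max(ready[device], available[k.buffer])
            moved = location[k.buffer] != device
            transfer = buffer_bytes[k.buffer] / bandwidth if moved else 0.0
            finish[device] = start + transfer + run_time
        best = min(finish, key=finish.get)        # earliest modeled finish
        ready[best] = available[k.buffer] = finish[best]
        location[k.buffer] = best                 # the data now resides on that device
        plan.append((k.name, best, finish[best]))
    return plan
```

A more realistic model would also account for several other factors: copy engines that overlap transfers with computation, multiple GPUs, and dependencies that go beyond a shared buffer.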

    Research on Community Codes

    Since 2009, CASPUR has given end users pre-release versions of many GPU-enabled codes on Quadro and Tesla systems. With the opening of the “Jazz” cluster, a comprehensive set of packages will be available in production to all users. The Fermi systems will be integrated with the existing multi-core architectures and queuing system under HPC2008S. A preliminary list of these codes: AbInit; AMBER; AutoDock; BEAST/beagle; BigDFT; CP2K; CUDABLAST; GROMACS; LAMMPS; Kooderive-QuantLib; MATLAB@GPU; NAMD; SCELib; SPECFEM3D. Preliminary results obtained with the multi-GPU versions of AMBER and NAMD on the “Jazz” cluster have attracted the attention of several researchers working at CASPUR in chemistry, biology, medicine and materials science, among them Prof. V. Barone of the Scuola Normale of Pisa. CASPUR is supporting the preparation and running of case studies that extend both the size of the system models and the simulated time span by more than an order of magnitude. This paves the way to addressing new scientific questions.

    Recent results presented at a Microsoft meeting in Italy: http://blogs.msdn.com/b/msr_er/archive/2010/09/17/celebrating-italian-faculty-days.aspx

    Research and Development

    At CASPUR, computational science finds natural application in a wide range of numerical work covering almost every scientific area. Notably, in an EU collaborative project, the COSMO group will use GPU technology to port the most important European weather-forecasting codes, aiming to improve their capabilities by an order of magnitude or more. An exemplary application of CFD techniques to biological flows aims to improve mechanical aortic heart valves and the related implant surgery techniques. In computational biochemistry, we are tackling millisecond-scale molecular dynamics simulations of an anticancer drug interacting with its binary receptor, which is formed by the Topoisomerase I protein and DNA. DNA itself is studied at the molecular level to understand what happens after irradiation by high-energy gamma rays. The mechanism by which low-energy electrons break DNA, and the resulting cell damage or death, will then inform the fine-tuning of modern cancer radiotherapy equipment. In this last activity, the GPU-enabled applications SCELib and VOLSCAT reached their top performance under HPC2008S R2. SCELib3 is a CUDA Zone application that runs 175x faster than a single CPU core on Tesla under Linux, and it reached 224x under HPC2008S R2 in single precision. We are currently porting the double-precision part of SCELib to Fermi GPUs under HPC2008S, where we expect to double that performance. This would open the way to a new high-throughput, data-intensive application for cancer research.

    The SCELib3 and VOLSCAT codes have been published in the 40th-anniversary special issue of Computer Physics Communications (CPC): http://dx.doi.org/10.1016/j.cpc.2009.07.009 and http://dx.doi.org/10.1016/j.cpc.2009.07.013

  • Data-Intensive Computing at Johns Hopkins

    The schematics for the SQL to CUDA integration

    An image of the 2D galaxy correlation based upon computing 600 trillion(!) galaxy pairs

    At the Institute for Data Intensive Engineering and Science (IDIES) at Johns Hopkins, we have been working closely with Microsoft to move data-intensive scientific computing inside the database. Using SQL Server 2008 and its CLR integration features, we have built sophisticated web services that perform image processing, numerical computations and various statistical functions inside the database. These services are callable as User Defined Functions (UDFs) in T-SQL. These techniques have been applied to several scientific domains, in particular astronomy, turbulence simulations, radiation oncology, wireless sensor networks for environmental monitoring, and genomics.

    SQL-CUDA Integration

    To exploit the new capabilities offered by emerging GPGPU architectures, we are experimenting with GPUs in a cluster of database servers. The goal is a set of software tools for analyzing large data sets, in particular in astronomy and genomics.

    The implementation uses an out-of-process server that communicates with both SQL Server and the CUDA-based GPU resource through a traditional IPC mechanism. The server runs as a Windows service and maintains control of the threads. It loads a DLL containing the user code for the CUDA part of the function. The SQLCLR procedure first executes a SQL statement on the server that fetches the result set into shared memory. It then calls the DLL ‘hooks’ in the server to execute the CUDA part of the function, passing a pointer to the data in shared memory. The results are written back to shared memory, and the return values are transferred to SQL from there.
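
    The shared-memory hand-off described above can be sketched in Python with `multiprocessing.shared_memory`. This is a minimal single-process illustration under stated assumptions, not the IDIES implementation. `run_udf` plays the role of the SQLCLR procedure, and `gpu_hook` is a hypothetical stand-in for the DLL hook. The CUDA kernel is replaced by a CPU loop that squares each value, and the memory layout (inputs followed by outputs, as doubles) is invented for the example. In the real system the hook runs inside a separate Windows service process.

```python
import struct
from multiprocessing import shared_memory

FLOAT64 = 8  # bytes per double-precision value


def gpu_hook(shm_name, n):
    """Hypothetical stand-in for the DLL 'hook' executed by the server.

    It attaches to the shared block by name, reads n input doubles, applies
    the "kernel" (here a CPU loop that squares each value instead of a CUDA
    launch) and writes n results directly after the inputs.
    """
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        values = struct.unpack_from(f"{n}d", shm.buf, 0)
        results = [v * v for v in values]
        struct.pack_into(f"{n}d", shm.buf, n * FLOAT64, *results)
    finally:
        shm.close()  # detach only; the creator owns and unlinks the block


def run_udf(result_set):
    """Mimics the SQLCLR side: stage rows, call the hook, read results back."""
    n = len(result_set)
    if n == 0:
        return []
    shm = shared_memory.SharedMemory(create=True, size=2 * n * FLOAT64)
    try:
        # 1. the fetched result set is copied into shared memory
        struct.pack_into(f"{n}d", shm.buf, 0, *result_set)
        # 2. the hook runs the compute step on the shared data
        gpu_hook(shm.name, n)
        # 3. the return values are read back from shared memory
        return list(struct.unpack_from(f"{n}d", shm.buf, n * FLOAT64))
    finally:
        shm.close()
        shm.unlink()  # release the block
```

For example, `run_udf([1.0, 2.0, 3.0])` stages three values and reads back their squares.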

    One of our efforts has been to modify the well-known Smith-Waterman-Gotoh dynamic-programming algorithm for sequence alignment so that it is better suited to an efficient CUDA implementation [1]. The modification increases the data parallelism of the dynamic-programming computation so as to fully exploit the GPU’s thread parallelism.
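
    The sketch below shows the standard Smith-Waterman-Gotoh recurrence (local alignment with affine gap penalties). It is not the modified algorithm of [1], whose details are not given here. It fills the matrices one anti-diagonal at a time, a common generic way to expose data parallelism in this dynamic program. Every cell on an anti-diagonal depends only on the two previous anti-diagonals, so on a GPU the cells of one diagonal could each be handled by a separate thread. The scoring parameters (match 2, mismatch -1, gap open 3, gap extend 1) are arbitrary assumptions.

```python
def swg_score_wavefront(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score of sequences a and b with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend. Cells are
    visited in anti-diagonal (wavefront) order d = i + j.
    """
    m, n = len(a), len(b)
    neg = float("-inf")
    H = [[0] * (n + 1) for _ in range(m + 1)]    # best local score ending at (i, j)
    E = [[neg] * (n + 1) for _ in range(m + 1)]  # best score ending in a horizontal gap
    F = [[neg] * (n + 1) for _ in range(m + 1)]  # best score ending in a vertical gap
    best = 0
    for d in range(2, m + n + 1):
        # All cells with i + j == d are independent of each other: on a GPU,
        # this inner loop is the part that could run one thread per cell.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

For example, `swg_score_wavefront("ACGT", "ACGT")` scores four matches, and unrelated sequences such as `"AAAA"` and `"TTTT"` score 0.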

    In another effort, we have used this framework to compute the galaxy-galaxy correlation function of the Sloan Digital Sky Survey data on very large scales. In a few days of analysis, we computed 600 trillion galaxy pairs over the real and Monte Carlo galaxy catalogs, obtaining a result with unprecedented resolution [2].
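
    Pair counting dominates the cost of a correlation-function calculation at this scale. The brute-force sketch below makes three histograms of angular separations: data-data (DD), data-random (DR) and random-random (RR) pairs, where the random catalog plays the role of the Monte Carlo catalog. It then combines the normalized counts with the Landy-Szalay estimator, w = (DD - 2DR + RR) / RR. The choice of estimator, the binning and all function names are assumptions for illustration, not details of the IDIES code. The doubly nested pair loop is the O(N²) work that could be spread across GPU threads.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Convert right ascension/declination in degrees to a 3D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def pair_counts(p, q, edges_deg, same=False):
    """Histogram the angular separations (degrees) of all pairs from p and q.

    With same=True, p and q are the same catalog and each unordered pair is
    counted once. Separations outside [edges[0], edges[-1]) are ignored.
    """
    counts = [0] * (len(edges_deg) - 1)
    for i, u in enumerate(p):
        for v in (q[i + 1:] if same else q):
            dot = sum(x * y for x, y in zip(u, v))
            theta = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
            for k in range(len(counts)):
                if edges_deg[k] <= theta < edges_deg[k + 1]:
                    counts[k] += 1
                    break
    return counts


def landy_szalay(data, randoms, edges_deg):
    """Landy-Szalay estimate of w(theta) per bin (NaN where RR is empty).

    Assumes each catalog contains at least two points.
    """
    nd, nr = len(data), len(randoms)
    dd = pair_counts(data, data, edges_deg, same=True)
    rr = pair_counts(randoms, randoms, edges_deg, same=True)
    dr = pair_counts(data, randoms, edges_deg)
    norm_dd, norm_rr, norm_dr = nd * (nd - 1) / 2, nr * (nr - 1) / 2, nd * nr
    w = []
    for k in range(len(dd)):
        if rr[k] == 0:
            w.append(float("nan"))
            continue
        rr_k = rr[k] / norm_rr
        w.append((dd[k] / norm_dd - 2 * dr[k] / norm_dr + rr_k) / rr_k)
    return w
```

As a check on the pair arithmetic, a single catalog of N points contributes N(N-1)/2 DD pairs. This quadratic growth is why catalogs of galaxies and random points quickly reach the trillions of pairs quoted above.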

    Data-Scope

    We are building the Data-Scope, a new instrument for data-intensive research. It is a large (5 PB) cluster of servers that combines extreme I/O performance (450 GB/s) with a large number of GPU cards in the server backplanes, providing very high-throughput coupling between the low-level I/O system and the GPUs. Each server also has a large number of solid-state disks (SSDs) to improve performance for random-access workloads.
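
    A back-of-the-envelope calculation shows what these two figures mean together. Assuming decimal units (1 PB = 10^15 bytes) and that the full 450 GB/s aggregate bandwidth could be sustained, one complete pass over 5 PB would take about three hours:

```latex
t = \frac{5 \times 10^{15}\ \text{B}}{450 \times 10^{9}\ \text{B/s}} \approx 1.1 \times 10^{4}\ \text{s} \approx 3.1\ \text{h}
```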

    [1] Wilton, R., Szalay, A.S., 2010, “Modified Smith-Waterman-Gotoh Algorithm for CUDA Implementation”, NVIDIA GTC Conference, San Jose, Sep 2010.

    [2] Tian, H., Neyrinck, M., Budavari, T., Szalay, A.S., 2010, “Evidence for Baryon Acoustic Oscillations through Angular Tomography of the SDSS”, Astrophys. J., submitted.

  • Using Computer Simulation to Understand the Dynamics of Complex Systems at UW-Madison

    Top: Computer simulation of the Mars Rover operating on coarse granular terrain. Bottom: Image of the largest dynamics simulation carried out to date, in which a light sphere “floats” on more than 1.1 million heavy spheres.

    How did the Mars Rover get stuck in a patch of sand? What strategy should be employed to try to regain its mobility? How should a Renaissance-era building in Italy, made up of very loosely coupled bricks, be reinforced to withstand an earthquake? How does an avalanche propagate? How do granular materials mix?

    Professor Dan Negrut at the University of Wisconsin-Madison, together with collaborators Mihai Anitescu at Argonne National Lab and Professor Alessandro Tasora at the University of Parma, Italy, is developing a computational framework to address these challenging dynamics problems. By leveraging high-performance computers and applying novel algorithms, they can predict the dynamics (motion) of more than one million mutually interacting elements. The dynamics of such a system lead to optimization problems with more than 15 million unknowns.

    These computationally intensive tasks, in which the dynamics of every component in the system are found by solving a set of differential equations of motion, motivated the assembly of a GPU cluster with 5,760 scalar processors. The cluster is housed in the Simulation-Based Engineering Lab, led by Negrut at the University of Wisconsin (UW)-Madison. It was assembled entirely by students participating in the project and is the main hardware asset of the Modeling, Simulation and Visualization Center at UW-Madison. It will be used in a new High Performance Computing course and by several research groups for computer modeling and simulation in biochemistry, chemical engineering, chemistry, and medical physics. The cluster is managed under Windows HPC Server 2008 R2 and relies on CUDA for parallel execution on the GPUs.
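
    The per-body structure that makes this workload a good fit for a GPU can be illustrated with the simplest time integrator. The sketch below is a generic semi-implicit (symplectic) Euler step for spheres under gravity. Contact is reduced to a crude floor clamp, and the `step` function and its parameters are invented for illustration. This is not the project's method: as described above, the project's contact dynamics lead to large optimization problems, which this sketch does not attempt. Each body's update touches only its own state, so the loop body could run as one GPU thread per body.

```python
def step(positions, velocities, dt, g=-9.81, radius=0.5, floor=0.0):
    """Advance every sphere in place by one semi-implicit Euler step.

    positions and velocities are lists of [x, y, z] lists; gravity acts
    along z. Contact is only a floor clamp that prevents penetration.
    """
    for p, v in zip(positions, velocities):
        v[2] += g * dt                  # update the velocity first ...
        for axis in range(3):
            p[axis] += v[axis] * dt     # ... then the position, using the new velocity
        if p[2] < floor + radius:       # crude stand-in for contact with the floor
            p[2] = floor + radius
            v[2] = max(v[2], 0.0)       # remove the downward velocity component
    return positions, velocities
```

Updating the velocity before the position is what distinguishes semi-implicit Euler from the explicit variant, and it keeps long-running simulations better behaved for the same step size.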

    Negrut’s research project draws on new applied mathematics methods and recent breakthroughs in multi-processor technologies (software and hardware). Together they provide a theoretical and computational foundation for a paradigm shift in mechanical system simulation. The project has two research thrusts: (i) developing new applied mathematics techniques and (ii) leveraging high-performance parallel computers for physics-based simulation. Together, these will make it possible to simulate more complex systems in less time. In economic terms, this will benefit engineers and scientists by enhancing their productivity and opening up new classes of practical applications to investigation through computer simulation.

    The research project is sponsored by the US National Science Foundation, Argonne National Lab, the Italian Ministry of Education and Research, Microsoft, NVIDIA, US Army TARDEC, and FunctionBay of South Korea.

    For more information: http://sbel.wisc.edu

Hanweck Associates Risk Management

Technical computing tools for financial services

Free Trial

180-Day Free Trial

Install and use Microsoft Windows HPC Server in your existing environment

GPGPU White Paper

GPGPU Computing Horizons: Developing and Deploying for Microsoft Windows

The best development tools for GPU programming are on the Windows platform, and that excellence now extends to developing applications that use GPUs for general-purpose computation.

Parallel Computing

View videos, download labs, and take advantage of the many resources on MSDN Parallel Computing.
