Correctness Checking Concepts and Tools for HPC: Call for Action

  • Ganesh Gopalakrishnan | University of Utah

Today’s high performance computing story is one in which ever-larger problems in science and engineering must be solved under strict power budgets. This necessitates heterogeneous computing elements (e.g., CPUs and GPUs) and causes significant shifts in how established programming APIs are used (e.g., MPI mixed with OpenMP and CUDA). In this setting, a designer must not only detect defects such as data races and deadlocks but also increasingly worry about emerging issues such as resilience, floating-point precision, and even the ability to replay executions. My talk will first give a broad overview of our efforts directed at these problems. It will then focus on our tool GKLEE, which helps locate data races in non-trivial CUDA kernels. I will close with two topics: (1) how the same kinds of concurrency errors involving memory orderings keep being repeated, and (2) the hope that by emphasizing correctness checking (in addition to the usual fixation on performance tuning) in basic concurrency courses, we might reduce these frequently committed mistakes.

Speaker Details

Ganesh L. Gopalakrishnan received his PhD in Computer Science from Stony Brook University in 1986 and joined Utah the same year. He was a Visiting Assistant Professor at the University of Calgary (1988) and conducted sabbatical work at Stanford University (1995), Intel, Santa Clara (2002), and Utah (2009, on Parallel and Concurrent Programming Curriculum Development with Microsoft Research). He is Director of the Center for Parallel Computing at Utah. He received one of the six “Beacons of Excellence” Awards for 2012 from the University of Utah. His currently active projects are: Verification Methods and Tool Frameworks for Parallel and Concurrent Systems; Formal Techniques to Enhance System Resilience; Integrated Static and Dynamic Methods for Concurrent Systems; and Dynamic Analysis of Computational Frameworks of Very Large Scale. He holds research grants and contracts from NSF, SRC, and DOE, has published over 160 refereed papers, and has graduated 16 PhD students.

Jeff Running

Series: Microsoft Research Talks