The MIMD (Multiple Instruction, Multiple Data) execution model is more flexible than SIMD (Single Instruction, Multiple Data), but SIMD hardware scales better. GPU (Graphics Processing Unit) hardware uses a SIMD model with additional constraints that make it cheaper and more efficient still, but harder to program. Is there a way to get the power and ease of use of MIMD programming models while targeting GPU hardware?
This talk discusses a compiler, assembler, and interpreter system that allows a GPU to implement a richly featured MIMD execution model supporting message-passing and shared-memory communication, recursion, and more. Through a variety of careful design choices and optimizations, MIMD code running on both NVIDIA and AMD/ATI GPUs can achieve much higher performance per unit of circuit complexity than native MIMD hardware provides. The discussion covers both the methods used and their motivation in terms of the relevant aspects of GPU architecture.
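To make the core idea concrete, the following is a minimal, hypothetical sketch (not the talk's actual system) of how an interpreter can impose MIMD semantics on SIMD execution: each SIMD lane acts as a virtual MIMD processor with its own program counter, and on every step the interpreter broadcasts each opcode in turn while lanes whose current instruction does not match stay masked off. All names and the toy instruction set here are illustrative assumptions.

```python
# Toy MIMD-on-SIMD interpreter: per-lane programs as (opcode, operand)
# pairs. The opcode set and programs are illustrative only.
PROGRAMS = [
    [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("HALT", None)],
    [("PUSH", 10), ("PUSH", 4), ("SUB", None), ("HALT", None)],
]

def run(programs):
    n = len(programs)
    pc = [0] * n                      # one program counter per lane
    stack = [[] for _ in range(n)]    # per-lane operand stacks
    halted = [False] * n
    while not all(halted):
        # One "SIMD" step: broadcast each opcode; only lanes whose
        # current instruction matches the broadcast opcode execute,
        # mimicking masked (predicated) execution on SIMD hardware.
        for op in ("PUSH", "ADD", "SUB", "HALT"):
            for lane in range(n):
                if halted[lane]:
                    continue
                cur_op, arg = programs[lane][pc[lane]]
                if cur_op != op:
                    continue          # lane is masked off this sub-step
                if op == "PUSH":
                    stack[lane].append(arg)
                elif op == "ADD":
                    b, a = stack[lane].pop(), stack[lane].pop()
                    stack[lane].append(a + b)
                elif op == "SUB":
                    b, a = stack[lane].pop(), stack[lane].pop()
                    stack[lane].append(a - b)
                elif op == "HALT":
                    halted[lane] = True
                    continue          # halted lanes never advance
                pc[lane] += 1
    return [s[-1] for s in stack]

print(run(PROGRAMS))  # lane 0 computes 2+3 -> 5, lane 1 computes 10-4 -> 6
```

Even in this toy form, the cost of the technique is visible: every broadcast opcode that no lane wants is wasted work, which is why the design choices and optimizations discussed in the talk (e.g., how instructions are encoded and scheduled) matter so much for real performance.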