CoddSpeed: Hardware Accelerated Query Processing in Microsoft Fabric
- Matteo Interlandi
- Nicolas Bruno
- Brandon Haynes
- Carlo Curino
- Rathijit Sen
- Yinan Li
- Kaushik Rajan
- Bailu Ding
- Lukas M Maas
- Wei Cui
SIGMOD 2026 Industrial Track
Organized by SIGMOD
Best Paper
Over the past three decades, Microsoft has developed several multi-billion-dollar data management products. We have now unified them into Microsoft Fabric, a comprehensive SaaS suite for data management based on a modern cloud-native design that delivers best-in-class analytics performance on CPUs. However, Microsoft’s AI investments have transformed our data centers. Hardware accelerators are now plentiful and surpass traditional CPU servers by orders of magnitude in compute, memory, and networking capabilities. Running analytics on hardware accelerators is therefore a business imperative. In this paper, we describe a multi-year effort to take Fabric to the next level by enabling analytics execution on state-of-the-art hardware accelerators. Given the rapid evolution of hardware, we have made hardware independence our guiding principle. We validate our architectural flexibility by demonstrating how we can run Fabric engines (e.g., Data Warehouse) on a variety of compute and network accelerators (e.g., GPUs, FPGAs, ASICs, NVLink, InfiniBand). We present our most mature implementation, a GPU-based execution engine derived from our Tensor Query Processor (TQP), and report key results from a novel data movement system leveraging NVLink and InfiniBand. This version of Fabric outperforms its CPU counterpart by over an order of magnitude across a host of production and benchmark scenarios (e.g., delivering up to 30x on TPC-H 1TB). We share the lessons learned in building one of the first systems in the era of accelerated analytics.