Project Catapult is the technology behind Microsoft’s hyperscale acceleration fabric, and is at the center of a comprehensive set of investments Microsoft is making to build a supercomputing substrate that can accelerate our efforts in networking, security, cloud services and artificial intelligence. Our work in this area started in 2010 in response to:
- Stresses in the silicon ecosystem driven by diminishing rates of CPU improvements
- Growing compute demands of AI applications and services.
Anticipating a paradigm shift to post-CPU technologies in the cloud, a small team was formed to investigate what could be done. Led by Derek Chiou and Doug Burger, the team began to evaluate alternative architectural designs and specialized hardware such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs) and custom application-specific integrated circuits (ASICs).
Catapult FPGA Accelerator
- Altera Stratix V D5 FPGA with a capacity of 172k ALMs
- x16 PCIe connection (visible along the bottom edge of the card)
- 4 GB of DDR3 memory
- 2 QSFP connectors
Today’s Project Catapult combines an FPGA integrated into nearly every new Microsoft datacenter server with a unique distributed architecture. The distributed architecture deploys FPGAs as an addition to each datacenter server, rather than a bolt-on isolated cluster, to create an “acceleration fabric” throughout the datacenter. This elastic reconfigurable acceleration fabric provides the flexibility to harness an individual FPGA or up to thousands of FPGAs for a single service.
By exploiting the reconfigurable nature of FPGAs in each server, the Catapult architecture delivers the efficiency and performance of custom hardware without the cost, complexity and risk of deploying fully customized ASICs into the datacenter. In doing so, we’ve achieved an order of magnitude performance gain relative to CPUs with less than 30 percent cost increase, and no more than 10 percent power increase. The net result is substantial savings and an industry-leading 40 gigaops/W energy efficiency for accelerators deployed at scale.
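To make those ratios concrete, here is a rough back-of-the-envelope sketch. It assumes that "order of magnitude" means exactly 10x and treats the stated bounds (30 percent cost, 10 percent power) as worst-case overheads. The function name and constants are illustrative and are not taken from any Catapult tooling.

```python
def relative_efficiency(speedup: float, overhead_fraction: float) -> float:
    """Speedup per unit of cost (or power), relative to a CPU-only server."""
    return speedup / (1.0 + overhead_fraction)


# Assumptions: "order of magnitude" is read as exactly 10x, and the quoted
# upper bounds are used as the actual overheads (a pessimistic reading).
SPEEDUP = 10.0
COST_OVERHEAD = 0.30   # "less than 30 percent cost increase"
POWER_OVERHEAD = 0.10  # "no more than 10 percent power increase"

perf_per_cost = relative_efficiency(SPEEDUP, COST_OVERHEAD)
perf_per_watt = relative_efficiency(SPEEDUP, POWER_OVERHEAD)

print(f"performance per unit cost:  {perf_per_cost:.1f}x")
print(f"performance per unit power: {perf_per_watt:.1f}x")
```

Under these assumptions, an accelerated server delivers roughly 7.7 times the performance per unit of cost and roughly 9.1 times the performance per watt of a CPU-only server. Because the quoted overheads are upper bounds, the real ratios would be at least this favorable for a 10x speedup.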
Project Catapult FPGAs talk to each other and can, in fact, become pieces of a larger “brain”: an interconnected machine learning network of servers. In essence, with all these super-fast, networked custom FPGA accelerators, we are building the world’s first hyperscale AI supercomputer, one that can compute machine learning and deep learning algorithms at a combination of speed, efficiency and scale unmatched within the industry.
Today nearly every new server in Microsoft datacenters is equipped with a powerful Catapult FPGA accelerator board.
- 2010: Microsoft researchers meet with Bing executives to propose using FPGAs to accelerate Indexserve.
- 2011: A team of Microsoft software engineers and researchers comes together to address a huge processing problem: how to use customized, programmable integrated circuits to accelerate computationally expensive operations in Bing’s Indexserve engine.
- 2012: Large-scale pilot deploys an FPGA board in each of 1,632 servers, wired together with a custom secondary network.
- 2013: Pilot results demonstrate positive ROI, enabling latency improvements in ranking while cutting the number of required servers in half. Decision is made to go to production.
- 2014: Publication of paper and decision to merge Bing design with Microsoft’s converged SKU, adding to the v2 architecture that enables configurable clouds.
- 2015: Ramp up to large-scale production in Bing and Azure.
- 2016: “Configurable Cloud” architecture in nearly every new production server. Configurable Cloud paper published (MICRO 2016, October).