Project Catapult is the code name for a Microsoft Research (MSR) enterprise-level initiative that is transforming cloud computing by augmenting CPUs with an interconnected and configurable compute layer composed of programmable silicon.
Project Brainwave leverages Project Catapult to enable real-time AI
We are living in an era where information grows exponentially and creates the need for massive computing power to process that information. At the same time, advances in silicon fabrication technology are approaching theoretical limits, and Moore’s Law has run its course. Chip performance improvements no longer keep pace with the needs of cutting-edge, computationally expensive workloads like software-defined networking (SDN) and artificial intelligence (AI). To create a faster, more intelligent cloud that keeps up with growing appetites for computing power, datacenters need to add other processors distinctly suited for critical workloads.
FPGAs offer a unique combination of speed and flexibility
Since the earliest days of cloud computing, we have answered the need for more computing power by innovating with special processors that give CPUs a boost. Project Catapult began in 2010 when a small team, led by Doug Burger and Derek Chiou, anticipated the paradigm shift to post-CPU technologies. We began exploring alternative architectures and specialized hardware such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and custom application-specific integrated circuits (ASICs). We soon realized that FPGAs offer a unique combination of speed, programmability, and flexibility ideal for delivering cutting-edge performance and keeping pace with rapid innovation. Though FPGAs have been in use for decades, MSR pioneered their use in cloud computing, proving that FPGAs could deliver efficiency and performance without the cost, complexity, and risk of developing custom ASICs.
FPGAs can perform line-rate computation
Project Catapult’s innovative board-level architecture is highly flexible. The FPGA can act as a local compute accelerator, an inline processor, or a remote accelerator for distributed computing. In this design, the FPGA sits between the datacenter’s top-of-rack (ToR) network switches and the server’s network interface chip (NIC). As a result, all network traffic is routed through the FPGA, which can perform line-rate computation on even high-bandwidth network flows.
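As a rough illustration of this data path, here is a minimal Python sketch of the bump-in-the-wire arrangement, where every packet between the ToR switch and the NIC passes through the FPGA stage. The function names and the processing logic are invented for illustration; they are not Microsoft's implementation.

```python
# Illustrative sketch only: models the "bump in the wire" data path,
# where all traffic between the ToR switch and the server's NIC is
# routed through the FPGA, which can transform packets inline.

def fpga_stage(packet, should_accelerate):
    """Inline processing step standing in for line-rate FPGA logic."""
    if should_accelerate(packet):
        # Hypothetical transformation representing offloaded work.
        return {"payload": packet["payload"].upper(), "offloaded": True}
    return {**packet, "offloaded": False}

def tor_to_nic(packets, should_accelerate):
    # Every packet traverses the FPGA before reaching the NIC.
    return [fpga_stage(p, should_accelerate) for p in packets]

packets = [{"payload": "search-query"}, {"payload": "telemetry"}]
processed = tor_to_nic(
    packets,
    should_accelerate=lambda p: p["payload"].startswith("search"),
)
```

Because the FPGA sits directly on the network path rather than behind the CPU, this style of inline processing can keep up with the link rate instead of being bottlenecked by a round trip through host software.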
The first hyperscale supercomputer
Today, nearly every new server in Microsoft datacenters integrates an FPGA into a unique distributed architecture, which creates an interconnected and configurable compute layer that extends the CPU compute layer. Using this acceleration fabric, we can deploy distributed hardware microservices (HWMS) with the flexibility to harness a scalable number of FPGAs—from one to thousands. Conversely, cloud-scale applications can leverage a scalable number of these microservices, with no knowledge of the underlying hardware. By coupling this approach with nearly a million Intel FPGAs deployed in our datacenters, we have built the world’s first hyperscale supercomputer, which can compute machine learning and deep learning algorithms with an unmatched combination of speed, efficiency, and scale.
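The hardware-microservice idea can be sketched in a few lines of Python. The class name, the round-robin scheduling, and the string-reversal "workload" below are all hypothetical stand-ins chosen for illustration; the point is only that the caller names a service and never sees which FPGA serves it.

```python
# Hypothetical sketch of a hardware microservice (HWMS): an application
# calls a named service, a simple scheduler spreads the calls across
# however many FPGAs are allocated, and the caller never touches the
# underlying hardware.

class HardwareMicroservice:
    def __init__(self, name, num_fpgas):
        self.name = name
        # The allocation can scale from one FPGA to thousands.
        self.fpgas = [f"fpga-{i}" for i in range(num_fpgas)]
        self._next = 0

    def call(self, request):
        # Round-robin dispatch; the caller sees only the service name
        # and the result, not the device that computed it.
        device = self.fpgas[self._next % len(self.fpgas)]
        self._next += 1
        # Placeholder "computation" standing in for accelerated work.
        return {"service": self.name, "device": device,
                "result": request[::-1]}

ranker = HardwareMicroservice("dnn-ranker", num_fpgas=3)
results = [ranker.call(q) for q in ["abc", "def", "ghi", "jkl"]]
```

Scaling the service up or down is then a scheduling decision, invisible to the application, which is what lets cloud-scale applications consume these microservices with no knowledge of the hardware beneath them.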
Leading datacenter transformation by using programmable hardware
Through Project Catapult, Microsoft is leading the industry’s datacenter transformation by using programmable hardware. We were the first to prove the value of FPGAs for cloud computing, first to deploy them at cloud scale, and, with Bing, first to use them to accelerate enterprise-level applications.
Project Brainwave to enable real-time AI
Our leadership in accelerated networking has delivered the world’s fastest cloud network. Today, Project Brainwave is leveraging Project Catapult to enable real-time AI, with blazing fast inferencing performance at a remarkably affordable cost. A growing team of MSR researchers and engineers, in close partnership with engineering groups such as Bing, Azure Machine Learning, Azure Networking, Azure Cloud Server Infrastructure (CSI), and Azure Storage, continues to push the boundaries of accelerated cloud computing.
Project Catapult’s waves of innovation will continue.
2010: MSR demonstrated the first proof of concept to Bing leadership, with a proposal to use FPGAs at scale to accelerate Web search.
2011: MSR researchers and Bing engineers developed the first prototype, identifying and accelerating computationally expensive operations in Bing’s IndexServe engine.
2012: Project Catapult’s scale pilot of 1,632 FPGA-enabled servers was deployed to a datacenter, using an early architecture with a custom secondary network.
2013: Results of the pilot demonstrated a dramatic improvement in search latency, running Bing decision-tree algorithms 40 times faster than CPUs alone, and proved the potential to speed up search even while reducing the number of servers. Bing leadership committed to putting Project Catapult in production.
2014: The Catapult v2 architecture introduced the breakthrough of placing FPGAs as a “bump in the wire” on the network path. Work began on accelerating software-defined networking for Azure. Project Catapult’s seminal paper was published.
2015: FPGA-enabled servers were deployed at scale in Bing and Azure datacenters, and Bing first used FPGAs in production to accelerate search ranking. This enabled either a 50 percent increase in throughput or a 25 percent reduction in latency.
2016: Azure launched Accelerated Networking, using FPGAs to enable the world’s fastest cloud network. FPGAs became a default part of most Azure and Bing server SKUs, and MSR began Project Brainwave, focused on accelerating AI and deep learning.
2017: MSR and Bing launched hardware microservices, enabling one web-scale service to leverage multiple FPGA-accelerated applications distributed across a datacenter. Bing deployed the first FPGA-accelerated Deep Neural Network (DNN). MSR demonstrated that FPGAs can enable real-time AI, beating GPUs in ultra-low latency, even without batching inference requests.
2018: Bing and Azure deployed new multi-FPGA appliances into datacenters, shifting the ratio of computing power between CPUs and FPGAs, with multiple Intel Arria 10 FPGAs in each server. MSR, Bing, and Azure Machine Learning partnered to bring Project Brainwave to production for both Microsoft engineering groups and third-party customers. Azure Machine Learning launched the preview of Hardware Accelerated Models, powered by Project Brainwave, delivering ultra-fast DNN performance with ResNet-50, at remarkably low cost—only 21 cents per million images during preview.
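A back-of-the-envelope calculation (with made-up numbers) shows why the 2017 no-batching result matters for latency: a batched accelerator must wait for a batch to fill before compute can even start, while a real-time design processes each request as it arrives.

```python
# Illustrative latency arithmetic only; the request rate, batch size,
# and compute time below are invented, not measured figures.

def batched_latency(arrival_gap_ms, batch_size, compute_ms):
    # The first request in a batch waits for (batch_size - 1) more
    # arrivals before the accelerator can start computing.
    queue_wait = (batch_size - 1) * arrival_gap_ms
    return queue_wait + compute_ms

def realtime_latency(compute_ms):
    # A real-time design has no batch-fill wait.
    return compute_ms

# Hypothetical figures: a request every 2 ms, batches of 32, 5 ms compute.
worst_batched = batched_latency(arrival_gap_ms=2, batch_size=32, compute_ms=5)
worst_realtime = realtime_latency(compute_ms=5)
```

Under these assumed numbers, the batch-fill wait dominates the end-to-end latency, which is why serving each request individually, as Project Brainwave does, is so valuable for interactive workloads.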
This is still the beginning. Project Brainwave is gaining traction across the company, with accelerated models in development for text, speech, vision, and other areas. The company-wide Project Catapult virtual team continues to innovate in deep learning, networking, storage, and other areas.
Some of the world’s leading architects are people that you’ve probably never heard of, and they’ve designed and built some of the world’s most amazing structures that you’ve probably never seen. Or at least you don’t think you have. One of these architects is Dr. Doug Burger, Distinguished Engineer at Microsoft Research NExT. And, if you use a computer, or store anything in the Cloud, you’re a beneficiary of the beautiful architecture that he, and people like him, work on every day.
Every day, thousands of gadgets and widgets whiz down assembly lines run by the manufacturing solutions provider Jabil, on their way into the hands of customers. Along the way, an automated optical inspection system scans them for any signs of defects, with a bias toward ensuring that all potential anomalies are detected. It then sends those parts off to be checked manually. The speed of operations leaves manual inspectors with just seconds to decide whether a product is really defective.
In December, we announced new intelligent search features which tap into advances in AI to provide people with more comprehensive answers, faster. Today, we’re excited to announce improvements to our current features, and new scenarios that get you to your answer faster. Since December we’ve received a lot of great feedback on our experiences; based on that, we’ve expanded many of our answers to the UK, improved our quality and coverage of existing answers, and added new scenarios.
We are happy to announce that Accelerated Networking (AN) is generally available (GA) for Windows and the latest distributions of Linux, providing up to 30 Gbps of networking throughput, free of charge! AN provides consistent, ultra-low network latency via Azure’s in-house programmable hardware and technologies such as SR-IOV.
Today we announced new Intelligent Search features for Bing, powered by AI, to give you answers faster, give you more comprehensive and complete information, and enable you to interact more naturally with your search engine. Intelligent answers leverage the latest state-of-the-art machine reading comprehension, backed by Project Brainwave running on Intel’s FPGAs, to read and analyze billions of documents to understand the web and help you more quickly and confidently get the answers you need.
Today at Hot Chips 2017, our cross-Microsoft team unveiled a new deep learning acceleration platform, codenamed Project Brainwave. I’m delighted to share more details in this post, since Project Brainwave achieves a major leap forward in both performance and flexibility for cloud-based serving of deep learning models. We designed the system for real-time AI, which means the system processes requests as fast as it receives them, with ultra-low latency.
When we type in a search query, access our email via the cloud or stream a viral video, chances are we don’t spend any time thinking about the technological plumbing that is behind that instant gratification. Sitaram Lanka and Derek Chiou are two exceptions. They are engineers who spend their days thinking about ever-better and faster ways to get you all that information with the tap of a finger, as you’ve come to expect.
At this year’s Supercomputing 2015 Conference in Austin, Texas, Microsoft is announcing the availability of Project Catapult clusters to academic researchers through the Texas Advanced Computing Center (TACC) at The University of Texas at Austin. Project Catapult, a Microsoft research venture, offers a groundbreaking way to vastly improve the performance and energy efficiency of datacenter workloads.
I’m excited to highlight a breakthrough in high-performance machine learning from Microsoft researchers. Before describing our results, some background may be helpful. The high-level architecture of datacenter servers has been generally stable for many years, based on some combination of CPUs, DRAM, Ethernet, and disks (with solid-state drives a more recent addition). While the capacities and speeds of the components—and the datacenter scale—have grown, the basic server architecture has evolved slowly.
Operating a datacenter at web scale requires managing many conflicting requirements. The ability to deliver computation at a high level and speed is a given, but because of the demands such a facility must meet, a datacenter also needs flexibility. Additionally, it must be efficient in its use of power, keeping costs as low as possible. Addressing often conflicting goals is a challenge, leading datacenter providers to seek constant performance and efficiency improvements and to evaluate the merits of general-purpose versus task-tuned alternatives—particularly in an era in which Moore’s Law is nearing an end, as some suggest.