Project Brainwave is a deep learning platform for real-time AI serving in the cloud. Built as a soft neural processing unit (NPU) on a high-performance field-programmable gate array (FPGA), it accelerates deep neural network (DNN) inferencing. Project Brainwave is an early outcome of Project Catapult, an enterprise-level Microsoft Research (MSR) initiative that is transforming cloud computing by augmenting CPUs with an interconnected and configurable compute layer composed of programmable silicon.
Project Brainwave achieves more than an order of magnitude improvement in latency and throughput over state-of-the-art graphics processing units (GPUs) on large recurrent neural networks (RNNs), with no batching. Because it delivers ultra-low latency without batching, it reduces software overhead and complexity, which makes it well suited to use cases that require a real-time response to individual requests.
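To illustrate why serving without batching matters for latency, here is a minimal sketch of the time a request spends waiting for its batch to fill. The model is a simplifying assumption, not a description of any real serving stack: requests arrive at a fixed interval and a batch is dispatched as soon as it is full. The function name `batched_wait_ms` and its parameters are hypothetical.

```python
def batched_wait_ms(batch_size, arrival_interval_ms):
    """Average time a request waits for its batch to fill.

    Assumption: requests arrive at a fixed interval and a batch is
    dispatched as soon as it holds `batch_size` requests.
    """
    # Request i (0-based) must wait for the remaining
    # (batch_size - 1 - i) requests to arrive before dispatch.
    waits = [(batch_size - 1 - i) * arrival_interval_ms
             for i in range(batch_size)]
    return sum(waits) / batch_size


# A batch size of 1 (no batching) adds no queueing delay under this model.
for size in (1, 8, 64):
    print(size, batched_wait_ms(size, arrival_interval_ms=1.0))
```

Under this toy model the added wait grows roughly linearly with batch size, which is the delay a platform avoids when it can run efficiently on individual requests.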
Serves state-of-the-art, pre-trained DNN models
Project Brainwave serves state-of-the-art, pre-trained DNN models with high efficiency at low batch sizes. A high-performance, precision-adaptable FPGA soft processor is at the heart of the system, achieving up to 39.5 teraflops of effective performance. Because the processor is implemented on an FPGA, it can absorb continuous innovations and improvements, which helps keep the infrastructure future-proof.
Project Brainwave exploits FPGAs on a datacenter-scale compute fabric, so a single DNN model can be deployed as a scalable hardware microservice that leverages multiple FPGAs to create web-scale services that process massive amounts of data in real time.
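As a rough illustration of how one model's work could be spread over several devices, the sketch below splits a weight matrix by rows and computes each shard's share of a matrix-vector product separately. This is a generic partitioning scheme chosen for illustration, not a description of how Project Brainwave maps models onto FPGAs; all function names are hypothetical and each shard merely stands in for one device.

```python
def split_rows(matrix, num_devices):
    """Split matrix rows into contiguous, near-equal shards."""
    base, extra = divmod(len(matrix), num_devices)
    shards, start = [], 0
    for device in range(num_devices):
        # The first `extra` shards take one additional row.
        size = base + (1 if device < extra else 0)
        shards.append(matrix[start:start + size])
        start += size
    return shards


def matvec(matrix, vector):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def distributed_matvec(matrix, vector, num_devices):
    """Compute each shard's outputs independently, then concatenate."""
    output = []
    for shard in split_rows(matrix, num_devices):
        # In this sketch every shard represents one hypothetical device.
        output.extend(matvec(shard, vector))
    return output


weights = [[1, 2], [3, 4], [5, 6]]
inputs = [1, 1]
print(distributed_matvec(weights, inputs, num_devices=2))
```

Because each shard only needs its own rows and the shared input vector, the shards can be evaluated in parallel and their outputs concatenated.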
Project Brainwave attains its breakthrough performance by using a single-threaded single instruction, multiple data (SIMD) instruction set architecture paired with a distributed microarchitecture capable of dispatching over 7 million operations from a single instruction.
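To give a sense of scale, the following sketch counts the arithmetic operations in one dense matrix-vector multiply. The page does not say which instruction or operand size yields the 7 million figure, so the 1900 x 1900 dimensions are purely an assumption chosen to land in that range, and `matvec_operations` is a hypothetical helper.

```python
def matvec_operations(rows, cols):
    """Operations in a dense matrix-vector multiply.

    Each multiply-accumulate is counted as two operations
    (one multiply and one add).
    """
    return 2 * rows * cols


# Hypothetical dimensions, not taken from the source.
print(matvec_operations(1900, 1900))
```

If a single instruction expresses an entire operation of this size, the hardware receives millions of independent multiply-accumulates to distribute at once.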
Trifecta of high performance
To meet the growing computational demands of deep learning, cloud operators are turning toward specialized hardware for improved efficiency and performance, particularly in deep learning applications that ingest live data streams. Project Brainwave offers the trifecta of high-performance computing: low latency, high throughput, and high efficiency, while also offering the flexibility of field-programmability.
Because the system is based on an FPGA, its design can evolve rapidly and be remapped to the FPGA after each improvement, keeping pace with new discoveries and staying current with the requirements of rapidly changing AI algorithms.
Put it in action
Project Brainwave is currently being deployed in Bing’s intelligent search, and early applications are available to the public through Azure Machine Learning Hardware Accelerated Models. ResNet 50 is the first model to be made available in the Azure Machine Learning model gallery. Other accelerated models are in development for text, speech, vision, and more.
Every day, thousands of gadgets and widgets whish down assembly lines run by the manufacturing solutions provider Jabil, on their way into the hands of customers. Along the way, an automated optical inspection system scans them for any signs of defects, with a bias toward ensuring that all potential anomalies are detected. It then sends those parts off to be checked manually. The speed of operations leaves manual inspectors with just seconds to decide if the product is really defective, or not.
Today at Hot Chips 2017, our cross-Microsoft team unveiled a new deep learning acceleration platform, codenamed Project Brainwave. I’m delighted to share more details in this post, since Project Brainwave achieves a major leap forward in both performance and flexibility for cloud-based serving of deep learning models. We designed the system for real-time AI, which means the system processes requests as fast as it receives them, with ultra-low latency.
Microsoft Research Blog | August 22, 2017
First hardware accelerated model powered by Project Brainwave