Project Brainwave

Overview

Project Brainwave is a deep learning platform for real-time AI inference in the cloud and on the edge. A soft Neural Processing Unit (NPU), based on a high-performance field-programmable gate array (FPGA), accelerates deep neural network (DNN) inferencing, with applications in computer vision and natural language processing. Project Brainwave is transforming computing by augmenting CPUs with an interconnected and configurable compute layer composed of programmable silicon.

For example, an FPGA configuration deployed for Bing achieved more than an order-of-magnitude improvement in latency and throughput on recurrent neural networks (RNNs), with no batching. Because the system delivers real-time AI at ultra-low latency without requiring batching, software overhead and complexity are reduced.

At Build 2019, Microsoft EVP Scott Guthrie talked about how Project Brainwave DNN inferencing can be used to keep supermarket shelves fully stocked.

Serves state-of-the-art, pre-trained DNN models

With a high-performance, precision-adaptable FPGA soft processor, Microsoft datacenters can serve pre-trained DNN models efficiently even at low batch sizes. Because an FPGA can be reprogrammed, the same hardware accommodates continuous innovation and improvement, making the infrastructure future-proof.

By exploiting FPGAs on a datacenter-scale compute fabric, a single DNN model can be deployed as a scalable hardware microservice that leverages multiple FPGAs to create web-scale services capable of processing massive amounts of data in real time.

Trifecta of high performance

To meet the growing computational demands of deep learning, cloud operators are turning toward specialized hardware for improved efficiency and performance, particularly for live data streams. Project Brainwave offers the trifecta of high-performance computing: low latency, high throughput, and high efficiency, while also offering the flexibility of field-programmability.

Because Project Brainwave is built on FPGAs, it can keep pace with new discoveries and stay current with the requirements of rapidly changing AI algorithms.

Put it in action

See Project Brainwave on Intel FPGAs in action on Microsoft Azure and Azure Data Box Edge. The FPGAs in the cloud and on the edge support:

  • Image classification and object detection scenarios (see the client sketch after this list)
  • Jupyter Notebooks to quickly get started
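
For instance, a minimal scoring client for the image-classification scenario might look like the sketch below. This is an illustration only: the endpoint URL, payload format, and response shape are assumptions made for the example, not the actual service contract, which is defined by the deployed model and described in the Azure documentation linked after this section.

    # Minimal scoring-client sketch. The endpoint URL and payload format are
    # hypothetical placeholders, not the real service contract.
    import json
    import requests

    SCORING_URI = "http://<your-endpoint>/score"  # placeholder address

    def classify_image(image_path: str) -> dict:
        """Send an image to a deployed FPGA-backed model and return the prediction."""
        with open(image_path, "rb") as f:
            response = requests.post(
                SCORING_URI,
                data=f.read(),
                headers={"Content-Type": "application/octet-stream"},
                timeout=5,  # real-time serving targets very low latencies
            )
        response.raise_for_status()
        return json.loads(response.text)

    if __name__ == "__main__":
        print(classify_image("sample.jpg"))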

Using this FPGA-enabled hardware architecture, trained neural networks run quickly and with low latency. Azure can parallelize pre-trained deep neural networks (DNNs) across FPGAs on Azure Kubernetes Service (AKS) to scale out your service. The DNNs can serve as a pre-trained deep featurizer for transfer learning or be fine-tuned with updated weights. Find out more: https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-accelerate-with-fpgas
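
To make the featurizer idea concrete, here is a hedged sketch in TensorFlow/Keras. The choice of ResNet50 and the 10-class head are our assumptions for illustration; this shows the general transfer-learning pattern the paragraph describes, not Project Brainwave's own pipeline.

    # Transfer-learning sketch: use a pre-trained DNN as a frozen "deep
    # featurizer" and train only a small task-specific head on top.
    # ResNet50 and the 10-class head are illustrative assumptions.
    import tensorflow as tf

    base = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, pooling="avg"
    )
    base.trainable = False  # freeze: the featurizer's weights stay fixed

    model = tf.keras.Sequential([
        base,                                             # emits a 2048-dim feature vector
        tf.keras.layers.Dense(10, activation="softmax"),  # task-specific classifier
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # model.fit(train_images, train_labels, epochs=5)  # trains only the head
    # To fine-tune with updated weights instead, set base.trainable = True and
    # recompile with a small learning rate before calling fit().

Training only the head keeps the expensive featurizer fixed, which is what lets a single pre-trained, accelerated model be reused across tasks.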

People

Original Project Brainwave team

A majority of this team is still collaborating on projects within AI and Advanced Architectures (AIArch), led by Doug Burger, Technical Fellow. AIArch is part of the Azure Hardware Systems Group.

In the news

Microsoft blogs

Real-time AI: Microsoft announces preview of Project Brainwave

Every day, thousands of gadgets and widgets whish down assembly lines run by the manufacturing solutions provider Jabil, on their way into the hands of customers. Along the way, an automated optical inspection system scans them for any signs of defects, with a bias toward ensuring that all potential anomalies are detected. It then sends those parts off to be checked manually. The speed of operations leaves manual inspectors with just seconds to decide if the product is really defective, or not.

The AI Blog | May 7, 2018

Microsoft unveils Project Brainwave for real-time AI

Today at Hot Chips 2017, our cross-Microsoft team unveiled a new deep learning acceleration platform, codenamed Project Brainwave. I’m delighted to share more details in this post, since Project Brainwave achieves a major leap forward in both performance and flexibility for cloud-based serving of deep learning models. We designed the system for real-time AI, which means the system processes requests as fast as it receives them, with ultra-low latency.

Microsoft Research Blog | August 22, 2017