By Marc Pollefeys, Director of Science, HoloLens
It is not an exaggeration to say that deep learning has taken the world of computer vision, and many other recognition tasks, by storm. Many of the most difficult recognition problems have seen gains over the past few years that are astonishing.
Although we have seen large improvements in the accuracy of recognition as a result of Deep Neural Networks (DNNs), deep learning approaches have two well-known challenges: they require large amounts of labelled data for training, and they require a type of compute that is not amenable to current general purpose processor/memory architectures. Some companies have responded with architectures designed to address the particular type of massively parallel compute required for DNNs, including our own use of FPGAs, for example, but to date these approaches have primarily enhanced existing cloud computing fabrics.
But I work on HoloLens, and in HoloLens, we’re in the business of making untethered mixed reality devices. We put the battery on your head, in addition to the compute, the sensors, and the display. Any compute we want to run locally for low-latency, which you need for things like hand-tracking, has to run off the same battery that powers everything else. So what do you do?
You create custom silicon to do it.
First, a bit of background. HoloLens contains a custom multiprocessor called the Holographic Processing Unit, or HPU. It is responsible for processing the information coming from all of the on-board sensors, including Microsoft’s custom time-of-flight depth sensor, head-tracking cameras, the inertial measurement unit (IMU), and the infrared camera. The HPU is part of what makes HoloLens the world’s first–and still only–fully self-contained holographic computer.
Today, Harry Shum, executive vice president of our Artificial Intelligence and Research Group, announced in a keynote speech at CVPR 2017, that the second version of the HPU, currently under development, will incorporate an AI coprocessor to natively and flexibly implement DNNs. The chip supports a wide variety of layer types, fully programmable by us. Harry showed an early spin of the second version of the HPU running live code implementing hand segmentation.
The AI coprocessor is designed to work in the next version of HoloLens, running continuously, off the HoloLens battery. This is just one example of the new capabilities we are developing for HoloLens, and is the kind of thing you can do when you have the willingness and capacity to invest for the long term, as Microsoft has done throughout its history. And this is the kind of thinking you need if you’re going to develop mixed reality devices that are themselves intelligent. Mixed reality and artificial intelligence represent the future of computing, and we’re excited to be advancing this frontier.