Microsoft Research Blog

Catapult: Moving Beyond CPUs in the Cloud

June 16, 2014 | By Microsoft blog editor

Posted by Rob Knies

Operating a datacenter at web scale requires managing many conflicting requirements. The ability to deliver computation at a high level and speed is a given, but because of the demands such a facility must meet, a datacenter also needs flexibility. Additionally, it must be efficient in its use of power, keeping costs as low as possible.

Addressing often conflicting goals is a challenge, leading datacenter providers to seek constant performance and efficiency improvements and to evaluate the merits of general-purpose versus task-tuned alternatives—particularly in an era in which Moore’s Law is nearing an end, as some suggest.

Microsoft researchers and colleagues from Bing have been collaborating with others from industry and academia to examine datacenter hardware alternatives, and their work, a project known as Catapult, was presented in Minneapolis on June 16 during the 41st International Symposium on Computer Architecture (ISCA).

Their paper, titled A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services, describes an effort to combine programmable hardware and software that uses field-programmable gate arrays (FPGAs) to deliver performance improvements of as much as 95 percent.

The significance of this work, says Peter Lee, head of Microsoft Research, could be dramatic.

“Going into production with this new technology will be a watershed moment for Bing search,” he says. “For the first time ever, the quality of Bing’s page ranking will be driven not only by great algorithms but also by hardware—incredibly advanced hardware that can be made more highly specialized than anything ever seen before at datacenter scale.”

Microsoft researcher Doug Burger, one of 23 co-authors of the ISCA paper, explains the motivation behind this project.

“We are addressing two problems,” he says. “First, how do we keep accelerating services and reducing costs in the cloud as the performance gains from CPUs continue to flatten?

“Second, we wanted to enable Bing to run computations at a scale that was not possible in software alone, for much better results at lower cost.”

Derek Chiou, a Bing hardware architect, discusses the benefits of the collaboration.

“The partnership between Doug and his team at Microsoft Research and Bing has been fantastic and has resulted in significant results that will have real impact on Bing,” Chiou says. “The factor of two throughput improvement demonstrated in the pilot means we can do the same amount of work with half the number of servers or double the amount of work with the same number of servers—or some mix of the two.

“Those kinds of numbers are especially significant at the scale of a datacenter. The potential benefits go beyond simple dollars. To give some examples, Bing’s ranking could be further enhanced to provide an even better customer experience, power could be saved, and the size of the datacenters could be reduced. The strength of the pilot results has led to Bing deploying this technology in one datacenter for customers, starting in early 2015.”

As the ISCA paper notes, FPGAs have become powerful computing devices in recent years, making them particularly suited for use as fine-grained accelerators.

“We designed a platform that permits the software in the cloud, which is inherently programmable, to partner with programmable hardware,” Burger says. “You can move functions into custom hardware, but rather than burning them into fixed chips [application-specific integrated circuits], we map them to Altera FPGAs, which can run hardware designs but can be changed by reconfiguring the FPGA.

“We’ve demonstrated a ‘programmable hardware’ enhanced cloud, running smoothly and reliably at large scale.”

In the evaluation deployment outlined in the paper, the reconfigurable fabric (interconnected nodes linked by high-bandwidth connections) was tested on a collection of 1,632 servers to measure its efficacy in accelerating the workload of a production web-search service. The results were impressive: a 95 percent improvement in throughput at a latency comparable to a software-only solution. With increases in power consumption and total per-server cost of less than 30 percent, the net results deliver substantial savings and efficiencies.
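To make the trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. It uses only the two figures quoted above (a 95 percent throughput gain and a per-server cost increase of less than 30 percent). The function names, and the choice to reuse the 1,632-server pilot size as a baseline fleet, are assumptions made for illustration. They do not come from the paper.

```python
import math

# Figures quoted in the article.
THROUGHPUT_GAIN = 0.95      # 95 percent more throughput per server
COST_INCREASE_BOUND = 0.30  # per-server cost rises by less than 30 percent

# Hypothetical baseline fleet: reuses the pilot size purely for illustration.
BASELINE_SERVERS = 1632


def servers_needed(baseline_servers: int, throughput_gain: float) -> int:
    """Servers required to carry the same load when each one does (1 + gain)x the work."""
    return math.ceil(baseline_servers / (1.0 + throughput_gain))


def throughput_per_cost(throughput_gain: float, cost_increase: float) -> float:
    """Relative throughput per unit of server cost, compared with a software-only server."""
    return (1.0 + throughput_gain) / (1.0 + cost_increase)


if __name__ == "__main__":
    n = servers_needed(BASELINE_SERVERS, THROUGHPUT_GAIN)
    ratio = throughput_per_cost(THROUGHPUT_GAIN, COST_INCREASE_BOUND)
    print(f"Same load on about {n} of {BASELINE_SERVERS} servers")
    print(f"Throughput per unit cost: at least about {ratio:.2f}x")
```

Because the 30 percent figure is an upper bound on the cost increase, the resulting ratio of roughly 1.5 should be read as a lower bound on throughput per unit of server cost, not as a measured result.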

The results demonstrated the project’s capability to run stably for long periods, and all the stages in the pipeline exceeded the overall throughput goal. In addition, a service to handle failures quickly reconfigures the fabric after errors or machine failures.

The ISCA paper concludes by underscoring the belief that distributed reconfigurable fabrics will play a critical role as server performance increases level off. Such techniques could become indispensable to datacenter managers balancing their conflicting goals.

“This portends a future where systems are specialized dynamically by compiling a good chunk of demanding workloads into hardware,” Burger says. “I would imagine that a decade hence, it will be common to compile applications into a mix of programmable hardware and programmable software.

“This is a radical shift that will offer continued performance improvements past the end of Moore’s Law as we move more and more of our applications and services into hardware.”
