System Design for Cloud Services

Agenda

Time   Session (Speaker)
8:30   Welcome (Kathryn S. McKinley, Microsoft Research)
8:40   Killer Microseconds and the Tail at Scale (Thomas Wenisch, University of Michigan)
       Turbocharging Rack-Scale In-Memory Computing with Scale-Out NUMA (Boris Grot, University of Edinburgh)
       Promising Computing Future Beyond the Limits of CMOS Technology (Douglas Carmean, Microsoft)
9:25   Small group discussions
9:35   Question, Answer, Group Thoughts
9:45   Optimal Decentralized Power Management for Large-Scale Computing Clusters (Sherief Reda, Brown University)
       Intelligent Personal Assistant and its Implication on Future Warehouse Scale Computers (Lingjia Tang, University of Michigan-Ann Arbor)
       The Art of Sharing Resources Transparently (Sameh Elnikety, Microsoft Research)
10:15  Small group discussions
10:25  Question, Answer, Group Thoughts
10:35  Break
11:00  3-minute madness (Anirudh Badam, Microsoft; Ricardo Bianchini, Microsoft; Dilma Da Silva, Texas A&M University; Hadi Esmaeilzadeh, Georgia Institute of Technology; Yuxiong He, Microsoft; Jason Mars, University of Michigan; Kathryn S. McKinley, Microsoft; Todd Mytkowicz, Microsoft; Klara Nahrstedt, University of Illinois, Urbana-Champaign; Stavros Volos, Microsoft)
11:30  Lunch
1:00   Rethinking Systems Management with Game Theory (Benjamin Lee, Duke University)
       Data Markets in the Cloud: Pricing, Privacy, and Versioning (Adam Wierman, California Institute of Technology)
       How to Think about Hyperscale Architecture (Doug Burger, Microsoft Research)
1:45   Small group discussions
1:55   Question, Answer, Group Thoughts
2:00   Tolerating Holes in Wearable Memories (Karin Strauss, Microsoft Research)
       Real-time, Intelligent, and Secure Systems for Automated Decision Making (Ion Stoica, University of California-Berkeley)
       Codesign: from Devices to Hyperscale Datacenters (Marc Tremblay, Microsoft)
2:45   Small group discussions
2:55   Question, Answer, Group Thoughts
3:05   Closing: Go forth and compute (Kathryn S. McKinley, Microsoft Research)

 

On-demand

Videos

System Design for Cloud Services – 3 Minute Madness

Date

July 15, 2016

Speakers

Dilma Da Silva, Klara Nahrstedt, Anirudh Badam, Jason Mars, Michael Taylor, Ricardo Bianchini, Yuxiong He, Kathryn McKinley, Todd Mytkowicz

Affiliation

Texas A&M University, University of Illinois, Microsoft, University of Michigan-Ann Arbor, UCSD, Microsoft, Microsoft, Microsoft, Microsoft

Abstracts

Killer Microseconds and the Tail at Scale

Speaker: Thomas Wenisch, University of Michigan

Online Data Intensive (OLDI) applications, which process terabytes of data with sub-second latencies, are the cornerstone of modern internet services. In this talk, I discuss two system design challenges that make it very difficult to build efficient OLDI applications. (1) Killer Microseconds: today's CPUs are highly effective at hiding the nanosecond-scale latency of memory accesses, and operating systems are highly effective at hiding the millisecond-scale latency of disks. However, modern high-performance networking and flash I/O frequently lead to situations where data are a few microseconds away, and neither hardware nor software offers effective mechanisms to hide microsecond-scale stalls. (2) The Tail at Scale: OLDI services typically rely on sharding data over hundreds of servers to meet latency objectives. However, this strategy mandates waiting for responses from the slowest straggler among these servers. As a result, exceedingly rare events, which have negligible impact on the throughput of a single server, nevertheless come to dominate the latency distribution of the OLDI service. At 1000-node scale, the fifth nine (the 99.999th percentile) of an individual server's latency distribution becomes the 99th-percentile latency of the entire request. These two challenges cause OLDI operators to run their workloads inefficiently at low utilization to avoid compounding stalls and tails with queueing delays. There is a pressing need for systems researchers to find ways to hide microsecond-scale stalls, and to track down and address the rare triggers of 99.999th-percentile performance anomalies that destroy application-level latency objectives.

Turbocharging Rack-Scale In-Memory Computing with Scale-Out NUMA

Speaker: Boris Grot, University of Edinburgh

Web-scale online services mandate fast access to massive quantities of data. In practice, this is accomplished by sharding the datasets across a pool of servers within a datacenter and keeping each shard in the servers' main memory to avoid long-latency disk I/O. Accesses to non-local shards take place over the datacenter network, incurring communication delays 20-1000x greater than accesses to local memory. In this talk, I will introduce Scale-Out NUMA, a rack-scale architecture with an RDMA-inspired programming model that eliminates the chief latency overheads of existing networking technologies and reduces remote memory access latency to within a small factor of local DRAM latency.

Promising Computing Future Beyond the Limits of CMOS Technology

Speaker: Douglas Carmean, Microsoft

Traditional technology scaling trends have slowed, prompting many to proclaim the end of Moore's law and of CMOS process technology. While alarmists predict a cataclysmic end to computer systems as we know them, an evolution to new technologies is more likely. This talk will explore the possibilities of hybrid computing systems that may incorporate quantum, cryogenic, and DNA components.

Optimal Decentralized Power Management for Large-Scale Computing Clusters

Speaker: Sherief Reda, Brown University

Power management is a central issue in large-scale computing clusters, where considerable energy consumption translates into large operational costs. Traditional power management techniques have a centralized design that limits the scalability of computing clusters. We describe a novel framework, DiBA, that achieves optimal power management in a fully decentralized manner. DiBA is a consensus-based algorithm in which each server determines its optimal power consumption locally by communicating its state with its neighbors in the cluster until consensus is achieved. We demonstrate the superiority of DiBA using a real cluster and computer simulations.
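The abstract does not spell out DiBA's update rule, so the following is only a hedged sketch of the general consensus-averaging pattern it describes: each server repeatedly mixes its local estimate with its neighbors' until the whole cluster agrees on a global quantity that each node can then use to set its power locally, without a central coordinator. The ring topology, equal weights, and all names below are illustrative assumptions, not DiBA itself.

```python
# Hedged sketch of decentralized consensus averaging (not the actual DiBA
# update rule): each node repeatedly averages its value with its two ring
# neighbors' values until the cluster converges to the global mean.
def consensus_average(values, iters=2000):
    n = len(values)
    x = list(values)
    for _ in range(iters):
        # Each node mixes its value with its left and right neighbors.
        # The mixing weights are doubly stochastic, so the fixed point
        # every node reaches is the average of the initial values.
        x = [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0
             for i in range(n)]
    return x

# Illustrative per-server power draws (watts); the cluster mean is 100 W.
loads = [120.0, 80.0, 150.0, 50.0]
print(consensus_average(loads))  # every node converges to ~100.0
```

Each server only ever talks to its neighbors, yet all of them learn the cluster-wide average, which is what makes this style of algorithm fully decentralized.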

Intelligent Personal Assistant and its Implication on Future Warehouse Scale Computers

Speaker: Lingjia Tang, University of Michigan-Ann Arbor

As user demand scales for intelligent personal assistants (IPAs) such as Apple's Siri, Google's Google Now, and Microsoft's Cortana, we are approaching the computational limits of current data center architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications. In this talk, I present the design of Sirius Lucida, an open end-to-end IPA web-service application that accepts queries in the form of voice and images and responds with natural language. I will then discuss the implications of this type of workload for future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs.

The Art of Sharing Resources Transparently

Speaker: Sameh Elnikety, Microsoft Research

Sharing physical resources among independent large-scale applications allows better resource utilization and therefore reduces costs. However, if not done carefully, sharing becomes dangerous: it degrades the responsiveness of interactive services, and batch workloads take longer to complete. In this talk, I will describe some of the technical problems facing Microsoft services and platforms such as Bing Search (internal services) and Azure (external services). I will highlight some of the solutions, in particular for latency-sensitive applications, along with their experimental results. Finally, I will discuss a few subtle problems that arise due to resource competition in large applications.

Rethinking Systems Management with Game Theory

Speaker: Benjamin Lee, Duke University

Datacenter software should share hardware to improve energy efficiency and mitigate energy disproportionality. However, management policies for shared hardware determine whether strategic users participate in consolidated systems. Given strategic behavior, we incentivize participation with mechanisms rooted in algorithmic game theory. First, Resource Elasticity Fairness allocates multiprocessor resources and guarantees sharing incentives, envy-freeness, Pareto efficiency, and strategy-proofness. Second, Repeated Allocation Games allocate heterogeneous processors and guarantee fairness over time. Finally, Computational Sprinting Games allocate performance boosts in datacenters with shared and oversubscribed power supplies, producing an efficient equilibrium. With game theory, we formalize strategic resource competition in shared computer systems.

Data Markets in the Cloud: Pricing, Privacy, and Versioning

Speaker: Adam Wierman, California Institute of Technology

Data is broadly being gathered, bought, and sold in a variety of marketplaces today; however, these markets are in their nascent stages. Data is typically obtained through offline negotiations, but online, dynamic cloud data markets are beginning to emerge. As they do, challenging questions related to pricing and privacy are surfacing. This talk will overview some challenges in this regard and describe a novel perspective related to privacy: privacy is not just in the best interest of the consumer; it actually provides a crucial tool for the data seller as well—one that allows a principled approach for versioning.

How to Think about Hyperscale Architecture

Speaker: Doug Burger, Microsoft Research

The cloud will fundamentally change our industry and field. The market is consolidating on a small number of vendors who are building out massive, global, hyperscale computers. In this short talk I will lay out some principles for the trends that I believe will affect the architecture of these new, worldwide computers.

Tolerating Holes in Wearable Memories

Speaker: Karin Strauss, Microsoft Research

New memory technologies promise denser and cheaper main memory, and may one day displace DRAM. However, many of them experience permanent failures due to wear far more quickly than DRAM. DRAM mechanisms that handle permanent failures rely on very low failure rates and, if directly applied to this new failure model, are extremely inefficient. In this talk, I will discuss our recent work on tolerating wear failures and reducing associated waste by leveraging a managed runtime to abstract away memory layout and work around failures.
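As a hedged illustration of the idea (not the actual mechanism from the talk), a managed runtime that owns memory layout can treat worn-out blocks as holes: record them in a failure map and simply skip them at allocation time, trading a little capacity for continued operation instead of retiring the whole device. All names below are hypothetical.

```python
# Illustrative sketch (not the system described in the talk): an allocator
# that tolerates permanently failed memory blocks by skipping them, the
# way a managed runtime can hide "holes" behind its layout abstraction.
class HoleTolerantAllocator:
    def __init__(self, num_blocks: int):
        self.failed = set()                 # blocks with permanent wear failures
        self.free = list(range(num_blocks)) # blocks not yet handed out

    def mark_failed(self, block: int) -> None:
        """Record a block as worn out so it is never handed out again."""
        self.failed.add(block)

    def alloc(self) -> int:
        """Return the next usable block, silently skipping failed ones."""
        while self.free:
            block = self.free.pop(0)
            if block not in self.failed:
                return block
        raise MemoryError("no usable blocks left")

alloc = HoleTolerantAllocator(4)
alloc.mark_failed(1)  # block 1 has worn out
print([alloc.alloc(), alloc.alloc(), alloc.alloc()])  # [0, 2, 3]
```

Because callers see only the blocks the allocator returns, the failed block is invisible to them, which is the essence of working around wear failures beneath a layout abstraction.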

Real-time, Intelligent, and Secure Systems for Automated Decision Making

Speaker: Ion Stoica, University of California-Berkeley

To fully realize the value of data, we need the ability to respond and act on the latest data in real-time at global scale, while preserving user privacy and ensuring application security. In this talk I’ll outline the challenges and the research opportunities of real-time automated decision making, and present our plans at Berkeley to tackle these challenges. These efforts are part of the new UC Berkeley RISE (Real-time Intelligent Secure Execution) lab.

Codesign: from Devices to Hyperscale Datacenters

Speaker: Marc Tremblay, Microsoft

This talk will cover how co-designing devices from the silicon, system, and software standpoints, in the context of a fully integrated design team, applies to optimizing hyper-scale datacenters running internal cloud workloads as well as hundreds of thousands of customer workloads on virtual machines. Simulation results based on these workloads and other benchmarks will be presented to improve our understanding of the impact of technologies such as large L4 caches and high-bandwidth memory.

Biographies

Kathryn S. McKinley, Microsoft Research

Kathryn S. McKinley is a Principal Researcher at Microsoft. She was previously an Endowed Professor of Computer Science at The University of Texas at Austin. She is interested in creating systems that make programming easy and the resulting programs correct and efficient. She is an IEEE and ACM Fellow.

Thomas Wenisch, University of Michigan

Thomas Wenisch is an Associate Professor of Computer Science and Engineering at the University of Michigan, specializing in computer architecture. His prior research includes memory streaming for commercial server applications, multiprocessor memory systems, memory disaggregation, and rigorous sampling-based performance evaluation methodologies. His ongoing work focuses on computational sprinting, server and data center architectures, programming models for byte-addressable NVRAM, and architectures to enable hand-held 3D ultrasound. Wenisch received the NSF CAREER award in 2009. Prior to his academic career, Wenisch was a software developer at American Power Conversion, where he worked on data center thermal topology estimation. He received his Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University.

Boris Grot, University of Edinburgh

Boris Grot is an Assistant Professor in the School of Informatics at the University of Edinburgh. His research seeks to address efficiency bottlenecks and capability shortcomings of processing platforms for big data. Grot received his PhD in Computer Science from The University of Texas at Austin and spent two years as a post-doctoral fellow at EPFL.

Douglas Carmean, Microsoft

Douglas Carmean is currently an Architect at Microsoft exploring the role of advanced technology in the context of future computing ecosystems. Previously, Doug was an Intel Fellow and Director of the Efficient Computing Lab at Intel. He was responsible for creating the vision and concept for the Xeon Phi family of products, an architecture for highly parallel workloads based on Intel Architecture processors. Carmean founded a new group at Intel to define, build, and productize the Xeon Phi family.

Sherief Reda, Brown University

Sherief Reda is an Associate Professor at the School of Engineering, Brown University. He joined the computer engineering group at Brown after receiving his PhD from UCSD in 2006. His research interests are in the area of computer engineering with emphasis on energy-efficient computing systems, low-power circuit design, and CAD tools. Professor Reda has received a number of awards, including best paper awards at DATE 2002 and ISLPED 2010, first place in the ISPD VLSI placement contest in 2005, and best paper nominations at ICCAD 2005, ASPDAC 2008, and ICCAD 2015. He is a recipient of the NSF CAREER award.

Lingjia Tang, University of Michigan-Ann Arbor

Lingjia Tang is an assistant professor of EECS at the University of Michigan. Prior to joining the University of Michigan, she was a research faculty member in the UCSD CSE Department from 2012 to 2013. Her research focuses on computer architecture and software systems, especially such systems for large-scale data centers. Her publications at ASPLOS '15 and MICRO '11 were selected as IEEE Micro Top Picks. She received a best paper award at the IEEE/ACM International Symposium on Code Generation and Optimization (CGO) 2012. Her publication at the International Symposium on Computer Architecture was selected by Google Research as one of its excellent papers of 2011. More information can be found at ClarityLab and Lingjia Tang's personal website.

Sameh Elnikety, Microsoft Research

Sameh Elnikety is a researcher at Microsoft Research. His research focuses on experimental distributed systems, spanning a number of areas including operating systems, distributed computing, and databases. Sameh's research has impacted several important systems, including Azure Machine Learning, Bing, MSN, SQL Azure DB, and Windows scheduling. His work on database replication received the best paper award at EuroSys 2007, and some of the resulting distributed techniques are integrated into MySQL Replication and, soon, SQL Azure DB. Sameh earned his PhD from EPFL in 2007 and his MS from Rice in 2003.

Benjamin Lee, Duke University

Benjamin Lee is an associate professor of electrical and computer engineering at Duke University. Dr. Lee received his B.S. from UC Berkeley and his Ph.D. from Harvard, and completed postdoctoral work at Stanford. He has held visiting positions at Microsoft Research, Intel Labs, and Lawrence Livermore National Lab. Dr. Lee's research focuses on computer architectures, distributed systems, and algorithmic economics. His research has been honored twice with Top Picks by IEEE Micro, twice with Research Highlights by the Communications of the ACM, and with an ASPLOS Best Paper Award. Dr. Lee has received the NSF CAREER Award and a Computing Innovation Fellowship.

Adam Wierman, California Institute of Technology

Adam Wierman is a Professor in the Department of Computing and Mathematical Sciences at the California Institute of Technology, where he is a founding member of the Rigorous Systems Research Group (RSRG) and maintains a blog called Rigor + Relevance. His research interests center around resource and scheduling decisions in computer systems and services. He received the 2011 ACM SIGMETRICS Rising Star award, the 2014 IEEE Communications Society William R. Bennett Prize, and coauthored best paper awards at ACM SIGMETRICS, IEEE INFOCOM, IFIP Performance (twice), IEEE Green Computing Conference, IEEE Power & Energy Society General Meeting, and ACM GREENMETRICS.

Doug Burger, Microsoft Research

Doug Burger manages the HDX group in MSR NExT. His team is innovating in cloud and silicon architectures, new client devices, architectures for machine learning, and new application models for deep personalization. Prior to joining MSR in 2008, he spent ten years on the faculty at the University of Texas at Austin. He was the co-founder and co-leader of both the TRIPS and Catapult projects.

Karin Strauss, Microsoft Research

Karin Strauss is a researcher in computer architecture at Microsoft Research and an associate affiliate faculty member in computer science and engineering at the University of Washington. Her research focuses on emerging memory technologies and how to integrate them reliably and efficiently into systems.

Ion Stoica, University of California-Berkeley

Ion Stoica is a Professor in the EECS Department at the University of California, Berkeley. He does research on cloud computing and networked computer systems. Past work includes Dynamic Packet State (DPS), the Chord DHT, the Internet Indirection Infrastructure (i3), declarative networks, replay debugging, and multi-layer tracing in distributed systems. He is an ACM Fellow and has received numerous awards, including the SIGCOMM Test of Time Award (2011) and the ACM Doctoral Dissertation Award (2001). In 2006, he co-founded Conviva, a startup to commercialize technologies for large-scale video distribution, and in 2013 he co-founded Databricks, a startup to commercialize Apache Spark.

Marc Tremblay, Microsoft

Marc Tremblay is a Distinguished Engineer in the Silicon and Technology Group at Microsoft. His current role involves defining the strategic silicon roadmap for a broad range of products from devices to servers. His primary sphere of influence covers highly-integrated multi-core server SoCs, accelerators, as well as innovative multi-core SoCs for emerging devices. Marc has published numerous papers on throughput computing, multi-cores, scout threading, transactional memory, speculative multi-threading, Java computing, etc. He is the inventor of approximately 200 patents on those and other topics.

Prior to joining Microsoft in 2009, Marc was the CTO of Microelectronics at Sun Microsystems, where he was a Sun Fellow and SVP. In his role as CTO, he was responsible for the technical leadership of over 1200 engineers. Throughout his career, Marc has conceived, initiated, architected, led, defined, and shipped a variety of microprocessors, including superscalar RISC processors (UltraSPARC I/II), bytecode engines (picoJava), VLIW, media-, and Java-focused processors (MAJC), and the first processor to implement speculative multithreading and transactional memory (ROCK, first silicon). He was nominated as Innovator of the Year by EDN. He received his Physics Engineering degree from Laval University in Canada and his M.S. and Ph.D. degrees in Computer Science from UCLA.