Scalable and Elastic Transactional Data Stores for Cloud Computing Platforms

  • Sudipto Das

PhD Thesis: UNIVERSITY OF CALIFORNIA, SANTA BARBARA

2013 ACM SIGMOD Jim Gray Doctoral Dissertation Award, 2012 Lancaster Dissertation Award in Mathematics, Physical Sciences, & Engineering by the UCSB Graduate Division.

Publication

Cloud computing has emerged as a multi-billion dollar industry and as a successful paradigm for web application deployment. Economies of scale, elasticity, and pay-per-use pricing are the biggest promises of the cloud. Database management systems (DBMSs) serving these web applications form a critical component of the cloud software stack. To serve thousands of applications and their huge amounts of data, these DBMSs must scale out to clusters of commodity servers. Moreover, to minimize their operating costs, such DBMSs must also be elastic, i.e., able to increase and decrease the cluster size in a live system. This is in addition to serving a variety of applications (i.e., supporting multitenancy) while being self-managing, fault-tolerant, and highly available.

The overarching goal of this dissertation is to propose abstractions, protocols, and paradigms to architect efficient, scalable, and practical DBMSs that address the unique set of challenges posed by cloud platforms. This dissertation shows that, with a careful choice of design and features, it is possible to architect scalable DBMSs that efficiently support transactional semantics to ease application design and that elastically adapt to fluctuating operational demands to optimize the operating cost. This dissertation advances the state of the art by improving two critical facets of transaction processing systems. First, we propose architectures and abstractions to support efficient and scalable transaction processing in DBMSs that scale out using clusters of commodity servers. The key insight is to co-locate data items that are frequently accessed together within a database partition and to limit each transaction to a single partition. We propose systems where the partitions (the granules for efficient transactional access) can be statically defined based on the applications' access patterns or dynamically specified on demand by the application. Second, we propose techniques to migrate database partitions in a live system to allow lightweight elastic load balancing, enable dynamic resource orchestration, and improve the overall resource utilization. The key insight is to leverage the semantics of the DBMS internals to migrate a partition with minimal disruption and performance overhead while ensuring transactional guarantees and correctness even in the presence of failures. We propose two different techniques to migrate partitions, one for decoupled-storage and one for shared-nothing DBMS architectures.
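The first insight, co-locating related items and restricting each transaction to one partition, can be illustrated with a small sketch. This is not the dissertation's implementation: the `PartitionedStore` class, the `(group, item)` key layout, and the byte-sum placement function are all hypothetical names and choices made here for illustration, and the in-memory dictionaries stand in for real database partitions.

```python
class CrossPartitionError(Exception):
    """Raised when a transaction would touch more than one partition."""


class PartitionedStore:
    """Toy in-memory store (hypothetical); each partition is a plain dict."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.partitions = [dict() for _ in range(num_partitions)]

    def partition_of(self, key):
        # Keys are (group, item) tuples. Only the group decides placement,
        # so all items of one group land in the same partition.
        # The byte sum is an assumed, deterministic stand-in for a real
        # partitioning function.
        group, _item = key
        return sum(group.encode("utf-8")) % self.num_partitions

    def run_transaction(self, writes):
        """Apply all writes only if they fall within a single partition."""
        targets = {self.partition_of(key) for key in writes}
        if len(targets) > 1:
            raise CrossPartitionError(
                f"transaction spans partitions {sorted(targets)}"
            )
        if not targets:
            return None  # empty transaction, nothing to do
        pid = next(iter(targets))
        # A single dict update stands in for a local, single-node commit.
        self.partitions[pid].update(writes)
        return pid


store = PartitionedStore(num_partitions=4)

# Both items belong to group "cart42", so they share one partition.
pid = store.run_transaction({("cart42", "item1"): 2, ("cart42", "item2"): 5})

# Groups "a" and "b" map to different partitions, so this is rejected.
try:
    store.run_transaction({("a", "x"): 1, ("b", "y"): 1})
    rejected = False
except CrossPartitionError:
    rejected = True
```

Because every accepted transaction touches exactly one partition, it can commit locally without a distributed commit protocol, which is the scalability benefit this insight aims at. The application-defined `group` plays the role of a statically defined partition; dynamically formed groups would need an extra mapping layer that this sketch omits.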