Concurrency and Consistency in Distributed Data Storage Systems – Part 1


May 31, 2012


Current-generation Web applications run at scales of hundreds of millions of users, each with thousands of objects, requiring petabytes of data storage, even excluding large objects such as images and video. Such applications require unprecedented scales of storage and extremely high availability, coupled with low latency for users across the world. Architecting such systems requires replicating data across globally distributed data centers. Such replication poses consistency challenges that application developers need to deal with. Several data storage systems have been developed in recent years to address these challenges.

In this set of two talks, we start with an overview of the basic concepts of distributed transactions, atomic commit, and concurrency control with replication. After a brief overview of distributed file systems, we address the main topic of distributed data storage systems. Brewer’s CAP theorem and its variants have forced systems to consider different tradeoffs between availability/latency and consistency. We first study the architecture of distributed data storage systems that give greater importance to consistency of individual data items, focusing on three widely used and path-breaking systems: Bigtable, PNUTS and Megastore. We then study systems that give greater importance to availability than to consistency, starting with a discussion of weak consistency, and then illustrating practical issues in dealing with weak consistency, such as detection and resolution of inconsistencies, using Amazon Dynamo as an example. We conclude with directions of current and future work.


S. Sudarshan

Prof. S. Sudarshan received his Ph.D. from the University of Wisconsin, Madison in 1992. He was a Member of the Technical Staff in the database research group at AT&T Bell Laboratories from 1992 to 1995, and he has been at the Indian Institute of Technology (IIT), Bombay since 1995, where he currently holds the position of Institute Chair Professor. Prof. Sudarshan also spent a year on sabbatical at Microsoft Research, USA in 2004-05.

His research interests include keyword querying on structured and semi-structured data, processing and optimization of complex queries, holistic optimization spanning the programming language/database boundary, testing of database applications, and database security. He is also a co-author of the database textbook Database System Concepts, 6th Ed., by Silberschatz, Korth and Sudarshan, which is one of the standard textbooks in the area and is widely used across the world. He has been on the editorial board of IEEE Transactions on Knowledge and Data Engineering since 2010, and on the editorial board of ACM Transactions on Database Systems since 2005.