Scalable database replication


April 25, 2007


The database service is the performance bottleneck in many Web applications. Database replication addresses this bottleneck, but current approaches of building replicated database systems perform poorly, providing limited scalability. As pointed out in the seminal work of Jim Gray, it is very challenging to provide scalable performance in replicated database systems. The difficulty stems from processing update transactions, whose effects have to be propagated and synchronized to maintain consistency among the replicas. The propagation and synchronization of updates often interact poorly with the single machine implementation of the transactional properties. In addition, transactions are traditionally dispatched in a manner that balances the load on replicas but without consideration to their memory demands, leading to memory contention.

Despite these scalability challenges and earlier projections to the contrary, I will demonstrate that database replication on a cluster of commodity servers can provide scalable performance for e-commerce workloads. I will introduce three general techniques that address these performance bottlenecks. First, I will show that the separation of durability (performed inside database systems) and transaction ordering (performed in the replication middleware) forces durability disk writes to become serial. Uniting durability and ordering in the replication middleware permits group commit and removes this scalability bottleneck. Second, I will show that dispatching transactions in a manner that favors in-memory execution, allows transactions to fit their working sets in main memory, reducing memory contention. Third, I will exploit the way transactions are dispatched to propagate updates to where they will be read rather than to all replicas.

These synergetic techniques are implemented in the replication middleware without changing the underlying database systems. Extensive experimental evaluations show a major improvement in system scalability. Employing these middleware techniques improves the performance of a replicated database system by a factor of five over traditional replication approaches.


Sameh Elnikety

Sameh Elnikety obtained his Ph.D. from EPFL in 2007 and then joined the systems and networking group at Microsoft Research in Cambridge, UK. His research aims at building scalable computer systems. Sameh’s work spans a number of research areas including operating systems, distributed computing and databases. Currently, he is working on two projects: (1) providing system support for querying large graphs (e.g. social networks) on a cluster of machines, and (2) reconfiguring replicated databases autonomically.