Realizing the fault-tolerance promise of cloud storage using locks with intent

Operating Systems Design and Implementation (OSDI)

Published by USENIX

Cloud computing promises easy development and deployment of
large-scale, fault-tolerant, and highly available applications. Cloud
storage services are a key enabler of this, because they provide
reliability, availability, and fault tolerance via internal mechanisms
that developers need not reason about. Despite this, challenges
remain for developers of distributed cloud applications. They still need
to make their code robust against failures of the machines running the
code, and to reason about concurrent access to cloud storage by
multiple machines.

We address this problem with a new abstraction, called locks with
intent, which we implement in a client library called Olive. Olive
makes minimal assumptions about the underlying cloud storage, enabling
it to operate on a variety of platforms including Amazon DynamoDB and
Microsoft Azure Storage. Leveraging the underlying cloud storage,
Olive’s locks with intent offer strong exactly-once semantics for a
snippet of code despite failures and concurrent duplicate executions.

To ensure exactly-once semantics, Olive incurs the unavoidable
overhead of additional logging writes. However, by decoupling
isolation from atomicity, it supports consistency levels ranging from
eventual to transactional. This flexibility allows applications to
avoid costly transactional mechanisms when weaker semantics suffice.
We apply Olive’s locks with intent to build several advanced storage
functionalities, including snapshots, transactions via optimistic
concurrency control, secondary indices, and live table
re-partitioning. Our experience demonstrates that Olive eases the
burden of creating correct, fault-tolerant distributed cloud
applications.