Hyder is a transactional indexed-record manager for shared flash. That is, it supports operations on indexed records and transaction operations that bracket the record operations. It is designed to run on a cluster of servers that have shared access to a large pool of network-addressable storage, which stores the indexed records as a multiversion log-structured database. Hyder’s main feature is that it scales out without partitioning the database or application.
In Hyder, the database is stored as a multiversion (i.e., copy-on-write) binary search tree. Each node is a key-value pair. Each update to a node in the tree creates a new node, which in turn requires that new copies be created of all of its ancestors. When a transaction begins, it is given the latest copy of the database root, which defines a static snapshot. The transaction’s updates are stored in a transaction-local cache. When the transaction is ready to commit, its updates are gathered into a record and appended to the store.
Unlike conventional database systems, appending a transaction T’s update record to the log does not necessarily commit T. Instead, when a server receives T’s log record L, it determines whether T actually committed. T’s log record contains a reference R to the last committed transaction in the log that contributed to the snapshot that T read. The server scans the log to determine if any committed transactions in between R and L included a conflicting operation. This is determined by looking at the writeset and optionally the readset (depending on the desired isolation level), which are summarized in each log record. If there are no conflicts, then T committed. Otherwise, it aborted. In effect, this is optimistic concurrency control. Since all servers are scanning the same log, they all make the same decision regarding T. If the server that executed T determines that T aborted, then it re-executes T.
Arnd Christian König