Interacting with storage – be it main memory, local storage, or cloud storage – is one of the hardest challenges faced by application and platform developers. We have a “kitchen sink” of solutions available today, each optimized for a specific workload. The SimpleStore project aims to simplify the use of storage for modern cloud, edge, serverless, and big data applications. Our recent presentation at HPTS gives an overview of the broader research project. We tackle the problem under two broad umbrellas:
SimpleStore for Compute
We aim to simplify individual object access, update, and read-modify-write, for embedded edge and cloud applications, streaming, and auto-scaling serverless and actor-oriented compute frameworks. Towards this vision, we have been building systems, abstractions, and consistency models. The projects under this category include:
- FASTER: The FASTER project aims to provide an embedded key-value + cache (FasterKV) and log (FasterLog) abstraction over tiered storage, at very high performance.
- CPR: CPR (Concurrent Prefix Recovery) is a new scalable recovery model that provides consistency across caches and storage, in a manner that is applicable to any database or key-value store. We have developed single- and multi-node versions of this model, and it is used for recovery in FASTER.
- Distribution and Scale-Out: We have built CRA, an open-source distributed virtual connection runtime for the modern cloud-edge. CRA has been used with systems like Ambrosia and FASTER to provide resilient and ephemeral storage capabilities. We are also working on making it easier and more efficient to use FASTER in a distributed client-server environment, in the Shadowfax project. Finally, we are working on consistent storage/cache access in distributed serverless and actor environments, with a distributed version of CPR.
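To make the three per-object operations named above (access, update, and read-modify-write) concrete, here is a minimal sketch of a key-value store layered over two tiers. It is an illustration only and does not reflect FASTER's actual API or design: the class `TinyTieredKV`, its methods, and the `hot_capacity` parameter are hypothetical names, and plain dictionaries stand in for main memory and for slower storage.

```python
from typing import Callable, Dict, Optional


class TinyTieredKV:
    """Toy key-value store with a bounded in-memory tier and a cold tier.

    Hypothetical illustration; this is not FASTER's API or design.
    """

    def __init__(self, hot_capacity: int = 2) -> None:
        self.hot_capacity = hot_capacity
        self.hot: Dict[str, int] = {}   # stands in for main memory
        self.cold: Dict[str, int] = {}  # stands in for local or cloud storage

    def _evict_if_needed(self) -> None:
        # Spill the oldest hot entries to the cold tier once over capacity.
        while len(self.hot) > self.hot_capacity:
            oldest_key = next(iter(self.hot))
            self.cold[oldest_key] = self.hot.pop(oldest_key)

    def upsert(self, key: str, value: int) -> None:
        # Blind update: overwrite without reading the old value.
        self.cold.pop(key, None)
        self.hot.pop(key, None)  # re-insert so the key becomes the newest
        self.hot[key] = value
        self._evict_if_needed()

    def read(self, key: str) -> Optional[int]:
        # Check the hot tier first, then fall back to the cold tier.
        if key in self.hot:
            return self.hot[key]
        return self.cold.get(key)

    def rmw(self, key: str, initial: int, update: Callable[[int], int]) -> int:
        # Read-modify-write: create the value if absent, otherwise transform it.
        current = self.read(key)
        new_value = initial if current is None else update(current)
        self.upsert(key, new_value)
        return new_value


# Usage: count occurrences with read-modify-write.
store = TinyTieredKV(hot_capacity=2)
for word in ["a", "b", "a", "c", "a"]:
    store.rmw(word, initial=1, update=lambda count: count + 1)
```

The caller never needs to know which tier holds a key: `read` and `rmw` behave the same whether the value is in memory or has been spilled. A real system also has to handle concurrency and recovery, which this single-threaded sketch leaves out.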
SimpleStore for Analytics
We aim to simplify and accelerate access to storage for analytics and more complex querying patterns (beyond point reads) by both applications and database systems. The projects under this category include:
- Qd-tree: In the qd-tree project, we have developed new techniques that use workload information to optimize data layouts, with the goal of accelerating modern analytics systems and databases. We are currently looking into supporting a broader class of workloads and caching layers.
- FishStore: Modern data sources have fixed or flexible schemas. FishStore is a storage and retrieval system that supports fast time-based ingestion of data and allows users to impose a complex workload on storage, with no a priori index or data layout selection necessary. FishStore leverages Mison and simdjson for fast partial parsing of JSON data. As future work, we plan to generalize FishStore to arbitrary types of queries over rapidly ingested logs.
- Secondary Indexing: Predicated subset function (PSF) indexing is a concept from FishStore that allows users to define arbitrary “predicated subsets” of data and make them easily accessible for querying in the future. We are adding this capability to the C# version of FASTER. Further, based on our experience with FishStore, we are investigating the use of FASTER as the storage layer below a secondary range index such as RocksDB, in order to support range queries.
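The idea of a “predicated subset” can be sketched in a few lines. The code below is a toy model written for this page, not FishStore's or FASTER's implementation or API: `TinyPsfLog`, `register`, `ingest`, and `query` are hypothetical names, and the assumption is that a PSF is a user function that maps a record to a value of interest, or to `None` when the record falls outside the subset.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
# Assumed shape of a PSF: record -> value of interest, or None if not in the subset.
PSF = Callable[[Record], Optional[Any]]


class TinyPsfLog:
    """Append-only log with subset indexes built at ingestion time (toy model)."""

    def __init__(self) -> None:
        self.log: List[Record] = []
        self.psfs: Dict[str, PSF] = {}
        # (PSF name, PSF value) -> offsets of matching records in the log
        self.index: Dict[Tuple[str, Any], List[int]] = defaultdict(list)

    def register(self, name: str, psf: PSF) -> None:
        # In this sketch, only records ingested after registration are indexed.
        self.psfs[name] = psf

    def ingest(self, record: Record) -> None:
        offset = len(self.log)
        self.log.append(record)
        for name, psf in self.psfs.items():
            value = psf(record)
            if value is not None:
                self.index[(name, value)].append(offset)

    def query(self, name: str, value: Any) -> List[Record]:
        # Follow the index instead of scanning the whole log.
        return [self.log[offset] for offset in self.index.get((name, value), [])]


# Usage: index overheating readings by device, without choosing a schema up front.
events = TinyPsfLog()
events.register(
    "overheated_by_device",
    lambda r: r.get("device") if r.get("temp", 0) > 90 else None,
)
events.ingest({"device": "d1", "temp": 95})
events.ingest({"device": "d2", "temp": 40})
events.ingest({"device": "d1", "temp": 99})
events.ingest({"device": "d1", "temp": 20})
overheated_d1 = events.query("overheated_by_device", "d1")
```

Because the PSF is an arbitrary function, the same mechanism covers both a plain field index and a filtered one like the example here. The sketch keeps whole records in memory and ignores parsing, persistence, and concurrency.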