{"id":620802,"date":"2019-11-19T12:06:10","date_gmt":"2019-11-19T20:06:10","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=620802"},"modified":"2020-06-08T13:52:10","modified_gmt":"2020-06-08T20:52:10","slug":"simplestore","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/simplestore\/","title":{"rendered":"SimpleStore"},"content":{"rendered":"<p>Interacting with storage \u2013 be it main memory, local storage, or cloud storage \u2013 is one of the hardest challenges faced by application and platform developers. We have a \u201ckitchen sink\u201d of solutions available today, each optimized for a specific workload. The SimpleStore project aims at simplifying the use of storage for modern cloud, edge, serverless, and big data applications. Our recent <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.hpts.ws\/papers\/2019\/rethinking-storage-hpts.pdf\">presentation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> at HPTS gives an overview of the broader research project. We tackle the problem under two broad umbrellas:<\/p>\n<h2>SimpleStore for Compute<\/h2>\n<p>We aim to simplify individual object access, update, and read-modify-write for embedded edge and cloud applications, streaming, and auto-scaling serverless and actor-oriented compute frameworks. Towards this vision, we have been building systems, abstractions, and consistency models. 
The projects under this category include:<\/p>\n<ul>\n<li><strong>FASTER<\/strong>: The <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/FASTER\">FASTER<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> project aims to provide high-performance embedded abstractions over tiered storage: a key-value store and cache (FasterKV) and a log (FasterLog).<\/li>\n<li><strong>CPR<\/strong>: <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/concurrent-prefix-recovery-performing-cpr-on-a-database\/\">CPR<\/a> is a new scalable recovery model that provides consistency across caches and storage and is applicable to any database or key-value store. We have developed single- and multi-node versions of this model, and it is used for recovery in FASTER.<\/li>\n<li><strong>Distribution and Scale-Out<\/strong>: We have built <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/CRA\">CRA<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, an open-source distributed virtual connection runtime for modern cloud and edge environments. CRA has been used with systems such as <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/Ambrosia\">Ambrosia<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/FASTER\">FASTER<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> to provide resilient and ephemeral storage capabilities. 
In the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2006.03206\">Shadowfax<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> project, we are also working on making FASTER easier and more efficient to use in a distributed client-server environment. Finally, we are developing a distributed version of CPR to provide consistent storage\/cache access in distributed serverless and actor environments.<\/li>\n<\/ul>\n<h2>SimpleStore for Analytics<\/h2>\n<p>We aim to simplify and accelerate access to storage for analytics and more complex querying patterns (beyond point reads) by both applications and database systems. The projects under this category include:<\/p>\n<ul>\n<li><strong>Qd-tree<\/strong>: In the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/qd-tree-learning-data-layouts-for-big-data-analytics\/\">qd-tree<\/a> project, we have developed new techniques that use workload information to optimize data layouts, with the goal of accelerating modern analytics systems and databases. We are currently looking into extending this work to support a broader class of workloads and caching layers.<\/li>\n<li><strong>FishStore<\/strong>: Modern data sources have fixed or flexible schemas. <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/FishStore\">FishStore<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> is a storage and retrieval system that supports fast, time-based ingestion of data and allows users to run complex workloads over the stored data, without requiring an index or data layout to be selected a priori. 
FishStore uses <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.vldb.org\/pvldb\/vol10\/p1118-li.pdf\">Mison<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/simdjson.org\/\">simdjson<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> for fast partial parsing of JSON data. In future work, we plan to generalize FishStore to support arbitrary types of queries over rapidly ingested logs.<\/li>\n<li><strong>Secondary Indexing<\/strong>: PSF (predicated subset function) indexing is a concept introduced in <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/FishStore\">FishStore<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> that allows users to define arbitrary &#8220;predicated subsets&#8221; of data and make them easily accessible for future queries. We are adding this capability to FASTER C#. 
Building on our experience with FishStore, we are also investigating the use of FASTER as the storage layer beneath a secondary range index such as <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/rocksdb.org\/\">RocksDB<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> to support range queries.<\/li>\n<\/ul>\n<h3><strong>Software Links<\/strong><\/h3>\n<ul>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/FASTER\">https:\/\/github.com\/microsoft\/FASTER<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/CRA\">https:\/\/github.com\/microsoft\/CRA<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/microsoft\/FishStore\">https:\/\/github.com\/microsoft\/FishStore<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Interacting with storage \u2013 be it main memory, local storage, or cloud storage \u2013 is one of the hardest challenges faced by application and platform developers. We have a \u201ckitchen sink\u201d of solutions available today, each optimized for a specific workload. 
The SimpleStore project aims at simplifying the use of storage for modern cloud, edge, [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13563],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-620802","msr-project","type-msr-project","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2018-01-01","related-publications":[476583,501413,560121,568476,578338,664842,664920,664935,698887],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Badrish Chandramouli","user_id":31166,"people_section":"Section name 0","alias":"badrishc"},{"type":"user_nicename","display_name":"Yinan Li","user_id":35012,"people_section":"Section name 0","alias":"yinali"},{"type":"user_nicename","display_name":"Sebastian Burckhardt","user_id":33544,"people_section":"Section name 0","alias":"sburckha"},{"type":"user_nicename","display_name":"Johannes Gehrke","user_id":32364,"people_section":"Section name 0","alias":"johannes"},{"type":"guest","display_name":"Ted Hart","user_id":664872,"people_section":"Section name 0","alias":""},{"type":"guest","display_name":"Jae Young Do","user_id":664869,"people_section":"Section name 
0","alias":""}],"msr_research_lab":[199565],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/620802","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":11,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/620802\/revisions"}],"predecessor-version":[{"id":664917,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/620802\/revisions\/664917"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=620802"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=620802"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=620802"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=620802"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=620802"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}