{"id":1011459,"date":"2024-03-18T14:03:58","date_gmt":"2024-03-18T21:03:58","guid":{"rendered":""},"modified":"2024-03-18T14:04:00","modified_gmt":"2024-03-18T21:04:00","slug":"introducing-garnet-an-open-source-next-generation-faster-cache-store-for-accelerating-applications-and-services","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/introducing-garnet-an-open-source-next-generation-faster-cache-store-for-accelerating-applications-and-services\/","title":{"rendered":"Introducing Garnet \u2013 an open-source, next-generation, faster cache-store for accelerating applications and services"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1.jpg\" alt=\"Garnet-colored diamond with \"Rich and Extensible API\" at the top, \"Memory + Tiered Storage\" and \"Cluster Mode\" to the right, \"Ultra-Low Latency Pluggable Network Layer\" and \"Fast Checkpointing & Logging\" on the bottom, \"Bare Metal Performance\" and \"Works Everywhere\" to the left.\" class=\"wp-image-1011708\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<p>Researchers at Microsoft have been working for nearly a decade to address the increasing demand for data storage mechanisms to support the rapid advances in interactive web applications and services. Our new cache-store system called Garnet, which offers several advantages over legacy cache-stores, has been deployed in multiple use cases at Microsoft, such as those in the Windows & Web Experiences Platform, Azure Resource Manager, and Azure Resource Graph, and is now available as an open-source download at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/garnet\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/github.com\/microsoft\/garnet<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. 
In open sourcing Garnet, we hope to enable the developer community to benefit from its performance gains and capabilities, to build on our work, and to expand the Garnet ecosystem by adding new API calls and features.\u00a0We also hope that the open sourcing will encourage follow-up academic research and open future collaboration opportunities in this important research area.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-cache-store-problem\">The cache-store problem<\/h2>\n\n\n\n<p>The growth of cloud and edge computing has brought an increasing number and range of applications and services that need to access, update, and transform data with higher efficiency, lower latencies, and lower costs than ever before. These applications and services often require significant operational spending on storage interactions, making this one of the most expensive and challenging platform areas today. A cache-store software layer, deployed as a separately scalable remote process, can ease these costs and improve application performance. This has fueled a growing cache-store industry, including many open-source systems, such as Redis, Memcached, KeyDB, and Dragonfly.<\/p>\n\n\n\n<p>Unlike traditional remote cache-stores, which support a simple get\/set interface, modern caches offer rich APIs and feature sets. They support raw strings, analytic data structures such as HyperLogLog, and complex data types such as sorted sets and hashes. They allow users to checkpoint and recover the cache, create data shards, maintain replicated copies, and support transactions and custom extensions.<\/p>\n\n\n\n<p>However, existing systems achieve this feature richness at a cost: they keep the system design simple, which limits their ability to fully exploit the latest hardware capabilities (e.g., multiple cores, tiered storage, fast networks). 
Further, many of these systems are not explicitly designed to be easily extensible by app developers or to work well on diverse platforms and operating systems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"introducing-garnet\">Introducing Garnet<\/h2>\n\n\n\n<p>At Microsoft Research, we have been investigating modern key-value database architectures since 2016. Our prior work, the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/03\/faster-sigmod18.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">FASTER<\/a> embedded key-value library, which we <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/FASTER\" target=\"_blank\" rel=\"noopener noreferrer\">open-sourced<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> in 2018, demonstrated orders-of-magnitude better performance than existing systems, while focusing on the simple single-node in-process key-value model.<\/p>\n\n\n\n<p>Starting in 2021, based on requirements from use cases at Microsoft, we began building a new remote cache-store with all the necessary features to serve as a viable replacement for existing cache-stores. 
Our challenge was to maintain and enhance the performance benefits that we achieved in our earlier work, but in this more general and realistic network setting.<\/p>\n\n\n\n<p>The result of this effort is Garnet \u2013 a new cache-store that offers several unique benefits:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Garnet adopts the popular RESP wire protocol as a starting point, which makes it possible to use Garnet from unmodified Redis clients available in most programming languages today.<\/li>\n\n\n\n<li>Garnet offers much better scalability and throughput with many client connections and small batches, leading to cost savings for large apps and services.<\/li>\n\n\n\n<li>Garnet demonstrates better client latency at the 99<sup>th<\/sup> and 99.9<sup>th<\/sup> percentiles, which is critical to real-world scenarios.<\/li>\n\n\n\n<li>Based on the latest .NET technology, Garnet is cross-platform, extensible, and modern. It is designed to be easy to develop for and evolve, without sacrificing performance in the common case. We leveraged the rich library ecosystem of .NET for API breadth, with open opportunities for optimization. 
Thanks to our careful use of .NET, Garnet achieves state-of-the-art performance on both Linux and Windows.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"362\" height=\"271\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_intro.png\" alt=\"Garnet-colored diamond with \"Rich and Extensible API\" at the top, \"Memory + Tiered Storage\" and \"Cluster Mode\" to the right, \"Ultra-Low Latency Pluggable Network Layer\" and \"Fast Checkpointing & Logging\" on the bottom, \"Bare Metal Performance\" and \"Works Everywhere\" to the left.\" class=\"wp-image-1011471\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_intro.png 362w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_intro-300x225.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_intro-360x271.png 360w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_intro-80x60.png 80w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_intro-240x180.png 240w\" sizes=\"auto, (max-width: 362px) 100vw, 362px\" \/><\/figure>\n\n\n\n<p><strong>API features: <\/strong>Garnet supports a wide range of APIs including raw string, analytical, and object operations described earlier. It also implements a cluster mode with sharding, replication, and dynamic key migration. 
Garnet supports transactions in the form of client-side <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/redis.io\/docs\/interact\/transactions\/\" target=\"_blank\" rel=\"noopener noreferrer\">RESP transactions<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and our own server-side stored procedures in C#. It also allows users to define custom operations on both raw strings and new object types in C#, which lowers the bar for developing custom extensions.<\/p>\n\n\n\n<p><strong>Network, storage, cluster features: <\/strong>Garnet uses a fast and pluggable network layer, enabling future extensions such as leveraging kernel-bypass stacks. It supports secure communication over Transport Layer Security (TLS) as well as basic access control. Garnet\u2019s storage layer, called Tsavorite, was forked from OSS FASTER, and includes strong database features such as thread scalability, tiered storage support (memory, SSD, and cloud storage), fast non-blocking <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/01\/cpr-sigmod19.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">checkpointing<\/a>, recovery, operation logging for durability, multi-key transaction support, and better memory management and reuse. Finally, Garnet supports a cluster mode of operation \u2013 more on this later.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"performance-preview\">Performance preview<\/h2>\n\n\n\n<p>We illustrate a few key results comparing Garnet to leading open-source cache-stores. 
A more detailed performance comparison can be found on our website at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft.github.io\/garnet\/\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/microsoft.github.io\/garnet\/<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<p>We provision two Azure Standard F72s v2 virtual machines (72 vcpus, 144 GiB memory each) running Linux (Ubuntu 20.04), with accelerated TCP enabled. One machine runs different cache-store servers, and the other is dedicated to issuing workloads. We use our own benchmarking tool, called <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/Garnet\/tree\/main\/benchmark\/Resp.benchmark\" target=\"_blank\" rel=\"noopener noreferrer\">Resp.benchmark<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, to generate all results. We compare Garnet to the latest open-source versions of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/redis.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">Redis<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (v7.2), <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/docs.keydb.dev\/\" target=\"_blank\" rel=\"noopener noreferrer\">KeyDB<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (v6.3.4), and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.dragonflydb.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">Dragonfly<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (v6.2.11). We use a uniform random distribution of keys in these experiments (Garnet\u2019s shared memory design benefits even more with skewed workloads). 
The data is pre-loaded onto each server, and fits in memory in these experiments.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"experiment-1-throughput-with-varying-number-of-client-sessions\">Experiment 1: Throughput with varying number of client sessions<\/h3>\n\n\n\n<p>We start with large batches of GET operations (4096 requests per batch) and small payloads (8-byte keys and values) to minimize network overhead and compare the systems as we increase the number of client sessions. We see from Figure 1 that Garnet exhibits better scalability than Redis and KeyDB, while achieving higher throughput than all three baseline systems (the y-axis is log scale). Note that, while Dragonfly shows scaling behavior similar to Garnet\u2019s, it is a pure in-memory system. Further, Garnet\u2019s throughput relative to other systems remains strong even when the database size (i.e., the number of distinct keys pre-loaded), at 256 million keys, is significantly larger than what would fit in the processor caches.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"Two clustered column bar graphs comparing the throughput (log-scale) of various systems (Garnet, Redis, KeyDB, and Dragonfly) for a database size of 1024 keys and 256 million keys respectively. The x-axis varies the number of client sessions from 1 to 128. Garnet\u2019s throughput is shown to scale significantly better as the number of client sessions is increased. \" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"622\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig1.png\" alt=\"Two clustered column bar graphs comparing the throughput (log-scale) of various systems (Garnet, Redis, KeyDB, and Dragonfly) for a database size of 1024 keys and 256 million keys respectively. 
The x-axis varies the number of client sessions from 1 to 128. Garnet\u2019s throughput is shown to scale significantly better as the number of client sessions is increased. \" class=\"wp-image-1011504\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig1.png 1920w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig1-300x97.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig1-1024x332.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig1-768x249.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig1-1536x498.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig1-240x78.png 240w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 1: Throughput (log-scale), varying number of client sessions, for a database size of (a) 1024 keys, and (b) 256 million keys<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"experiment-2-throughput-with-varying-batch-sizes\">Experiment 2: Throughput with varying batch sizes<\/h3>\n\n\n\n<p>We next vary the batch size, with GET operations and a fixed number (64) of client sessions. We experiment with two different database sizes as before. Figure 2 shows that Garnet performs better even with no batching, and the gap increases even for very small batch sizes. Payload sizes are the same as before. Again, the y-axis is log scale.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"Two clustered column bar graphs comparing the throughput (log-scale) of various systems (Garnet, Redis, KeyDB, and Dragonfly) for a database size of 1024 keys and 256 million keys respectively. The x-axis varies the batch size from 1 to 4096. 
Garnet\u2019s throughput is shown to benefit significantly even from small batch sizes. \" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"2506\" height=\"914\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig2.png\" alt=\"Two clustered column bar graphs comparing the throughput (log-scale) of various systems (Garnet, Redis, KeyDB, and Dragonfly) for a database size of 1024 keys and 256 million keys respectively. The x-axis varies the batch size from 1 to 4096. Garnet\u2019s throughput is shown to benefit significantly even from small batch sizes. \" class=\"wp-image-1011516\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig2.png 2506w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig2-300x109.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig2-1024x373.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig2-768x280.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig2-1536x560.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig2-2048x747.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig2-240x88.png 240w\" sizes=\"auto, (max-width: 2506px) 100vw, 2506px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 2: Throughput (log-scale), varying batch sizes, for a database size of (a) 1024 keys, and (b) 256 million keys<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"experiment-3-latency-with-varying-number-of-client-sessions\">Experiment 3: Latency with varying number of client sessions<\/h3>\n\n\n\n<p>We next measure client-side latencies for the various systems. 
Figure 3 shows that, as we increase the number of client sessions, Garnet&#8217;s latency (measured in microseconds) at various percentiles stays lower and much more stable than that of the other systems. Here, we issue a mix of 80% GET and 20% SET operations, with no operation batching.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"Three clustered column bar graphs comparing the latency of various systems (Garnet, Redis, KeyDB, and Dragonfly) at median, 99th percentile, and 99.9th percentile respectively. The x-axis varies the number of client sessions from 1 to 128, with no batching, and an operation mix of 80% GET and 20% SET. Garnet\u2019s latency is shown to be stable and generally lower across the board.\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"3518\" height=\"914\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig3.png\" alt=\"Three clustered column bar graphs comparing the latency of various systems (Garnet, Redis, KeyDB, and Dragonfly) at median, 99th percentile, and 99.9th percentile respectively. The x-axis varies the number of client sessions from 1 to 128, with no batching, and an operation mix of 80% GET and 20% SET. 
Garnet\u2019s latency is shown to be stable and generally lower across the board.\" class=\"wp-image-1011528\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig3.png 3518w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig3-300x78.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig3-1024x266.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig3-768x200.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig3-1536x399.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig3-2048x532.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig3-240x62.png 240w\" sizes=\"auto, (max-width: 3518px) 100vw, 3518px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 3: Latency, varying number of client sessions, at (a) median, (b) 99<sup>th<\/sup> percentile, and (c) 99.9<sup>th<\/sup> percentile<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"experiment-4-latency-with-varying-batch-sizes\">Experiment 4: Latency with varying batch sizes&nbsp;<\/h3>\n\n\n\n<p>Garnet\u2019s latency is optimized for adaptive client-side batching and many sessions querying the system. We increase the batch sizes from 1 to 64 and plot latency at different percentiles below with 128 active client connections. We see in Figure 4 that Garnet\u2019s latency is low across the board. As before, we issue a mix of 80% GET and 20% SET operations.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"Three clustered column bar graphs comparing the latency of various systems (Garnet, Redis, KeyDB, and Dragonfly) at median, 99th percentile, and 99.9th percentile respectively. 
The x-axis varies the batch size from 1 to 64, with 128 client sessions connected, and an operation mix of 80% GET and 20% SET. Garnet\u2019s latency is shown to be stable and generally lower across the board.\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig4.png\"><img loading=\"lazy\" decoding=\"async\" width=\"3529\" height=\"914\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig4.png\" alt=\"Three clustered column bar graphs comparing the latency of various systems (Garnet, Redis, KeyDB, and Dragonfly) at median, 99th percentile, and 99.9th percentile respectively. The x-axis varies the batch size from 1 to 64, with 128 client sessions connected, and an operation mix of 80% GET and 20% SET. Garnet\u2019s latency is shown to be stable and generally lower across the board.\" class=\"wp-image-1011531\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig4.png 3529w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig4-300x78.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig4-1024x265.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig4-768x199.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig4-1536x398.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig4-2048x530.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet_Fig4-240x62.png 240w\" sizes=\"auto, (max-width: 3529px) 100vw, 3529px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 4: Latency, varying batch sizes, at (a) median, (b) 99<sup>th<\/sup> percentile, and (c) 99.9<sup>th<\/sup> percentile<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"other-experiments\">Other 
experiments<\/h2>\n\n\n\n<p>We have also experimented with other features and operation types and found Garnet to perform and scale well. Our <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/microsoft.github.io\/garnet\/docs\" target=\"_blank\" rel=\"noopener noreferrer\">documentation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> has more details, including how to run these experiments so that you can see the benefits for your own use cases.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"garnet-s-design-highlights\">Garnet\u2019s design highlights<\/h2>\n\n\n\n<p>Garnet\u2019s design re-thinks the entire cache-store stack \u2013 from receiving packets on the network, to parsing and processing database operations, to performing storage interactions. We build on top of years of research, with over 10 research papers published over the last decade. Figure 5 shows Garnet\u2019s overall architecture. We highlight a few key ideas below.<\/p>\n\n\n\n<p>Garnet\u2019s network layer inherits a shared memory design inspired by our prior research on <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/achieving-high-throughput-and-elasticity-in-a-larger-than-memory-store\/\">ShadowFax<\/a>. TLS processing and storage interactions are performed on the IO completion thread, avoiding thread switching overheads in the common case. This approach allows CPU cache coherence to bring the data to the network, instead of traditional shuffle-based designs, which require data movement on the server.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"2532\" height=\"2560\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Figure-5-scaled.jpg\" alt=\"Overall architecture of Garnet. Shows multiple network sessions passing through a parsing and API implementation layer. 
The storage API is transformed into read, upsert, delete, and read-modify-write operations on the storage layer. Storage consists of a main store and an object store, which both feed into a unified operations log. The log may be relayed to remote replicas. \" class=\"wp-image-1011537\" style=\"width:610px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Figure-5-scaled.jpg 2532w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Figure-5-297x300.jpg 297w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Figure-5-1013x1024.jpg 1013w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Figure-5-768x777.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Figure-5-1519x1536.jpg 1519w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Figure-5-2025x2048.jpg 2025w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Figure-5-178x180.jpg 178w\" sizes=\"auto, (max-width: 2532px) 100vw, 2532px\" \/><figcaption class=\"wp-element-caption\">Figure 5: Overall architecture of Garnet<\/figcaption><\/figure>\n\n\n\n<p>Garnet\u2019s storage design consists of two Tsavorite key-value stores whose fates are bound by a unified operation log. The first store, called the \u201cmain store,\u201d is optimized for raw string operations and manages memory carefully to avoid garbage collection. The second, and optional, \u201cobject store\u201d is optimized for complex objects and custom data types, including popular types such as Sorted Set, Set, Hash, List, and Geo. Data types in the object store leverage the .NET library ecosystem for their current implementations. They are stored on the heap in memory (which makes updates very efficient) and in a serialized form on disk. 
In the future, we plan to investigate using a unified index and log to ease maintenance.<\/p>\n\n\n\n<p>A distinguishing feature of Garnet\u2019s design is its narrow-waist Tsavorite storage API, which is used to implement the large, rich, and extensible RESP API surface on top. This API consists of read, upsert, delete, and atomic read-modify-write operations, implemented with asynchronous callbacks for Garnet to interject logic at various points during each operation. Our storage API model allows us to cleanly separate Garnet\u2019s parsing and query processing concerns from storage details such as concurrency, storage tiering, and checkpointing.&nbsp;<\/p>\n\n\n\n<p>Garnet further adds support for multi-key transactions based on two-phase locking. One can either use RESP client-side transactions (MULTI\/EXEC) or use our server-side transactional stored procedures in C#.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"cluster-mode\">Cluster mode<\/h2>\n\n\n\n<p>In addition to single-node execution, Garnet supports a cluster mode, which allows users to create and manage a sharded and replicated deployment. Garnet also supports an efficient and dynamic key migration scheme to rebalance shards. Users can use standard Redis cluster commands to create and manage Garnet clusters, and nodes perform gossip to share and evolve cluster state. Overall, Garnet\u2019s cluster mode is a large and evolving feature, and we will cover more details in subsequent posts.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"looking-ahead\">Looking ahead<\/h2>\n\n\n\n<p>As Garnet is deployed in additional scenarios, we will continue to share those details in future articles. 
We also look forward to continuing to add new features and improvements to Garnet, as well as working with the open-source community.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"project-contributors\">Project contributors<\/h3>\n\n\n\n<p><strong>Garnet Core:<\/strong> <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/badrishc\/\">Badrish Chandramouli<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/vazois\/\">Vasileios Zois<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/lumaas\/\">Lukas Maas<\/a>, Ted Hart, Gabriela Martinez Sanchez, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yrajas\/\">Yoganand Rajasekaran<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/talzacc\/\">Tal Zaccai<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/darrenge\/\">Darren Gehring<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/irinasp\/\">Irina Spiridonova<\/a>.\u00a0<\/p>\n\n\n\n<p><strong>Collaborators:<\/strong> Alan Yang, Pradeep Yadav, Alex Dubinkov, Venugopal Latchupatulla, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/knutmr\/\">Knut Magne Risvik<\/a>, Sarah Williamson, Narayanan Subramanian, Saurabh Singh, Padmanabh Gupta, Sajjad Rahnama, Reuben Bond, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/raaboulh\/\">Rafah Hosn<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/surajitc\/\">Surajit Chaudhuri<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/johannes\/\">Johannes Gehrke<\/a>, and many others.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Garnet is a cache-store system that addresses growing demand for data storage to support interactive web applications and services. Offering several advantages over legacy cache-stores, Garnet is now available as an open-source download. 
<\/p>\n","protected":false},"author":37583,"featured_media":1011708,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Badrish Chandramouli","user_id":"31166"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13563],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1011459","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-data-platform-analytics","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[957177],"related-projects":[982563,473268],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Badrish Chandramouli","user_id":31166,"display_name":"Badrish Chandramouli","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/badrishc\/\" aria-label=\"Visit the profile page for Badrish Chandramouli\">Badrish Chandramouli<\/a>","is_active":false,"last_first":"Chandramouli, Badrish","people_section":0,"alias":"badrishc"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"Garnet-colored diamond with &quot;Rich and Extensible 
API&quot; at the top, &quot;Memory + Tiered Storage&quot; and &quot;Cluster Mode&quot; to the right, &quot;Ultra-Low Latency Pluggable Network Layer&quot; and &quot;Fast Checkpointing &amp; Logging&quot; on the bottom, &quot;Bare Metal Performance&quot; and &quot;Works Everywhere&quot; to the left.\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/03\/Garnet-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/badrishc\/\" title=\"Go to researcher profile for Badrish Chandramouli\" aria-label=\"Go to researcher profile for Badrish Chandramouli\" data-bi-type=\"byline author\" data-bi-cN=\"Badrish Chandramouli\">Badrish 
Chandramouli<\/a>","formattedDate":"March 18, 2024","formattedExcerpt":"Garnet is a cache-store system that addresses growing demand for data storage to support interactive web applications and services. Offering several advantages over legacy cache-stores, Garnet is now available as an open-source download.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1011459","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37583"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1011459"}],"version-history":[{"count":37,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1011459\/revisions"}],"predecessor-version":[{"id":1015962,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1011459\/revisions\/1015962"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1011708"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1011459"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1011459"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1011459"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1011459"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/resea
rch\/wp-json\/wp\/v2\/msr-region?post=1011459"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1011459"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1011459"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1011459"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1011459"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1011459"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1011459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}