{"id":1171325,"date":"2026-05-13T10:19:59","date_gmt":"2026-05-13T17:19:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1171325"},"modified":"2026-05-13T10:29:03","modified_gmt":"2026-05-13T17:29:03","slug":"mimalloc-a-high-performance-scalable-memory-allocator-for-the-modern-era","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/mimalloc-a-high-performance-scalable-memory-allocator-for-the-modern-era\/","title":{"rendered":"mimalloc:\u00a0A\u00a0new,\u00a0high-performance,\u00a0scalable memory allocator for the modern era"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-1024x576.jpg\" alt=\"Three white line icons\u2014a monitor with code brackets, interlocking gears, and a speedometer\u2014displayed on a purple\u2011to\u2011blue gradient background with a subtle textured pattern.\" class=\"wp-image-1171567\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-240x135.jpg 240w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"padding-bottom:0; padding-top:0\" class=\"wp-block-msr-immersive-section alignfull row wp-block-msr-immersive-section\">\n\t\n\t<div class=\"container\">\n\t\t<div class=\"wp-block-msr-immersive-section__inner wp-block-msr-immersive-section__inner--narrow\">\n\t\t\t<div class=\"wp-block-columns mb-10 pb-1 pr-1 is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\" style=\"box-shadow:var(--wp--preset--shadow--outlined)\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h2 class=\"wp-block-heading h3\" id=\"at-a-glance\">At a glance<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Today\u2019s critical services and applications are often highly concurrent, using hundreds of threads. They also operate at large memory scales, frequently hundreds of gigabytes, especially when using large language models.<\/li>\n\n\n\n<li>mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. 
It provides bounded worst-case allocation times (up to OS primitives), bounded space overhead, low internal fragmentation, and minimal contention by relying almost exclusively on atomic operations.<\/li>\n\n\n\n<li>mimalloc is available on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/mimalloc\" type=\"link\" id=\"https:\/\/github.com\/microsoft\/mimalloc\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and has over 12K stars.<\/li>\n<\/ul>\n<\/div>\n<\/div>\t\t<\/div>\n\t<\/div>\n\n\t<\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"mimalloc\">mimalloc<\/h2>\n\n\n\n<p>At the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/research-software-engineering-rise\/\" type=\"link\" id=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/research-software-engineering-rise\/\">RiSE group at Microsoft Research (MSR)<\/a>, we conduct fundamental research into formal methods, programming languages, and software engineering (including emerging agentic systems), with a particular focus on systems that can be provably correct, secure, and performant. 
The mimalloc memory allocator was initially designed in 2020 as a fast allocator for the state-of-the-art <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/leanprover.github.io\/\" type=\"link\" id=\"http:\/\/leanprover.github.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">Lean<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/koka-lang.github.io\/koka\/doc\/book.html\" type=\"link\" id=\"https:\/\/koka-lang.github.io\/koka\/doc\/book.html\" target=\"_blank\" rel=\"noopener noreferrer\">Koka<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> programming languages developed at RiSE, both of which use novel compiler-guided reference counting (see <em><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/06\/perceus-pldi21.pdf\" type=\"link\" id=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/06\/perceus-pldi21.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Perceus<\/a><\/em>).<\/p>\n\n\n\n<p>The scalable design of mimalloc has also proved to work exceedingly well for large services at Microsoft. Through close cooperation with product teams, mimalloc has significantly improved response times in services such as Bing. Today, mimalloc is widely used in large services and applications, both within and outside Microsoft. It serves as the allocator for NoGIL CPython 3.13+, is integrated into Unreal Engine, and is used in games such as <em>Death Stranding<\/em>.<\/p>\n\n\n\n<p>The project is open source on GitHub, with over 12K stars. 
Its Rust wrapper alone sees over 100K downloads per day.<\/p>\n\n\n\n<p>mimalloc is effective across a wide range of scenarios, from small-scale applications like Koka or Lean, to large services with memory footprints exceeding 500 GiB and hundreds of threads.<\/p>\n\n\n\n<p>Despite this range, the codebase is still compact, at around 12K lines of C. Reflecting its research origins, mimalloc emphasizes clear internal data structures with strong invariants, making it easier to understand and reason about than many industry allocators. As Fred Brooks remarked in his famous book <em>The Mythical Man-Month<\/em>: \u201cShow me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won\u2019t need your flowchart; it\u2019ll be obvious.\u201d<\/p>\n\n\n\n<p>As a result, mimalloc has been ported to many platforms\u2014Windows, macOS, Linux, FreeBSD, NetBSD, DragonFly, and various consoles\u2014and is easy to build and integrate into other projects. For example, the clear data structures enabled Sam Gross and others to adopt mimalloc as the concurrent allocator for NoGIL CPython. The design also makes it relatively straightforward to implement cyclic garbage collection on top of it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-fast-path\">The Fast Path<\/h2>\n\n\n\n<p>As with other scalable allocators (such as tcmalloc and jemalloc), a core design principle of mimalloc is that each thread maintains its own thread-local heap, which we call a \u201ctheap\u201d. Each theap owns a set of mimalloc \u201cpages,\u201d which are usually 64 KiB. Each mimalloc page contains blocks of a fixed size, organized into size classes to reduce internal fragmentation. By giving each thread its own theap and set of mimalloc pages, memory allocation and deallocation typically proceed without synchronization. 
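These per-thread structures can be pictured as in the following simplified sketch. The type and field names follow the description above, but the layout, size-class scheme, and the helper small_size_class are illustrative only, not mimalloc's actual definitions:

```c
#include <stddef.h>

/* Simplified sketch of the per-thread structures described above.
   Names follow the post; the real mimalloc definitions differ. */
typedef struct block_s {
  struct block_s* next;               /* intrusive free-list link        */
} block_t;

typedef struct page_s {
  block_t* free;                      /* blocks available for allocation */
  size_t   used;                      /* number of live blocks           */
  size_t   block_size;                /* fixed block size of this page   */
} page_t;

#define MAX_SMALL_SIZE 1024           /* fast path only for small sizes  */

typedef struct theap_s {
  /* one page per word-rounded small size class, so lookup is one index */
  page_t* small_pages[MAX_SMALL_SIZE / sizeof(void*) + 1];
} theap_t;

/* Map a small request to its size-class index by rounding up to words. */
static size_t small_size_class(size_t size) {
  return (size + sizeof(void*) - 1) / sizeof(void*);
}
```

Because all blocks in a page share one size, a freed block can be reused for any later allocation in the same class without any size bookkeeping per block.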
Atomic operations are only required when a thread frees a block allocated by another thread.<\/p>\n\n\n\n<p>Moreover, in practice, most allocations are quite small, often less than 1 KiB. For such small allocations, mimalloc provides a fast path where the main allocation function looks like:<\/p>\n\n\n\n<pre><code><span style=\"color:teal\">void<\/span>* mi_malloc( <span style=\"color:teal\">size_t<\/span> size )  \n{ \n  <span style=\"color:teal\">mi_theap_t<\/span>* <span style=\"color:blue\">const<\/span> theap = mi_get_thread_local_theap(); \n  <span style=\"color:blue\">if<\/span> (size > <span style=\"color:teal\">MI_MAX_SMALL_SIZE<\/span>) <span style=\"color:blue\">return<\/span> mi_malloc_generic(theap,size);  <span style=\"color:darkgreen\">\/\/ slow generic path <\/span>\n \n  <span style=\"color:blue\">const<\/span> <span style=\"color:teal\">size_t<\/span> index = (size + <span style=\"color:blue\">sizeof<\/span>(<span style=\"color:teal\">void<\/span>*) - 1)\/<span style=\"color:blue\">sizeof<\/span>(<span style=\"color:teal\">void<\/span>*);       <span style=\"color:darkgreen\">\/\/ round size up <\/span>\n  <span style=\"color:teal\">mi_page_t<\/span>* <span style=\"color:blue\">const<\/span> page = theap->small_pages[index];                    \n \n  <span style=\"color:teal\">mi_block_t<\/span>* <span style=\"color:blue\">const<\/span> block = page->free;                                <span style=\"color:darkgreen\">\/\/ head of free list <\/span>\n  <span style=\"color:blue\">if<\/span> (block == <span style=\"color:teal\">NULL<\/span>) <span style=\"color:blue\">return<\/span> mi_malloc_generic(theap,size);             <span style=\"color:darkgreen\">\/\/ slow generic path <\/span>\n \n  page->free = block->next;                                            <span style=\"color:darkgreen\">\/\/ pop free list <\/span>\n  page->used++;                                        \n  <span style=\"color:blue\">return<\/span> block; 
\n}<\/code><\/pre>\n\n\n\n<p>By using thread-local theaps, we need no atomic operations or thread synchronization. We also try to minimize the number of branches. In particular, the thread-local theap is never <code>NULL<\/code>, as we initialize it with a special empty theap whose pages are all empty. This way, we do not need a separate check for whether the theap is <code>NULL<\/code>. Similarly, the pointers in the <code>small_pages<\/code> array are never <code>NULL<\/code>, and we again use special empty pages (with <code>page->free == NULL<\/code>) to avoid a separate check. Finally, pages are initialized with a free list rather than a separate bump pointer, avoiding special cases and enabling allocation by simply popping blocks from the free list. On x64, this code now translates into a few instructions with just two uncommon branches:<\/p>\n\n\n\n<pre><code>mi_malloc: \n  movq <span style=\"color:blue\">%rdi<\/span>, <span style=\"color:blue\">%rsi<\/span>             <span style=\"color:darkgreen\">;<\/span><span style=\"color:darkgreen\"> rsi = size<\/span>\n  movq _mi_theap_default@GOTTPOFF(<span style=\"color:blue\">%rip<\/span>), <span style=\"color:blue\">%rax<\/span> \n  movq <span style=\"color:blue\">%fs<\/span>:(<span style=\"color:blue\">%rax<\/span>), <span style=\"color:blue\">%rdi<\/span>       <span style=\"color:darkgreen\">;<\/span><span style=\"color:darkgreen\"> rdi = thread local theap<\/span>\n  cmpq <span class=\"constant\" style=\"color:purple\">$1024<\/span>, <span style=\"color:blue\">%rsi<\/span>            <span style=\"color:darkgreen\">;<\/span><span style=\"color:darkgreen\"> size > MI_MAX_SMALL_SIZE?<\/span>\n  ja <span class=\"constant\" style=\"color:purple\">.LBB0_generic<\/span>\n\n  leaq <span class=\"constant\" style=\"color:purple\">7<\/span>(<span style=\"color:blue\">%rsi<\/span>), <span style=\"color:blue\">%rax<\/span>          <span style=\"color:darkgreen\">;<\/span><span style=\"color:darkgreen\"> round to sizeof(void*)<\/span>\n  
andq <span class=\"constant\" style=\"color:purple\">$-8<\/span>, <span style=\"color:blue\">%rax<\/span>\n  movq <span class=\"constant\" style=\"color:purple\">232<\/span>(<span style=\"color:blue\">%rdi<\/span>,<span style=\"color:blue\">%rax<\/span>), <span style=\"color:blue\">%rcx<\/span>   <span style=\"color:darkgreen\">;<\/span><span style=\"color:darkgreen\"> rcx = heap->small_pages[index]<\/span>\n  movq <span class=\"constant\" style=\"color:purple\">8<\/span>(<span style=\"color:blue\">%rcx<\/span>), <span style=\"color:blue\">%rax<\/span>          <span style=\"color:darkgreen\">;<\/span><span style=\"color:darkgreen\"> block = rax = page->free<\/span>\n  testq <span style=\"color:blue\">%rax<\/span>, <span style=\"color:blue\">%rax<\/span>            <span style=\"color:darkgreen\">;<\/span><span style=\"color:darkgreen\"> block == NULL?<\/span>\n  je <span class=\"constant\" style=\"color:purple\">.LBB0_generic<\/span>\n  \n  movq (<span style=\"color:blue\">%rax<\/span>), <span style=\"color:blue\">%rdx<\/span>           <span style=\"color:darkgreen\">;<\/span><span style=\"color:darkgreen\"> page->free = block->next<\/span>\n  movq <span style=\"color:blue\">%rdx<\/span>, <span class=\"constant\" style=\"color:purple\">8<\/span>(<span style=\"color:blue\">%rcx<\/span>)\n  incw <span class=\"constant\" style=\"color:purple\">16<\/span>(<span style=\"color:blue\">%rcx<\/span>)               <span style=\"color:darkgreen\">;<\/span><span style=\"color:darkgreen\"> page->used++<\/span>\n  retq \n\n<span class=\"constant\" style=\"color:purple\">.LBB0_generic<\/span>:\n  jmp _mi_malloc_generic@PLT  <span style=\"color:darkgreen\">;<\/span><span style=\"color:darkgreen\"> tailcall <\/span>\n<\/code><\/pre>\n\n\n\n<p>Similarly, mimalloc provides a fast path for freeing blocks. In practice, most blocks are freed by the same thread that allocated the block. 
We can optimize that case by checking whether the current thread ID matches the thread ID stored in the corresponding mimalloc page. If so, we can just push our block on the page\u2019s free list without requiring atomic operations or locks:<\/p>\n\n\n\n<pre><code><span style=\"color:teal\">void<\/span> mi_free(<span style=\"color:teal\">void<\/span>* p)  \n{ \n  <span style=\"color:teal\">mi_page_t<\/span>* <span style=\"color:blue\">const<\/span> page = mi_ptr_page(p);         <span style=\"color:darkgreen\">\/\/ get the page meta-data that contains p <\/span>\n  <span style=\"color:blue\">if<\/span> (page==<span style=\"color:teal\">NULL<\/span>) <span style=\"color:blue\">return<\/span>; \n \n  <span style=\"color:blue\">if<\/span> (mi_thread_id() == page->thread_id) {        <span style=\"color:darkgreen\">\/\/ do we own this page? <\/span>\n    <span style=\"color:teal\">mi_block_t<\/span>* <span style=\"color:blue\">const<\/span> block = (<span style=\"color:teal\">mi_block_t<\/span>*)p; \n    block->next = page->local_free;               <span style=\"color:darkgreen\">\/\/ push on the `local_free` list <\/span>\n    page->local_free = block;                      \n    <span style=\"color:blue\">if<\/span> (--page->used == <span class=\"constant\" style=\"color:purple\">0<\/span>) mi_page_free(page);    <span style=\"color:darkgreen\">\/\/ is the entire page free? <\/span>\n  } \n  <span style=\"color:blue\">else<\/span> { \n    mi_free_cross_thread(page, p);                <span style=\"color:darkgreen\">\/\/ free in a page owned by another thread <\/span>\n  } \n} <\/code><\/pre>\n\n\n\n<p>The&nbsp;<code>mi_ptr_page<\/code>&nbsp;function in the latest&nbsp;mimalloc&nbsp;v3&nbsp;retrieves&nbsp;page metadata using an on-demand allocated map of the entire memory. 
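As an illustration, such an on-demand map can be built as a two-level table indexed by the high bits of the address. The sketch below is hypothetical (the real mi_ptr_page differs in level sizes and performs the updates atomically):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-level map from an address to its page meta-data;
   sub-maps are allocated on demand, so untracked addresses yield NULL. */
typedef struct page_s page_t;

#define SEG_BITS 16   /* each slot covers one 64 KiB page            */
#define L1_BITS  16
#define L2_BITS  16   /* covers a 48-bit address space in total      */

static page_t** page_map[(size_t)1 << L1_BITS];  /* top level, lazily filled */

static page_t* ptr_page(const void* p) {
  uintptr_t u  = (uintptr_t)p >> SEG_BITS;
  size_t    i1 = (size_t)((u >> L2_BITS) & (((size_t)1 << L1_BITS) - 1));
  size_t    i2 = (size_t)(u & (((size_t)1 << L2_BITS) - 1));
  page_t**  sub = page_map[i1];
  return (sub == NULL ? NULL : sub[i2]);  /* NULL for NULL/invalid pointers */
}

static void register_page(void* start, page_t* page) {
  uintptr_t u  = (uintptr_t)start >> SEG_BITS;
  size_t    i1 = (size_t)((u >> L2_BITS) & (((size_t)1 << L1_BITS) - 1));
  size_t    i2 = (size_t)(u & (((size_t)1 << L2_BITS) - 1));
  if (page_map[i1] == NULL) {             /* allocate the sub-map on demand */
    page_map[i1] = calloc((size_t)1 << L2_BITS, sizeof(page_t*));
  }
  page_map[i1][i2] = page;
}
```

Any address that was never registered, including NULL, falls into a zeroed (or absent) sub-map and maps to NULL for free.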
In earlier versions this lookup was faster, using alignment tricks. However, in practice, invalid pointers are often passed to <code>mi_free<\/code> when overriding <code>free<\/code> globally.<\/p>\n\n\n\n<p>Using a separate map enables such cases to be detected efficiently, returning <code>NULL<\/code> when the pointer is invalid. In particular, <code>mi_ptr_page(NULL) == NULL<\/code>, which avoids an extra branch: we only need to test whether the page is <code>NULL<\/code>. Additionally, the <code>used<\/code> count efficiently detects when all blocks in a page have been freed.<\/p>\n\n\n\n<p>When a block is freed across threads, we enter the <code>mi_free_cross_thread<\/code> function\u2014the first path that requires atomic operations:<\/p>\n\n\n\n<pre><code><span style=\"color:teal\">void<\/span> mi_free_cross_thread(<span style=\"color:teal\">mi_page_t<\/span>* page, <span style=\"color:teal\">mi_block_t<\/span>* block)  \n{ \n  <span style=\"color:teal\">mi_block_t<\/span>* tfree = mi_atomic_load(&page->thread_free);  <span style=\"color:darkgreen\">\/\/ head of the thread free list <\/span>\n  <span style=\"color:blue\">do<\/span> { \n    block->next = tfree;                                   <span style=\"color:darkgreen\">\/\/ push our block in front <\/span>\n  } <span style=\"color:blue\">while<\/span> (!mi_atomic_compare_and_swap(&page->thread_free, &tfree <span style=\"color:darkgreen\">\/*<\/span><span style=\"color:darkgreen\">expect<\/span><span style=\"color:darkgreen\">*\/<\/span>, block <span style=\"color:darkgreen\">\/*<\/span><span style=\"color:darkgreen\">new<\/span><span style=\"color:darkgreen\">*\/<\/span>));  \n}<\/code><\/pre>\n\n\n\n<p>The block can be freed by pushing it onto the thread-free list of the page. 
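For reference, the mi_atomic_* wrappers correspond roughly to C11 stdatomic operations. Here is a standalone sketch of the same push (the real code also packs extra state into the pointer):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Sketch of the cross-thread free using C11 atomics; the mi_atomic_*
   wrappers in the post correspond roughly to these calls. */
typedef struct block_s { struct block_s* next; } block_t;

typedef struct page_s {
  _Atomic(block_t*) thread_free;  /* blocks freed by non-owning threads */
} page_t;

static void free_cross_thread(page_t* page, block_t* block) {
  block_t* expected = atomic_load_explicit(&page->thread_free,
                                           memory_order_relaxed);
  do {
    block->next = expected;       /* link our block in front of the head */
  } while (!atomic_compare_exchange_weak_explicit(
               &page->thread_free, &expected, block,
               memory_order_release, memory_order_relaxed));
}
```

Note that the weak compare-exchange reloads the observed head into expected on failure, so the loop simply retries with the fresh value.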
Since this is multi-threaded, it requires an atomic compare-and-swap operation to push the block atomically. Still, on modern hardware such operations are efficient when uncontended, as they are integrated with the cache coherence protocol (MOESI).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"free-list-mayhem\">Free list mayhem<\/h2>\n\n\n\n<p>There are three free lists per page: the <code>free<\/code> list used for allocation, the <code>local_free<\/code> list for blocks freed by the owning thread, and the atomic <code>thread_free<\/code> list for blocks freed across threads. Since allocation only pops from the <code>free<\/code> list, it is guaranteed to be exhausted after a fixed number of allocations, ensuring we regularly take the slower generic allocation path. The generic path also cleans up by moving the <code>thread_free<\/code> and <code>local_free<\/code> lists back to the main <code>free<\/code> list. (Note: the actual implementation requires more care to handle cases where the owning thread never allocates again or is blocked for a long time.)<\/p>\n\n\n\n<p>Thus, mimalloc has three free lists per 64 KiB mimalloc page, which means a program can easily have thousands of free lists. 
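The cleanup step can be sketched as follows; this is a hypothetical, simplified version (including the function name) of draining the two other lists back into the allocation list once it runs empty:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sketch of how the generic path can drain the other two
   lists back into the allocation list once it is empty; the real
   mimalloc code is more careful about ordering and abandoned pages. */
typedef struct block_s { struct block_s* next; } block_t;

typedef struct page_s {
  block_t*          free;          /* list that mi_malloc pops from */
  block_t*          local_free;    /* freed by the owning thread    */
  _Atomic(block_t*) thread_free;   /* freed by other threads        */
} page_t;

static void page_collect(page_t* page) {
  /* take the whole cross-thread list in one atomic swap */
  block_t* tfree = atomic_exchange_explicit(&page->thread_free, NULL,
                                            memory_order_acquire);
  block_t* lfree = page->local_free;
  page->local_free = NULL;
  if (lfree == NULL) {
    page->free = tfree;            /* nothing local: reuse thread_free   */
  } else {
    block_t* tail = lfree;         /* find the end of the local list...  */
    while (tail->next != NULL) tail = tail->next;
    tail->next = tfree;            /* ...and chain thread_free behind it */
    page->free = lfree;
  }
}
```

Only the single exchange touches shared state; the rest operates on lists that the owning thread now holds exclusively.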
This is essential to the scalability and cache locality of mimalloc.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"766\" height=\"342\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/btree1.png\" alt=\"A height-balanced tree\" class=\"wp-image-1171406\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/btree1.png 766w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/btree1-300x134.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/btree1-240x107.png 240w\" sizes=\"auto, (max-width: 766px) 100vw, 766px\" \/><figcaption class=\"wp-element-caption\">A height-balanced tree<\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"891\" height=\"686\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/btree2.jpg\" alt=\"Photo of a random tree\" class=\"wp-image-1171407\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/btree2.jpg 891w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/btree2-300x231.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/btree2-768x591.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/btree2-234x180.jpg 234w\" sizes=\"auto, (max-width: 891px) 100vw, 891px\" \/><figcaption class=\"wp-element-caption\">A randomized 
tree<\/figcaption><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>For this design, we took inspiration from randomized algorithms. For example, to balance a binary tree we can use smart strategies based on weight or depth, and perform specific rotations to keep it balanced. Such algorithms are usually quite complicated. However, we can also simply decide splits randomly during insertion and, by sheer chance, still end up with trees that are balanced enough.<\/p>\n\n\n\n<p>Similarly, many multi-threaded allocators rely on sophisticated concurrent data structures to synchronize access to shared free lists. In contrast, mimalloc uses a per-page thread-free list, where any thread can push a block using a simple atomic compare-and-swap.<\/p>\n\n\n\n<p>Because there are thousands of such lists, the probability that multiple threads concurrently free blocks to the same page is low. As a result, most push operations are uncontended atomic updates.<\/p>\n\n\n\n<p>By organizing these lists per 64 KiB mimalloc page, cache locality is improved, as allocation tends to stay within the same page until it is full, regardless of freed objects in other pages.<\/p>\n\n\n\n<p>In contrast, consider a design with a single free list per thread or process. When allocating a new structure while freeing objects of the same size\u2014a common pattern in workloads such as tree transformations\u2014allocation may reuse recently freed blocks scattered throughout memory, leading to reduced locality.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"sharing-between-threads\">Sharing between threads<\/h2>\n\n\n\n<p>There is a fundamental tension between scalability and efficient memory sharing between threads. To scale optimally, we would give each thread exclusive ownership of its own pages to minimize any thread synchronization. 
On the other hand, that may lead to wasted memory: suppose a thread has large quantities of free blocks and another thread needs to allocate blocks of that size. Without being able to share or steal those pages, we must allocate fresh memory instead. At the other extreme, we could share all pages between all threads with a single lock: now memory use is optimal, but we no longer scale. The following benchmark results illustrate this tension:<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1425\" height=\"912\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-windows.png\" alt=\"chart, line chart\" class=\"wp-image-1171397\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-windows.png 1425w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-windows-300x192.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-windows-1024x655.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-windows-768x492.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-windows-240x154.png 240w\" sizes=\"auto, (max-width: 1425px) 100vw, 1425px\" \/><figcaption class=\"wp-element-caption\">1.1x commit, 56 GiB total<\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1388\" height=\"907\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v2.png\" alt=\"chart, line
chart\" class=\"wp-image-1171399\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v2.png 1388w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v2-300x196.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v2-1024x669.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v2-768x502.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v2-240x157.png 240w\" sizes=\"auto, (max-width: 1388px) 100vw, 1388px\" \/><figcaption class=\"wp-element-caption\">4x commit, 262 GiB total<\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1462\" height=\"902\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v3.png\" alt=\"chart, line chart, scatter chart\" class=\"wp-image-1171400\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v3.png 1462w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v3-300x185.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v3-1024x632.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v3-768x474.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/bench-mimalloc-v3-240x148.png 240w\" sizes=\"auto, (max-width: 1462px) 100vw, 1462px\" \/><figcaption class=\"wp-element-caption\">1.3x commit, 262 GiB Total<\/figcaption><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>The benchmark runs many tasks\u00a0for a fixed amount of time\u00a0using the Windows thread\u00a0pool with about 
800 active threads. The tasks alternate between allocation, deallocation, and brief blocking periods, simulating typical service workloads. In the graphs, the blue line represents the total live data, while the red line represents the total memory committed by the allocator. The ideal situation is to have the red line as close as possible to the blue line. This is almost the case for the first graph, which uses the standard system allocator: at the end there is just 1.1x more committed than live data \u2013 an excellent result! However, over the benchmark duration, it allocated a total of only 56 GiB of data.<\/p>\n\n\n\n<p>Contrast that with another highly concurrent allocator in the second graph, which was able to allocate 262 GiB over the benchmark duration\u2014almost 4x as much. However, it also committed 4x more memory than the live data. In real workloads with larger memory footprints, such a ratio can quickly become unacceptable. Here we see that the standard allocator didn\u2019t scale as well but showed better cross-thread memory sharing.<\/p>\n\n\n\n<p>The final graph shows the most recent mimalloc allocator. Like the second allocator, it allocates 262 GiB over the benchmark duration, while reducing committed memory to 1.3x the live data, achieving both scalability and efficient memory sharing between threads. Similar to work-stealing in modern thread pool implementations, mimalloc uses a \u201cpage stealing\u201d technique, allowing threads to take ownership of pages without expensive cross-thread synchronization.<\/p>\n\n\n\n<p>These improvements were made in close collaboration with the Azure Cosmos DB team at Microsoft. 
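As a purely illustrative sketch of the page-stealing idea (not the actual mechanism, whose details are still unpublished), a thread could claim an unowned page with a single compare-and-swap on an owner field:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Purely illustrative: a thread claims an abandoned page by atomically
   installing its own thread id. The real mimalloc mechanism is more
   involved and will be described in the forthcoming report. */
typedef struct page_s {
  _Atomic(uintptr_t) thread_id;   /* 0 means the page is unowned */
} page_t;

static bool try_steal_page(page_t* page, uintptr_t my_thread_id) {
  uintptr_t expected = 0;         /* only steal pages with no owner */
  return atomic_compare_exchange_strong_explicit(
           &page->thread_id, &expected, my_thread_id,
           memory_order_acq_rel, memory_order_acquire);
}
```

A failed compare-and-swap simply means another thread claimed the page first; the stealer moves on without blocking, which is what keeps the scheme cheap.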
A precise description is beyond the scope of this blog, but we will publish a technical report soon\u2014stay tuned.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. It provides bounded worst-case allocation times (up to OS primitives), bounded space overhead, low internal fragmentation, and minimal contention by relying almost exclusively on atomic operations.<\/p>\n","protected":false},"author":43868,"featured_media":1171567,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Daan Leijen","user_id":"31497"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13560],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1171325","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-programming-languages-software-engineering","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144812],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Daan Leijen","user_id":31497,"display_name":"Daan Leijen","author_link":"<a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/daan\/\" aria-label=\"Visit the profile page for Daan Leijen\">Daan Leijen<\/a>","is_active":false,"last_first":"Leijen, Daan","people_section":0,"alias":"daan"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"Three white line icons\u2014a monitor with code brackets, interlocking gears, and a speedometer\u2014displayed on a purple\u2011to\u2011blue gradient background with a subtle textured pattern.\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/05\/Mimalloc-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/daan\/\" title=\"Go to researcher profile for Daan Leijen\" aria-label=\"Go to researcher profile for Daan Leijen\" data-bi-type=\"byline author\" data-bi-cN=\"Daan Leijen\">Daan Leijen<\/a>","formattedDate":"May 13, 2026","formattedExcerpt":"mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. It provides bounded worst-case allocation times (up&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1171325","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/43868"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1171325"}],"version-history":[{"count":41,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1171325\/revisions"}],"predecessor-version":[{"id":1171879,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1171325\/revisions\/1171879"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1171567"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1171325"}],"wp:term":[{"taxonomy":"category","
embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1171325"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1171325"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1171325"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1171325"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1171325"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1171325"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1171325"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1171325"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1171325"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1171325"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}