DyLeCT: Achieving Huge-page-like Translation Performance For Hardware-compressed Memory

ISCA

As DRAM scaling slows, a promising solution is to logically scale up memory capacity through hardware memory compression, in which the CPU-side memory controller (MC) dynamically compresses data and packs it more densely into DRAM. However, this requires a new layer of hardware-managed address translation in the MC. For large and irregular workloads that already suffer from frequent virtual address translation misses in the TLB, this added layer can double the number of translation misses (e.g., by adding a new miss in the MC for each TLB miss). Worse, while huge pages can drastically reduce TLB misses, no prior work has explored huge-page-like translation in the MC for hardware memory compression.

This paper explores how to achieve huge-page-like translation performance in the MC for hardware memory compression. To minimize data movement, the MC still manages everything at page granularity rather than at huge-page granularity. We propose dynamically shortening the translations for hot pages to only a few bits (e.g., 2) by migrating these pages to locations that those few bits can encode, at the cost of displacing cold pages to locations that require full-length translations. Because translation caches favor hot pages, switching to space-efficient short translations for hot pages alone is enough to pack many translations into the cache, so that a cache of just 128KB can provide a total translation reach (e.g., up to 2GB) similar to that of a TLB that uses only huge-page entries. Evaluations show that our proposal, Dynamic Length Compressed-Memory Translations (DyLeCT), improves average performance by 10.5% over the prior art.
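
The 128KB and 2GB figures can be related by a back-of-the-envelope calculation. The sketch below is only illustrative and rests on assumptions not stated in the abstract: 4 KiB pages, a cache filled entirely with short translations, no tag or metadata overhead, and a hypothetical 32-bit full-length translation for comparison. The function name `translation_reach_bytes` is likewise made up for this example.

```python
# Illustrative reach arithmetic; all constants below are assumptions.
PAGE_BYTES = 4 * 1024  # assumption: 4 KiB base pages


def translation_reach_bytes(cache_bytes: int,
                            bits_per_translation: int,
                            page_bytes: int = PAGE_BYTES) -> int:
    """Memory covered by a cache that stores only fixed-width translations.

    Ignores tags, valid bits, and any other per-entry metadata.
    """
    entries = (cache_bytes * 8) // bits_per_translation
    return entries * page_bytes


CACHE_BYTES = 128 * 1024  # 128KB translation cache, as in the abstract

# 2-bit short translations (the "e.g., 2" in the abstract)
short_reach = translation_reach_bytes(CACHE_BYTES, 2)

# Hypothetical 32-bit full-length translations, for comparison only
full_reach = translation_reach_bytes(CACHE_BYTES, 32)

print(f"short translations: {short_reach / 2**30:.0f} GiB")
print(f"full translations:  {full_reach / 2**20:.0f} MiB")
```

Under these assumptions the short-translation case covers 2 GiB, which is consistent with the "up to 2GB" upper bound quoted above; any real design would reach less once metadata and a mix of full-length entries are accounted for.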