Faster Servers, Services with FlashStore
By Doug Gantenbein
Memory has its faults—and not only the human variety.
Hard drives, for instance, can hold terabytes cheaply. But they’re slow. Random-access memory (RAM) is fast but expensive, and data in RAM disappear the instant the power goes off. Flash memory is faster than hard drives and cheaper than RAM, and it retains information. But the way it handles data writes handicaps its usefulness in server environments.
Still, the distinct advantages of flash—particularly its ability to retain data when power is off, as well as its speed—led two Microsoft Research scientists, Sudipta Sengupta and Jin Li, to develop what they call FlashStore. It’s a flash-based “bridge” between RAM and a hard drive that helps to overcome the handicaps of each while maximizing flash memory’s capabilities and minimizing its weaknesses.
FlashStore operates as a key-value store and uses a “key” to access the “value” associated with each piece of a data record. It supports the operations of read, write, update, and deletion of such data records. A third collaborator on FlashStore, Biplob Debnath, was a research intern at Microsoft and now works for EMC, a network-storage and data-recovery firm.
FlashStore has potentially significant usefulness across a range of computing applications, from server farms to cloud applications. It already has shown it can speed up online gaming for Xbox LIVE players, as well as data-intensive server applications.
Flash memory sits conveniently in the huge gap between RAM and a hard disk in terms of both cost and performance. With its properties of low power consumption, physical ruggedness, and small size, flash has enabled new experiences with many consumer electronic devices for more than a decade. But it is only recently that flash is seeing widespread adoption in desktop and server applications, in the form of solid-state drives. New applications of flash involve different storage-access patterns from those in typical consumer devices and, because of flash’s device properties, pose new challenges to its ability to deliver sustained high throughput and low latency.
“Flash is great technology, but it requires intelligent software to make the best use of it,” says Sengupta, a researcher at Microsoft Research Redmond. “That’s where our work comes in.” Sengupta and Li have written several papers for technical conferences on intelligent software design for utilizing flash memory, one of which is FlashStore: High Throughput Persistent Key-Value Store, which appeared in the 36th International Conference on Very Large Data Bases in September 2010.
Peculiarities of Flash Memory
In FlashStore, flash “sits” between a hard drive and RAM, acting as a high-speed holding area for frequently used data. FlashStore works by overcoming flash memory’s drawbacks. These relate to the way in which flash stores and manages data. In flash memory, data is not as easily overwritten as it is on a hard drive. Flash memory starts in an erased state, then collects data in page-sized units at a time, where a page can vary in size from 512 bytes to eight kilobytes, depending on the device.
But if you need to erase data, you can’t do it a page at a time. Typically, an erase block consists of 32 to 64 data pages. Li compares the way flash works to a line of water buckets.
“The write operation is like pouring water into a series of buckets,” says Li, a principal researcher with Microsoft Research Redmond’s Communication and Collaboration Systems group. “The cheapest flash devices are designed so that if you need to erase a bank—the block—you have to drain the water from all the buckets. You can’t drain just one bucket.”
To handle data writes and reads, the memory controller in a flash-memory device uses a mapping system called Flash Translation Layer (FTL), which maps the logical page address of the incoming data to the actual physical page location of the data in flash memory. If a page of data is rewritten, it is written to a new location with an updated FTL entry.
During a process known in computing circles as “garbage collection,” the flash controller takes valid data from a memory block, as well as new data, and writes both to a new memory block. The freed memory block then can be erased for storing new data—in effect, draining all the buckets in one memory block. But this process isn’t efficient, in large part because of the requirement to reclaim space with pages that are rendered invalid by such operations. In time, repeated erase cycles also degrade the life expectancy of each flash memory block.
Flash also writes only to pages—nothing smaller. Data written to a page that is less than the page size, typically 512 bytes to eight kilobytes, make poor use of that page. Any attempt to write a small chunk of data within a page means the controller will have to move the existing data to another page, along with the new data.
Flash works fine on consumer electronic devices such as cameras and digital music players, where each photo or music file is at least megabytes in size, and data are written to flash mostly sequentially.
But flash performance drops drastically when data writes are random, which is the case for many desktop computing and server applications. Even today’s high-performance, flash-based disk-replacement devices experience a dramatic drop in access latencies and throughput when performing random write operations.
Then there is the cost aspect of flash memory, which is about 10 times more expensive than hard disk per GB—and about 10 times less expensive than RAM. Flash makes most sense for applications that can identify and exploit the sweet spot between cost and performance.
To make flash work efficiently enough to be more attractive on the price-performance tradeoff for heavy-duty server and cloud computing, FlashStore’s creators had to create a more flash-friendly data structure while avoiding random data writes to flash. They did that by giving FlashStore three important design features.
First, FlashStore introduces flash as a cache in the memory hierarchy between RAM and hard disk so that the flash memory holds a “working set” of data—information a computer is most apt to need. FlashStore tracks data use so that, as data becomes less frequently accessed, it eventually is sent back to the hard drive so that new, fresh data can take its place. That makes better use of flash while still getting much faster retrievals than are possible with a hard drive.
Second, FlashStore is designed to eliminate random writes. It organizes data in a log structure on flash so that new data sent to flash does not lead to random writes and, hence, is not subject to garbage collection by the device. FlashStore aggregates small writes from the application into a write buffer in memory and writes to flash when there is enough data to fill a flash page. By making writes sequential to flash, FlashStore makes much better use of flash-device architecture. If the application needs strict data-durability guarantees, FlashStore can aggregate writes in memory up to a timeout interval provided by the application or when a flash page worth of data is amassed, whichever comes first.
Finally, FlashStore is designed to use RAM efficiently by employing a specialized RAM index to access the data on flash. It uses a hash-table-based index in RAM that is designed to save space along both the vertical and horizontal dimensions—the number of slots and the size of each slot, respectively. To reduce the number of slots, it uses a variant of cuckoo hashing that achieves high hash-table load factors while keeping lookup times fast. To reduce the size of each slot, it stores a compact key signature of about two bytes in each slot instead of the full key, which could be tens of bytes or larger, depending on the application. Each slot also stores a pointer, typically four bytes, that points to the full key-value pair record on flash. During key access, the pointer to flash is followed only if the signature in the hash-table slot matches that of the key being searched. These techniques keep the RAM usage frugal—at about six bytes per key-value pair, independent of the key-value pair size—and key-access times fast, involving at most one flash read per lookup on the average.
In continuing work on a system called SkimpyStash, a project that will be described in a research paper to be delivered during the 2011 conference of the Association for Computing Machinery’s Special Interest Group on Management of Data, the Microsoft Research scientists have reduced the memory requirements of FlashStore even more, about six-fold, to about one byte per key-value pair. SkimpyStash reduces memory usage by making a tradeoff with key-access times and by using multiple flash reads per lookup, which is still orders of magnitude faster than hard-disk lookup times. The tradeoff choice between memory usage and key lookup times in SkimpyStash is adjustable and can be made by the application using a simple parameter.
FlashStore’s results are impressive. Server systems using FlashStore can show as much as several tens of factors of increased data throughput compared with an ordinary RAM/hard-drive configuration and existing software that is not flash-aware, such as a key-value store like Berkeley DB. When compared with simple hard-disk replacement with flash but no changes in software, as in using a flash-unaware key-value store, FlashStore demonstrated several factors of improvement in performance—underscoring the impact of using flash-aware data structures and algorithms in FlashStore. It could enable the creation of much less expensive and more power-efficient computing designs. In some cases, Sengupta says, system designers buy arrays of hard disks that remain only partially filled with data, because they need additional disk heads for higher input/output (IO) operations per second and not for disk space. FlashStore can be used to absorb the IO intensive operations at the flash layer, while leaving the hard drive to handle operations involving large data size that amortize disk-seek time overheads.
“You could replace 10 to 20 hard drives with one flash drive using FlashStore for such applications,” he says. “That gives you a capital-expenditure savings, power savings, and operational-expenditure savings, as well, and you also get much faster throughput: an all-win situation on the three metrics of price, power, and performance.”
Applications That Benefit
FlashStore is showing its potential in practical applications. Xbox LIVE Primetime 1 vs. 100, for example, operates a challenging computing environment—when a player initiates actions within a game environment, those actions must be logged and communicated to all other participants in a timely manner to maintain the responsiveness of the gaming experience. Game activity can scale up or down rapidly, depending on the number of online players. A game’s ability to track and update thousands of player actions relies heavily on the back-end system’s ability to map what the players are doing and retrieve changes in game play.
The Xbox LIVE Primetime 1 vs. 100 back end uses database servers running on hard drives to log and update game-state changes. But, Sengupta says, this results in a lot of redundant hardware—and latency issues as the drives retrieve and disseminate data. A flash-based system would be ideal, he says, because flash’s speed could accelerate and improve the responsiveness of online game play. It also could hold relevant data until it is no longer needed, then send it to a hard drive for post-game analytics.
The FlashStore team replayed Xbox LIVE Primetime 1 vs. 100 game traces to test whether FlashStore could boost performance. It did—performing operations 60 times faster than a standard RAM/hard-drive configuration and five times as fast as a flash-unaware key-value store running on a top-of-the line commercial flash-based drive.
FlashStore also achieved significantly better results than a RAM/hard drive or a flash drive when tested with the task of performing data-chunk indexing for data deduplication, a method of reducing storage-capacity needs by eliminating redundant data. The FlashStore research team also has built a complete flash-assisted-storage deduplication system, called ChunkStash. The team reported its design and evaluation in a research paper entitled ChunkStash: Speeding Up Inline Storage Deduplication Using Flash Memory, presented in 2010 during the USENIX Annual Technical Conference.
A FlashStore-like application also might find utility in areas such as ad-sponsored online searches, where relevant advertisements need to be summoned in response to certain queries with tight time constraints on advertisement selection and ranking. While RAM-based data processing is a choice for such latency-sensitive applications, a flash-based solution such as FlashStore could provide acceptable performance in many cases at a fraction of the hardware cost.
Flash memory has come a long way from its beginnings in consumer electronic devices such as digital cameras and portable music players. It is seeing widespread deployment in desktops and servers, spanning consumer, enterprise, and cloud applications. These developments present the computing industry with opportunities to identify new applications for flash memory—as well as challenges across the software and hardware stacks to get maximum performance at lowest cost.
“That,” Sengupta says, “shows the relevance and timeliness of our efforts.”