Pelican: A building block for exascale cold data storage
Pelican aims to store infrequently accessed (cold) data as inexpensively as possible.
The amount of data stored is growing rapidly, but not all of it is “hot,” i.e., frequently accessed. There is little reason to store cold data in the same high-performance, high-cost systems as hot data. Our goal was to design a storage system, called Pelican, tailored specifically to the needs of cold data workloads.
Resource constraints for storing large amounts of data:
- Hardware cost
- Power cost
Pelican is a rack-scale, hard-disk-based storage unit designed to lower total cost of ownership at the expense of access latency: only 8% of its drives can spin concurrently. Because cold data is, by definition, rarely accessed, this trade-off makes sense. In effect, Pelican is designed to “right provision” its storage: its server, power, cooling and interconnect bandwidth resources are sized to support cold data workloads.
The challenge with this approach is that constraining the number of drives that can spin creates a complex resource management problem, which Pelican solves with a unique data layout and IO scheduling scheme.
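To make the scheduling problem concrete, here is a minimal sketch of one way request reordering can reduce spin-ups when only a limited set of drives may be active. It is an illustration only, not Pelican's actual layout or scheduler, which this page does not describe. The notion of a "drive group", the function names, and the one-active-group-at-a-time rule are all assumptions made for the example.

```python
from collections import OrderedDict

def batch_by_group(requests):
    """Reorder requests so that those targeting the same drive group are adjacent.

    `requests` is a list of (request_id, group_id) in arrival order.
    Groups are served in order of their first arrival (hypothetical policy).
    """
    batches = OrderedDict()
    for req_id, group in requests:
        batches.setdefault(group, []).append(req_id)
    return [(req_id, group) for group, ids in batches.items() for req_id in ids]

def count_spin_ups(order):
    """Count group switches, assuming only one group can spin at a time."""
    spin_ups = 0
    active = None
    for _, group in order:
        if group != active:
            spin_ups += 1
            active = group
    return spin_ups

# Hypothetical arrival sequence touching three drive groups.
arrivals = [("r1", "A"), ("r2", "B"), ("r3", "A"), ("r4", "C"), ("r5", "B")]
batched = batch_by_group(arrivals)

print("FIFO spin-ups:   ", count_spin_ups(arrivals))
print("Batched spin-ups:", count_spin_ups(batched))
```

Serving requests strictly in arrival order forces a spin-up on every group change, while batching amortizes each spin-up over all queued requests for that group. The cost is that some requests wait longer, which is the latency-for-cost trade the design accepts.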
In this project, we evaluate the performance of a prototype Pelican and compare it against a traditional storage rack using a cross-validated simulator. We show that, compared to an over-provisioned storage rack, Pelican performs well for cold workloads, providing high throughput with acceptable access latency and drive failure rates.
Pelican: Rack-scale co-design
- Hardware & software co-designed: power, cooling, mechanical, HDDs & software.
- Trade latency for lower cost.
- Massive density, low per-drive overhead.
- 1152 3.5” HDDs per 52U.
- 2 servers, PCIe bus stretched rack-wide.
- 4x 10G links out of rack.
- Only 8% of disks can spin.
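The figures in the list above imply a rough spin budget and network ceiling, worked through below. This is back-of-the-envelope arithmetic, not a specification: it takes the 8% figure at face value (it is likely rounded, so the real limit may differ slightly), assumes decimal units, and assumes all four links are usable for data at line rate.

```python
import math

# Figures taken from the list above.
TOTAL_DRIVES = 1152
SPIN_FRACTION = 0.08   # "only 8% of disks can spin" (likely a rounded figure)
UPLINKS = 4
UPLINK_GBPS = 10

# Upper bound on concurrently spinning drives under the stated fraction.
max_spinning = math.floor(TOTAL_DRIVES * SPIN_FRACTION)

# Aggregate bandwidth out of the rack, assuming all links run at line rate.
egress_gbps = UPLINKS * UPLINK_GBPS

# Share of rack egress per spinning drive, in megabytes per second
# (decimal units: 1 Gb = 1000 Mb, 8 bits per byte).
per_drive_mb_s = egress_gbps * 1000 / 8 / max_spinning

print(f"Drives that can spin at once: {max_spinning}")
print(f"Rack egress: {egress_gbps} Gb/s")
print(f"Egress share per spinning drive: {per_drive_mb_s:.1f} MB/s")
```

Under these assumptions, roughly 92 drives can spin at once and each gets about 54 MB/s of the rack's 40 Gb/s egress, which shows how the spin budget and the interconnect are provisioned at a similar scale.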
Once updates to the rack design are complete, we expect to deploy this system in Windows Azure Storage datacenters worldwide.