Large-Scale Heterogeneous Storage Optimization under Resource Capacity Constraints
- Bojun Huang
- Thomas Moscibroda
Large-scale data centers often adopt more than one type of storage device, each with different storage capacity, I/O capability, and cost. Optimizing the performance-to-cost efficiency of such heterogeneous storage systems is a classic problem in computer system design and is of great practical importance for capital expenditure (Cap-Ex). The Vector-Sum Model (VSM) is a mental model widely used by system administrators for this task because of its conceptual simplicity. The model encompasses several commonly used rules of thumb, such as the five-minute rule and various Knapsack-based heuristics.
In this paper, we revisit the Vector-Sum Model and study heterogeneous storage using a new form of optimization diagram. These diagrams give rise to a near-optimal solution to the problem, which subsumes the existing rules of thumb used in practice. Our solution also shows that these heuristics are indeed optimal under their respective assumptions but become sub-optimal in more general cases. Specifically, our analysis implies that the recent adoption of SSDs in data centers may challenge the quality of these commonly used heuristics, and that our new optimization approach can sustain data-center-scale workloads at lower total purchasing cost. Finally, we show that, although the commonly used I/O metrics of storage are non-additive, regression techniques can transform these metrics into an additive form. Experiments using web search production workloads show that the Vector-Sum Model becomes more accurate after the metric transformation.
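
For readers unfamiliar with the five-minute rule mentioned above, the following is a minimal sketch of its classic break-even formula, not the optimization method of this paper. The function name, parameter names, and numeric values are illustrative assumptions and are not taken from the paper.

```python
def break_even_interval_seconds(pages_per_mb_ram: float,
                                accesses_per_sec_per_disk: float,
                                price_per_disk: float,
                                price_per_mb_ram: float) -> float:
    """Classic five-minute-rule break-even interval (hypothetical helper).

    A page that is re-accessed more often than once per this interval is
    cheaper to keep in RAM than to re-read from disk on every access.
    """
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


# Illustrative (assumed) device parameters: 8 KB pages, so 128 pages per MB.
interval = break_even_interval_seconds(
    pages_per_mb_ram=128,
    accesses_per_sec_per_disk=64,
    price_per_disk=2000.0,
    price_per_mb_ram=15.0,
)
print(f"break-even interval: {interval:.1f} s")
```

With these assumed values the break-even interval comes out at a few minutes, which is where the rule gets its name. Changing the device parameters, for example to those of an SSD, shifts the interval, which is one way to see why a fixed rule of thumb can become sub-optimal.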