Abstract

Many large-scale online services use structured storage to persist metadata and sometimes data. The structured storage is typically provided by standard database servers such as Microsoft’s SQL Server. It is important to understand the workloads seen by these servers, both for provisioning server hardware and for exploiting opportunities for energy savings and server consolidation. In this paper we analyze disk I/O traces from production servers in four internet services as well as from servers running TPC benchmarks. We show, using a range of load metrics, that the services differ substantially from each other and from standard TPC benchmarks. Online services also show significant diurnal patterns in load that can be exploited for energy savings or consolidation. We argue that TPC benchmarks do not capture these important characteristics, and we advocate developing benchmarks that can be parameterized with workload features extracted from live production traces.