Toward Progress Indicators on Steroids for Big Data Systems.
Recent years have seen explosive growth in the complexity, diversity, number of deployments, and capabilities of big data processing systems (see Figure 11). This growth shows no sign of slowing. If anything, it is accelerating, as more users bring more diverse data sets and more system builders strive to provide a richer variety of systems in which to process them. To handle large volumes of data, big data systems typically run massively parallel software on tens, hundreds, or even thousands of servers. Most of these systems, including MapReduce, Hyracks, Dryad, and Microsoft SQL Azure, to name just a few, target clusters built from commodity hardware. They achieve scalable performance by exploiting data parallelism.