Abstract

As datasets grow to tera- and petabyte sizes, exploratory data visualization becomes very difficult: a screen is limited to a few million pixels, and main memory to a few tens of millions of data points. Yet these very large-scale analyses are of tremendous interest to industry and academia. This paper discusses some of the major challenges involved in data analytics at scale, including issues of computation, communication, and rendering. It identifies techniques for handling large-scale data, grouped into two categories: “look at less of it” and “look at it faster.” Using these techniques involves a number of difficult design tradeoffs, both in the ways that data can be represented and in the ways that users can interact with the visualizations.