Instalytics: Storage for Big Data

Established: March 1, 2017

Instalytics (Intelligent Store-powered Analytics) is a vertically integrated infrastructure stack that enables efficient big data analytics in large-scale data centers, by careful co-design of the storage layer (cluster file system) with the compute layer (query engine and job scheduler).

As an example of the benefits of such co-design, Instalytics amplifies the well-known benefits of data partitioning in analytics systems. Instead of traditional partitioning on one dimension, Instalytics enables data to be simultaneously partitioned on four different dimensions at the same storage cost, so a larger fraction of queries can benefit from partition filtering and from joins without network shuffle. To achieve this, Instalytics uses compute-awareness to customize the 3-way replication that the cluster file system employs for availability. A new heterogeneous replication layout enables Instalytics to preserve the same recovery cost and availability as traditional replication.

Another example of using compute-awareness is the new sliced-read API exposed by the Instalytics file system, which improves the performance of joins by enabling multiple compute nodes to read slices of a data block efficiently through coordinated request scheduling and selective caching at the storage nodes.
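The partitioning idea above can be illustrated with a small toy model. This is a simplified sketch, not the Instalytics implementation: it assumes each of the three replicas holds a full copy of the data bucketed on a different column, so it covers only three dimensions (how Instalytics reaches four dimensions, and how it preserves recovery cost, is not modeled here). The row schema and the names `build_replica`, `REPLICAS`, and `scan` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy dataset; column names are illustrative only.
ROWS = [
    {"user": "u1", "region": "eu", "day": 1, "device": "phone"},
    {"user": "u2", "region": "us", "day": 1, "device": "laptop"},
    {"user": "u1", "region": "us", "day": 2, "device": "phone"},
    {"user": "u3", "region": "eu", "day": 2, "device": "laptop"},
]


def build_replica(rows, key):
    """Store one full copy of the data, bucketed by the given column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row)
    return dict(buckets)


# Assumption: three replicas, each a complete copy, each partitioned
# on a different column. Storage cost equals plain 3-way replication.
REPLICAS = {key: build_replica(ROWS, key) for key in ("user", "region", "day")}


def scan(column, value):
    """Return (matching_rows, rows_read) for an equality filter."""
    if column in REPLICAS:
        # Partition filtering: read only the bucket that matches.
        bucket = REPLICAS[column].get(value, [])
        return bucket, len(bucket)
    # No replica is partitioned on this column: scan one full copy.
    any_replica = next(iter(REPLICAS.values()))
    all_rows = [row for bucket in any_replica.values() for row in bucket]
    matches = [row for row in all_rows if row[column] == value]
    return matches, len(all_rows)


if __name__ == "__main__":
    for column, value in [("region", "eu"), ("day", 2), ("device", "phone")]:
        matches, rows_read = scan(column, value)
        print(column, value, "matches:", len(matches), "rows read:", rows_read)
```

In this toy model, filters on `user`, `region`, or `day` each read only the matching bucket, while a filter on `device` falls back to a full scan. With identical replicas, only one of those columns could benefit from partition filtering.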

People

Muthian Sivathanu

Partner Researcher

Kaushik Rajan

Principal Researcher