Tiered Storage

Established: May 1, 2016

Running analytics computations in a cloud setting with disaggregated storage and compute is particularly challenging. The variability inherent in transferring data from storage nodes to the compute substrate is known to degrade performance. In this project, we focus on building a tiered substrate in which data is seamlessly migrated from the storage tier to the compute tier and cached there to provide predictable performance. We have been building abstractions in HDFS that allow it to tier over external data stores such as Azure and S3 (see HDFS-9806).
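The caching idea can be illustrated with a minimal read-through sketch. This is not the HDFS implementation or its API: the class `TieredStore`, the `fetch_remote` callback, and the block IDs are all hypothetical names chosen for illustration, and an in-memory dictionary stands in for both the remote store and the compute-tier cache.

```python
from typing import Callable, Dict


class TieredStore:
    """Hypothetical read-through cache: compute-tier cache in front of a storage tier."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote   # assumed storage-tier read function
        self._cache: Dict[str, bytes] = {}  # stands in for compute-tier storage
        self.remote_reads = 0               # counts trips to the storage tier

    def read(self, block_id: str) -> bytes:
        # On a miss, fetch the block from the storage tier and keep a local copy.
        if block_id not in self._cache:
            self.remote_reads += 1
            self._cache[block_id] = self._fetch_remote(block_id)
        # Hits (and the just-filled miss) are served from the compute tier.
        return self._cache[block_id]


# Illustrative usage: a dict plays the role of the external store.
remote = {"blk_1": b"alpha", "blk_2": b"beta"}
store = TieredStore(remote.__getitem__)
first = store.read("blk_1")   # miss: goes to the storage tier
second = store.read("blk_1")  # hit: served from the local cache
```

Only the first read of a block pays the cost of the remote transfer; later reads of the same block do not touch the storage tier, which is what makes their latency more predictable. A real system would also need eviction, consistency with the external store, and failure handling, none of which is shown here.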