Impression Store: Compressive Sensing-based Storage for Big Data Analytics
- Jiaxing Zhang
- Ying Yan
- Liang Jeff Chen
- Minjie Wang
- Thomas Moscibroda
- Zheng Zhang
HotCloud 2014: 6th USENIX Workshop on Hot Topics in Cloud Computing, Philadelphia, PA
Published by USENIX - Advanced Computing Systems Association
For many big data analytics workloads, approximate results suffice. This raises the question of whether and how the underlying system architecture can take advantage of such relaxations, thereby lifting constraints inherent in today’s architectures. This position paper explores one possible direction. Impression Store is a distributed storage system that exposes the abstraction of big data vectors. It aggregates updates internally and answers queries for the top-K high-value entries. With suitable extensions, Impression Store supports various aggregations, top-K queries, and outlier and major mode detection. While restricted in scope, such queries represent a substantial and important portion of many production workloads. In return, the system has unparalleled scalability: any node in the system can process any query, whether a read or an update. The key technique we leverage is compressive sensing, which substantially reduces the amount of active memory state, IO, and traffic volume needed to achieve such scalability.
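The general idea of storing a large, mostly small-valued vector as a short linear measurement can be illustrated with a minimal sketch. The abstract does not specify the measurement matrix or the recovery algorithm, so everything below is an assumption made for illustration: a dense Gaussian matrix, orthogonal matching pursuit for recovery, and the hypothetical names `make_matrix`, `apply_update`, and `recover_top_k`. Because the measurement is linear, each update touches only the short vector, and the state held by two nodes can be merged by simple addition.

```python
import numpy as np

def make_matrix(m, n, seed=0):
    # Assumption: a dense Gaussian measurement matrix. All nodes must
    # build the same matrix, hence the shared seed.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, n)) / np.sqrt(m)

def apply_update(sketch, phi, index, delta):
    # Adding `delta` to entry `index` of the big vector is equivalent to
    # adding delta times the corresponding column to the measurement.
    sketch += delta * phi[:, index]

def recover_top_k(sketch, phi, k):
    # Orthogonal matching pursuit (a swapped-in standard technique, not
    # necessarily the paper's): greedily pick the column most correlated
    # with the residual, then refit all picked columns by least squares.
    residual = sketch.copy()
    support, coef = [], np.zeros(0)
    for _ in range(k):
        scores = np.abs(phi.T @ residual)
        if support:
            scores[support] = 0.0  # never pick the same index twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(phi[:, support], sketch, rcond=None)
        residual = sketch - phi[:, support] @ coef
    return {i: float(c) for i, c in zip(support, coef)}

# Hypothetical usage: a vector of length 1000 kept as 200 measurements.
n, m = 1000, 200
phi = make_matrix(m, n)
node_a, node_b = np.zeros(m), np.zeros(m)

# Two nodes accept updates independently.
apply_update(node_a, phi, 7, 20.0)
apply_update(node_a, phi, 420, -30.0)
apply_update(node_b, phi, 7, 30.0)
apply_update(node_b, phi, 900, 20.0)

# Linearity: the measurement of the aggregated vector is the sum.
merged = node_a + node_b
top = recover_top_k(merged, phi, k=3)
print(top)
```

In this toy setting the aggregated vector has only three nonzero entries, so recovery is essentially exact. With real data that is only approximately sparse, the recovered values would be approximations, which matches the premise that approximate results suffice.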