Flexible Transformations For Learning Big Data
- Azalia Mirhoseini
- Ebrahim Songhori
- Bita Darvish Rouhani
- Farinaz Koushanfar
Special Interest Group for the Computer Systems Performance Evaluation Conference (SIGMETRICS)
Published by ACM
This paper proposes a domain-specific solution for iterative learning on large, dense (non-sparse) datasets. Many learning algorithms, including linear and regularized regression techniques, rely on iterative updates on the data connectivity matrix to converge to a solution. The performance of such algorithms often degrades severely on large, dense data. Massive dense datasets not only require a large number of arithmetic operations, but also incur unwanted message-passing costs across the processing nodes. Our key observation is that, despite their seemingly dense structure, data in many applications can be transformed into a new space in which sparse structures are revealed. We propose a scalable data transformation scheme that creates versatile sparse representations of the data. The transformation can be tuned to suit the underlying platform’s costs and constraints. Our evaluations demonstrate significant improvements in energy usage, runtime, and memory footprint, within guaranteed user-defined error bounds.
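To make the idea concrete, the following toy sketch shows one way a dense matrix can hide a sparse structure. It is not the transformation proposed in the paper: it assumes, purely for illustration, that every data column is close to a scaled copy of a few shared directions, so that the data factors as A ≈ DC with a small dictionary D and a coefficient matrix C holding one nonzero per column. The function names (`sparse_transform`, `gram_apply_dense`, `gram_apply_transformed`) and the relative error parameter `eps` are hypothetical. The iterative update is represented by the product AᵀAx, which regression-style solvers evaluate repeatedly.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sparse_transform(columns, eps):
    """Greedy 1-sparse coding (illustrative assumption, not the paper's scheme).

    Each nonzero column is approximated by a scaled unit-norm atom so that
    ||column - scale * atom|| <= eps * ||column||. A column that no existing
    atom can represent within that bound becomes a new atom.
    Returns (atoms, codes) with codes[j] = (atom_index, scale).
    """
    atoms, codes = [], []
    for col in columns:
        norm = math.sqrt(dot(col, col))  # columns are assumed nonzero
        best = None
        for i, atom in enumerate(atoms):
            scale = dot(atom, col)
            # Residual norm after projecting onto a unit-norm atom.
            resid = math.sqrt(max(norm * norm - scale * scale, 0.0))
            if resid <= eps * norm and (best is None or resid < best[0]):
                best = (resid, i, scale)
        if best is None:
            atoms.append([v / norm for v in col])
            codes.append((len(atoms) - 1, norm))
        else:
            codes.append((best[1], best[2]))
    return atoms, codes

def gram_apply_dense(columns, x):
    """Compute A^T (A x) directly on the dense columns of A."""
    m = len(columns[0])
    w = [sum(col[r] * xj for col, xj in zip(columns, x)) for r in range(m)]
    return [dot(col, w) for col in columns]

def gram_apply_transformed(atoms, codes, x):
    """Compute C^T D^T D C x, touching only the atoms and the sparse codes."""
    z = [0.0] * len(atoms)
    for (i, s), xj in zip(codes, x):  # z = C x, one nonzero per column of C
        z[i] += s * xj
    m = len(atoms[0])
    w = [sum(atom[r] * zi for atom, zi in zip(atoms, z)) for r in range(m)]
    u = [dot(atom, w) for atom in atoms]  # u = D^T (D z)
    return [s * u[i] for i, s in codes]   # y = C^T u

# Six dense-looking columns that are scaled copies of two directions.
b1, b2 = [1.0, 2.0, 2.0, 0.0], [0.0, 3.0, 0.0, 4.0]
columns = [[c * v for v in b] for c, b in
           [(1, b1), (2, b1), (1, b2), (-1, b2), (0.5, b1), (3, b2)]]
x = [1.0, -1.0, 0.5, 2.0, 0.0, 1.0]

atoms, codes = sparse_transform(columns, eps=0.01)
y_dense = gram_apply_dense(columns, x)
y_fast = gram_apply_transformed(atoms, codes, x)
```

In this sketch, a larger `eps` lets more columns share an atom, which shrinks the dictionary at the price of a larger approximation error. That is one simple way to picture a transformation that is tunable against platform costs while staying within a user-defined error bound.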