CompAct: On-chip Compression of Activations for Low Power Systolic Array Based CNN Acceleration
- Jun Zhang
- Amol Ambardekar
- Siddharth Garg
- Shuayb Zarar
ACM Transactions on Embedded Computing Systems (TECS)
On-chip SRAMs, and specifically the SRAMs that hold activation inputs, are a significant contributor to the energy consumption of tightly coupled, systolic-array-based convolutional neural network accelerators such as the tensor processing unit (TPU). In this paper, we propose CompAct, an architecture that combines lossless compression techniques with a priori information about the TPU execution schedule to significantly reduce both the amount of data held in, and the number of accesses to, activation SRAMs. Our approach lowers memory-access costs both when data is sparse and when it contains repeating sequences. Based on synthesis results from a 45 nm CMOS process, we demonstrate activation SRAM energy reductions of up to 68% on the AlexNet and VGG-16 benchmarks, and reductions of up to 51% in total chip energy.
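As a rough illustration of why lossless encoding can pay off both for sparse activations and for activations with repeating values, the sketch below uses plain run-length encoding (RLE). This is a generic, swapped-in technique chosen for clarity, not CompAct's actual encoding scheme. The function names and the run-length cap `max_run=255` (standing in for a hypothetical 8-bit run-length field) are assumptions made for illustration only.

```python
from typing import List, Tuple


def rle_encode(values: List[int], max_run: int = 255) -> List[Tuple[int, int]]:
    """Encode a sequence as (value, run_length) pairs.

    max_run is a hypothetical cap modelling a fixed-width run-length field;
    a longer run is split into several pairs.
    """
    runs: List[Tuple[int, int]] = []
    for v in values:
        # Extend the current run if the value repeats and the cap is not hit.
        if runs and runs[-1][0] == v and runs[-1][1] < max_run:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Losslessly expand (value, run_length) pairs back into the sequence."""
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


if __name__ == "__main__":
    # A toy post-ReLU activation row: runs of zeros (sparsity) plus a repeated value.
    row = [0, 0, 0, 0, 5, 5, 5, 0, 0, 0, 0, 0, 7]
    encoded = rle_encode(row)
    print(encoded)  # 4 pairs instead of 13 raw values
    assert rle_decode(encoded) == row
```

Here the 13-entry row collapses to `[(0, 4), (5, 3), (0, 5), (7, 1)]`. Zero runs capture sparsity, and runs of a nonzero value capture repetition. A real hardware design must also weigh the width of each run-length field against how long runs typically are, which this toy sketch does not model.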