CompAct: On-chip Compression of Activations for Low Power Systolic Array Based CNN Acceleration

Jun Zhang; Amol Ambardekar; Siddharth Garg; Shuayb Zarar

ACM Transactions on Embedded Computing Systems (TECS) | October 2019

On-chip SRAMs, and specifically SRAMs used to hold activation inputs, are a significant contributor to the energy consumption of tightly-coupled systolic-array-based convolutional neural network accelerators like the tensor processing unit (TPU). In this paper, we propose CompAct, an architecture that combines lossless compression techniques with a priori information about the TPU execution schedule to significantly reduce both the amount of data stored in activation SRAMs and the number of accesses to them. Our approach lowers memory-access costs both when data is sparse and when it contains repeating sequences. Based on synthesis results from a 45 nm CMOS process, we demonstrate activation SRAM energy reductions of up to 68% for AlexNet and VGG-16 benchmarks, and total chip energy reductions of up to 51%.
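
The abstract does not spell out which lossless scheme CompAct uses, so the following is only an illustrative sketch of the general idea, assuming plain run-length encoding as a stand-in. It shows how a single lossless code can shrink an activation stream both when it is sparse (long runs of zeros) and when it contains repeated values. The function names `rle_encode` and `rle_decode` and the sample data are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal values into (value, run_length) pairs.

    Assumption: run-length encoding is used here only as a simple example
    of a lossless code; it is not claimed to be CompAct's actual scheme.
    """
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Same value as the previous element: extend the current run.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            # Value changed: start a new run of length 1.
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


# Hypothetical activation row: mostly zeros (sparse) plus one repeated value.
activations = [0, 0, 0, 0, 5, 5, 5, 0, 0, 7, 0, 0, 0, 0, 0, 0]
runs = rle_encode(activations)

# Lossless: decoding recovers the input exactly.
assert rle_decode(runs) == activations

# Each run stores two words (value, length) instead of one word per element.
print(len(activations), "words uncompressed;", 2 * len(runs), "words as runs")
```

In this toy case, fewer stored words would mean fewer SRAM reads and writes for the same activations. Any actual energy saving depends on the hardware encoder and decoder overhead, which this sketch does not model.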