CompAct: On-chip Compression of Activations for Low Power Systolic Array Based CNN Acceleration

  • Jun Zhang
  • Amol Ambardekar
  • Siddharth Garg
  • Shuayb Zarar

ACM Transactions on Embedded Computing Systems (TECS)

On-chip SRAMs, and specifically the SRAMs used to hold activation inputs, are a significant contributor to the energy consumption of tightly coupled, systolic-array-based convolutional neural network accelerators such as the tensor processing unit (TPU). In this paper, we propose CompAct, an architecture that combines lossless compression techniques with a priori information about the TPU execution schedule to significantly reduce both the amount of data stored in activation SRAMs and the number of accesses to them. Our approach lowers memory-access costs both when data is sparse and when it contains repeating sequences. Based on synthesis results from a 45nm CMOS process, we demonstrate activation SRAM energy reductions of up to 68% for the AlexNet and VGG-16 benchmarks, and reductions of up to 51% in total chip energy.
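
The abstract does not say which lossless encoding CompAct uses. As a purely illustrative sketch, assuming simple run-length encoding (an assumption, not the paper's confirmed method), the Python below shows how one lossless scheme can shrink both sparse data (runs of zeros) and repeating sequences (runs of equal nonzero values). The names rle_encode, rle_decode, and the sample activations are hypothetical.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Same value as the previous element: extend the current run.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            # Value changed (or first element): start a new run.
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


# Hypothetical activation row: zeros (sparsity) plus a repeated nonzero value.
activations = [0, 0, 0, 0, 7, 7, 7, 3, 0, 0]
encoded = rle_encode(activations)

# Lossless: decoding recovers the input exactly.
assert rle_decode(encoded) == activations
print(len(activations), "values stored as", len(encoded), "runs")
```

In this toy case ten values are stored as four runs. Whether that saves storage in hardware depends on how many bits each run-length field needs, which this sketch does not model.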