Optimizing Dynamic Neural Networks with Brainstorm

OSDI'23

Dynamic neural networks (NNs), which adapt their sparsely activated sub-networks to each input during inference, have shown significant advantages over static ones in terms of accuracy, computational efficiency, and adaptiveness. However, existing deep learning frameworks and compilers mainly focus on optimizing static NNs with deterministic execution, missing the optimization opportunities brought by the non-uniform distribution of activations in dynamic NNs. The key to optimizing dynamic NNs is the traceability of how data are dynamically dispatched to different paths during inference. Such dynamism often happens at the sub-tensor level (e.g., conditionally dispatching the tokens of a tensor), which makes it hard for existing tensor-centric frameworks to trace due to the misaligned expression granularity.
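
To make the sub-tensor-level dynamism concrete, the following minimal sketch (plain PyTorch, not Brainstorm's API) shows a top-1 mixture-of-experts layer that conditionally dispatches individual tokens of a tensor to different expert sub-networks. From a tensor-centric framework's point of view this is just ordinary indexing, so which token went to which expert is not visible as traceable dataflow; all names here are illustrative.

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Toy mixture-of-experts layer with per-token (sub-tensor) dispatch."""
    def __init__(self, d_model=64, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(num_experts))

    def forward(self, x):                      # x: (num_tokens, d_model)
        scores = self.gate(x)                  # (num_tokens, num_experts)
        expert_ids = scores.argmax(dim=-1)     # per-token routing decision
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_ids == i             # tokens routed to expert i
            if mask.any():
                out[mask] = expert(x[mask])    # sub-tensor dispatch via indexing
        return out

tokens = torch.randn(8, 64)
print(Top1MoE()(tokens).shape)                 # torch.Size([8, 64])
```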

In this paper, we present Brainstorm, a deep learning framework for optimizing dynamic NNs that bridges this gap by unifying how dynamism is expressed. Brainstorm proposes (1) Cell, the key data abstraction that lets model developers express the data granularity at which dynamism exists, and (2) Router, a unified interface that lets model developers express how Cells should be dynamically dispatched. Brainstorm handles the efficient execution of routing actions. This design allows Brainstorm to collect profiles of fine-grained dataflow at the correct granularity. The traceability further opens up a new space of dynamic optimizations that specialize the execution of dynamic NNs to the runtime distribution of dynamism. Extensive evaluations show that Brainstorm brings up to 11.7× speedup (3.29× on average) or leads to 42% less memory consumption for popular dynamic neural networks with the proposed dynamic optimizations.
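
The sketch below illustrates the idea behind the two abstractions; it is not Brainstorm's actual API, and names such as `Router`, `dispatch`, and `combine` are hypothetical placeholders. A Cell is the unit at which dynamism occurs (here, one token, i.e., one row of the tensor), and the Router both performs the dispatch and records per-branch counts as a stand-in for the fine-grained dataflow profiles that enable distribution-aware optimization.

```python
import torch
import torch.nn as nn

class Router(nn.Module):
    """Dispatches Cells (here, token rows of a tensor) to branches and keeps
    a per-branch load counter as a stand-in for runtime dataflow profiles."""
    def __init__(self, d_model, num_branches):
        super().__init__()
        self.gate = nn.Linear(d_model, num_branches)
        self.register_buffer("load", torch.zeros(num_branches))

    def dispatch(self, cells):                     # cells: (num_cells, d_model)
        branch_ids = self.gate(cells).argmax(dim=-1)
        self.load += torch.bincount(branch_ids, minlength=self.load.numel()).float()
        groups = [cells[branch_ids == b] for b in range(self.load.numel())]
        return groups, branch_ids

    def combine(self, groups, branch_ids):         # reverse of dispatch
        out = torch.empty(branch_ids.numel(), groups[0].shape[-1])
        for b, g in enumerate(groups):
            out[branch_ids == b] = g
        return out

router = Router(d_model=64, num_branches=4)
experts = nn.ModuleList(nn.Linear(64, 64) for _ in range(4))
tokens = torch.randn(8, 64)

groups, ids = router.dispatch(tokens)              # per-Cell routing, traced by Router
outputs = [experts[b](g) for b, g in enumerate(groups)]
result = router.combine(outputs, ids)
print(result.shape, router.load)                   # traced per-branch token counts
```

Because the dispatch decision flows through a single interface, a framework built around it could profile how Cells distribute across branches at runtime and specialize execution (e.g., kernel or placement choices) to that distribution, which is the optimization space the paper targets.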