Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning
With the growing complexity of deep neural networks (DNNs), developing DNN programs with intricate control flow logic (e.g., loops, branches, and recursion) has become increasingly essential. However, executing such DNN programs efficiently on accelerators is challenging. Current DNN frameworks typically process control flow on the CPU, while offloading the remaining computations to accelerators such as GPUs. This often introduces significant synchronization overhead between the CPU and the accelerator, and prevents global optimization across control flow scopes.
To address this challenge, we propose Cocktailer, a new DNN compiler that co-optimizes the execution of control flow and data flow on hardware accelerators. Cocktailer provides the uTask abstraction to unify the representation of DNN models, including both control flow and data flow. This allows Cocktailer to expose a holistic scheduling space in which control flow can be rescheduled onto the lower-level hardware parallelism of accelerators. Cocktailer uses a heuristic policy to find efficient schedules and can automatically move control flow into accelerator kernels, enabling optimization across control flow boundaries. Evaluations demonstrate that Cocktailer can accelerate DNN models with control flow by up to 8.2× over the fastest of the state-of-the-art DNN frameworks and compilers.
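The contrast between CPU-side control flow and control flow moved into an accelerator kernel can be illustrated with a toy model. The sketch below is not Cocktailer's uTask abstraction or any part of its API: `ToyDevice`, `step`, `host_driven_loop`, and `device_resident_loop` are hypothetical names, and the "device" is plain Python that only counts kernel launches and host synchronizations.

```python
from dataclasses import dataclass


@dataclass
class ToyDevice:
    """Hypothetical stand-in for an accelerator; it only counts events."""
    launches: int = 0
    syncs: int = 0

    def launch(self, fn, *args):
        # Model one kernel launch by calling fn directly.
        self.launches += 1
        return fn(*args)

    def read_back(self, value):
        # Model the host reading a device value to decide control flow,
        # which forces a CPU-accelerator synchronization.
        self.syncs += 1
        return value


def step(x):
    # Loop body (data flow): halve the value.
    return x / 2.0


def host_driven_loop(dev, x, threshold):
    # Control flow on the CPU: every iteration synchronizes to evaluate
    # the loop condition, then launches one kernel for the body.
    while dev.read_back(x) > threshold:
        x = dev.launch(step, x)
    return x


def device_resident_loop(dev, x, threshold):
    # Control flow inside a single kernel: the loop condition is
    # evaluated "on the device", so the host never synchronizes.
    def fused(x):
        while x > threshold:
            x = step(x)
        return x

    return dev.launch(fused, x)


if __name__ == "__main__":
    host_dev, fused_dev = ToyDevice(), ToyDevice()
    print(host_driven_loop(host_dev, 64.0, 1.0), host_dev)
    print(device_resident_loop(fused_dev, 64.0, 1.0), fused_dev)
```

Under these assumptions both variants compute the same result, but the host-driven loop pays one synchronization per condition check and one launch per iteration, while the fused variant issues a single launch. Real speedups depend on the scheduling decisions the compiler makes, which this toy model does not attempt to capture.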