Flexible High-resolution Object Detection on Edge Devices with Tunable Latency

  • Zhiqi Lin
  • Yuanchun Li
  • Yuanchao Shu
  • Yunxin Liu

The 27th Annual International Conference on Mobile Computing and Networking (MobiCom '21)

Published by ACM | Organized by Microsoft

Object detection is a fundamental building block of video analytics applications. While neural network (NN)-based object detection models have shown excellent accuracy on benchmark datasets, they are not well suited to high-resolution image inference on resource-constrained edge devices. Common approaches, including down-sampling inputs and scaling up neural networks, fall short of adapting to changes in video content and to varying latency requirements. This paper presents Remix, a flexible framework for high-resolution object detection on edge devices. Remix takes a latency budget as input and produces an image partition and model execution plan that runs off-the-shelf neural networks on non-uniformly partitioned image blocks. As a result, it maximizes overall detection accuracy by allocating different amounts of compute to different areas of an image. We evaluate Remix on a public dataset as well as on real-world videos that we collected. Experimental results show that Remix can either improve detection accuracy by 18%-70% for a given latency budget, or achieve up to 5.5x inference speedup with accuracy on par with state-of-the-art NNs.
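To make the idea of a latency-budgeted execution plan concrete, the following is a minimal sketch and not Remix's actual algorithm, which the abstract does not specify. It assumes the image has already been split into blocks and that each block comes with hypothetical per-model profiles (a model name, an integer latency in milliseconds, and an estimated accuracy score). It then picks one option per block with a simple multiple-choice knapsack dynamic program. The function name `plan_execution`, the `skip` option, and all numbers are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical profile entry: (model name, latency in ms, estimated accuracy score).
# These values are illustrative assumptions, not measurements from the paper.
Option = Tuple[str, int, float]


def plan_execution(
    blocks: List[List[Option]], budget_ms: int
) -> Optional[Tuple[float, List[str]]]:
    """Choose one option per block to maximize total score within budget_ms.

    Returns (total_score, chosen model name per block), or None when no
    combination of options fits within the budget.
    """
    neg_inf = float("-inf")
    # score[t] is the best total score achievable with at most t ms
    # over the blocks processed so far; picks[t] records the choices made.
    score = [0.0] * (budget_ms + 1)
    picks: List[List[str]] = [[] for _ in range(budget_ms + 1)]

    for options in blocks:
        new_score = [neg_inf] * (budget_ms + 1)
        new_picks: List[List[str]] = [[] for _ in range(budget_ms + 1)]
        for t in range(budget_ms + 1):
            for name, latency, gain in options:
                if latency > t or score[t - latency] == neg_inf:
                    continue  # option does not fit, or the prefix is infeasible
                candidate = score[t - latency] + gain
                if candidate > new_score[t]:
                    new_score[t] = candidate
                    new_picks[t] = picks[t - latency] + [name]
        score, picks = new_score, new_picks

    if score[budget_ms] == neg_inf:
        return None
    return score[budget_ms], picks[budget_ms]


# Two blocks; each may be skipped or processed by a small or a large model.
example_blocks = [
    [("skip", 0, 0.0), ("small", 10, 0.5), ("large", 40, 0.9)],
    [("skip", 0, 0.0), ("small", 10, 0.2), ("large", 40, 0.8)],
]
result = plan_execution(example_blocks, budget_ms=50)
print(result)
```

With a 50 ms budget, this sketch assigns the small model to the first block and the large model to the second, because that combination has the highest estimated score among those that fit. Tightening the budget shifts the plan toward cheaper options, which loosely mirrors the tunable trade-off between latency and accuracy described above.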