The 3rd Workshop on Hot Topics in Video Analytics and Intelligent Edges

About

We are living in a golden era of AI, fueled by game-changing advances in systems infrastructure. Among its many applications, video analytics in particular has shown tremendous potential to impact science and society, thanks to breakthroughs in machine learning, copious training data, and the pervasive deployment of video capture devices.

Analyzing live video streams is arguably the most challenging domain for “systems-for-AI”. Unlike text or numeric processing, video analytics requires higher bandwidth, consumes considerable compute cycles, necessitates richer query semantics, and demands tighter security and privacy guarantees. Video analytics has a symbiotic relationship with edge compute infrastructure, which makes compute resources available closer to the data sources (e.g., cameras and smartphones). All aspects of video analytics call for a “green-field” design, from the vision algorithms to the systems processing stack, the networking links, and the hybrid edge-cloud infrastructure. Such a holistic design will democratize live video analytics, so that any organization with cameras can obtain value from video analytics.

Call for Papers

This workshop calls for research on the issues and solutions that can enable live video analytics with a central role for edge computing. Topics of interest include, but are not limited to, the following:

  • Low-cost video analytics
  • Deployment experience with large arrays of cameras
  • Storage of video data and metadata
  • Interactive querying of video streams
  • Network design for video streams
  • Hybrid cloud architectures for video processing
  • Scheduling for multi-tenant video processing
  • Training of vision neural networks
  • Edge-based processor architectures for video processing
  • Energy-efficient system design for video analytics
  • Intelligent camera designs
  • Vehicular and drone-based video analytics
  • Tools and datasets for video analytics systems
  • Novel vision applications
  • Video analytics for social good
  • Secure processing of video analytics
  • Privacy-preserving techniques for video processing
  • Emerging forms of immersive video streams, e.g., 360-degree or volumetric video

Submission Instructions:
Submissions must be original, unpublished work that is not under consideration at another conference or journal. Submitted papers must be no longer than five (5) pages, including all figures and tables, followed by as many pages as necessary for bibliographic references. Submissions should be in two-column, 10pt ACM format with authors’ names and affiliations for single-blind peer review. The workshop also solicits the submission of research, platform, and product demonstrations. A demo submission should be a summary or extended abstract describing the research to be presented, at most one (1) page in a font no smaller than 10 point, in PDF format. Demo submission titles should begin with “Demo:”.

Authors of accepted papers are expected to present their work at the workshop. Papers accepted for presentation will be published in the MobiCom Workshop Proceedings and will be available in the ACM Digital Library. You may find these templates useful in complying with the formatting requirements.

Submission site: https://hotedgevideo21.hotcrp.com/.

Important Dates:
Paper Submission Deadline: June 4, 2021 (extended from May 21, 2021)
Acceptance Notification: June 30, 2021
Camera-ready Papers Due: July 31, 2021
Workshop Date: Jan. 31, 2022

Organizers

Program Committee:

Program

The workshop will be held virtually on Jan. 31, 2022. All times below are in Central Standard Time.

08:00 – 08:10 Opening remarks

08:10 – 09:10 Keynote I
Speaker: Inseok Hwang, POSTECH

09:10 – 09:40 Break

09:40 – 10:40 Session 1

Towards Memory-Efficient Inference in Edge Video Analytics
Arthi Padmanabhan (Microsoft & UCLA), Anand Padmanabha Iyer (Microsoft), Ganesh Ananthanarayanan (Microsoft), Yuanchao Shu (Microsoft), Nikolaos Karianakis (Microsoft), Guoqing Harry Xu (UCLA), Ravi Netravali (Princeton University)

Decentralized Modular Architecture for Live Video Analytics at the Edge
Sri Pramodh Rachuri (Stony Brook University), Francesco Bronzino (Université Savoie Mont Blanc), Shubham Jain (Stony Brook University)

The Case for Admission Control of Mobile Cameras into the Live Video Analytics Pipeline
Francescomaria Faticanti (Fondazione Bruno Kessler & University of Trento), Francesco Bronzino (Université Savoie Mont Blanc), Francesco De Pellegrini (University of Avignon)

10:40 – 11:00 Break

11:00 – 11:40 Session 2

Enabling High Frame-rate UHD Real-time Communication with Frame-Skipping
Tingfeng Wang (Beijing University of Posts and Telecommunications), Zili Meng (Tsinghua University), Mingwei Xu (Tsinghua University), Rui Han (Tencent), Honghao Liu (Tencent)

Characterizing Real-Time Dense Point Cloud Capture and Streaming on Mobile Devices
Jinhan Hu (Arizona State University), Aashiq Shaikh (Arizona State University), Alireza Bahremand (Arizona State University), Robert LiKamWa (Arizona State University)

11:40 – 13:00 Lunch

13:00 – 14:00 Keynote II – TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices
Song Han, MIT

Today’s AI is too big. Deep neural networks demand extraordinary amounts of data and computation, and therefore power, for training and inference, which severely limits the practical deployment of AI on edge devices. The explosive growth of video requires video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods achieve good performance but are computationally intensive. We propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance for video understanding. The key idea of TSM is to shift part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero additional computation and zero additional parameters. TSM achieves high frame rates for online video recognition: 74 fps on a Jetson Nano and 29 fps on a mobile phone. TSM also scales better than 3D networks, enabling large-scale Kinetics training in 15 minutes. We hope such TinyML techniques can make video understanding smaller, faster, and more efficient for both training and deployment.
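For readers unfamiliar with TSM, the following is a minimal sketch of the shift operation described in the abstract, written in PyTorch; the function name, tensor layout, and the 1/8 shift fraction are illustrative assumptions rather than the speaker’s reference implementation.

    import torch

    def temporal_shift(x, n_frames, shift_div=8):
        # x: activations from a 2D CNN backbone, shape (batch * n_frames, channels, H, W)
        nt, c, h, w = x.shape
        n = nt // n_frames
        x = x.view(n, n_frames, c, h, w)
        fold = c // shift_div
        out = torch.zeros_like(x)
        out[:, :-1, :fold] = x[:, 1:, :fold]                    # shift one slice of channels backward in time
        out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]    # shift another slice forward in time
        out[:, :, 2 * fold:] = x[:, :, 2 * fold:]               # remaining channels stay in place
        return out.view(nt, c, h, w)

Because the shift only moves existing activations between neighboring frames, it adds no multiply-accumulate operations and no parameters, which is how TSM retains 2D-CNN cost while mixing temporal information.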

Bio: Song Han is an assistant professor in MIT’s EECS department. He received his PhD from Stanford University. His research focuses on efficient deep learning computing. He proposed the “deep compression” technique, which can reduce neural network size by an order of magnitude without losing accuracy, and the “efficient inference engine” hardware implementation, which first exploited pruning and weight sparsity in deep learning accelerators. His team’s work on hardware-aware neural architecture search, which brings deep learning to IoT devices, was highlighted by MIT News, Wired, Qualcomm News, VentureBeat, and IEEE Spectrum, integrated into PyTorch and AutoGluon, and received many low-power computer vision contest awards at flagship AI conferences (CVPR’19, ICCV’19, and NeurIPS’19). Song received Best Paper awards at ICLR’16 and FPGA’17, the Amazon Machine Learning Research Award, the SONY Faculty Award, the Facebook Faculty Award, and the NVIDIA Academic Partnership Award. He was named to MIT Technology Review’s “35 Innovators Under 35” for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” He received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning” and the IEEE “AI’s 10 to Watch: The Future of AI” award.

14:00 – 14:40 Session 3

Auto-SDA: Automated Video-based Social Distancing Analyzer
Mahshid Ghasemi (Columbia University), Zoran Kostic (Columbia University), Javad Ghaderi (Columbia University), Gil Zussman (Columbia University)

Demo: Cost Effective Processing of Detection-driven Video Analytics at the Edge
Md Adnan Arefeen (University of Missouri-Kansas City), Md Yusuf Sarwar Uddin (University of Missouri-Kansas City)