The workshop will be held in hybrid mode on March 28th. All times below are in Central Standard Time.
08:00 – 08:10 Opening remarks
08:10 – 09:10 Keynote I – Life-immersive AI for Family Interaction across Space and Time
Inseok Hwang, POSTECH
Computer-mediated interaction services connect people over a distance and often enrich the interaction further with various forms of media sharing. However, we observe that people in such interactions are often ‘locked in a frame’: an interaction mode, a point in time, or the context of either person. Such lock-ins make it difficult to shape the interaction to be mutually symmetric and empathetic. In this talk, I will present a semantic-equivalent melding of space and time to create a new form of empathetic family interaction. As initial attempts, I will introduce two prototype systems, HomeMeld and MomentMeld, which aim to meld space and time, respectively, by applying AI. HomeMeld provides a sense of living together to a family living apart, with AI-driven autonomous robotic avatars navigating to the semantically equivalent location of the other person in one’s own house. MomentMeld uses an ensemble of visual AI to expand the topic of interaction beyond the present by matching semantically equivalent photos of each other taken at different times. In-the-wild experiments reveal that HomeMeld and MomentMeld open new forms of empathetic family interaction by computationally melding space and time.
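As a purely illustrative sketch of the kind of cross-album matching MomentMeld performs, the snippet below pairs photos by cosine similarity of embeddings from a single pretrained image model. The abstract only says that the real system uses an ensemble of visual AI models, so every model choice, function name, and parameter here is an assumption, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Hypothetical stand-in for MomentMeld's visual-AI ensemble:
# a single pretrained ResNet-18 used as an embedding extractor.
_backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
_backbone.fc = torch.nn.Identity()   # keep penultimate features as the embedding
_backbone.eval()

_preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def embed(path: str) -> torch.Tensor:
    """Map one photo to an L2-normalized semantic embedding vector."""
    with torch.no_grad():
        img = _preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        return F.normalize(_backbone(img), dim=1).squeeze(0)

def best_match(my_photo: str, their_photos: list[str]) -> str:
    """Return the other person's photo most semantically similar to mine."""
    query = embed(my_photo)
    scores = [(torch.dot(query, embed(p)).item(), p) for p in their_photos]
    return max(scores)[1]
```

The point of the sketch is only the pairing logic: given one person's photo, the semantically closest photo from the other person's album is surfaced as a shared conversation prompt.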
Bio: Inseok Hwang is an Assistant Professor in the Department of Computer Science and Engineering at POSTECH. Before joining POSTECH in 2020, he spent six years, starting in 2014, as a Research Staff Member at IBM Research in Austin, Texas. His main research theme is “intelligent computing infusing real life”, with particular focus on mobile computing, human-centered systems, and applied AI. He is a recipient of the Best Paper Award from ACM CSCW and multiple Best Demo Awards from ACM MobiSys. He has been actively serving on technical program committees and editorial boards of premier venues in mobile and human-centered computing. He is a prolific inventor with 89 U.S. patents issued to date, in recognition of which he was appointed an IBM Master Inventor. He obtained his Ph.D. in Computer Science from KAIST in 2013.
09:10 – 09:40 Break
09:40 – 10:40 Session 1
Towards Memory-Efficient Inference in Edge Video Analytics
Arthi Padmanabhan (Microsoft & UCLA), Anand Padmanabha Iyer (Microsoft), Ganesh Ananthanarayanan (Microsoft), Yuanchao Shu (Microsoft), Nikolaos Karianakis (Microsoft), Guoqing Harry Xu (UCLA), Ravi Netravali (Princeton University)
Decentralized Modular Architecture for Live Video Analytics at the Edge
Sri Pramodh Rachuri (Stony Brook University), Francesco Bronzino (Université Savoie Mont Blanc), Shubham Jain (Stony Brook University)
The Case for Admission Control of Mobile Cameras into the Live Video Analytics Pipeline
Francescomaria Faticanti (Fondazione Bruno Kessler & University of Trento), Francesco Bronzino (Université Savoie Mont Blanc), Francesco De Pellegrini (University of Avignon)
10:40 – 11:00 Break
11:00 – 11:40 Session 2
Enabling High Frame-rate UHD Real-time Communication with Frame-Skipping
Tingfeng Wang (Beijing University of Posts and Telecommunications), Zili Meng (Tsinghua University), Mingwei Xu (Tsinghua University), Rui Han (Tencent), Honghao Liu (Tencent)
Characterizing Real-Time Dense Point Cloud Capture and Streaming on Mobile Devices
Jinhan Hu (Arizona State University), Aashiq Shaikh (Arizona State University), Alireza Bahremand (Arizona State University), Robert LiKamWa (Arizona State University)
11:40 – 13:00 Lunch
13:00 – 14:00 Keynote II – TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices
Song Han, MIT
Today’s AI is too big. Deep neural networks demand extraordinary amounts of data and computation, and therefore power, for training and inference. This severely limits the practical deployment of AI on edge devices. The explosive growth of video requires video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods achieve good performance but are computationally intensive. We propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance for video understanding. The key idea of TSM is to shift part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero additional computation and zero additional parameters. TSM achieves high frame rates of 74 fps on Jetson Nano and 29 fps on a mobile phone for online video recognition. TSM also has higher scalability than 3D networks, enabling large-scale Kinetics training in 15 minutes. We hope such TinyML techniques can make video understanding smaller, faster, and more efficient for both training and deployment.
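To make the key idea concrete, here is a minimal sketch of the temporal shift operation in Python/PyTorch. It assumes an activation tensor of shape (batch, time, channels, height, width) and shifts 1/8 of the channels in each temporal direction; the layout and shift fraction are assumptions for illustration, not the authors' reference implementation.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels along the temporal dimension.

    x: activations of shape (batch, time, channels, height, width).
    shift_div: 1/shift_div of the channels are shifted forward in time,
               another 1/shift_div backward; the rest stay in place.
    """
    b, t, c, h, w = x.size()
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels unchanged
    return out
```

Because the operation only moves existing activations between neighboring frames, it adds no parameters and essentially no FLOPs, which is what lets it be inserted into a 2D CNN backbone to gain temporal modeling.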
Bio: Song Han is an assistant professor in MIT’s EECS Department. He received his PhD from Stanford University. His research focuses on efficient deep learning computing. He proposed the “deep compression” technique, which can reduce neural network size by an order of magnitude without losing accuracy, and the hardware implementation “efficient inference engine”, which first exploited pruning and weight sparsity in deep learning accelerators. His team’s work on hardware-aware neural architecture search, which brings deep learning to IoT devices, was highlighted by MIT News, Wired, Qualcomm News, VentureBeat, and IEEE Spectrum, integrated into PyTorch and AutoGluon, and received multiple low-power computer vision contest awards at flagship AI conferences (CVPR’19, ICCV’19, and NeurIPS’19). Song received Best Paper awards at ICLR’16 and FPGA’17, as well as the Amazon Machine Learning Research Award, the SONY Faculty Award, the Facebook Faculty Award, and the NVIDIA Academic Partnership Award. He was named one of MIT Technology Review’s “35 Innovators Under 35” for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” He received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning” and the IEEE “AI’s 10 to Watch: The Future of AI” award.
14:00 – 14:40 Session 3
Auto-SDA: Automated Video-based Social Distancing Analyzer
Mahshid Ghasemi (Columbia University), Zoran Kostic (Columbia University), Javad Ghaderi (Columbia University), Gil Zussman (Columbia University)
Demo: Cost Effective Processing of Detection-driven Video Analytics at the Edge
Md Adnan Arefeen (University of Missouri-Kansas City), Md Yusuf Sarwar Uddin (University of Missouri-Kansas City)