Intelligent Multimedia



The Intelligent Multimedia (IM) group aims to build seamless yet efficient media applications and systems through breakthroughs in fundamental theory and innovations in algorithm and system technology. We address the problems of intelligent media content sensing, processing, analysis, and delivery, as well as the generic scalability issues of media computing systems. Our current focus is on various topics related to deep video analytics.

Areas of Focus:

Deep Video Analytics

Video is the largest form of big data and carries an enormous amount of information. We are leveraging computer vision and deep learning to develop both cloud-based and edge-based intelligence engines that turn raw video data into insights for a range of applications and services. Target application scenarios include video augmented reality, smart home surveillance, business intelligence (retail stores, offices), public security, and video storytelling and sharing. We have taken a human-centric approach, with significant effort focused on understanding humans, human attributes, and human behaviors. Our research has contributed to a number of video APIs offered in Microsoft Cognitive Services, Azure Media Analytics Services, and Windows Machine Learning.


Project Titanium aims at bringing new computing experiences through enriched cloud-client computing. While data and programs can be provided as services from the cloud, the screen, referring to the entire collection of data involved in the user interface, constitutes the missing third dimension. Titanium will address the problems of adaptive screen composition, representation, and processing, following the roadmap of Titanium Screen, Titanium Remote, Titanium Live, and Titanium Cloud. As “Titanium” suggests, it will provide a lightweight yet efficient solution towards ultimate computing experiences in the cloud-plus-service era.


Project Mira aims at enabling multimedia representation and processing towards perceptual quality rather than pixel-wise fidelity through a joint effort of signal processing, computer vision, and machine learning. In particular, it seeks to build systems that not only incorporate newly developed vision and learning technologies into compression but also inspire new vision technologies by looking at the problem from the viewpoint of signal processing. By bridging vision and signal processing, this project is expected to offer a fresh perspective on multimedia representation and processing.

Latest News

(Feb. 2019) Congratulations to Bi and Wenxuan for the acceptance of the paper entitled “Learning to Update for Object Tracking with Recurrent Meta-learner” by the IEEE Transactions on Image Processing!!
(Jan. 2019) Congratulations to Pengfei and Cuiling for the acceptance of the paper entitled “View Adaptive Neural Networks for High Performance Skeleton-based Human Action Recognition” by the IEEE Transactions on Pattern Analysis and Machine Intelligence!!
(Jan. 2019) Congratulations to Xiaolin, Cuiling, and Xiaoyan for the acceptance of the paper entitled “Temporal-Spatial Mapping for Action Recognition” by the IEEE Transactions on Circuits and Systems for Video Technology!!
(Sept. 2018) Congratulations to Anfeng and Chong for winning second place, among 72 submissions/entries, in the real-time tracker sub-challenge of the 6th Visual Object Tracking Challenge (VOT2018), held in conjunction with ECCV2018!!