Intelligent Multimedia Group

The Intelligent Multimedia (IM) group aims to build seamless yet efficient multimedia applications and services through breakthroughs in fundamental theory and innovations in algorithms and system technology. We address the problems of intelligent multimedia content sensing, processing, analysis, and services, as well as the generic scalability issues of multimedia computing systems. Our current research focuses on video analytics to support intelligent cloud and intelligent edge media services. Research interests include, but are not limited to, object detection, tracking, semantic segmentation, human pose estimation, person re-ID, action recognition, depth estimation, SLAM, scene understanding, and multimodal analysis.

Areas of Focus:

Deep Video Analytics

Video is the biggest of big data, containing an enormous amount of information. We are leveraging computer vision and deep learning to develop both cloud-based and edge-based intelligence engines that turn raw video data into insights for a variety of applications and services. Target application scenarios include video augmented reality, smart home surveillance, business (retail store, office) intelligence, public security, video storytelling and sharing, etc. We have taken a human-centric approach, focusing significant effort on understanding humans, their attributes, and their behaviors. Our research has contributed to a number of video APIs offered in Microsoft Cognitive Services (https://www.microsoft.com/cognitive-services), Azure Media Analytics Services, Windows Machine Learning, Office Media (Stream/Teams), and Dynamics/Connected Store.

– Video API R&D: three technologies (intelligent motion detection, face detection/tracking, and face redaction) deployed in Microsoft Cognitive Services and Azure Media Services (2016)
 Announcing: Motion detection for Azure Media Analytics (2016)
 Announcing face and emotion detection for Azure Media Analytics (Azure Blog, 2016)
 Announcing Face Redaction for Azure Media Analytics (Azure Blog, 2016)
 Redact faces with Azure Media Analytics (Microsoft Docs)

– Developed and released human pose estimation (May 2019) and object tracking (Oct. 2019) technologies as vision skills on the Windows Machine Learning platform.
 Microsoft releases Windows Vision Skills preview for easily invoking computer vision (article in Chinese)
 NuGet Gallery: Microsoft.AI.Skills.Vision.ObjectTrackerPreview 0.0.0.3

– Speech denoising technologies deployed in Microsoft Stream 1.0 (general availability, June 2020) and 2.0 (internal preview, Dec. 2020)
 Extracting ultra-clear voices from noisy videos: speech enhancement model PHASEN joins Microsoft video services (article in Chinese)

– Multi-object tracking (FairMOT), multi-view 3D pose estimation (VoxelPose), and person re-ID technologies shipped in the Microsoft Dynamics/Connected Store product (2020 and ongoing)
 From FairMOT to VoxelPose: unveiling Microsoft's latest human-centric visual understanding results (article in Chinese)

– Screen content understanding (element detection/screen tree) technologies shipped in Microsoft’s mobile robotic process automation (RPA) product (2020 and ongoing)

Open Source Projects:

– Human Pose Estimation: VoxelPose
 Cross View Fusion for 3D Human Pose Estimation
 Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach

– Object Tracking: SA-Siam
 SPM-Tracker
 Siamese-network-based tracker: a comprehensive PyTorch-based toolbox that supports a series of Siamese-network-based tracking methods such as SiamFC, SiamRPN, and SPM
 A Simple Baseline for One-Shot Multi-Object Tracking (2.2K stars)

– Re-Identification: Semantics-Aligned Representation Learning for Person Re-Identification (SAN)

– Action Recognition: View Adaptive Neural Networks
 Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition

– Domain Generalization/Adaptation: Style Normalization and Restitution for Domain Generalization and Adaptation

Titanium (past project)

Project Titanium aims to bring new computing experiences through enriched cloud-client computing. While data and programs can be provided as services from the cloud, the screen, meaning the entire collection of data involved in the user interface, constitutes the missing third dimension. Titanium will address the problems of adaptive screen composition, representation, and processing, following the roadmap of Titanium Screen, Titanium Remote, Titanium Live, and Titanium Cloud. As the name “Titanium” suggests, it will provide a lightweight yet efficient solution toward the ultimate computing experience in the cloud-plus-service era.

Mira (past project)

Project Mira aims to enable multimedia representation and processing that targets perceptual quality rather than pixel-wise fidelity, through a joint effort spanning signal processing, computer vision, and machine learning. In particular, it seeks to build systems that not only incorporate newly developed vision and learning technologies into compression but also inspire new vision technologies by approaching the problem from a signal processing perspective. By bridging vision and signal processing, this project is expected to offer a fresh perspective on multimedia representation and processing.

(Aug. 2021) Congratulations to Yifu, Chunyu, Xin, and Cuiling for the following accepted papers: 1. Y. Zhang, C. Wang, X. Wang, W. Zeng, and W. Liu, “FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking,” to appear in the International Journal of Computer Vision; 2. Xin Jin, Cuiling Lan, Wenjun Zeng, and Zhibo Chen, “Style Normalization and Restitution for Domain Generalization and Adaptation,” to appear in IEEE Transactions on Multimedia.

*

(July 2021) Congratulations to Xin Jin, Cuiling, Rongchang, Chunyu, Yucheng, Guangting, and Chong for the following papers accepted by ICCV 2021: 1. Re-energizing Domain Discriminator with Sample Relabeling for Adversarial Domain Adaptation (Xin Jin, Cuiling Lan, Wenjun Zeng, Zhibo Chen); 2. An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation (Rongchang Xie, Chunyu Wang, Wenjun Zeng, Yizhou Wang); 3. Self-Supervised Visual Representations Learning by Contrastive Mask Prediction (Yucheng Zhao, Guangting Wang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha).

*

(July 2021) Congratulations to Kecheng, Cuiling, and Zhizheng for the following paper accepted by ACM Multimedia 2021: Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Jiawei Liu, Zhizheng Zhang, and Zheng-Jun Zha, “Pose-Guided Feature Learning with Knowledge Distillation for Occluded Person Re-Identification,” to appear in ACM Multimedia 2021.

*

(June 2021) Congratulations to Zhizheng and Cuiling for the following papers accepted by IJCAI 2021: 1. Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen, and Shih-Fu Chang, “Uncertainty-Aware Few-Shot Image Classification”; 2. Jindong Wang, Cuiling Lan, Chang Liu, Yidong Ouyang, and Tao Qin, “Generalizing to Unseen Domains: A Survey on Domain Generalization” (survey track).

*

(March 2021) Congratulations to the authors of the following papers accepted by CVPR 2021: 1. Guoqiang Wei, Cuiling Lan, Wenjun Zeng, Zhibo Chen, “MetaAlign: Coordinating Domain Alignment and Classification for Unsupervised Domain Adaptation”; 2. Guangting Wang, Yizhou Zhou, Chong Luo, Wenxuan Xie, Wenjun Zeng, Zhiwei Xiong, “Unsupervised Visual Representation Learning by Tracking Patches in Video”; 3. Xiaotian Chen, Yuwang Wang, Xuejin Chen, Wenjun Zeng, “S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation” (Oral Paper).

*

(Jan. 2021) The code for Style Normalization and Restitution for Domain Generalization and Adaptation has been open-sourced, and the paper is also available on arXiv.

*

(Dec. 2020) Congratulations to Kecheng, Cuiling, Zhizheng, and Zheng for the following papers accepted by AAAI 2021: 1. Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang, Zheng-Jun Zha, “Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification”; 2. Xiao Wang, Zheng Wang, Toshihiko Yamasaki, Wenjun Zeng, “Very Important Person Localization in Unconstrained Conditions: A New Benchmark”

*

(Dec. 2020) Congratulations to Zhe and Chunyu for the following paper accepted by the International Journal of Computer Vision: Zhe Zhang, Chunyu Wang, Weichao Qiu, Wenhu Qin, and Wenjun Zeng, “AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild.”

*

(July 2020) Congratulations to Hanyue, Chunyu, Xin, and Cuiling for the following papers accepted by ECCV 2020: 1. Hanyue Tu, Chunyu Wang, and Wenjun Zeng, “End-to-End Estimation of Multi-Person 3D Poses from Multiple Cameras” (Oral Paper); 2. Xin Jin, Cuiling Lan, Wenjun Zeng, Zhibo Chen, “Global Distance-distributions Separation for Unsupervised Person Re-identification”

*

(April 2020) Congratulations to Chuanxin, Chong, Zhiyuan, Wenxuan, and Yucheng for the following papers accepted by IJCAI: 1. Joint Time-Frequency and Time Domain Learning for Speech Enhancement (Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Wenxuan Xie, Wenjun Zeng); 2. Multi-Scale Group Transformer for Long Sequence Modeling in Speech Separation (Yucheng Zhao, Chong Luo, Zheng-Jun Zha, Wenjun Zeng)

*

(March 2020) Congratulations to Guangting, Chong, and Yizhou for the following CVPR 2020 papers accepted as Oral Papers: 1. Tracking by Instance Detection: A Meta-Learning Approach (Guangting Wang, Chong Luo, Xiaoyan Sun, Zhiwei Xiong, Wenjun Zeng); 2. Spatiotemporal Fusion in 3D CNNs: A Probabilistic View (Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wenjun Zeng)

*

(Feb. 2020) Congratulations to Yizhou Zhou, Xiaoyan Sun, Chong Luo, Xin Jin, Cuiling Lan, Zhizheng Zhang, Guangting Wang, Zhe Zhang, Chunyu Wang, and Pengfei Zhang for the acceptance of their papers by CVPR 2020!! In total, the Intelligent Multimedia Group had 8 papers accepted.

*

(July 2019) Congratulations to Junsheng and Yuwang for the acceptance of their papers entitled “Unsupervised High-Resolution Depth Learning from Videos with Dual Networks” and “Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments” by ICCV 2019!!

*

(July 2019) Congratulations to Haibo, Chunyu, and Jingdong for the acceptance of their paper entitled “Cross View Fusion for 3D Human Pose Estimation” by ICCV 2019!!

*

(July 10, 2019) Dr. Wenjun Zeng served as a panelist on the industry panel “From Papers to Products: Bridging the Gap between Multimedia Research and Practical Applications” at the 2019 IEEE International Conference on Multimedia and Expo (ICME), held in Shanghai, July 8-12!!

*

(July 2019) Congratulations to our collaborator (as part of the MSRA Collaborative Research Program) Prof. Wei-shi Zheng and his team at Sun Yat-Sen University for the acceptance of their paper entitled “Predicting Future Instance Segmentation with Contextual Pyramid ConvLSTMs” by ACM Multimedia 2019!!

*

(July 2019) Congratulations to Guoqiang and Cuiling for the acceptance of their paper entitled “View Invariant 3D Human Pose Estimation” by the IEEE Transactions on Circuits and Systems for Video Technology!

*

(April 2019) Congratulations to Peng, Chunyu, and Jingdong for the acceptance of their paper entitled “Object Detection in Videos by High Quality Object Linking” by IEEE Transactions on Pattern Analysis and Machine Intelligence!!

*

(Feb. 2019) Congratulations to Zhizheng and Cuiling for the acceptance of their paper entitled “Densely Semantically Aligned Person Re-Identification” by CVPR 2019!!

*

(Feb. 2019) Congratulations to Guangting and Chong for the acceptance of their paper entitled “SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking” by CVPR 2019!!

*

(Feb. 2019) Congratulations to Yizhou and Xiaoyan for the acceptance of their paper entitled “Context-Reinforced Semantic Segmentation” by CVPR 2019!!

*

(Feb. 2019) Congratulations to Bi and Wenxuan for the acceptance of the paper entitled “Learning to Update for Object Tracking with Recurrent Meta-learner” by the IEEE Transactions on Image Processing!!

*

(Jan. 2019) Congratulations to Pengfei and Cuiling for the acceptance of the paper entitled “View Adaptive Neural Networks for High Performance Skeleton-based Human Action Recognition” by the IEEE Transactions on Pattern Analysis and Machine Intelligence!!

*

(Jan. 2019) Congratulations to Xiaolin, Cuiling, and Xiaoyan for the acceptance of the paper entitled “Temporal-Spatial Mapping for Action Recognition” by the IEEE Transactions on Circuits and Systems for Video Technology!!

*

(Dec. 2018) Dr. Wenjun Zeng served on the judging committee of the AI Challenger Global AI Contest (https://challenger.ai/?lan=en).

*

(Oct. 2018) Congratulations to Dr. Wenjun Zeng for receiving the 2018 Industrial Distinguished Leader Award from APSIPA (Asia Pacific Signal and Information Processing Association, www.apsipa.org)!!

*

(Sept. 2018) Congratulations to Anfeng and Chong for winning second place, among 72 entries, in the real-time tracker sub-challenge of the 6th Visual Object Tracking Challenge, VOT2018 (http://www.votchallenge.net/vot2018/), held in conjunction with ECCV 2018!!