Microsoft at CVPR 2020


Microsoft is proud to be a Diamond Sponsor of CVPR 2020. Make sure to catch Satya Nadella’s Fireside Chat at 9:00 PDT on Tuesday, June 16. Stop by our virtual booth to chat with our experts and learn more about our research and open opportunities.

Oral presentations

Tuesday, June 16

Oral 1.1A – 3D From a Single Image and Shape-From-X (1)
10:50 – 10:55 PDT
ActiveMoCap: Optimized Viewpoint Selection for Active Human Motion Capture
Sena Kiciroglu, Helge Rhodin, Sudipta Sinha, Mathieu Salzmann, Pascal Fua
Video >

Oral 1.2A – 3D From Multiview and Sensors (1)
12:10 – 12:15 PDT
TextureFusion: High-Quality Texture Acquisition for Real-Time RGB-D Scanning
Joo Ho Lee, Hyunho Ha, Yue Dong, Xin Tong, Min H. Kim
Video >

Oral 1.2C – Efficient Training and Inference
12:30 – 12:35 PDT
Towards Efficient Model Compression via Learned Global Ranking
Ting-Wu Chin, Ruizhou Ding, Cha Zhang, Diana Marculescu
Video >

Oral 1.3A – 3D From a Single Image and Shape-From-X (2); 3D From Multiview and Sensors (2)
14:40 – 14:45 PDT
Why Having 10,000 Parameters in Your Camera Model Is Better Than Twelve
Thomas Schöps, Viktor Larsson, Marc Pollefeys, Torsten Sattler
Video >

Oral 1.3C – Low-Level and Physics-Based Vision
14:25 – 14:30 PDT
Bringing Old Photos Back to Life
Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen
Video >

14:30 – 14:35 PDT
A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising
Kaixuan Wei, Ying Fu, Jiaolong Yang, Hua Huang
Video >

Wednesday, June 17

Oral 2.1A – 3D From Multiview and Sensors (3)
10:15 – 10:20 PDT
RoutedFusion: Learning Real-Time Depth Map Fusion
Silvan Weder, Johannes Schönberger, Marc Pollefeys, Martin R. Oswald
Video >

Oral 2.1B – Face, Gesture, and Body Pose (1)
10:00 – 10:05 PDT
ReDA: Reinforced Differentiable Attribute for 3D Face Reconstruction
Wenbin Zhu, HsiangTao Wu, Zeyu Chen, Noranart Vesdapunt, Baoyuan Wang
Video >

10:20 – 10:25 PDT
Face X-ray for More General Face Forgery Detection
Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, Baining Guo
Video >

10:55 – 11:00 PDT
Advancing High Fidelity Identity Swapping for Forgery Detection
Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, Fang Wen
Video >

Oral 2.2B – Motion and Tracking (1)
12:00 – 12:05 PDT
LSM: Learning Subspace Minimization for Low-level Vision
Chengzhou Tang, Lu Yuan, Ping Tan
Video >

12:20 – 12:25 PDT
MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask
Shengyu Zhao, Yilun Sheng, Yue Dong, Eric Chang, Yan Xu
Video >

12:25 – 12:30 PDT
Tracking by Instance Detection: A Meta-Learning Approach
Guangting Wang, Chong Luo, Xiaoyan Sun, Zhiwei Xiong, Wenjun Zeng
Video >

Oral 2.1C – Image and Video Synthesis (1)
10:30 – 10:35 PDT
Cross-domain Correspondence Learning for Exemplar-based Image Translation
Pan Zhang, Bo Zhang, Dong Chen, Lu Yuan, Fang Wen
Video >

10:35 – 10:40 PDT
Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning
Yu Deng, Jiaolong Yang, Dong Chen, Fang Wen, Xin Tong
Video >

Oral 2.3A – Face, Gesture, and Body Pose (3); Motion and Tracking (2)
14:15 – 14:20 PDT
Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking
Jin Gao, Weiming Hu, Yan Lu
Video >

Oral 2.4C – Transfer/Low-Shot/Semi/Unsupervised Learning (2)
16:10 – 16:15 PDT
HyperSTAR: Task-Aware Hyperparameters for Deep Networks
Gaurav Mittal, Chang Liu, Nikolaos Karianakis, Victor Fragoso, Mei Chen, Yun Fu
Video >

Thursday, June 18

Oral 3.1B – Video Analysis and Understanding
9:05 – 9:10 PDT
Spatiotemporal Fusion in 3D CNNs: A Probabilistic View
Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wenjun Zeng
Video >

Oral 3.1C – Vision & Language
9:30 – 9:35 PDT
SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions
Ramprasaath Ramasamy Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar
Video >

9:40 – 9:45 PDT
Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation
Necati Cihan Camgoz, Simon Hadfield, Oscar Koller, Richard Bowden
Video >

Oral 3.2A – Recognition (Detection, Categorization) (2)
11:25 – 11:30 PDT
Dynamic Convolution: Attention over Convolution Kernels
Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, Zicheng Liu
Video >

Oral 3.2C – Machine Learning Architectures and Formulations
11:40 – 11:45 PDT
Local Context Normalization: Revisiting Local Normalization
Anthony Ortiz, Caleb Robinson, Md Mahmudulla Hassan, Dan Morris, Olac Fuentes, Christopher Kiekintveld, Nebojsa Jojic
Video >

Poster presentations


Tuesday, June 16

Poster 1.1 – 3D From a Single Image and Shape-From-X; Action and Behavior Recognition; Adversarial Learning | 10:00 – 12:00 PDT

Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction – #58
Yana Hasson, Bugra Tekin, Federica Bogo, Ivan Laptev, Marc Pollefeys, Cordelia Schmid
Video >

Self-Supervised Human Depth Estimation From Monocular Videos – #66
Feitong Tan, Hao Zhu, Zhaopeng Cui, Siyu Zhu, Marc Pollefeys, Ping Tan
Video >

Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning – #71
Tianlong Chen, Sijia Liu, Shiyu Chang, Yu Cheng, Lisa Amini, Zhangyang Wang
Video >

Geometry-Aware Satellite-to-Ground Image Synthesis for Urban Areas – #87
Xiaohu Lu, Zuoyue Li, Zhaopeng Cui, Martin R. Oswald, Marc Pollefeys, Rongjun Qin
Video >

Weakly-Supervised Action Localization by Generative Attention Modeling – #102
Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang
Video >

Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition – #112
Pengfei Zhang, Cuiling Lan, Wenjun Zeng, Junliang Xing, Jianru Xue, Nanning Zheng
Video >

Poster 1.2 – 3D From Multiview and Sensors; Computational Photography; Efficient Training and Inference Methods for Networks | 12:00 – 14:00 PDT

DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing – #77
Shaohui Liu, Yinda Zhang, Songyou Peng, Boxin Shi, Marc Pollefeys, Zhaopeng Cui
Video >

Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach – #95
Zhe Zhang, Chunyu Wang, Wenhu Qin, Wenjun Zeng
Video >

gDLS*: Generalized Pose-and-Scale Estimation Given Scale and Gravity Priors – #96
Victor Fragoso, Joseph Degol, Gang Hua
Video >

Poster 1.3 – 3D From a Single Image and Shape-From-X; 3D From Multiview and Sensors; Image Retrieval; Datasets and Evaluation; Low-Level and Physics-Based Vision | 14:00 – 16:00 PDT

Style Normalization and Restitution for Generalizable Person Re-identification – #69
Xin Jin, Cuiling Lan, Wenjun Zeng, Zhibo Chen, Li Zhang
Video >

Relation-aware Global Attention for Person Re-identification – #73
Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Xin Jin, Zhibo Chen
Video >

Single Image Reflection Removal through Cascaded Refinement – #110
Chao Li, Yixiao Yang, Kun He, Stephen Lin, John Hopcroft
Video >

Poster 1.4 – Scene Analysis and Understanding; Medical, Biological and Cell Microscopy; Transfer/Low-Shot/Semi/Unsupervised Learning | 16:00 – 18:00 PDT

Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-Weighting – #55
Dongnan Liu, Donghao Zhang, Yang Song, Fan Zhang, Lauren O’Donnell, Heng Huang, Mei Chen, Weidong Cai
Video >

Reliable Weighted Optimal Transport for Unsupervised Domain Adaptation – #70
Renjun Xu, Pelen Liu, Liyan Wang, Chao Chen, Jindong Wang
Video >

Wednesday, June 17

Poster 2.1 – 3D From Multiview and Sensors; Face, Gesture, and Body Pose; Image and Video Synthesis | 10:00 – 12:00 PDT

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation – #53
Bowen Cheng, Bin Xiao, Jingdong Wang, Honghui Shi, Thomas Huang, Lei Zhang
Video >

Learning Texture Transformer Network for Image Super-Resolution – #93
Fuzhi Yang, Huan Yang, Jianlong Fu, Hongtao Lu, Baining Guo
Video >

Deep Shutter Unrolling Network – #108
Peidong Liu, Zhaopeng Cui, Viktor Larsson, Marc Pollefeys
Video >

Poster 2.2 – Face, Gesture, and Body Pose; Motion and Tracking; Representation Learning | 12:00 – 14:00 PDT

A Transductive Approach for Video Object Segmentation – #84
Zhirong Wu, Yizhuo Zhang, Houwen Peng, Stephen Lin
Video >

Poster 2.3 – Face, Gesture, and Body Pose; Motion and Tracking; Image and Video Synthesis; Neural Generative Models; Optimization and Learning Methods | 14:00 – 16:00 PDT

Deep 3D Portrait from a Single Image – #36
Sicheng Xu, Jiaolong Yang, Dong Chen, Fang Wen, Yu Deng, Yunde Jia, Xin Tong
Video >

BachGAN: High-Resolution Image Synthesis from Salient Object Layout – #102
Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang, Jingjing Liu
Video >

Thursday, June 18

Poster 3.1 – Recognition (Detection, Categorization); Video Analysis and Understanding; Vision + Language | 9:00 – 11:00 PDT

Rethinking Classification and Localization for Object Detection – #49
Yue Wu, Yinpeng Chen, Lu Yuan, Zicheng Liu, Lijuan Wang, Hongzhi Li, Yun Fu
Video >

Memory Enhanced Global-Local Aggregation for Video Object Detection – #64
Yihong Chen, Yue Cao, Han Hu, Liwei Wang
Video >

Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification – #71
Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen
Video >

Violin: A Large-Scale Dataset for Video-and-Language Inference – #120
Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu
Video >

Poster 3.3 – Recognition (Detection, Categorization); Segmentation, Grouping and Shape; Vision Applications and Systems; Vision & Other Modalities; Transfer/Low-Shot/Semi/Unsupervised Learning | 15:00 – 17:00 PDT

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training – #96
Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao
Video >

MMTM: Multimodal Transfer Module for CNN Fusion – #111
Hamid Vaezi Joze, Amirreza Shaban, Michael Iuzzolino, Kazuhito Koishida
Video >

Poster 3.4 – Miscellaneous | 17:00 – 19:00 PDT

Density-Aware Graph for Deep Semi-Supervised Visual Recognition – #9
Suichan Li, Bin Liu, Dongdong Chen, Qi Chu, Lu Yuan, Nenghai Yu
Video >

PFCNN: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames – #27
Yuqi Yang, Shilin Liu, Hao Pan, Yang Liu, Xin Tong
Video >

MetaFuse: A Pre-trained Fusion Model for Human Pose Estimation – #38
Rongchang Xie, Chunyu Wang, Yizhou Wang
Video >

Workshops


June 14 | Full Day

International Workshop and Challenge on Computer Vision for Physiological Measurement
Co-Organizer: Daniel McDuff

Joint workshop on Long Term Visual Localization, Visual Odometry and Geometric and Learning-based SLAM
Co-Organizers: Marc Pollefeys, Johannes L. Schönberger, Pablo Speciale

The 1st International Workshop on Agriculture-Vision: Challenges & Opportunities for Computer Vision in Agriculture
Invited speakers and panelists: Ranveer Chandra, Sudipta Sinha

VizWiz Grand Challenge: Describing Images from Blind People
Co-Organizers: Ed Cutrell, Meredith Morris
Invited Speaker: Meredith Morris
Video >
Speaker panel video >
Panel discussion video >

Workshop on Fair, Data-Efficient and Trusted Computer Vision
Invited Speaker: Debadeepta Dey

June 14 | Afternoon

Women in Computer Vision (WiCV)
Co-Organizer: Azadeh Mobasher

June 15 | Full Day

3D Scene Understanding for Vision, Graphics, and Robotics
Invited Speaker: Marc Pollefeys

Fourth Workshop on Computer Vision for AR/VR
Invited Speaker: Jamie Shotton
Video >

New Trends in Image Restoration and Enhancement Workshop and Challenges (NTIRE)
Program Committee Members: Stephen Lin, Wenjun Zeng

June 19 | Morning

Image Matching: Local Features and Beyond
Co-Organizer: Johannes L. Schönberger

June 19 | Full Day

16th IEEE Workshop on Perception Beyond the Visible Spectrum
Program Committee Member: Katsu Ikeuchi

Learning From Unlabeled Videos
Co-Organizer: Yale Song

Computer Vision for Microscopy Image Analysis
Chair: Mei Chen
Program Committee Members: Hao Jiang, Gaurav Mittal, Xi Yin

First Workshop on Deep Learning Foundations of Geometric Shape Modeling and Reconstruction
Co-Organizer: Yang Liu

Extreme classification in computer vision
Co-Organizer: Manik Varma

Language & Vision with applications to Video Understanding
Co-Organizer: Licheng Yu

The 3rd Workshop and Prize Challenge: Bridging the Gap between Computational Photography and Visual Recognition (UG2+) in conjunction with IEEE CVPR 2020
Invited Speaker: Xi Yin

Visual Learning with Limited Labels
Accepted Paper: ePillID Dataset: A Low-Shot Fine-Grained Benchmark for Pill Identification
Naoto Usuyama, Natalia Larios Delgado, Amanda K. Hall, Jessica Lundin
Video >

Workshop on Multimodal Learning
Invited Speaker: Andrew Fitzgibbon

Tutorials


Monday, June 15

13:15 – 17:00 PDT
Recent Advances in Vision-and-Language Research
Co-organizers: Zhe Gan, Yu Cheng, Luowei Zhou, Linjie Li, Yen-Chun Chen, JJ Liu


Print your own copy of Alchemy with Friends to play at home.

Share your favorite card combinations using #AlchemyFriends on Twitter, Facebook, or Instagram. We now have three versions of the game available for you to play at home!


