October 27, 2022

Creation for Rich World Workshop

China Standard Time (GMT +8)

Location: Virtual


As our world becomes increasingly digitalized (especially in the era of the metaverse), there is growing interest in automatically creating content (visual, audio, text, etc.) to provide media-rich and immersive experiences.

This workshop invites top researchers and academics in this area to discuss recent progress, share their thoughts, and envision the future of content creation for a rich world. Discussion topics include:

  • Speech, music, sound, and spatial audio creation
  • Image, video, scene, and environment creation
  • Digital human creation

The goal of this workshop is to discuss, share, and learn from one another on the topics of content creation, helping participants better understand this area and identify future research opportunities.

Speakers

Tadas Baltrusaitis

Principal Scientist
Microsoft Mixed Reality and AI Lab

Jesse Engel

Staff Research Scientist
Google Research

Jiatao Gu

Researcher
Apple

Xiaoguang Han

Assistant Professor
The Chinese University of Hong Kong, Shenzhen

Seungyong Lee

Professor
POSTECH

Yang Liu

Principal Researcher
Microsoft Research Asia

Wei Ping

Principal Research Scientist
NVIDIA

Alexander Richard

Research Scientist
Meta Reality Labs Research

Xu Tan

Principal Research Manager
Microsoft Research Asia

Xin Tong

Partner Research Manager
Microsoft Research Asia

Baoyuan Wang

Cofounder & VP
Xiaobing.ai

Agenda

Time (CST) | Session | Speaker
09:30 AM | Welcome & Overview of Workshop | Xu Tan
Principal Research Manager
Microsoft Research Asia
Session 1: Speech/Music/Sound and Spatial Audio Creation
09:45 AM | Explore the Limit of Zero-shot Audio Synthesis with Large-scale GAN Training

Abstract: In this talk, I will present some state-of-the-art results for raw audio synthesis. We will compare different families of methods for the universal vocoding task and introduce BigVGAN, which generalizes well under various unseen conditions in a zero-shot setting.
Wei Ping
Principal Research Scientist
NVIDIA
10:05 AM | Magenta: Empowering Creativity in the Age of Machine Learning

Abstract: Since 2016, the Magenta research group has investigated the role of machine learning in empowering the creativity of artists, musicians, and novices alike. In this talk, we’ll examine recent advances by the group on interpretable decompositions for music understanding and generation, including state-of-the-art models in music transcription (MT3), composition (Perceiver AR), synthesis (Spectrogram Diffusion), and user interaction (DDSP-VST, MIDI-DDSP). Finally, we’ll explore how the combination of expressive generative models and intuitive controls can power a new generation of creative tools.
Jesse Engel
Staff Research Scientist
Google Research
10:25 AM | Neural Audio Rendering for Social Telepresence

Abstract: These days, physical distance between people is one of the biggest obstacles to maintaining meaningful social relationships with family, friends, and co-workers. Even with today’s technology, remote communication is limited to a two-dimensional audio-visual experience and lacks a shared, three-dimensional space in which people can interact with each other over distance. Our mission at Reality Labs Research (RLR) in Pittsburgh is to develop a telepresence system that is indistinguishable from reality, i.e., a system that provides photo- and phono-realistic social interactions in VR. Highly realistic spatial audio rendering is a key ingredient to achieve the desired level of realism. While computer graphics has long moved from traditional rendering to neural rendering, the audio community is just in the early stages of this process. I will discuss the advantages of neural sound rendering and outline the challenges in data collection for these typically data-hungry machine learning approaches. I will further demonstrate that the realism and accuracy of neural spatial audio methods exceed those of traditional signal processing. In the future, these technologies will help build a realistic virtual environment with lifelike avatars that allow for authentic social interactions, connecting people all over the world, anywhere and at any time.
Alexander Richard
Research Scientist
Meta Reality Labs Research
10:45 AM | Panel Discussion
Host: Jesse Engel
Panelists: Wei Ping, Alexander Richard, Xu Tan
11:30 AM | Lunch break
Session 2: Image/Video/Scene Creation
12:30 PM | Semantic Instance Reconstruction for 3D Scene Understanding

Abstract: 3D scene understanding and reconstruction play very important roles in many application scenarios, such as robot perception and AR/VR. Currently, most existing works treat the 3D scene as a single whole for reconstruction. In this talk, I will introduce our recent techniques that conduct reconstruction together with instance understanding, which we term “semantic instance reconstruction”. The main content covers three published works: Total3DUnderstanding (CVPR 2020), RfD-Net (CVPR 2021), and InstPIFu (ECCV 2022). All three target semantic instance reconstruction: Total3DUnderstanding focuses on reconstruction from single images, RfD-Net takes point clouds as input, and InstPIFu targets high-fidelity single-view reconstruction. Furthermore, I will also introduce our recent work on a 3D table-scene dataset.
Xiaoguang Han
Assistant Professor
The Chinese University of Hong Kong, Shenzhen
12:50 PM | Towards High-fidelity 3D Shape and Scene Generation

Abstract: Digital 3D content equipped with high-fidelity shape geometry, scene layout, and visual appearance is the basis for building the digital world and facilitating 3D understanding, interaction, and exploration. The main barriers to generating vibrant 3D content include the insufficient amount of labeled 3D data, the diversity of 3D representations, and the lack of editability. In this talk, we will present a set of our 3D generation works that incorporate neural-based 3D representations and generative models to overcome these barriers and create high-fidelity 3D shapes and scenes.
Yang Liu
Principal Researcher
Microsoft Research Asia
1:10 PM | Diffusion Models for Image and Neural Field Generation

Abstract: Diffusion probabilistic models have quickly become the de-facto choice for generative modeling of images, text, or 3D geometry. In this talk, we will introduce some of our recent works on generalizing diffusion models for more efficient and controllable image generation, as well as for general field data.
Jiatao Gu
Researcher
Apple
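The diffusion framework this talk builds on rests on a simple closed-form forward (noising) process. As an illustrative sketch only (the linear schedule and toy array below are standard textbook values, not taken from the talk or any specific paper), sampling x_t directly from x_0 can be written as:

```python
import numpy as np

# Forward process of a denoising diffusion model:
#   q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)
# so x_t can be sampled in closed form without iterating over steps.

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # illustrative linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # alpha_bar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) and return it with the noise used."""
    eps = rng.standard_normal(x0.shape)  # standard Gaussian noise
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))        # toy "image"
xt, eps = q_sample(x0, t=500, rng=rng)  # halfway through the schedule
```

A denoising network is then trained to predict `eps` from `xt` and `t`; by the end of the schedule `alpha_bars[-1]` is close to zero, so x_T is nearly pure noise.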
1:30 PM | Panel Discussion
Host: Xin Tong
Panelists: Xiaoguang Han, Yang Liu, Jiatao Gu
2:15 PM | Short break
Session 3: Digital Human Creation
2:25 PM | From Human to AI Being Intelligence: Challenges and Opportunities

Abstract: Digital humans, or AI beings (a term coined by Xiaobing.ai), have already shown tremendous potential value in driving digital transformation across various industry domains. They are an interesting testbed for building the next generation of artificial intelligence. In this talk, from an industry perspective, we will discuss the fundamental technical challenges and share some lessons learned from developing different AI beings under the Xiaoice avatar framework.
Baoyuan Wang
Cofounder & VP
Xiaobing.ai
2:45 PM | MLPs for Reconstruction and Control of Explicit and Detailed 3D Human Models

Abstract: MLPs (multi-layer perceptrons) have been widely used for reconstructing implicit 3D representations of human models. In this talk, in contrast, I will focus on utilizing MLPs for reconstructing and controlling explicit polygonal representations of 3D human models. I will first present an MLP-based framework for building a deformable surface model, which takes a latent code and produces a 3D caricature model. The framework captures the variations of 3D caricatures in a compact parameter space and provides a useful data-driven toolkit for handling 3D caricature deformations. I will then present LaplacianFusion, a novel framework that reconstructs a detailed and controllable 3D clothed human model from a point cloud sequence. In the framework, an MLP is used to learn and predict Laplacian coordinates representing the details on the body surface. The talk will conclude with a discussion of the pros and cons of implicit and explicit representations when they are combined with MLPs for reconstruction and control of 3D human models.
Seungyong Lee
Professor
POSTECH
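Laplacian coordinates, which the LaplacianFusion talk uses to represent surface details, encode each vertex relative to the average of its neighbors, so fine geometric detail survives even when the overall pose changes. A minimal sketch with uniform weights on a toy four-vertex ring (the connectivity and values are illustrative; the talk's learned MLP predictor is not reproduced here):

```python
import numpy as np

# Uniform-weight Laplacian coordinates:
#   delta_i = v_i - (1 / |N(i)|) * sum_{j in N(i)} v_j
# A vertex lying at the average of its neighbors has delta = 0; any
# displacement off that average (i.e., surface detail) shows up in delta.

vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 0.5]])   # last vertex lifted off the plane
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # ring connectivity

def laplacian_coords(v, nbrs):
    """Compute delta_i = v_i - mean of v_i's neighbor positions."""
    return np.array([v[i] - v[nbrs[i]].mean(axis=0) for i in range(len(v))])

delta = laplacian_coords(vertices, neighbors)
# delta[3] has a nonzero z-component: it encodes the lifted vertex's detail.
```

In a LaplacianFusion-style pipeline (as described in the abstract), an MLP would predict these `delta` vectors over the body surface, and vertex positions are then recovered by solving the corresponding linear system on a coarse base mesh.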
3:05 PM | Synthetic Data is All You Need: Face Analysis Using Synthetic Data Alone

Abstract: In this talk I will demonstrate how synthetic data alone can be used to perform face-related computer vision. The community has long enjoyed the benefits of synthesizing training data with graphics, but the domain gap between real and synthetic data has remained a problem, especially for human faces. I will show that it is possible to synthesize data with minimal domain gap, so that models trained on synthetic data generalize to real in-the-wild datasets. I will describe how to combine a procedurally-generated parametric 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism and diversity. I will show that models trained on synthetic data alone can match and even exceed models trained on real data in accuracy, and can open up new approaches where manual labelling would be impossible.
Tadas Baltrusaitis
Principal Scientist
Microsoft Mixed Reality and AI Lab
3:25 PM | Panel Discussion
Host: Xu Tan
Panelists: Baoyuan Wang, Seungyong Lee, Tadas Baltrusaitis, Xin Tong
4:10 PM | Next Steps/Closing Remarks

Workshop organizers

Xu Tan, Microsoft Research Asia
Xin Tong, Microsoft Research Asia
Miran Lee, Microsoft Research Asia

Microsoft’s Event Code of Conduct

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. This includes events Microsoft hosts and participates in, where we seek to create a respectful, friendly, and inclusive experience for all participants. As such, we do not tolerate harassing or disrespectful behavior, messages, images, or interactions by any event participant, in any form, at any aspect of the program including business and social activities, regardless of location.

We do not tolerate any behavior that is degrading to any gender, race, sexual orientation or disability, or any behavior that would violate Microsoft’s Anti-Harassment and Anti-Discrimination Policy, Equal Employment Opportunity Policy, or Standards of Business Conduct. In short, the entire experience at the venue must meet our culture standards. We encourage everyone to assist in creating a welcoming and safe environment. Please report any concerns, harassing behavior, or suspicious or disruptive activity to venue staff, the event host or owner, or event staff. Microsoft reserves the right to refuse admittance to or remove any person from company-sponsored events at any time in its sole discretion.