MSRA Academic Day 2019

About

The Academic Day 2019 event brings together the intellectual power of researchers from across Microsoft Research Asia and the academic community to attain a shared understanding of the contemporary ideas and issues facing the field of technology. Together, we will advance the frontier of technology towards an ideal world of computing.

Through our Microsoft Research Outreach Programs, Microsoft Research Asia has been actively collaborating with academic institutions to promote and progress further development in computer science and other technology domains. We have an ever-expanding partnership with leading universities across the Asia Pacific region to advance state-of-the-art research through various programs and initiatives.

We are excited for “Microsoft Research Asia Academic Day 2019” to facilitate comprehensive and insightful exchanges between Microsoft Research Asia and the academic community.

Program Chairs

  • Miran Lee

    Outreach Director

  • Yongqiang Xiong

    Principal Research Manager

  • Yunxin Liu

    Principal Research Manager

  • Tao Qin

    Senior Principal Research Manager

  • Wenjun Zeng

    Senior Principal Research Manager

Agenda

November 7

Workshop on System and Networking for AI

Abstract: We live in a world of connected entities, including various systems (from big cloud and edge systems to individual memory and disk systems) networked together. Innovations in systems and networking are key driving forces in the era of big data and artificial intelligence, empowering advanced intelligent algorithms with reliable, secure, scalable, and efficient computing capacity to process huge volumes of data. We have witnessed significant progress in cloud systems, and recently edge computing, in particular AI on the edge, has attracted increasing attention from both academia and industry. This workshop aims to report and discuss the most recent progress and trends in the general systems and networking area, especially various kinds of infrastructure support for machine learning systems.

Event owners: Yunxin Liu, Yongqiang Xiong

Time (CST) Workshops Speaker Location
2:00 PM–2:10 PM Welcome and introductions Yunxin Liu & Yongqiang Xiong, Microsoft Research Dong Zhi Men, Microsoft Tower 1-1F
2:10 PM–3:25 PM Research at Microsoft (25 mins per talk x3)
  • Peng Cheng, Microsoft Research
  • Ting Cao, Microsoft Research
  • Quanlu Zhang, Microsoft Research
3:25 PM–4:40 PM Research talks (25 mins per talk x3)
  • Chuan Wu, University of Hong Kong
  • Xuanzhe Liu, Peking University
  • Rajesh Krishna Balan, Singapore Management University
4:40 PM–5:20 PM Panel with discussion

Title: “What’s missing in system & networking for AI?”

  • Yunxin Liu, Microsoft Research (Moderator)
  • Yongqiang Xiong, Microsoft Research (Moderator)
  • Chuan Wu, University of Hong Kong
  • Xuanzhe Liu, Peking University
  • Rajesh Krishna Balan, Singapore Management University
  • Peng Cheng, Microsoft Research
  • Ting Cao, Microsoft Research
  • Quanlu Zhang, Microsoft Research
5:20 PM–5:30 PM Wrap-up and closing

Workshop on Low-Resource Machine Learning

Abstract: Deep learning has greatly driven this wave of AI. While it has made many breakthroughs in recent years, its success relies heavily on big labeled data, big models, and big computing. As edge computing becomes the trend and more and more IoT devices become available, deep learning faces the low-resource challenge: how to learn from limited labeled data, with limited model size and limited computational resources. The theme of this workshop is low-resource machine learning: learning from low-resource data, learning compact models, and learning with limited computational resources. The workshop aims to report the latest progress and discuss the trends and frontiers of research on low-resource machine learning.

Event owner: Tao Qin

Time (CST) Workshops Speaker Location
2:00 PM–2:10 PM Welcome and introductions Tao Qin, Microsoft Research Xi Zhi Men, Microsoft Tower 1-1F
2:10 PM–3:25 PM Research at Microsoft (25 mins per talk x3)
  • Yingce Xia, Microsoft Research
  • Xu Tan, Microsoft Research
  • Guolin Ke, Microsoft Research
3:25 PM–4:40 PM Research talks (25 mins per talk x3)
  • Jaegul Choo, Korea University
  • Sinno Jialin Pan, Nanyang Technological University
  • Sung Ju Hwang, KAIST
4:40 PM–5:20 PM Panel with discussion

Title: “Challenges and Future of Low-Resource Machine Learning”

  • Tao Qin, Microsoft Research (Moderator)
  • Jaegul Choo, Korea University
  • Sung Ju Hwang, KAIST
  • Shujie Liu, Microsoft Research
  • Dongdong Zhang, Microsoft Research
5:20 PM–5:30 PM Wrap-up and closing

Workshop on Multimodal Representation Learning and Applications

Abstract: We live in a world of multimedia (text, image, video, audio, sensor data, 3D, etc.). These modalities are integral components of real-world events and applications. A full understanding of multimedia relies heavily on feature learning, entity recognition, knowledge, reasoning, language representation, etc. Cross-modal learning, which requires joint feature learning and cross-modal relationship modeling, has attracted increasing attention from both academia and industry. This workshop aims to report and discuss the most recent progress and trends on multimodal representation learning for multimedia applications.

Event owners: Wenjun Zeng, Nan Duan

Time (CST) Workshops Speaker Location
2:00 PM–2:10 PM Welcome and introductions Wenjun Zeng, Microsoft Research Tian An Men, Microsoft Tower 1-1F
2:10 PM–3:25 PM Research at Microsoft (25 mins per talk x3)
  • Nan Duan, Microsoft Research
  • Yue Cao, Microsoft Research
  • Chong Luo, Microsoft Research
3:25 PM–4:40 PM Research talks (25 mins per talk x3)
  • Gunhee Kim, Seoul National University
  • Winston Hsu, National Taiwan University
  • Jiwen Lu, Tsinghua University
4:40 PM–5:20 PM Panel with discussion

Title: “Opportunities and Challenges for Cross-Modal Learning”

  • Wenjun Zeng, Microsoft Research (Moderator)
  • Xilin Chen, Chinese Academy of Sciences
  • Winston Hsu, National Taiwan University
  • Gunhee Kim, Seoul National University
  • Nan Duan, Microsoft Research
5:20 PM–5:30 PM Wrap-up and closing

November 8

Time (CST) Workshops Speaker Location
09:00 – 09:30 Welcome & MSRA Overview Hsiao-Wuen Hon Gu Gong, Microsoft Tower 1-1F
09:30 – 09:40 Fellowship Award Ceremony Presenter: Hsiao-Wuen Hon
09:40 – 10:00 Photo session & Break
10:00 – 10:40 Panel Discussion

Title: “How to foster a computer scientist”

Moderator: Tim Pan, Microsoft Research

Panelists:

  • Bohyung Han, Seoul National University
  • Junichi Rekimoto, The University of Tokyo
  • Winston Hsu, National Taiwan University
  • Xin Tong, Microsoft Research
10:40 – 11:55 Technology Showcase by Microsoft Research Asia (5 demos)
  • “OneOCR For Digital Transformation” by Qiang Huo
  • “NN grammar check” by Tao Ge
  • “AutoSys: Learning based approach for system optimization” by Mao Yang
  • “Dual learning and its application in translation and speech from ML” by Tao Qin (Yingce Xia and Xu Tan)
  • “Spreadsheet Intelligence for Ideas in Excel” by Shi Han
12:00 – 14:00 Technology Showcase by Academic Collaborators Lunch, Microsoft Tower 1-1F
14:00 – 17:30 Breakout Sessions
Language and Knowledge Leader: Xing Xie

Speakers: Seung-won Hwang, Min Zhang, Lei Chen, Masatoshi Yoshikawa, Shou-De Lin, Rui Yan, Hiroaki Yamane, Chenhui Chu, Tadashi Nomoto

Zhong Guan Cun, Microsoft Tower 2-4F
System and Networking Leaders: Lidong Zhou, Yunxin Liu

Speakers: Insik Shin, Wenfei Wu, Rajesh Krishna Balan, Youyou Lu, Chuck Yoo, Yu Zhang, Atsuko Miyaji, Jingwen Leng, Yao Guo, Heejo Lee, Cheng Li

San Li Tun, Microsoft Tower 2-4F
Computer Vision Leader: Wenjun Zeng

Speakers: Gunhee Kim, Tianzhu Zhang, Yonggang Wen, Wen-Huang Cheng, Jiaying Liu, Bohyung Han, Wei-Shi Zheng, Jun Takamatsu, Xueming Qian

Qian Men, Microsoft Tower 2-4F
Graphics Leader: Xin Tong

Speakers: Min H. Kim, Seungyong Lee, Sung-eui Yoon

Di Tan, Microsoft Tower 2-4F
Multimedia Leader: Yan Lu

Speakers: Seung Ah Lee, Huanjing Yue, Hiroki Watanabe, Minsu Cho, Zhou Zhao, Seungmoon Choi

Gu Lou, Microsoft Tower 2-4F
Healthcare Leader: Eric Chang

Speakers: Ryo Furukawa, Winston Hsu

Dong Cheng, Microsoft Tower 2-4F
Data, Knowledge, and Intelligence Leaders: Jian-Guang Lou, Qingwei Lin

Speakers: Shixia Liu, Huamin Qu, Jong Kim, Yingcai Wu

Xi Cheng, Microsoft Tower 2-4F
Machine Learning Leader: Tao Qin

Speakers: Hongzhi Wang, Seong-Whan Lee, Sinno Jialin Pan, Lijun Zhang, Jaegul Choo, Mingkui Tan, Liwei Wang

Ri Tan, Microsoft Tower 2-4F
Speech Leader: Frank Soong

Speakers: Jun Du, Hong-Goo Kang

Guo Zi Jian, Microsoft Tower 2-4F
17:30 – 18:00 Transition Break
18:15 – 20:30 Banquet Ballroom located at 3F, Tylfull Hotel

Abstracts

Workshops

AI Platform Acceleration with Programmable Hardware

Speaker: Peng Cheng, Microsoft Research

Programmable hardware has been used to build high-throughput, low-latency real-time AI engines such as Brainwave. Rather than the AI engine itself, we focus on solving AI-platform bottlenecks, such as storage and networking I/O, model distribution, synchronization, and data pre-processing in machine learning tasks, with acceleration from programmable hardware. Our proposed system enables direct hardware-assisted device-to-device interconnection with inline processing. We chose FPGAs for our first prototype of a general platform for AI acceleration, since FPGAs have been widely deployed in Azure to achieve high performance at much lower cost. Our system can accelerate AI in many aspects: it already enables GPUs to fetch training data directly from storage into GPU memory, bypassing costly CPU involvement, and, acting as an intelligent hub, it can also perform inline data pre-processing efficiently. More acceleration scenarios are under development, including in-network inference acceleration and a hardware parameter server for distributed machine learning.

Audio captioning and knowledge-grounded conversation

Speaker: Gunhee Kim, Seoul National University

In this talk, I will introduce two recent works on NLP from the Vision and Learning Lab at Seoul National University. First, we present our work exploring audio captioning: generating natural-language descriptions for any kind of audio in the wild, a problem that has been surprisingly unexplored in previous research. We not only contribute a large-scale dataset of about 46K pairs of audio clips and human-written text collected via crowdsourcing, but also propose two novel components that improve the audio-captioning performance of attention-based neural models. Second, I will discuss our work on knowledge-grounded dialogue, in which we address the problem of better modeling knowledge selection in multi-turn knowledge-grounded dialogue. We propose a sequential latent variable model as the first approach to this matter. Our experimental results show that the proposed model improves knowledge-selection accuracy and, subsequently, the performance of utterance generation.

Building Large-Scale Decentralized Intelligent Software Systems

Speaker: Xuanzhe Liu, Peking University

We are in a fast-growing flood of “data”, and we benefit significantly from the “intelligence” derived from it. Such intelligence heavily relies on the centralized paradigm, i.e., cloud-based systems and services. However, we are also at the dawn of an emerging “decentralized” fashion that makes intelligence more pervasive and even “handy” on smartphones, wearables, and IoT devices, along with collaborations among them and the cloud. This talk discusses technical challenges and opportunities in building decentralized intelligence, mostly from a software-system perspective, covering programming abstraction, performance, privacy, energy, and interoperability. We also share our recent efforts on building such software systems and our industrial experiences.

Coloring with Limited Data: Few-Shot Colorization via Memory-Augmented Networks

Speaker: Jaegul Choo, Korea University

Despite recent advances in deep learning-based automatic colorization, existing models are still limited when it comes to few-shot learning, as they require a significant amount of training data. To tackle this issue, we present a novel memory-augmented colorization model that produces high-quality colorization with limited data. In particular, our model can capture rare instances and successfully colorize them. We also propose a novel threshold triplet loss that enables unsupervised training of memory networks without the need for class labels. Experiments show that our model achieves superior quality in both few-shot and one-shot colorization tasks.

FastSpeech: Fast, Robust and Controllable Text to Speech

Speaker: Xu Tan, Microsoft Research

Neural network-based end-to-end text-to-speech (TTS) has significantly improved the quality of synthesized speech. However, such end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control). In this work, we propose a novel feed-forward network based on the Transformer to generate mel-spectrograms in parallel for TTS. Experiments show that our parallel model matches autoregressive models in speech quality, nearly eliminates word skipping and repetition in particularly hard cases, and can adjust voice speed smoothly. Most importantly, compared with autoregressive Transformer TTS, our model speeds up mel-spectrogram generation by 270x and end-to-end speech synthesis by 38x.
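Parallel generation of this kind hinges on a length regulator that repeats each phoneme's hidden state by its predicted duration, and scaling those durations is what adjusts voice speed. A toy sketch of that idea (names, shapes, and the rounding rule are illustrative, not the paper's implementation):

```python
# Toy sketch of a FastSpeech-style length regulator: each phoneme's hidden
# state is repeated by its predicted duration so all mel frames can be
# produced in parallel; scaling durations adjusts voice speed.
def length_regulate(hidden_states, durations, speed=1.0):
    """Expand per-phoneme states into per-frame states.
    speed > 1.0 shortens durations (faster speech); < 1.0 lengthens them."""
    frames = []
    for h, d in zip(hidden_states, durations):
        n = max(1, round(d / speed))  # at least one frame per phoneme
        frames.extend([h] * n)
    return frames

states = ["h1", "h2", "h3"]    # per-phoneme encoder outputs
durations = [2, 1, 3]          # predicted number of frames per phoneme
normal = length_regulate(states, durations)           # 6 frames
fast = length_regulate(states, durations, speed=2.0)  # fewer frames
```

Because every output frame depends only on its source phoneme and duration, all frames can be computed at once rather than autoregressively.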

Improving the Performance of Video Analytics Using WiFi Signals

Speaker: Rajesh Krishna Balan, Singapore Management University

Automatic analysis of the behaviour of large groups of people is a key requirement for applications such as crowd management, traffic control, and surveillance. For example, attributes such as the number of people, how they are distributed, which groups they belong to, and what trajectories they take can be used to optimize the layout of a mall to increase overall revenue. A common way to obtain these attributes is to use video camera feeds coupled with advanced video analytics. However, relying solely on video feeds is challenging in high-density areas, such as a typical mall in Asia, where crowding significantly reduces the effectiveness of video analytics due to factors such as occlusion. In this work, we propose to combine video feeds with WiFi data to better estimate the number of people in an area and their trajectories. In particular, we believe that our approach will combine the strengths of the two different sensors, WiFi and video, while reducing the weaknesses of each. This work started fairly recently, and we will present our thoughts and current results.

Learning Beyond 2D Images

Speaker: Winston Hsu, National Taiwan University

We have observed super-human capabilities from current (2D) convolutional networks on images, for both discriminative and generative models. In this talk, we will show our recent attempts at visual cognitive computing beyond 2D images. We will first demonstrate the huge opportunities in augmenting learning with temporal cues, 3D (point cloud) data, raw data, audio, etc., in emerging domains such as entertainment, security, healthcare, and manufacturing. In an explainable manner, we will justify how to design neural networks that leverage these novel (and diverse) modalities, and demystify the pros and cons of these signals. We will showcase tangible applications ranging from video QA and robotic object referring to situation understanding and autonomous driving. We will also review the lessons learned in designing advanced neural networks that accommodate multimodal signals in an end-to-end manner.

LightGBM: A highly efficient gradient boosting machine

Speaker: Guolin Ke, Microsoft Research

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm that is widely used in real-world applications. We open-sourced LightGBM, which contains many critical optimizations for efficient GBDT training and has become one of the most popular GBDT tools. In this talk, I will introduce the key technologies behind LightGBM.
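As background, gradient boosting fits each new tree to the residuals (negative L2 gradients) of the current ensemble. A minimal pure-Python sketch with one-dimensional regression stumps, purely illustrative; LightGBM's actual contributions (histogram-based split finding, GOSS, EFB, leaf-wise growth) go far beyond this toy:

```python
# Minimal sketch of gradient boosting with regression stumps (illustrative only).
def fit_stump(xs, residuals):
    """Return the 1-D threshold split minimizing squared error on residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x, t=t, lv=lv, rv=rv: lv if x <= t else rv

def gbdt_fit(xs, ys, n_rounds=20, lr=0.3):
    """Fit each stump to the residuals of the ensemble so far."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]  # negative gradient of L2 loss
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

model = gbdt_fit([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
```

Each round shrinks the residuals geometrically (by the learning rate), which is why even weak stumps converge to a good fit after a handful of rounds.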

MobiDL: Unleash the Mobile CPU Computing Power for Deep Learning Inference

Speaker: Ting Cao, Microsoft Research

Deep learning (DL) models are increasingly deployed in real-world applications on mobile devices. However, current mobile DL frameworks neglect CPU asymmetry, and the CPUs are seriously underutilized. We propose MobiDL for mobile DL inference, targeting improved CPU utilization and energy efficiency through novel designs for hardware asymmetry and appropriate frequency settings. It integrates four main techniques: 1) cost-model-directed matrix block partition; 2) prearranged memory layout for model parameters; 3) asymmetry-aware task scheduling; and 4) data-reuse-based CPU frequency setting. During a one-time initialization, MobiDL configures the proper block partition, parameter layout, and efficient frequency for a DL model. During inference, MobiDL's scheduler balances tasks to fully utilize all CPU cores. Evaluation shows that for CNN models, MobiDL achieves 85% performance and 72% energy-efficiency improvement on average compared to default TensorFlow. For RNNs, it achieves up to 17.51x performance and 8.26x energy-efficiency improvement.

Multi-agent dual learning

Speaker: Yingce Xia, Microsoft Research

Dual learning is our recently proposed framework in which a primal task (e.g., Chinese-to-English translation) and a dual task (e.g., English-to-Chinese translation) are jointly optimized through a feedback signal. We extend standard dual learning to multi-agent dual learning, in which multiple models for the primal task and multiple models for the dual task are evolved. In this case, the feedback signal is enhanced and we obtain better performance. Experimental results in low-resource settings show that our method works well. In the WMT’19 machine translation competition, we won four top places using multi-agent dual learning.
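The feedback signal in dual learning typically comes from how well the dual task reconstructs the primal input. A deliberately tiny, hypothetical illustration of that round-trip signal; the word-level dictionary "translators" stand in for real models:

```python
# Toy illustration (not the actual method) of a dual-learning feedback signal:
# translate forward, translate back, and use the reconstruction rate as a
# reward for both models. The dictionaries below are purely hypothetical.
fwd = {"hello": "bonjour", "world": "monde", "cat": "chat"}
bwd = {"bonjour": "hello", "monde": "world"}  # imperfect dual model

def round_trip_score(sentence):
    """Fraction of words recovered after a forward + backward translation."""
    words = sentence.split()
    forward = [fwd.get(w, w) for w in words]
    back = [bwd.get(w, w) for w in forward]
    return sum(a == b for a, b in zip(words, back)) / len(words)

perfect = round_trip_score("hello world")  # fully recovered
lossy = round_trip_score("hello cat")      # "chat" cannot be back-translated
```

In the actual framework this reconstruction signal is a differentiable reward over real translation models; with multiple agents per direction, the averaged signal is less noisy.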

Multi-view Deep Learning for Visual Content Understanding

Speaker: Jiwen Lu, Tsinghua University

In this talk, I will overview the trend of multi-view deep learning techniques and discuss how they are used to improve the performance of various visual content understanding tasks. Specifically, I will present three multi-view deep learning approaches: multi-view deep metric learning, multi-modal deep representation learning, and multi-agent deep reinforcement learning, and show how these methods are used for visual content understanding tasks. Lastly, I will discuss some open problems in multi-view deep learning to show how to further develop more advanced multi-view deep learning methods for computer vision in the future.

NNI: An open source toolkit for neural architecture search and hyper-parameter tuning

Speaker: Quanlu Zhang, Microsoft Research

Recent years have witnessed the great success of deep learning in a broad range of applications, and model tuning has become a key step in finding good models. To be effective in practice, a system is needed to facilitate this tuning procedure, in terms of both programming effort and search efficiency. We therefore open-sourced NNI (Neural Network Intelligence), a toolkit for neural architecture search and hyper-parameter tuning, which provides an easy-to-use interface and rich built-in AutoML algorithms. Moreover, it is highly extensible to support new tuning algorithms and requirements. With high scalability, many trials can run in parallel on various training platforms.
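To make the tuning loop concrete, here is a hypothetical sketch of what such a toolkit automates: sampling configurations from a search space, running trials, and tracking the best result. The names and the plain random-search strategy are illustrative and do not reflect NNI's actual API:

```python
import random

# Hypothetical trial loop for hyper-parameter tuning (illustrative only).
search_space = {
    "lr": [0.001, 0.01, 0.1],
    "hidden_units": [32, 64, 128],
}

def run_trial(config):
    # Stand-in for training and evaluating a model; returns a fake
    # validation score that peaks at lr=0.01, hidden_units=64.
    return 1.0 - abs(config["lr"] - 0.01) - abs(config["hidden_units"] - 64) / 1000

def random_search(n_trials=30, seed=0):
    """Sample configs, run trials, keep the best (trials could run in parallel)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in search_space.items()}
        score = run_trial(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search()
```

A real toolkit replaces the sampling step with pluggable tuners (Bayesian optimization, evolution, etc.) and dispatches trials to remote training platforms.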

Pre-training for Video-Language Cross-Modal Tasks

Speaker: Chong Luo, Microsoft Research

Video-language cross-modal tasks have received increasing interest in recent years, from video retrieval and video captioning to spatio-temporal localization in video by language query. In this talk, we will present research on and applications of some of these tasks. We will show how pre-trained single-modality models have made these tasks tractable and discuss the paradigm shift in deep neural network design brought by pre-trained models. In addition, we propose a universal cross-modality pre-training framework that may benefit a wide range of video-language tasks. We hope that our work will inspire other researchers in solving these interesting but challenging cross-modal tasks.

Resource Scheduling for Distributed Deep Training

Speaker: Chuan Wu, University of Hong Kong

More and more companies and institutions run AI clouds and machine-learning clusters with a variety of ML training workloads to support AI-driven services. Efficient resource scheduling is key to maximizing the performance of ML workloads, as well as the hardware efficiency of these very expensive clusters. There is substantial room to improve today's ML cluster schedulers, e.g., by including interference awareness in task placement and by scheduling not only computation but also communication. In this talk, I will share our recent work on designing deep learning job schedulers for ML clusters, aiming to expedite training and minimize training completion time. Our schedulers decide communication scheduling, the number of workers/PSs, and the placement of workers/PSs for jobs in the cluster, through both heuristics with theoretical support and reinforcement learning approaches.

Transferable Recursive Neural Networks for Fine-grained Sentiment Analysis

Speaker: Sinno Jialin Pan, Nanyang Technological University

In fine-grained sentiment analysis, extracting aspect terms and opinion terms from user-generated text is the most fundamental task for generating structured opinion summaries. Existing studies have shown that syntactic relations between aspect and opinion words play an important role in their extraction. However, most prior work either relied on pre-defined rules or separated relation mining from feature learning. Moreover, these works focused on single-domain extraction and failed to adapt well to other domains of interest, where only unlabeled data is available. In real-world scenarios, annotated resources are extremely scarce for many domains and languages. In this talk, I will introduce our recent series of works on transfer learning for cross-domain and cross-language fine-grained sentiment analysis based on recursive neural networks.

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

Speaker: Yue Cao, Microsoft Research

We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as its backbone and extends it to take both visual and linguistic embedded features as input. Each element of the input is either a word from the input sentence or a region-of-interest (RoI) from the input image. The model is designed to fit most visual-linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions dataset together with a text-only corpus. Extensive empirical analysis demonstrates that the pre-training procedure better aligns visual-linguistic clues and benefits downstream tasks such as visual commonsense reasoning, visual question answering, and referring expression comprehension.

When Language Meets Vision: Multi-modal NLP with Visual Contents

Speaker: Nan Duan, Microsoft Research

In this talk, I will introduce our latest work on multi-modal NLP, including (i) multi-modal pre-training, which aims to learn the joint representations between language and visual contents; (ii) multi-modal reasoning, which aims to handle complex queries by manipulating knowledge extracted from language and visual contents; (iii) video-based QA/summarization, which aims to make video contents readable and searchable.

Breakout Sessions

Adaptive Regret for Online Learning

Speaker: Lijun Zhang, Nanjing University

To deal with changing environments, a new performance measure, adaptive regret, defined as the maximum static regret over any interval, has been proposed in online learning. In the setting of online convex optimization, several algorithms have been developed to minimize adaptive regret. However, existing algorithms are problem-independent and lack universality. In this talk, I will briefly introduce our two contributions in this direction: first, establishing problem-dependent bounds on adaptive regret by exploiting the smoothness condition; second, designing a universal algorithm that can handle multiple types of functions simultaneously.
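Concretely, adaptive regret as described above can be written in the standard form (notation assumed, not taken from the talk: loss functions $f_t$, learner's decisions $x_t$, decision set $\mathcal{X}$, horizon $T$):

```latex
\text{A-Regret}(T) \;=\; \max_{1 \le s \le e \le T}
\left( \sum_{t=s}^{e} f_t(x_t) \;-\; \min_{x \in \mathcal{X}} \sum_{t=s}^{e} f_t(x) \right)
```

Taking the maximum over all intervals $[s, e]$, rather than only $[1, T]$ as in static regret, is what forces the learner to track changing environments.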

Advances and Challenges on Human-Computer Conversational Systems

Speaker: Rui Yan, Peking University

Nowadays, automatic human-computer conversational systems attract great attention from both industry and academia. Intelligent products such as XiaoIce (by Microsoft) have been released, and numerous artificial-intelligence companies have been founded. The technology behind conversational systems is accumulating and gradually opening to the public. Through researchers' investigation, conversational systems are more than science fiction: they have become real. It is interesting to review the recent advances in human-computer conversational systems, especially the significant changes brought by deep learning techniques, and exciting to anticipate future developments and challenges.

AI and Data: A Closed Loop

Speaker: Hongzhi Wang, Harbin Institute of Technology

Data is the foundation of modern artificial intelligence (AI). Efficient and effective AI requires the support of data acquisition, governance, management, analytics, and mining, which brings new challenges. From another perspective, advances in AI provide new opportunities to increase the automation of data processing. Thus, AI and data form a closed loop and promote each other. In this talk, the speaker will demonstrate the mutual promotion of AI and data with examples and discuss further opportunities to advance both areas.

Artificial Intelligence for Fashion

Speaker: Wen-Huang Cheng, National Chiao Tung University

The fashion industry is one of the biggest in the world, representing over 2 percent of global GDP (2018). Artificial intelligence (AI) has become a predominant theme in the fashion industry and is impacting its every part, at scales from personal to industrial and beyond. In recent years, my research group and I have devoted ourselves to advanced AI research that helps revolutionize the fashion industry, enabling innovative applications and services with improved user experience. In this talk, I will give an overview of the major outcomes of our research and discuss what research subjects we can further pursue together with Microsoft researchers to make a new impact in the fashion domain.

BERT is not all you need

Speaker: Seung-won Hwang, Yonsei University

This talk is inspired by a question about my talk at the MSRA Faculty Summit last year, where I presented NLP models in which injecting (diverse forms of) knowledge meaningfully enhances accuracy and robustness. Chin-Yew then asked: “Do you think BERT implicitly contains all this information already?” This talk is an extended investigation supporting the short answer I gave at the time. The title is a spoiler.

Big Data, AI and HI, What is Next?

Speaker: Lei Chen, Hong Kong University of Science and Technology

Recently, AI has become quite popular and attractive, not only in academia but also in industry. The success stories of AI in various applications have raised significant public interest. Meanwhile, human intelligence is turning out to be more sophisticated, and big-data technology is everywhere, improving our quality of life. The question we all want to ask is: what is next? In this talk, I will discuss DHA, a new computing paradigm that combines big Data, Human intelligence, and AI. Specifically, I will first briefly explain the motivation for DHA, then present its challenges, and finally highlight some possible solutions for building such a new paradigm.

Combinatorial Inference against Label Noise

Speaker: Bohyung Han, Seoul National University

Label noise is one of the critical sources that significantly degrade the generalization performance of deep neural networks. To handle label noise in a principled way, we propose a classification framework that constructs multiple models in heterogeneous coarse-grained meta-class spaces and makes a joint inference over the trained models for the final predictions in the original (base) class space. Our approach reduces the noise level simply by constructing meta-classes, and improves accuracy via combinatorial inference over multiple constituent classifiers. Since the proposed framework has distinct and complementary properties for the given problem, we can even incorporate additional off-the-shelf learning algorithms to improve accuracy further. We also introduce techniques for organizing multiple heterogeneous meta-class sets using k-means clustering and for identifying a desirable subset that leads to compact models. Our extensive experiments demonstrate outstanding performance in terms of accuracy and efficiency compared to state-of-the-art methods, under various synthetic noise configurations and on a real-world noisy dataset.

Communication-Efficient Geo-Distributed Multi-Task Learning

Speaker: Sinno Jialin Pan, Nanyang Technological University

Multi-task learning aims to learn multiple tasks jointly by exploiting their relatedness to improve the generalization performance of each task. Traditionally, performing multi-task learning requires centralizing the data from all tasks on a single machine. However, in many real-world applications, the data of different tasks is owned by different organizations and geo-distributed across local machines. Due to the heavy communication required to transmit the data, and to data privacy and security concerns, it is impossible to send the data of different tasks to a master machine to perform multi-task learning. In this talk, we present our recent work on distributed multi-task learning, which jointly learns multiple tasks in the parameter-server paradigm without sharing any training data, and has a theoretical guarantee of convergence to the solution obtained by the corresponding centralized multi-task learning algorithm.

Compact Snapshot Hyperspectral Imaging with Diffracted Rotation

Speaker: Min H. Kim, KAIST

Traditional snapshot hyperspectral imaging systems include various optical elements: a dispersive optical element (prism), a coded aperture, several relay lenses, and an imaging lens, resulting in an impractically large form factor. We seek an alternative, minimal form factor of snapshot spectral imaging based on recent advances in diffractive optical technology. We thereupon present a compact, diffraction-based snapshot hyperspectral imaging method, using only a novel diffractive optical element (DOE) in front of a conventional, bare image sensor. Our diffractive imaging method replaces the common optical elements in hyperspectral imaging with a single optical element. To this end, we tackle two main challenges: First, the traditional diffractive lenses are not suitable for color imaging under incoherent illumination due to severe chromatic aberration because the size of the point spread function (PSF) changes depending on the wavelength. By leveraging this wavelength-dependent property alternatively for hyperspectral imaging, we introduce a novel DOE design that generates an anisotropic shape of the spectrally-varying PSF. The PSF size remains virtually unchanged, but instead the PSF shape rotates as the wavelength of light changes. Second, since there is no dispersive element and no coded aperture mask, the ill-posedness of spectral reconstruction increases significantly. Thus, we propose an end-to-end network solution based on the unrolled architecture of an optimization procedure with a spatial-spectral prior, specifically designed for deconvolution-based spectral reconstruction. Finally, we demonstrate hyperspectral imaging with a fabricated DOE attached to a conventional DSLR sensor. Results show that our method compares well with other state-of-the-art hyperspectral imaging methods in terms of spectral accuracy and spatial resolution, while our compact, diffraction-based spectral imaging method uses only a single optical element on a bare image sensor.

ContextDM: Context-aware Permanent Data Management Framework for Android

Speaker: Jong Kim, Pohang University of Science and Technology (POSTECH)

The data management practices of third-party apps have failed in terms of manageability and security, because modern systems cannot provide fine-grained data management and protection without an understanding of the stored data. As a result, users suffer from storage shortages, data stealing, and data tampering.

To tackle this problem, we propose a novel and general data management framework, ContextDM, that sheds light on storage by helping system services and storage aid-apps gain a better understanding of permanent data. Specifically, the framework annotates permanent data with metadata that captures contextual semantic information about the importance and sensitivity of the data. Further, we show the effectiveness of our framework by demonstrating ContextDM-based aid-tools that automatically identify important, useless, and disclosed sensitive data.

Controlling Deep Natural Language Generation Models

Speaker: Shou-De Lin, National Taiwan University

Deep neural network based solutions have recently shown promising results in natural language generation. From autoencoders to Seq2Seq models to GAN-based solutions, deep learning models can already generate text that passes the Turing test, making their outputs indistinguishable from human-generated ones. However, researchers have pointed out that the content generated by deep neural networks can be fairly unpredictable, meaning that it is non-trivial for humans to control the outputs being generated. This talk discusses how to control the outputs of an NLG model and demonstrates some of our recent works along this line.

Cross-lingual Visual Grounding and Multimodal Machine Translation

Speaker: Chenhui Chu, Osaka University

In this talk, we will introduce two of our recent works on multilingual and multimodal processing: cross-lingual visual grounding and multimodal machine translation. Visual grounding is a vision-and-language understanding task that aims to locate a region in an image according to a specific query phrase. We will present our work on cross-lingual visual grounding, which expands the task to different languages. In addition, we will introduce our work on multimodal machine translation, which incorporates semantic image regions with both visual and textual attention.

Cryptography-based security solutions for the Internet of Things

Speaker: Atsuko Miyaji, Osaka University

The consequences of security failures in the era of the Internet of Things (IoT) can be catastrophic, as demonstrated by a rapidly growing list of IoT security incidents. As a result, people have begun to recognize the importance and value of bringing the highest level of security to IoT. Conventional wisdom holds that, though technologically superior, public-key cryptography (PKC) is too expensive to deploy on IoT devices and networks. In this talk, we present our cost-effective improvement of elliptic curve cryptography (ECC) in terms of memory and computational resources.
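
As a minimal illustration of the arithmetic ECC rests on, here is textbook affine point addition and double-and-add scalar multiplication over a toy curve. The curve and point are invented for illustration; real IoT deployments use standardized curves and the constant-time, memory-optimized techniques this talk concerns.

```python
# Toy affine arithmetic on y^2 = x^3 + a*x + b over GF(p).
p, a, b = 97, 2, 3

def is_on_curve(P):
    if P is None:                       # None encodes the point at infinity
        return True
    x, y = P
    return (y * y - (x * x * x + a * x + b)) % p == 0

def point_add(P, Q):
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                     # P + (-P) = infinity
    if P == Q:                          # doubling: tangent-line slope
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                               # addition: chord slope
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (m * (x1 - x3) - y1) % p
    return (x3, y3)

def scalar_mult(k, P):                  # double-and-add
    R = None
    while k:
        if k & 1:
            R = point_add(R, P)
        P = point_add(P, P)
        k >>= 1
    return R

G = (3, 6)                              # a point on this toy curve
```

The cost of `scalar_mult` (field inversions, multiplications, temporary storage) is exactly the budget that memory- and computation-efficient ECC designs target.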

Deep Efficient Image (Video) Restoration

Speaker: Huanjing Yue, Tianjin University

In this talk, I will introduce our team’s work on image (video) denoising and demoiréing.

Realistic noise, which is introduced when capturing images under high ISO modes or low-light conditions, is more complex than Gaussian noise and therefore difficult to remove. By exploring spatial, channel, and temporal correlations via deep CNNs, we can efficiently remove noise from images and videos. We construct two datasets to facilitate research on realistic noise removal for images and videos.

Moiré patterns, caused by aliasing between the grid of the display device and the camera's sensor array, greatly degrade the visual quality of recaptured screen images. Considering that the recaptured screen image and the original screen content usually differ greatly in brightness, we construct a moiré removal and brightness improvement (MRBI) database with moiré-free and moiré image pairs to facilitate supervised learning and quantitative evaluation. Correspondingly, we propose a CNN-based moiré removal and brightness improvement method. Our work provides a benchmark dataset and a good baseline method for the demoiréing task.

Deep Reinforcement Learning for the Transfer from Simulation to the Real World with Uncertainties for AI Curling Robot System

Speaker: Seong-Whan Lee, Korea University

Recently, deep reinforcement learning (DRL) has enabled real-world applications such as robotics. Here we teach a robot to succeed in curling (an Olympic discipline), which is a highly complex real-world application where a robot needs to carefully learn to play the game on a slippery ice sheet in order to compete well against human opponents. This scenario encompasses fundamental challenges: uncertainty, non-stationarity, infinite state spaces and, most importantly, scarce data. One fundamental objective of this study is thus to better understand and model the transfer from simulation to real-world scenarios with uncertainty. We demonstrate our proposed framework and show videos, experiments and statistics about Curly, our AI curling robot, being tested on a real curling ice sheet. Curly performed well both in classical game situations and when interacting with human opponents.

Development of a 3D endoscopic system with multi-frame, wide-area scanning capabilities

Speaker: Ryo Furukawa, Hiroshima City University

For effective in situ endoscopic diagnosis and treatment, or robotic surgery, 3D endoscopic systems have attracted much research attention. We have been developing a 3D endoscopic system based on an active stereo technique, which projects a special pattern wherein each feature is coded. We believe it is a promising approach because of its simplicity and high precision. However, previous work on this approach has two problems. First, the quality of 3D reconstruction depended on the stability of feature extraction from the images captured by the endoscope camera. Second, due to the limited pattern projection area, the reconstructed region was relatively small. In this talk, we describe a learning-based technique using CNNs to solve the first problem, and an extended bundle adjustment technique, which integrates multiple shapes into a consistent single shape, to address the second. The effectiveness of the proposed techniques compared to previous techniques was evaluated experimentally.

Differential Privacy for Spatial and Temporal Data

Speaker: Masatoshi Yoshikawa, Kyoto University

Differential Privacy (DP) has received increased attention as a rigorous privacy framework. In this talk, we introduce our recent studies on extending DP to spatio-temporal data. The topics include i) DP mechanisms under temporal correlations in the context of continuous data release; and ii) location privacy for location-based services over road networks.
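
A minimal sketch of the basic building block, the Laplace mechanism, applied to toy per-segment visit counts. The talk's contributions go well beyond this: under temporal correlations, naively adding independent noise at every release as below leaks more than the nominal epsilon suggests.

```python
import numpy as np

def laplace_mechanism(counts, epsilon, sensitivity=1.0):
    """Release noisy counts under epsilon-DP: one user's single visit changes
    one count by at most `sensitivity`, so Laplace noise with scale
    sensitivity/epsilon suffices for a single release."""
    rng = np.random.default_rng(0)     # fixed seed for the demo only
    return counts + rng.laplace(0.0, sensitivity / epsilon, size=counts.shape)

true_counts = np.array([120.0, 45.0, 9.0, 300.0])  # visits per road segment (toy)
noisy = laplace_mechanism(true_counts, epsilon=1.0)
```

Smaller epsilon means larger noise scale and stronger privacy; the road-network setting additionally requires that indistinguishability respect network (not Euclidean) distance.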

Dissecting and Accelerating Neural Network via Graph Instrumentation

Speaker: Jingwen Leng, Shanghai Jiao Tong University

Despite the enormous success of deep neural networks, there is still no solid understanding of their working mechanism. As such, a fundamental question arises: how should architects and system developers perform optimizations centering on DNNs? Treating them as black boxes leads to efficiency and security issues: 1) DNN models require a fixed computation budget regardless of the input; 2) a human-imperceptible perturbation to the input can cause a DNN misclassification. This talk will present our efforts toward addressing those challenges. We recognize an increasing need for monitoring and modifying a DNN's runtime behavior, as evidenced by our recent work on effective path and other researchers' work on network pruning and quantization. As such, we present our ongoing effort to build a graph instrumentation framework that gives programmers a convenient way to achieve these abilities.

Dynamic GPU Memory Management for DNNs

Speaker: Yu Zhang, University of Science & Technology of China

While deep learning researchers are seeking deeper and wider nonlinear networks, deploying deep neural network applications on low-end GPU devices for mobile and edge computing is increasingly challenging due to the limited size of GPU DRAM. Existing deep learning frameworks lack effective GPU memory management for different reasons: frameworks with dynamic computation graphs (e.g., PyTorch) cannot obtain the global computation graph needed for effective management, while frameworks with static computation graphs (e.g., TensorFlow) impose only limited dynamic GPU memory management strategies. In this talk, I will analyze state-of-the-art GPU memory management in existing DL frameworks, present the challenges of GPU memory management when running deep neural networks on low-end resource-constrained devices, and finally give our thoughts.
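
A toy illustration of why a global view of the graph matters: with known tensor lifetimes, a planner can free each activation right after its last use, cutting peak memory. The ops and sizes below are invented for illustration.

```python
ops = [            # (tensor produced, size in MB, index of last op that reads it)
    ("act1", 512, 1),
    ("act2", 512, 2),
    ("act3", 512, 3),
    ("act4", 512, 4),
    ("logits", 64, 5),
    ("loss", 1, 5),
]

def peak_memory(free_after_last_use):
    """Simulate peak GPU memory over the op sequence, optionally freeing
    each tensor as soon as its last consumer has run (liveness analysis)."""
    live, peak = {}, 0
    for i, (name, size, last_use) in enumerate(ops):
        live[name] = (size, last_use)
        peak = max(peak, sum(s for s, _ in live.values()))
        if free_after_last_use:
            live = {n: v for n, v in live.items() if v[1] > i}
    return peak

print(peak_memory(False), peak_memory(True))  # keep-everything vs. eager freeing
```

The eager-freeing plan needs about half the memory here, but it requires lifetimes known ahead of time, which is exactly what a dynamic graph denies the framework.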

Emotional Speech Synthesis with Granularized Control

Speaker: Hong-Goo Kang, Yonsei University

In end-to-end deep learning-based emotional text-to-speech (TTS) systems, such as those using Tacotron networks, it is very important to provide additional embedding vectors to flexibly control the distinct characteristics of the target emotion.

This talk introduces a couple of methods to effectively estimate representative embedding vectors. Using the mean of the embedding vectors is a simple approach, but the expressiveness of the synthesized speech is not satisfactory. To enhance expressiveness, we need to consider the distribution of the emotion embedding vectors. An inter-to-intra (I2I) distance ratio-based algorithm recently proposed by our research team shows much higher performance than the conventional mean-based one. The I2I algorithm is also useful for gradually changing the intensity of expressiveness. Listening test results verify that the emotional expressiveness and controllability of the I2I algorithm are superior to those of the mean-based one.
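
One plausible reading of an inter-to-intra distance ratio criterion can be sketched as follows: score each candidate embedding by its mean distance to other emotions' embeddings divided by its mean distance to its own emotion's embeddings, and pick the maximizer. This exact formulation is an assumption for illustration; the authors' I2I algorithm may differ in detail.

```python
import numpy as np

def i2i_representative(embeddings, labels, target):
    """Pick a representative embedding for the `target` emotion by maximizing
    an inter-to-intra distance ratio (assumed formulation, see lead-in)."""
    X, y = np.asarray(embeddings), np.asarray(labels)
    own, rest = X[y == target], X[y != target]
    best, best_ratio = None, -np.inf
    for v in own:
        # mean distance to same-emotion vectors (excluding the zero self-distance)
        intra = np.linalg.norm(own - v, axis=1).sum() / max(len(own) - 1, 1)
        # mean distance to all other emotions' vectors
        inter = np.linalg.norm(rest - v, axis=1).mean()
        ratio = inter / (intra + 1e-12)
        if ratio > best_ratio:
            best, best_ratio = v, ratio
    return best, best_ratio

rng = np.random.default_rng(1)
happy = rng.normal(0.0, 0.1, size=(10, 2))   # tight "happy" embedding cluster
sad = rng.normal(5.0, 0.1, size=(10, 2))     # distant "sad" cluster
X = np.vstack([happy, sad])
labels = np.array([0] * 10 + [1] * 10)
rep, ratio = i2i_representative(X, labels, target=0)
```

Interpolating between such a representative vector and the mean vector would then give one way to grade the intensity of the expressed emotion.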

Fairness in Recommender Systems

Speaker: Min Zhang, Tsinghua University

Recommender systems have played significant roles in our daily life, and are expected to be available to any user, regardless of their gender, age or other demographic factors. Recently, there has been a growing concern about the bias that can creep into personalization algorithms and produce unfairness issues. In this talk, I will introduce the trending topics and our recent research progress at the THUIR (Tsinghua University Information Retrieval) group on fairness issues in recommender systems, including the causes of unfairness and the approaches to handle it. This series of works provides new ideas for building fairness-aware recommender systems, and has been published at top-tier international conferences (SIGIR 2018, WWW 2019, SIGIR 2019, etc.).

FLUID: Flexible User Interface Distribution for Ubiquitous Multi-device Interaction

Speaker: Insik Shin, KAIST

The growing trend of multi-device ownership creates a need and an opportunity to use applications across multiple devices. However, in general, current app development and usage still remain within the single-device paradigm, falling far short of user expectations. For example, it is currently not possible for a user to dynamically partition an existing live streaming app with chatting capabilities across different devices, such that she watches her favorite broadcast on her smart TV while real-time chatting on her smartphone. In this work, we present FLUID, a new Android-based multi-device platform that enables innovative ways of using multiple devices. FLUID aims to i) allow users to migrate or replicate individual user interfaces (UIs) of a single app on multiple devices (high flexibility), ii) require no additional development effort to support unmodified, legacy applications (ease of development), and iii) support a wide range of apps that follow the trend of using custom-made UIs (wide applicability). FLUID meets these goals by carefully analyzing which UI states are necessary to correctly render UI objects, deploying only those states on different devices, supporting cross-device function calls transparently, and synchronizing the UI states of replicated UI objects across multiple devices. Our evaluation with 20 unmodified, real-world Android apps shows that FLUID can transparently support a wide range of apps and is fast enough for interactive use.

Global Texture Mapping for Dynamic Objects

Speaker: Seungyong Lee, Pohang University of Science and Technology (POSTECH)

In this talk, I will introduce a novel framework to generate a global texture atlas for a deforming geometry. Our approach is distinguished from prior art in two aspects. First, instead of generating a texture map for each timestamp to color a dynamic scene, our framework reconstructs a global texture atlas that can be consistently mapped to a deforming object. Second, our approach is based on a single RGB-D camera, without the need for a multi-camera setup surrounding the scene. In our framework, the input is a 3D template model with an RGB-D image sequence, and geometric warping fields are found using a state-of-the-art non-rigid registration method to align the template mesh to noisy and incomplete input depth images. With these warping fields, our multi-scale approach to texture coordinate optimization generates a sharp and clear texture atlas that is consistent with multiple color observations over time. Our approach provides a handy configuration for capturing a dynamic geometry along with a clean texture atlas, and we demonstrate it in practical scenarios, particularly human performance capture.

Gradient Descent Finds Global Minima of Deep Neural Networks

Speaker: Liwei Wang, Peking University

Gradient descent finds a global minimum when training deep neural networks, despite the objective function being non-convex. This work proves that gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show that the Gram matrix is stable throughout the training process, and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.
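
The phenomenon is easy to reproduce at toy scale: a sufficiently wide one-hidden-layer ReLU network trained by plain gradient descent drives the training loss toward zero on random data. This toy setup (fixed output layer, width 512) only illustrates over-parameterized convergence; it is not the paper's ResNet analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10, 5, 512                  # 10 samples, hidden width 512 >> n
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-norm inputs
y = rng.normal(size=n)                              # random labels

W = rng.normal(size=(m, d)) / np.sqrt(d)            # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # fixed second layer

losses = []
for _ in range(2000):
    pre = X @ W.T                                   # pre-activations (n, m)
    err = np.maximum(pre, 0.0) @ a - y              # prediction error
    losses.append(0.5 * (err ** 2).sum())
    # dL/dW_r = sum_i err_i * a_r * 1[pre_ir > 0] * x_i
    grad = ((err[:, None] * (pre > 0)) * a[None, :]).T @ X
    W -= 0.5 * grad

print(losses[0], losses[-1])          # loss driven toward zero
```

In the paper's language, the Gram (kernel) matrix induced by the random features stays close to its initialization because the width dwarfs the sample count, so the dynamics behave almost linearly.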

Graph-based Action Assessment

Speaker: Wei-Shi Zheng, Sun Yat-sen University

We present a new model to assess the performance of actions visually from videos by graph-based joint relation modelling. Previous works mainly focused on the whole scene, including the performer's body and background, yet they ignored the detailed joint interactions. This is insufficient for fine-grained and accurate action assessment, because the action quality of each joint depends on its neighboring joints. Therefore, we propose to learn the detailed joint motion based on the joint relations. We build trainable Joint Relation Graphs, and analyze joint motion on them. We propose two novel modules, namely the Joint Commonality Module and the Joint Difference Module, for joint motion learning. The Joint Commonality Module models the general motion for certain body parts, and the Joint Difference Module models the motion differences within body parts. We evaluate our method on six public Olympic actions for performance assessment. Our method outperforms previous approaches (+0.0912) and the whole-scene model (+0.0623) in terms of Spearman's rank correlation. We also demonstrate our model's ability to interpret the action assessment process.
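
The two modules can be caricatured with a fixed relation graph: aggregating features over the graph captures motion shared among related joints (commonality), and the residual captures within-part differences. The graph and features below are random placeholders; in the actual model both are learned.

```python
import numpy as np

rng = np.random.default_rng(0)
J, C = 17, 8                        # 17 body joints, 8-dim motion features
F = rng.normal(size=(J, C))         # per-joint motion features for one clip

# Stand-in relation graph: a random row-normalized adjacency matrix.
A = rng.random((J, J))
A /= A.sum(axis=1, keepdims=True)

commonality = A @ F                 # motion aggregated over related joints
difference = F - commonality        # residual motion differences within parts
```

By construction the two branches decompose the joint features exactly (`commonality + difference == F`); the learned model scores action quality from both views.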

Intelligent Action Analytics with Multi-Modal Reasoning

Speaker: Jiaying Liu, Peking University

In this talk, we focus on intelligent action analytics in videos with multi-modal reasoning, which is important but remains underexplored. We first present the challenges in this problem by introducing the PKU-MMD dataset collected by ourselves: multi-modal complementary feature learning, noise-robust feature learning, dealing with tedious label annotation, etc. To tackle the above issues, we propose initial solutions with multi-modal reasoning. A modality compensation network is proposed to explicitly explore the relationship between different modalities and further boost multi-modal feature learning. A noise-invariant network is developed to recognize human actions from noisy skeletons by referring to denoised skeletons. To inspire the community, we introduce possible future work at the end, such as self-supervised learning and language-guided reasoning.

Kafe: Can the OS kernel handle packets fast enough?

Speaker: Chuck Yoo, Korea University

It is widely believed that commodity operating systems cannot deliver high-speed packet processing, and a number of alternative approaches (including user-space network stacks) have been proposed. This talk revisits the inefficiency of packet processing inside the kernel and explores whether a redesign of kernel network stacks can remedy it. We present a case through such a redesign: Kafe, a kernel-based advanced forwarding engine. Contrary to popular belief, Kafe can process packets as fast as user-space network stacks. Kafe neither adds any new API nor depends on proprietary hardware features.

Learning Multi-label Feature for Fine-Grained Food Recognition

Speaker: Xueming Qian, Xi’an Jiaotong University

Fine-grained food recognition is a detailed classification task that provides specialized and professional attribute information about food. It is foundational for healthy diet recommendation, cooking instructions, nutrition intake management, and cafeteria self-checkout systems. Chinese food appearance lacks structured information, so ingredient composition is an important consideration. We propose a new method for fine-grained food and ingredient recognition, comprising an Attention Fusion Network (AFN) and Food-Ingredient Joint Learning. The AFN focuses on important attention regional features and generates the feature descriptor. In Food-Ingredient Joint Learning, we propose a balanced focal loss to address the imbalance among multi-label ingredients. Finally, a series of experiments shows that our results significantly improve on existing methods.
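
A sketch of a focal-style loss with a class-balance weight for multi-label ingredient prediction. The exact form of the authors' balanced focal loss is not specified in the abstract, so this standard focal-loss variant is an assumption.

```python
import numpy as np

def balanced_focal_loss(logits, targets, alpha=0.75, gamma=2.0):
    """Multi-label focal loss with a class-balance weight `alpha` (assumed
    form). The (1-p)^gamma / p^gamma factors down-weight easy, abundant
    labels so that rare ingredients still receive gradient."""
    p = 1.0 / (1.0 + np.exp(-logits))   # per-ingredient sigmoid probability
    eps = 1e-12
    pos = -alpha * targets * (1 - p) ** gamma * np.log(p + eps)
    neg = -(1 - alpha) * (1 - targets) * p ** gamma * np.log(1 - p + eps)
    return (pos + neg).mean()

# Confidently correct vs. confidently wrong predictions for two ingredients
easy = balanced_focal_loss(np.array([6.0, -6.0]), np.array([1.0, 0.0]))
hard = balanced_focal_loss(np.array([6.0, -6.0]), np.array([0.0, 1.0]))
```

With `gamma = 0` and `alpha = 0.5` this reduces to plain (scaled) binary cross-entropy over the ingredient labels.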

Learning to Appreciate: Transforming Multimedia Communications via Deep Video Analytics

Speaker: Yonggang Wen, Nanyang Technological University

Media-rich applications will continue to dominate mobile data traffic with exponential growth, as predicted by the Cisco Video Index. Improved quality of experience (QoE) for video consumers plays an important role in shaping this growth. However, most existing approaches to improving video QoE are system-centric and model-based, in that they tend to derive insights from system parameters (e.g., bandwidth, buffer time, etc.) and propose various mathematical models to predict QoE scores (e.g., mean opinion score). In this talk, we will share our latest work on developing a unified and scalable framework to transform multimedia communications via deep video analytics. Specifically, our framework consists of two main components. One is a deep-learning-based QoE prediction algorithm that combines multi-modal data inputs to provide a more accurate assessment of QoE in real time. The other is a model-free QoE optimization paradigm built upon a deep reinforcement learning algorithm. Our preliminary results verify the effectiveness of the proposed framework. We believe that this hybrid approach of multimedia communications and computing will fundamentally transform how we optimize multimedia communication system design and operations.

Lensless Imaging for Biomedical Applications

Speaker: Seung Ah Lee, Yonsei University

Miniaturization of microscopes can be a crucial stepping stone towards realizing compact, cost-effective and portable platforms for biomedical research and healthcare. This talk reports on implementations of lensless microscopes and lensless cameras for a variety of biological imaging applications in the form of mass-producible semiconductor devices, which transform the fundamental design of optical imaging systems.

Leveraging Generative Adversarial Networks for Data Augmentation by Disentangling Class-Independent Features

Speaker: Jaegul Choo, Korea University

Considering their success in generating high-quality, realistic data, generative adversarial networks (GANs) have the potential to be used for data augmentation to improve prediction accuracy in diverse problems where only a limited amount of training data is available. However, GANs themselves require a nontrivial amount of data for training, so data augmentation via GANs often does not improve accuracy in practice. This talk will briefly review the existing literature and our ongoing approach based on feature disentanglement. I will conclude the talk with further research issues that I would like to address in the future.

Manipulatable Auditory Perception in Wearable Computing

Speaker: Hiroki Watanabe, Hokkaido University

Since auditory perception is a passive sense, we often miss important information and acquire unimportant information. We focus on an earphone-type wearable computer (hearable device) that has not only speakers but also microphones. In a hearable computing environment, microphones and speakers are always attached to the ears; therefore, we can manipulate our auditory perception using a hearable device. We manipulate the frequency of the sound input from the microphones and transmit the converted sound through the speakers. Thus, we can acquire sound that is not heard with normal auditory perception and eliminate unwanted sound according to the user's requirements.
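
The frequency manipulation can be sketched as a crude spectral shift: roll the FFT bins of the microphone signal before playing it back through the speakers. A real hearable would need streaming, artifact-free processing; this toy simply moves a 100 Hz tone up to 150 Hz.

```python
import numpy as np

fs, n = 1024, 1024
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 100 * t)          # 100 Hz tone picked up at the ear

def shift_frequency(signal, shift_bins):
    """Shift all spectral content up by `shift_bins` FFT bins (1 Hz per bin
    here), a toy stand-in for the hearable's frequency conversion."""
    spec = np.fft.rfft(signal)
    out = np.zeros_like(spec)
    if shift_bins >= 0:
        out[shift_bins:] = spec[:len(spec) - shift_bins]
    else:
        out[:shift_bins] = spec[-shift_bins:]
    return np.fft.irfft(out, n=len(signal))

y = shift_frequency(x, 50)               # the tone now appears near 150 Hz
peak_hz = int(np.argmax(np.abs(np.fft.rfft(y))))
```

Shifting content down into the audible band is the mechanism for hearing otherwise inaudible sound; zeroing selected bins instead would eliminate unwanted sound.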

Model Centric DevOps for Network Functions

Speaker: Wenfei Wu, Tsinghua University

Network functions (NFs) play important roles in improving performance and enhancing security in modern computer networks. More and more NFs are being developed, integrated, and managed in production networks. However, the connection between development and operation for network functions has not yet drawn attention, which slows down the development and delivery of NFs and complicates NF network management.

We propose that building a common abstraction layer for network functions would benefit both development and operation. For NF development, a uniform abstraction layer describing NF behaviors would make cross-platform development rapid and agile, accelerating NF delivery for vendors; we will introduce our recent NF development framework based on language and compiler technologies. For NF operation, having a behavior model eases network reasoning, which can avoid runtime bugs; more crucially, the behavior model is guaranteed to reflect the actual implementation, and we will introduce our NF verification work based on the NF modeling language. Around our model-centric NF development and operation, we also present other NF modeling works, which lay the foundation of the NF modeling language and fill the semantic gap between legacy NFs and NF models.

NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Speaker: Mingkui Tan, South China University of Technology

Architecture design is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which not only incur substantial memory consumption and computational cost but may also deteriorate the performance. Thus, it is necessary to optimize the operations inside the architecture to improve the performance without introducing extra computational cost. However, such a constrained optimization problem is an NP-hard problem and is very hard to solve. To address this problem, we cast the optimization problem into a Markov decision process (MDP) and learn a Neural Architecture Transformer (NAT) to replace the redundant operations with the more computationally efficient ones (e.g., skip connection or directly removing the connection). In MDP, we train NAT with reinforcement learning to obtain the architecture optimization policies w.r.t. different architectures. To verify the effectiveness of the proposed method, we apply NAT on both hand-crafted architectures and NAS based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, show that the transformed architecture significantly outperforms both the original architecture and the architectures optimized by the existing methods.

Novelty-aware exploration in RL and Conditional GANs for diversity

Speaker: Gunhee Kim, Seoul National University

In this talk, I will introduce two recent works on machine learning from Vision and Learning Lab of Seoul National University. First, we present our work in reinforcement learning. We introduce an information-theoretic exploration strategy named Curiosity-Bottleneck (CB) that distills task-relevant information from observation. In our experiments, we observe that the CB algorithm robustly measures the state novelty in distractive environments where state-of-the-art exploration methods often degenerate. Second, we propose novel training schemes with a new set of losses that can prevent conditional GANs from losing the diversity in their outputs. We perform thorough experiments on image-to-image translation, super-resolution and image inpainting and show that our methods achieve a great diversity in outputs while retaining or even improving the visual fidelity of generated samples.

Numerical/quantitative system for common sense natural language processing

Speaker: Hiroaki Yamane, RIKEN AIP & The University of Tokyo

Numerical common sense (e.g., “a person with a height of 2m is very tall”) is essential when deploying artificial intelligence (AI) systems in society. We construct methods for converting contextual language to numerical variables for quantitative/numerical common sense in natural language processing (NLP).

We live in a world where common sense is needed. We use common sense when observing objects: a 165 cm human cannot be bigger than a 1 km bridge, and the weight of that human ranges from 40 kg to 90 kg. If one's weight is less than 50 kg, they are likely to be very thin. The same applies to money: if the latest Surface Pro costs $500, it is quite cheap. Future AI systems will need to account for such common sense.

To address this problem, we first use a crowdsourcing service to obtain sufficient data for a subjective agreement on numerical common sense. Second, to examine whether such common sense is captured by current word embeddings, we evaluate the performance of a regressor trained on the obtained data.
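
The regression step can be sketched with ridge regression from embedding vectors to log-scale physical sizes. The "embeddings" and size values below are randomly generated and invented for illustration; the real study uses crowdsourced judgments and pretrained word embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for word embeddings of six object words and their typical
# sizes in metres (invented values): person, bridge, cat, car, cup, bus.
emb = rng.normal(size=(6, 16))
sizes = np.array([1.7, 1000.0, 0.3, 4.5, 0.1, 12.0])
log_sizes = np.log10(sizes)     # regress on a log scale: sizes span 4 orders

# Ridge regression: w = (E^T E + lam * I)^{-1} E^T y
lam = 1e-2
w = np.linalg.solve(emb.T @ emb + lam * np.eye(16), emb.T @ log_sizes)
pred = emb @ w                  # near-perfect fit on the training words
```

Whether such a regressor generalizes to held-out words is precisely the test of how much numerical common sense the embedding space already encodes.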

Paraphrasing and Simplification with Lean Vocabulary

Speaker: Tadashi Nomoto, The SOKENDAI Graduate School of Advanced Studies

In this work, we examine whether it is possible to achieve state-of-the-art performance in paraphrase generation with a reduced vocabulary. Our approach consists of building a convolution-to-sequence model (Conv2Seq) partially guided by reinforcement learning, and training it on a sub-word representation of the input. Experiments on the Quora dataset, which contains over 140,000 pairs of sentences and corresponding paraphrases, found that with fewer than 1,000 token types, we were able to achieve performance exceeding the current state of the art. We also report that the same architecture works equally well for text simplification, with little change.
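
Sub-word representations that keep the vocabulary under 1,000 token types are typically learned with byte-pair encoding (BPE); a minimal sketch of the merge-learning loop, not the paper's exact tokenizer:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent symbol pair,
    growing a small sub-word vocabulary from characters upward."""
    vocab = Counter({tuple(w): c for w, c in words.items()})
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for sym, c in vocab.items():
            for i in range(len(sym) - 1):
                pairs[sym[i], sym[i + 1]] += c
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for sym, c in vocab.items():       # apply the merge everywhere
            out, i = [], 0
            while i < len(sym):
                if i < len(sym) - 1 and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1])
                    i += 2
                else:
                    out.append(sym[i])
                    i += 1
            new_vocab[tuple(out)] += c
        vocab = new_vocab
    return merges, vocab

words = {"lower": 5, "lowest": 3, "newer": 6, "wider": 2}   # toy corpus counts
merges, vocab = learn_bpe(words, 4)
```

Capping the number of merges directly caps the token-type count, which is how a sub-1,000-type vocabulary is obtained for the generation model.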

Ray-SSL: Ray Tracing based Sound Source Localization considering Reflection and Diffraction

Speaker: Sung-eui Yoon, KAIST

In this talk, we discuss a novel, ray tracing based technique for 3D sound source localization for indoor and outdoor environments. Unlike prior approaches, which are mainly based on continuous sound signals from a stationary source, our formulation is designed to localize the position instantaneously from signals within a single frame. We consider direct sound and indirect sound signals that reach the microphones after reflecting off surfaces such as ceilings or walls. We then generate and trace direct and reflected acoustic paths using backward acoustic ray tracing and utilize these paths with Monte Carlo localization to estimate a 3D sound source position. For complex cases with many objects, we also found that diffraction effects caused by the wave characteristics of sound become dominant. We propose to handle such non-trivial problems even with ray tracing, since directly applying wave simulation is prohibitively expensive.
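
The geometry a backward acoustic ray must reproduce for a one-bounce reflection follows the classic image-source construction: mirror the source across the reflecting surface, and the reflected path length is the straight-line distance from the mirrored source to the microphone. A toy 2D sketch (the talk's method handles full 3D scenes plus diffraction):

```python
import numpy as np

def reflected_path_length(src, mic, wall_y=0.0):
    """One-bounce path length off a horizontal wall at y = wall_y, via the
    image-source construction."""
    image = np.array([src[0], 2.0 * wall_y - src[1]])   # mirror across the wall
    return float(np.linalg.norm(image - np.asarray(mic)))

src = np.array([0.0, 1.0])   # source 1 m above the floor
mic = np.array([4.0, 1.0])   # microphone 4 m away, same height
direct = float(np.linalg.norm(src - mic))
reflected = reflected_path_length(src, mic)   # bounce off the floor (y = 0)
```

The extra path length of the reflection translates into an arrival-time difference at each microphone, and Monte Carlo localization scores candidate source positions by how well they explain those differences.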

Recent Advances and Trends in Visual Tracking

Speaker: Tianzhu Zhang, University of Science and Technology of China

Visual tracking is one of the most fundamental topics in computer vision with various applications in video surveillance, human computer interaction and vehicle navigation. Although great progress has been made in recent years, it remains a challenging problem due to factors such as illumination changes, geometric deformations, partial occlusions, fast motions and background clutters. In this talk, I will first review several recent models of visual tracking including particle filtering, classifier learning for tracking, sparse tracking, deep learning tracking, and correlation filter based tracking. Then, I will review several recent works of our group including correlation particle filter tracking, and graph convolutional tracking.

Relational Knowledge Distillation

Speaker: Minsu Cho, Pohang University of Science and Technology (POSTECH)

Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic the teacher's output activations on individual data examples. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that instead transfers mutual relations among data examples. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves educated student models by a significant margin. In particular, for metric learning, it allows students to outperform their teachers, achieving the state of the art on standard benchmark datasets.
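
The distance-wise loss penalizes differences between the pairwise-distance structures of teacher and student embeddings, each normalized by its mean distance, under a Huber penalty. A NumPy sketch (the angle-wise variant compares triplet angles instead):

```python
import numpy as np

def pairwise_dists(X):
    """Upper-triangular pairwise Euclidean distances between embeddings."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return d[iu]

def rkd_distance_loss(teacher, student):
    """Distance-wise RKD loss: compare mean-normalized pairwise distance
    structures under a Huber penalty."""
    dt, ds = pairwise_dists(teacher), pairwise_dists(student)
    dt, ds = dt / dt.mean(), ds / ds.mean()    # scale-invariant normalization
    diff = np.abs(ds - dt)
    huber = np.where(diff <= 1.0, 0.5 * diff ** 2, diff - 0.5)
    return huber.mean()

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
loss_scaled = rkd_distance_loss(teacher, 3.0 * teacher)       # relations preserved
loss_random = rkd_distance_loss(teacher, rng.normal(size=(8, 16)))
```

The mean normalization makes the loss invariant to uniform scaling of either embedding space, so the student is trained to match relational structure rather than raw activations.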

Requirements of Computer Vision for Household Robots

Speaker: Jun Takamatsu, Nara Institute of Science and Technology

For household robots that work in dynamic everyday-life environments, computer vision (CV) to recognize those environments is essential. Unfortunately, CV issues in household robots sometimes cannot be solved by the methods usually proposed in the CV field. In this talk, I present two such examples and invite discussion of their solutions. The first example is CV in learning-from-observation, where it is not enough to recognize the names of actions, such as walk and jump. The second example is analysis of how time is used, which requires recognizing activities at the level of, for example, watching TV or spending time on a hobby.

Software and Hardware Co-design for Networked Memory

Speaker: Youyou Lu, Tsinghua University

Non-volatile memory (NVM) and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. Comparatively, the software overhead of file systems becomes a non-negligible part of persistent memory storage systems. To achieve an efficient networked memory design, I will present the design choices in Octopus, a distributed file system that redesigns file system internal mechanisms by closely coupling NVM and RDMA features. I will further discuss possible hardware enhancements for networked memory being researched in my group.

System support for designing efficient gradient compression algorithms for distributed DNN training

Speaker: Cheng Li, University of Science and Technology of China

Training DNN models across a large number of connected devices or machines has become the norm. Studies suggest that the major bottleneck in scaling out training jobs is exchanging the huge volume of gradients per mini-batch. Thus, a few compression algorithms, such as Deep Gradient Compression and TernGrad, have been proposed and evaluated to demonstrate their benefit in reducing transmission cost. However, when re-implementing these algorithms and integrating them into mainstream frameworks such as MXNet, we found that they performed less efficiently than claimed in their original papers. The major gap is that the developers of those algorithms did not necessarily understand the internals of the deep learning frameworks. As a consequence, we believe there is a lack of system support for enabling algorithm developers to focus primarily on the innovations of the compression algorithms, rather than on efficient implementations that must account for various levels of parallelism. To this end, we propose a domain-specific language that allows algorithm developers to sketch their compression algorithms, a translator that converts the high-level descriptions into highly optimized low-level GPU code, and a compiler that generates new computation DAGs that fuse the compression algorithms with the operators that produce gradients.
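To make the class of algorithms concrete, here is a minimal top-k gradient sparsification sketch of the kind such a DSL might express: only the largest-magnitude entries are transmitted, and the receiver reconstructs a dense gradient. This is an illustrative sketch, not the talk's system or the cited algorithms; the function names are hypothetical.

```python
import numpy as np

def topk_sparsify(grad, ratio=0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient
    entries; return (indices, values) to transmit. A full implementation
    would also accumulate the dropped entries locally as residual error."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def densify(idx, vals, shape):
    # Reconstruct a dense gradient from the transmitted sparse pieces.
    out = np.zeros(int(np.prod(shape)))
    out[idx] = vals
    return out.reshape(shape)
```

With a 1% ratio, each worker transmits indices and values for only 1% of the gradient entries per mini-batch, which is where the claimed communication savings come from.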

Towards solving the cocktail party problem: from speech separation to speech recognition

Speaker: Jun Du, University of Science and Technology of China

Solving the cocktail party problem is an ultimate goal for machines to achieve human-level auditory perception. Speech separation and recognition are two related key techniques. With the emergence of deep learning, new milestones have been achieved for both speech separation and recognition. In this talk, I will introduce our recent progress and future trends in these areas through the development of the DIHARD and CHiME Challenges.

Toward Ubiquitous Operating Systems: Challenges and Research Directions

Speaker: Yao Guo, Peking University

In recent years, operating systems have expanded beyond traditional computing systems into the cloud, IoT devices, and other emerging technologies, and will soon become ubiquitous. We call this new generation of OSs ubiquitous operating systems (UOSs). Despite the apparent differences among existing OSs, they all share so-called "software-defined" capabilities, namely resource virtualization and function programmability. In this talk, I will present our vision and some recent work toward the development of UOSs.

Vibration-Mediated Sensing Techniques for Tangible Interaction

Speaker: Seungmoon Choi, Pohang University of Science and Technology (POSTECH)

Tangible interaction allows a user to interact with a computer using ordinary physical objects. It substantially expands the interaction space owing to the natural affordances and metaphors provided by real objects. However, tangible interaction requires identifying the object held by the user or how the user is touching it. In this talk, I will introduce two sensing techniques for tangible interaction, which exploit active sensing using mechanical vibration. A vibration is transmitted from an exciter worn on the user's hand or fingers, and the transmitted vibration is measured using a sensor. By comparing the input-output pair, we can recognize the object held between two fingers or the fingers touching the object. The mechanical vibrations also provide pleasant confirmation feedback to the user. Details will be shared in the talk.

Video Analytics in Crowded Spaces

Speaker: Rajesh Krishna Balan, Singapore Management University

I will describe the line of work I am starting on video analytics in crowded spaces, including malls, conference centres, and university campuses in Asia. The goal of this work is to use video analytics, combined with other sensors, to accurately count the number of people in these environments, track their movement trajectories, and discover their demographics and personas.

Video Dialog via Progressive Inference and Cross-Transformer

Speaker: Zhou Zhao, Zhejiang University

Video dialog is a new and challenging task, which requires the agent to answer questions by combining video information with dialog history. Unlike single-turn video question answering, the additional dialog history is important for video dialog, as it often includes contextual information for the question. Existing visual dialog methods mainly use RNNs to encode the dialog history as a single vector representation, which can be coarse. More advanced methods utilize hierarchical structures, attention, and memory mechanisms, but still lack an explicit reasoning process. In this paper, we introduce a novel progressive inference mechanism for video dialog, which progressively updates query information based on dialog history and video content until the agent thinks the information is sufficient and unambiguous. To tackle the multimodal fusion problem, we propose a cross-transformer module, which can learn more fine-grained and comprehensive interactions both inside and between the modalities. Besides answer generation, we also consider question generation, which is more challenging but significant for a complete video dialog system. We evaluate our method on two large-scale datasets, and extensive experiments show its effectiveness.

Visual Analytics of Sports Data

Speaker: Yingcai Wu, Zhejiang University

With the rapid development of sensing technologies and wearable devices, large volumes of sports data are acquired daily. These data usually carry a wide spectrum of information and rich knowledge about sports. Visual analytics, which facilitates analytical reasoning through interactive visual interfaces, has proven its value in solving various problems. In this talk, I will discuss our research experiences in visual analytics of sports data and introduce several recent studies by our group on making sense of sports data through interactive visualization.

Visual Analytics for Data Quality Improvement

Speaker: Shixia Liu, Tsinghua University

The quality of training data is crucial to the success of supervised and semi-supervised learning. Errors in data have long been known to limit the performance of machine learning models. This talk presents the motivation and major challenges of interactive data quality analysis and improvement. With that perspective, I will then discuss some of my recent efforts on 1) analyzing and correcting poor label quality, and 2) resolving the poor coverage of training data caused by dataset bias.

VIS+AI: Making AI more Explainable and VIS more Intelligent

Speaker: Huamin Qu, Hong Kong University of Science and Technology

VIS for AI and AI for VIS have recently become hot research topics. On one side, visualization plays an important role in explainable AI. On the other, AI has been transforming the visualization field and automating the whole visualization system development pipeline. In this talk, I will introduce the emerging opportunities of combining AI and VIS to leverage both human and artificial intelligence in solving grand challenges facing both fields and society.

What We Learned from Medical Image Learning

Speaker: Winston Hsu, National Taiwan University

We have observed super-human capabilities from convolutional networks for image learning. It is a natural extension to advance these technologies toward healthcare applications such as medical image segmentation (CT, MRI), registration, detection, prediction, etc. In the past few years, working closely with university hospitals, we have found many exciting developments in this area. However, we have also learned a lot working in this cross-disciplinary setup, which requires strong devotion and deep technologies from both the medical and machine learning domains. We would like to take this opportunity to share our failures and successes across a few attempts at advancing machine learning for medical applications. We will identify promising working models (and the misunderstandings between these two disciplines) developed with medical experts, and show the great opportunities to discover new treatment or diagnosis methods for numerous common diseases.

Speakers

Workshops


Rajesh Krishna Balan

Singapore Management University

Bio

Prof. Balan is an ACM Distinguished Scientist and has worked in the area of mobile systems for over 18 years. He obtained his Ph.D. in Computer Science in 2006 from Carnegie Mellon University under the guidance of Professor Mahadev Satyanarayanan. He has been a general chair for both MobiSys 2016 and UbiComp 2018 and has served as a program chair for HotMobile 2012 and MobiSys 2019. In addition, he also organised a student workshop, called ASSET, that ran at MobiCom 2019, COMSNETS 2018, and MobiSys 2016. Prof. Balan has a strong interest in applied research and was a director of LiveLabs (http://www.livelabs.smu.edu.sg), a large research / startup lab that turned real-world environments (such as a university, a convention centre, and a resort island) into living testbeds for mobile systems experiments. He founded a startup to more effectively provide LiveLabs technologies to interested commercial clients. These experiences have given Prof. Balan great insight into how hard and meaningful it is to translate research into tangible systems that are tested and deployed in the real world.


Ting Cao

Microsoft Research

Bio

Ting Cao is a Researcher in the System Research Group at MSRA. Her research interests include HW/SW co-design, high-level language implementation, software management of heterogeneous hardware, big data, and deep learning frameworks. She has publications in ISCA, ASPLOS, PLDI, the Proceedings of the IEEE, etc. She received her PhD from the Australian National University. Before joining MSRA, she was a senior software engineer in the Compiler and Computing Language Lab at Huawei Technologies.


Yue Cao

Microsoft Research

Bio

Yue Cao is a researcher at Microsoft Research Asia. He received his B.E. degree in Computer Software in 2014 and his Ph.D. degree in Software Engineering in 2019, both from Tsinghua University, China. He was awarded the Top-grade Scholarship of Tsinghua University in 2018 and the Microsoft Research Asia PhD Fellowship in 2017. His research interests include computer vision and deep learning. He has published more than 20 papers in top-tier conferences, with more than 1,700 citations.


Xilin Chen

Chinese Academy of Sciences

Bio

Xilin Chen is a professor with the Institute of Computing Technology, Chinese Academy of Sciences (CAS). He has authored one book and more than 300 papers in refereed journals and proceedings in the areas of computer vision, pattern recognition, image processing, and multimodal interfaces. He is currently an associate editor of the IEEE Transactions on Multimedia, a Senior Editor of the Journal of Visual Communication and Image Representation, a leading editor of the Journal of Computer Science and Technology, and an associate editor-in-chief of the Chinese Journal of Computers and the Chinese Journal of Pattern Recognition and Artificial Intelligence. He has served as an Organizing Committee member for many conferences, including general co-chair of FG13 / FG18 and program co-chair of ICMI 2010. He is or was an area chair of CVPR 2017 / 2019 / 2020 and ICCV 2019. He is a fellow of the IEEE, IAPR, and CCF.


Peng Cheng

Microsoft Research

Bio

Peng Cheng is a researcher in the Networking Research Group, MSRA. His research interests are computer networking and networked systems. His recent work focuses on hardware-based systems in data centers. He has publications in NSDI, CoNEXT, EuroSys, SIGCOMM, etc. He received his Ph.D. in Computer Science and Technology from Tsinghua University in 2015.


Jaegul Choo

Korea University

Bio

Jaegul Choo (https://sites.google.com/site/jaegulchoo/) is an associate professor in the Dept. of Computer Science and Engineering at Korea University. He was a research scientist at Georgia Tech from 2011 to 2015, where he also received his M.S. in 2009 and Ph.D. in 2013. His research areas include computer vision, natural language processing, data mining, and visual analytics, and his work has been published in premier venues such as KDD, WWW, WSDM, CVPR, ECCV, EMNLP, AAAI, IJCAI, ICDM, ICWSM, IEEE VIS, EuroVIS, CHI, TVCG, CFG, and CG&A. He earned the Best Student Paper Award at ICDM in 2016, the NAVER Young Faculty Award in 2015, the Outstanding Research Scientist Award at Georgia Tech in 2015, and the Best Poster Award at IEEE VAST (as part of IEEE VIS) in 2014.


Nan Duan

Microsoft Research

Bio

Dr. Nan Duan is a Principal Research Manager at Microsoft Research Asia. He is working on fundamental NLP tasks, especially question answering, natural language understanding, language + vision, pre-training, and reasoning.


Winston Hsu

National Taiwan University

Bio

Prof. Winston Hsu is an active researcher dedicated to large-scale image/video retrieval/mining, visual recognition, and machine intelligence. He is a Professor in the Department of Computer Science and Information Engineering, National Taiwan University. He and his team have been recognized with technical awards in the multimedia and computer vision research communities, including the IBM Research Pat Goldberg Memorial Best Paper Award (2018), the Best Brave New Idea Paper Award at ACM Multimedia 2017, First Place in the IARPA Disguised Faces in the Wild Competition (CVPR 2018), First Prize in the ACM Multimedia Grand Challenge 2011, and the ACM Multimedia 2013/2014 Grand Challenge Multimodal Award. Prof. Hsu is keen on translating advanced research into business deliverables via academia-industry collaborations and co-founding startups. He was a Visiting Scientist at Microsoft Research Redmond (2014) and spent a one-year sabbatical (2016-2017) at the IBM TJ Watson Research Center. He served as Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) and IEEE Transactions on Multimedia, two premier journals, and was on the Editorial Board of IEEE Multimedia Magazine (2010-2017).


Sung Ju Hwang

KAIST

Bio

Sung Ju Hwang is an assistant professor in the Graduate School of Artificial Intelligence and School of Computing at KAIST. He received his Ph.D. degree in computer science from the University of Texas at Austin, under the supervision of Professor Kristen Grauman. His research interest is mainly in developing machine learning models for tackling practical challenges in various application domains, including but not limited to visual recognition, natural language understanding, healthcare, and finance. He regularly presents papers at top-tier AI conferences such as NeurIPS, ICML, ICLR, CVPR, ICCV, AAAI, and ACL.


Guolin Ke

Microsoft Research

Bio

Guolin Ke is currently a Researcher in Machine Learning Group, Microsoft Research Asia. His research interests mainly lie in machine learning algorithms.


Gunhee Kim

Seoul National University

Bio

Gunhee Kim has been an associate professor in the Department of Computer Science and Engineering of Seoul National University since 2015. He was a postdoctoral researcher at Disney Research for one and a half years. He received his PhD in 2013 under the supervision of Eric P. Xing from the Computer Science Department of Carnegie Mellon University. Prior to starting his PhD in 2009, he earned a master's degree under the supervision of Martial Hebert at the Robotics Institute, CMU. His research interests are solving computer vision and web mining problems that emerge from big image data shared online, by developing scalable and effective machine learning and optimization techniques. He is a recipient of the 2014 ACM SIGKDD Doctoral Dissertation Award and the 2015 Naver New Faculty Award.


Shujie Liu

Microsoft Research

Bio

Dr. Shujie Liu is a Principal Researcher in the Natural Language Computing group at Microsoft Research Asia, Beijing, China. Shujie joined MSRA-NLC in Jul. 2012 after receiving his Ph.D. in Jun. 2012 from the Department of Computer Science of Harbin Institute of Technology.

Shujie’s research interests include natural language processing and deep learning. He is now working on fundamental NLP problems, models, algorithms and innovations.


Xuanzhe Liu

Peking University

Bio

Prof. Xuanzhe Liu has been an associate professor with the Institute of Software, Peking University, since 2011. He leads the DAAS (Data, Analytics, Applications, and Systems) lab at Peking University. Prof. Liu's recent research interests focus on measuring, engineering, and operating large-scale service-based and intelligent software systems (such as mobility and the Web), mostly from a data-driven perspective. Prof. Liu has published more than 80 papers at premier conferences such as WWW, ICSE, OOPSLA, MobiCom, UbiComp, EuroSys, and IMC, and in impactful journals such as ACM TOIS/TOIT and IEEE TSE/TMC/TSC. He won the Best Paper Award of WWW 2019. He has also been recognized by several academic awards, such as the CCF-IEEE CS Young Scientist Award and the Honorable Young Faculty Award of the Yangtze River Scholar Program. Prof. Liu was a visiting researcher with Microsoft Research (under the "Star-Track Young Faculty Program") from 2013 to 2014, and won the Microsoft Ph.D. Fellowship in 2007.


Jiwen Lu

Tsinghua University

Bio

Jiwen Lu is currently an Associate Professor with the Department of Automation, Tsinghua University, China. His current research interests include computer vision, machine learning, and intelligent robotics. He has authored/co-authored over 200 scientific papers in these areas, of which over 70 are IEEE Transactions papers and over 50 are CVPR/ICCV/ECCV papers. He was a recipient of the National 1000 Young Talents Program of China in 2015, and the National Science Fund of China Award for Excellent Young Scholars in 2018. He serves as Co-Editor-in-Chief of PR Letters and an Associate Editor of T-IP/T-CSVT/T-BIOM/PR. He is a Program Co-Chair of ICME 2020, AVSS 2020, and DICTA 2019, and an Area Chair for CVPR 2020, ICME 2017-2019, ICIP 2017-2019, and ICPR 2018.


Chong Luo

Microsoft Research

Bio

Dr. Chong Luo joined Microsoft Research Asia in 2003 and is now a Principal Researcher at the Intelligent Multimedia Group (IMG). She is an adjunct professor and a Ph.D. advisor at the University of Science and Technology of China (USTC), China. Her current research interests include computer vision, cross-modality multimedia analysis and processing, and multimedia communications. In particular, she is interested in visual object tracking, audio-visual and text-visual video analysis, and hybrid digital-analog transmission of wireless video. She is currently a member of the Multimedia Systems and Applications (MSA) Technical Committee (TC) of the IEEE Circuits and Systems (CAS) society. She is an IEEE senior member.


Sinno Jialin Pan

Nanyang Technological University

Bio

Dr Sinno Jialin Pan is a Provost’s Chair Associate Professor with the School of Computer Science and Engineering, and Deputy Director of the Data Science and AI Research Centre at Nanyang Technological University (NTU), Singapore. He received his Ph.D. degree in computer science from the Hong Kong University of Science and Technology (HKUST) in 2011. Prior to joining NTU, he was a scientist and Lab Head of text analytics with the Data Analytics Department, Institute for Infocomm Research, Singapore from Nov. 2010 to Nov. 2014. He joined NTU as a Nanyang Assistant Professor (university named assistant professor) in Nov. 2014. He was named to “AI 10 to Watch” by the IEEE Intelligent Systems magazine in 2018. His research interests include transfer learning, and its applications to wireless-sensor-based data mining, text mining, sentiment analysis, and software engineering.


Xu Tan

Microsoft Research

Bio

Xu Tan is currently a Senior Researcher in the Machine Learning Group, Microsoft Research Asia (MSRA). He graduated from Zhejiang University in March 2015. His research interests mainly lie in machine learning, deep learning, low-resource learning, and their applications to natural language processing and speech processing, including neural machine translation, text to speech, etc.


Chuan Wu

University of Hong Kong

Bio

Chuan Wu received her B.Engr. and M.Engr. degrees in 2000 and 2002 from the Department of Computer Science and Technology, Tsinghua University, China, and her Ph.D. degree in 2008 from the Department of Electrical and Computer Engineering, University of Toronto, Canada. Between 2002 and 2004, she worked in the information technology industry in Singapore. Since September 2008, Chuan Wu has been with the Department of Computer Science at the University of Hong Kong, where she is currently an Associate Professor. Her current research is in the areas of cloud computing, distributed machine learning/big data analytics systems, and smart elderly care technologies/systems. She is a senior member of IEEE, a member of ACM, and an associate editor of IEEE Transactions on Cloud Computing, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, and ACM Transactions on Modeling and Performance Evaluation of Computing Systems. She was a co-recipient of the best paper awards of HotPOST 2012 and ACM e-Energy 2016.


Yingce Xia

Microsoft Research

Bio

I am currently a researcher in the Machine Learning Group, Microsoft Research Asia. I received my Ph.D. degree from the University of Science and Technology of China in 2018, supervised by Dr. Tie-Yan Liu and Prof. Nenghai Yu. Prior to that, I obtained my bachelor's degree from the University of Science and Technology of China in 2013.

My research revolves around dual learning (a new learning paradigm proposed by our group) and deep learning (with application to neural machine translation and image processing).


Dongdong Zhang

Microsoft Research

Bio

Dr. Dongdong Zhang is a researcher in the Natural Language Computing group at Microsoft Research Asia, Beijing, China. He received his Ph.D. in Dec. 2005 from the Department of Computer Science of Harbin Institute of Technology under the supervision of Prof. Jianzhong Li. Before that, he received his B.S. and M.S. degrees from the same department in 1999 and 2001, respectively.

Dongdong’s research interests include natural language processing, machine translation, and machine learning. He is now working on research and development of advanced statistical machine translation (SMT) systems as well as related fundamental NLP problems, models, algorithms, and innovations.


Quanlu Zhang

Microsoft Research

Bio

Quanlu Zhang is a senior researcher at MSRA. He obtained his PhD in computer science from Peking University. His current focus is on AutoML systems, GPU cluster management, resource scheduling, and storage support for DL workloads. His work has been published at conferences such as OSDI, SoCC, and FAST.

Breakout Sessions


Rajesh Krishna Balan

Singapore Management University

Bio

Prof. Balan is an ACM Distinguished Scientist and has worked in the area of mobile systems for over 18 years. He obtained his Ph.D. in Computer Science in 2006 from Carnegie Mellon University under the guidance of Professor Mahadev Satyanarayanan. He has been a general chair for both MobiSys 2016 and UbiComp 2018 and has served as a program chair for HotMobile 2012 and MobiSys 2019. In addition, he also organised a student workshop, called ASSET, that ran at MobiCom 2019, COMSNETS 2018, and MobiSys 2016. Prof. Balan has a strong interest in applied research and was a director of LiveLabs (http://www.livelabs.smu.edu.sg), a large research / startup lab that turned real-world environments (such as a university, a convention centre, and a resort island) into living testbeds for mobile systems experiments. He founded a startup to more effectively provide LiveLabs technologies to interested commercial clients. These experiences have given Prof. Balan great insight into how hard and meaningful it is to translate research into tangible systems that are tested and deployed in the real world.


Lei Chen

Hong Kong University of Science and Technology

Bio

Lei Chen received his BS degree in computer science and engineering from Tianjin University, Tianjin, China, his MA degree from the Asian Institute of Technology, Bangkok, Thailand, and his Ph.D. in computer science from the University of Waterloo, Canada. He is a professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology (HKUST). Currently, Prof. Chen serves as the director of the Big Data Institute at HKUST, the director of the Master of Science on Big Data Technology program, and director of the HKUST MOE/MSRA Information Technology Key Laboratory. Prof. Chen's research includes human-powered machine learning, crowdsourcing, blockchain, social media analysis, probabilistic and uncertain databases, and privacy-preserving data publishing. Prof. Chen received the SIGMOD Test-of-Time Award in 2015. The system developed by Prof. Chen's team won the excellent demonstration award at VLDB 2014. Currently, Prof. Chen serves as Editor-in-Chief of the VLDB Journal, associate editor-in-chief of IEEE Transactions on Knowledge and Data Engineering, and Program Committee Co-Chair for VLDB 2019. He is an ACM Distinguished Member and an IEEE Senior Member.


Wen-Huang Cheng

National Chiao Tung University

Bio

Wen-Huang Cheng is a Professor with the Institute of Electronics, National Chiao Tung University (NCTU), Hsinchu, Taiwan, where he is the Founding Director of the Artificial Intelligence and Multimedia Laboratory (AIMMLab). Before joining NCTU, he led the Multimedia Computing Research Group at the Research Center for Information Technology Innovation (CITI), Academia Sinica, Taipei, Taiwan, from 2010 to 2018. His current research interests include multimedia, artificial intelligence, computer vision, machine learning, social media, and financial technology. He has actively participated in international events and played important leading roles in prestigious journals, conferences, and professional organizations, such as Associate Editor for IEEE Multimedia, General Co-chair for ACM ICMR 2021, TPC Co-chair for ICME 2020, Chair-Elect of the IEEE MSA-TC, and governing board member of IAPR. He has received numerous research and service awards, including the 2018 MSRA Collaborative Research Award, the 2017 Ta-Yu Wu Memorial Award from Taiwan's Ministry of Science and Technology (the highest national research honor for young Taiwanese researchers under age 42), the Top 10% Paper Award at the 2015 IEEE MMSP, the K. T. Li Young Researcher Award from the ACM Taipei/Taiwan Chapter in 2014, the 2017 Significant Research Achievements of Academia Sinica, the 2016 Y. Z. Hsu Scientific Paper Award, the Outstanding Youth Electrical Engineer Award from the Chinese Institute of Electrical Engineering in 2015, and the Outstanding Reviewer Award of the 2018 IEEE ICME.


Minsu Cho

Pohang University of Science and Technology (POSTECH)

Bio

Minsu Cho is an assistant professor in the Department of Computer Science and Engineering at POSTECH, South Korea, leading the POSTECH Computer Vision Lab. Before joining POSTECH in the fall of 2016, he worked as a postdoc and a starting researcher at Inria (the French national institute for computer science and applied mathematics) and ENS (École Normale Supérieure), Paris, France. He completed his Ph.D. in 2012 at Seoul National University, Korea. His research lies in the areas of computer vision and machine learning, especially the problems of object discovery, weakly-supervised learning, semantic correspondence, and graph matching. In general, he is interested in the relationship between correspondence and supervision in visual learning. He is an editorial board member of the International Journal of Computer Vision (IJCV) and has served as an area chair at top computer vision conferences, including CVPR 2018, ICCV 2019, and CVPR 2020.


Seungmoon Choi

Pohang University of Science and Technology (POSTECH)

Bio

Seungmoon Choi, PhD, is a Professor of Computer Science and Engineering at POSTECH in Korea. He received the BS and MS degrees from Seoul National University and the PhD degree from Purdue University. His main research area is haptics, the science and technology for the sense of touch, as well as its application to various domains including robotics, virtual reality, human-computer interaction, and consumer electronics. He received a 2011 Early Career Award from the IEEE Technical Committee on Haptics.


Jaegul Choo

Korea University

Bio

Jaegul Choo (https://sites.google.com/site/jaegulchoo/) is an associate professor in the Dept. of Computer Science and Engineering at Korea University. He was a research scientist at Georgia Tech from 2011 to 2015, where he also received his M.S. in 2009 and Ph.D. in 2013. His research areas include computer vision, natural language processing, data mining, and visual analytics, and his work has been published in premier venues such as KDD, WWW, WSDM, CVPR, ECCV, EMNLP, AAAI, IJCAI, ICDM, ICWSM, IEEE VIS, EuroVIS, CHI, TVCG, CFG, and CG&A. He earned the Best Student Paper Award at ICDM in 2016, the NAVER Young Faculty Award in 2015, the Outstanding Research Scientist Award at Georgia Tech in 2015, and the Best Poster Award at IEEE VAST (as part of IEEE VIS) in 2014.


Chenhui Chu

Osaka University

Bio

Chenhui Chu received his B.S. in Software Engineering from Chongqing University in 2008, and his M.S. and Ph.D. in Informatics from Kyoto University in 2012 and 2015, respectively. He is currently a research assistant professor at Osaka University. His research won a 2019 MSRA collaborative research grant award, the 2018 AAMT Nagao Award, and the CICLing 2014 best student paper award. He is on the editorial board of the Journal of Natural Language Processing and the Journal of Information Processing, and a steering committee member of the Young Researcher Association for NLP Studies. His research interests center on natural language processing, particularly machine translation and language and vision understanding.


Jun Du

University of Science and Technology of China

Bio

Jun Du received the B.Eng. and Ph.D. degrees from the Department of Electronic Engineering and Information Science, University of Science and Technology of China (USTC), in 2004 and 2009, respectively. From July 2009 to June 2010, he was with iFlytek Research, leading a team to develop the ASR prototype system of the mobile app "iFlytek Input". From July 2010 to January 2013, he was an Associate Researcher at MSRA, working on handwriting recognition, OCR, and speech recognition. Since February 2013, he has been with the National Engineering Laboratory for Speech and Language Information Processing (NEL-SLIP), USTC. His main research interests include speech signal processing and pattern recognition applications. He has published more than 100 conference and journal papers with more than 2300 citations on Google Scholar. His team is one of the pioneers in the deep-learning-based speech enhancement area, publishing two ESI highly cited papers. As the corresponding author, his IEEE-ACM TASLP paper "A Regression Approach to Speech Enhancement Based on Deep Neural Networks" also received the 2018 IEEE Signal Processing Society Best Paper Award. Building on these speech enhancement results, he led a joint team of members from USTC and iFlytek Research to win all three tasks in the 2016 CHiME-4 challenge and all four tasks in the 2018 CHiME-5 challenge. He is currently an associate editor of IEEE-ACM TASLP and one of the organizers of the DIHARD Challenge 2018 and 2019.


Ryo Furukawa

Hiroshima City University

Bio

Ryo Furukawa is an associate professor in the Faculty of Information Sciences, Hiroshima City University, Hiroshima, Japan. He received his Ph.D. from Nara Institute of Science and Technology, Japan. His research areas include shape capturing, 3D modeling, image-based rendering, and medical image analysis. He has won academic awards including the ACCV Songde Ma Outstanding Paper Award (2007), the PSIVT Best Paper Award (2009), the IEVC 2014 Best Paper Award, the IEEE WACV Best Paper Honorable Mention (2017), and the KUKA Best Paper Award 3rd Place at the MICCAI Workshop CARE (2018).


Yao Guo

Peking University

Bio

Yao Guo is a professor and vice chair of the Department of Computer Science at Peking University. His recent research interests focus on mobile app analysis, as well as the privacy and security of mobile systems. He has received multiple awards for his research and teaching, including the First Prize of the National Technology Invention Award, an Honorable Mention Award from UbiComp 2016, and a Teaching Excellence Award from Peking University. He received his PhD in computer engineering from the University of Massachusetts, Amherst in 2007, and his BS/MS degrees in computer science from Peking University.


Bohyung Han

Seoul National University

Bio

Bohyung Han is an Associate Professor in the Department of Electrical and Computer Engineering at Seoul National University, Korea. Prior to his current position, he was an Associate Professor in the Department of Computer Science and Engineering at POSTECH, Korea, and a visiting research scientist in the Machine Intelligence Group at Google, Venice, CA, USA. He is currently visiting Snap Research, Venice, CA. He received the B.S. and M.S. degrees from Seoul National University, Korea, in 1997 and 2000, respectively, and the Ph.D. in Computer Science from the University of Maryland, College Park, MD, USA, in 2005. He has served or will serve as an Area Chair or Senior Program Committee member of major conferences in computer vision and machine learning, including CVPR, ICCV, NIPS/NeurIPS, IJCAI, and ACCV, as well as a Tutorial Chair of ICCV 2019, a General Chair of ACCV 2022, a Demo Chair of ECCV 2022, a Workshop Chair of ACCV 2020, and a Demo Chair of ACCV 2014. His research interests are computer vision and machine learning, with an emphasis on deep learning.


Winston Hsu

National Taiwan University

Bio

Prof. Winston Hsu is an active researcher dedicated to large-scale image/video retrieval and mining, visual recognition, and machine intelligence. He is a Professor in the Department of Computer Science and Information Engineering, National Taiwan University. He and his team have received technical awards in the multimedia and computer vision research communities, including the IBM Research Pat Goldberg Memorial Best Paper Award (2018), the Best Brave New Idea Paper Award at ACM Multimedia 2017, First Place in the IARPA Disguised Faces in the Wild Competition (CVPR 2018), First Prize in the ACM Multimedia Grand Challenge 2011, and the ACM Multimedia 2013/2014 Grand Challenge Multimodal Award. Prof. Hsu is keen on turning advanced research into business deliverables via academia-industry collaborations and co-founding startups. He was a Visiting Scientist at Microsoft Research Redmond (2014) and spent a one-year sabbatical (2016-2017) at the IBM TJ Watson Research Center. He served as an Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) and the IEEE Transactions on Multimedia, two premier journals, and was on the Editorial Board of IEEE MultiMedia Magazine (2010-2017).


Seung-won Hwang

Yonsei University

Bio

Prof. Seung-won Hwang is a Professor of Computer Science at Yonsei University. Prior to joining Yonsei, she was an Associate Professor at POSTECH for 10 years, after receiving her PhD from UIUC. Her recent research interests center on machine intelligence from data, language, and knowledge, leading to 100+ publications at top-tier AI, DB/DM, and NLP venues, including ACL, AAAI, EMNLP, IJCAI, KDD, SIGIR, SIGMOD, and VLDB. She has received a best paper runner-up award from WSDM and an outstanding collaboration award from Microsoft Research. Details can be found at http://dilab.yonsei.ac.kr/~swhwang.


Hong-Goo Kang

Yonsei University

Bio

Hong-Goo Kang received the B.S., M.S., and Ph.D. degrees from Yonsei University, Korea, in 1989, 1991, and 1995, respectively. From 1996 to 2002, he was a senior technical staff member at AT&T Labs-Research, Florham Park, New Jersey. He was an associate editor of the IEEE Transactions on Audio, Speech, and Language Processing from 2005 to 2008, and has served on numerous conference and program committees. In 2008-2009 and 2015-2016, he worked at Broadcom (Irvine, CA) and Google (Mountain View, CA), respectively, as a visiting scholar, participating in various projects on speech signal processing. His research interests include speech/audio signal processing, machine learning, and human-computer interfaces.


Gunhee Kim

Seoul National University

Bio

Gunhee Kim has been an associate professor in the Department of Computer Science and Engineering of Seoul National University since 2015. Before that, he was a postdoctoral researcher at Disney Research for one and a half years. He received his PhD in 2013 under the supervision of Eric P. Xing from the Computer Science Department of Carnegie Mellon University. Prior to starting his PhD in 2009, he earned a master’s degree under the supervision of Martial Hebert at the Robotics Institute, CMU. His research interest is solving computer vision and web mining problems that emerge from big image data shared online, by developing scalable and effective machine learning and optimization techniques. He is a recipient of the 2014 ACM SIGKDD Doctoral Dissertation Award and the 2015 Naver New Faculty Award.


Jong Kim

Pohang University of Science and Technology (POSTECH)

Bio

Jong Kim is a professor in the Department of Computer Science and Engineering at Pohang University of Science and Technology (POSTECH). He received his Ph.D. degree from Pennsylvania State University in 1991. From 1991 to 1992, he worked at the University of Michigan as a Research Fellow. His research interests include dependable computing, hardware security, mobile security, and machine learning security. He has published papers at top security and systems conferences, including S&P, NDSS, CCS, WWW, MICRO, and RTSS.


Min H. Kim

KAIST

Bio

Min H. Kim is a KAIST-Endowed Chair Professor of Computer Science at KAIST, Korea, leading the Visual Computing Laboratory (VCLAB). Before coming to KAIST, he was a postdoctoral researcher at Yale University, working on hyperspectral 3D imaging. He received his Ph.D. in computer science from University College London (UCL) in 2010, with a focus on HDR color reproduction for high-fidelity computer graphics. In addition to serving on international program committees, e.g., ACM SIGGRAPH Asia, Eurographics (EG), Pacific Graphics (PG), CVPR, and ICCV, he has worked as an associate editor of ACM Transactions on Graphics (TOG), ACM Transactions on Applied Perception (TAP), and Elsevier Computers and Graphics (CAG). His recent research interests span a wide variety of computational imaging topics, including computational photography, hyperspectral imaging, BRDF acquisition, and 3D imaging.


Heejo Lee

Korea University

Bio

Heejo Lee is a Professor in the Department of Computer Science and Engineering, Korea University (KU), Seoul, Korea, and the director of CSSA (Center for Software Security and Assurance). Before joining KU, he was at AhnLab, Inc., the leading security company in Korea, as CTO from 2001 to 2003. He received his BS, MS, and PhD from POSTECH, and worked at Purdue and CMU. He is a recipient of the ISC^2 ISLA award, receiving its most prestigious recognition, the Asia-Pacific Community Service Star, in 2016.


Seong-Whan Lee

Korea University

Bio

Seong-Whan Lee is a full professor at Korea University, where he is the head of the Department of Artificial Intelligence and the Department of Brain and Cognitive Engineering.

A Fellow of the IAPR (1998), IEEE (2009), and the Korean Academy of Science and Technology (2009), he has served several professional societies as chairman or as a governing board member. He was the founding Co-Editor-in-Chief of the International Journal of Document Analysis and Recognition and has been an Associate Editor of several international journals: Pattern Recognition, ACM Transactions on Applied Perception, IEEE Transactions on Affective Computing, Image and Vision Computing, the International Journal of Pattern Recognition and Artificial Intelligence, and the International Journal of Image and Graphics.


Seung Ah Lee

Yonsei University

Bio

Seung Ah Lee is an assistant professor in the Department of Electrical and Electronic Engineering at Yonsei University. She joined Yonsei University in Fall 2018 and currently leads the Optical Imaging Systems Laboratory. Prior to Yonsei, she was a scientist at Verily Life Sciences, formerly part of the Google [x] team, from 2015 to 2018. She received her PhD in Electrical Engineering at Caltech (2014) and completed postdoctoral training at Stanford Bioengineering (2014-2015). She completed her BS (2007) and MS (2009) degrees in Electrical Engineering at Seoul National University.


Seungyong Lee

Pohang University of Science and Technology (POSTECH)

Bio

Seungyong Lee is a professor of computer science and engineering at Pohang University of Science and Technology (POSTECH), Korea. He received a PhD degree in computer science from Korea Advanced Institute of Science and Technology (KAIST) in 1995. From 1995 to 1996, he worked at City College of New York as a postdoctoral researcher. Since 1996, he has been a faculty member of POSTECH, where he leads the Computer Graphics Group. During his sabbatical years, he worked at MPI Informatik (2003-2004) and at the Creative Technologies Lab at Adobe Systems (2010-2011). His technologies on image deblurring and photo upright adjustment have been transferred to Adobe Creative Cloud and Adobe Photoshop Lightroom. His current research interests include image and video processing, deep-learning-based computational photography, and 3D scene reconstruction.


Jingwen Leng

Shanghai Jiao Tong University

Bio

Jingwen Leng is an Assistant Professor in the John Hopcroft Computer Science Center and Computer Science & Engineering Department at Shanghai Jiao Tong University. His research focuses on building efficient and resilient architectures for deep learning. He received his Ph.D. from the University of Texas at Austin, where he worked on improving the efficiency and resiliency of general-purpose GPUs.


Cheng Li

University of Science and Technology of China

Bio

Cheng Li is a research professor at the School of Computer Science and Technology, University of Science and Technology of China (USTC). His research interests lie in various topics related to improving the performance, consistency, fault tolerance, and availability of distributed systems. Prior to joining USTC, he was an associate researcher at INESC-ID, Portugal, and a senior member of technical staff at Oracle Labs Switzerland. He received his PhD degree from the Max Planck Institute for Software Systems (MPI-SWS) in 2016, and his bachelor’s degree from Nankai University in 2009. His work has been published in premier peer-reviewed systems research venues such as OSDI, USENIX ATC, EuroSys, and TPDS. He is a member of the ACM Future Computing Academy. He was a co-chair of the Program Committee of the ACM SOSP 2017 Poster Session and the ACM TURC 2018 SIGOPS/ChinaSys workshop.


Shou-De Lin

National Taiwan University

Bio

Shou-de Lin is currently a full professor in the CSIE department of National Taiwan University. He holds a BS degree in EE from National Taiwan University, an MS-EE degree from the University of Michigan, and an MS degree in Computational Linguistics and a PhD in Computer Science, both from the University of Southern California. He leads the Machine Discovery and Social Network Mining Lab at NTU. Before joining NTU, he was a post-doctoral research fellow at the Los Alamos National Lab. Prof. Lin’s research covers machine learning and data mining, social network analysis, and natural language processing. His international recognition includes the best paper award at the IEEE Web Intelligence Conference 2003; a Google Research Award in 2007; Microsoft Research Awards in 2008, 2015, and 2016; merit paper awards at TAAI 2010, 2014, and 2016; the best paper award at ASONAM 2011; and US Aerospace AFOSR/AOARD research awards for 5 years. He is an all-time winner of the ACM KDD Cup, leading or co-leading the NTU team to win 5 championships, and also led a team to win the WSDM Cup 2016. He has served as a senior PC member for SIGKDD and an area chair for ACL. He also served as the co-founder and chief scientist of a start-up, The OmniEyes.


Jiaying Liu

Peking University

Bio

Jiaying Liu is currently an Associate Professor with the Institute of Computer Science and Technology, Peking University. She received the Ph.D. degree (Hons.) in computer science from Peking University, Beijing, China, in 2010. She has authored over 100 technical articles in refereed journals and proceedings, and holds 42 granted patents. Her current research interests include multimedia signal processing, compression, and computer vision.

Dr. Liu is a Senior Member of IEEE, CSIG, and CCF. She was a Visiting Scholar at the University of Southern California, Los Angeles, from 2007 to 2008, and a Visiting Researcher at Microsoft Research Asia in 2015, supported by the Star Track Young Faculties Award. She has served as a member of the Multimedia Systems & Applications Technical Committee (MSA TC), the Visual Signal Processing and Communications Technical Committee (VSPC TC), and the Education and Outreach Technical Committee (EO TC) of the IEEE Circuits and Systems Society, and as a member of the Image, Video, and Multimedia (IVM) Technical Committee of APSIPA. She has also served as the Technical Program Chair of IEEE VCIP 2019 and ACM ICMR 2021, the Publicity Chair of IEEE ICIP 2019, VCIP 2018, and MIPR 2020, the Grand Challenge Chair of IEEE ICME 2019, and an Area Chair of ICCV 2019. She was an APSIPA Distinguished Lecturer (2016-2017).


Shixia Liu

Tsinghua University

Bio

Shixia Liu is a tenured associate professor at Tsinghua University. Her research interests include explainable machine learning, interactive data quality improvement, and visual text analytics. Shixia is an associate Editor-in-Chief of IEEE Transactions on Visualization and Computer Graphics, IEEE Transactions on Big Data, and ACM Transactions on Interactive Intelligent Systems. She was a Papers Co-Chair of IEEE VAST 2016/2017 and the Program Co-Chair of PacificVis 2014.


Youyou Lu

Tsinghua University

Bio

Youyou Lu is an assistant professor in the Department of Computer Science and Technology at Tsinghua University. He obtained his B.S. degree from Nanjing University in 2009 and his Ph.D. degree from Tsinghua University in 2015, both in Computer Science, and was a postdoctoral fellow at Tsinghua from 2015 to 2017. His current research interests include file and storage systems, spanning from the architectural to the system level. His work has been published at a number of top-tier conferences, including FAST, USENIX ATC, SC, and EuroSys. His research won the Best Paper Award at NVMSA 2014 and was selected among the Best Papers at MSST 2015. He was selected for the Young Elite Scientists Sponsorship Program by CAST (China Association for Science and Technology) in 2015, and received the CCF Outstanding Doctoral Dissertation Award in 2016.


Atsuko Miyaji

Osaka University

Bio

Atsuko Miyaji received the Dr. Sci. degree in mathematics from Osaka University, Osaka, Japan, in 1997. She was with Panasonic Co., Ltd. from 1990 to 1998, and became an associate professor at the Japan Advanced Institute of Science and Technology (JAIST) in 1998. She was at UC Davis from 2002 to 2003. She has been a professor at JAIST, a professor at Osaka University, and an Auditor of the Information-technology Promotion Agency, Japan, since 2007, 2015, and 2016, respectively. She has been an editor of ISO/IEC standards since 2000.

She received the Young Paper Award of SCIS’93 in 1993, the Notable Invention Award of the Science and Technology Agency in 1997, the IPSJ Sakai Special Researcher Award in 2002, the Standardization Contribution Award in 2003, the Engineering Sciences Society Certificate of Appreciation in 2005, the AWARD for the contribution to CULTURE of SECURITY in 2007, IPSJ/ITSCJ Project Editor Awards in 2007, 2008, 2009, 2010, 2012, and 2016, the Director-General of Industrial Science and Technology Policy and Environment Bureau Award in 2007, the DoCoMo Mobile Science Award in 2008, the ADMA 2010 Best Paper Award, the Prizes for Science and Technology in the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, the ATIS 2016 Best Paper Award, the IEEE TrustCom 2017 Best Paper Award, and IEICE milestone certification in 2017.


Tadashi Nomoto

The SOKENDAI Graduate School of Advanced Studies

Bio

Tadashi Nomoto is currently an associate professor at the Graduate University for Advanced Studies (SOKENDAI), with a joint appointment at the National Institute of Japanese Literature. He has been actively engaged in natural language processing and information retrieval for more than a decade, both in academia and in industry. His research interests include computational linguistics, digital libraries, data mining, machine translation, and quantitative media analysis. He has published extensively at major international conferences (the likes of SIGIR, ACL, ICML, and CIKM). He holds an MA in Linguistics from Sophia University, Japan, and a PhD in Computer Science from Nara Institute of Science and Technology, also in Japan.


Sinno Jialin Pan

Nanyang Technological University

Bio

Dr. Sinno Jialin Pan is a Provost’s Chair Associate Professor with the School of Computer Science and Engineering, and Deputy Director of the Data Science and AI Research Centre, at Nanyang Technological University (NTU), Singapore. He received his Ph.D. degree in computer science from the Hong Kong University of Science and Technology (HKUST) in 2011. Prior to joining NTU, he was a scientist and Lab Head of text analytics with the Data Analytics Department, Institute for Infocomm Research, Singapore, from Nov. 2010 to Nov. 2014. He joined NTU as a Nanyang Assistant Professor (a university-named assistant professorship) in Nov. 2014. He was named to “AI 10 to Watch” by IEEE Intelligent Systems magazine in 2018. His research interests include transfer learning and its applications to wireless-sensor-based data mining, text mining, sentiment analysis, and software engineering.


Tim Pan

Microsoft Research

Bio

Dr. Tim Pan is the senior director of Outreach at Microsoft Research Asia, responsible for the lab’s academic collaboration in the Asia-Pacific region. He establishes strategies and directions, identifies business opportunities, and designs various programs and projects that strengthen the partnership between Microsoft Research and academia.


Xueming Qian

Xi’an Jiaotong University

Bio

Xueming Qian, PhD/Professor, received the B.S. and M.S. degrees from Xi’an University of Technology, Xi’an, China, in 1999 and 2004, respectively, and the Ph.D. degree from the School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, China, in 2008. He was awarded a Microsoft fellowship in 2006, and the outstanding doctoral dissertation awards of Xi’an Jiaotong University and Shaanxi Province in 2010 and 2011, respectively. He is the director of the SMILES LAB. He was a visiting scholar at Microsoft Research Asia from August 2010 to March 2011. His research interests include social and mobile multimedia mining, learning, and search.


Huamin Qu

Hong Kong University of Science and Technology

Bio

Huamin Qu is a full professor in the Department of Computer Science and Engineering (CSE) at the Hong Kong University of Science and Technology (HKUST). His main research interests are in data visualization and human-computer interaction, with focuses on explainable AI, urban informatics, social media analysis, E-learning, and text visualization. He has served as paper co-chairs for IEEE VIS’14, VIS’15, and VIS’18 and an associate editor of IEEE Transactions on Visualization and Computer Graphics (TVCG). He received a BS in Mathematics from Xi’an Jiaotong University and a PhD in Computer Science from Stony Brook University.


Junichi Rekimoto

The University of Tokyo

Bio

Jun Rekimoto received his B.A.Sc., M.Sc., and Ph.D. in Information Science from Tokyo Institute of Technology in 1984, 1986, and 1996, respectively. From 1986 to 1994, he worked for the Software Laboratory of NEC. During 1992-1993, he worked in the Computer Graphics Laboratory at the University of Alberta, Canada, as a visiting scientist. Since 1994 he has worked for Sony Computer Science Laboratories (Sony CSL). In 1999 he formed, and has since directed, the Interaction Laboratory within Sony CSL.

Rekimoto’s research interests include computer-augmented environments, mobile/wearable computing, virtual reality, and information visualization. He has authored dozens of refereed publications in the area of human-computer interaction, including at ACM CHI and UIST. One of his publications was recognized with the 30th commemorative papers award from the Information Processing Society of Japan (IPSJ) in 1992. He also received the Multi-Media Grand Prix Technology Award from the Multi-Media Contents Association of Japan in 1998, the Yamashita Memorial Research Award from IPSJ in 1999, and the Japan Inter-Design Award in 2003. In 2007, he was elected to the ACM SIGCHI Academy.


Insik Shin

KAIST

Bio

Insik Shin is a professor in the School of Computing and Chief Professor of the Graduate School of Information Security at KAIST, Korea. He received a Ph.D. degree from the University of Pennsylvania. His research interests include real-time embedded systems, systems security, mobile computing, and cyber-physical systems. He serves on the program committees of top international conferences, including RTSS, RTAS, and ECRTS. He is a recipient of several best (student) paper awards, including at MobiCom ’19, RTSS ’12, RTAS ’12, and RTSS ’03, as well as the KAIST Excellence Award and the Naver Young Faculty Award.


Jun Takamatsu

Nara Institute of Science and Technology

Bio

Jun Takamatsu received a Ph.D. degree in Computer Science from the University of Tokyo, Japan, in 2004. From 2004 to 2008, he was with the Institute of Industrial Science, the University of Tokyo. In 2007, he was with Microsoft Research Asia as a visiting researcher. Since 2008, he has been with Nara Institute of Science and Technology, Japan, as an associate professor. He was also a visitor at Carnegie Mellon University in 2012 and 2013, and a visiting scientist at Microsoft in 2018. His research interests are in robotics, including learning-from-observation, task/motion planning, and feasible motion analysis, as well as 3D shape modeling and analysis and physics-based vision.


Mingkui Tan

South China University of Technology

Bio

Dr. Mingkui Tan is currently a professor with the School of Software Engineering at South China University of Technology, China. He received his Bachelor’s degree in Environmental Science and Engineering in 2006 and his Master’s degree in Control Science and Engineering in 2009, both from Hunan University in Changsha, China. He received the PhD degree in Computer Science from Nanyang Technological University, Singapore, in 2014. From 2014 to 2016, he worked as a Senior Research Associate on machine learning and computer vision in the School of Computer Science, University of Adelaide, Australia. His research interests include machine learning, sparse analysis, deep learning, and large-scale optimization. He has published about 70 research papers in top-tier conferences such as NeurIPS, ICML, and KDD, and in international peer-reviewed journals such as TNNLS, JMLR, and TIP.


Xin Tong

Microsoft Research

Bio

I am a principal researcher in the Internet Graphics Group of Microsoft Research Asia. I obtained my Ph.D. degree in Computer Graphics from Tsinghua University in 1999; my Ph.D. thesis was on hardware-assisted volume rendering. I received my B.S. and Master’s degrees in Computer Science from Zhejiang University in 1993 and 1996, respectively.

My research interests include appearance modeling and rendering, texture synthesis, and image-based modeling and rendering. Specifically, my research concentrates on studying the underlying principles of material-light interaction and light transport, and on developing efficient methods for appearance modeling and rendering. I am also interested in performance capture and facial animation.


Hongzhi Wang

Harbin Institute of Technology

Bio

Hongzhi Wang is a Professor, Ph.D. supervisor, and Vice Dean of the Honors School of Harbin Institute of Technology, the secretary general of ACM SIGMOD China, a CCF outstanding member, and a member of the CCF databases and big data committee. His research fields include big data management and analysis, databases, and data quality. He was a “Star Track” visiting professor at MSRA. He has been PI of more than 10 projects, including an NSFC key project and other NSFC projects. He also serves as a member of the ACM Data Science Task Force. His publications include over 200 papers at venues including VLDB, SIGMOD, and SIGIR, as well as 4 books. His papers have been cited more than 1000 times. His personal website is http://homepage.hit.edu.cn/wang.


Liwei Wang

Peking University

Bio

Liwei Wang is a professor in the School of Electronics Engineering and Computer Science, Peking University, a researcher at the Beijing Institute of Big Data Research, and an adjunct professor at the Institute for Interdisciplinary Information Sciences, Tsinghua University. He was recognized by IEEE Intelligent Systems as one of AI’s 10 to Watch in 2010, the first Asian scholar to be so honored since the establishment of the award. He received the NSFC excellent young researcher grant in 2012. He was also supported by the Program for New Century Excellent Talents in University of the Ministry of Education.


Hiroki Watanabe

Hokkaido University

Bio

Hiroki Watanabe is an assistant professor at the Graduate School of Information Science and Technology, Hokkaido University, Japan. He received the B.Eng., M.Eng., and Ph.D. degrees from Kobe University in 2012, 2014, and 2017, respectively. His research focuses on wearable computing and ubiquitous computing.


Yonggang Wen

Nanyang Technological University

Bio

Dr. Yonggang Wen is a Professor in the School of Computer Science and Engineering (SCSE) at Nanyang Technological University (NTU), Singapore. He also serves as the Associate Dean (Research) at the College of Engineering, and the Director of the Nanyang Technopreneurship Centre at NTU. He received his PhD degree in Electrical Engineering and Computer Science (minor in Western Literature) from the Massachusetts Institute of Technology (MIT), Cambridge, USA, in 2007.

Dr. Wen has worked extensively in learning-based system prototyping and performance optimization for large-scale networked computer systems. In particular, his work in Multi-Screen Cloud Social TV has been featured by global media (more than 1600 news articles from over 29 countries) and received the 2013 ASEAN ICT Awards (Gold Medal). His work on Cloud3DView, as the only academia entry, won the 2016 ASEAN ICT Awards (Gold Medal) and the 2015 Datacentre Dynamics Awards - APAC (the ‘Oscar’ award of the data centre industry). He is a co-recipient of the 2015 IEEE Multimedia Best Paper Award, and a co-recipient of Best Paper Awards at 2016 IEEE Globecom, the 2016 IEEE Infocom MuSIC Workshop, 2015 EAI/ICST Chinacom, 2014 IEEE WCSP, 2013 IEEE Globecom, and 2012 IEEE EUC. He was the sole winner of the 2016 Nanyang Award in Entrepreneurship and Innovation at NTU, and received the 2016 IEEE ComSoc MMTC Distinguished Leadership Award. He serves on the editorial boards of ACM Transactions on Multimedia Computing, Communications and Applications, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Wireless Communications Magazine, IEEE Communications Surveys & Tutorials, IEEE Transactions on Multimedia, IEEE Transactions on Signal and Information Processing over Networks, IEEE Access Journal, and Elsevier Ad Hoc Networks, and was elected Chair of the IEEE ComSoc Multimedia Communication Technical Committee (2014-2016). His research interests include cloud computing, blockchain, green data centres, distributed machine learning, big data analytics, multimedia networks, and mobile computing.


Wenfei Wu

Tsinghua University

Bio

Wenfei Wu is an assistant professor in the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University. He obtained his Ph.D. from the CS department at the University of Wisconsin-Madison in 2015. Dr. Wu’s research interests are in networked systems, including architecture design, data plane optimization, and network management optimization. He received the Best Student Paper award at SoCC ’13. Currently, Dr. Wu is working on model-centric DevOps for network functions, in-network computation for distributed systems (including distributed neural networks and big data systems), and secure network protocol design.


Yingcai Wu

Zhejiang University

Bio

Yingcai Wu is a National Youth-1000 scholar and a ZJU100 Young Professor at the State Key Lab of CAD & CG, College of Computer Science and Technology, Zhejiang University. He obtained his Ph.D. degree in Computer Science from the Hong Kong University of Science and Technology (HKUST). Prior to his current position, he was a researcher at Microsoft Research Asia, Beijing, China, from 2012 to 2015, and a postdoctoral researcher at the University of California, Davis, from 2010 to 2012. He was a paper co-chair of IEEE Pacific Visualization 2017 and ChinaVis 2016-2017. His main research interests are in visual analytics and human-computer interaction, with focuses on sports analytics, urban computing, and social media analysis. He has published more than 50 refereed papers, including 25 IEEE Transactions on Visualization and Computer Graphics (TVCG) papers. Three of his papers have been awarded Honorable Mention at IEEE VIS (SciVis) 2009, IEEE VIS (VAST) 2014, and IEEE PacificVis 2016. For more information, visit www.ycwu.org.


Hiroaki Yamane

RIKEN AIP & The University of Tokyo

Bio

Hiroaki Yamane is a postdoctoral researcher at RIKEN AIP and a visiting researcher at the University of Tokyo. He completed his PhD at Keio University, where he proposed slogan-generating systems. After receiving his PhD, he worked on brain decoding, and he is currently building machine intelligence for medical engineering at RIKEN AIP. Because he has a strong interest in human intelligence, sensitivity, and health, his research interests include word embeddings for commonsense, sentiment analysis, sentence generation, and domain adaptation. He is more broadly interested in the multidisciplinary areas of natural language processing, computer vision, cognitive science and neuroscience, and AI applications to medicine.


Rui Yan

Peking University

Bio

Dr. Rui Yan is an assistant professor at Peking University and an adjunct professor at Central China Normal University and the Central University of Finance and Economics; he was previously a Senior Researcher at Baidu Inc. He has investigated several open-domain conversational systems and dialogue systems in vertical domains. To date, he has published more than 100 highly competitive peer-reviewed papers. He serves as a (senior) program committee member of several top-tier venues, such as KDD, SIGIR, ACL, WWW, IJCAI, AAAI, CIKM, and EMNLP.


Chuck Yoo

Korea University

Bio

Chuck Yoo received his B.S. degree from Seoul National University in 1982, and M.S. and Ph.D. degrees from the University of Michigan, Ann Arbor, Michigan, in 1986 and 1990, respectively. From 1990 to 1995, he was with Sun Microsystems, Mountain View, California, working on Sun’s operating systems. In 1995, he joined the computer science department of Korea University, and he served as the dean of the College of Informatics for 5 years until January 2018.

He has been working on virtualization, starting with a hypervisor for mobile phones and continuing through virtualized automotive platforms, integrated SLAs (service level agreements) for clouds, and network virtualization, including virtual routers and SDN. He hosted the Xen Summit in Seoul in 2011 and has served on the program committees of various conferences. In addition to publishing numerous papers, his research has influenced global industry leaders such as Samsung and LG, inspiring and enhancing their products.

Recently, he has been working with the College of Medicine on precision medicine and with the College of Law on new and revised legislative bills for the fourth industrial revolution.


Sung-eui Yoon

KAIST

Bio

Sung-Eui Yoon is a professor at Korea Advanced Institute of Science and Technology (KAIST). He received the B.S. and M.S. degrees in computer science from Seoul National University in 1999 and 2001, respectively, and his Ph.D. degree in computer science from the University of North Carolina at Chapel Hill in 2005. He was a postdoctoral scholar at Lawrence Livermore National Laboratory, USA. His research interests include graphics, vision, and robotics. He has published about 100 technical papers and has given numerous tutorials on ray tracing, collision detection, and image search at premier conferences such as ACM SIGGRAPH, IEEE Visualization, CVPR, and ICRA. He served as conference co-chair and paper co-chair for ACM I3D 2012 and 2013, respectively. In 2008, he published a monograph on real-time massive model rendering with three co-authors, and in 2018 he published an online book on rendering. Some of his papers received a test-of-time award, a distinguished paper award, and invitations to IEEE Transactions on Visualization and Computer Graphics. He is a senior member of IEEE and ACM.


Masatoshi Yoshikawa

Kyoto University

Bio

Masatoshi Yoshikawa received the B.E., M.E. and Ph.D. degrees from the Department of Information Science, Kyoto University in 1980, 1982 and 1985, respectively. In 1985, he joined the Institute for Computer Sciences, Kyoto Sangyo University as an Assistant Professor. From April 1989 to March 1990, he was a Visiting Scientist at the Computer Science Department of the University of Southern California (USC). In 1993, he joined the Nara Institute of Science and Technology as an Associate Professor in the Graduate School of Information Science. From April 1996 to January 1997, he was a Visiting Associate Professor at the Department of Computer Science, University of Waterloo. From June 2002 to March 2006, he served as a professor at Nagoya University. Since April 2006, he has been a professor in the Graduate School of Informatics, Kyoto University.

One of his current research topics is the theory and practice of privacy protection. As basic research, he investigated the potential privacy loss of a traditional Differential Privacy (DP) mechanism under temporal correlations. He is also interested in personal data markets; in particular, he is studying a mechanism for pricing and selling personal data perturbed by DP.

He was a General Co-Chair of the 6th IEEE International Conference on Big Data and Smart Computing (BigComp 2019) and is a Steering Committee member of the BigComp conference series. He is serving as a PC member of VLDB 2020 and ICDE 2020. He is a member of the IEEE ICDE Steering Committee, the Science Council of Japan (SCJ), ACM, IPSJ and IEICE.


Huanjing Yue

Tianjin University

Bio

Huanjing Yue received the B.S. and Ph.D. degrees from Tianjin University, Tianjin, China, in 2010 and 2015, respectively. She was an Intern with Microsoft Research Asia from 2011 to 2012, and from 2013 to 2015. She visited the Video Processing Laboratory, University of California at San Diego, from 2016 to 2017. She is currently an Associate Professor with the School of Electrical and Information Engineering, Tianjin University. Her current research interests include image processing and computer vision. She received the Microsoft Research Asia Fellowship Honor in 2013 and was selected into the Elite Scholar Program of Tianjin University in 2017.


Lijun Zhang

Nanjing University

Bio

Lijun Zhang received the B.S. and Ph.D. degrees in Software Engineering and Computer Science from Zhejiang University, China, in 2007 and 2012, respectively. He is currently an associate professor in the Department of Computer Science and Technology, Nanjing University, China. Prior to joining Nanjing University, he was a postdoctoral researcher at the Department of Computer Science and Engineering, Michigan State University, USA. His research interests include machine learning and optimization. He has published 80 academic papers, most of which appeared in prestigious conferences and journals such as ICML, NeurIPS, COLT and JMLR. He was named an Alibaba DAMO Academy Young Fellow and received the AAAI-12 Outstanding Paper Award.


Min Zhang

Tsinghua University

Bio

Dr. Min Zhang is a tenured associate professor in the Dept. of Computer Science & Technology, Tsinghua University, specializing in Web search, recommendation, and user modeling. She is the vice director of the State Key Lab of Intelligent Technology & Systems and the executive director of the Tsinghua-MSRA Lab on Media and Search. She also serves as an ACM SIGIR Executive Committee member, an associate editor for the ACM Transactions on Information Systems (TOIS), Short Paper co-Chair of SIGIR 2018, and Program co-Chair of WSDM 2017. She has published more than 100 papers at top-tier conferences, with 4100+ citations, was awarded the Beijing Science and Technology Award (First Prize), and holds 12 patents. She has also collaborated extensively with international and domestic enterprises, including Microsoft, Toshiba, Samsung, Sogou, WeChat, Zhihu, and JD.


Tianzhu Zhang

University of Science and Technology of China

Bio

Tianzhu Zhang is currently a Professor at the Department of Automation, School of Information Science and Technology, University of Science and Technology of China. His current research interests include pattern recognition, computer vision, multimedia computing, and machine learning. He has authored or co-authored over 80 journal and conference papers in these areas, including over 60 IEEE/ACM Transactions papers (TPAMI/IJCV/TIP) and top-tier conference papers (ICCV/CVPR/ACM MM). According to Google Scholar, his papers have been cited more than 4900 times. His work has been recognized with the 2017 China Multimedia Conference Best Paper Award and the 2016 ACM Multimedia Conference Best Paper Award (CCF-A). He received the Chinese Academy of Sciences President Award of Excellence in 2011 and the Excellent Doctoral Dissertation award of the Chinese Academy of Sciences in 2012, was admitted to the Youth Innovation Promotion Association, CAS, in 2018, and received the Natural Science Award (First Prize) of the Chinese Institute of Electronics in 2018. He has served as an Area Chair for CVPR 2020, ICCV 2019, ACM MM 2019, WACV 2018, ICPR 2018, and MVA 2017, and as an Associate Editor for IEEE T-CSVT and Neurocomputing. He received outstanding reviewer awards from MMSJ, ECCV 2016 and CVPR 2018.


Yu Zhang

University of Science & Technology of China

Bio

Yu Zhang is an associate professor in the School of Computer Science & Technology, University of Science and Technology of China (USTC). She received her Ph.D. from USTC in January 2005. Her current research interests include programming languages and systems for emerging AI applications, as well as quantum software.


Zhou Zhao

Zhejiang University

Bio

Zhou Zhao received his Ph.D. from the Hong Kong University of Science and Technology in 2015. He subsequently joined Zhejiang University as an associate professor and doctoral supervisor. His main research interests are in natural language processing and the research and development of key multimedia technologies. He is a member of the Association for Computing Machinery (ACM), the Institute of Electrical and Electronics Engineers (IEEE), and the China Computer Federation (CCF). In addition, he has published more than sixty papers at top international conferences such as NeurIPS, ICLR, and ICML. He received the Innovation Award of the Information Department of Zhejiang University and the title of Outstanding Youth in Zhejiang.


Wei-Shi Zheng

Sun Yat-sen University

Bio

Dr. Wei-Shi Zheng is a full Professor at Sun Yat-sen University, where he received his PhD degree in Applied Mathematics in 2008. He has published more than 100 papers, including more than 80 in major journals (TPAMI, TNN/TNNLS, TIP, TSMC-B, PR) and top conferences (ICCV, CVPR, IJCAI, AAAI). He co-organized four tutorials, at ACCV 2012, ICPR 2012, ICCV 2013 and CVPR 2015. His research interests include person/object association and activity understanding in visual surveillance, and the related large-scale machine learning algorithms. In particular, Dr. Zheng has been actively researching person re-identification over the last five years. He serves extensively for many journals and conferences and was recognized for outstanding reviewing at recent top conferences (ECCV 2016 & CVPR 2017). He participated in the Microsoft Research Asia Young Faculty Visiting Programme, and he has served as a senior PC member/area chair/associate editor for AVSS 2012, ICPR 2018, IJCAI 2019/2020, AAAI 2020 and BMVC 2018/2019. He is an IEEE MSA TC member and an associate editor of Pattern Recognition. He is a recipient of the Excellent Young Scientists Fund of the National Natural Science Foundation of China and of a Royal Society-Newton Advanced Fellowship of the United Kingdom.

Technology Showcase

Technology Showcase by Microsoft Research Asia

AutoSys: Learning based approach for system optimization

Presenter: Mao Yang, Microsoft Research

As computer systems and networks grow increasingly complicated, optimizing them manually with explicit rules and heuristics becomes harder than ever, and sometimes impossible. At Microsoft Research Asia, our AutoSys project applies learning to large-scale system performance tuning. The AutoSys framework (1) defines interfaces to expose system features for learning, (2) introduces monitors to detect learning-induced failures, and (3) runs resource management to support the heterogeneous requirements of learning-related tasks. Based on AutoSys, we have built tools for many crucial system scenarios within Microsoft, including multimedia search for Bing (e.g., tail latency reduced by up to ~40% and capacity increased by up to ~30%) and job scheduling for Bing Ads (e.g., tail latency reduced by up to ~13%).

Dual Learning and Its Applications to Machine Translation and Speech Synthesis

Presenter: Yingce Xia and Xu Tan, Microsoft Research

Many AI tasks emerge in dual forms, e.g., English-to-French translation vs. French-to-English translation, speech recognition vs. speech synthesis, question answering vs. question generation, and image classification vs. image generation. Dual learning is a new learning framework that leverages the primal-dual structure of AI tasks to obtain effective feedback or regularization signals that enhance the learning/inference process. In this demo, we will show two applications of dual learning: machine translation and speech synthesis.
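To make the primal-dual feedback loop concrete, here is a toy Python sketch of the round-trip (reconstruction) signal in machine translation, using word-for-word lookup-table "translators" in place of real neural models; the dictionaries and the reward definition are illustrative assumptions, not the demo's actual implementation.

```python
# Toy "translators": word-for-word lookup tables standing in for
# a primal (EN->FR) and a dual (FR->EN) translation model.
EN_TO_FR = {"hello": "bonjour", "world": "monde", "cat": "chat"}
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}

def translate(words, table):
    # Words the model cannot handle become "<unk>", simulating errors.
    return [table.get(w, "<unk>") for w in words]

def reconstruction_reward(words):
    # Primal then dual: EN -> FR -> EN. The fraction of words recovered
    # by the round trip needs no parallel reference, so it can serve as
    # a feedback signal for training both models.
    round_trip = translate(translate(words, EN_TO_FR), FR_TO_EN)
    hits = sum(a == b for a, b in zip(words, round_trip))
    return hits / len(words)

print(reconstruction_reward(["hello", "world"]))   # 1.0
print(reconstruction_reward(["hello", "dragon"]))  # 0.5
```

In the real framework the same round-trip idea applies to other dual pairs, e.g., synthesizing speech and checking that a recognizer recovers the original text.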

Fluency Boost Learning and Inference for Neural Grammar Checker

Presenter: Tao Ge, Microsoft Research

Neural sequence-to-sequence (seq2seq) approaches have proven to be successful in grammatical error correction (GEC). Based on the seq2seq framework, we propose a novel fluency boost learning and inference mechanism. Fluency boost learning generates diverse error-corrected sentence pairs during training, enabling the error correction model to learn how to improve a sentence’s fluency from more instances, while fluency boost inference allows the model to correct a sentence incrementally over multiple inference steps. Combining fluency boost learning and inference with conventional seq2seq models, our approach achieves state-of-the-art performance on GEC benchmarks.
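The incremental inference idea can be sketched as a loop that keeps applying a corrector while a fluency score improves. In this minimal sketch, the rule table stands in for the seq2seq model and the error-count heuristic stands in for a language-model fluency score; both are assumptions for illustration only.

```python
# Toy corrector: a rule table standing in for the seq2seq model.
RULES = {"has went": "has gone", "a apple": "an apple"}

def correct_once(sentence):
    # One inference round: apply whichever rules fire.
    for bad, good in RULES.items():
        sentence = sentence.replace(bad, good)
    return sentence

def fluency(sentence):
    # Stand-in for a language-model score: fewer known errors = more fluent.
    return -sum(sentence.count(bad) for bad in RULES)

def fluency_boost_inference(sentence, max_rounds=5):
    # Correct incrementally, stopping once fluency no longer improves.
    for _ in range(max_rounds):
        candidate = correct_once(sentence)
        if fluency(candidate) <= fluency(sentence):
            break
        sentence = candidate
    return sentence

print(fluency_boost_inference("she has went to buy a apple"))
# "she has gone to buy an apple"
```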

OneOCR For Digital Transformation

Presenter: Qiang Huo, Microsoft Research

At Microsoft, we have been developing a new-generation OCR engine (aka OneOCR), which can detect both printed and handwritten text in an image captured by a camera or mobile phone, and recognize the detected text for follow-up actions. Our unified OneOCR engine can recognize mixed printed and handwritten English text lines with arbitrary orientations (even flipped), significantly outperforming other leading industrial OCR engines across a wide range of application scenarios. Empowered by the OneOCR engine, the Computer Vision Read capability and the Cognitive Search capability of Azure Search are generally available, and a Form Recognizer with receipt understanding capability is available in preview, all in Azure Cognitive Services, which can power enterprise workflows and Robotic Process Automation (RPA) to spur digital transformation. In this presentation, I will demonstrate the capabilities of Microsoft’s latest OneOCR engine, highlight its core component technologies, and explain the roadmap ahead.

Spreadsheet Intelligence for Ideas in Excel

Presenter: Shi Han, Microsoft Research

Ideas in Excel aims at one-click intelligence: when a user clicks the Ideas button on the Home tab of Excel, the intelligent service empowers the user to understand his or her data via automatic recommendation of visual summaries and interesting patterns. The user can then insert the recommendations into the spreadsheet to support further analysis or to serve directly as analysis results. Enabling such one-click intelligence poses underlying technical challenges. At the Data, Knowledge and Intelligence group of Microsoft Research Asia, we have conducted long-term research on spreadsheet intelligence and automated insights, and via close collaboration with Excel product teams, we transferred a suite of technologies and shipped Ideas in Excel together. In this demo presentation, we will show this intelligent feature and introduce the corresponding technologies.

Technology Showcase by Academic Collaborators

3D Caricature Generation from Real Face Images

Presenter: Yucheol Jung, Wonjong Jang, and Seungyong Lee, POSTECH

A 3D caricature can be defined as a 3D mesh with cartoon-style shape exaggeration of a face. We present a novel deep learning based framework that generates a 3D caricature for a given real face image. Our approach exploits 3D geometry information in the caricature generation process and produces more convincing 3D shape exaggerations than 2D caricature-based approaches.

A Co-Training Method towards Machine Reading Comprehension

Presenter: Minlie Huang, Tsinghua University

A Co-Training Method towards Machine Reading Comprehension

A Method for Controlling Human Hearing by Editing the Frequency of the Sound in Real Time

Presenter: Hiroki Watanabe, Hokkaido University

A Method for Controlling Human Hearing by Editing the Frequency of the Sound in Real Time

Abstractive Summarization of Reddit Posts with Multi-level Memory Networks

Presenter: Gunhee Kim, Seoul National University

We address the problem of abstractive summarization in two directions: proposing a novel dataset and a new model. First, we collect the Reddit TIFU dataset, consisting of 120K posts from the online discussion forum Reddit. We use such informal crowd-generated posts as the text source, in contrast with existing datasets that mostly use formal documents such as news articles. Our dataset therefore suffers less from biases such as key sentences being located at the beginning of the text and favorable summary candidates already appearing in the text in similar forms. Second, we propose a novel abstractive summarization model named multi-level memory networks (MMN), equipped with multi-level memory to store the information of the text at different levels of abstraction. With quantitative evaluation and user studies via Amazon Mechanical Turk, we show that the Reddit TIFU dataset is highly abstractive and that the MMN outperforms state-of-the-art summarization models.

Adaptive Graph Structure Learning for Image Sentence Matching

Presenter: Tianzhu Zhang, University of Science and Technology of China

We adapt the attention mechanism for visual and semantic element representation.

We adaptively construct graphs and update the features for objects and words, making good use of both intra-modality and inter-modality relationships.

We consider the structure information across different graphs by imposing a constraint on the semantic elements, forcing each semantic element to align with its corresponding visual element.

The proposed model obtains promising results on the Flickr30K and MS-COCO datasets.

Adversarial Attacks and Defenses in Deep Learning

Presenter: Yinpeng Dong, Tsinghua University

Adversarial Attacks and Defenses in Deep Learning

AI+VIS: Automated Visualization Production

Presenter: Huamin Qu, The Hong Kong University of Science and Technology

Existing visualizations are often designed manually and require substantial human effort. How can we apply deep learning techniques to automatically generate visualizations? We report two recent advances in this direction:

Automated Graph Drawing: We propose a graph-LSTM-based model to directly generate graph drawings with desirable visual properties similar to the training drawings, without requiring users to tune algorithm-specific parameters.

Automated Design of Timeline Infographics: We contribute an end-to-end approach to automatically extract an extensible template from a bitmap timeline image. The output can be used to generate new timelines with updated data.

Blockchain-Enabled Incentive and Trading Mechanism Design for AIoT Service Platform

Presenter: Ai-Chun Pang, National Taiwan University

We use blockchain technology to ensure data effectiveness, preserving data properties such as immutability and credibility throughout the transaction process.

Bypassing Defense Methods for Neural Network Backdoor

Presenter: Sangwoo Ji and Jong Kim, POSTECH

Bin Zhu, Microsoft Research

We bypass two backdoor detection methods: suspicious data instance detection and backdoor trigger detection.

Can Kernel Networking Become Fast Enough?

Presenter: Chuck Yoo, Korea University

  • Existing network optimizations suffer from poor stability, low resource efficiency, and a need for API changes
  • Solution: Kernel-based optimization for high-performance networking
  • L3 forwarding achieves performance similar to DPDK
  • A virtual switch achieves 67.5% of DPDK-OVS performance with three times greater resource efficiency

CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation

Presenter: Xiangyang Ji, Tsinghua University

CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation

Commonsense Reasoning with Structured Knowledge

Presenter: Hongming Zhang, The Hong Kong University of Science and Technology

Understanding human language requires complex commonsense knowledge. However, existing large-scale knowledge graphs mainly focus on knowledge about entities while ignoring commonsense knowledge about activities, states, or events, which describe how entities or things act in the real world. To fill this gap, we develop ASER (activities, states, events, and their relations), a large-scale eventuality knowledge graph extracted from more than 11 billion tokens of unstructured textual data. ASER contains 15 relation types belonging to five categories, 194 million unique eventualities, and 64 million unique edges among them. Both human and extrinsic evaluations demonstrate the quality and effectiveness of ASER.

Complex Correlation Modeling and Analysis Framework for Incomplete, Multimodal and Dynamic Data

Presenter: Zizhao Zhang, Tsinghua University

A well-constructed hypergraph can represent data correlations accurately, leading to better performance. How can we construct a good hypergraph to fit complex data?

Concordia: Distributed Shared Memory with In-Network Cache Coherence

Presenter: Youyou Lu, Tsinghua University

Concordia divides coherence responsibility between the switch and the servers. The switch serializes conflicting requests and forwards them to the correct destinations via a lock-check-forward pipeline; the servers execute requester-driven coherence control to reach coherent state transitions.

Continual Learning with Dynamic Network Expansion

Presenter: Sung Ju Hwang, KAIST

  • Perform effective knowledge transfer from earlier tasks to later tasks.
  • Prevent catastrophic forgetting, where the earlier task performance gets negatively affected by semantic drift of the representations as the model adapts to later tasks.
  • Obtain maximal performance with minimal increase in the network capacity.

Counting Hypergraph Colorings in the Local Lemma Regime

Presenter: Chao Liao, Shanghai Jiao Tong University

Counting Hypergraph Colorings in the Local Lemma Regime

Cross-Lingual Visual Grounding and Multimodal Machine Translation

Presenter: Chenhui Chu, Osaka University

Cross-Lingual Visual Grounding and Multimodal Machine Translation

Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty

Presenter: Gunhee Kim, Seoul National University

Exploration based on state novelty has brought great success in challenging reinforcement learning problems with sparse rewards. However, existing novelty-based strategies become inefficient in real-world problems where observations contain not only the task-dependent state novelty of our interest but also task-irrelevant information that should be ignored. We introduce an information-theoretic exploration strategy named Curiosity-Bottleneck that distills task-relevant information from observations. Based on the information bottleneck principle, our exploration bonus is quantified as the compressiveness of an observation with respect to the learned representation of a compressive value network. With extensive experiments on static image classification, grid-world and three hard-exploration Atari games, we show that Curiosity-Bottleneck learns an effective exploration strategy by robustly measuring state novelty in distractive environments where state-of-the-art exploration methods often degenerate.
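A bottleneck-style bonus of this flavor can be illustrated with the standard closed-form KL divergence between a Gaussian encoder posterior and a standard normal prior: familiar observations compress well (posterior close to the prior, small bonus), novel ones do not. Treating this KL term as the bonus is an illustrative assumption; the paper's exact networks and objective are not reproduced here.

```python
import math

def kl_to_standard_normal(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ) summed over latent dimensions,
    # using the standard per-dimension closed form.
    return sum(
        0.5 * (m * m + s * s - 1.0) - math.log(s)
        for m, s in zip(mu, sigma)
    )

# A "familiar" observation: posterior near the prior -> small bonus.
familiar = kl_to_standard_normal([0.1, 0.0], [1.0, 1.0])
# A "novel" observation: posterior far from the prior -> large bonus.
novel = kl_to_standard_normal([2.0, -1.5], [0.5, 0.5])
print(familiar, novel)
```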

Deep Reinforcement Learning for the Transfer from Simulation to the Real World with Uncertainties for AI Curling Robot System

Presenter: Dong-Ok Won and Seong-Whan Lee, Korea University

Recently, deep reinforcement learning (DRL) has enabled real-world applications such as robotics. Here we teach a robot to succeed in curling (an Olympic discipline), a highly complex real-world application in which a robot needs to carefully learn to play the game on a slippery ice sheet in order to compete well against human opponents. This scenario encompasses fundamental challenges: uncertainty, non-stationarity, infinite state spaces and, most importantly, scarce data. One fundamental objective of this study is thus to better understand and model the transfer from simulation to real-world scenarios with uncertainty. We demonstrate our proposed framework and show videos, experiments and statistics about Curly, our AI curling robot, being tested on a real curling ice sheet. Curly performed well both in classical game situations and when interacting with human opponents.

Deep Text Generation: Conversation and Application

Presenter: Rui Yan, Peking University

Deep Text Generation: Conversation and Application

Development of 3D capsule endoscopic system

Presenter: Ryo Furukawa, Hiroshima City University

Development of 3D capsule endoscopic system

Development of automatic Labanotation estimation system from video using Deep Learning

Presenter: Hiroshi Kawasaki, Kyushu University

Our project aims to research human representation and the understanding of human motion based on vision-based approaches, and to develop new applications.

Dissecting and Accelerating Neural Network via Graph Instrumentation

Presenter: Jingwen Leng, Shanghai Jiao Tong University

The proposed graph instrumentation framework can observe and modify neural networks using user-defined analysis code, without changes to the networks’ source code.

Distant Supervised Domain-Specific Knowledge Base Construction and Population

Presenter: Lei Chen, The Hong Kong University of Science and Technology

Our Goal in Domain-Specific KB Construction

  • Entity Extraction, Entity Typing and Relation Extraction related to the target domain.
  • Training data generation based on distant-supervision without human annotation.

Efficient and Effective Sparse DNNs with Bank-Balanced Sparsity

Presenter: Shijie Cao, Harbin Institute of Technology

Efficient and Effective Sparse DNNs with Bank-Balanced Sparsity

Efficient Deep Neural Networks for Realistic Noise Removal

Presenter: Huanjing Yue, Tianjin University

We propose an end-to-end noise estimation and removal network, where the estimated noise map is weighted and concatenated with the noisy input to improve the denoising performance.

The proposed noise estimation network takes advantage of the Bayer pattern prior of the noise maps, which not only improves the estimation accuracy but also reduces the memory cost.

We propose an RSD block to fully exploit the spatial and channel correlations of realistic noise. The ablation study demonstrates the effectiveness of the proposed module.

Emoji-Powered Representation Learning for Cross-Lingual Sentiment Analysis

Presenter: Zhenpeng Chen, Peking University

Emoji-Powered Representation Learning for Cross-Lingual Sentiment Analysis

Erebus: A Stealthier Partitioning Attack against Bitcoin Peer-to-Peer Network

Presenter: Muoi Tran, National University of Singapore

We present the Erebus attack, which allows large malicious Internet Service Providers (ISPs) to isolate any targeted public Bitcoin node from the Bitcoin peer-to-peer network. The Erebus attack does not require routing manipulation (e.g., BGP hijacks) and hence is virtually undetectable to any control-plane and even typical data-plane detectors.

Explaining Word Embeddings via Disentangled Representations

Presenter: Shou-de Lin, National Taiwan University

We propose transforming word embeddings into interpretable representations that disentangle explainable factors.

Examples of factors: a) Topical factors: food, location, animal, etc. b) Part-of-Speech factors: noun, adj, verb, etc.

We define and propose 4 desirable properties of our disentangled word vectors: a) Modularity, b) Compactness, c) Explicitness, d) Feature preservation

Free-form Video Inpainting with 3D Gated Conv, TPD, and LGTSM

Presenter: Winston Hsu, National Taiwan University

Free-form Video Inpainting with 3D Gated Conv, TPD, and LGTSM

Fluid: A Blockchain based Framework for Crowdsourcing

Presenter: Lei Chen, The Hong Kong University of Science and Technology

Fluid: A Blockchain based Framework for Crowdsourcing

FLUID: Flexible User Interface Distribution for Ubiquitous Multi-device Interaction

Presenter: Insik Shin, KAIST

Key idea: separation between app logic & UI parts

  1) Distributing target UI objects to remote devices and rendering them
  2) Giving an illusion as if app logic and UI objects were in the same process

Fuzzing with Interleaving Coverage for Multi-threading Program

Presenter: Youngjoo Ko and Jong Kim, POSTECH

Bin Zhu, Microsoft Research

We increase the performance of fuzzing, using interleaving coverage to discover more bugs in multi-threaded programs.

Generative Model-based Speech Enhancement for Speech Recognition

Presenter: Jinyoung Lee and Hong-Goo Kang, Yonsei University

  • Remove ambient noise to improve automatic speech recognition performance
  • Overcome the problems of conventional masking-based speech enhancement algorithms, e.g. speech signal distortion
  • Propose a generative and adversarial model-based approach that effectively utilizes spectro-temporal characteristics of speech and noise components

Global-Local Temporal Representations For Video Person Re-Identification

Presenter: Shiliang Zhang, Peking University

  • Propose Dilated Temporal Convolution (DTC) to learn short-term temporal cues
  • Propose Temporal Self Attention (TSA) to learn the long-term temporal cues
  • DTC and TSA learn complementary temporal features

Gradient Descent Finds Global Minima of DNNs

Presenter: Liwei Wang, Peking University

Gradient Descent Finds Global Minima of DNNs

Graph Neural Networks for 3D Face Anti-spoofing

Presenter: Wei HU and Gusi Te, Peking University

This project aims to explore emerging graph neural networks (GNNs), based on texture plus depth features, to address the problem of 3D face anti-spoofing. Spoofing attacks, which present fake or copied facial evidence to obtain valid authentication, are growing. While anti-spoofing techniques using 2D facial data have matured, 3D face anti-spoofing has not been studied much, leaving advanced spoofing techniques such as 3D masking largely unaddressed. Hence, we propose to address this problem based on texture plus depth cues acquired from RGBD cameras, in the framework of GNNs.

Graph-structured Knowledge Base Management and Applications

Presenter: Hongzhi Wang, Harbin Institute of Technology

Graph-structured Knowledge Base Management and Applications

Home Location Selection with Reachability

Presenter: Yingcai Wu, Zhejiang University

This study characterizes the problem of reachability-centric multi-criteria decision-making for choosing ideal homes. The system can also be adopted in other location selection scenarios in which the reachability of locations is considered (e.g., selecting a location for a convenience store).

Identifying Structures in Spreadsheets

Presenter: Wensheng Dou, Chinese Academy of Sciences

Identifying Structures in Spreadsheets

Image-to-Image Translation via Group-wise Deep Whitening-and-Coloring Transformation

Presenter: Jaegul Choo, Korea University

Recently, unsupervised exemplar-based image-to-image translation has made substantial advances. To transfer information from an exemplar to an input image, existing methods often use a normalization technique, e.g., adaptive instance normalization, that controls the channel-wise statistics of an input activation map at a particular layer, such as the mean and the variance. Meanwhile, style transfer, a task similar to image translation by nature, has demonstrated superior performance by using higher-order statistics, such as the covariance among channels, to represent a style. However, applying this approach to image translation is computationally intensive and error-prone due to its expensive time complexity and non-trivial backpropagation. In response, this paper proposes an end-to-end approach tailored for image translation that efficiently approximates this transformation with our novel regularization methods. We further extend our approach to a group-wise form for memory and time efficiency as well as image quality. Extensive qualitative and quantitative experiments demonstrate that our proposed method is fast, both in training and inference, and highly effective in reflecting the style of an exemplar.
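The transformation being approximated is the classic whitening-and-coloring transform (WCT): whiten the content activations to identity covariance, then color them with the style activations' covariance. The NumPy sketch below shows the exact eigendecomposition route that makes this expensive; it is a standard-WCT illustration under assumed (channels, pixels) activation shapes, not the paper's learned, group-wise approximation.

```python
import numpy as np

def whiten_color(content, style, eps=1e-5):
    # content, style: (channels, pixels) activation matrices.
    c = content - content.mean(axis=1, keepdims=True)
    s = style - style.mean(axis=1, keepdims=True)

    def cov_pow(x, power):
        # Matrix power of the sample covariance via eigendecomposition;
        # this per-layer eigendecomposition is the costly step.
        cov = x @ x.T / (x.shape[1] - 1) + eps * np.eye(x.shape[0])
        w, v = np.linalg.eigh(cov)
        return v @ np.diag(w ** power) @ v.T

    whitened = cov_pow(c, -0.5) @ c        # near-identity covariance
    colored = cov_pow(s, 0.5) @ whitened   # near-style covariance
    return colored + style.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
content = rng.normal(size=(4, 256))
style = rng.normal(size=(4, 256))
out = whiten_color(content, style)  # matches style's channel covariance
```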

Immersive Biology - An Interactive Microscope for Informal Biology Education

Presenter: Jaewoo Jung, Kyungwon Lee and Seung Ah Lee, Yonsei University

We developed a new hybrid digital-biological system that provides interactive and immersive experiences between humans and biological objects for applications in life science education and research. The scope of this work includes:

  • Construction of an automated optical stimulation microscope, which uses light to both image and interface with light-sensitive cells.
  • Use of human interaction modalities to convert humans' natural input into stimuli for the microscopic biological objects.
  • Comparative user study as a public installation that evaluated user behaviors, user engagement and learning outcomes.

We expect that this platform will transform microscopes from a passive observation tool to an active interaction medium, assisting scientific research, life science education and clinical interventions.

Improving Join Reorderability with Compensation Operators

Presenter: TaiNing Wang and Chee-Yong Chan, National University of Singapore

Improving the Performance of Video Analytics Using WIFI Signal

Presenter: Hai Truong, Rajesh Krishna Balan, Singapore Management University

Automatic analysis of the behaviour of large groups of people is an important requirement for a large class of applications such as crowd management, traffic control, and surveillance. For example, attributes such as the number of people, how they are distributed, which groups they belong to, and what trajectories they are taking can be used to optimize the layout of a mall to increase overall revenue. A common way to obtain these attributes is to use video camera feeds coupled with advanced video analytics solutions. However, solely utilizing video feeds is challenging in high people-density areas, such as a normal mall in Asia, as the high people density significantly reduces the effectiveness of video analytics due to factors such as occlusion. In this work, we propose to combine video feeds with WiFi data to achieve better classification results of the number of people in the area and the trajectories of those people. In particular, we believe that our approach will combine the strengths of the two different sensors, WiFi and video, while reducing the weaknesses of each sensor. This work started fairly recently, and we will present our thoughts and current results to date.

Intelligent Action Analytics

Presenter: Jiaying Liu, Peking University

Interactive Methods to Improve Data Quality

Presenter: Changjian Chen, Tsinghua University

Inter-learner shadowing framework for comprehensibility-based assessment of learners' speech

Presenter: Nobuaki MINEMATSU, University of Tokyo

IoTcube: An Open Platform for Feedback based Protocol Fuzzing

Presenter: Heejo Lee, Korea University

An open platform for feedback based fuzzing improves its testing performance using two factors: binary feedback and user feedback.

Learning Multi-label Feature for Fine-Grained Food Recognition

Presenter: Xueming Qian, Xi’an Jiaotong University

1. We propose an Attention Fusion Network (AFN). It attends to discriminative food regions against unstructured backgrounds and generates feature embeddings jointly aware of the ingredients and the food.

2. We propose the balance focal loss (BFL) to enhance the joint learning of ingredients and food and to optimize the feature expression ability for multi-label ingredients.

3. The effectiveness is demonstrated through comparative experiments. In particular, the balance focal loss improves the Micro-F1, Macro-F1, and Accuracy of ingredients by 5.76%, 12.62%, and 5.78%, respectively.
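
As a rough illustration of the idea (the exact form of the authors' balance focal loss is not given here, so the combination of per-class balance weights with the focal term is our assumption), a multi-label focal loss might look like:

```python
import numpy as np

def balanced_focal_loss(logits, labels, alpha, gamma=2.0):
    """A hypothetical multi-label balanced focal loss.

    logits: (K,) raw scores for K ingredient labels
    labels: (K,) binary ground-truth vector
    alpha:  (K,) per-class balance weights (e.g. inverse label frequency)
    gamma:  focusing parameter that down-weights easy examples
    """
    p = 1.0 / (1.0 + np.exp(-logits))          # sigmoid per label
    pt = np.where(labels == 1, p, 1.0 - p)     # prob. of the true outcome
    loss = -alpha * (1.0 - pt) ** gamma * np.log(pt + 1e-12)
    return loss.mean()
```

The `(1 - pt) ** gamma` term suppresses the loss for labels the model already predicts confidently, letting rare ingredients dominate the gradient.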

MAP Inference for Customized Determinantal Point Processes via Maximum Inner Product Search

Presenter: Insu Han, KAIST

Minimizing Network Footprint in Distributed Deep Learning

Presenter: Hong Xu, City University of Hong Kong

Multilingual End-to-End Speech Translation

Presenter: Hirofumi Inaguma, Kyoto University

Directly translate source speech to target languages with a single sequence-to-sequence (S2S) model

  • Many-to-many (M2M)
  • One-to-many (O2M)

Outperformed the bilingual end-to-end speech translation (E2E-ST) models

Shared representations obtained from multilingual E2E-ST were more effective than those from the bilingual one for transfer learning to a very low-resource ST task: Mboshi->French (4.4h)
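
For the one-to-many setting, a common mechanism (borrowed from multilingual NMT; whether the authors use exactly this scheme is an assumption, and the tag format below is hypothetical) is to prepend a target-language tag so a single S2S model can serve many target languages:

```python
def tag_target_language(token_ids, lang, vocab):
    """Prepend a target-language tag to the decoder input.

    token_ids: list of int token ids for one utterance
    lang:      target language code, e.g. "fr"
    vocab:     mapping from tag strings to ids
    """
    tag = vocab[f"<2{lang}>"]   # e.g. <2fr> means "translate into French"
    return [tag] + token_ids
```

At inference time, changing only this tag switches the model's output language.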

Multi-marginal Wasserstein GAN

Presenter: Mingkui Tan, South China University of Technology

  • We propose a novel MWGAN to optimize the multi-marginal distance among different domains.
  • We define and analyze the generalization performance of MWGAN for the multiple domain translation task.
  • Extensive experiments demonstrate the effectiveness of MWGAN on balanced and imbalanced translation tasks.

NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Presenter: Mingkui Tan, South China University of Technology

  • Propose a novel Neural Architecture Transformer (NAT) to optimize any arbitrary architecture.
  • Cast the problem into a Markov Decision Process.
  • Employ Graph Convolution Network to learn the policy.

NFD: Using Behavior Models to Develop Cross-Platform NFs

Presenter: Wenfei Wu, Tsinghua University

We propose a new NF development framework named NFD which consists of an NF abstraction layer to develop NF behavior models and a compiler to adapt NF models to specific runtime environments.

Non-factoid Question Answering for Text and Video

Presenter: Seung-won Hwang, Yonsei University

Question Answering (QA) has been mostly studied in the context of factoid, providing concise facts. In contrast, we study Non-factoid QA, extending to cover more realistic questions such as how- or why-questions with long answers, from long texts or videos. This demo and poster address the following questions:

  • Non-factoid QA for text, combining the complementary strength of representation- and interaction-focused approaches [EMNLP19]. Extending this task for video has the opportunity and challenge, coming from multimodality and having no pre-divided answer candidates (e.g. paragraph), which is our ongoing MSRA collaboration.
  • Human-in-the-loop debugging for QA Demo [SIGIR19]

NPA: Neural News Recommendation with Personalized Attention

Presenter: Chuhan Wu, Tsinghua University

  • Different users usually have different interests in news.
  • Different users may click the same news article due to different interests.
  • We need personalized news and user representation!
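
The core of personalized attention, using the user embedding as the attention query over words, can be sketched as follows (a simplified NumPy illustration, not the paper's full architecture):

```python
import numpy as np

def personalized_attention(word_vecs, user_vec):
    """Attend over words with a user-specific query vector.

    word_vecs: (L, d) word representations of a news title
    user_vec:  (d,) user embedding acting as the attention query
    Returns a (d,) personalized news representation.
    """
    scores = word_vecs @ user_vec              # (L,) relevance to this user
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over words
    return weights @ word_vecs
```

Because the query is the user embedding, two users reading the same title produce different attention weights, and hence different news representations.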

Numerical/quantitative system for common sense natural language processing

Presenter: Hiroaki Yamane, The University of Tokyo

We construct methods for converting contextual language to numerical variables for quantitative/numerical common sense in natural language processing.

Online Convex Optimization in Non-stationary Environments

Presenter: Shiyin Lu, Nanjing University

Optimizing Quality of Experience (QoE) for Adaptive Bitrate Streaming via Deep Video Analytics

Presenter: Yonggang Wen, Nanyang Technological University

QoE depends on multiple families of Influential Factors (IFs), which must be optimized jointly for the best user experience.

How to develop a unified and scalable framework to optimize QoE for multimedia communications, in the presence of system dynamics?

Paraphrasing and Simplification with Lean Vocabulary

Presenter: Tadashi Nomoto, National Institute of Japanese Literature

This work explores the impact of the subword representation on paraphrasing and text simplification. Experiments found that when combined with REINFORCE, the subword scheme boosted performance beyond the current state of the art both in paraphrasing and text simplification.

Pick-Carry-Place Household Tasks Using Labanotation for Learning-from-Observation Robots

Presenter: Jun Takamatsu, Nara Institute of Science and Technology

Predicting Future Instance Segmentation with Contextual Pyramid ConvLSTMs

Presenter: Wei-Shi Zheng, Sun Yat-sen University

Predicting Future Instance Segmentation

  • Given several frames in a video, this task is to predict future instance segmentation before the corresponding frames are observed.
  • It is challenging due to the uncertainty in appearance variation caused by object moving, occlusion between objects, and viewpoint changing in videos.

Secure and compact elliptic curve cryptosystems

Presenter: Yaoan Jin and Atsuko Miyaji, Graduate School of Engineering Osaka University

Side-channel attacks exploit information, such as timing and power consumption, gained from the implementation of a cryptosystem:

  • Simple Power Analysis (SPA)
  • Safe Error Attack
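
A standard countermeasure against SPA is to make the sequence of operations independent of the secret key bits, as in the Montgomery ladder. The sketch below shows the ladder for modular exponentiation; it illustrates the general defense, not necessarily the authors' construction (the same ladder structure applies to elliptic-curve scalar multiplication):

```python
def montgomery_ladder(base, exponent, modulus):
    """Exponentiation with a fixed operation pattern per key bit.

    Every iteration performs one multiplication and one squaring
    regardless of the bit value, so a simple power trace reveals no
    bit-dependent branching.
    """
    r0, r1 = 1, base % modulus
    for bit in format(exponent, "b"):          # most-significant bit first
        if bit == "1":
            r0 = (r0 * r1) % modulus
            r1 = (r1 * r1) % modulus
        else:
            r1 = (r0 * r1) % modulus
            r0 = (r0 * r0) % modulus
    return r0
```

The ladder maintains the invariant `r1 = r0 * base`, so both branches do the same kinds of work in the same order.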

Pruning from Scratch

Presenter: Hang Su, Tsinghua University

In this work, we find that pre-training an over-parameterized model is not necessary for obtaining an efficient pruned structure. We propose a novel network pruning pipeline which allows pruning from scratch.
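
The pipeline's key step, selecting channels by learned gate values rather than by pre-trained weight magnitudes, might be sketched as follows (an illustration only; the function names and the top-k selection rule are our assumptions, not the paper's exact procedure):

```python
import numpy as np

def prune_channels(weight, gates, keep_ratio=0.5):
    """Keep the output channels whose learned gate values are largest.

    weight:     (C_out, C_in) layer weight matrix
    gates:      (C_out,) channel importance scores, learned jointly on a
                randomly initialized network (the "from scratch" part)
    keep_ratio: fraction of output channels to keep
    Returns the pruned weight and the kept channel indices.
    """
    k = max(1, int(round(keep_ratio * len(gates))))
    keep = np.sort(np.argsort(gates)[-k:])     # indices of the top-k gates
    return weight[keep], keep
```

After pruning, the compact structure is trained from scratch rather than fine-tuned from inherited weights.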

Recent Progress of Handwritten Mathematical Expression Recognition

Presenter: Jun Du, University of Science and Technology of China

Recurrent Temporal Aggregation Framework for Deep Video Inpainting

Presenter: Dahun Kim, KAIST

  • To remove unwanted object from a video
  • Frame-by-frame image inpainting

Relational Knowledge Distillation

Presenter: Wonpyo Park, Dongju Kim, and Minsu Cho, POSTECH

Yan Lu, Microsoft Research

Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic output activations of individual data examples represented by the teacher. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves student models by a significant margin. In particular for metric learning, it allows students to outperform their teachers, achieving state-of-the-art results on standard benchmark datasets.
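
The distance-wise loss can be sketched as follows (an illustrative NumPy version of the idea; the angle-wise variant is not shown, and the normalization details are a plausible reading rather than the paper's exact recipe):

```python
import numpy as np

def rkd_distance_loss(teacher_emb, student_emb):
    """Penalize differences in the pairwise-distance structure of a batch.

    teacher_emb, student_emb: (B, d) embeddings of the same B examples.
    Instead of matching outputs example by example, RKD matches the
    relations (here, normalized pairwise distances) among examples.
    """
    def normalized_pairwise(x):
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        off_diag = ~np.eye(len(x), dtype=bool)
        return d / d[off_diag].mean()          # scale-invariant distances

    dt = normalized_pairwise(teacher_emb)
    ds = normalized_pairwise(student_emb)

    # Huber (smooth-L1) penalty on the relational difference.
    diff = np.abs(dt - ds)
    huber = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return huber.mean()
```

Because distances are normalized by their batch mean, a student that reproduces the teacher's geometry at a different scale still incurs zero loss.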

Research on Deep Learning Framework for Julia

Presenter: Yu Zhang, Yuxiang Zhang, Yitong Huang, and Xing Guo, University of Science and Technology of China

SARA: Self-Replay Augmented Record and Replay for Android in Industrial Cases

Presenter: Ting Liu, Xi’an Jiaotong University

secGAN: A Cycle-Consistent GAN for Securely-Recoverable Video Transformation

Presenter: Fengyuan Xu, Nanjing University

Video transformation needs to meet new requirements in actual use, such as privacy protection under surveillance scenarios:

  • The transformed video can be restored to the original one.
  • The transformed video can only be restored by the authorized party.

We need a unified translation style and a unique steganography scheme.

StyleMe: An AI Fashion Consultant for Personal Shopping and Style Advice

Presenter: Shintami Chusnul Hidayati, Institut Teknologi Sepuluh Nopember; Wen-Huang Cheng, National Chiao Tung University; Jianlong Fu, Microsoft Research

System support for designing efficient gradient compression algorithms for distributed DNN training

Presenter: Cheng Li, University of Science and Technology of China

Temporal Cause and Effect Localization on Car Crash Videos Via Multi-Task Neural Architecture Search

Presenter: Tackgeun You, POSTECH and Bohyung Han, Seoul National University

  • Introduce a benchmark for temporal cause and effect localization on car crash videos.
  • Propose a multi-task baseline for simultaneously conducting temporal cause and effect localization.
  • Propose a multi-task neural architecture search that decides whether to share or separate building blocks.

Towards a Deep and Unified Understanding of Deep Neural Models in NLP

Presenter: Chaoyu Guan, Shanghai Jiao Tong University

A unified information-based measure: quantify the information of each input word that is encoded in an intermediate layer of a deep NLP model.

The information-based measure as a tool for:

  • Evaluating different explanation methods.
  • Explaining different deep NLP models.

This measure enriches the capability of explaining DNNs.
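
A crude perturbation-based proxy for such a word-level measure can be sketched as follows (this masking heuristic is our illustration of the general idea, not the paper's information-theoretic formulation):

```python
import numpy as np

def word_information(encode, tokens, mask_token="<mask>"):
    """Score each word by how much masking it changes a hidden state.

    encode: function mapping a token list to a hidden vector
            (any model's intermediate layer, treated as a black box)
    tokens: list of input tokens
    Returns one score per word; larger means more encoded information.
    """
    h = encode(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        scores.append(float(np.linalg.norm(h - encode(masked))))
    return scores
```

Words whose removal barely perturbs the hidden state carry little information at that layer, which is the kind of per-word quantification the measure provides.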

Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

Presenter: Ting Liu, Xi’an Jiaotong University

Vibration-Mediated Sensing Techniques for Tangible Interaction

Presenter: Seungmoon Choi and Seungjae Oh, POSTECH

  • Recognize contact finger(s) on any rigid surfaces by decoding transmitted frequencies
  • Identify a grasped object by visualizing the propagation dynamics of vibration

Video Generation from Natural Language by Decomposing the Components of Video : Background, Object, and Action

Presenter: Kibeom Hong and Hyeran Byun, Yonsei University

  • A video can be created by separating background and foreground, and the foreground can be divided into object and action.
  • We can obtain background and foreground information for video generation from text.
  • In the image domain, previous works [1,2,3] have studied text-to-image generation extensively, and [4,5,6] expanded this idea to the video domain.
  • In this work, we want to create a video from these three components in order to control more realistic and fine-grained parts.

Video Dialog via Progressive Inference and Cross-Transformer

Presenter: Zhou Zhao, Zhejiang University

Video dialog is a new and challenging task, which requires the agent to answer questions by combining video information with the dialog history. Different from single-turn video question answering, the additional dialog history is important for video dialog, as it often includes contextual information for the question. Existing visual dialog methods mainly use an RNN to encode the dialog history as a single vector representation, which can be rough and simplistic. Some more advanced methods utilize hierarchical structure, attention, and memory mechanisms, but still lack an explicit reasoning process. In this paper, we introduce a novel progressive inference mechanism for video dialog, which progressively updates query information based on the dialog history and video content until the agent judges the information to be sufficient and unambiguous. To tackle the multimodal fusion problem, we propose a cross-transformer module, which learns more fine-grained and comprehensive interactions both inside and between the modalities. Besides answer generation, we also consider question generation, which is more challenging but significant for a complete video dialog system. We evaluate our method on two large-scale datasets, and extensive experiments show its effectiveness.

Widar 3.0: Zero-Effort Cross-Domain Gesture Recognition with Wi-Fi

Presenter: Zheng Yang, Tsinghua University

Your Tweets Reveal What You Like: Introducing Cross-media Content Information into Multi-domain Recommendation

Presenter: Min Zhang, Tsinghua University

The key to solving this problem is to conduct better user profiling.

How about off-topic features in other platforms, such as tweets?

  • On-topic features are helpful in understanding users' interests and preferences.
  • Off-topic features are able to describe users too.

We will try to introduce these off-topic features (tweets) into different rating prediction algorithms.

Information

Microsoft Address

Venue: Tower 1-1F, No. 5 Danling Street, Haidian District, Beijing, China

Address: Microsoft Building Tower 1, No. 5 Danling Street, Haidian District, Beijing, China (中国北京海淀区丹棱街5号微软大厦1号楼)