November 4, 2016 - November 5, 2016

Asia Faculty Summit 2016

Location: Seoul, Republic of Korea

  • Contact: Gang Hua, Microsoft Research

    We will demo our latest video search technology in the query-by-example setting. We return not only a ranked list of all videos in the corpus, but also the key shots automatically identified by our algorithm, which provide comprehensible evidence for why a video is ranked higher. This capability naturally leads to a more convenient way of engaging users in the loop for interactive video search.

  • Contact: Tao Mei, Microsoft Research

    This demo shows our recent work on video-to-language, including the translation of a video sequence into a textual description in natural language, as well as the automatic generation of human-level comments for a video.

  • Contact: Ming Zhou, Microsoft Research

    Microsoft Conversation Hub is a complete solution for building and deploying high-quality end-to-end conversation systems and services with minimal effort. Specifically, it provides an SDK that lets you build your conversation engine with a single click. Based on state-of-the-art intelligent chat-bot techniques from Microsoft Research Asia, the Microsoft Conversation Hub already leverages big data to build a General Conversation Engine (GCE) within the underlying system. It also provides ready-to-use APIs that allow developers to enhance the bot's conversational capabilities with customized data according to personalized requirements; the system analyzes and distills knowledge from that customized data in order to provide appropriate responses during conversation. In addition, to integrate with different platforms, Microsoft Conversation Hub provides tools to easily build REST APIs: with just a few configuration steps, the APIs can be used from anywhere, on any platform, within seconds. We also show a digital parrot, Polly, built on top of the Microsoft Conversation Hub.
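
    As a hedged illustration of such ready-to-use REST APIs, here is a minimal Python sketch that posts a user utterance to a conversation endpoint; the URL, route, and field names below are hypothetical placeholders, not the actual Conversation Hub interface.

    ```python
    import requests

    # Hypothetical endpoint and payload shape; the real Conversation Hub
    # routes and field names may differ.
    ENDPOINT = "https://example.com/conversationhub/api/chat"

    def ask_bot(utterance, session_id="demo-session"):
        """Send one user utterance and return the bot's reply text."""
        payload = {"sessionId": session_id, "query": utterance}
        resp = requests.post(ENDPOINT, json=payload, timeout=10)
        resp.raise_for_status()
        return resp.json()["reply"]   # hypothetical response field

    print(ask_bot("Hello, Polly!"))
    ```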

  • Contact: Mu Li, Microsoft Research

    Smart Attention is a compact, flexible, and efficient deep learning framework for natural language tasks, optimized for recurrent neural networks and attention modeling in terms of both hardware utilization and memory footprint. Powered by .NET Core technology, Smart Attention runs on various platforms including Windows, Linux, and Mac OS, and transparently uses CPU and GPU devices. Smart Attention also comes with a complete neural machine translation library, enhanced with the latest improvements from Microsoft Research, which achieves excellent translation accuracy and training efficiency.
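
    As a generic illustration of the attention modeling the framework is optimized for, here is a minimal NumPy sketch of additive (Bahdanau-style) attention over encoder states; it illustrates the computation only and is not Smart Attention's actual API.

    ```python
    import numpy as np

    def additive_attention(decoder_state, encoder_states, Wq, Wk, v):
        """Score each encoder state against the current decoder state,
        softmax the scores, and return the attention context vector."""
        # scores[i] = v . tanh(Wq @ s + Wk @ h_i)
        scores = np.tanh(encoder_states @ Wk.T + decoder_state @ Wq.T) @ v
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        context = weights @ encoder_states   # weighted sum of encoder states
        return context, weights

    # Toy dimensions: 5 encoder states of size 8, decoder state of size 8.
    rng = np.random.default_rng(0)
    H, s = rng.normal(size=(5, 8)), rng.normal(size=8)
    Wq, Wk = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
    v = rng.normal(size=8)
    ctx, w = additive_attention(s, H, Wq, Wk, v)
    print(w.round(3), ctx.shape)
    ```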

  • Contact: Xin Tong, Microsoft Research

    We deliver a mixed reality rendering system with full surface reflectance effects on HoloLens. Our system delivers a realistic experience, with surface reflectance acquired from real objects and rendered under the real environment lighting surrounding the user. Using simple gestures and voice control, the user can easily navigate and observe the virtual object in different ways as it is presented in the real world.

  • Contact: Tao Qin, Microsoft Research

    State-of-the-art machine translation (MT) systems are usually trained on aligned parallel corpora, which are limited in scale and costly to obtain in practice. Given the almost unlimited monolingual data on the Web, in this work we study how to boost the performance of MT systems by leveraging monolingual data in two-language translation. Specifically, we formulate the translation system as a two-player communication game and learn the translation models through reinforcement learning. Player 1 only understands language A and sends a message in language A to Player 2 through a noisy channel, which is a translation model from language A to B. Player 2 only knows language B and sends the received message in language B back to Player 1 through another noisy channel, which is a translation model from language B to A. By checking whether the received message is consistent with her original one, Player 1 can assess the quality of the two channels (translation models) and improve them accordingly. Similarly, Player 2 can send a message in language B to Player 1 and go through a symmetric process to improve the two translation models. This communication game can be played for multiple rounds until the translation models converge. Distinguishing features of this reinforcement learning approach include: (1) two dual translation models are trained within one framework; (2) the translation models are improved purely from unlabeled data through reinforcement learning, without the need for aligned parallel corpora as supervision; and (3) it opens a window to learning to translate from scratch without bilingual data.
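
    The sketch below outlines one round of the communication game in Python-style pseudocode, under stated assumptions: model_ab, model_ba, and lm_b are hypothetical objects exposing translate, log_prob, and reinforce_update methods (lm_b is a language model of language B used as a reward signal); the real system's interfaces will differ.

    ```python
    def dual_learning_round(sent_a, model_ab, model_ba, lm_b, alpha=0.5):
        """One round of the two-player game, starting from a monolingual
        sentence in language A (hypothetical model interfaces)."""
        # Player 1 sends sent_a through the noisy A->B channel.
        mid_b = model_ab.translate(sent_a)

        # Reward 1: does the intermediate message look like fluent B?
        r_fluency = lm_b.log_prob(mid_b)

        # Player 2 sends it back through the B->A channel.
        # Reward 2: how well can the original message be reconstructed?
        r_reconstruction = model_ba.log_prob(src=mid_b, tgt=sent_a)

        reward = alpha * r_fluency + (1 - alpha) * r_reconstruction

        # Both channels are improved with a policy-gradient step.
        model_ab.reinforce_update(sent_a, mid_b, reward)
        model_ba.reinforce_update(mid_b, sent_a, reward)
        return reward
    ```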

  • Contact: Taifeng Wang, Microsoft Research

    CNTK, the Computational Network Toolkit, is a unified deep-learning toolkit from Microsoft Research. This demo will show how to set up CNTK, how to use, configure, and test it, and how to define your own networks. DMTK is another open-source toolkit from Microsoft Research, focused on distributed machine learning. In this demo we will also show what kinds of machine learning tools DMTK provides and how external users can leverage them. Although both toolkits come from MSR, they focus on different domains, and by integrating them, even more powerful applications can be built. It is our hope that the community will take advantage of CNTK+DMTK to share ideas more quickly through the exchange of open-source working code.
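
    A minimal sketch of defining and training a tiny network, assuming the CNTK 2.x Python API (exact names vary slightly across releases, so treat the calls as illustrative):

    ```python
    import numpy as np
    import cntk as C

    # A tiny two-class classifier on 2-D inputs.
    x = C.input_variable(2)
    y = C.input_variable(2)
    model = C.layers.Sequential([C.layers.Dense(16, activation=C.relu),
                                 C.layers.Dense(2)])(x)

    loss = C.cross_entropy_with_softmax(model, y)
    error = C.classification_error(model, y)
    learner = C.sgd(model.parameters,
                    C.learning_rate_schedule(0.1, C.UnitType.minibatch))
    trainer = C.Trainer(model, (loss, error), [learner])

    # One minibatch of random data, just to exercise the training loop.
    features = np.random.randn(32, 2).astype(np.float32)
    labels = np.eye(2, dtype=np.float32)[np.random.randint(0, 2, 32)]
    trainer.train_minibatch({x: features, y: labels})
    print(trainer.previous_minibatch_loss_average)
    ```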

  • Contact: Lei Ji, Microsoft Research

    Q&A pairs are an important form of knowledge data, enabling many scenarios such as automatic question answering in bots. This Q&A Miner provides a platform to: (1) extract Q&A data automatically with human knowledge in the loop; and (2) mine semantic tags such as domain, entity, and relation, as well as intent and condition, from Q&A data. Q&A extraction has two parts: FAQ extraction from both web pages and enterprise documents such as Word files, and Q&A extraction from crowdsourced data such as online forums. After extracting the Q&A pairs, Q&A Miner learns the semantic tags using NER, intent taxonomy mining and recognition, conditional knowledge mining, and question-linking techniques.
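
    As a highly simplified illustration of the FAQ-extraction step, the following sketch pairs question and answer lines in FAQ-style plain text; the real Q&A Miner handles web pages and rich document structure far beyond this.

    ```python
    import re

    def extract_qa_pairs(text):
        """Pair 'Q:'/'A:' lines from FAQ-style plain text (simplified)."""
        pairs, question = [], None
        for line in text.splitlines():
            m = re.match(r"\s*Q[:.]\s*(.+)", line)
            if m:
                question = m.group(1).strip()
                continue
            m = re.match(r"\s*A[:.]\s*(.+)", line)
            if m and question:
                pairs.append((question, m.group(1).strip()))
                question = None
        return pairs

    faq = "Q: How do I reset my password?\nA: Use the account settings page."
    print(extract_qa_pairs(faq))
    ```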

  • Contact: Xing Xie, Microsoft Research

    An incisive understanding of user personality is not only essential to many scientific disciplines; it also has a profound business impact on practical applications such as digital marketing, personalized recommendation, mental diagnosis, and human resources management. Previous studies have demonstrated that language usage in social media is effective for personality prediction. However, beyond language features alone, a less-studied direction is how to leverage the heterogeneous information on social media to better understand user personality. In this demo, we show how to predict users’ personality traits by integrating heterogeneous information including language usage, avatar, emoticons, and responsive patterns. In addition, we find the right star for each user via careful consideration of the predicted personality.

  • Contact: Kuansan Wang and Rui Li, Microsoft Research

    Microsoft Academic services include a set of APIs and data that make it easier to build robust apps and tap into rich academic data. In addition, a new data structure and graph engine have been developed to facilitate real-time intent recognition and knowledge serving. This new service puts a knowledge-driven, semantic-inference-based search and recommendation framework front and center. One illustrative feature is semantic query suggestion, which identifies authors, topics, journals, conferences, etc., as you type, and offers ways to refine your search based on the data in the underlying academic knowledge graph. You can also use the set of productivity tools and services that make it easy to stay up to date on the latest research papers, people, journals, conferences, and news.
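
    A minimal sketch of querying the underlying academic knowledge graph through the service's evaluate endpoint, assuming a valid Cognitive Services subscription key; treat the endpoint, expression syntax, and attribute codes as illustrative of the API rather than authoritative.

    ```python
    import requests

    ENDPOINT = "https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"
    API_KEY = "YOUR-SUBSCRIPTION-KEY"   # assumption: a valid subscription key

    params = {
        # Papers by an author, as a structured query expression.
        "expr": "Composite(AA.AuN=='kuansan wang')",
        "attributes": "Ti,Y,CC",        # title, year, citation count
        "count": 5,
    }
    resp = requests.get(ENDPOINT, params=params,
                        headers={"Ocp-Apim-Subscription-Key": API_KEY})
    for entity in resp.json().get("entities", []):
        print(entity.get("Y"), entity.get("Ti"))
    ```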

  • Contact: Evelyne Viegas, Katja Hofmann, David Bignell, Fernando Diaz and Alekh Agarwal, Microsoft Research

    Project Malmo is an open-source AI experimentation platform designed to support fundamental research in artificial intelligence. With the Project Malmo platform, Microsoft aims to provide an experimentation environment in which promising approaches can be systematically and easily compared, and which fosters collaboration between researchers working on fundamental AI research challenges such as the integration of multi-modal, high-dimensional sensory data and life-long learning. Project Malmo achieves flexibility by building on top of Minecraft, a popular computer game with millions of players. The game is particularly appealing due to its open-ended nature, collaboration with other players, and creativity in game-play. In this demo, we show the capabilities of the Project Malmo platform and the kinds of research it can enable, ranging from 3D navigation tasks to interactive scenarios in which agents compete or collaborate to achieve a goal.
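
    A minimal agent loop on the platform, written in the style of the project's published Python examples (the mission XML and the usual start-up retry and error handling are omitted for brevity):

    ```python
    import time
    import MalmoPython   # Python bindings shipped with Project Malmo

    agent_host = MalmoPython.AgentHost()
    mission = MalmoPython.MissionSpec()        # default mission parameters
    record = MalmoPython.MissionRecordSpec()
    agent_host.startMission(mission, record)

    # Wait for the mission to begin.
    world_state = agent_host.getWorldState()
    while not world_state.has_mission_begun:
        time.sleep(0.1)
        world_state = agent_host.getWorldState()

    # A trivial policy: walk forward while the mission is running.
    while world_state.is_mission_running:
        agent_host.sendCommand("move 1")
        time.sleep(0.5)
        world_state = agent_host.getWorldState()
    ```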

  • Contact: Sangyoun Lee, Yonsei University

    Many application technologies related to intelligent devices and wearable sensors have lately drawn a great deal of attention from both the research community and industry. Many of these technologies make use of hand and facial feature information, and their technical performance depends on that information. Moreover, as the number of products equipped with these technologies rapidly increases, the performance of the core method becomes a crucial issue. However, it is challenging to precisely detect the principal landmarks of fingers or the face, as they deform flexibly with many degrees of freedom. Existing approaches mostly focus on depth feature extraction and the algorithm itself. Our approach, by contrast, concentrates on depth-color-mutuality-based adaptive feature extraction and a hierarchically structured detection strategy. In other words, this research intends to develop a hierarchically organized structure of landmarks and a heterogeneously coupled feature extraction method that builds a complementary correlation between depth and color features. A coarse-to-fine strategy will be adopted to construct the hierarchical landmark structure, and features extracted from both depth and color information will then reinforce each other, functionally adapting to the condition of each landmark.

  • Contact: Uichin Lee, KAIST

    We study user experiences of active workstations that incorporate physical activities such as walking and cycling to promote active office working environments. As a case study, we developed a smart under-desk elliptical trainer that visualizes workout performance and supports context monitoring. We then conducted both controlled and in-the-wild experiments to systematically analyze user experiences. Our results have significant implications for designing active workstations and interactive workplaces.

  • Contact: Jinbae Park, Yonsei University

    We propose a novel autonomous drone with two robotic manipulators that can be controlled both automatically and manually. Our system delivers an intuitive interface for controlling the drone by coupling user posture recognition and a head-mounted display with the drone’s movement. At the same time, vision-based object recognition and an automatic grasping algorithm are implemented to assist teleoperation. This autonomous drone can therefore be deployed in numerous industrial applications, such as object delivery and the manipulation of remote facilities.

  • Contact: Gu-Min Jeong, Kookmin University

    In this presentation, we introduce two useful applications for human activity analysis using multiple wearable devices, including a smartphone, smart shoes, and smart bands. These applications address critical issues in daily life, such as counting walking steps, estimating walking distance, estimating energy expenditure, and recognizing human activities, by fusing sensory data from the wearable devices. By analyzing acceleration, angular speed, and pressure data at users’ shoes and wrists, we can recognize users’ activities and estimate important information about their walking. We also provide a useful human data management tool that can store physical information such as age, height, and weight, efficiently record activity data, and visualize these dynamic data.
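
    As a simplified illustration of the step-counting component, the following NumPy sketch counts peaks in the acceleration magnitude; the deployed applications fuse data from the phone, shoes, and bands with far more robust processing.

    ```python
    import numpy as np

    def count_steps(accel, fs=50.0, threshold=1.5, min_gap=0.3):
        """Count steps as peaks in acceleration magnitude (baseline
        removed), requiring at least `min_gap` seconds between peaks."""
        mag = np.linalg.norm(accel, axis=1)    # accel: (n_samples, 3)
        mag = mag - mag.mean()                 # crude gravity removal
        gap = int(min_gap * fs)
        steps, last = 0, -gap
        for i in range(1, len(mag) - 1):
            is_peak = (mag[i] > threshold and
                       mag[i] >= mag[i - 1] and mag[i] >= mag[i + 1])
            if is_peak and i - last >= gap:
                steps, last = steps + 1, i
        return steps

    # Synthetic 10 s walk at ~2 steps per second.
    t = np.arange(0, 10, 1 / 50.0)
    walk = np.stack([np.zeros_like(t), np.zeros_like(t),
                     9.8 + 3.0 * np.sin(2 * np.pi * 2.0 * t)], axis=1)
    print(count_steps(walk))   # ~20 steps
    ```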

  • Contact: Seong-Whan Lee, Korea University

    We propose a brain signal decoding method based on deep learning to translate users’ intentions into appropriate machine control commands. Our system recognizes users’ voluntary imagination of body-part movements (motor imagery; MI) and transmits the corresponding commands to control the virtual avatar in the ‘BrainRunners’ software, an obstacle race game played through a brain-computer interface (BCI racing). Various pilots, including physically challenged people, were able to play ‘BrainRunners’ by performing three classes of MI tasks in real time.

  • Contact: Seung-won Hwang, Yonsei University

    We show how richer human intelligence can be captured from a fusion of multimodal social data (AAAI 2016, ICDM 2016). We also discuss the enabling technologies under the hood: a main-memory spatial query optimization (VLDB 2016) and a cost-aware query parallelization technique (WSDM 2015).

  • Contact: Jong Kim, POSTECH

    We propose a novel automatic lock system to address the drawbacks of existing automatic unlock methods (e.g., Google’s Smart Lock). The system leverages various information from both a smartphone and a smartwatch, and automatically locks the smartphone whenever an unauthorized user handles it, as detected when the smartphone’s behavior does not match that of the paired smartwatch.
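
    A hedged sketch of the core matching idea: compare time-aligned accelerometer magnitudes from the phone and the watch, and lock when their correlation drops. The correlation test and threshold are illustrative simplifications of the actual system, which uses many more signals.

    ```python
    import numpy as np

    def should_lock(phone_accel, watch_accel, threshold=0.4):
        """Lock when phone and watch motion disagree.  Inputs are
        time-aligned acceleration-magnitude windows of equal length."""
        phone = phone_accel - phone_accel.mean()
        watch = watch_accel - watch_accel.mean()
        denom = phone.std() * watch.std()
        if denom == 0:                  # no motion on one device
            return True
        corr = float(np.mean(phone * watch) / denom)   # Pearson correlation
        return corr < threshold

    t = np.linspace(0, 2, 100)
    owner = np.sin(2 * np.pi * 3 * t)
    print(should_lock(owner + 0.1 * np.random.randn(100), owner))  # False
    print(should_lock(np.random.randn(100), owner))                # likely True
    ```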

  • Contact: Hyeran Byun, Yonsei University

    Highlight detection in videos has been widely studied due to the fast growth of video content. However, most existing approaches to highlight detection, whether based on handcrafted features or on deep learning, rely heavily on human-curated training data, which is very expensive to obtain and thus hinders scalability to both large datasets and unlabeled video categories. We observe that widely available web images can serve as weak supervision for highlight detection. Motivated by this observation, we propose a novel triplet deep ranking approach to video highlight detection that uses web images as weak supervision. Our approach iteratively trains two interdependent deep models (a triplet highlight model and a pairwise noise model) to deal with noisy web images in a single framework. We train the two models with relative preferences so that their capability generalizes regardless of the categories of the training data.
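
    A minimal NumPy sketch of the triplet ranking objective underlying such a model: a highlight segment should outscore a non-highlight segment by a margin. A linear scorer stands in for the deep model here, and the noise-handling component is omitted.

    ```python
    import numpy as np

    def triplet_ranking_loss(f_highlight, f_non_highlight, w, margin=1.0):
        """Hinge loss encouraging score(highlight) > score(non-highlight)
        + margin, with a linear scorer w standing in for the deep model."""
        s_pos = f_highlight @ w
        s_neg = f_non_highlight @ w
        return max(0.0, margin - (s_pos - s_neg))

    rng = np.random.default_rng(0)
    w = rng.normal(size=16)
    pos, neg = rng.normal(size=16), rng.normal(size=16)
    print(triplet_ranking_loss(pos, neg, w))
    ```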

  • Contact: Seungjin Choi, POSTECH

    Normalized random measures (NRMs) provide a broad class of discrete random measures that are often used as priors for Bayesian nonparametric models; the Dirichlet process (DP) is a well-known example. Most posterior inference methods for NRM mixture (NRMM) models rely on MCMC, since MCMC methods are easy to implement and their convergence is well studied. However, MCMC often suffers from slow convergence when the acceptance rate is low. Tree-based inference is an alternative deterministic posterior inference method, where Bayesian hierarchical clustering (BHC) and incremental Bayesian hierarchical clustering (IBHC) have been developed for DP and NRMM models, respectively. Although IBHC is a promising posterior inference method for NRMM models due to its efficiency and applicability to online inference, its convergence is not guaranteed, since it uses heuristics that simply select the best solution after multiple trials. In this paper, we present a hybrid inference algorithm for NRMM models that combines the merits of both MCMC and IBHC. Trees built by IBHC outline partitions of the data, which guide the Metropolis-Hastings procedure to employ appropriate proposals. Inheriting the nature of MCMC, our tree-guided MCMC (tgMCMC) is guaranteed to converge, and it converges quickly thanks to the effective proposals guided by the trees. Experiments on both synthetic and real-world datasets demonstrate the benefit of our method.

  • Contact: Chuck Yoo, Korea University

    Smart automobiles are becoming IT devices as they evolve from vehicles for transportation into platforms for a variety of driver services. In particular, smart automobiles backed by cloud infrastructure can offer much richer services, including self-driving and autonomous monitoring of the automobile’s running condition. To achieve high responsiveness and real-time data processing for smart automobiles, clouds need to be intelligent, that is, to utilize cloud resources intelligently. To this end, we have designed and implemented an intelligent resource scheduler (IReS) that integrates CPU and network resources. IReS has been tested in a smart automobile environment equipped with a digital cluster and has shown its effectiveness.

  • Contact: Jooseok Song, Yonsei University

    We have developed a new approach to improving network performance in social network services by relocating users’ data among multiple clouds based on several factors: the amount of traffic, the distance between user and cloud, and the level of the social relationship between users on the service.

  • Contact: Gunhee Kim, Seoul National University

    We address a variant of the language grounding problem: discovering the reference alignment between regions of images and phrases in the associated natural language text. Unlike much previous work, we deal specifically with noisy text (e.g., user posts in social media), which is usually free-form and contains irrelevant clutter. To this end, we introduce a novel vision-and-language dataset called Pinterest Entities. We design an attention-based deep learning network that simultaneously extracts key phrases from the noisy text and localizes the corresponding regions in the image.

  • Contact: Hyunju Lee, Gwangju Institute of Science and Technology

    We developed a wavelet-based method to identify copy number alterations (CNAs) of genes, which may drive cancer initiation and development. To exploit high-resolution next-generation sequencing (NGS) data, we employ a wavelet transformation, which removes noise in the NGS data and detects recurrent focal CNAs. When we applied the proposed method to glioblastoma multiforme, ovarian serous cystadenocarcinoma, and lung adenocarcinoma, our approach achieved better performance than existing algorithms based on microarray data.
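
    A minimal sketch of the wavelet denoising step using PyWavelets: decompose a copy-number profile, soft-threshold the detail coefficients, and reconstruct. The wavelet, level, and threshold below are illustrative choices, not the paper's settings, and the cross-sample detection of recurrent focal CNAs is omitted.

    ```python
    import numpy as np
    import pywt

    def wavelet_denoise(profile, wavelet="db4", level=4, thresh=0.5):
        """Soft-threshold detail coefficients of a 1-D copy-number profile."""
        coeffs = pywt.wavedec(profile, wavelet, level=level)
        denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft")
                                  for c in coeffs[1:]]
        return pywt.waverec(denoised, wavelet)[:len(profile)]

    # Synthetic profile: a focal amplification buried in noise.
    rng = np.random.default_rng(0)
    profile = rng.normal(0, 0.3, 1024)
    profile[400:430] += 2.0                  # focal CNA
    smooth = wavelet_denoise(profile)
    print(smooth[400:430].mean().round(2))   # amplification survives denoising
    ```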

  • Contact: Joon Heo, Yonsei University

    Spatial big data (SBD) has been utilized in many fields, and we propose applying SBD analytics to education with semantic trajectory data, using the Songdo International Campus at Yonsei University as an ideal test bed. Higher education is under pressure from disruptive innovation, so colleges and universities strive to provide not only better education but also customized services to every single student as a matter of survival in the coming wave of change. The overall plan is to realize a smart campus with SBD analytics for education, safety, health, and campus management, and the research comprises four specific items: (1) produce 3D mapping of the test site; (2) build semantic trajectories; (3) collect pedagogical and other parameters of students through the OSE center; and (4) find relationships between trajectory patterns and pedagogical characteristics. Successful completion of the research would set a milestone in using semantic trajectories to predict student performance and characteristics, and would pave the way toward a proactive student care system and a student activity guidance system, eventually delivering better customized education services to participating students.

  • Contact: Jaegul Choo, Korea University

    Organizing, classifying, and summarizing large document collections are important problems in today’s data-driven society. Central to many text analysis methods is the notion of a textual concept: a set of semantically related keywords characterizing a specific object, phenomenon, or theme. Textual concepts have potential for characterizing document collections, and they can be constructed once and then shared and reused. Here we present a visual analytics system called ConceptVector that guides the user in building, refining, and sharing such concepts, and then uses them to classify documents. We validate ConceptVector via both a quantitative analysis and a user study, showing that the happiness rankings of words generated with our methods are comparable to human-generated ones. Usage scenarios involving real-world datasets demonstrate the fine-grained level of analysis supported by ConceptVector.

  • Contact: Kyoungmu Lee, Seoul National University

    For the last few years, CNNs have been widely applied to various computer vision problems, proving their ability to extract powerful and informative features from raw images. However, there have been only a few attempts to exploit these powerful features in structured output prediction problems. In this project, we propose a framework for structured output prediction that combines convolutional neural networks (CNNs) and the structured support vector machine (SSVM). We applied the proposed framework to estimating human pose from a single image and showed improved performance.

  • Contact: Chulhee Lee, Yonsei University

    In this project, a new logo detection algorithm is developed based on the angle-distance map. The algorithm first identifies candidate logo regions based on color information, then computes the angle-distance map, which is invariant to scale and rotation. The proposed algorithm can detect logos at various rotations and sizes in natural images with low complexity. We implemented the algorithm as smartphone/tablet applications.
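
    A simplified NumPy sketch of the angle-distance representation: express sampled contour points as (angle, distance) pairs about the centroid and normalize distances for scale invariance, so that rotation only permutes the angular bins. The binning scheme below is an illustrative choice, not the paper's exact construction.

    ```python
    import numpy as np

    def angle_distance_map(contour, bins=64):
        """Scale-normalized distance-vs-angle signature of a 2-D contour
        (n_points, 2).  Rotation circularly shifts the angular bins."""
        center = contour.mean(axis=0)
        rel = contour - center
        angles = np.arctan2(rel[:, 1], rel[:, 0])        # [-pi, pi)
        dists = np.linalg.norm(rel, axis=1)
        dists = dists / dists.max()                      # scale invariance
        idx = ((angles + np.pi) / (2 * np.pi) * bins).astype(int) % bins
        sig = np.zeros(bins)
        for i, d in zip(idx, dists):
            sig[i] = max(sig[i], d)                      # farthest point per bin
        return sig

    # A square and its rotated copy yield the same multiset of bin values.
    sq = np.array([[1, 1], [1, -1], [-1, -1], [-1, 1]], float)
    theta = np.pi / 4
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    print(np.allclose(sorted(angle_distance_map(sq)),
                      sorted(angle_distance_map(sq @ R.T))))
    ```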

  • Contact: Sang-Wook Kim, Hanyang University

    We study how to improve the accuracy and running time of top-N recommendation with collaborative filtering (CF). Unlike existing work that mostly uses rated items (which are only a small fraction of a rating matrix), we propose the notion of pre-use preferences of users toward the vast number of unrated items. Using this novel notion, we effectively identify uninteresting items, items not yet rated but likely to receive very low ratings from users, and impute them as zero. This simple yet novel zero-injection method, applied to a set of carefully chosen uninteresting items, not only addresses the sparsity problem by enriching the rating matrix but also completely prevents uninteresting items from being recommended as top-N items, thereby greatly improving accuracy. As the idea is method-agnostic, it can easily be applied to a wide variety of popular CF methods. Through comprehensive experiments using the MovieLens dataset and the MyMediaLite implementation, we demonstrate that our solution consistently and universally improves the accuracy of popular CF methods (e.g., item-based CF, SVD-based CF, and SVD++) by two to five orders of magnitude on average. Furthermore, our approach reduces the running time of those CF methods by a factor of 1.2 to 2.3 when its setting produces the best accuracy.
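
    A hedged NumPy sketch of the zero-injection idea, in which a simple popularity proxy stands in for the paper's pre-use preference model: estimate a preference for each unrated cell, then impute zero for the least-preferred ones before running any CF method on the enriched matrix.

    ```python
    import numpy as np

    def zero_inject(R, inject_frac=0.2):
        """Impute 0 for the unrated cells with the lowest pre-use preference.
        R: user-item matrix with np.nan marking unrated cells.  A column-mean
        popularity proxy stands in for the paper's preference model."""
        filled = R.copy()
        unrated = np.isnan(R)
        item_pop = np.nan_to_num(np.nanmean(R, axis=0))   # pre-use proxy
        pref = np.broadcast_to(item_pop, R.shape)
        scores = np.where(unrated, pref, np.inf).ravel()
        k = int(inject_frac * unrated.sum())
        uninteresting = np.argsort(scores)[:k]            # least preferred
        filled.ravel()[uninteresting] = 0.0               # inject zeros
        return filled

    R = np.array([[5, np.nan, 1], [4, np.nan, np.nan], [np.nan, 2, 1.0]])
    print(zero_inject(R, inject_frac=0.5))
    ```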

  • Contact: Kwanghoon Sohn, Yonsei University

    We present a method for jointly predicting a depth map and intrinsic images from single-image input. The two tasks are formulated in a synergistic manner through a joint conditional random field (CRF) that is solved using a novel convolutional neural network (CNN) architecture, called the joint convolutional neural field (JCNF) model. Tailored to our joint estimation problem, JCNF differs from previous CNNs in its sharing of convolutional activations and layers between networks for each task, its inference in the gradient domain where there exists greater correlation between depth and intrinsic images, and the incorporation of a gradient scale network that learns the confidence of estimated gradients in order to effectively balance them in the solution. This approach is shown to surpass state-of-the-art methods both on single-image depth estimation and on intrinsic image decomposition.

  • Contact: Insik Shin, KAIST

    In this work, we propose a novel tapstroke inference method, called TapSnoop. It accurately and robustly infers user-typed sensitive information (e.g., passwords and PINs) by exploiting tap sounds as a side channel of tapstrokes. First, for accurate tapstroke inference, we develop tap detection and localization algorithms that leverage the acoustic characteristics of tap sounds. Moreover, with the combined use of various sensors, we further improve the accuracy even in the presence of user mobility and ambient noise. We evaluate TapSnoop extensively, collecting data from 10 real-world users in various scenarios. Our results show that TapSnoop achieves a high degree of accuracy (92.9% for a numeric keypad and 78.7% for a QWERTY keypad). Furthermore, even with a moderate level of noise, it provides a degree of inference accuracy similar to that obtained in a virtually noise-free environment.

  • Contact: Heejo Lee, Korea University

    In this project, we focus on analyzing software vulnerabilities in an automated way. The proposed method is composed of two phases: a vulnerability discovery phase, which selects potentially vulnerable code, and a verification phase, which verifies whether the potentially vulnerable code is in fact vulnerable. We apply a backward tracing method to reduce the number of paths to be explored. We test our method with the Juliet Test Suite and show that it can verify the vulnerabilities. We are currently building a platform named IoTcube that analyzes vulnerabilities in software and networks, and the results of this study will be incorporated into the IoTcube platform.

  • Contact: Steve(Sungdeok) Cha, Korea University

    We propose a fresh departure from existing paradigms, in which an image-based CAPTCHA mechanism is used to automatically generate appropriate image tags. This is possible because chosen (i.e., successful) responses are most likely to have been generated by humans who answered the CAPTCHA test to the best of their ability in order to create accounts with, or log in to, portal services such as Microsoft Live.

  • Contact: Hwasoo Yeo, KAIST

    We deliver a real-time highway traffic prediction system built on Microsoft Azure. The system predicts up to 6 hours ahead using highway sensor data from Dedicated Short-Range Communication (DSRC), the Vehicle Detection System (VDS), and the Toll Collection System (TCS) over the South Korean highway network. Based on the Multi-level K-Nearest Neighbor (MK-NN) method, future speed, travel time, and collision risk values are provided at five-minute intervals. The system also covers various scenarios for traffic accidents and control strategies using the Modified Cell Transmission Model (MCTM). Furthermore, online simulation functions are incorporated to help find highway management strategies for optimal system performance. By effectively distributing computation across the Azure platform, we provide the real-time service with significantly reduced computation time.
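
    A simplified scikit-learn sketch of the nearest-neighbor principle behind MK-NN: predict the next five-minute speed from historically similar recent speed patterns. The deployed multi-level method uses many sensor types and aggregation levels beyond this single-sensor illustration.

    ```python
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(0)

    # Synthetic history: 500 windows of the last 6 five-minute speeds (km/h)
    # paired with the speed observed 5 minutes later.
    base = 80 + 20 * np.sin(np.linspace(0, 40, 507))
    speeds = base + rng.normal(0, 3, 507)
    X = np.stack([speeds[i:i + 6] for i in range(500)])
    y = speeds[6:506]

    knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

    recent = speeds[-6:]                   # the latest observed window
    print(round(float(knn.predict([recent])[0]), 1))
    ```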