May 26, 2017

Microsoft Research Asia Academic Day 2017

Location: Yilan, Taiwan

    • Gene Cheung, National Institute of Informatics (NII)
    • Dinei Florencio, Microsoft Research

    The goal of our research is to acquire, process, and compactly represent 3D geometric data (e.g., depth images, meshes, 3D point clouds) for transmission over bandwidth-limited networks to a receiver for immersive visual communication (IVC) applications, such as holoportation. Unlike conventional 2D video-conferencing tools like Skype, IVC renders captured human subjects in a virtual 3D space at the receiver side (observed using multi-view or head-mounted displays), so that an “in-the-same-room” experience can be shared by participants who are remotely located but connected via high-speed data networks. Advances in IVC, which include recent developments in virtual reality (VR) and augmented reality (AR), can enable a new paradigm in distance human communication, reducing cost and improving quality in a range of practical real-world applications, including distance learning, remote medical diagnosis, and psychological counselling.

    • Kyoko Sengoku-Haga*, Sae Buseki*, Min Lu**, Takeshi Masuda+, Takeshi Oishi**, *Tohoku University, **The University of Tokyo, +AIST
    • Katsu Ikeuchi, Microsoft Research

    The goal of our project is to acquire a substantial quantity of 3D data of ancient sculpture, enabling archaeologically significant results and thus demonstrating the validity of the cyber-archaeological method. The final goal is the construction of a cyber museum open to researchers worldwide, allowing them to apply the new cyber-archaeological method to the study of ancient sculpture, namely the 3D shape comparison method developed by our project. This has the potential to cause a paradigm shift in art history and archaeology, but that is not all: the new method also opens great possibilities for Asian researchers and students of Greek and Roman studies. Because of the near-total absence of original works of Greek and Roman art in their countries, most Asian researchers in this field have been confined to a secondary role internationally. With the help of 3D models and the shape comparison tool, research and education in Asian countries may change drastically. Until 2015 we selected statues to be scanned with a view to solving specific art-historical problems; we are now shifting to scanning a series of notable statues of each epoch systematically, thereby acquiring a mass of data applicable to the different problems of numerous researchers.

    • Ichiro IDE, Nagoya University
    • Tao Mei, Microsoft Research

    The aesthetics of photography and artwork have been studied for a long time. The well-known “Rule of Thirds,” often associated with the golden ratio, is a basic rule for deciding framing. In reality, however, other constraints often take precedence over this basic rule, among them the purpose of photographing and the nature of the contents of interest in the scene. In most situations it is preferable to include certain contents rather than others, given the purpose of photographing, so the aesthetics of a photograph should be assessed according to the contents visible in the image in addition to general rules. Since the purpose of photographing varies case by case and is often not explicitly describable, and since it is nearly impossible to describe the nature of each content in the scene beforehand, this problem is very difficult to solve in a general framework. The proposed project therefore aimed to assess the aesthetics of food images in particular, whose purpose of photographing is clear (the target food should look delicious) and whose contents are restricted and usually annotated (i.e., accompanied by dish names and/or ingredients).
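    As a concrete baseline, the rule-of-thirds component of such an aesthetics assessment can be computed geometrically. The sketch below (an illustration of the general rule only, not the project's model) returns the four “power point” intersections against which a salient subject's position is often compared:

```python
def thirds_points(width, height):
    """Return the four 'power point' intersections of the rule-of-thirds
    grid for an image of the given size, as (x, y) pixel coordinates."""
    xs = (width / 3, 2 * width / 3)
    ys = (height / 3, 2 * height / 3)
    return [(x, y) for x in xs for y in ys]
```

    A content-aware assessor could then score, for instance, how close the plated dish's centroid falls to the nearest of these four points.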

    • Hideo Joho, University of Tsukuba
    • Ruihua Song, Microsoft Research

    The increase of voice-based interaction has changed the way people seek information, making search more conversational. Development of effective conversational approaches to search requires better understanding of how people express information needs in dialogue. This project set the following goals to address the research challenge.

    • Develop a conceptual model that can represent information needs expressed in conversations about a collaborative task
    • Identify effective features to detect dialogues that contain conversational information needs
    • Establish behavioral patterns of conversational information needs for a common collaborative task
    • Makoto P. Kato, Kyoto University

    The purpose of this research project is to develop a cognition-aware search system that returns items such as documents, images, and music in response to cognitive search intents (i.e., how the user wants to cognize the item). We develop methods to predict a cognitive search intent from the user's brain activity during search, and to estimate the cognitive relevance of items by utilizing brain-activity data as user profiles. We also investigate the relationship between brain activity and physiological data, and further propose a method for obtaining pseudo brain-activity data for cases where brain-activity data are not available. In this research project, we aim to extend search engines from understanding what a user wants to understanding how a user wants to feel, and to initiate the transfer of findings from neuroscience into industry.

    • Yuta Nakashima, Osaka University and Hiroshi Kawasaki, Kyushu University
    • Katsu Ikeuchi, Microsoft Research

    Learning actions, such as martial arts techniques or dance moves, is best done by imitating a demonstration. There are basically two ways to do this: one is copying a teacher performing the action in real life, and the other is copying a recorded video of the teacher. Both methods have drawbacks. Imitating a teacher in real life depends on the availability of the teacher, while using a video is limited to the video's viewpoint: if the action is ambiguous or hard to follow, the viewer cannot change the viewpoint to see it better.

    Thus, the goal of this project is to create a method that combines these two approaches, and to develop an application that is able to present it easily to users. Our proposed method is called a reenactment, and it is a 3D reconstruction of a motion sequence. In order to make it easy to capture, we restrict ourselves to using consumer depth cameras, in contrast to existing 3D reconstruction techniques that make use of multiple cameras or depth cameras. Our proposed application will use augmented reality, with the mirror metaphor: we will overlay our reenactment on top of a mirror of the user, which will copy the orientation of the user, in order for him or her to more easily compare actions with the reenactment.

    • Hiroshi Kawasaki, Kyushu University and Yuta Nakashima, Osaka University
    • Katsushi Ikeuchi, Microsoft Research

    Active 3D scanning methods that use a single image with a static light pattern (a.k.a. one-shot 3D scan) have attracted interest from many researchers because of their exclusive advantage: the capability of capturing fast-moving objects. The applicant has researched 3D shape reconstruction techniques based on active 3D scanning for more than a decade, has published several papers, and has succeeded in recovering fast-moving objects such as a bursting balloon and a rotating fan. These advantages contribute to various applications, such as medical systems, product inspection, and autonomous driving. Among them, because humans sometimes move very fast, motion capture of humans is still a challenging problem, and we therefore set our goal as capturing humans in fast motion. One important difficulty derives from noise: because human motion is fast, the shutter speed must be very short, resulting in dark and noisy images. To compensate for the low light intensity, multiple projectors are frequently used, which also helps to enlarge the recoverable region; however, this causes a color crosstalk problem. Another issue is missing parts in the reconstruction, which inevitably occur because some parts of the body are usually occluded by others. To solve these issues, we propose two approaches.

    • Mamoru Komachi, Tokyo Metropolitan University
    • Xianchao Wu, Microsoft Research

    In this project, we present a neural-network-based model for robust Japanese word segmentation. With the growth of the web, large variations in language use have emerged. Existing morphological analyzers are typically trained on newswire corpora and are not robust when processing web texts; however, there are few resources for robust Japanese natural language analysis. We therefore aim to create fundamental language resources for neural-network-based Japanese word segmentation.
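    Neural segmenters of this kind commonly cast segmentation as per-character tagging (e.g., with B/I/E/S labels) followed by a decoding step. A minimal illustration of only the decoding step, assuming some tagger upstream (the tag scheme and function name here are ours, not necessarily the project's):

```python
def decode_bies(chars, tags):
    """Reconstruct words from per-character BIES tags
    (B=begin, I=inside, E=end, S=single-character word)."""
    words, buf = [], []
    for ch, tag in zip(chars, tags):
        buf.append(ch)
        if tag in ("E", "S"):        # a word ends at this character
            words.append("".join(buf))
            buf = []
    if buf:                          # tolerate a truncated tag sequence
        words.append("".join(buf))
    return words
```

    For example, tagging 私は学生です as S S B E B E decodes to 私 / は / 学生 / です.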

    • Shunsuke Kudoh, The University of Electro-Communications
    • Katsushi Ikeuchi, Microsoft Research

    The learning-from-observation (LFO) paradigm, in which a robot learns tasks by observing a human demonstration, is an effective method for teaching motions to a robot. With this method, users do not need to write programs explicitly every time they teach something new to the robot. However, since a human body and a robot body have very different joint structures and mass distributions, it is difficult to teach human motion by importing it directly; for example, the angular trajectories of joints cannot simply be transferred to a robot. Therefore, in LFO-based learning the robot must first recognize what the demonstrator is doing and then, from the recognition result, reproduce motion that is both equivalent and feasible. Few studies so far have described human motion from this viewpoint. What is required of such a framework of motion description is that it be capable of both “recognizing” and “reproducing” human motion regardless of the domain of motion and the type of robot. The words “recognition” and “reproduction” in this document are defined as follows:

    • Recognition: generating motion description from observation of human motion
    • Reproduction: generating robot motion from motion description

    In this project, we proposed a general method for describing human motion which was capable of both recognition and reproduction.

    • Takuya Maekawa and Yasuyuki Matsushita, Osaka University
    • Katsushi Ikeuchi, Microsoft Research

    Construction of 3D maps of indoor environments can be a core technology for indoor real-world applications, such as navigation for pedestrians and autonomous mobile robots, and virtual tours of sightseeing spots and museums based on VR technologies. However, existing 3D reconstruction technologies require expensive devices such as laser range finders and depth sensors, so 3D reconstruction methods based on commodity devices are needed. This study proposes a method for constructing a 3D model with real scale using the camera and Wi-Fi module installed in recent smartphones.

    • Masaaki Fukumoto, Microsoft Research

    This project represents a somewhat “unusual” part of MSRA research, as it is hardware-based. Our research aims not only to improve existing devices (e.g., keyboards and pointing devices) but, more so, to create brand-new interface devices.

    • Gang Niu (presented by Tomoya Sakai), University of Tokyo
    • Dr. Xianchao Wu, Microsoft Japan

    Our original proposal, entitled “deep similarity learning in graph-based semi-supervised methods,” involved three topics: deep learning, which is good at highly nonlinear representations of raw data; metric learning, which focuses on pairwise distance measures such that, under the ideal metric, data with the same label are close and data with different labels are far apart; and semi-supervised learning, which uses unlabeled data at training time for classifying either test data or the unlabeled data themselves. Deep similarity learning is extensively used for learning-to-rank/match features in modern search engines (where titles and short abstracts are matched to a query), and graph-based methods such as random walks and label propagation are also useful in search-engine companies (where document information can be propagated over a query-query graph and query information over a doc-doc graph).

    However, for security reasons that will be explained later in “collaboration with Microsoft Research,” we could not access the data held by Microsoft Japan to try our several novel ideas for the original proposal, so we modified it into the closely related topic “positive-unlabeled learning with application to semi-supervised learning.” In positive-unlabeled (PU) learning, a binary classifier is trained from positive (P) and unlabeled (U) data without negative (N) data. This also belongs to semi-supervised learning, and when submitting research papers to top machine-learning conferences people choose the area of semi-supervised learning. In practice, PU learning has many applications in detection, recognition, and retrieval problems.

    The goal of this project is to better understand the state-of-the-art unbiased PU learning methods and to further improve on them. The proposed non-negative PU learning is shown to be the new state of the art.
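    The key idea of non-negative PU learning is to clip the estimated negative-class risk at zero, since the unbiased estimator can go negative and cause overfitting. A minimal sketch of the risk estimate with the sigmoid loss, assuming a known class prior pi and precomputed classifier scores (function names are ours):

```python
import numpy as np

def sigmoid_loss(z):
    # ell(z) = sigmoid(-z): small when z is a confident positive score
    return 1.0 / (1.0 + np.exp(z))

def nnpu_risk(g_p, g_u, pi):
    """Non-negative PU risk estimate from classifier scores on positive
    samples (g_p) and unlabeled samples (g_u); pi is the class prior
    P(y=+1), assumed known or estimated separately."""
    r_p_pos = sigmoid_loss(g_p).mean()    # positives labeled as +1
    r_p_neg = sigmoid_loss(-g_p).mean()   # positives labeled as -1
    r_u_neg = sigmoid_loss(-g_u).mean()   # unlabeled labeled as -1
    # clip the implied negative-class risk at zero (the "non-negative" fix)
    return pi * r_p_pos + max(0.0, r_u_neg - pi * r_p_neg)
```

    In training, this quantity would be minimized over the classifier's parameters; the clipping is what distinguishes the non-negative estimator from the earlier unbiased one.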

    • Takahiro Shinozaki, Tokyo Institute of Technology
    • Frank Soong, Ningyi Xu, Microsoft Research

    In our daily lives, we often want to control electric devices such as an audio player or a lamp, find a small item such as a wallet or eyeglasses, or detect an event such as a baby crying or a dog barking. Sometimes, however, it is bothersome to walk across a room and interrupt what you are doing, time-consuming to find something, or impossible without the help of someone else. These problems could be solved if tiny, energy-efficient speech sensors were ubiquitously embedded in our living environment. Such sensors must be very small so that they can be attached to various things. Their energy consumption must be minimal, since they must work continuously from a tiny energy source to react to a voice at any time. They must also be noise-robust, since they are used in noisy environments at a distance from the user, where the SNR is low. The goal of this project is to develop a speech recognition architecture suitable for such speech sensors.

    • Takehiro Yamamoto, Kyoto University
    • Ruihua Song, Microsoft Research

    Web searchers are often motivated by the need to accomplish real-world tasks. For example, a user who is suffering from a sleeping problem may issue the query “sleeping pills,” intending to find a good sleeping pill to solve the problem. We develop methods for supporting users in such task-oriented Web search. This research project particularly focused on supporting users' query formulation in task-oriented Web search by providing them with alternative actions. More specifically, we tackled the alternative-action mining problem, in which a system must find alternative actions for a given query. An alternative action for a query is defined as an action that can solve the same problem. For example, given the query “sleeping pills,” our objective is to find alternative actions such as “have a cup of hot milk” or “stroll before bedtime,” both of which can achieve the goal behind the query, i.e., solving the sleeping problem. Mined alternative actions can be used to support a searcher in task-oriented Web search; for example, by suggesting them to the searcher issuing the query “sleeping pills,” the system lets him or her notice different solutions and make a better-informed decision about how to solve the sleeping problem.

    • Takahiro Hara, Osaka University
    • Xing Xie, Microsoft Research

    Recently, the flood of applications has made it difficult for users to know all available applications and choose an appropriate one according to their situation (context). In our previous project under CORE 11, we first investigated the relationships between high-level user context (e.g., how busy the user is, how good their health is, and whom they are with) and application usage by analyzing a large amount of application-usage logs collected through a monster-breeding game on smartphones. We then developed a preliminary prototype of a system that recommends applications suitable for the user's current context based on the analytical results. This system is effective in addressing the above-mentioned application-flood problem, especially for people who are not familiar with smartphones, such as the elderly. The high-level context information collected by our game is useful not only for application recommendation but also for many other applications such as life-logging. Existing life-logging services either require burdensome operations, such as inputting complicated information, or record only simple information that can easily be computed from sensor data, such as walking distance and sleeping time. In our previous project we therefore developed a life-logging service that makes use of the high-level context provided by our game, so users need not perform any extra operations.

    In this continuation project, we extended the above studies to further improve both of the preliminary systems. In particular, we focused on developing application-recommendation techniques, such as predicting which applications will be used next, to reduce the user's burden of searching among a large number of installed applications.

    • Norihisa Miki, Keio University
    • Masaaki Fukumoto, Microsoft Research

    Next-generation wearable human-interface devices must acquire signals of human activity, such as EEG and EMG, with high sensitivity and accuracy, and must transfer information to humans with minimal loss and low power consumption. These challenges essentially derive from the stratum corneum, which covers the surface of the skin: it is a good insulating layer that protects the body from the environment, yet it must serve as the interface between human-interface devices and the body. We highlight two micro-needle-based human-interface devices that can penetrate the high-impedance stratum corneum without reaching the pain points. The needle-type electrotactile display can transfer tactile information at a much lower voltage than the conventional flat-electrode type, and the needle-type EEG electrodes can measure high-quality EEG from hairy regions with the help of their candle-like shape.

    Although these results were novel and highly regarded from a research point of view, the needles may not be suitable for commercial applications, particularly for long-term use. Therefore, in this research project we attempt to optimize the interface between wearable devices and human skin in terms of efficiency and user affinity. We will investigate the shape, material, density, etc., of the micro-needle electrodes. In addition, how a reliable interface can be maintained needs to be examined for user affinity.

    • Hajime Nagahara, Osaka University
    • Steve Lin, Microsoft Research

    Many computer vision methods, such as shape from shading [1], depth from defocus [2], high-dynamic-range imaging [3], and specular/Lambertian separation [4], cannot be applied to dynamic scenes, since they require multiple image acquisitions and assume the scene is static while the images are captured. Regular CCD or CMOS sensors have uniform exposure timings and cannot capture multiple images at the same time, so these methods cannot ignore the differences in exposure timing among the images when the scene contains motion. In this proposal, we propose to use a multi-tap CMOS sensor [5] to apply these methods to dynamic scenes. The multi-tap CMOS sensor can acquire multiple images at almost the same time, with only about 100 microseconds of difference, so we can not only ignore the exposure differences among the images but also switch the lighting between them. Using these images, we can estimate the shape of an object in a dynamic scene with a shape-from-shading technique.
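    With lighting switched between near-simultaneous exposures, multi-light shading methods become applicable. As an illustration of the underlying math (a standard Lambertian photometric-stereo solve at one pixel, not the proposal's exact pipeline):

```python
import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Recover the surface normal and albedo at one pixel from >= 3
    intensity measurements under known distant light directions,
    assuming a Lambertian surface: I_k = albedo * dot(l_k, n).
    Solves the linear system in least squares."""
    L = np.asarray(light_dirs, dtype=float)    # (k, 3) light directions
    I = np.asarray(intensities, dtype=float)   # (k,) measured intensities
    g, *_ = np.linalg.lstsq(L, I, rcond=None)  # g = albedo * n
    albedo = np.linalg.norm(g)
    normal = g / albedo
    return normal, albedo
```

    Each of the k images here would come from one tap of the sensor under one lighting condition, which is why the ~100-microsecond simultaneity matters: the static-scene assumption holds across the k measurements.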

    • Jason S. Chang, National Tsing Hua University
    • Chin-Yew Lin, Microsoft Research

    Learners of English as a second language typically have trouble getting up to speed and becoming fluent, confident writers. In this project, we propose to develop a method for extracting grammar patterns that can be used to provide instant writing suggestions in Microsoft Word. In our approach, we use partial parsing and pattern templates to extract grammar patterns and dictionary-like examples from genre-specific corpora. The method involves automatically deriving the base phrases of sentences in a given corpus, automatically generating and ranking candidate patterns and examples that match the templates, and filtering for high-ranking patterns and examples. At run time, as the user types (or mouses over) a word, the system automatically retrieves and displays the grammar patterns and examples most relevant to the word and its surrounding context. The user can opt for patterns from a general corpus, an academic corpus, or commonly overused dubious patterns found in a learner corpus. We present a prototype writing assistant, WriteAhead, that applies the method to reference and learner corpora such as Gigaword English, CiteSeerX, and the WikEd Error Corpus. We expect the intensive interaction provided by WriteAhead, via writing suggestions of patterns and examples for continuing the partial sentence, to minimize the time spent hesitating and searching for the right word. Our methodology effectively turns the Microsoft Word processor into a resource-rich interactive writing environment, much like the interactive development environments that are commonplace in writing software code.
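    As a much-simplified illustration of template-based pattern extraction (the real system uses partial parsing over base phrases; here we merely count adjacent VERB+PREP pairs in already-POS-tagged text and rank them by frequency):

```python
from collections import Counter

def extract_patterns(tagged_sentences):
    """Toy pattern extractor: collect verb+preposition combinations
    from POS-tagged sentences and rank them by corpus frequency.
    Each sentence is a list of (word, tag) pairs."""
    counts = Counter()
    for sent in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if t1 == "VERB" and t2 == "PREP":
                counts[(w1.lower(), w2.lower())] += 1
    return counts.most_common()
```

    A frequent pattern like ("rely", "on") from an academic corpus could then back a suggestion when the user types "rely", with the template grammar and corpora swapped in from the real pipeline.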

    • James Cheng, The Chinese University of Hong Kong
    • Bin Shao, Microsoft Research

    The project aims to develop a distributed platform for efficiently querying big graphs, potentially stored in distributed locations. Graph queries such as shortest-path distance queries, reachability queries, pattern-matching queries, and neighborhood queries have many important applications and have been studied extensively in the past. In recent years, however, we have witnessed a surge of graph data from sources such as online social networks, online shopping networks, mobile and communication networks, financial and marketing networks, the WWW, and the Internet. Most of these graphs are massively large: existing graph query processing techniques are not scalable, while existing distributed graph computing systems were not designed to handle online graph-query workloads. This motivates us to design a new type of distributed system for graph query processing. Such a system can advance research in large-scale graph query processing, where scalable techniques are still lacking, and also benefit industry, where massive volumes of graph data have been generated and online querying is becoming increasingly critical.
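    The baseline semantics of the shortest-path distance queries mentioned above is a per-query BFS on the unweighted graph; scalable systems precompute indexes instead of doing this per query, but the sketch below (our own illustration) fixes what the query must return:

```python
from collections import deque

def distance_query(adj, src, dst):
    """Answer an unweighted shortest-path distance query by BFS.
    adj maps a vertex to an iterable of neighbors; returns -1 if
    dst is unreachable from src."""
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == dst:
                    return dist[v]
                queue.append(v)
    return -1
```

    On a billion-edge graph, this linear-time traversal per query is exactly what makes online workloads hard, motivating distributed index-based designs.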

    • Herming Chiueh, Shih-kai Lin, National Chiao Tung University
    • Chin-Yew Lin, Microsoft Research

    Epilepsy is a common neural disorder; about 1.7% of the global population has epilepsy. Most patients use antiepileptic drugs to reduce their seizures, but nearly one third of patients have drug-resistant epilepsy. The alternative treatment is resection surgery to remove the epileptogenic zone. However, all of these patients will still have some seizures, which affect their quality of life and further cause danger and inconvenience to the patients and the people around them. This project proposes to design and develop a smart headband for epilepsy patients. The headband will consist of a textile band with a printed circuit board (PCB) inside and textile electrodes on it.

    • Katsu IKEUCHI, Microsoft Research

    Demand for service robots has been increasing due to the need for elderly care and daily-life support. The MSRA robotics team and the MS Strategic Prototyping team are jointly developing intelligent service robots to meet this demand. The robots follow a remote/cloud-brain architecture for flexibility and versatility. Incoming voice signals from the microphone are converted to text messages, which are sent to the basic activity module on the cloud server. Based on the module's analysis, several services on the cloud server are launched. The robot's current capabilities include general chatting, language translation, person identification, object recognition, and guiding.

    Retention of fluency is one of the prerequisites for such service robots. By connecting a chatting engine to the robot, the conversation ability of the service robot is remarkably improved. In conversation, a gesture accompanying a spoken sentence is an important factor, often referred to as body language. This is particularly true for humanoid service robots, because the key merit of a humanoid robot is its resemblance to human shape and human behavior. We proposed a new method to generate gestures along with spoken sentences for such humanoid service robots.

    • Min-Yen Kan*, Kokil Jaidka+, Muthu Kumar Chandrasekaran*, *National University of Singapore, +University of Pennsylvania
    • Chin-Yew Lin, Microsoft Research

    We developed resources and technologies that solve problems in scientific summarization. Current scientific summaries are written manually by scholars, synthesizing the goals and contributions of a study. Advances in automated document summarization, while significant, have not been adapted to the specialized scientific document format, typified by conventional argumentation patterns and technical terminology. Furthermore, automatic summarization systems do not support a researcher in the actual task of a literature survey, which may involve tracking a research topic over time and following developments since a seminal publication that could amass hundreds to thousands of citations per year. It is also difficult to evaluate these summaries quantitatively, because there is no single rubric for what constitutes an ideal scientific summary. Importantly, the key resource of a standardised reference corpus is missing; it is needed to interest the research community in dedicating resources and manpower, as comparative objective benchmarking is critical to reproducibility and assessment.

    • Michael R. Lyu, The Chinese University of Hong Kong
    • Dongmei Zhang, Microsoft Research

    This project aims to advance the state of the art in log generation, selection, and analysis for performance monitoring and reliability enhancement. We improve logging quality at the time logs are written and investigate cost-effective logging mechanisms for large-scale distributed systems. The corresponding methods to collect and parse the logs generated by the target systems are also designed. We apply data-mining techniques to select important and informative logs, and use a log parser to structure raw logs into clean features for machine-learning processing. With the abundant information extracted from the log data, performance monitoring and system troubleshooting can be conducted accordingly. Finally, the associated tools for performance monitoring and anomaly detection will be published for public access. There are three objectives in total.
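    A common first step for such log parsers is abstracting raw log lines into templates by masking variable fields before grouping them into features. A toy regex-based sketch (our own illustration, not the project's parser):

```python
import re

def log_template(line):
    """Abstract a raw log line into a template by masking variable
    fields (IPs, hex ids, numbers), so that lines produced by the
    same logging statement collapse to the same template."""
    line = re.sub(r"\b\d+\.\d+\.\d+\.\d+\b", "<IP>", line)
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<HEX>", line)
    line = re.sub(r"\b\d+\b", "<NUM>", line)
    return line
```

    Counting occurrences of each template over time then yields the clean feature vectors that anomaly-detection models consume.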

    • Tseng-Hung Chen, Min Sun, National Tsinghua University
    • Jianlong Fu, Microsoft Research

    Datasets with large corpora of “paired” images and sentences have enabled the latest advances in image captioning. Many novel networks trained with these paired data have achieved impressive results under a domain-specific setting: training and testing on the same domain. However, the domain-specific setting incurs a huge cost for collecting “paired” images and sentences in each domain. For real-world applications, one would prefer a “cross-domain” captioner that is trained in a “source” domain with paired data and generalizes to other “target” domains at very little cost (e.g., no paired data required).

    We propose a cross-domain image captioner that can adapt the sentence style from the source to the target domain without paired image-sentence training data in the target domain. In our illustrative figure, the left panel shows sentences from MSCOCO, which mainly focus on the location, color, and size of objects; the right panel shows sentences from CUB-200, which describe the parts of birds in detail; and the bottom panel shows our generated sentences before and after adaptation.

    • Andrea Nanetti, Siew Ann Cheong, Nanyang Technological University
    • Chin-Yew Lin, Microsoft Research

    Automatic acquisition of historical knowledge and machine reading for news and historical-source indexing/summarization can build on the experience of historians and reporters in finding more and more background information surrounding an event. In this context, the New Silk Road is quite a fortunate and exquisite case study. The first mention of the Silk Road (Seidenstrasse) can be found in Ferdinand von Richthofen's China (1877-1912), naming a segment of the intercontinental communication network in a specific time period: the first-century AD Marinus of Tyre's overland route from the Mediterranean to the borders of the land of silk. But over time, the Silk Road became a double synecdoche (i.e., a figure of speech in which a part is made to represent the whole): the Road represents the entire intercontinental connectivity network, and the Silk stands for all sorts of goods and trade. In September-October 2013, PRC President Xi's proposal to the surrounding countries for a new silk road used that concept as a metaphor (i.e., a figure of speech in which a word or phrase is applied to an object or action to which it is not literally applicable) to brand the launch of the Asian Infrastructure Investment Bank and the Silk Road Infrastructure Bank.

    • Wei Wang, HKUST
    • Thomas Moscibroda, Microsoft Research

    With the wide deployment of data-parallel frameworks like Spark and Hadoop, it has become a norm to run data analytics applications in a large cluster of machines. Having different applications coexisting in a cluster, data analytics jobs, each consisting of many parallel tasks, expect predictable performance with guarantees on the maximal completion delay. Cluster operators, on the other hand, aim to minimize the response times of jobs, i.e., the time between the instants of job arrivals and completions.

    Prevalent cluster schedulers deployed in today's datacenters rely on fair sharing to provide predictable performance, e.g., Dryad's Quincy, the Hadoop Fair and Capacity Schedulers, and YARN's DRF scheduler. By seeking max-min fair allocations at all times, fair schedulers aim to ensure that each job receives an equal amount of cluster resources (to the degree possible), regardless of the behavior of other jobs, thereby achieving performance isolation. However, it has been widely confirmed that fair schedulers can be inefficient and may result in significantly longer response times.
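    The max-min fair allocation these schedulers seek can be computed, for a single resource, by progressive filling: repeatedly give every unsatisfied job an equal share of the remaining capacity, letting small demands keep only what they ask for. A small sketch (our own illustration, not any scheduler's code):

```python
def max_min_fair(capacity, demands):
    """Max-min fair allocation of one resource across jobs with the
    given demands, via progressive filling."""
    alloc = [0.0] * len(demands)
    remaining = list(range(len(demands)))
    while remaining and capacity > 1e-12:
        share = capacity / len(remaining)
        # jobs whose residual demand fits in the fair share are satisfied
        satisfied = [i for i in remaining if demands[i] - alloc[i] <= share]
        if not satisfied:
            for i in remaining:      # everyone is capped at the fair share
                alloc[i] += share
            break
        for i in satisfied:
            capacity -= demands[i] - alloc[i]
            alloc[i] = demands[i]
        remaining = [i for i in remaining if i not in satisfied]
    return alloc
```

    For example, with capacity 10 and demands [2, 4, 8], the small jobs get their full 2 and 4, and the large job gets the remaining 4; the inefficiency the abstract refers to arises because this equal treatment ignores job sizes and deadlines.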

    • Chao-Chung Wu, Shou-De Lin, National Taiwan University
    • Mi-Yen Yeh, Academia Sinica
    • Ruihua Song, Microsoft Research

    Recently, with the development of deep learning, natural language generation tasks such as image captioning and dialogue generation have achieved impressive results, whether measured by accuracy or by output that surprises humans, especially in creative language generation such as poetry. In poetry generation, the creativity and readability of ancient poetry leave much room for the reader's imagination, and constraints of ancient poetry such as length, rhyme, and part of speech can make generated poetry read much like the original poems. In this project, we develop a model that generates modern Chinese poems from a given image. While generating poems that follow constraints such as length, rhyme, and part of speech, the model also aims to show some “creativity”: it does not simply copy lines from existing famous poems, but also adds new ideas.

    • Ting Yao, Tao Mei, Microsoft Research
    • Wush Chi-Hsuan*, Mi-Yen Yeh*, Ming-Syan Chen#, *Academia Sinica, #National Taiwan University
    • Xing Xie, Microsoft Research
    • Charles HP Wen, National Chiao Tung University
    • Chin-Yew Lin, Microsoft Research