Microsoft Research Asia Academic Day 2017


Welcome to Microsoft Research Asia Academic Day 2017. This is one of the workshops hosted by Microsoft Research Asia for our academic partners and researchers in Taiwan, Japan, Singapore, and Hong Kong to share the progress of collaborative research projects, discuss new ideas, and inspire technological innovation.

Over the years, Microsoft Research Asia has collaborated with academia in Asia across a variety of research areas to advance the state of the art in computer science. Knowledge and data mining research explores new algorithms, tools, and applications to collect, analyze, and mine results for data-intensive business in both the consumer and enterprise sectors. It applies data-mining, machine-learning, and knowledge-discovery techniques to information analysis, organization, retrieval, and visualization, all of which play a central role in the rapid development of AI. Research in multimedia enables users to interact with a computer that understands and uses speech, graphics, and vision, allowing people to search for and be immersed in interactive online experiences. We have seen tremendous innovation and growth opportunities in robotics and human-computer interaction, driven by hardware and software integration and by progress in devices and mobile sensing. It is essential that we understand the digital revolution around us deeply and learn how best to leverage these opportunities to solve pressing challenges for the benefit of society.

This workshop consists of plenary sessions, break-out sessions, and technology demos and showcases. We will also demonstrate our latest research work on AI along with products such as HoloLens and Microsoft Translator.

We look forward to seeing you soon!



Friday, May 26

Session and Speakers
Opening and Welcome
Hsiao-Wuen Hon, Corporate Vice President, Microsoft Research
Distinguished Talks
  • Katsu Ikeuchi, Microsoft Research
  • Mark Liao, Academia Sinica
  • Yi-Bing Lin, National Chiao Tung University
Panel: Turning Ideas Into Reality
Moderator: Tim Pan, Microsoft Research


  • Hsiao-Wuen Hon, Corporate Vice President, Microsoft Research
  • Frank Chang, President, National Chiao Tung University
  • Jun Rekimoto, The University of Tokyo
Lunch and Research Showcase
Robotics & HCI: Whether, When, and How Reddy’s 90% AI works
Chair: Katsu Ikeuchi, Microsoft Research


  • Ren C. Luo, National Taiwan University
  • Masayuki Inaba, The University of Tokyo
  • Takeshi Oishi, The University of Tokyo
Machine Generation and Discovery: Going Beyond Learning
Chair: Ruihua Song, Microsoft Research


  • Winston Hsu, National Taiwan University
  • Shou-De Lin, National Taiwan University
  • Yuki Arase, Osaka University

Understanding Conversation: The Ultimate AI Challenge

Chair: Eric Chang, Microsoft Research


  • Helen Meng, The Chinese University of Hong Kong
  • Andrew Liu, The Chinese University of Hong Kong
  • Vivian Chen, National Taiwan University
Break & Networking
Robotics & HCI: Sense & Wear
Chair: Masaaki Fukumoto, Microsoft Research


  • Yoshihiro Kawahara, The University of Tokyo
  • James Lien, National Cheng Kung University
  • Hao-Chuan Wang, National Tsing Hua University
Machine Learning, Textual Inference, and Language Generation
Chair: Chin-Yew Lin, Microsoft Research


  • James Kwok, The Hong Kong University of Science and Technology
  • Pascual Martínez-Gómez, National Institute of Advanced Industrial Science and Technology
  • Koichiro Yoshino, Nara Institute of Science and Technology
Multimedia and Vision
Chair: Tao Mei, Microsoft Research


  • Toshihiko Yamasaki, The University of Tokyo
  • Yinqiang Zheng, National Institute of Informatics
  • Pai-Chi Li, National Taiwan University
Dinner at RSL hotel

Session Abstracts

AI, Robotics and Computer Vision: retrospective and perspective overview

Historically, AI, robotics, and computer vision share the same origin. In the early 1970s, most AI laboratories in the world, such as the MIT AI Lab and the Stanford AI Lab, conducted research in all three areas in the same place. Researchers in these areas discussed research issues together, face to face, and published their papers in a common venue, IJCAI (International Joint Conference on Artificial Intelligence). Around the early 1980s, however, the three areas separated. ICRA (International Conference on Robotics and Automation) and ICCV (International Conference on Computer Vision) branched off from IJCAI around that time. Such separation was inevitable for deeper research along reductionist lines. Recently, however, a Cambrian explosion has been occurring in these areas, with too many fragmentary theories from too many researchers. It is time for holism to reorganize these areas, to avoid further fragmentation and even their extinction. I will examine why robotics needs AI, why AI needs robotics, and what the key issue is on the way toward holism. From this analysis, I will try to define the key directions for future robotics research.

Cyber Physical Integration for IoT

Internet of Things (IoT) refers to connecting devices to each other through the Internet. Most IoT systems manage physical devices (such as the Apple Watch and Google Glass). In this talk we propose the concept of cyber IoT devices, which are computer animations. An example is “Dandelion Mirror”, a cyber-physical integration that merges the virtual and physical worlds. In other words, it is a cyber-physical system (CPS) integrating computation, networking, and physical processes. We use IoTtalk, an IoT device management platform, to develop cyber-physical IoT applications. IoTtalk connects input devices (such as a heart rate sensor) so that they interact flexibly with the cyber devices. We show how IoTtalk can easily accommodate cyber IoT devices such as an animated ball in motion, how one can use a mobile phone (a physical device) to control a flower growing in an animation (a cyber device), and how a physical pendulum can guide the swing of a cyber pendulum.

Video Shot Type Classification: A First Step toward Automatic Concert Video Mashup

Varying the type of shot is a fundamental element in the language of film, commonly used by directors in visual storytelling. The technique is often used in professional recordings of a live concert, but may not be applied appropriately in audience recordings of the same event. Such variations can make the task of classifying shots in concert videos, professional or amateur, very challenging. We propose a novel probabilistic approach, named Coherent Classification Net (CC-Net), that tackles the problem by addressing three crucial issues. First, we focus on learning more effective features by fusing the layer-wise outputs extracted from a deep convolutional neural network (CNN) pre-trained on a large-scale dataset for object recognition. Second, we introduce a frame-wise classification scheme, the error weighted deep cross-correlation model (EW-Deep-CCM), to boost the classification accuracy. Specifically, the deep neural network-based cross-correlation model (Deep-CCM) is constructed not only to model the extracted feature hierarchies of the CNN independently but also to relate the statistical dependencies of paired features from different layers. Then, a Bayesian error weighting scheme for classifier combination is adopted to explore the contributions of the individual Deep-CCM classifiers and enhance the accuracy of shot classification in each image frame. Third, we feed the frame-wise classification results to a linear-chain conditional random field (CRF) module that refines the shot predictions by taking account of global and temporal regularities. We provide extensive experimental results on a dataset of live concert videos to demonstrate the advantage of the proposed CC-Net over existing popular fusion approaches.
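The third step, refining frame-wise predictions with temporal regularity, can be illustrated with a much simpler stand-in. The sketch below is not CC-Net and not a trained CRF: it runs Viterbi decoding over a linear chain whose transition scores come from a single hand-set parameter. The function name `smooth_shot_labels`, the parameter `stay_prob`, and the example probabilities are all assumptions made for illustration.

```python
import math


def smooth_shot_labels(frame_probs, stay_prob=0.9):
    """Viterbi decoding over per-frame class probabilities.

    frame_probs[t][k] is the classifier's probability of shot type k
    at frame t. stay_prob is an assumed (not learned) probability that
    two consecutive frames share the same shot type. Needs >= 2 classes.
    """
    num_classes = len(frame_probs[0])
    switch_prob = (1.0 - stay_prob) / (num_classes - 1)
    eps = 1e-12  # avoids log(0) for zero probabilities

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # Best log-score of any label path ending in each class at frame 0.
    scores = [math.log(p + eps) for p in frame_probs[0]]
    backptrs = []
    for probs in frame_probs[1:]:
        new_scores, ptrs = [], []
        for j in range(num_classes):
            best_i = max(range(num_classes),
                         key=lambda i: scores[i] + log_trans(i, j))
            new_scores.append(scores[best_i] + log_trans(best_i, j)
                              + math.log(probs[j] + eps))
            ptrs.append(best_i)
        scores = new_scores
        backptrs.append(ptrs)

    # Trace the best path backwards from the best final class.
    label = max(range(num_classes), key=lambda k: scores[k])
    path = [label]
    for ptrs in reversed(backptrs):
        label = ptrs[label]
        path.append(label)
    path.reverse()
    return path


# Frame 2 is weakly classified as type 1 between confident type-0 frames.
noisy = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.85, 0.15]]
print(smooth_shot_labels(noisy))
```

In this toy input, an independent per-frame argmax would label the middle frame as a different shot type, while the chain model keeps it consistent with its neighbours. A learned CRF would estimate the transition scores from data rather than fixing them by hand.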

Robotics & HCI: Whether, When, and How Reddy's 90% AI works?

Artificial intelligence, and its embodiment in robotics, originally aimed at making complete human copies: 100% AI systems that would replace human workers. However, as seen in Prof. Reddy’s Turing Award Lecture, we have found that there is a huge boundary between artificial and human intelligence, referred to as the Frame. There is always an exception beyond the frame within which an AI system can define its tasks. Human intelligence can easily overcome such a frame by using exception-handling methods, while artificial intelligence cannot and gets stuck there. Prof. Reddy thus proposes 90% AI and renaming AI as augmented intelligence rather than artificial intelligence. Augmented intelligence, or 90% AI, usually works autonomously on routine work to ease the burden on human workers, and, when the system encounters exceptional cases beyond the frame, it consults its human co-workers for help. Augmented intelligence aims not to replace human workers but to cooperate with and help them. In this session, we consider the necessary requirements for such augmented intelligence robots. First, Prof. Luo of National Taiwan University will outline the influence of such systems on human society. Next, Prof. Inaba of the University of Tokyo proposes one of the key technologies for such robots, which lets them understand the situation of fellow human workers and decide whether it is a good time to collaborate with a human. Finally, Prof. Oishi of the University of Tokyo describes a 3D modeling technique for providing the environmental frame of such AI systems.

Machine Generation and Discovery: Going Beyond Learning

In this session, we will go beyond machine learning and discuss topics in machine generation and discovery. Can a machine comment on a fashion photo like a young person who is familiar with internet culture? Is it possible for a bot to sense users’ emotions and react to them appropriately in conversation? And can machines discover something new without any labelled data? We will discuss further possibilities for machines in this AI era.

Understanding Conversation: The Ultimate AI Challenge

Having a natural language conversation with a computer has been envisioned in movies over the years, ranging from HAL in “2001: A Space Odyssey” to C-3PO in “Star Wars” to Data in “Star Trek: The Next Generation” to Samantha in “Her”. Yet the realization of true conversation understanding would require the following: robust speech recognition, natural language understanding, awareness of emotional and social cues, and a mental model of the world. In this session, we have three great speakers who will describe the latest advances in research and also point out future problems to work on in this very important and exciting area.

Robotics & HCI: Sense & Wear

This session has three topics for realizing:

  • Truly wearable small devices that do not need a local battery, by using wireless power transmission (given by Prof. Yoshihiro Kawahara).
  • Quick & accurate robot control by using vision & DNN technology (given by Prof. Jenn-Jier James Lien).
  • Much smarter personal assistant systems by observing human behavior (given by Prof. Hao-Chuan Wang).

Machine Learning, Textual Inference, and Language Generation

In this session, we have three presentations that address three aspects of AI: machine learning, textual inference, and language generation. The first talk, presented by Prof. James Kwok, describes a fast large-scale low-rank matrix learning method with a convergence rate of O(1/T), where T is the number of iterations. The second talk, given by Prof. Pascual Martínez-Gómez, explains how to leverage phrases of different forms mapped to similar images to recognize phrasal entailment relations. Prof. Yoshino closes the session by showing how to generate natural language sentences using a one-hot vector representation that can utilize information from various sources.

Multimedia and Vision

Recent years have witnessed fast-growing research on artificial intelligence, especially the breakthroughs in deep learning, leading to many exciting, ground-breaking applications in the computer vision and multimedia communities. On the other hand, there remain many open problems and grand challenges regarding deep learning for vision and multimedia. In this session, we hope to share some reflections on this important research field, and to discuss what is missing and what opportunities exist for academia and industry to advance it further.


Hsiao-Wuen Hon, Corporate Vice President, Microsoft Research

Dr. Hsiao-Wuen Hon is corporate vice president of Microsoft, chairman of Microsoft’s Asia-Pacific R&D Group, and managing director of Microsoft Research Asia. He drives Microsoft’s strategy for research and development activities in the Asia-Pacific region, as well as collaborations with academia.

Dr. Hon has been with Microsoft since 1995. He joined Microsoft Research Asia in 2004 as deputy managing director, stepping into the role of managing director in 2007. He founded and managed Microsoft Search Technology Center from 2005 to 2007 and led development of Microsoft’s search products (Bing) in Asia-Pacific. In 2014, Dr. Hon was appointed as chairman of Microsoft Asia-Pacific R&D Group.

Prior to joining Microsoft Research Asia, Dr. Hon was the founding member and architect of the Natural Interactive Services Division at Microsoft Corporation. Besides overseeing architectural and technical aspects of the award-winning Microsoft Speech Server product, Natural User Interface Platform and Microsoft Assistance Platform, he was also responsible for managing and delivering statistical learning technologies and advanced search. Dr. Hon joined Microsoft Research as a senior researcher in 1995 and has been a key contributor to Microsoft’s SAPI and speech engine technologies. He previously worked at Apple, where he led research and development for Apple’s Chinese Dictation Kit.

An IEEE Fellow and a distinguished scientist of Microsoft, Dr. Hon is an internationally recognized expert in speech technology. Dr. Hon has published more than 100 technical papers in international journals and at conferences. He co-authored a book, Spoken Language Processing, which is a graduate-level textbook and reference book in the area of speech technology used in universities around the world. Dr. Hon holds three dozen patents in several technical areas.

Dr. Hon received a Ph.D. in Computer Science from Carnegie Mellon University and a B.S. in Electrical Engineering from National Taiwan University.

Mau-Chung Frank Chang, President, National Chiao Tung University

Dr. Mau-Chung Frank Chang is presently the President of National Chiao Tung University (NCTU), Hsinchu, Taiwan. Previously, he was the Chairman and Wintek Distinguished Professor of Electrical Engineering at UCLA (1997-2015).

Before joining UCLA, he was the Assistant Director and Department Manager of the High Speed Electronics Laboratory of Rockwell International Science Center (1983-1997), Thousand Oaks, California. During this tenure, he developed and transferred the AlGaAs/GaAs Heterojunction Bipolar Transistor (HBT) and BiFET (Planar HBT/MESFET) integrated circuit technologies from the research laboratory to the production line (which later became Conexant Systems and Skyworks). The HBT/BiFET productions have grown into multi-billion dollar businesses and have dominated the cell phone power amplifier and front-end module markets for the past twenty years (currently exceeding 10 billion units/year and exceeding 50 billion units in the last decade).

Throughout his career, Dr. Chang’s research has primarily focused on the research and development of high-speed semiconductor devices and integrated circuits for RF and mixed-signal communication, radar, and imaging system applications. He invented multiband, reconfigurable RF-Interconnects for Chip-Multi-Processor (CMP) inter-core communications and inter-chip CPU-to-Memory communications. He was the first to demonstrate a CMOS active imager at sub-mm-Wave (180GHz) based on a Time-Encoded Digital Regenerative Receiver. He also pioneered the development of a self-healing 57-64GHz radio-on-a-chip (DARPA’s HEALICs program) with embedded sensors, actuators and self-diagnosis/curing capabilities, and of an ultra-low phase noise VCO (F.O.M. < -200dBc/Hz) with the invented Digitally Controlled Artificial Dielectric (DiCAD) embedded in CMOS technologies to vary its transmission-line permittivity in real time (up to 20X) for realizing reconfigurable multiband/mode radios in (sub-)mm-Wave frequencies. He realized the first CMOS PLL for Terahertz operation and devised the first tri-color CMOS active imager at 180-500GHz based on a Time-Encoded Digital Regenerative Receiver and the first 3-dimensional SAR imaging radar with sub-centimeter resolution at 144GHz.

Dr. Chang is a Member of the US National Academy of Engineering, an Academician of Academia Sinica, Taiwan, Republic of China, and a Fellow of the US National Academy of Inventors. He is also a Fellow of IEEE. He has received numerous awards including Rockwell’s Leonardo Da Vinci Award (Engineer of the Year, 1992), IEEE David Sarnoff Award (2006), Pan Wen Yuan Foundation Award (2008), CESASC Life-Time Achievement Award (2009) and John J. Guarrera Engineering Educator of the Year Award from the Engineers’ Council (2014).

Dr. Chang earned his B.S. in Physics from National Taiwan University (1972), his M.S. in Materials Science from National Tsing Hua University (1974), and his Ph.D. in Electronics Engineering from National Chiao Tung University (1979).

Chin-Yew Lin, Microsoft Research

Dr. Lin is a Principal Research Manager of the Knowledge Computing group at Microsoft Research Asia. His research interests are knowledge computing, natural language processing, semantic search, text generation, question answering, and automatic summarization.

He published over 100 papers in international conferences such as ACL, SIGIR, KDD, WWW, AAAI, IJCAI, WSDM, CIKM, COLING, and EMNLP and has an H-Index of 44. He has been granted 31 US Patents. He was the program co-chair of ACL 2012, program co-chair of AAAI 2011 AI & the Web Special Track, and program co-chair of NLPCC 2016. He created the ROUGE automatic summarization evaluation package. It has become the de facto standard in summarization evaluation.

His team at Microsoft achieved the best accuracy in the Knowledge Base Population Evaluation 2013, scored the best F1 in the Knowledge Base Acceleration Evaluation 2013 and 2014, and shipped the Entity Linking Intelligence Service (ELIS) in Microsoft //BUILD 2016.

Eric Chang, Microsoft Research

Dr. Eric Chang joined Microsoft Research Asia (MSRA) in July, 1999 to work in the area of speech technologies. Eric is currently the Senior Director of Technology Strategy at MSR Asia, where his responsibilities include industry collaboration, IP portfolio management, and driving new research themes such as eHealth. Prior to joining Microsoft, Eric had worked at Nuance Communications, MIT Lincoln Laboratory, Toshiba ULSI Laboratory, and General Electric Corporate Research and Development. Eric graduated from MIT with Ph.D., Master and Bachelor degrees, all in the field of electrical engineering and computer science. Eric’s work has been reported by Wall Street Journal, Technology Review, and other publications.

Hao-Chuan Wang, National Tsing Hua University

Hao-Chuan Wang is an Assistant Professor in the Department of Computer Science and the Institute of Information Systems and Applications at National Tsing Hua University, Taiwan (NTHU), since February 2012. He received his Ph.D. in Information Science from Cornell University in 2011. Dr. Wang’s main research interest lies in the collaborative and social aspects of Human-Computer Interaction (HCI). His work aims to integrate computing research and behavioral and social sciences for problem solving and value creation. Some of his recent projects include designing and evaluating human computation systems for supporting cross-lingual communication, using motion sensing to study the roles of gesture in conversation, and supporting interpersonal knowledge transfer with Internet of Things. Dr. Wang is an active participant of international and regional HCI communities, including ACM SIGCHI, CSCW and Chinese CHI. He currently serves as a member in the Steering Committees of CSCW and Chinese CHI, and is now a Subcommittee Chair for ACM CHI 2017 and 2018.

Helen Meng, The Chinese University of Hong Kong

Helen Meng is Professor and Chairman of the Department of Systems Engineering and Engineering Management at The Chinese University of Hong Kong (CUHK). She is the Founding Director of the CUHK MoE-Microsoft Key Laboratory for Human-Centric Computing and Interface Technologies, Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, and the Stanley Ho Big Data Decision Analytics Research Center.  Previously she has served as Associate Dean (Research) of Engineering, Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing, and in the IEEE Board of Governors.  Her other professional services include memberships in the HKSAR Government’s (HKSARG) Steering Committee on eHealth Record Sharing, Research Grants Council (RGC), Convenor of the Engineering Panel in RGC’s Competitive Research Funding Schemes for the Self-financing Degree Sector, Hong Kong/Guangdong ICT Expert Committee and Coordinator of the Working Group on Big Data Research and Applications, and Chairlady of the Working Party of the Manpower Survey of the Information Technology Sector for both 2014-2015 and 2016-2017.  Helen received all her degrees from MIT.  She was elected APSIPA Distinguished Lecturer 2012-2013 and ISCA Distinguished Lecturer 2015-2016.  She received the Ministry of Education Higher Education Outstanding Scientific Research Output Award 2009, Hong Kong Computer Society’s inaugural Outstanding ICT (Information and Communication Technologies) Woman Professional Award 2015, Microsoft Research Outstanding Collaborator Award in 2016 and ICME 2016 Best Paper Award.  Helen is a Fellow of HKCS, HKIE, ISCA and IEEE.

James Kwok, The Hong Kong University of Science and Technology

Prof. Kwok is a Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He received his B.Sc. degree in Electrical and Electronic Engineering from the University of Hong Kong and his Ph.D. degree in computer science from the Hong Kong University of Science and Technology. Prof. Kwok served/is serving as Associate Editor for the IEEE Transactions on Neural Networks and Learning Systems and the Neurocomputing journal, and as Program Chair for a number of international conferences. He is an IEEE Fellow.

Jenn-Jier James Lien, National Cheng Kung University

Professor Lien did his Ph.D. thesis research on facial expression recognition at RI, CMU, USA from 1993 to 1998. His team developed a real-time stereo system for face recognition at a distance under a US$5M DARPA surveillance grant at L1-Identity from 1998 to 2002. He joined NCKU, Taiwan in 2002. His student team has worked on AOI with local TFT-LCD and solar cell companies since 2002. His team started to work with Texas Instruments on embedded computer vision applied to surveillance and human-computer interaction in 2009. Since 2014, his team has worked with machine and tool companies to develop deep learning technologies in the fields of DLP 3D inspection and reconstruction, robotic grasping, and tool wear monitoring and life prediction for Industry 4.0.

Jun Rekimoto, The University of Tokyo

Jun Rekimoto received his B.A.Sc., M.Sc., and Ph.D. in Information Science from Tokyo Institute of Technology in 1984, 1986, and 1996, respectively. Since 1994 he has worked for Sony Computer Science Laboratories (Sony CSL). In 1999 he formed and directed the Interaction Laboratory within Sony CSL. Since 2007 he has been a professor in the Interfaculty Initiative in Information Studies at The University of Tokyo. Since 2011 he has also been Deputy Director of Sony CSL.

Rekimoto’s research interests include human-computer interaction, computer augmented environments and computer augmented human (human-computer integration). He invented various innovative interactive systems and sensing technologies, including NaviCam (a hand-held AR system), Pick-and-Drop (a direct-manipulation technique for inter-appliance computing), CyberCode (the world’s first marker-based AR system), Augmented Surfaces, HoloWall, and SmartSkin (two of the earliest representations of multi-touch systems). He has published more than a hundred articles in the area of human-computer interaction, including at ACM SIGCHI and UIST. He received the Multi-Media Grand Prix Technology Award from the Multi-Media Contents Association Japan in 1998, the iF Interaction Design Award in 2000, the Japan Inter-Design Award in 2003, the iF Communication Design Award in 2005, the Good Design Best 100 Award in 2012, the Japan Society for Software Science and Technology Fundamental Research Award in 2012, and the ACM UIST Lasting Impact Award and Zoom Japon Les 50 qui font le Japon de demain in 2013. In 2007, he was also elected to the ACM SIGCHI Academy.

Katsu Ikeuchi, Microsoft Research

Dr. Katsushi Ikeuchi is a Principal Researcher at Microsoft Research. He received his Ph.D. degree in Information Engineering from the University of Tokyo in 1978. After working at the MIT AI Lab as a postdoc fellow for three years, ETL (currently AIST) as a research member for five years, the CMU Robotics Institute as a faculty member for ten years, and the University of Tokyo as a faculty member for nineteen years, he joined Microsoft Research in 2015. His research interests span computer vision, robotics, and computer graphics. He has received several awards, including the IEEE-PAMI Distinguished Researcher Award, the Okawa Prize and 紫綬褒章 (the Medal of Honor with Purple Ribbon) from the Emperor of Japan. He is a fellow of IEEE, IEICE, IPSJ, and RSJ.

Koichiro Yoshino, Nara Institute of Science and Technology (NAIST)

Koichiro Yoshino received his B.A. degree from Keio University in 2009, and his M.S. and Ph.D. degrees in informatics from Kyoto University in 2011 and 2014, respectively. From 2014 to 2015, he was a research fellow (PD) of the Japan Society for the Promotion of Science. Currently, he is an Assistant Professor in the Graduate School of Information Science, Nara Institute of Science and Technology.

His research interests include spoken language processing, especially spoken dialogue systems, syntactic and semantic parsing, and language modeling. Dr. Koichiro Yoshino received the JSAI SIG-research award in 2013. He is an organizer of DSTC 5 and 6. He is a member of IEEE, ACL, IPSJ, and ANLP.

Mark Liao, Academia Sinica

Mark Liao received his Ph.D. degree in electrical engineering from Northwestern University in 1990. In July 1991, he joined the Institute of Information Science, Academia Sinica, Taiwan, where he is currently a Distinguished Research Fellow. He has worked in the fields of multimedia signal processing, computer vision, pattern recognition, and multimedia protection for more than 25 years. During 2009-2011, he was the Division Chair of the computer science and information engineering division II, National Science Council of Taiwan. He is jointly appointed as a Chair Professor of National Chiao Tung University and a Professor of the Department of Electrical Engineering and Computer Science of National Cheng Kung University. During 2009-2012, he was jointly appointed as the Multimedia Information Chair Professor of National Chung Hsing University. Since August 2010, he has been an Adjunct Chair Professor of Chung Yuan Christian University. From August 2014 to July 2016, he was an Honorary Chair Professor of National Sun Yat-sen University. He received the Young Investigators’ Award from Academia Sinica in 1998; the Distinguished Research Award from the National Science Council of Taiwan in 2003, 2010 and 2013; the National Invention Award of Taiwan in 2004; the Academia Sinica Investigator Award in 2010; and the TECO Award from the TECO Foundation in 2016. His professional activities include: Co-Chair, 2004 International Conference on Multimedia and Exposition (ICME); Technical Co-chair, 2007 ICME; General Co-Chair, President, Image Processing and Pattern Recognition Society of Taiwan (2006-08); Editorial Board Member, IEEE Signal Processing Magazine (2010-13); Associate Editor, IEEE Transactions on Image Processing (2009-13), IEEE Transactions on Information Forensics and Security (2009-12) and IEEE Transactions on Multimedia (1998-2001). He has been a Fellow of the IEEE since 2013 for contributions to image and video forensics and security.

Masaaki Fukumoto, Microsoft Research

He received a Ph.D. degree from the University of Electro-Communications in 2000. He was with the NTT Human Interface Laboratories from 1990 to 1998, and the NTT DoCoMo Research Laboratories from 1998 to 2013. He is currently a Lead Researcher at Microsoft Research (Beijing, China). His research interests include portable and wearable interface devices, as well as interaction mechanisms that utilize characteristics or information of the living body.

Masayuki Inaba, The University of Tokyo

Masayuki Inaba is a professor in the Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo. He received his Doctor of Engineering degree in Information Engineering from The University of Tokyo in 1986. He was appointed as a lecturer in 1986, an associate professor in 1989, and a professor in 2000 at The University of Tokyo. His research interests include key technologies of robotic systems, humanoids, and software architecture for advanced robots. His research projects have included hand-eye coordination in rope handling, vision-based robotic server systems, the remote-brained robot approach, whole-body behaviors in humanoids, a robot sensor suit with electrically conductive fabric, musculoskeletal humanoid development, humanoid specialization for home assistance, and developmental integration systems with open source robot platforms. He has received several awards, including Outstanding Paper Awards in 1987, 1998, 1999 and 2015 from the Robotics Society of Japan, JIRA Awards in 1994, ROBOMECH Awards in 1994 and 1996 from the division of Robotics and Mechatronics of the Japan Society of Mechanical Engineers, Best Paper Awards of the International Conference on Humanoids in 2000 and 2006, and the ICRA Conference Best Paper Award in 2014 with JSK Robotics Lab members.

Pai-Chi Li, National Taiwan University

Pai-Chi Li received the B.S. degree in electrical engineering from National Taiwan University in 1987, and the M.S. and Ph.D. degrees from the University of Michigan, Ann Arbor in 1990 and 1994, respectively, both in electrical engineering: systems. He joined Acuson Corporation, Mountain View, CA, as a member of the Technical Staff in June 1994. His work at Acuson was primarily in the areas of medical ultrasonic imaging system design for both cardiology and general imaging applications. In August 1997, he returned to the Department of Electrical Engineering at National Taiwan University, where he is currently Associate Dean of the College of Electrical Engineering and Computer Science, and Distinguished Professor of the Department of Electrical Engineering and the Institute of Biomedical Electronics and Bioinformatics. He is also the TBF Chair in Biotechnology and Getac Chair Professor. He served as Founding Director of the Institute of Biomedical Electronics and Bioinformatics in 2006-2009 and of the National Taiwan University Yong-Lin Biomedical Engineering Center in 2009-2011. His current research interests include biomedical ultrasound and medical devices. Dr. Li is an IEEE Fellow, IAMBE Fellow, AIUM Fellow and SPIE Fellow. He was also Editor-in-Chief of the Journal of Medical and Biological Engineering, and has been Associate Editor of Ultrasound in Medicine and Biology, Associate Editor of IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, and on the Editorial Board of Ultrasonic Imaging and Photoacoustics. He has won numerous awards including the Distinguished Research Award, the Dr. Wu Dayou Research Award and the Distinguished Industrial Collaboration Award.

Pascual Martínez-Gómez, National Institute of Advanced Industrial Science and Technology

Pascual Martínez-Gómez is a research scientist at the Artificial Intelligence Research Center in the National Institute of Advanced Industrial Science and Technology (AIST), Japan. Before moving to AIST, he worked as an Assistant Professor at Ochanomizu University and as a visiting researcher at the National Institute of Informatics (2014-2016), where he researched semantic parsing and recognizing textual entailment. He received his Ph.D. degree in Computer Science at the University of Tokyo in 2014 for his research on eye-tracking and readability diagnosis. Pascual’s current main interests are in natural language processing, multi-modal user interfaces, and machine learning.

Ren C. Luo, National Taiwan University

Prof. Luo received both Dipl.-Ing. and Dr.-Ing. degrees from Technische Universitaet Berlin, Germany. He is currently Chief Technology Officer of Fair Friend Group Company and an Irving T. Ho Chair and Life Distinguished Professor at National Taiwan University. He is a member of the EU Echord Industrial Advisory Board. He also served two terms as President of National Chung Cheng University (國立中正大学) and was Founding President of the Robotics Society of Taiwan. He was a tenured Full Professor in the Department of ECE at North Carolina State University, USA, for 15 years and Toshiba Chair Professor at the University of Tokyo, Japan.

His professional experience includes robotic control systems, multi-sensor fusion and integration, computer vision, and 3D printing technologies. He has authored more than 450 papers on these topics, which have been published in refereed international journals and refereed international conference proceedings. He also holds more than 25 international patents.

Dr. Luo has received the IEEE Eugene Mittelmann Outstanding Research Achievement Award, the IEEE IROS Harashima Innovative Technologies Award, and the ALCOA Company Foundation Outstanding Engineering Research Award, USA. He currently serves as EIC of IEEE Transactions on Industrial Informatics (Impact Factor 4.70) and served 5 years as EIC of IEEE/ASME Transactions on Mechatronics (Impact Factor 3.85). Dr. Luo has served as President of the IEEE Industrial Electronics Society and as Science and Technology Adviser to the Prime Minister’s office in Taiwan. Dr. Luo is a Fellow of IEEE and a Fellow of IET.

Ruihua Song, Microsoft Research

Dr. Song is a lead researcher at Microsoft Research Asia, located in Beijing, China. She received her M.S. from Tsinghua University in 2003 and her Ph.D. from Shanghai Jiao Tong University in 2010. She has worked for Microsoft since 2003. Her research interests are Web information retrieval, information extraction, data mining, social and mobile computing, and artificial intelligence (AI) based text and conversation generation. She is working on personalized text conversation and AI-based writing. Dr. Song has published more than 40 papers and has served as a senior PC or PC member for top conferences such as SIGIR, SIGKDD, CIKM, WWW, and WSDM. She also proposed and organized the NTCIR Intent tasks and served as a chair of EVIA 2013 and 2014.

Shou-De Lin, National Taiwan University

Shou-de Lin is currently a full professor in the CSIE department of National Taiwan University. He holds a BS degree from the EE department of National Taiwan University, an MS-EE degree from the University of Michigan, and an MS degree in Computational Linguistics and a PhD in Computer Science, both from the University of Southern California. He leads the Machine Discovery and Social Network Mining Lab at NTU. Before joining NTU, he was a post-doctoral research fellow at the Los Alamos National Lab. Prof. Lin’s research includes the areas of machine learning and data mining, social network analysis, and natural language processing. His international recognition includes the best paper award at the IEEE Web Intelligence conference 2003, a Google Research Award in 2007, Microsoft research awards in 2008, 2015, and 2016, merit paper awards at TAAI 2010, 2014, and 2016, the best paper award at ASONAM 2011, and the US Aerospace AFOSR/AOARD research award for 5 years. He is the all-time winner in the ACM KDD Cup, leading or co-leading the NTU team to win 5 championships. He also led a team to win the WSDM Cup 2016. He has served as senior PC for SIGKDD and area chair for ACL. He is currently an associate editor for the International Journal on Social Network Mining, the Journal of Information Science and Engineering, and the International Journal of Computational Linguistics and Chinese Language Processing. He is also a freelance writer for Scientific American.

Takeshi Oishi, The University of Tokyo

Takeshi Oishi is an Associate Professor at the Institute of Industrial Science, The University of Tokyo, Japan. He received the B.Eng. degree in Electrical Engineering from Keio University in 1999, and the Ph.D. degree in Interdisciplinary Information Studies from the University of Tokyo in 2005. His research interests are in 3D modeling from reality, digital archiving of cultural heritage assets, and mixed/augmented reality. He has served as a program committee member of a series of computer vision conferences such as ICCV, CVPR, ACCV, 3DIM/3DPVT (merged into 3DV), and ISMAR. He has organized the e-Heritage Workshops.

Tao Mei, Microsoft Research

Tao Mei is a Senior Researcher with Microsoft Research Asia. His current research interests include multimedia analysis and computer vision. He has authored or co-authored over 150 papers with 10 best paper awards. He holds 18 granted U.S. patents and has shipped a dozen inventions and technologies to Microsoft products and services.  He is an Editorial Board Member of IEEE Trans. on Multimedia, ACM Trans. on Multimedia Computing, Communications, and Applications, IEEE MultiMedia Magazine, and Pattern Recognition. He is the Program Co-chair of ACM Multimedia 2018, CBMI 2017, IEEE ICME 2015, and IEEE MMSP 2015. Tao was elected as a Fellow of IAPR and a Distinguished Scientist of ACM for his contributions to large-scale video analysis and applications.

Tim Pan, Microsoft Research

Dr. Tim Pan is outreach senior director of Microsoft Research Asia, responsible for the lab’s academic collaboration in the Asia-Pacific region.

Tim Pan leads a regional team with members based in China, Japan, and Korea engaging universities, research institutes, and certain relevant government agencies. He establishes strategies and directions, identifies business opportunities, and designs various programs and projects that strengthen partnership between Microsoft Research and academia.

Tim Pan earned his Ph.D. in Electrical Engineering from Washington University in St. Louis. He has 20 years of experience in the computer industry and has co-founded two technology companies. Tim has a great passion for talent fostering. He served as a board member of St. John’s University (Taiwan) for 10 years, offered college-level courses, and wrote a textbook about information security. Between 2005 and 2007, Tim worked for Microsoft Research Asia as a university relations manager for Taiwan and Hong Kong. He rejoined Microsoft Research Asia in 2012.

Toshihiko Yamasaki, The University of Tokyo

He received the B.S. degree, the M.S. degree, and the Ph.D. degree from The University of Tokyo in 1999, 2001, and 2004, respectively.

He is currently an Associate Professor at Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo. He was a JSPS Fellow for Research Abroad and a visiting scientist at Cornell University from Feb. 2011 to Feb. 2013.

His current research interests include multimedia big data analysis, pattern recognition, and machine learning. His publications include three book chapters, more than 60 journal papers, and more than 170 international conference papers. He has received around 60 awards.

Winston Hsu, National Taiwan University

Prof. Winston Hsu is an active researcher dedicated to large-scale image/video retrieval/mining, visual recognition, and machine intelligence. He is keen on turning advanced research into business deliverables via academia-industry collaborations and co-founding startups. He is a Professor in the Department of Computer Science and Information Engineering, National Taiwan University, was a Visiting Scientist at Microsoft Research (2014) and IBM TJ Watson Research (2016) for visual cognition, and co-leads the Communication and Multimedia Lab (CMLab). He is the Director and PI of the NVIDIA AI Lab (NTU), the first in Asia. He received his Ph.D. (2007) from Columbia University, New York. Before that, he was a founding engineer at CyberLink Corp. He serves as an Associate Editor for IEEE Multimedia Magazine and IEEE Transactions on Multimedia. He has also given several highly rated and well-attended technical tutorials at ACM Multimedia 2008/2009, SIGIR 2008, and IEEE ICASSP 2009/2011.

Xunying Liu, The Chinese University of Hong Kong

Xunying Liu is an Associate Professor in the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong (CUHK). He received his PhD and MPhil degrees both from University of Cambridge, after his undergraduate study at Shanghai Jiao Tong University. He was a Senior Research Associate at the Machine Intelligence Laboratory of the Cambridge University Engineering Department, prior to joining CUHK. He is a co-author of the widely used HTK speech recognition toolkit and has continued to contribute to its current development in deep neural network based acoustic and language modelling. His current research interests include speech recognition, machine learning, statistical language modelling, speech synthesis, speech and language processing.

Yinqiang Zheng, National Institute of Informatics

Yinqiang obtained a Doctor of Engineering degree from Tokyo Institute of Technology in 2013, under the supervision of Prof. Masatoshi Okutomi. Before that, he received a Master’s degree from Shanghai Jiao Tong University in 2009 (supervised by Prof. Yuncai Liu) and a Bachelor’s degree from Tianjin University in 2006. He has been working on 3D geometric computer vision and spectral imaging for the past six years, including the incremental structure-and-motion pipeline with applications to large-scale 3D reconstruction from Internet image collections, polynomial system solving techniques for a series of fundamental geometric estimation problems, and spectral analysis relating to illumination/reflectance/fluorescence.

Yi-Bing Lin, National Chiao Tung University

Yi-Bing Lin received his Bachelor’s degree from National Cheng Kung University, Taiwan, in 1983, and his Ph.D. from the University of Washington, USA, in 1990. From 1990 to 1995 he was a Research Scientist with Bellcore (Telcordia). He then joined the National Chiao Tung University (NCTU) in Taiwan, where he remains. In 2010, Lin became a lifetime Chair Professor of NCTU, and in 2011, the Vice President of NCTU. During 2014 – 2016, Lin was Deputy Minister, Ministry of Science and Technology, Taiwan. Since 2016, Lin has served as Vice Chancellor, University System of Taiwan (for NCTU, NTHU, NCU, and NYM).

Lin is an Adjunct Research Fellow at the Institute of Information Science and the Research Center for Information Technology Innovation, Academia Sinica, and a member of the board of directors of Chunghwa Telecom. He serves on the editorial board of IEEE Trans. on Vehicular Technology. He has been General or Program Chair for prestigious conferences including ACM MobiCom 2002, and Guest Editor for several journals including IEEE Transactions on Computers. Lin is the author of the books Wireless and Mobile Network Architecture (Wiley, 2001), Wireless and Mobile All-IP Networks (John Wiley, 2005), and Charging for Mobile All-IP Telecommunications (Wiley, 2008). Lin has received numerous research awards, including the 2005 NSC Distinguished Researcher, the 2006 Academic Award of the Ministry of Education, the 2008 Award for Outstanding Contributions in Science and Technology, Executive Yuan, the 2011 National Chair Award, and the TWAS Prize in Engineering Sciences, 2011 (The Academy of Sciences for the Developing World). He is on the advisory boards or the review boards of various government organizations, including the Ministry of Economic Affairs, the Ministry of Education, the Ministry of Transportation and Communications, and the National Science Council. Lin is President of the IEEE Taipei Section. He is an AAAS Fellow, ACM Fellow, IEEE Fellow, and IET Fellow.

Yoshihiro Kawahara, The University of Tokyo

Yoshihiro Kawahara is an Associate Professor in the department of Information and Communication Engineering, The University of Tokyo.

His research interests are in the areas of Computer Networks and Ubiquitous and Mobile Computing. He is currently interested in developing energetically autonomous information communication devices. He is trying to eliminate power cords through energy harvesting and wireless power transmission. He is interested not only in academic research activities but also enjoys designing new businesses and their field trials while working with IT startup companies.

He received his Ph.D. in Information Communication Engineering in 2005, M.E. in 2002, and B.E. in 2000. He joined the faculty in 2005. He is a member of IEICE, IPSJ, and IEEE. He is a committee member of IEEE MTT TC-24 (RFID Technologies). He was a visiting assistant professor at Georgia Institute of Technology and MIT Media Lab. He is a technical advisor of AgIC, Inc and SenSprout, Inc.

Yuki Arase, Osaka University

Yuki Arase received her B.E. (2006), M.I.S. (2007), and Ph.D. of Information Science (2010) from Osaka University, Japan. She joined Microsoft Research in Beijing as an associate researcher in April 2010. Since 2014, she has been an associate professor at the Graduate School of Information Science and Technology, Osaka University. She has been working on natural language processing, specifically English/Japanese machine translation, language resource construction, paraphrasing, conversation systems, and learning assistance for learners of English as a second language.

Yun-Nung (Vivian) Chen, National Taiwan University

Yun-Nung (Vivian) Chen is an assistant professor in the Department of Computer Science and Information Engineering at National Taiwan University. Her research interests include language understanding, dialogue systems, natural language processing, deep learning, and multimodality. She received Best Student Paper Awards from IEEE ASRU 2013 and IEEE SLT 2010 and a Student Best Paper Nominee from INTERSPEECH 2012. Chen earned the Ph.D. degree from School of Computer Science at Carnegie Mellon University, Pittsburgh in 2015. Prior to joining National Taiwan University, she worked for Microsoft Research in the Deep Learning Technology Center.

Posters & Demos

#1. Progressive Graph-signal Sampling and Encoding for Static 3D Geometry Representation

  • Gene Cheung, National Institute of Informatics (NII)
  • Dinei Florencio, Microsoft Research

The goal of our research is to acquire, process, and compactly represent 3D geometric data (e.g., depth images, meshes, 3D point clouds) for transmission over bandwidth-limited networks to a receiver for immersive visual communication (IVC) applications, such as holoportation. Unlike conventional 2D video conference tools like Skype, IVC renders captured human subjects in a virtual 3D space at the receiver side (observed using multi-view or head-mounted displays) so that an “in-the-same-room” experience can be shared by remotely located participants connected via high-speed data networks. Advances in IVC, which include recent developments in virtual reality (VR) and augmented reality (AR), can enable a new paradigm in distance human communication, resulting in cost reduction and quality improvement in a range of practical real-world applications, including distance learning, remote medical diagnosis, and psychological counselling.

#2. Cyber Archaeology of Greek and Roman Sculpture

  • Kyoko Sengoku-Haga*, Sae Buseki*, Min Lu**, Takeshi Masuda+, Takeshi Oishi**, *Tohoku University, **The University of Tokyo, +AIST
  • Katsu Ikeuchi, Microsoft Research

The goal of our project is to acquire a substantial quantity of 3D data of ancient sculpture, which will enable us to obtain archaeologically significant results and thus prove the validity of the cyber-archaeological method. The final goal is the construction of a cyber museum open to all researchers in the world, which will enable them to try the new cyber-archaeological method in studying ancient sculpture, namely, the 3D shape comparison method developed by our project. It has the potential to cause a paradigm shift in the field of art history/archaeology, but that is not all; this new method opens up great possibilities for Asian researchers and students in the field of Greek and Roman studies. Due to the absolute lack of real works of Greek and Roman art in their countries, most Asian researchers in this field are obliged to remain at a secondary level in the world. With the help of 3D models and the shape comparison tool, research and education in this field in Asian countries may change drastically. Until 2015 we selected statues to be scanned with a view to solving specific art historical problems; now we are shifting to systematically scanning a series of notable statues of each epoch, thus acquiring a mass of data applicable to the different problems of numerous researchers.

#3. Contents-based assessment of the aesthetics of photography

  • Ichiro IDE, Nagoya University
  • Tao Mei, Microsoft Research

The aesthetics of photography and artwork has been studied for a long time. The so-called “Rule of Thirds” based on the golden ratio is a well-known basic rule for deciding the framing. However, in reality, it is often the case that other constraints take precedence over the basic rule. Among the constraints are the purpose of photographing and the nature of the target contents-of-interest in the scene. In most situations, it is preferable to include certain contents rather than others, considering the purpose of photographing. So, the aesthetics of photography should actually be assessed according to the contents visible in the image in addition to general rules. Since the purpose of photographing varies case by case and in many cases is not even explicitly describable, and since it is nearly impossible to describe the nature of each content in the scene beforehand, it is very difficult to solve this problem in a general framework. So, the proposed project aimed to assess the aesthetics of food images in particular, whose purpose of photographing is clear (i.e., the target food should look delicious) and whose contents are restricted and usually annotated (i.e., accompanied by dish names and/or ingredients).

#4. Search Engine That Listens (SETL)

  • Hideo Joho, University of Tsukuba
  • Ruihua Song, Microsoft Research

The increase of voice-based interaction has changed the way people seek information, making search more conversational. Development of effective conversational approaches to search requires better understanding of how people express information needs in dialogue. This project set the following goals to address the research challenge.

  • Develop a conceptual model that can represent information needs expressed in conversations during a collaborative task
  • Identify effective features to detect dialogues that contain conversational information needs
  • Establish behavioral patterns of conversational information needs for a common collaborative task

#5. Cognition-aware Search System based on Brain Activity

  • Makoto P. Kato, Kyoto University

The purpose of this research project is to develop a cognition-aware search system that returns items such as documents, images, and music, in response to cognitive search intents (i.e. how the user wants to cognize the item). We develop methods to predict a cognitive search intent based on user brain activity during search, and to estimate the cognitive relevance of items by utilizing brain activity data as user profiles. We also investigate the relationship between brain activity and physiological data, and further propose a method of obtaining pseudo brain activity data for the case where brain activity data are not available. In this research project, we aim to extend the search engine ability from understanding what a user wants to understanding how a user wants to feel, and to initiate transferring findings in neuroscience into the industry.

#6. A Social Action Sharing System using Augmented Reality-based Reenactment

  • Yuta Nakashima, Osaka University and Hiroshi Kawasaki, Kyushu University
  • Katsu Ikeuchi, Microsoft Research

Learning actions, such as martial arts techniques or dance moves, is best done by imitating a demonstration. There are basically two ways to do this: one is by copying a teacher in real life who is performing the action, and another is by copying a video that has been recorded of the teacher. Both of these methods have drawbacks. Imitating a teacher in real life is dependent on the availability of the teacher. Using a video of the teacher is limited to the video’s viewpoint. If the action is ambiguous or hard to follow, the viewer cannot change the viewpoint to see it better.

Thus, the goal of this project is to create a method that combines these two approaches, and to develop an application that is able to present it easily to users. Our proposed method is called a reenactment, and it is a 3D reconstruction of a motion sequence. In order to make it easy to capture, we restrict ourselves to using consumer depth cameras, in contrast to existing 3D reconstruction techniques that make use of multiple cameras or depth cameras. Our proposed application will use augmented reality, with the mirror metaphor: we will overlay our reenactment on top of a mirror of the user, which will copy the orientation of the user, in order for him or her to more easily compare actions with the reenactment.

#7. Extreme active 3D capturing system

  • Hiroshi Kawasaki, Kyushu University and Yuta Nakashima, Osaka University
  • Katsushi Ikeuchi, Microsoft Research

Active 3D scanning methods using a single image with a static light pattern (a.k.a. one-shot 3D scan) have attracted interest from many researchers because of their unique advantage, i.e., the capability of capturing fast-moving objects. The applicant has researched 3D shape reconstruction techniques based on the active 3D scanning method for more than a decade, has published several papers, and has succeeded in recovering fast-moving objects, e.g., a bursting balloon and a rotating fan. Such advantages contribute to various applications, such as medical systems, product inspection, and autonomous driving. Among them, since humans sometimes move very fast, human motion capture is still a challenging problem, and thus we set our goal as capturing humans in fast motion. One important difficulty of the system derives from noise: because human motion is so fast, the shutter speed must be set very short, resulting in dark and noisy images. To compensate for the light intensity, multiple projectors are frequently used, which is also useful for enlarging the recoverable region; however, this causes a color crosstalk problem. Another issue is missing parts in the reconstruction, which inevitably occur because some parts of the body are usually occluded by other parts. To solve those issues, we propose two approaches.

#8. Neural Network for Robust Japanese Word Segmentation

  • Mamoru Komachi, Tokyo Metropolitan University
  • Xianchao Wu, Microsoft Research

In this project, we present a neural network-based model for robust Japanese word segmentation. With the growth of the web, large variations in language use have emerged. Existing morphological analyzers are typically trained on a newswire corpus and are not robust for processing web texts. However, there are few resources for robust Japanese natural language analysis. Thus, we aim at creating fundamental language resources for neural network-based Japanese word segmentation.

#9. Automatic Description of Human Motion and Its Reproduction by Robot Based on Labanotation

  • Shunsuke Kudoh, The University of Electro-Communications
  • Katsushi Ikeuchi, Microsoft Research

The learning-from-observation (LFO) paradigm, in which a robot learns tasks by observing human demonstration, is an effective method for teaching motions to a robot. With this method, users do not need to write programs explicitly every time they try to teach something new to a robot. However, since a human body and a robot body have very different joint structures and mass distributions, it is difficult to teach human motion by importing it directly. For example, angular trajectories of joints are difficult to import directly to a robot. Therefore, it is necessary in LFO-based learning that a robot first recognizes what a demonstrator is doing, and then, from the recognition result, reproduces motion that is both equivalent and feasible. Few studies have been done so far that describe human motion from the viewpoint described above. What is required for such a framework of motion description is that it be capable of both “recognizing” and “reproducing” human motion regardless of the domain of motion and the type of robot. The words “recognition” and “reproduction” in this document are defined as follows:

  • Recognition: generating motion description from observation of human motion
  • Reproduction: generating robot motion from motion description

In this project, we proposed a general method for describing human motion which was capable of both recognition and reproduction.

#10. Metric structure from motion with Wi-Fi based positioning technique

  • Takuya Maekawa and Yasuyuki Matsushita, Osaka University
  • Katsushi Ikeuchi, Microsoft Research

Construction of 3D maps of indoor environments can be a core technology for indoor real-world applications such as navigation for pedestrians and autonomous mobile robots, virtual tours of sightseeing spots and museums based on VR technologies, and so on. However, existing 3D reconstruction technologies require expensive devices such as laser range finders and depth sensors. Therefore, 3D reconstruction methods based on commodity devices are required. This study proposes a method for constructing a 3D model with real scale using a camera and Wi-Fi module, which are installed in recent smartphone devices.

#11. HCI Device Research @ MSRA

  • Masaaki Fukumoto, Microsoft Research

This project represents a somewhat “unusual” part of MSRA research as it’s hardware-based. Our research aims not only to improve existing devices, e.g., keyboards and pointing devices, but focuses more on creating brand-new interface devices.

#12. Positive-unlabeled learning with application to semi-supervised learning

  • Gang Niu (presented by Tomoya Sakai), University of Tokyo
  • Dr. Xianchao Wu, Microsoft Japan

Our original proposal was entitled “deep similarity learning in graph-based semi-supervised methods” and involved three topics: deep learning, which is good at highly nonlinear representations of the raw data; metric learning, which focuses on pairwise distance measures of the data such that, under the ideal metric, data with the same label should be close and data with different labels should be far apart; and semi-supervised learning, which requires unlabeled data at training time for classifying either test data or the unlabeled data themselves. Deep similarity learning is extensively used for learning-to-rank/match features in modern search engines (where titles/short abstracts are matched to a query), and graph-based methods like random walks and label propagation are also useful in search engine companies (where doc info can be propagated using a query-query graph and query info can be propagated using a doc-doc graph).

However, for security reasons that will be explained later in “collaboration with Microsoft Research”, we could not get access to the data possessed by Microsoft Japan in order to try our several novel ideas for the original proposal, so we modified it into the closely related topic “positive-unlabeled learning with application to semi-supervised learning”. In positive-unlabeled (PU) learning, a binary classifier is trained from positive (P) and unlabeled (U) data without negative (N) data. This also belongs to semi-supervised learning, and when submitting research papers to top learning conferences, people choose the area of semi-supervised learning. In practice, PU learning has many applications in detection, recognition, and retrieval problems.

The goal of this project is to better understand the state-of-the-art unbiased PU learning methods and further improve on them. The proposed non-negative PU learning is shown to be the new state of the art.
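As an illustration only, the idea of training from P and U data without N data can be sketched with one common formulation of a non-negative PU risk. The function names, the sigmoid loss, and the assumption that the class prior (the fraction of positives among unlabeled data) is known are all choices made for this sketch and are not taken from the poster; the project's actual method may differ in its details.

```python
import numpy as np


def sigmoid_loss(margin):
    # Assumed surrogate loss l(z) = 1 / (1 + exp(z)); small for large positive margins.
    return 1.0 / (1.0 + np.exp(margin))


def nn_pu_risk(scores_p, scores_u, prior):
    """Sketch of a non-negative PU risk for real-valued classifier scores.

    scores_p: scores g(x) on positive samples
    scores_u: scores g(x) on unlabeled samples
    prior:    assumed-known class prior, i.e. the fraction of positives in U
    """
    risk_p_pos = sigmoid_loss(scores_p).mean()   # positives scored against label +1
    risk_p_neg = sigmoid_loss(-scores_p).mean()  # positives scored against label -1
    risk_u_neg = sigmoid_loss(-scores_u).mean()  # unlabeled scored against label -1
    # The unlabeled data mix positives and negatives, so the negative-class risk is
    # estimated by subtracting the positive share; clipping keeps it non-negative.
    negative_part = risk_u_neg - prior * risk_p_neg
    return prior * risk_p_pos + max(0.0, negative_part)
```

Without the clipping in the last line, the estimated negative-class risk can drop below zero for a flexible model, which is a symptom of overfitting; the `max` is what distinguishes the non-negative variant from the plain unbiased estimator in this sketch.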

#13. Evolution Strategy Based Design of Low-Power and High Performance Compact Hardware Speech Sensors

  • Takahiro Shinozaki, Tokyo Institute of Technology
  • Frank Soong, Ningyi Xu, Microsoft Research

In our daily lives, we often want to control electric devices such as an audio player or a lamp, find a small item such as a wallet or eyeglasses, or catch an event such as a baby crying or a dog barking. Sometimes, however, it is bothersome to walk across a room and interrupt what you are doing, time-consuming to find something, or impossible without the help of someone else. These problems can be solved if tiny and energy-efficient speech sensors are ubiquitously embedded in our living environment. These sensors must be very small so that they can be attached to various things. Their energy consumption must be minimal since they must work continuously with a tiny energy source so that they can react to a voice at any time. They must be noise robust since they are used in noisy environments, there is a distance between the user and the speech sensor, and the SNR is low. The goal of this project is to develop a speech recognition architecture that is suitable for such speech sensors.

#14. Supporting Query Formulations in Task-oriented Web Search

  • Takehiro Yamamoto, Kyoto University
  • Ruihua Song, Microsoft Research

Web searchers are often motivated by the need to accomplish real-world tasks. For example, a user who is suffering from a sleeping problem may issue the query “sleeping pills,” intending to find a good sleeping pill to solve his/her sleeping problem. We aim to develop methods for supporting users in such task-oriented Web search. This research project particularly focused on supporting the query formulations of users in task-oriented Web search by providing alternative actions to them. More specifically, we tackled the alternative action mining problem, where a system is required to find alternative actions for a given query. An alternative action for a query is defined as an action that can solve the same problem. For example, given the query “sleeping pills,” our objective is to find alternative actions such as “have a cup of hot milk” or “stroll before bedtime,” both of which can achieve the same goal behind the query, i.e., “solve the sleeping problem.” Mined alternative actions can be utilized for supporting a searcher in a task-oriented Web search. For example, by suggesting the alternative actions to the searcher issuing the query “sleeping pills,” he/she is able to notice different solutions and make an improved decision on how to solve his/her sleeping problem.

#15. Gamification-based Context Collection for Application Recommendation and Life-logging

  • Takahiro Hara, Osaka University
  • Xing Xie, Microsoft Research

Recently, the flood of applications often makes it difficult for users to know all available applications and choose an appropriate one according to their situation (context). In our previous project under CORE 11, we first investigated relationships between high-level user context (e.g., how busy the user is, how healthy the user is, and whom the user is with) and application usage by analyzing a large amount of application usage logs collected through a monster-breeding game on smart-phones. We then developed a preliminary prototype of a system that recommends applications suitable for the user’s current context based on the analytical results. This system is effective for solving the above-mentioned application-flood problem, especially for people who are not familiar with smart-phones, such as elderly people. The high-level context information collected by our game is useful not only for application recommendation but also for many other applications such as life-logging. Existing life-logging services either require burdensome operations, such as inputting complicated user information, or just record simple information that can be easily calculated from sensor data, such as walking distance and sleeping time. We therefore developed, in our previous project, a life-logging service that makes use of the high-level context provided by our game, so that users need not perform any extra operations.

In this continuation project, we built on these previous studies to further improve both preliminary systems. In particular, we focused on developing application recommendation techniques, such as predicting which applications will be used next, to reduce the user’s burden of searching for an application among a large number of installed ones.
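
As an illustration of next-application prediction, the sketch below counts which application tends to follow which in usage logs and suggests the most frequent successors. It is a minimal first-order transition model chosen for illustration only, not the technique developed in the project; the function names (`train_transitions`, `predict_next`) and the sample logs are hypothetical.

```python
from collections import Counter, defaultdict


def train_transitions(sessions):
    """Count how often each app is launched right after another app."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_app, next_app in zip(session, session[1:]):
            transitions[current_app][next_app] += 1
    return transitions


def predict_next(transitions, current_app, k=3):
    """Return up to k apps most often launched after current_app."""
    followers = transitions.get(current_app, Counter())
    return [app for app, _ in followers.most_common(k)]


# Hypothetical usage logs: each list is one session of app launches.
logs = [
    ["mail", "calendar", "maps"],
    ["mail", "calendar", "camera"],
    ["mail", "browser"],
]
model = train_transitions(logs)
print(predict_next(model, "mail", k=2))
```

A context-aware recommender would condition these counts on the high-level context as well (for example, keeping one table per context value), which this sketch leaves out.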

#16. Wearable Human Interface Device Using Micro-Needles

  • Norihisa Miki, Keio University
  • Masaaki Fukumoto, Microsoft Research

Next-generation wearable human interface devices must acquire signals of human activity, such as EEG and EMG, with high sensitivity and accuracy, and must transfer information to the human with minimal loss and low power consumption. These challenges stem essentially from the stratum corneum, which covers the surface of the skin, acts as a good insulating layer that protects the body from the environment, and serves as the interface between human interface devices and the body. We highlight two micro-needle-based human interface devices, which can penetrate the high-impedance stratum corneum without reaching the pain points. The needle-type electrotactile displays can transfer tactile information at a much lower voltage than conventional flat electrotactile displays. The needle-type electrodes for EEG can successfully measure high-quality EEG from hairy parts with the help of their candle-like shape.

Although these results were new and highly regarded from a research point of view, the needles may not be suitable for commercial applications, in particular for long-term use. Therefore, in this research project, we attempt to optimize the interface between wearable devices and human skin in terms of efficiency and user affinity. We will investigate the shape, material, density, etc. of the micro-needle electrodes. In addition, for user affinity, we need to discuss how a reliable interface can be maintained.

#17. A Multi-tap CMOS Sensor for Dynamic Scene Estimation

  • Hajime Nagahara, Osaka University
  • Steve Lin, Microsoft Research

Many computer vision methods, such as shape from shading [1], depth from defocus [2], high-dynamic-range imaging [3], and specular/Lambertian separation [4], cannot be applied to dynamic scenes, since they require multiple image acquisitions and assume that the scene is static while the images are captured. However, regular CCD or CMOS sensors have uniform exposure timing and cannot take multiple images at the same time. When the scene contains motion, these methods cannot ignore the differences in exposure timing among the images. In this proposal, we propose using a multi-tap CMOS sensor [5] to apply these methods to dynamic scenes. The multi-tap CMOS sensor can acquire multiple images at almost the same time, with a difference of 100 microseconds. We can therefore not only ignore the exposure differences among the images but also switch the lighting between them. Using these images, we can estimate the shape of an object in a dynamic scene with the shape-from-shading technique.

#18. Computer Assisted English Email Writing System

  • Jason S. Chang, National Tsing Hua University
  • Chin-Yew Lin, Microsoft Research

Learners of English as a second language typically have trouble getting up to speed and becoming fluent, confident writers. In this project, we propose to develop a method for extracting grammar patterns, which can be used to provide instant writing suggestions in Microsoft Word. In our approach, we use partial parsing and pattern templates to extract grammar patterns and dictionary-like examples from genre-specific corpora. The method involves automatically deriving the base phrases of sentences in a given corpus, automatically generating and ranking candidate patterns and examples that match the templates, and filtering for high-ranking patterns and examples. At run time, as the user types (or mouses over) a word, the system automatically retrieves and displays the grammar patterns and examples most relevant to the word and its surrounding context. The user can opt for patterns from a general corpus, an academic corpus, or commonly overused dubious patterns found in a learner corpus. We present a prototype writing assistant, WriteAhead, that applies the method to reference and learner corpora, such as Gigaword English, CiteSeerX, and the WikEd Error Corpus. We expect WriteAhead to provide intensive interaction through writing suggestions, offering patterns and examples for continuing the partial sentence. WriteAhead would thus minimize the time spent hesitating and searching for the right word. Our methodology effectively turns the Microsoft word processor into a resource-rich interactive writing environment, much like the interactive development environments that are commonplace in writing software code.

#19. A Distributed Platform for Querying Big Graph Data

  • James Cheng, The Chinese University of Hong Kong
  • Bin Shao, Microsoft Research

The project aims to develop a distributed platform for efficiently querying big graphs potentially stored in distributed locations. Graph queries such as shortest-path distance queries, reachability queries, pattern matching queries, neighborhood queries, etc., have many important applications and have been extensively studied in the past. However, in recent years, we have witnessed a surge of graph data from various sources such as online social networks, online shopping networks, mobile and communication networks, financial and marketing networks, the WWW and Internet, etc. Most of these graphs are massive, and existing graph query processing techniques are not scalable, while existing distributed graph computing systems were not designed for handling online graph query workloads. This motivates us to design a new type of distributed system for graph query processing. Such a system can advance research in the field of large-scale graph query processing, where scalable techniques are still lacking, and also benefit industry, where massive volumes of graph data have been generated and online querying is becoming increasingly critical.
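
To make two of the query types above concrete, the sketch below answers shortest-path distance and reachability queries with breadth-first search on a small unweighted, directed graph held in memory. It is a single-machine baseline for illustration only and says nothing about how the proposed distributed platform works; the adjacency-list format, the function names, and the sample graph are assumptions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def shortest_path_distance(graph, source, target):
    """Shortest-path distance query; None if target is unreachable."""
    return bfs_distances(graph, source).get(target)


def is_reachable(graph, source, target):
    """Reachability query: is there a directed path from source to target?"""
    return target in bfs_distances(graph, source)


# Hypothetical directed graph as an adjacency list.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
print(shortest_path_distance(graph, "a", "d"))
print(is_reachable(graph, "a", "e"))
```

Each query here traverses the graph from scratch, which is exactly the cost that becomes prohibitive for massive graphs and online workloads.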

#20. Development of Seizure Detection Headband

  • Herming Chiueh, Shih-kai Lin, National Chiao Tung University
  • Chin-Yew Lin, Microsoft Research

Epilepsy is a common neurological disorder; about 1.7% of the global population has epilepsy. Most patients use antiepileptic drugs to reduce their seizures. Among them, nearly one-third have drug-resistant epilepsy. The alternative treatment is resection surgery to remove the epileptogenic zone. However, all of these patients will still have some seizures, which affect their quality of life and bring danger and inconvenience to the patients and the people around them. This project proposes to design and develop a smart headband for epilepsy patients. The headband will consist of a textile headband with a printed circuit board (PCB) inside and textile electrodes on it.

#21. Chatting robot with behavior learning

  • Katsu IKEUCHI, Microsoft Research

Demand for service robots has been increasing due to the need for elderly care and daily-life support. The MSRA robotics team and the MS Strategic Prototyping team are jointly developing intelligent service robots to meet this demand. The robots follow a remote/cloud brain architecture for flexibility and versatility. Incoming voice signals from the microphone are converted to text messages, which are sent to the basic activity module on the cloud server. Based on the module’s analysis, several services on the cloud server are launched. The robot’s current capabilities include general chatting, language translation, person identification, object recognition, and guiding.

Maintaining fluency is one of the prerequisites for such a service robot. Connecting a chatting engine to the robot remarkably improves its conversational ability. In conversation, a gesture accompanying a spoken sentence is an important factor, often referred to as body language. This is particularly true for humanoid service robots, because the key merit of a humanoid robot is its resemblance to human shape as well as human behavior. We proposed a new method to generate gestures along with spoken sentences for such humanoid service robots.

#22. Scientific Document Summarization

  • Min-Yen Kan*, Kokil Jaidka+, Muthu Kumar Chandrasekaran*, *National University of Singapore, +University of Pennsylvania
  • Chin-Yew Lin, Microsoft Research

We developed resources and technologies that solve problems in scientific summarization. Current scientific summaries are written manually by scholars, synthesizing the goals and contributions of a study. Advances in automated document summarization, while significant, are not adapted to summarizing the specialized scientific document format, which is typified by conventional argumentation patterns and the use of technical terminology. Furthermore, automatic summarization systems do not support a researcher in the actual task of a literature survey, which may involve tracking a line of research over time and following developments since a seminal publication that may amass hundreds to thousands of citations per year. It is also difficult to evaluate these summaries quantitatively, because there is no single rubric for what constitutes an ideal scientific summary. Importantly, the key resource of a standardised reference corpus is missing. Such a corpus is needed to interest the research community in dedicating resources and manpower, as comparative objective benchmarking is critical to reproducibility and assessment.

#23. Performance Monitoring and Reliability Enhancement with Log Data Analysis for Large Scale Distributed Systems

  • Michael R. Lyu, The Chinese University of Hong Kong
  • Dongmei Zhang, Microsoft Research

This project aims to advance the state-of-the-art techniques of log generation, selection, and analysis for performance monitoring and reliability enhancement. We improve logging quality at the time logs are written, and investigate cost-effective logging mechanisms for large-scale distributed systems. We also design the corresponding methods to collect and parse the logs generated in the target systems. We apply data mining techniques to select important and informative logs, and use a log parser to structure raw logs into clean features for machine learning. With the abundant information extracted from the log data, performance monitoring and system troubleshooting will be conducted accordingly. Finally, the associated tools for performance monitoring and anomaly detection will be published for public access. The project has three objectives in total.
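
The log-parsing step described above, structuring raw logs into clean features, can be illustrated with a small sketch that masks variable fields in each message to recover an event template and then counts events per template. This is a simplified stand-in for a real log parser, not the project’s tool; the masking rules, the `to_template` and `count_events` names, and the sample log lines are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed masking rules, applied in order; real parsers use richer heuristics.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                     # hexadecimal values
    re.compile(r"\b\d+\b"),                                # plain integers
]


def to_template(message):
    """Replace variable fields with <*> so similar messages share a template."""
    for pattern in VARIABLE_PATTERNS:
        message = pattern.sub("<*>", message)
    return message


def count_events(messages):
    """Count how many raw messages fall under each event template."""
    return Counter(to_template(message) for message in messages)


# Hypothetical raw log messages.
raw_logs = [
    "Connection from 10.0.0.5:8080 closed after 120 ms",
    "Connection from 10.0.0.9:443 closed after 7 ms",
    "Block 0x1f3a replicated to 3 nodes",
]
for template, count in count_events(raw_logs).items():
    print(count, template)
```

Per-template counts over fixed time windows are one simple kind of feature vector that an anomaly detector could consume.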

#24. Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner

  • Tseng-Hung Chen, Min Sun, National Tsing Hua University
  • Jianlong Fu, Microsoft Research

Datasets with large corpora of “paired” images and sentences have enabled recent advances in image captioning. Many novel networks trained on such paired data have achieved impressive results under a domain-specific setting, that is, training and testing on the same domain. However, the domain-specific setting incurs a huge cost for collecting “paired” images and sentences in each domain. For real-world applications, one would prefer a “cross-domain” captioner that is trained in a “source” domain with paired data and generalizes to other “target” domains at very little cost (e.g., with no paired data required).

We propose a cross-domain image captioner that can adapt the sentence style from the source to the target domain without the need for paired image-sentence training data in the target domain.

Figure: (left panel) sentences from MSCOCO mainly focus on the location, color, and size of objects; (right panel) sentences from CUB-200 describe the parts of birds in detail; (bottom panel) our generated sentences before and after adaptation.

#25. Provenance and Validation in an AI perspective - Interactive Global Histories as a Showcase

  • Andrea Nanetti, Siew Ann Cheong, Nanyang Technological University
  • Chin-Yew Lin, Microsoft Research

Automatic acquisition of historical knowledge and machine reading for indexing and summarizing news and historical sources can build on the experience of historians and reporters in uncovering ever more background information surrounding an event. In this context, the New Silk Road is a fortunate and exquisite case study. The first mention of the Silk Road (Seidenstrasse) can be found in Ferdinand von Richthofen’s China (1877-1912), where it names a segment of the intercontinental communication network in a specific time period: the first-century AD overland route of Marinus of Tyre from the Mediterranean to the borders of the land of silk. Over time, however, the Silk Road became a double synecdoche (i.e., a figure of speech in which a part is made to represent the whole): the Road represents the entire intercontinental connectivity network, and the Silk stands for all sorts of goods and trade. In September-October 2013, PRC President Xi’s proposal to the surrounding countries for a new silk road used that concept as a metaphor (i.e., a figure of speech in which a word or phrase is applied to an object or action to which it is not literally applicable) to brand the launch of the Asian Infrastructure Investment Bank and the Silk Road Infrastructure Bank.

#26. Performance-Centric Scheduling with Service Guarantees for Datacenter Jobs

  • Wei Wang, HKUST
  • Thomas Moscibroda, Microsoft Research

With the wide deployment of data-parallel frameworks like Spark and Hadoop, it has become the norm to run data analytics applications in a large cluster of machines. With different applications coexisting in a cluster, data analytics jobs, each consisting of many parallel tasks, expect predictable performance with guarantees on the maximal completion delay. Cluster operators, on the other hand, aim to minimize the response times of jobs, i.e., the time between a job’s arrival and its completion.

Prevalent cluster schedulers deployed in today’s datacenters rely on fair sharing to provide predictable performance, e.g., Dryad’s Quincy, the Hadoop Fair and Capacity Schedulers, and YARN’s DRF scheduler. By seeking max-min fair allocations at all times, fair schedulers aim to ensure that each job receives an equal amount of cluster resources (to the degree possible), regardless of the behavior of the other jobs, thereby achieving performance isolation between jobs. However, it has been widely confirmed that fair schedulers can be inefficient and may result in long response times.
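
For readers unfamiliar with max-min fairness, the sketch below computes a max-min fair allocation of a single divisible resource by progressive filling: jobs with small demands are satisfied in full, and the remaining capacity is split equally among the rest. It illustrates the general principle only; it is not the algorithm of any scheduler named above (DRF, for instance, handles multiple resource types), and the function name and the sample demands are hypothetical.

```python
def max_min_fair(capacity, demands):
    """Max-min fair split of one resource among jobs with given demands."""
    allocation = {}
    remaining = float(capacity)
    # Serve jobs in order of increasing demand.
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        share = remaining / len(pending)  # equal split of what is left
        job, demand = pending[0]
        if demand <= share:
            # The smallest demand fits within an equal share: satisfy it fully.
            allocation[job] = float(demand)
            remaining -= demand
            pending.pop(0)
        else:
            # No pending job fits: every remaining job gets the equal share.
            for job, _ in pending:
                allocation[job] = share
            break
    return allocation


# Hypothetical cluster with 10 resource units and three jobs.
print(max_min_fair(10, {"A": 2, "B": 4, "C": 10}))
```

Note that job C is capped at an equal share even though it could use far more, which hints at why strictly fair allocations can lengthen response times.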

#27. An Image to Poetry System with an Evaluation Framework

  • Chao-Chung Wu, Shou-De Lin, National Taiwan University
  • Mi-Yen Yeh, Academia Sinica
  • Ruihua Song, Microsoft Research

Recently, with the development of deep learning, natural language generation tasks such as image captioning and dialogue generation have achieved better and sometimes amazing results, in terms of either accuracy or output that surprises humans, especially in creative language generation such as poetry generation. In poetry generation, the creativity and readability of ancient poetry leave more room for the reader’s imagination, but the constraints of ancient poetry, such as length, rhyme, and part of speech, sometimes make a generated poem identical in wording to an original poem. In this project, we develop a model that uses a given image to generate modern Chinese poems. While generating poems that follow constraints such as length, rhyme, and part of speech, the model also aims to show some machine “creativity.” That is, the model does not just copy lines from existing famous poems, but also adds some new ideas.

#28. Seeing Bot

  • Ting Yao, Tao Mei, Microsoft Research

#29. Predicting Winning Price in Real Time Bidding with Censored Data

  • Wush Chi-Hsuan*, Mi-Yen Yeh*, Ming-Syan Chen#, *Academia Sinica, #National Taiwan University
  • Xing Xie, Microsoft Research

#30. FallCare+: An IoT Surveillance Solution with Microsoft Kinects & CNTK for Fall Accidents

  • Charles HP Wen, National Chiao Tung University
  • Chin-Yew Lin, Microsoft Research