Contact: Sitaram Lanka
The BrainWave deep learning platform, running on field-programmable gate array (FPGA)-based hardware microservices, helps democratize AI across Microsoft. Hardware microservices enable direct, ultra-low-latency access to hundreds of thousands of FPGAs from software running anywhere in the datacenter. To that end, we will show two demos: 1) a hardware microservices implementation of a large-scale deep learning model that improves Bing query relevance; and 2) compiler and runtime support that lets developers working in either CNTK or TensorFlow easily leverage the BrainWave platform.
Contact: Lucas Joppa
Understanding the land cover types and locations within specific regions enables effective environmental conservation. With sufficiently high spatial and temporal resolution, scientists and planners can identify which natural resources are at risk and gauge the level of risk; this information helps inform decisions about how and where to focus conservation efforts. Current land cover products don’t meet these spatial and temporal requirements. The Microsoft AI for Earth program’s Land Cover Classification Project will use deep learning algorithms to deliver a scalable Azure pipeline for turning high-resolution US government imagery into categorized land cover data at regional and national scales. The platform’s first application will produce a land cover map for the Puget Sound watershed, which is Microsoft’s own backyard and one of the nation’s most environmentally and economically complex and dynamic landscapes.
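The core of such a pipeline is turning a large raster into per-region labels. A minimal sketch, in which the tile size, the land-cover codes, and the stand-in classifier (a dominant-pixel-value heuristic, not the project's deep model) are all invented for illustration:

```python
# Toy sketch: tile an image into fixed-size patches, classify each patch,
# and assemble a coarse land-cover label map. The classifier is a
# stand-in heuristic, not the project's deep learning model.

def tile(image, size):
    """Split a 2D grid of pixels into size x size patches (row, col, patch)."""
    h, w = len(image), len(image[0])
    for r in range(0, h, size):
        for c in range(0, w, size):
            yield r // size, c // size, [row[c:c + size] for row in image[r:r + size]]

def classify(patch):
    """Stand-in classifier: label a patch by its dominant pixel value."""
    flat = [v for row in patch for v in row]
    return max(set(flat), key=flat.count)

def label_map(image, size):
    h, w = len(image) // size, len(image[0]) // size
    labels = [[None] * w for _ in range(h)]
    for r, c, patch in tile(image, size):
        labels[r][c] = classify(patch)
    return labels

# 4x4 image of land-cover codes: 0 = water, 1 = forest
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]
print(label_map(img, 2))  # [[0, 1], [1, 1]]
```

In the real pipeline the patches would be image chips fed to a trained network, and the tiling step is what makes the job scale out across Azure workers.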
Contact: Linjun Yang
Visual search, also known as search by image, is a new way of searching for information using an image, or part of an image, as the query. Similar to text search, which connects keyword queries to knowledge on the web, the ultimate goal of visual search is to connect camera-captured images to web knowledge. Bing has been continuously improving its visual search feature, which is now available on Bing desktop, mobile, and apps, as well as in the Edge browser. It can be used not only to search for similar images but also for task completion, such as finding similar products while shopping. Bing image search now also features image annotation and object detection to further improve the user experience. This demo will show these techniques and the scenarios for which they were developed.
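At its core, similar-image retrieval matches a query's feature vector against an index of image embeddings. A minimal sketch, with hand-made toy vectors standing in for real learned embeddings and invented file names:

```python
# Minimal sketch of the retrieval core of visual search: images are
# represented as feature vectors (toy values here, not real embeddings)
# and the query is matched to its nearest neighbors by cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec, index, k=2):
    """Return the k index entries most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

index = [
    ("red_dress.jpg",  [0.9, 0.1, 0.0]),
    ("red_shoes.jpg",  [0.8, 0.2, 0.1]),
    ("blue_jeans.jpg", [0.1, 0.1, 0.9]),
]
print(search([0.85, 0.15, 0.05], index))  # ['red_dress.jpg', 'red_shoes.jpg']
```

A production system replaces the linear scan with approximate nearest-neighbor indexing, but the ranking principle is the same.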
Contact: Anna Roth
This demo shows how Custom Vision Service can be applied to many AI vision applications. For example, if a client needs to build a custom image classifier, they can submit a few images of objects, and a model is deployed at the touch of a button. Microsoft Office is also using Custom Vision Service to automatically caption images in PowerPoint.
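One simple way to picture how a classifier can be built from just a few images is nearest-centroid classification over image features. This sketch is an illustration of that idea only; the feature vectors and class names are invented, and it is not the service's actual API or training method:

```python
# Toy sketch of few-shot image classification: each class centroid is the
# mean of a handful of labeled feature vectors, and a new image is
# assigned to the nearest centroid. The vectors are invented stand-ins
# for real image embeddings.

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(labeled):
    """labeled: {class_name: [feature_vector, ...]} -> class centroids."""
    return {name: centroid(vecs) for name, vecs in labeled.items()}

def predict(model, vec):
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(model, key=lambda name: dist(model[name]))

model = train({
    "mug":    [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]],
    "laptop": [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]],
})
print(predict(model, [0.7, 0.3]))  # mug
```

Because the features come from a network pretrained on large image corpora, a few examples per class can be enough, which is what makes the "few images, one button" experience possible.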
Contact: Olivier Nano
Two of the most important components of a speech recognition system are the acoustic model and the language model. The models behind Microsoft’s speech recognition engine have been optimized for certain usage scenarios, such as interacting with Cortana on a smartphone, searching the web by voice, or sending text messages to a friend. But if a user has specific needs, such as recognizing domain-specific vocabulary or understanding accents, the acoustic and language models need to be customized. This demo will show the benefits of customizing acoustic and language models to improve the accuracy of speech recognition for lectures. Using the Custom Speech Service (a Cognitive Service), the demo will show how the technology can tune speech recognition for specific topics and lecturers.
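The language-model half of customization can be illustrated with a toy bigram model: trained on in-domain text, it assigns higher probability to domain phrasing, which is what steers a recognizer toward the right transcription among acoustically similar candidates. The corpora below are invented examples, and real systems use far richer models:

```python
# Toy illustration of language-model customization: a bigram model
# trained on domain-specific text scores in-domain word sequences higher
# than a model trained on generic text.
from collections import Counter
import math

def train_bigram(corpus):
    """Count unigrams and bigrams over a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def score(model, sentence, alpha=1.0):
    """Add-alpha smoothed bigram log-probability (higher = more likely)."""
    unigrams, bigrams = model
    vocab = len(unigrams) + 1
    words = sentence.split()
    return sum(
        math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab))
        for a, b in zip(words, words[1:])
    )

domain = train_bigram(["the neural network converges", "the network converges quickly"])
generic = train_bigram(["the cat sat on the mat", "the dog ran quickly"])
s = "the network converges"
print(score(domain, s) > score(generic, s))  # True
```

The acoustic-model side is customized analogously, by adapting on audio that matches the target speakers and recording conditions.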
Contact: Gang Hua
This demo presents several applications of Microsoft’s recent work in artistic style transfer for images and videos. One technology, called StyleBank, provides an explicit representation for visual styles with a feedforward deep network that can cleanly separate an image’s content from its style. This framework can render stylized videos online, achieving more stable results than previous methods. In addition, the Deep Image Analogy technique takes a pair of images and transfers the visual attributes of one to the other, enabling a wide variety of artistic effects.
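A common way to represent style in neural style transfer (the classic Gram-matrix formulation, which is a general baseline rather than StyleBank's filter-bank representation) is via correlations between feature channels, discarding where in the image each feature occurs. A toy computation with hand-made "feature maps" standing in for network activations:

```python
# Toy computation of the Gram-matrix style representation widely used in
# neural style transfer: style is captured by correlations between
# feature channels, independent of spatial layout. These tiny lists
# stand in for real network activations.

def gram(features):
    """features[i] is channel i flattened over spatial positions."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]

# Two "images" with the same channel statistics in different positions:
a = [[1.0, 0.0, 1.0, 0.0],   # channel 0
     [0.0, 2.0, 0.0, 2.0]]   # channel 1
b = [[0.0, 1.0, 0.0, 1.0],
     [2.0, 0.0, 2.0, 0.0]]

print(gram(a) == gram(b))  # True: same style, different layout
```

Two images with identical Gram matrices share a style even though their content differs, which is exactly the separation a content/style decomposition exploits.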
Contact: Dan Deutsch
Searching within web documents on mobile devices is difficult and unnatural: Ctrl+F searches only for exact matches, and it’s hard to see the search results. DeepFind takes a step toward solving this problem by allowing users to search within web documents using natural language queries; it displays snippets from the document that answer the user’s questions.
Users can interact with DeepFind on bing.com, m.bing.com, and the Bing iOS App in two different ways: as an overlay experience, which encourages exploration and follow-up questions, or as a rich carousel of document snippets integrated directly into the search engine results pages, which proactively answers the user’s question.
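The snippet-ranking half of such a feature can be sketched with a classic IDF-weighted overlap baseline; the production system uses learned models, and the sample document below is invented:

```python
# Toy sketch of snippet ranking for in-document search: split a document
# into passages and rank them against a natural-language query by
# IDF-weighted word overlap, returning the best snippet.
import math
from collections import Counter

def idf(passages):
    df = Counter()
    for p in passages:
        df.update(set(p.lower().split()))
    n = len(passages)
    return {w: math.log(n / df[w]) + 1.0 for w in df}

def best_snippet(query, passages):
    weights = idf(passages)
    def score(p):
        words = set(p.lower().split())
        return sum(weights.get(w, 0.0) for w in query.lower().split() if w in words)
    return max(passages, key=score)

doc = [
    "The warranty covers the battery for eight years.",
    "Charging to full takes about nine hours on household power.",
    "The touchscreen controls climate and navigation.",
]
print(best_snippet("how long does charging take", doc))
```

Here the query shares only the word "charging" with the winning passage; a learned model would also bridge paraphrases like "take" versus "takes", which is where the natural-language advantage over Ctrl+F comes from.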
Contact: Nilesh Bhide
As the world moves to messaging apps, bots, and the botification of content, users are starting to shift from keyword searches to relying on bots and assistants for their information-seeking needs. Bing has built InfoBots, a set of AI- and Bing-powered QnA capabilities that bots can leverage to help users with those needs. InfoBots’ QnA capabilities are tuned to answer information-seeking questions from a wide variety of content (open-domain content from the Internet, specific vertical domain content, and so on). InfoBots supports conversational QnA, using multi-turn question and answer understanding to answer natural-language questions. InfoBots capabilities have applications in both consumer and enterprise contexts.
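One ingredient of multi-turn understanding is resolving a follow-up question against the previous turn. A heuristic sketch of pronoun resolution against a supplied entity list (real systems learn this; the entities and questions here are invented):

```python
# Toy sketch of multi-turn question understanding: resolve a pronoun in a
# follow-up question to the entity mentioned in the previous turn, so the
# rewritten question can be answered on its own. Entity list and
# questions are invented examples.

PRONOUNS = {"it", "he", "she", "they", "this", "that"}

def extract_entity(question, known_entities):
    q = question.lower()
    for entity in known_entities:
        if entity in q:
            return entity
    return None

def rewrite(followup, previous_question, known_entities):
    entity = extract_entity(previous_question, known_entities)
    if entity is None:
        return followup
    words = [entity if w in PRONOUNS else w for w in followup.lower().split()]
    return " ".join(words)

entities = ["windows 10", "office"]
print(rewrite("when was it released",
              "what is windows 10",
              entities))  # when was windows 10 released
```

The rewritten, self-contained question can then be passed to the same answering stack used for single-turn queries.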
Contact: Silviu-Petru Cucerzan
This demo shows how InstaFact brings the information and intelligence of the Satori knowledge graph into Microsoft’s productivity software. InstaFact can automatically complete factual information in the text a user is writing or can verify the accuracy of facts in text. It can infer the user’s needs based on data correlations and simple natural-language clues. It can expose in simple ways the data and structure Satori harvests from the Web, and let users populate their text documents and spreadsheets with up-to-date information in just a couple of clicks.
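The completion step can be pictured as a lookup over subject-predicate-object triples plus a template fill. The facts and the placeholder convention below are illustrative inventions, not Satori data or the InstaFact interface:

```python
# Toy sketch of knowledge-graph-backed fact completion: a tiny
# subject-predicate store and a completer that fills a blank in a
# sentence. The facts are illustrative samples, not Satori data.

FACTS = {
    ("mount everest", "height"): "8,849 m",
    ("france", "capital"): "Paris",
}

def complete(subject, predicate):
    return FACTS.get((subject.lower(), predicate.lower()))

def fill(sentence_template, subject, predicate):
    """Replace a ___ placeholder with the fact, if known."""
    value = complete(subject, predicate)
    return sentence_template.replace("___", value) if value else sentence_template

print(fill("The capital of France is ___.", "France", "capital"))
# The capital of France is Paris.
```

The hard parts in practice are inferring which subject and predicate the user's sentence implies and keeping the underlying graph current, which is where the natural-language clues and Web harvesting come in.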
Contact: Mahmoud Adada
Maluuba’s vision is to build literate machines. The research team has built deep learning models that can process unstructured written text and answer questions about it. The demo will showcase Maluuba’s machine reading comprehension (MRC) system, which ingests a 400-page automotive manual and answers users’ questions about it in real time. The long-term vision for this product is to apply MRC technology to all types of user manuals, such as those for cars, home appliances, and more.
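The input/output shape of the task (question in, answer span out) can be shown with a crude non-neural stand-in: slide a window over the document and return the span that best overlaps the question's content words. Maluuba's models are neural; the manual text, stopword list, and window size here are invented:

```python
# Toy stand-in for machine reading comprehension: slide a fixed-length
# window over the document tokens and return the span that best overlaps
# the question's content words. Illustrates the task shape only.

STOPWORDS = {"the", "a", "is", "of", "to", "how", "do", "i", "what"}

def answer(question, document, window=8):
    q = {w for w in question.lower().split() if w not in STOPWORDS}
    tokens = [t.strip(".,") for t in document.lower().split()]
    best, best_score = "", -1
    for i in range(max(1, len(tokens) - window + 1)):
        span = tokens[i:i + window]
        score = sum(1 for w in span if w in q)
        if score > best_score:
            best, best_score = " ".join(span), score
    return best

manual = ("To change a flat tire, loosen the lug nuts, jack up the car, "
          "and mount the spare. Check tire pressure monthly.")
print(answer("how do i change a flat tire", manual))
# to change a flat tire loosen the lug
```

A neural MRC system replaces the overlap count with learned question and passage representations, letting it answer even when the question shares no words with the relevant passage.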
Contact: Jina Suh
Building machine learning (ML) models is an involved process requiring ML experts, engineers, and labelers. The demand for models for common-sense tasks far exceeds the supply of “teachers” who can build them. We approach this problem by allowing domain experts to apply what we call Machine Teaching (MT) principles, which include mining domain knowledge, concept decomposition, ideation, debugging, and semantic data exploration.
PICL is a toolkit that originated from the MT vision. It enables teachers with no ML expertise to build classifiers and extractors. The underlying SDK enables system designers and engineers to build customized experiences for their problem domain. In PICL, teachers can bring their own dataset, search or sample items to label using active learning strategies, label these items, create or edit features, monitor model performance, and review and debug errors, all in one place.
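The active-learning sampling mentioned above can be sketched as a loop that asks the teacher to label the item the current model is least sure about. In this toy version the "model" is a nearest-centroid classifier and uncertainty is the margin between the two closest classes; both are stand-ins for the real system's components, and the data is invented:

```python
# Sketch of uncertainty-based active learning: pick the unlabeled item
# with the smallest margin between its two nearest class centroids, so
# the teacher's next label is maximally informative.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroids(labeled):
    out = {}
    for label, vec in labeled:
        out.setdefault(label, []).append(vec)
    return {l: [sum(v[i] for v in vs) / len(vs) for i in range(len(vs[0]))]
            for l, vs in out.items()}

def most_uncertain(model, unlabeled):
    """Item with the smallest margin between its two nearest classes."""
    def margin(vec):
        d = sorted(dist(vec, c) for c in model.values())
        return d[1] - d[0]
    return min(unlabeled, key=margin)

labeled = [("spam", [0.9, 0.1]), ("ham", [0.1, 0.9])]
pool = [[0.95, 0.05], [0.5, 0.5], [0.05, 0.95]]
model = centroids(labeled)
print(most_uncertain(model, pool))  # [0.5, 0.5]
```

Each labeled answer is folded back into the model and the loop repeats, which is why a teacher without ML expertise can still drive the model toward the concept they have in mind.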
Contact: David Baumert
This demonstration project, created jointly by the Microsoft Artificial Intelligence and Research (AI+R) Strategic Prototyping team and MSRA, uses Softbank’s Pepper robot as testbed hardware to show a set of human-collaboration activities based on Microsoft Cognitive Services and other Microsoft Research technologies.
As both a research and prototype-engineering effort, this project implements and learns from concepts such as Brooks’ subsumption architecture, which distributes the robot’s brain activities among the local device for reflex functions, the local facility infrastructure for recognition functions, and remote API services hosted in the cloud for cognitive functions. The implementation is designed to be machine-independent and relevant to any robot requiring human-collaboration capabilities. This approach has supported new investigations, such as non-verbal communication and body movements expressed and documented using Labanotation, making it possible for a robot to process conversations with humans and automatically generate lifelike, meaningful physical behaviors to accompany its spoken words.
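The layered control described above can be sketched as a subsumption-style dispatcher in which faster, more local layers pre-empt slower ones. The layer behaviors and the percept format below are invented for illustration:

```python
# Toy sketch of a subsumption-style dispatcher: reflex behaviors run
# locally and pre-empt slower recognition and cloud-hosted cognitive
# behaviors. Layer contents and percept format are invented examples.

def reflex(percept):
    """Local, fast: stop immediately on an obstacle."""
    return "stop" if percept.get("obstacle") else None

def recognition(percept):
    """Facility infrastructure: greet a recognized face."""
    return f"greet {percept['face']}" if percept.get("face") else None

def cognition(percept):
    """Cloud service: fall back to open-ended conversation."""
    return "converse"

LAYERS = [reflex, recognition, cognition]  # highest priority first

def act(percept):
    for layer in LAYERS:
        action = layer(percept)
        if action is not None:
            return action

print(act({"obstacle": True, "face": "Ada"}))  # stop
print(act({"face": "Ada"}))                    # greet Ada
print(act({}))                                 # converse
```

Keeping reflexes on-device while pushing cognition to the cloud is what lets the robot stay safe under network latency without giving up cloud-scale capabilities.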
Contact: Chris Wendt
Microsoft Translator live enables users to hold translated conversations across two or more languages, with up to 100 participants at the same time, using PowerPoint, iOS, Android, Windows, and web endpoints. Businesses, retail stores, and organizations around the world need to interact with customers who don’t speak the same language as their service providers, and Microsoft Translator live answers this need.
Contact: Dan Bohus
This demo shows our work on a mobile robot that gives directions to visitors. The robot currently navigates Microsoft Building 99, leading and escorting visitors, interacting with them, and generally providing a social presence in the building. It uses Microsoft’s Platform for Situated Intelligence and Windows components for human interaction, as well as a robot operating system running under Linux for robot control, localization, and navigation.
Contact: Ivan Tarapov
Project InnerEye is a new AI product aimed at improving the productivity of oncologists, radiologists, and surgeons working with radiological images. The project’s main focus is the treatment of tumors and the monitoring of cancer progression in temporal studies. InnerEye builds upon many years of research in computer vision and machine learning. It employs decision forests (as already used in Kinect and HoloLens) to help radiation oncologists and radiologists deliver better care to their cancer patients, more efficiently and consistently.
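A decision forest's basic mechanics can be shown in miniature: an ensemble of tiny trees, each voting on a class from feature thresholds, with the majority vote as the prediction. The features, thresholds, and class names below are invented; real models learn deep trees from annotated scans:

```python
# Toy decision forest: an ensemble of depth-1 trees (stumps), each voting
# on a class from one feature threshold, combined by majority vote.
# Features and thresholds are invented for illustration.
from collections import Counter

def stump(feature, threshold, below, above):
    return lambda x: below if x[feature] <= threshold else above

forest = [
    stump(0, 0.5, "healthy", "tumor"),   # feature 0: intensity (invented)
    stump(1, 0.3, "healthy", "tumor"),   # feature 1: texture score (invented)
    stump(0, 0.7, "healthy", "tumor"),
]

def predict(forest, x):
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

print(predict(forest, [0.9, 0.8]))  # tumor
print(predict(forest, [0.6, 0.1]))  # healthy (2 of 3 votes)
```

Averaging many randomized trees is what gives forests their robustness, and the same voting can be run per voxel to segment structures in a scan.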
Contact: Katja Hofmann
Project Malmo is an open-source AI experimentation platform that supports fundamental AI research. With the platform, Microsoft provides an experimentation environment in which promising approaches can be systematically and easily compared, and which fosters collaboration between researchers. Project Malmo is built on top of Minecraft, which is particularly appealing due to its open-ended, collaborative, and creative design. Project Malmo focuses in particular on collaborative AI: developing AI agents that can learn to collaborate with other agents, including humans, to help them achieve their goals. To foster research in this area, Microsoft recently ran the Malmo Collaborative AI Challenge, in which more than 80 teams of students worldwide competed to develop new algorithms that facilitate collaboration. This demo presents the results of the challenge task, shows selected agents, and demonstrates how new tasks and agents can easily be implemented.
Contact: Ying Wang
Zo is a sophisticated machine conversationalist with the personality of a 22-year-old with #friendgoals. She hangs out on Kik and Facebook and is always interested in a casual conversation with her growing crowd of human friends. Zo is an open-domain chatbot and her breadth of knowledge is vast. She can chime into a conversation with context-specific facts about things like celebrities, sports, or finance but she also has empathy, a sense of humor, and a healthy helping of sass. Using sentiment analysis, she can adapt her phrasing and responses based on positive or negative cues from her human counterparts. She can tell jokes, read your horoscope, challenge you to rhyming competitions, and much more. In addition to content, the phrasing of the conversations must sound natural, idiomatic, and human in both text and voice modalities. Zo’s “mind” is a sophisticated array of multiple machine learning (ML) techniques all working in sequence and in parallel to produce a unique, entertaining and, at times, amazingly human conversational experience. This demo shows some of Zo’s latest capabilities and how the team has achieved these technical accomplishments.
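The sentiment-adaptive behavior described above can be illustrated with a minimal lexicon-based sketch: score the user's message with cue words and pick a response template matching the detected mood. The lexicon and templates are invented; Zo's actual models are learned, not hand-written rules:

```python
# Toy version of sentiment-adaptive response selection: detect positive
# or negative cues in the user's message with a small lexicon and choose
# a matching response template. Lexicon and templates are invented.

POSITIVE = {"love", "great", "awesome", "happy"}
NEGATIVE = {"hate", "terrible", "sad", "awful"}

def sentiment(message):
    words = set(message.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

RESPONSES = {
    "positive": "yes!! that's what i'm talking about",
    "negative": "aw, that's rough. want to talk about it?",
    "neutral":  "tell me more!",
}

def respond(message):
    return RESPONSES[sentiment(message)]

print(respond("i love this song"))  # yes!! that's what i'm talking about
print(respond("today was awful"))   # aw, that's rough. want to talk about it?
```

In a real system like the one described, this lexicon lookup would be one learned signal among many feeding the response-ranking models that run in sequence and in parallel.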