Each year during TechFest, Microsoft Research displays a collection of cutting-edge research projects that offer new functionalities for Microsoft products and, often, for the greater research ecosystem. Many of those projects are discussed below.
Deep Zoom technology from Microsoft Silverlight enables you to interact with the TechFest project posters. You can zoom in and out and pan smoothly across the poster images, for an immersive experience almost as satisfying as being on the show floor.
Bored with having an ordinary-looking avatar? Want to create something unique? A dragon? A lobster? An alien? BodyAvatar is a natural interface that lets Kinect players create 3-D avatars of any shape they can imagine, using their bodies as the input. Based on a first-person, “you’re the avatar” metaphor, the player simply scans his or her body posture into an initial shape for the avatar and then performs various intuitive gestures that change the shape of the avatar on the screen. BodyAvatar unleashes everyone’s creativity, letting people turn their wildest imaginings into reality without needing to learn complex 3-D modeling tools. Learn more »
In 2012, Microsoft formed a unique partnership with the International Union for Conservation of Nature’s Red List of Threatened Species. Central to the partnership is the creation of the Red List Threat Mapping Tool — a spatial database application that enables experts and decision-makers around the world to find, map, explore, add, modify, and annotate the various threats to any focal species. This SQL Server 2012 application enables visitors to query global biodiversity, protected area, and threat databases in real time. New software is being built to make it easy for anyone to construct these kinds of geo-data applications “at the speed of thought,” without having to write a line of code. The software natively understands spatial data and spatial search, introduces a new, iterative search method, and produces databases that remain flexible, so that all aspects of the database and the application can be modified at any time.
Have you ever experienced a sensation that is hard to put into words? Have you ever wanted to convey your facial expressions and other biometric signals to someone close to you via tactile, audio, and visual channels? This project reflects on the meaning of interaction and communication from the perspective of our innate senses, beyond verbal communication. Treating facial expressions and head poses as meaningful indicators, the project maps them onto a rich set of interrelated aural, tactile, and visual responses. The project also aims to be a platform for studying different sensing techniques for information retrieval and communication. For example, projecting music beats as vibrations in a person’s joints could be a natural way to help hearing-impaired people dance. Other uses include mapping eye gaze, laughter, eye blinks, or voice pitch to audio, visual, and vibration cues to create intimacy with another person. Learn more »
Augmented reality is an important technique for improving the user experience in many applications, especially in the mobile Internet era, in which smart devices are cheap and popular. This project features augmented-reality scenarios for mobile phones and tablets based on 3-D reconstruction technologies. A typical scenario: Suppose sellers such as Amazon or IKEA used a portable 3-D scanner app to build 3-D models of their products. If you wanted to buy a vase for your desk, you could find candidates by keyword or visual search, then photograph your desk with your phone camera and see the vase rendered on the desk on your screen. With a true 3-D vase model, you can walk around the desk to judge the effect and decide which vase is most desirable. Other scenarios include 3-D facial modeling, social-network sharing, and 3-D printing. Learn more »
Obtaining steady video from hand-held video cameras, mobile phones, and Surface devices is becoming increasingly important for everyday users. Achieving high-quality output from existing video editors remains challenging, though: some results still exhibit jitter, undesired low-frequency motion, excessive cropping, or annoying shearing and wobbling. This project demonstrates a new optimization technique, requiring no hardware support, that can effectively suppress these artifacts altogether. Moreover, the technique can be applied across devices and scenarios: as a video post-processing editor on the desktop and on Surface, or as real-time stabilization on mobile phones for better viewfinders and face-to-face communication.
Kinect has brought full-body tracking to your living room, enabling you to control games and apps with your gestures. One promising direction in Kinect’s evolution is hand-gesture recognition. By capturing a large, varied set of images of people’s hands, the project uses machine learning to train Kinect to determine reliably whether your hand is open or closed. A handgrip detector, the gestural equivalent of the mouse click, then can be built. This detector will be included in a forthcoming release of the Kinect for Windows SDK and should open a new wave of natural-user-interaction applications.
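At heart, the open/closed decision described above is a binary classifier trained on labeled hand images. As a rough stdlib-Python illustration — the feature names and values here are hypothetical stand-ins, not what the Kinect detector actually computes — a logistic-regression classifier over a few hand-shape features might look like:

```python
import math
import random

# Hypothetical per-image features for a hand: [normalized silhouette
# area, perimeter, convexity]. Illustrative stand-ins only.
def make_sample(open_hand):
    if open_hand:   # open hands: larger silhouette, less convex
        x = [random.gauss(0.8, 0.05), random.gauss(0.9, 0.05), random.gauss(0.6, 0.05)]
    else:           # closed fists: smaller silhouette, more convex
        x = [random.gauss(0.4, 0.05), random.gauss(0.5, 0.05), random.gauss(0.9, 0.05)]
    return x, 1 if open_hand else 0

def train(samples, epochs=200, lr=0.5):
    """Logistic regression fit by stochastic gradient descent."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                      # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def is_open(x, w, b):
    # The "handgrip click" decision: positive score means open hand.
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0.0

random.seed(0)
train_set = [make_sample(i % 2 == 0) for i in range(200)]
w, b = train(train_set)
test_set = [make_sample(i % 2 == 0) for i in range(100)]
accuracy = sum(is_open(x, w, b) == (y == 1) for x, y in test_set) / len(test_set)
```

The real detector is trained on a far larger, more varied set of depth images with richer features; the sketch only shows the train-then-classify shape of the problem.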
Big data usually refers to the volume of data to be processed, but in a real-time environment, velocity is equally important. Processing data as it arrives enables quicker reaction to events, providing a competitive advantage over offline processing. The software-and-services industry is embracing machine learning to make its offerings more intelligent. This project combines technology for efficient temporal stream processing with support for machine learning. The project shows how to compose temporal processing and Infer.NET machine learning into a reasoning flow running in StreamInsight and how to provide incremental online updates of the machine-learning model at runtime. It also shows how to move between online stream processing and offline data analysis, and how to operationalize an offline, validated reasoning flow in a production system. This work adds value to concrete customer scenarios in the manufacturing and cloud/IT services domains.
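The incremental-update idea can be illustrated with a model far simpler than an Infer.NET reasoning flow: a Gaussian whose parameters are refreshed after every stream event, so the model is always current without reprocessing history. A minimal stdlib-Python sketch, assuming nothing about StreamInsight's actual APIs:

```python
import random

class OnlineGaussian:
    """Incrementally updated Gaussian model (Welford's algorithm), a
    streaming analogue of refitting a model offline on all the data."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations

    def update(self, x):
        # One online update per stream event; no history is stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# Simulated event stream (e.g., sensor readings from a manufacturing line).
random.seed(1)
stream = [random.gauss(10.0, 2.0) for _ in range(10000)]

model = OnlineGaussian()
for event in stream:
    model.update(event)        # the model is current after every event
```

After the loop, the online estimates agree with an offline batch fit over the same data, which is the point: the same reasoning flow can run over a live stream or a stored log.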
Textbooks are acknowledged as the educational input most consistently associated with gains in student learning. They are the primary conduits for delivering content knowledge to students, and teachers base lesson plans primarily on the material in textbooks. This project features a data-mining-based approach for enhancing the quality of textbooks. The approach includes a diagnostic tool for authors and educators to identify algorithmically any deficiencies in textbooks. Techniques are provided for algorithmically augmenting sections of a book with links to selective web content. The focus is on augmenting textbook sections with links to relevant videos, mined from an abundant collection of free, high-quality educational videos available on the web. These techniques have been validated over a corpus of high school textbooks spanning various subjects and grades.
Intelligent Tutoring Systems (ITS) can enhance significantly the educational experience, both in the classroom and online. Problem generation, an important component of ITS, can help avoid copyright or plagiarism issues and help generate personalized workflows. This capability, for a variety of subject domains, can be demonstrated with user-interaction models:
- Algebraic-proof problems: Given an example problem, the tool generates similar problems.
- SAT sentence-completion problems: Given a vocabulary word w, the tool generates a sentence completion whose correct answer is w, along with a few incorrect alternates.
- Logic-proof problems: Given an input problem, the tool generates variants. Given parameters such as number or size of variables or clauses, the tool generates fresh problems.
- Board-game problems: Given rules of a board game—such as 4×4 tic-tac-toe with only row/column sequences—and hardness level, the tool generates starting configurations that require few steps to win.
Learn more »
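The fresh-problem generators above can be hinted at with a toy logic-problem example. One standard trick — an illustration, not necessarily the tool's actual method — is to plant a satisfying assignment first, which guarantees that every generated problem is solvable:

```python
import random

def fresh_cnf(num_vars, num_clauses, clause_size=3, seed=None):
    """Generate a fresh, guaranteed-satisfiable CNF logic problem by
    planting a random satisfying assignment, then emitting clauses that
    each contain at least one literal true under it.
    Literals are ints: +v means variable v, -v its negation."""
    rng = random.Random(seed)
    assignment = {v: rng.choice([True, False]) for v in range(1, num_vars + 1)}
    clauses = []
    for _ in range(num_clauses):
        chosen = rng.sample(range(1, num_vars + 1), clause_size)
        clause = [v if rng.random() < 0.5 else -v for v in chosen]
        # Force one literal to agree with the planted assignment.
        i = rng.randrange(clause_size)
        v = abs(clause[i])
        clause[i] = v if assignment[v] else -v
        clauses.append(clause)
    return clauses, assignment

def satisfies(clauses, assignment):
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)
```

Varying `num_vars` and `num_clauses` corresponds to the parameterized generation described above; generating variants of a given problem would additionally constrain the output to share the input's structure.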
Recent efforts by organizations such as Coursera, edX, Udacity, and Khan Academy have produced thousands of educational videos, logging hundreds of millions of views, in an effort to make learning freely available to the masses. While presentation style varies by author, the videos share a common drawback: they are time-consuming to produce and difficult to modify after release. VidWiki is an online platform that takes advantage of the massive number of online students viewing videos to improve video presentation quality and content iteratively, much as crowdsourced projects such as Wikipedia improve written information. Through the platform, users annotate videos by overlaying content atop them, relieving instructors of the burden of updating and refining content. Layered annotations also assist in video indexing, language translation, and the replacement of illegible handwriting or drawings with more readable, typed content. Learn more »
Since 2007, the Computational Ecology and Environmental Science (CEES) group at Microsoft Research Cambridge has been pursuing the fundamental research needed to build predictive models of critical global environmental systems. Such predictions are needed urgently at a variety of scales—and to support effective decision-making, they must include uncertainty. In recent years, the philosophy of how to make such predictions has become clear: A “defensible modeling pipeline” is needed in which data and models are integrated in a Bayesian context and which is transparent and repeatable enough to stand up in court. The technology, though, is lagging far behind, making this pipeline impossible to build for all but the most technically savvy. Enter CEES Distribution Modeler, a browser app that enables users to visualize data, define a complex model, parameterize it using Bayesian methods, make predictions with uncertainty, and then share all that in a fully transparent and repeatable form.
Information workers (IWs) need to gather structured data from various sources, combine that with their own data, analyze the data, and make business decisions based on the data. Discovering and importing the data into Excel is tedious and cumbersome, and data analysis is either time-consuming or requires programming skills. This project presents tools for non-expert Excel users to discover and analyze data quickly and easily. For data discovery, it offers technology that extracts structured data from the web, indexes them, and enables IWs to search over them. IWs can perform the searches directly from Excel, easily import the data into a spreadsheet, and combine them with their own data. For data analysis, this project presents a set of machine-learning tools seamlessly integrated into Excel. The technology automatically can infer the values of missing cells, detect outliers, and enable users to analyze data tables more productively.
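The missing-cell and outlier features can be sketched with a deliberately crude stand-in for the project's actual machine-learning models: fill missing values with the column mean, and flag cells more than a few standard deviations away from it.

```python
import math

def analyze_column(values, z_threshold=3.0):
    """Fill missing cells (None) with the column mean and flag outliers
    by z-score. A crude stand-in for model-based inference: the real
    system would condition on other columns, not just this one."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    var = sum((v - mean) ** 2 for v in present) / (len(present) - 1)
    std = math.sqrt(var)
    filled = [mean if v is None else v for v in values]
    outliers = [i for i, v in enumerate(filled)
                if std > 0 and abs(v - mean) / std > z_threshold]
    return filled, outliers

# A column with one missing cell and one suspicious value.
column = [10.0] * 10 + [None] + [10.0] * 10 + [100.0]
filled, outliers = analyze_column(column)
```

Here the missing cell is imputed and the `100.0` entry is flagged; a model that also looked at neighboring columns could impute far more accurately than a column mean.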
This project uses and extends the narrative storytelling attributes of whiteboard animation with interactive information-visualization techniques to create a new, engaging form of storytelling with data. SketchInsight is an interactive whiteboard system for storytelling with data through real-time sketching. It facilitates the creation of personalized, expressive data charts quickly and easily. The presenter sketches an example icon, and SketchInsight automatically completes the chart by synthesizing additional icons from the example sketch. Furthermore, SketchInsight enables the presenter to interact with the data charts. Learn more »
This project is a novel method for real-time, 3-D scene capture and reconstruction. Using several live color and depth images, this technology builds a high-resolution voxelization of visible surfaces. Unlike previous methods, this effort captures dynamic scene geometry, such as people moving and talking. The key to the approach is an efficient, sparse voxel representation ideally suited to GPU acceleration. Rather than allocating voxel memory as a 3-D array corresponding to the entire volume in space, the project stores only those voxels that contain visible surfaces, leading to a much more compact representation for the same voxel resolution. As a result, the project captures and processes ultra-high-resolution voxelizations from fused image data, utilizing depth, silhouette, and color cues consistently.
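The sparse-storage idea can be sketched in a few lines: a hash map keyed by integer voxel coordinates that stores only occupied voxels. The real system is a GPU data structure fusing depth, silhouette, and color cues; this stdlib sketch shows only the memory-layout idea.

```python
class SparseVoxelGrid:
    """Store only voxels containing visible surface, keyed by integer
    (x, y, z) coordinates, instead of allocating a dense 3-D array
    covering the whole capture volume."""
    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        self.voxels = {}               # (ix, iy, iz) -> (count, r, g, b)

    def key(self, x, y, z):
        s = self.voxel_size
        return (int(x // s), int(y // s), int(z // s))

    def insert_surface_point(self, x, y, z, color):
        # Accumulate color samples per voxel (running sums + count).
        k = self.key(x, y, z)
        count, r, g, b = self.voxels.get(k, (0, 0, 0, 0))
        cr, cg, cb = color
        self.voxels[k] = (count + 1, r + cr, g + cg, b + cb)

    def occupied(self):
        return len(self.voxels)

grid = SparseVoxelGrid(voxel_size=1.0)
grid.insert_surface_point(0.5, 0.5, 0.5, (255, 0, 0))
grid.insert_surface_point(0.6, 0.4, 0.5, (0, 255, 0))   # same voxel
grid.insert_surface_point(2.5, 0.5, 0.5, (0, 0, 255))   # new voxel
```

Memory now scales with the visible surface area rather than with the cube of the volume's resolution, which is what makes ultra-high-resolution voxelizations feasible.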
Though the phrase “going viral” has permeated popular culture, the concept of virality itself is surprisingly elusive, with past work failing to define it rigorously or even to show definitively that viral content exists. By examining nearly a billion information cascades on Twitter—involving the diffusion of news, videos, and photos—this project has developed a quantitative notion of virality for social media and, in turn, identified thousands of viral events. ViralSearch lets users interactively explore the diffusion structure of popular content. After selecting a story, users can view a time-lapse video of how the story spread from one user to the next, identify which users were particularly influential in the process, and examine the chain of tweets along any path in the diffusion cascade. The science and technology behind ViralSearch can help identify topical experts, detect trending topics, and provide virality metrics for a variety of content.
Today, mobile users decide which business to visit next based only on distance information, stale business reviews, and old ratings. But when deciding what to do next, real-time information about the business — such as the current occupancy level, the noise level, and the type of music or exact song playing — can be invaluable. This project proposes to crowdsource real-time business metadata through real-user check-in events. Every time a user checks into a business, this project uses the phone’s microphone and advanced signal processing to infer the occupancy level, the exact song playing, and the music and noise levels in the business. The extracted metadata either can be shown in the search results as business info or can be indexed to enable a new generation of queries, such as “crowded bars playing hip-hop music.” Using real business audio traces recorded on multiple devices, the project achieves accuracy of better than 80 percent in inferring real-time business metadata.
A strategy is proposed for mining, browsing, and searching through documents consisting of text, images, and other modalities: A collection of documents is represented as a grid of keywords with varying font sizes that indicate the words’ weights. The grid is based on the counting-grid model, so that each document matches in its word usage the word-weight distribution in some window in the grid. This strategy leads to denser packing and higher relatedness of nearby documents—documents that map to overlapping windows literally share the words found in the overlap. Smooth thematic shifts become evident in the grid, providing connections among distant topics and guiding the user’s attention in the search for a spot of interest. Images and other modalities are embedded into the grid, too, providing a multimodal surface for interactive, touch-based browsing and search for documents. Examples include browsers for four months of CNN news, for cooking recipes, and for scientific papers.
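A toy version of the window-matching idea — a drastic simplification of the actual counting-grid model — scores each grid window by how well its aggregated word counts explain a document's words, then places the document at the best window. Overlapping windows then share words by construction:

```python
import math
from collections import Counter

def best_window(grid, doc_words, win=2):
    """Place a document at the window of a counting grid whose
    aggregated word counts best explain it (toy maximum likelihood,
    with add-one smoothing). `grid` is a 2-D list of Counters."""
    rows, cols = len(grid), len(grid[0])
    best, best_ll = None, -math.inf
    for i in range(rows - win + 1):
        for j in range(cols - win + 1):
            window = Counter()
            for di in range(win):
                for dj in range(win):
                    window += grid[i + di][j + dj]
            total = sum(window.values())
            if total == 0:
                continue               # skip empty regions of the grid
            vocab = len(window) + 1
            ll = sum(math.log((window[w] + 1) / (total + vocab))
                     for w in doc_words)
            if ll > best_ll:
                best, best_ll = (i, j), ll
    return best

# Toy 3x3 grid: sports words top-left, cooking words bottom-right.
sports, cooking, empty = Counter(goal=5, team=5), Counter(recipe=5, bake=5), Counter()
grid = [[sports, sports, empty],
        [sports, sports, cooking],
        [empty, cooking, cooking]]
placement = best_window(grid, ["recipe", "bake", "bake"])
```

A cooking document lands in the bottom-right window, and documents mapping to windows that overlap it would share the cooking vocabulary found in the overlap — the "denser packing" effect described above.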
The natural user interface meets big data meets visualization: SandDance is a web-based visualization system that exploits 3-D hardware acceleration to explore the relationships between hundreds of thousands of items. Arbitrary data tables can be loaded, and results can be filtered using facets and displayed using a variety of layouts. Natural-user-interaction techniques including multitouch and gesture interactions are supported. Learn more »
Kinect Fusion enables high-quality scanning and reconstruction of 3-D models using just a handheld Kinect for Windows sensor. The implementation leverages C++ Accelerated Massive Parallelism, enabling support for a variety of graphics hardware. Simple samples are demonstrated to get developers up to speed with 3-D scanning.
We are heading toward a “society of appliances,” in which every connected device can contribute its strengths and complement the others’. At the same time, large displays are becoming ubiquitous; soon, everyone could have a large office display. This project addresses two scenarios in the context of an augmented office. When the user is close to the large display, it offers a new user experience designed for large displays, with commands appearing directly next to the finger, in combination with a pen. When the user is far from the large display, it shows that the phone can serve as a proxy for the display: as a remote mouse or keyboard for digital inclusion, as an extension of the current experience, such as a palette for a painting application, or as a device for initiating document sharing on the large display.
This project features a device that enables the natural visual and haptic exploration of a 3-D data set. It is the start of an investigative research tool that will enable the exploration of various natural touch interactions in 3-D with both visual and haptic feedback. A table-top system enables the user to explore a 3-D data set in X, Y, and Z with natural touch interactions. The X and Y interactions come via X and Y touch interaction on the screen, visually scrolling in X and Y through the data set. As the user naturally explores in depth, a gentle push on the touch screen physically moves the screen in Z with appropriate video rendering at the appropriate XY cutting plane. At appropriate Z positions, haptic detents and other Z-axis force feedback will be rendered as the user explores along the Z axis.