Each year during TechFest, Microsoft Research displays a collection of cutting-edge research projects that offer new functionalities for Microsoft products and, often, for the greater research ecosystem. Many of those projects are discussed below.
One of the biggest challenges facing teachers in a classroom is gauging whether students are keeping up with the lesson. The challenge is especially acute in distance-education programs because of the physical separation between students and teachers. This project delivers a new, low-cost technique for instantly polling students in the classroom. The teacher poses a multiple-choice question, and each student responds by holding up a sheet of paper printed with a code, similar to a QR code, that encodes the student's answer and ID. A webcam, using computer vision, automatically recognizes and aggregates the responses for immediate evaluation by the teacher. Initial trials in schools in Bangalore, India, show the system is as accurate as a written test, as fast as a show of hands, and at least 10 times cheaper than alternative electronic solutions.
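The aggregation step can be sketched in a few lines. Assuming the vision pipeline yields decoded (student ID, answer) pairs, a tally in the spirit of the system might look like this; the pair format is an assumption for illustration, not the project's actual data model:

```python
from collections import Counter

def tally_responses(decoded_codes):
    """Aggregate decoded (student_id, answer) pairs into per-option counts.

    Each printed card encodes the student's chosen option plus their ID;
    if the webcam sees a student twice, the later reading wins (as when a
    student re-shows a corrected card). The pair format is an assumption.
    """
    latest = {}  # student_id -> most recently seen answer
    for student_id, answer in decoded_codes:
        latest[student_id] = answer
    return Counter(latest.values())

# Three students answer; "s2" changes their answer from "B" to "C".
counts = tally_responses([("s1", "A"), ("s2", "B"), ("s3", "A"), ("s2", "C")])
```

Keying on student ID is what makes the poll robust to a card being detected in several frames: each student contributes exactly one vote no matter how often the webcam sees them.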
• Telepresence Using Wedge Technology: Glassless 3-D display with correct camera pose and view pose for a live, view-dependent 3-D Window telepresence experience.
• Behind the Screen Overlay Interactions: Behind-the-screen interaction with a transparent OLED, with view-dependent, depth-corrected gaze.
• Seeing Displays: Uses flat (wedge) lenses to see through a semi-transparent OLED for novel above-screen gesture and scanning interactions.
• High-Performance Touch: A touch-display system with two orders of magnitude less latency than current systems.
• Mayhem: A freely available, open-source Windows application that lets almost anyone automate tasks across all their devices. Just select an event (e.g., your favorite stock hits a trigger value, the weather changes, you say something to your Kinect), then select a reaction (e.g., advance a PowerPoint slide, turn on a lamp, start playing a movie), and within seconds you have a connection running.
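Mayhem's event-to-reaction pairing can be illustrated with a minimal dispatcher. The class and method names below are hypothetical, not Mayhem's actual API:

```python
class Mayhem:
    """Minimal event-to-reaction dispatcher in the spirit of Mayhem:
    select an event, select a reaction, and the connection runs.
    Class and method names are hypothetical, not Mayhem's real API."""

    def __init__(self):
        self._connections = []

    def connect(self, event_name, reaction):
        """Pair an event name with a reaction callback."""
        self._connections.append((event_name, reaction))

    def fire(self, event_name, payload=None):
        """Run every reaction connected to this event."""
        return [reaction(payload)
                for name, reaction in self._connections
                if name == event_name]

hub = Mayhem()
hub.connect("stock_hit_trigger", lambda price: f"advance PowerPoint slide (price={price})")
hub.connect("weather_changed", lambda report: f"turn on lamp ({report})")
```

One event can drive any number of reactions, which mirrors the "select an event, select a reaction" pairing the demo describes.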
There is intense interest in intelligent online 3-D exploration and navigation of urban areas, which makes understanding the 3-D structure of urban areas from captured images or videos indispensable. Automatic Building Parsing in Urban Areas is a tool that automatically detects the façades in a single image or in multiple images. Beyond locating each façade, it also computes its geometry—the orientation of the plane—without human interaction. The tool is built on our newly developed Robust Principal Component Analysis SDK and runs at interactive speed, as demonstrated by a touch-based application that lets users dive into a 3-D tour of an urban area from a single image. This tool directly benefits navigation and smooth transitions between bird's-eye-view images and could become a fundamental tool for many applications in urban areas. Learn more >>
The Bing home page provides teaser captions for an interesting image in the form of Bing tiles. The images are chosen carefully, and the captions are written by a person to make them interesting. The Automatic "Text Pop-Up" for Web Images application automatically generates similar text descriptions for a large fraction of the most popular images on the web. At the core of the system is an offline text-extraction process in which the application mines the web for meaningful captions that relate to a given image, checking sentence semantics for relevancy, diversity, and optimal structure, and performing content filtering. The results are indexed in a database. The front end of the application is integrated into the Bing Toolbar in Internet Explorer: whenever a user navigates to a webpage, the application queries the database and overlays text descriptions on the page's images in the form of Bing tiles: the text pop-up.
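The offline filtering step might be sketched as follows. The relevance scores, length cap, and word-overlap diversity check are illustrative stand-ins for the project's actual semantic checks:

```python
def select_captions(candidates, max_captions=3, max_len=90):
    """Toy offline filter/rank pass over mined caption candidates.

    `candidates` holds (sentence, relevance_score) pairs. We drop overlong
    sentences, skip near-duplicates via word-set overlap (a crude diversity
    check), and keep the highest-scoring survivors. All thresholds are
    illustrative, not the project's actual semantic checks.
    """
    seen_word_sets = []
    kept = []
    for sentence, score in sorted(candidates, key=lambda c: -c[1]):
        if len(sentence) > max_len:
            continue  # too long to fit a tile-style caption
        words = frozenset(sentence.lower().split())
        if any(len(words & s) / max(len(words | s), 1) > 0.8
               for s in seen_word_sets):
            continue  # too similar to a caption we already kept
        seen_word_sets.append(words)
        kept.append(sentence)
        if len(kept) == max_captions:
            break
    return kept

candidates = [
    ("This rambling sentence runs on far too long to ever serve as a concise, "
     "tile-sized caption for a home-page image", 0.95),
    ("A red fox trotting through fresh snow", 0.90),
    ("a red fox trotting through fresh snow", 0.85),  # near-duplicate
    ("Fox resting at dusk", 0.50),
]
```

Running `select_captions(candidates)` keeps the top-scoring short sentence and the dissimilar lower-scoring one, while the overlong candidate and the near-duplicate are filtered out.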
Beamatron is a new augmented-reality concept that combines a projector and a Kinect camera on a pan-tilt moving head. The moving head can place the projected image almost anywhere in a room, while the depth camera enables the displayed image to be warped correctly for the shape of the projection surface and the projected graphics to react in physically appropriate ways. For example, a projected virtual car can be driven on the floor of the room but will bump into obstacles or run over ramps. As another application, we consider bringing notifications and other graphics to the user's attention by automatically placing them within the user's view.
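The "bump into obstacles or run over ramps" behavior can be sketched with a heightmap derived from the depth camera; the grid layout and step threshold below are assumptions for illustration, not Beamatron's actual physics:

```python
def can_move(heightmap, x, y, nx, ny, max_step=0.05):
    """Decide whether a projected virtual car may roll from cell (x, y) to
    (nx, ny) on a depth-derived heightmap (meters above the floor).

    A small rise acts as a ramp; anything steeper than `max_step` is an
    obstacle the car bumps into. Grid and threshold are illustrative.
    """
    rise = heightmap[ny][nx] - heightmap[y][x]
    return rise <= max_step

# Two-row floor grid: a 2 cm ramp edge mid-row, a 30 cm box on the right.
floor = [
    [0.00, 0.00, 0.30],
    [0.00, 0.02, 0.30],
]
```

With this grid, the car rolls over the 2 cm ramp edge but is blocked by the 30 cm box, which is the kind of physically appropriate reaction the demo describes.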
The goal of this project is to enable enterprise applications to benefit in novel ways from our Bing data assets: specifically, query logs, web-crawl data, and social-media data. This project illustrates the progress made so far. It identifies key Azure data services that could be widely useful to enterprises by combining Bing data assets, the Microsoft cloud-computing infrastructure, and deep data analytics. To bring the opportunities home, the project shows how Microsoft's enterprise software can leverage these data services and illustrates Bing-enabled enhancements that SharePoint Search and Microsoft Office products and services could potentially adopt.
There are thousands of digital libraries, archives, collections, and repositories, yet no easy way to find their datasets for teaching, learning, and research. To truly bridge the humanities and sciences and pull them out of their silos, we need a dynamic, cloud-based data-visualization tool where educators, researchers, and students can easily consume, compare, and understand the history of the cosmos, Earth, life, and humanity. There, they can easily consume rich media such as audio, video, text, PDFs, charts, graphs, and articles in one place and discover new possibilities. ChronoZoom will enable:
Transitioning effortlessly between scales, from one year to billions of years.
Putting historical episodes, events, and trends in context without sacrificing precision.
Comparing vast amounts of time-related data across different fields and disciplines.
Gaining insight and the ability to shape the future by better understanding the cause-and-effect interplay between disciplines.
A still photograph is a limited format for capturing moments that span an interval of time. Video is the traditional method for recording durations of time, but the subjective "moment" one desires to capture is often lost in the shaky camerawork, irrelevant background clutter, and noise that dominate most casually recorded video clips. This work provides a creative lens for focusing on the important aspects of a moment by performing spatiotemporal compositing and editing on video-clip input. The interactive app uses semi-automated methods to give users the power to create "cliplets" from handheld videos: a type of imagery that sits between stills and video. Learn more >>
A huge amount of climate data is available, covering the whole of the Earth's surface. But even experts find it ludicrously difficult to get the climate information they need: locating data sets, negotiating permissions, downloading huge files, making sense of file formats, getting to grips with yet another library, filtering, interpolating, regridding, and so on. Enter FetchClimate, a fast, intelligent climate-data-retrieval service that runs on Windows Azure. FetchClimate can be used through a Silverlight web interface or from inside any .NET program. It works at any grid resolution, from global down to a few kilometers; over a range of years from 1900 to 2010; on days within a year; and for hours within a day. When multiple data sources could answer a query, FetchClimate automatically selects the most appropriate one, returning the requested values along with the level of uncertainty and the origin of the data. An entire query can be shared as a single URL, enabling others to retrieve the identical information. Learn more >>
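The source-selection behavior can be illustrated with a toy picker: among sources covering the requested years, prefer the one with the lowest uncertainty. The dictionaries below use a hypothetical schema, not FetchClimate's real data model or API:

```python
def pick_source(sources, year_range):
    """Illustrative stand-in for FetchClimate's source selection: among
    sources whose coverage spans the requested years, prefer the one
    reporting the lowest uncertainty. The dicts use a hypothetical
    schema, not the service's real data model."""
    lo, hi = year_range
    covering = [s for s in sources
                if s["years"][0] <= lo and hi <= s["years"][1]]
    if not covering:
        return None  # no single source answers this query
    return min(covering, key=lambda s: s["uncertainty"])

sources = [
    {"name": "GlobalCoarse", "years": (1900, 2010), "uncertainty": 1.8},
    {"name": "RegionalFine", "years": (1950, 2010), "uncertainty": 0.6},
]
```

A query for 1960–2000 would get the finer regional source, while one stretching back to 1900 would fall back to the coarser global source, matching the "most appropriate source" behavior described above.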
This project presents next-generation webcam hardware and software prototypes. The prototype webcam has a much wider viewing angle than traditional webcams and can simultaneously capture stereo video and high-accuracy depth images. Users can chat over stereoscopic video, and accurate depth-image processing can support not only all Kinect scenarios on a PC but also a gesture-controlled user interface without a touch screen. Beyond the computer-vision software, the webcam includes a hardware accelerator and a new image-sensor design. The cost of the design is similar to that of current webcams, and the webcam could potentially be miniaturized into a mobile camera. The project showcases new user scenarios for playing games with this webcam.
High-Fidelity Facial-Animation Capturing presents a new approach for acquiring high-fidelity, 3-D facial performances with realistic dynamic wrinkles and finely scaled facial details. This approach leverages state-of-the-art motion-capture technology and advanced 3-D scanning technology for facial-performance acquisition. The system can capture facial performances that match both the spatial resolution of static face scans and the acquisition speed of motion-capture systems.
Holoflector is a unique, interactive augmented-reality mirror. Graphics are superimposed correctly on your own reflection to enable an augmented-reality experience unlike anything you have seen before. It also leverages the combined abilities of Kinect and Windows Phone to infer the position of your phone and render graphics that seem to hover above it.
IllumiShare enables remote people to share any physical or digital object on any surface. It is a low-cost peripheral device that looks like a desk lamp, and just as a lamp lights up the surface at which it is pointed, IllumiShare shares that surface. To do this, it uses a camera-projector pair: the camera captures video of the local workspace and sends it to the remote space, while the projector projects video of the remote workspace onto the local space. With IllumiShare, people can sketch together using real ink and paper, remote meeting attendees can interact with conference-room whiteboards, and children can have remote play dates in which they play with real toys. Learn more >>
Language-Learning Games on WP7 and Kinect is a project focusing on how to facilitate delightful "edutainment" experiences across a range of Microsoft platforms:
SpatialEase: An Xbox 360 Kinect game for learning the language of space using “embodied” learning that connects language with thought and action. The learner must quickly interpret second-language commands, such as the translation of “move your left hand right,” and move his or her body accordingly.
Tip Tap Tones: A Windows Phone game for learning Chinese sounds—a highly effective mobile game for retraining the ears and the brain to perceive tonal Chinese syllables quickly and accurately.
Polyword Flashcards: Cloud flashcards with integrated skill-based games. Building on our adaptive-learning algorithm, which has been transferred to Bing Dictionary, we have created an HTML5 platform for deeply personalized learning that blends language study, gaming, and discovery. Learn more >>
Microsoft Translator Hub implements a self-service model for building a highly customized automatic-translation service between any two languages. It empowers language communities, service providers, and corporations to create automatic translation systems, allowing speakers of one language to share and access knowledge with speakers of any other. By enabling translation to languages that aren't supported by today's mainstream translation engines, it also keeps less widely spoken languages vibrant and in use for future generations. This Azure-based service lets users upload language data for custom training, then build and deploy custom translation models. The resulting machine-translation services are accessible through the Microsoft Translator APIs or a webpage widget. Learn more >>
This project explores ways for people to experience search that are complementary to fast, relevant search in response to queries. In particular, these concepts focus on new ways to spend time, rather than save time on the Web. Project components include:
An organic kind of search that presents results that grow over time, drawing attention to the things you are most passionate about.
A way of picturing and encapsulating search journeys so you can get pleasure from the voyage, as well as the destination.
A way of packaging up search results so that they are collectable and can be given to others.
As an ensemble, these demos emphasize self-expression and the creative use of search results over seeking and finding. They also focus on the importance of the search journey rather than the speed of delivering a result, recognizing that users often want to wander and explore the Web rather than quickly dip into it. Learn more >>
A voice user interface needs to output responses as text-to-speech (TTS) synthesized speech. Sometimes it is desirable for the response to mix languages: in a foreign country, a driver not fluent in the local language would find it convenient for the car-navigation system to speak in mixed codes, with entities such as street names synthesized in the local language and routing directions in the user's native language. Mixed-coded TTS can be built from the recordings of a truly bilingual speaker, but such speakers are usually difficult to find. This project shows a new approach to turning a monolingual TTS voice into a multilingual one: from a speaker's monolingual recordings, the algorithm can render speech in other languages for building mixed-coded, bilingual TTS systems. Recordings of 26 languages were used to build TTS voices for the corresponding languages, and with this new approach, we can synthesize any mixed-language pair out of the 26. Learn more >>
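The mixed-code idea can be sketched as splitting an utterance into per-language synthesis spans before handing each span to the matching voice. The entity list and language tags below are illustrative assumptions:

```python
def route_mixed_utterance(tokens, native_lang, local_lang, entity_words):
    """Split a prompt into per-language synthesis spans, in the spirit of
    mixed-coded TTS: entity tokens (street names, say) go to the local-
    language voice, the rest to the user's native voice. The entity set
    and language tags are illustrative."""
    spans = []
    for token in tokens:
        lang = local_lang if token in entity_words else native_lang
        if spans and spans[-1][0] == lang:
            spans[-1] = (lang, spans[-1][1] + " " + token)  # extend span
        else:
            spans.append((lang, token))  # language switch: new span
    return spans
```

For a navigation prompt like "Turn left onto Rue Cler", the routing directions would go to the English voice and the street name to the French voice, each span synthesized by the same underlying speaker under the project's approach.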
This project is a depth-sensing and projection system that enables interactive multitouch applications on everyday surfaces. Beyond a shoulder-worn unit, there is no instrumentation of the user or the environment. On such surfaces—without calibration—Wearable Multitouch Interaction provides capabilities similar to those of a mouse or a touchscreen: X and Y locations in 2-D interfaces and whether fingers are "clicked" or hovering, enabling a wide variety of interactions. Reliable operation on the hands, for example, requires buttons to be 2.3 centimeters in diameter. Thus, it is now conceivable that anything one can do on today's mobile devices can be done in the palm of a hand.
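The "clicked" versus hovering distinction can be sketched from raw depth readings; the millimeter thresholds below are illustrative, not the project's tuned values:

```python
def classify_finger(finger_depth_mm, surface_depth_mm, click_mm=5, hover_mm=30):
    """Classify a fingertip against a surface from depth-camera readings.

    Depth is distance from the shoulder-worn camera, so a fingertip above
    the surface reads *closer* (smaller) than the surface behind it. The
    millimeter thresholds are illustrative, not the project's tuned values.
    """
    gap = surface_depth_mm - finger_depth_mm
    if gap < 0:
        return "none"     # reading behind the surface: likely noise
    if gap <= click_mm:
        return "clicked"  # fingertip touching or nearly touching
    if gap <= hover_mm:
        return "hover"    # hovering close above the surface
    return "none"
```

Thresholding the fingertip-to-surface gap is one simple way to recover the click/hover states the system exposes to 2-D interfaces.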
The idea of natural user interfaces has motivated researchers to discover new modalities of interaction, such as gesture, voice, and touch. Through various demonstrations, we explore how one particular interaction mechanism, Kinect-based gestural interaction, can open up new experiences in different ways and in different contexts. One demonstration shows how a touchless system for using 3-D images in vascular surgery requires a constrained space for gestural movements. Another shows how Kinect technology can open up new interactions in the dark, for example, helping us to 'feel' an invisible shape through sound feedback. Whether such demonstrations are natural is open for debate, but there is no doubt that these new user experiences can fire the imagination as to what the possibilities for interaction may be in the future. Learn more >>