Microsoft Research Blog


A New Spin for Photosynth

January 7, 2014 | By Microsoft blog editor

“Makes HDTV look low-res.”

“Wow, how did they make this?”

“Hyper-detailed and actually looks 3-D.”

These are just some of the words reviewers have used to describe the new Photosynth. A technical preview, launched December 10, enables photography enthusiasts to create even more realistic 3-D views of objects and locations captured through still photos.

A user can upload multiple images of the same object or physical space to the Photosynth cloud service, which then processes the images into a “synth”: a composite of overlapping photographs that creates a 3-D model of the space, with added depth and transitional images for smooth 3-D viewing.


Photosynth turns ordinary photos into 3-D views that travel along or spin around an axis.

Photosynth stitches together the photos to create spin, panorama, walk, or wall synths that draw the viewer along a path or spin around an axis. It’s obvious there’s sophisticated technology at work here. Less apparent is the extent and duration of the relationship between Microsoft Research’s Interactive Visual Media (IVM) group and the Photosynth product team.

It Began with Photo Tourism

“The first bit of ground-breaking work was back in 2006 with Photo Tourism,” says Eric Stollnitz, IVM principal developer. “It began as a collaboration between Noah Snavely and Steve Seitz at the University of Washington and my colleague Richard Szeliski. The goal was to take images of the same landmark from the web—Paris’ Notre-Dame or the Eiffel Tower, for example—taken by different photographers and at different times, and combine them to produce a 3-D visualization of a landmark. It was a photo-crowdsourcing idea.”

A Microsoft product group refined Photo Tourism and, in 2008, launched Photosynth, a desktop product that stitched together a user’s collection of images into a 3-D model that could be uploaded to the Photosynth website for sharing.

Photosynth was enhanced in 2010 with the incorporation of Image Composite Editor (ICE), an advanced image stitcher that combines a set of overlapping photographs to create a seamless, 360-degree panorama at full resolution.

“The Bing team was really excited about this,” Stollnitz recalls, “and they added a panorama-viewing feature to the website, while the IVM team added the ability for ICE to export panoramas to the Photosynth desktop application, which meant Photosynth could upload your panorama to the website for sharing.”

In quick succession, Bing launched its Mobile Panoramas capture-and-display application for iOS devices in 2011 and added a Windows Phone app and improved social sharing in 2012. And now, the new Photosynth delivers even more 3-D realism, thanks to innovative work on transitions and navigation.

The Parallax View

Transitions are reconstructions of plausible views between actual photographs, synthesized to fill in gaps and provide smoother movement. As part of the IVM’s Spin project, a 2009 paper titled Piecewise Planar Stereo for Image-based Rendering—by researcher Sudipta N. Sinha; Drew Steedly, principal development manager at Microsoft; and Szeliski—proposed a way to create more realistic transitions when moving from photograph to photograph.

“They decided to do something about synths that looked as though they’d been projected onto a single plane,” Stollnitz explains. “Essentially, we were showing flat screen after flat screen joined together. But in real life, you’d see objects from different angles and depths as you move along. We had to fill in more information gaps.”

The answer to more realistic transitions was to use computer-vision techniques to calculate the depth of each pixel. This required analyzing each pair of overlapping photographs and comparing objects that appeared in both images to determine how far they were from the camera.


Photosynth’s internal representation of this synth includes coarse 3-D surfaces culminating in a vanishing point at the end of the walk.

“Objects that shift just a little bit from one image to the next image are farther away,” Stollnitz says, “while objects that have shifted quite a bit from one image to the next are closer in. It’s all about parallax. Sudipta really focused on this challenge. He’s a master at this ‘depth from stereo’ technique.”
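The parallax relationship Stollnitz describes can be illustrated with the textbook formula for two side-by-side, rectified cameras: depth equals focal length times baseline divided by the pixel shift (disparity). The sketch below is only that simplified case. The function name, the parameter names, and the sample numbers are assumptions made for illustration, and the article does not say that Photosynth computes depth this way.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (metres) of a feature that shifted `disparity_px` pixels between
    two photos taken `baseline_m` apart. Assumes rectified, side-by-side views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical numbers: same camera pair, two features with different shifts.
near = depth_from_disparity(disparity_px=50.0, focal_length_px=1000.0, baseline_m=0.5)
far = depth_from_disparity(disparity_px=5.0, focal_length_px=1000.0, baseline_m=0.5)

print(near, far)  # the feature with the larger shift is the closer one
```

With these made-up values, the feature that moved ten times as many pixels comes out ten times closer, which matches the intuition in the quote.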

With a depth calculation for every pixel, the researchers were able to simplify the information and construct 3-D surfaces from a relatively small number of planes. By projecting images onto these coarse 3-D approximations instead of a single plane, the team created transitions that are far more immersive, with different depths and angles.
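One way to picture the step from per-pixel depths to a small number of planes is a least-squares plane fit: many depth samples on a wall collapse into three plane coefficients. The sketch below fits z = a*x + b*y + c to a handful of samples using plain Python. It is an illustrative stand-in, not the method of the Piecewise Planar Stereo paper; `fit_plane`, `det3`, and the sample data are all hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) samples."""
    n = float(len(points))
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations: m @ [a, b, c] = rhs
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("samples do not determine a unique plane")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


# Hypothetical per-pixel depths sampled from a slanted wall: z = 0.5*x + 4
wall = [(float(x), float(y), 0.5 * x + 4.0) for y in range(3) for x in range(4)]
a, b, c = fit_plane(wall)
```

Twelve depth samples reduce to three numbers here. A real scene would need several such planes plus a rule for assigning each pixel to one of them, which this sketch leaves out.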

Spinning Through 3-D

Another enhancement the Spin project brought to Photosynth was simpler navigation. Depending on the number of images and their relationship to the physical space, there could be many potential paths to travel through a synth.

“That means many 3-D relationships,” Stollnitz says, “which is ultimately very powerful, but it can also be overwhelming and confusing if you’re able to rotate as well as move forward and back, left and right, up and down as you’re making your way through a collection. Furthermore, in situations where photos are only loosely connected to each other, it becomes difficult aligning points in each image and handling the transition in a smooth manner if you try to accommodate every possible path.”

Instead, researcher Johannes Kopf simplified navigation and, in doing so, also enhanced the 3-D experience. By restricting navigation to a circular path—either an outward-looking “panorama” from a fixed spot or a “spin” around an object—transitions between photos became much smoother.
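Restricting navigation to a circle means the viewer's state reduces to a single angle. The sketch below shows that idea for a "spin": it places a camera on a circle facing the center, then finds the two captured photos that bracket a requested angle along with a blend weight for the transition between them. All names (`spin_camera`, `neighbours`) and the evenly spaced photo angles are assumptions for illustration, not Photosynth's viewer code.

```python
import math


def spin_camera(angle_deg, radius, center=(0.0, 0.0)):
    """Camera position on the circle and the heading (degrees) that faces the center."""
    theta = math.radians(angle_deg)
    x = center[0] + radius * math.cos(theta)
    y = center[1] + radius * math.sin(theta)
    look_deg = (angle_deg + 180.0) % 360.0  # turn around to face the center
    return (x, y), look_deg


def neighbours(angle_deg, photo_angles):
    """Return (start, end, t): the captured angles bracketing angle_deg and
    the blend weight t in [0, 1) between them, wrapping around 360 degrees."""
    angles = sorted(a % 360.0 for a in photo_angles)
    a = angle_deg % 360.0
    for i, start in enumerate(angles):
        end = angles[(i + 1) % len(angles)]
        span = (end - start) % 360.0 or 360.0  # a single photo spans the full circle
        offset = (a - start) % 360.0
        if offset < span:
            return start, end, offset / span
    raise ValueError("photo_angles must not be empty")


# Hypothetical spin captured every 90 degrees around an object.
photos = [0.0, 90.0, 180.0, 270.0]
position, heading = spin_camera(0.0, radius=2.0)
halfway = neighbours(45.0, photos)
wrapped = neighbours(315.0, photos)
```

Because only one parameter changes, each view always sits between exactly two neighbouring photos, which is one way to see why transitions on a circular path are easier to keep smooth than transitions across every possible path.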

“The computer-vision work for pixel depth was that much easier,” Stollnitz says, “and projections onto the 3-D surfaces looked really good. It was definitely a ‘wow’ moment when we saw how well these two approaches came together. The transitions in these synths offer a much more realistic sense of depth.”

The “spin” navigation and depth from stereo also impressed the Bing team, which immediately added the new technology to Photosynth and suggested two more synth scenarios: “walk,” for images taken with the camera moving forward into a scene, and “wall,” in which the camera takes images perpendicular to the direction of movement.

Another key decision was to host the new Photosynth in the cloud. Both the Bing and IVM teams felt the amount of processing demanded for 3-D spins would take too much time on smaller devices, so they re-engineered the technology to run on Microsoft Azure and to handle processing for thousands of incoming photo collections, a feat Stollnitz considers a significant engineering accomplishment.

“Photography—particularly travel photography—is one of my hobbies,” he says. “My wife and I always come back from trips with tons of photos. Collaboration with the Photosynth product team has been particularly satisfying, not just because I get to indulge my passion for visual media, but also because we’ve been able to amplify the ideas of the original research work and develop a robust, powerful, yet easy-to-use tool and viewing experience. And now, by releasing Photosynth as a cloud-based service, we’ve made it possible for just about anybody to use our technology.”

Close, Ongoing Collaboration

Stollnitz has been the interface between the IVM and Photosynth teams. He is responsible for product contributions: making research code reliable, usable, and solid enough to hand to product developers.

“A number of researchers at IVM have been involved with Photosynth,” Stollnitz says. “Rick with Photo Tourism and the first generation of Photosynth, for example, and Matt Uyttendaele with ICE. They’ve been great to have as technical advisers for this latest generation of Photosynth.

“On the Photosynth product team, David Gedye has been the lead program manager since the beginning, so we’ve had that continuity of vision over a multiyear collaboration, which we feel has been very productive.”

Gedye feels the same.

“Over the years,” he says, “the IVM group has moved beyond contributing just ideas and prototypes. Now, they’re providing shipping-quality code and development support. Just as an example, with the new Photosynth, Sudipta contributed the core computer-vision algorithms, made important technical-design contributions, and came to every product meeting. He’s been one of the driving forces behind this release. We feel very tightly coupled with the research team.”

The process culminating in the new Photosynth, Szeliski observes, has been a delight to behold.

“When you look back,” he says, “and watch how Photosynth capabilities have evolved, it’s terrific how different members of the IVM have contributed a diverse set of research ideas, including Sudipta’s foundational work in 3-D reconstruction, Johannes’ work on image-based rendering and 3-D navigation, and Eric’s work on user interfaces and cloud services. The Spin project, which produced all of these breakthrough features, also represents our closest working relationship yet with the Photosynth product team.”

The fruits of that partnership are now yours to enjoy. Sign up, and you’ll receive confirmation and access within 24 hours.
