Microsoft Research Blog


Envisioning privacy preserving image-based localization for augmented reality

June 13, 2019 | By Microsoft blog editor

New camera localization technology for sensitive environments can keep images and map data confidential

Advances in augmented reality (AR) and mobile robotics promise to revolutionize how we see and interact with our physical world. Today, AR and mixed reality (MR) devices, in both smartphone and eyeglass form factors, superimpose digital content onto the world around us. To accomplish this, MR devices need to know their precise location in relation to the physical world. This task is known as camera localization (or camera pose estimation) and is core to MR, drones, self-driving cars, and mobile robotics. Because GPS does not function indoors and is not accurate enough for next-generation mixed reality and autonomous platforms, MR devices must determine their position indoors using images from device cameras. Camera localization techniques require access to a 3D digital map of the scene, which is often stored persistently. Although images are not stored along with these maps, entities with access to the maps can sometimes infer sensitive information about the scene, including the geometry, appearance, and layout of private spaces and the objects they contain.
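Concretely, camera localization amounts to recovering the rotation and translation that map world coordinates into the camera frame, given matches between 2D image points and 3D map points. The following NumPy sketch illustrates the forward pinhole projection model that a pose solver inverts; the scene points, intrinsics, and pose are all hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical 3D map points (world coordinates).
X = np.array([[0.0, 0.0, 5.0],
              [1.0, -0.5, 6.0],
              [-1.0, 1.0, 4.5],
              [0.5, 0.5, 5.5]])

# Ground-truth camera pose: a small rotation about the y-axis plus a translation.
theta = 0.1
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([0.2, -0.1, 0.3])

# Simple pinhole intrinsics (focal length and principal point).
K = np.array([[800.0, 0, 320.0],
              [0, 800.0, 240.0],
              [0, 0, 1.0]])

def project(X, K, R, t):
    """Project world points into the image: x ~ K (R X + t)."""
    Xc = X @ R.T + t            # world -> camera coordinates
    x = Xc @ K.T                # apply intrinsics
    return x[:, :2] / x[:, 2:]  # perspective divide -> pixel coordinates

pixels = project(X, K, R, t)
# Localization solves the inverse problem: given `pixels` and `X`,
# estimate R and t (typically with a PnP solver inside RANSAC).
```

Persisting `X` is exactly what raises the privacy concerns discussed here: the map alone, without any images, already describes the scene.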

As companies and organizations race to build the required “MR Cloud” infrastructure for these MR systems, the general public has become increasingly concerned with the privacy and security implications of using MR in sensitive environments such as homes, offices, hospitals, schools, industrial spaces, and confidential facilities. Yet despite the widening adoption of MR technologies, surprisingly little attention has been paid to their privacy and security implications.

A team of scientists at Microsoft and their academic collaborators have been investigating new algorithmic techniques to address these privacy implications. Their pioneering ideas are described in two papers to be presented at the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2019), June 16-20 in Long Beach, California. In the first paper, they show for the first time that the 3D point clouds and features required for camera localization are prone to a new type of privacy attack. This work aims to alert the community that the privacy implications of saving 3D maps of the environment are more serious than currently assumed. In the second paper, the team formulates a new research problem, privacy-preserving image-based localization, and presents the first solution to it. The solution geometrically transforms the 3D points in a way that conceals the scene geometry and defeats the new privacy attack, yet still allows camera pose to be computed from images efficiently and accurately. Let’s take a closer look.

Figure 1: (a) Traditional image-based localization methods require a 3D point cloud map of the scene. Such point clouds reveal potentially confidential scene information. (b) A new privacy attack on 3D point cloud maps can reconstruct detailed images of the scene, revealing its appearance in much higher detail. In this example, the reconstructed image is virtually indistinguishable from an image captured by a camera from a similar viewpoint.

To alert the community to these privacy implications, in “Revealing Scenes by Inverting Structure from Motion Reconstructions,” Francesco Pittaluga and Sanjeev J. Koppal of the University of Florida, along with Sing Bing Kang and Sudipta N. Sinha of Microsoft Research, demonstrate that 3D point clouds of scenes reconstructed using structure from motion or simultaneous localization and mapping (SLAM) techniques retain enough information, even after the source images are discarded, to reconstruct detailed images of the scene. Their attack uses a deep neural network to reconstruct color images of the scene: given the 2D projections of the sparse 3D points and their SIFT features in a specific viewpoint, the network outputs a color image of the scene from that viewpoint (see Figure 1). The underlying model consists of multiple U-Nets, a common neural network architecture in computer vision. Surprisingly, the model reconstructs recognizable images even when key attributes of the 3D points, such as color and visibility, are absent, or when the points in the map are sparse and irregularly distributed. The attack is demonstrated on a wide variety of scenes, and the paper methodically analyzes the privacy implications of storing the various point attributes. Finally, the team demonstrated that even novel views of the scenes can be reconstructed, enabling virtual tours of private spaces.
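The network’s input can be pictured as a sparse multi-channel image: each 3D point visible in the chosen viewpoint is splatted into the pixel it projects to, with channels carrying its attributes (depth, color, SIFT descriptor). The NumPy sketch below illustrates that rasterization step; all names, dimensions, and the tie-breaking rule are illustrative assumptions, not the paper’s exact encoding.

```python
import numpy as np

def rasterize_points(pix, depth, color, desc, h=240, w=320):
    """Splat sparse projected points into a dense multi-channel input map.

    pix   : (N, 2) pixel coordinates of the projected 3D points
    depth : (N,)   depth of each point in the target view
    color : (N, 3) per-point RGB (may be absent in a real map)
    desc  : (N, D) per-point SIFT descriptors
    Returns an (h, w, 1 + 3 + D) array; empty pixels stay zero.
    """
    d_dim = desc.shape[1]
    img = np.zeros((h, w, 1 + 3 + d_dim))
    for p, z, c, f in zip(pix, depth, color, desc):
        u, v = int(round(p[0])), int(round(p[1]))
        if 0 <= v < h and 0 <= u < w:
            # Keep the nearest point if several land in the same pixel.
            if img[v, u, 0] == 0 or z < img[v, u, 0]:
                img[v, u] = np.concatenate(([z], c, f))
    return img

# Tiny synthetic example: 3 points with 8-dimensional "descriptors".
rng = np.random.default_rng(0)
inp = rasterize_points(
    pix=np.array([[10.2, 20.7], [10.0, 21.0], [300.0, 100.0]]),
    depth=np.array([2.0, 1.5, 3.0]),
    color=rng.random((3, 3)),
    desc=rng.random((3, 8)))
# `inp` is the kind of sparse map a U-Net stack would regress a color image from.
```

Note how the first two points collide in the same pixel, and the nearer one wins; the inversion network has to fill in everything else from these scattered samples.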

Figure 2: The 3D line cloud representation (shown on the right) protects user privacy by concealing the scene geometry. Privacy attacks are thwarted while accurate and efficient camera localization is still possible.

In parallel, the team has been investigating new camera pose estimation algorithms that safeguard the user’s privacy. While some existing approaches recognize objects in images and videos in a privacy-aware manner, those methods cannot be used for camera pose estimation or the other 3D computer vision tasks common in MR and robotics. In “Privacy Preserving Image-Based Localization,” Kang and Sinha, together with Pablo Speciale of ETH Zurich and Johannes Schönberger and Marc Pollefeys of the Microsoft Mixed Reality & AI Lab in Zurich (Pollefeys is also with ETH Zurich), asked how disclosing confidential information about captured 3D scenes could be avoided while still allowing reliable camera pose estimation. They developed the first solution for privacy-preserving image-based localization. The key idea in their method is to lift the 3D points in the map to randomly oriented 3D lines that pass through the original points; the 3D points themselves are then discarded. This new map representation, referred to as a 3D line cloud, resembles a tangle of random straight lines from which the scene can no longer be discerned (see Figure 2). Remarkably, line clouds still allow the camera pose to be estimated accurately, because the correspondence between a 2D image point and a 3D line provides sufficient geometric constraints for robust and accurate pose estimation.
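The lifting step itself is simple: each map point is replaced by a line with a random direction through it, and the point’s position along that line is then discarded, for instance by sliding the stored point a random amount along the direction. The minimal NumPy sketch below illustrates the idea and the geometric constraint that survives it: under the true pose, the viewing ray through each 2D observation still intersects its 3D line. Variable names and the identity-pose setup are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 3D map points (world frame), a few meters in front of the camera.
X = rng.uniform(-1, 1, size=(5, 3)) + np.array([0, 0, 5.0])

# Lift each point to a 3D line: a random unit direction, then slide the
# stored point a random amount along it so the original position is lost.
D = rng.normal(size=(5, 3))
D /= np.linalg.norm(D, axis=1, keepdims=True)   # line directions
P = X + rng.uniform(-10, 10, size=(5, 1)) * D   # points on the lines
# X is discarded; only the line cloud (P, D) is stored.

# Camera at the origin with identity pose, for simplicity: the viewing ray
# of each 2D observation is the bearing of the original 3D point.
rays = X / np.linalg.norm(X, axis=1, keepdims=True)

def line_line_distance(p1, d1, p2, d2):
    """Closest distance between two 3D lines given as p + s*d (d unit-length)."""
    n = np.cross(d1, d2)
    n_norm = np.linalg.norm(n)
    if n_norm < 1e-12:                    # parallel lines
        w = p2 - p1
        return np.linalg.norm(w - np.dot(w, d1) * d1)
    return abs(np.dot(p2 - p1, n)) / n_norm

# Under the correct pose, every viewing ray meets its 3D line: distance ~ 0.
dists = [line_line_distance(np.zeros(3), r, p, d)
         for r, p, d in zip(rays, P, D)]
```

A pose solver can therefore score candidate poses by how closely each back-projected ray approaches its corresponding line, even though the stored line cloud reveals nothing about where on each line the original point lay.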

The researchers have evaluated their method in many settings and demonstrated the high practical relevance of their approach. Microsoft is excited that this technology provides a way to address some of the user and enterprise privacy concerns around MR. It could allow users to share maps with third-party applications without compromising privacy, and it would mitigate the danger of location spoofing by unauthorized users who gain access to sensitive maps. The team is also investigating privacy-preserving algorithms for cloud-based camera localization services, in which the MR device uploads information extracted from images to a cloud service that computes the device’s position. If the camera on the MR device accidentally records other people (or transient objects) in the scene, the new techniques being explored will prevent the cloud from inferring who or what was observed in the user’s image.

The researchers look forward to demonstrating these ideas and getting your input at CVPR 2019 later this month. In the meantime, they invite you to read the papers and provide feedback.
