Enabling interaction between mixed reality and robots via cloud-based localization

Published October 28, 2020

By Jeffrey Delmerico, Senior Scientist; Helen Oleynikova, Senior Scientist; Juan Nieto, Principal Research SDE; Marc Pollefeys, Partner Director of Science

As of November 2024, the Azure Spatial Anchors (ASA) service has been retired (see announcement). The service and SDK are no longer available at the links below.

You are here. We see some representation of this every day: a red pin, a pulsating blue dot, a small graphic of an airplane. Without a point of reference on which to anchor it, though, “here” doesn’t help us make our next move or coordinate with others. But in the context of an office building, street, or U.S. map, “here” becomes a location that we can understand in relation to other points. We’re near the lobby; at the intersection of Broadway and Seventh Avenue; above Montana. A map and an awareness of where we are in it are important to knowing where “here” is and what we have to do to get where we want to go.

The answer to “Where am I?” is important to us as humans, but having this spatial intelligence is also a key capability for digital devices. Understanding where they are and what is around them lets them bridge the digital and physical worlds and use that digital information to do more in the real world. Mixed reality facilitates this connection between the digital and physical in a host of different ways, from enabling the visualization of digital data over the real world to simulating interactions with virtual objects in a realistic way. Mixed reality devices such as Microsoft HoloLens and mixed reality–capable mobile devices are able to build visual maps of their environments and recognize their place in them. Then, using these maps, they’re able to create and maintain holograms and other digital content in the correct places in the real world over time. Since the release of Azure Spatial Anchors, a cloud-based localization service, localizing to a space not only across time but also across devices has become easier and more widely accessible, making it possible for multiple people with different devices to localize to the same environment and see the same digital content persistently in the same place. We’re excited to make available the Azure Spatial Anchors Linux SDK, a special research software release for mobile robot use cases. With the Azure Spatial Anchors Linux SDK, robots can now use Azure Spatial Anchors to localize and share information within this mixed reality ecosystem.

The Azure Spatial Anchors Linux SDK is compatible with Ubuntu and the Robot Operating System (ROS), so the robotics research community can readily begin exploring novel applications of robotics that use mixed reality. The SDK allows robots with an onboard camera and a pose estimation system to access the service. Researchers can use it to localize robots to the environment, to other robots, and to people using mixed reality devices, opening the door to better human-robot interaction and greater robot capabilities.

Azure Spatial Anchors: How it works

To create digital content in the places that users expect to see it and then keep it there over time, HoloLens and other mixed reality devices need to estimate how they’re moving through the world using visual simultaneous localization and mapping (SLAM). By tracking salient feature points in a sequence of images from their onboard cameras and fusing that with inertial measurements, mixed reality devices can both estimate how they’re moving and build a sparse local map of where these feature points are in 3D. Android and iOS mobile devices utilize the same type of visual SLAM algorithms—via ARCore and ARKit, respectively—to render augmented reality content on screen, and these algorithms produce the same kind of sparse maps as mixed reality devices.

Azure Spatial Anchors (ASA) works by taking these sparse local maps from devices and matching them to larger, global maps in the cloud. In addition to the 3D feature points, these global maps consist of descriptors computed at the feature points, which enable devices to recognize that they’re seeing the same features when they observe that spot again. When an individual captures feature points at a location in the world and adds those to global maps in the cloud, they define a coordinate system relative to the local map. This coordinate system allows mixed reality apps to attach spatial data to that physical place. We call this coordinate system a spatial anchor because it provides an anchor for digital content in the real world and enables this content to persist there over time.
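To make the role of descriptors concrete, here is a minimal sketch of nearest-neighbor descriptor matching with a ratio test, a common generic technique for deciding that two feature points are the same. It is not a description of how ASA matches maps internally; the function name `match_descriptors`, the `ratio` threshold, and the three-element toy descriptors are all assumptions made for illustration.

```python
import math

def match_descriptors(query, reference, ratio=0.8):
    """Return (query_index, reference_index) pairs that pass the ratio test.

    Hypothetical helper: descriptors are plain tuples of floats compared
    with Euclidean distance.
    """
    matches = []
    for qi, q in enumerate(query):
        # Rank reference descriptors by distance to this query descriptor.
        ranked = sorted(range(len(reference)),
                        key=lambda ri: math.dist(q, reference[ri]))
        if len(ranked) < 2:
            continue
        best, second = ranked[0], ranked[1]
        d_best = math.dist(q, reference[best])
        d_second = math.dist(q, reference[second])
        # Accept only if the best match is clearly better than the runner-up.
        if d_best < ratio * d_second:
            matches.append((qi, best))
    return matches

# Toy data: three descriptors in a "cloud" map, two in a local query map.
cloud_map = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
local_map = [(0.9, 0.1, 0.0), (0.5, 0.5, 0.0)]
matches = match_descriptors(local_map, cloud_map)
```

In this toy data, the first query descriptor sits close to one cloud descriptor and is matched, while the second is equally far from two candidates and is rejected as ambiguous. Real systems use much higher-dimensional descriptors and faster search structures.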

When a mixed reality device observes the same place in the world at some later time and the device queries ASA with a local map of the place, some of the feature points in the query map should match the ones in the cloud map, which allows ASA to robustly compute a relative six-degree-of-freedom pose for the device using these correspondences. Knowing the relative pose of the device to the anchor coordinate system enables all of the spatial data attached to that anchor to be displayed at the correct place in the physical world. If more than one mixed reality device localizes to an anchor at the same time, they can each visualize the same digital information but from their own perspective while looking at the scene. This effectively colocalizes the devices to each other indirectly by sharing the coordinate frame of the anchor.
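The indirect colocalization described above amounts to composing rigid transforms. The following sketch assumes each device already knows its pose in the anchor frame as a 4x4 homogeneous matrix; the helper names (`make_pose`, `invert`, `compose`, `relative_pose`) and the example poses are hypothetical and are not part of the ASA SDK. Rotations are restricted to yaw only to keep the example short.

```python
import math

def make_pose(yaw, tx, ty, tz):
    """4x4 homogeneous transform: rotation about z by yaw, plus translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a @ b for 4x4 nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert(t):
    """Invert a rigid transform: rotation R^T, translation -R^T p."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def relative_pose(anchor_T_a, anchor_T_b):
    """Pose of device b expressed in device a's frame."""
    return compose(invert(anchor_T_a), anchor_T_b)

# Assumed example poses in the anchor frame (meters, radians).
anchor_T_robot = make_pose(math.pi / 2, 1.0, 0.0, 0.0)
anchor_T_hololens = make_pose(math.pi, 1.0, 2.0, 0.0)
robot_T_hololens = relative_pose(anchor_T_robot, anchor_T_hololens)
```

With these example poses, the HoloLens ends up 2 m along the robot's x axis. Neither device ever observed the other directly; the anchor frame is the only shared reference.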

Spatial anchors—and the opportunities they present—for robotics

Mobile robots are solving the same problem as HoloLens and mixed reality–capable mobile devices: estimating how they, and their sensors, are moving in a particular environment. This makes mobile robots a natural fit for ASA. Enabling robots to localize to spaces will offer them the ability to access data connected to spatial anchors. For example, a robot inspecting an industrial site could access information about a particular machine if it localizes to a spatial anchor next to the machine. This ability becomes even more powerful in multi-robot scenarios. If two robots are localized to the same spatial anchor, they can automatically share a common reference frame without explicit colocalization such as tracking each other with fiducial markers. Several robots working in the same environment could be assigned tasks based on their locations; for example, order pickups in a warehouse could be determined based on which robot is closest to the desired inventory. In addition to utilizing spatial anchors, robots with the ASA Linux SDK can create them. Mapping an environment and populating it with spatial anchors can be automated using a robot with this SDK, improving efficiency and helping to expand and improve the global map in the cloud.
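As a rough illustration of location-based task assignment, the sketch below picks the robot nearest to an inventory location, assuming every position is already expressed in the shared anchor frame. The function `assign_pickup`, the robot names, and the coordinates are invented for this example, and straight-line distance stands in for whatever path cost a real planner would use.

```python
import math

def assign_pickup(robot_positions, item_position):
    """Return the name of the robot nearest to the item (straight-line distance)."""
    return min(robot_positions,
               key=lambda name: math.dist(robot_positions[name], item_position))

# Assumed 2D positions in the shared anchor frame, in meters.
robots = {"robot_a": (0.0, 0.0), "robot_b": (4.0, 1.0), "robot_c": (9.0, 5.0)}
shelf = (5.0, 2.0)
chosen = assign_pickup(robots, shelf)
```

Because all positions live in one frame, no pairwise calibration between robots is needed before comparing distances.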

Intern Oswaldo Ferro, using a HoloLens 2 device, has placed a spatial anchor, and now a robot with an onboard camera is able to localize to this anchor. With the HoloLens and robot both localized to the same anchor, they’re effectively colocalized by sharing the anchor’s coordinate system.

Enabling robots to colocalize with different types of devices, especially mixed reality devices and mixed reality–capable devices, opens up new opportunities for research and innovation in human-robot interaction. We envision mixed reality as an important tool for robot spatial intelligence and autonomy, and our ambition is to unite humans and robots through mixed reality in ways that result in improved teamwork. In the same way that colocalization of two robots enables them to share spatial data and collaborate by having a common reference frame, robots colocalized with mixed reality devices can interact with contextual data in a way that both humans and machines can understand. This unlocks more intuitive interaction, such as individuals using a HoloLens employing a “come here” gesture to call a robot over rather than having to teleoperate the robot or translate their position as a navigation goal to the robot’s frame of reference.

Once a robot is colocalized with other devices, human-robot interaction becomes much more natural because spatial information can be easily shared. Here, the robot and HoloLens 2 are colocalized through the same spatial anchor, and the HoloLens leverages its hand-tracking capabilities to recognize gestures such as “come here” using research on action recognition from colleagues Federica Bogo and Jan Stühmer (who has since left Microsoft) at the Mixed Reality and AI Lab in Zürich. Since the robot and HoloLens share a coordinate system through ASA, the location of the HoloLens is directly understandable to the robot, and the “come here” gesture triggers it to plan a path from its location to just in front of the HoloLens user.
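One way to picture the goal computation behind a “come here” gesture is the planar sketch below. It assumes the HoloLens pose is available in the shared anchor frame as a position and heading; the function `come_here_goal` and the `standoff` distance are hypothetical and do not come from the demo's actual code.

```python
import math

def come_here_goal(user_x, user_y, user_yaw, standoff=1.0):
    """Goal pose a fixed distance in front of the user, facing back at them."""
    goal_x = user_x + standoff * math.cos(user_yaw)
    goal_y = user_y + standoff * math.sin(user_yaw)
    facing = user_yaw + math.pi
    # Wrap the heading into the range [-pi, pi].
    goal_yaw = math.atan2(math.sin(facing), math.cos(facing))
    return goal_x, goal_y, goal_yaw

# Assumed user pose in the anchor frame: at (2, 3), looking along +y.
goal = come_here_goal(2.0, 3.0, math.pi / 2, standoff=1.5)
```

The resulting pose would then be handed to the robot's own path planner as a navigation goal, after conversion into whatever frame that planner uses.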

The ASA Linux SDK is being released in two parts—closed-source binaries and an open-source ROS wrapper—and is targeting the Ubuntu 18.04 and 20.04 distributions. This release is for research use only and may not be used commercially. For HoloLens, Android devices, and iOS devices, applications handle pose estimation, via HoloLens head tracking, ARCore, and ARKit, respectively, as well as anchor localization. These pose estimation processes are tightly coupled with their ASA SDKs and so applications can only use the camera tracking system of their respective devices; individuals aren’t free to run their own SLAM algorithms while using ASA. Because of the diversity of robot sensor configurations, for this new SDK, people need to provide the camera’s pose independently via some other pose estimation process. This can happen directly, as in the case of a robot navigating with visual SLAM that localizes to an anchor with the same camera. Another example would be a LIDAR-based ground robot—also equipped with a camera—navigating in a 2D map and leveraging the transformation from the robot base to the camera to estimate the camera pose in the world frame. In addition to estimating the pose independently, the SDK requires a calibrated camera, as the user is also responsible for undistorting the images before they’re provided to the SDK with their poses.
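For the LIDAR-based ground robot case, the camera pose can be obtained by chaining the robot's map pose with the fixed base-to-camera mounting transform. The sketch below is a simplified planar version under stated assumptions: the base moves on flat ground at height zero, the mounting offset is given as (forward, left, up) in meters, and the camera differs from the base only by a yaw offset. The function `camera_pose_in_map` is hypothetical; the pose format the SDK actually expects is not shown here.

```python
import math

def camera_pose_in_map(base_x, base_y, base_yaw, cam_offset, cam_yaw_offset=0.0):
    """Camera position and yaw in the map frame from a planar base pose.

    cam_offset is the camera's (forward, left, up) offset from the robot base.
    """
    fwd, left, up = cam_offset
    c, s = math.cos(base_yaw), math.sin(base_yaw)
    # Rotate the mounting offset into the map frame, then add the base position.
    cam_x = base_x + c * fwd - s * left
    cam_y = base_y + s * fwd + c * left
    cam_z = up  # base assumed to sit at z = 0
    return (cam_x, cam_y, cam_z), base_yaw + cam_yaw_offset

# Assumed values: base at (1, 2) facing +y, camera 0.2 m ahead and 0.5 m up.
position, yaw = camera_pose_in_map(1.0, 2.0, math.pi / 2,
                                   cam_offset=(0.2, 0.0, 0.5))
```

A full implementation would use a complete six-degree-of-freedom extrinsic calibration, and the images paired with each pose would still need to be undistorted with the camera's calibration before being passed on.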

See the ASA Linux SDK in action

The capabilities of the Azure Spatial Anchors Linux SDK are being demonstrated as part of a tutorial at the 2020 International Conference on Intelligent Robots and Systems (IROS). This year, the conference is using a virtual format featuring on-demand videos, which are available now. (The conference is also free to attend this year, so in addition to our tutorial, attendees will have access to all the papers, talks, and workshops.)

The goal of our tutorial, Mixed Reality and Robotics, is to provide resources so those without prior mixed reality experience can integrate some mixed reality tools into their robotics research. The tutorial includes several conceptual talks about human-robot interaction through mixed reality and methods of colocalization, including with the ASA Linux SDK. Several demos include sample code and video walkthroughs. On the topic of human-robot interaction, we provide a sample mixed reality app for HoloLens and mobile devices that allows those using it to interact with a virtual robot running in a simulator on a local computer. Attendees will also learn how to use the ASA Linux SDK with prerecorded datasets in a colocalization demo. These demos are intended to be deployable with minimal prerequisite software or hardware, but we also provide instruction on how to adapt both of these demos to work with attendees’ own robots. For those interested in the tutorial, register for free access to the IROS on-demand content. And check out our GitHub repository for instructions on how to install the ASA Linux SDK yourself and leave us feedback.