The need

Correctly capturing an object’s position, orientation, and identity is a major challenge. Without prior information, stereo optics, or other measurements, it can be hard to estimate scale or distance, and object recognition requires a large labeled dataset.

The idea

Convolutional neural networks (CNNs) have made significant strides in object recognition, classification, and segmentation, for example in self-driving vehicles. PoseTracker leverages CNNs to recognize and track objects in 3D.

The solution

PoseTracker uses a patented optical marker approach to infer an object’s pose from 2D images. It then tracks the pose from one image to all subsequent images by comparing it to a predefined 3D orientation.

Technical details for PoseTracker

Convolutional neural networks, a class of deep neural networks, have made significant strides in recent years in object recognition, classification, and segmentation, driving progress in self-driving vehicles and a wide variety of computer vision applications.

However, there have been very few practical applications of these approaches to 3D object pose estimation. Recognizing and tracking an object in a 3D reference space remains difficult for several reasons:

  1. 3D pose information is hard to capture and typically requires complicated setups involving stereo optical or magnetic localization equipment.
  2. Little or no prior information about the object of interest is usually available.
  3. Large labeled datasets with accurate pose information are very hard to obtain. Traditional data augmentation, such as axis scaling and other image transformations, alters the apparent geometry of the object and therefore corrupts the 3D pose labels.
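To illustrate the third point, here is a minimal sketch that reduces the problem to a single in-plane angle, assuming a marker edge whose direction in the image encodes the labeled angle. This is a simplification for illustration, not PoseTracker's method. Uniform scaling keeps the angle, but stretching the image along one axis changes the edge direction, so the original label no longer matches the augmented image.

```python
import math


def apparent_angle_deg(label_deg: float, sx: float = 1.0, sy: float = 1.0) -> float:
    """Return the in-image angle of a unit direction at `label_deg`
    after scaling the image by `sx` along x and `sy` along y."""
    theta = math.radians(label_deg)
    # Direction of the (hypothetical) marker edge before augmentation.
    dx, dy = math.cos(theta), math.sin(theta)
    # Axis scaling multiplies each image coordinate independently.
    return math.degrees(math.atan2(sy * dy, sx * dx))


label = 45.0
print(f"uniform 2x:   {apparent_angle_deg(label, sx=2.0, sy=2.0):.2f} deg")
print(f"x stretch 2x: {apparent_angle_deg(label, sx=2.0):.2f} deg")
```

With a 2x stretch along x, an edge labeled 45 degrees appears at about 26.6 degrees, so a model trained on that image would learn a wrong orientation.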

The idea is to leverage CNNs in an application that recognizes and tracks the pose (position and orientation) of objects in 3D, using a patented optical marker that helps identify the object's rotation and estimate its pose.

PoseTracker is a proof of concept for a simple object pose detection pipeline that integrates rotation information from a 3D pose tracking solution (the optical marker).

The application analyzes 2D images taken by a camera, with the optical marker always visible. Using a model trained with supervised learning, it detects the marker and tracks its orientation from one image to all subsequent images by comparing it to a predefined 3D orientation.
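How the comparison to the predefined 3D orientation is computed is not specified here. One common way to express such a comparison, shown as a minimal sketch, is to form the relative rotation between the reference orientation and the orientation estimated in each frame, then report its angle. The detector itself is assumed: `frame_rotations` stands in for hypothetical per-frame 3x3 rotation matrices that a marker detector would produce, faked here with rotations about the z-axis.

```python
import math
from typing import List

Matrix = List[List[float]]


def rot_z(deg: float) -> Matrix:
    """Rotation matrix about the z-axis (used only to fake detector output)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Matrix) -> Matrix:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def angle_from_reference(r_ref: Matrix, r_frame: Matrix) -> float:
    """Angle in degrees of the relative rotation R_rel = R_frame * R_ref^T."""
    r_rel = matmul(r_frame, transpose(r_ref))
    trace = r_rel[0][0] + r_rel[1][1] + r_rel[2][2]
    # For a rotation matrix, trace = 1 + 2*cos(angle); clamp against float drift.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


reference = rot_z(0.0)  # predefined 3D orientation (assumed identity here)
# Hypothetical detector output for four consecutive frames.
frame_rotations = [rot_z(a) for a in (0.0, 10.0, 25.0, 40.0)]
for i, r in enumerate(frame_rotations):
    print(f"frame {i}: {angle_from_reference(reference, r):.1f} deg from reference")
```

Working with full rotation matrices rather than a single angle keeps the comparison valid for arbitrary 3D rotations, not only rotations about one axis.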

In the future, this approach to pose tracking could let you use your phone camera to determine an object's angle, orientation, and distance from you in real time.
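The method for estimating distance is not described here. As one standard possibility, the pinhole camera model gives the distance to a marker of known physical width W as Z = f · W / w, where f is the camera's focal length in pixels and w is the marker's width in the image. A minimal sketch with illustrative, assumed values:

```python
def distance_from_marker(focal_px: float, marker_width_m: float, marker_width_px: float) -> float:
    """Pinhole-model distance (in meters) to a marker facing the camera.

    focal_px: focal length in pixels (assumed known from camera calibration).
    marker_width_m: known physical width of the marker.
    marker_width_px: measured width of the marker in the image.
    """
    if marker_width_px <= 0:
        raise ValueError("marker must be visible with a positive pixel width")
    return focal_px * marker_width_m / marker_width_px


# Illustrative values only: 800 px focal length, 5 cm marker, 40 px wide in the image.
print(distance_from_marker(800.0, 0.05, 40.0))
```

These values give a distance of about 1 m, and halving the pixel width doubles the estimate. The formula assumes the marker faces the camera; a tilted marker appears narrower, which is one reason its orientation has to be estimated as well.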

