You may already have heard about the Microsoft technology that can automatically identify objects in a picture and write an accurate caption for it, but those types of research advancements don’t occur in a vacuum.
Indeed, interdisciplinary research combining computer vision, machine learning, artificial intelligence, and computer systems and networking, just some of Microsoft’s research areas, is at the core of the burgeoning field commonly referred to as “deep learning.” Advancements in deep learning technology are fundamental to Microsoft’s mission to empower every person and organization on the planet to achieve more.
Deep learning is also fundamental to a bevy of research being presented this week at the 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in Boston.
The latest breakthroughs include dramatic speedups in computer vision image recognition and new algorithms that improve the clarity of 3D-scanned images captured with Kinect or Kinect-like sensors.
In Convolutional Neural Networks at Constrained Time Cost (324 KB .pdf), lead researcher Kaiming He and principal researcher Jian Sun address the growing computational cost that has accompanied steady gains in image classification accuracy. They propose models that are faster and more accurate than existing fast models and also practical for widespread use.
Sun and He also collaborated with researchers from Xi’an Jiaotong University on Efficient and Accurate Approximations of Nonlinear Convolutional Networks (541 KB .pdf), which proposes a method that accelerates such networks by as much as four times while increasing the error rate by less than 1 percent.
At CVPR, Microsoft researchers also will present advancements in 3D digitization and 3D scanning using Kinect and Kinect-like sensors.
In Large-Scale and Drift-Free Surface Reconstruction (9.6 MB .pdf), researcher Jonathan Taylor and principal researchers Andrew Fitzgibbon and Shahram Izadi collaborated with researchers from the University of Bologna to introduce a large-scale 3D scanning method that computes in minutes, not hours, and works even in challenging conditions such as low light or complete darkness.
“As shown by the body of work at CVPR, the Kinect has accelerated research on 3D scanning to the point now where even capturing models of moving scenes or large-scale scenes is possible,” Izadi said.
Microsoft researchers also are presenting new research that significantly improves scans of objects that are in motion. In 3D Scanning Deformable Objects with a Single RGBD Sensor (10.1 MB .pdf), Taylor, Fitzgibbon and Izadi collaborated with researchers from the University of North Carolina at Chapel Hill to develop a scanning method that uses only a single Kinect sensor without heavily constraining user or camera motion.
“The logical next step is to use these models for recognition and bring the worlds of deep learning and reconstruction together,” Izadi added. “This brings us closer to computers that understand the user and their environments in much richer ways.”
Additional research presented at the 28th IEEE Conference on Computer Vision and Pattern Recognition
Learning an Efficient Model of Hand Shape Variation From Depth Images (1.9 MB .pdf)
Contributing Microsoft researchers: Sameh Khamis, Jonathan Taylor, Jamie Shotton, Cem Keskin, Shahram Izadi, Andrew Fitzgibbon
A new machine learning method that uses scans of human hands to generate a low-dimensional generic hand model.
Exploiting Uncertainty in Regression Forests for Accurate Camera Relocalization (875 KB .pdf)
Contributing Microsoft researchers: Jamie Shotton, Andrew Fitzgibbon, Shahram Izadi
Presents a new camera relocalization method that successfully relocalizes up to 40 percent more frames than the current state of the art.
A Light Transport Model for Mitigating Multipath Interference in Time-of-Flight Sensors (3.0 MB .pdf)
Contributing Microsoft researchers: Nikhil Naik, Christoph Rhemann, Shahram Izadi, Sing Bing Kang
Presents a new computational camera technique for correcting multipath interference in time-of-flight sensors such as the Kinect sensor in the Xbox One camera.
Computationally Bounded Retrieval (522 KB .pdf)
Contributing Microsoft researchers: Cem Keskin, Pushmeet Kohli, Shahram Izadi
A new image retrieval method that improves both accuracy and speed over current methods.
A Geodesic-Preserving Method for Image Warping (8.4 MB .pdf)
Contributing Microsoft researchers: Kaiming He, Jian Sun
A new method that improves the visual quality of panoramic and wide-angle images.
Cascaded Hand Pose Regression (1.1 MB .pdf)
Contributing Microsoft researchers: Yichen Wei, Jian Sun
A novel approach that demonstrates accurate, high-speed hand tracking using consumer depth sensors.
Learning a Convolutional Neural Network for Non-Uniform Motion Blur Removal (9.4 MB .pdf)
Contributing Microsoft researcher: Jian Sun
Proposes a deep learning-based approach to correcting non-uniform motion blur in images.
Sparse Projections for High-Dimensional Binary Codes (304 KB .pdf)
Contributing Microsoft researchers: Yan Xia, Kaiming He, Pushmeet Kohli, Jian Sun
A new method that improves the accuracy of image retrieval and image classification while making them up to an order of magnitude faster.
Convolutional Feature Masking for Joint Object and Stuff Segmentation (2.8 MB .pdf)
Contributing Microsoft researchers: Jifeng Dai, Kaiming He, Jian Sun
A new method that achieves state-of-the-art results in object recognition and labeling at high speed.
Global Refinement of Random Forest (627 KB .pdf)
Contributing Microsoft researchers: Xudong Cao, Yichen Wei, Jian Sun
Proposes two new machine learning techniques, global refinement and global pruning, that greatly improve the accuracy of random forests and reduce the storage they require.
Light Field Layer Matting (2.7 MB .pdf)
Contributing Microsoft researcher: Rick Szeliski
Applies a “matting” technique to clean up and sharpen images with an obscuring foreground layer, such as a picture of a bird taken through a dirty window.