Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition

Yanhua Cheng; Xin Zhao; Rui Cai; Zhiwei Li (李志伟); Kaiqi Huang; Yong Rui

Proc. of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16) | July 2016

This paper studies the problem of RGB-D object recognition. Inspired by the great success of deep convolutional neural networks (DCNNs) in AI, researchers have applied them to improve the performance of RGB-D object recognition. However, a DCNN typically requires a large-scale annotated dataset to supervise its training. Manually labeling such a large RGB-D dataset is expensive and time-consuming, which limits how quickly DCNNs can advance this research area. To address this problem, we propose a semi-supervised multimodal deep learning framework that trains DCNNs effectively from very limited labeled data and massive unlabeled data. The core of our framework is a novel diversity-preserving co-training algorithm, which guides the DCNNs to learn from unlabeled RGB-D data by exploiting the complementary cues that RGB and depth data provide for object representation. Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% of the training data labeled, our approach achieves object recognition performance competitive with state-of-the-art results reported by fully supervised methods.
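
To make the co-training idea concrete, the following is a minimal sketch of generic two-view co-training, not the paper's diversity-preserving algorithm, whose details the abstract does not give. Everything in it is an assumption made for illustration: nearest-centroid classifiers stand in for the DCNNs, the synthetic `rgb` and `depth` arrays stand in for real features, and the names `fit_centroids`, `predict_proba`, `co_train`, `rounds`, and `per_class` are hypothetical. Each view proposes its most confident unlabeled samples per class, and the resulting pseudo-labels are shared with both views in the next round.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Nearest-centroid classifier: one mean feature vector per class.
    Assumes every class has at least one labeled sample."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances to the class centroids."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    p = np.exp(d.min(axis=1, keepdims=True) - d)  # shifted for stability
    return p / p.sum(axis=1, keepdims=True)

def co_train(views_l, y_l, views_u, n_classes, rounds=5, per_class=2):
    """views_l / views_u map a view name to labeled / unlabeled features.
    Returns pseudo-labels for the unlabeled samples (-1 = still unlabeled)."""
    n_u = len(next(iter(views_u.values())))
    pseudo = np.full(n_u, -1)
    for _ in range(rounds):
        proposals = {}  # sample index -> (class, confidence)
        known = pseudo >= 0
        for name in views_l:
            # Train this view on true labels plus the shared pseudo-labels.
            X = np.concatenate([views_l[name], views_u[name][known]])
            y = np.concatenate([y_l, pseudo[known]])
            proba = predict_proba(fit_centroids(X, y, n_classes), views_u[name])
            pred, conf = proba.argmax(axis=1), proba.max(axis=1)
            for c in range(n_classes):
                # Most confident still-unlabeled samples predicted as class c.
                cand = np.flatnonzero(~known & (pred == c))
                for i in cand[np.argsort(-conf[cand])][:per_class]:
                    i = int(i)
                    # If both views propose a sample, keep the surer one.
                    if i not in proposals or conf[i] > proposals[i][1]:
                        proposals[i] = (c, conf[i])
        if not proposals:
            break
        for i, (c, _) in proposals.items():
            pseudo[i] = c
    return pseudo

# Synthetic stand-ins for RGB and depth features (illustrative only).
rng = np.random.default_rng(0)
n_classes, n_per = 3, 40
y_all = np.repeat(np.arange(n_classes), n_per)
offsets = 4.0 * np.eye(n_classes, 4)[y_all]
rgb = rng.normal(size=(len(y_all), 4)) + offsets
depth = rng.normal(size=(len(y_all), 4)) + offsets

# Label 2 samples per class (5% of 120) and treat the rest as unlabeled.
lab = np.concatenate([np.flatnonzero(y_all == c)[:2] for c in range(n_classes)])
unl = np.setdiff1d(np.arange(len(y_all)), lab)

pseudo = co_train({"rgb": rgb[lab], "depth": depth[lab]}, y_all[lab],
                  {"rgb": rgb[unl], "depth": depth[unl]}, n_classes)
assigned = pseudo >= 0
pseudo_accuracy = (pseudo[assigned] == y_all[unl][assigned]).mean()
```

Selecting a fixed number of samples per class is a simple class-balancing heuristic chosen for this sketch. A real system would replace the centroid classifiers with the two modality-specific networks and retrain or fine-tune them each round.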