Abstract

Hashing techniques have been intensively investigated for large scale vision applications. Recent research has shown that leveraging supervised information can lead to high quality hashing. However, most existing supervised hashing methods only construct similarity-preserving hash codes. Observing that semantic structures carry complementary information, we propose the idea of co-training for hashing, by jointly learning projections from image representations to hash codes and classification. Specifically, a novel deep semantic-preserving and ranking-based hashing (DSRH) architecture is presented, which consists of three components: a deep CNN for learning image representations, a hash stream of a binary mapping layer by evenly dividing the learnt representations into multiple bags and encoding each bag into one hash bit, and a classification stream. Meanwhile, our model is learnt under two constraints at the top loss layer of hash stream: a triplet ranking loss and orthogonality constraint. The former aims to preserve the relative similarity ordering in the triplets, while the latter makes different hash bit as independent as possible. We have conducted experiments on CIFAR-10 and NUS-WIDE image benchmarks, demonstrating that our approach can provide superior image search accuracy than other state-of-the-art hashing techniques.