Over the last decade, feature point descriptors such as SIFT have become indispensable tools in the computer vision community. They are usually represented as high-dimensional vectors, such as the 128-dimensional SIFT or the 64-dimensional SURF vectors. While the descriptor’s high dimensionality is not an issue when only a few hundred points need to be represented, it becomes a significant concern when millions have to be stored on a device with limited computational and storage resources.
In this talk, we will therefore discuss our approach to turning such floating-point descriptors into compact binary ones that require far less storage space and can be matched much faster, at no loss in precision or recall.
Binarizing real-valued descriptors is achieved by first projecting them with a matrix and then thresholding the resulting projections. The matrix is designed either solely to minimize the in-class covariance of the descriptors, or to jointly minimize the in-class covariance and maximize the covariance across classes. Retrieving the resulting binary descriptors involves an Approximate Nearest Neighbor algorithm that is efficient in Hamming space, which most state-of-the-art algorithms are not.
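The project-then-threshold step and Hamming-space matching can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the talk: the projection matrix here is a random Gaussian placeholder rather than the learned matrix described above, the thresholds are all zero, the code length is 64 bits, and matching is a brute-force linear scan rather than an Approximate Nearest Neighbor algorithm. The names `binarize`, `hamming`, and `nearest` are hypothetical.

```python
import random


def binarize(descriptor, projection, thresholds):
    """Project a real-valued descriptor and threshold each projection into one bit."""
    code = 0
    for row, t in zip(projection, thresholds):
        value = sum(p * x for p, x in zip(row, descriptor))
        # One bit per projection row: 1 if the projection exceeds its threshold.
        code = (code << 1) | (1 if value > t else 0)
    return code


def hamming(a, b):
    """Hamming distance between two binary codes stored as Python ints."""
    return bin(a ^ b).count("1")


def nearest(query, database):
    """Index of the closest code by exhaustive linear scan (not an ANN method)."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))


rng = random.Random(0)
DIM, BITS = 128, 64  # SIFT-sized input; the 64-bit code length is an assumption

# Placeholder projection: random Gaussian rows stand in for a learned matrix.
projection = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]
thresholds = [0.0] * BITS

base = [rng.gauss(0, 1) for _ in range(DIM)]
noisy = [x + rng.gauss(0, 0.05) for x in base]  # slightly perturbed copy of base
other = [rng.gauss(0, 1) for _ in range(DIM)]   # unrelated descriptor

codes = [binarize(d, projection, thresholds) for d in (base, other)]
query = binarize(noisy, projection, thresholds)

print("distance to matching descriptor:", hamming(query, codes[0]))
print("distance to unrelated descriptor:", hamming(query, codes[1]))
print("nearest index:", nearest(query, codes))
```

Under these assumptions, each 128-dimensional descriptor collapses to a single 64-bit integer, and comparing two descriptors reduces to an XOR followed by a bit count, which is where the savings in storage and matching time come from.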