Abstract

Visual descriptor learning seeks a projection that embeds local descriptors (e.g., SIFT descriptors) into a new Euclidean space in which pairs of matching descriptors (positive pairs) are better separated from pairs of non-matching descriptors (negative pairs). In the original descriptor space, positive and negative pairs are often confused: local points labeled “non-matching” can yield descriptors that lie close together (irrelevant-near), and local points labeled “matching” can yield descriptors that lie far apart (relevant-far). This confusion arises because images differ in viewpoint, resolution, noise, and illumination. In this paper, we formulate the embedding as a regularized discriminant analysis that emphasizes relevant-far and irrelevant-near pairs in order to better separate negative pairs from positive pairs. We then extend the method to nonlinear mappings by employing recent work on explicit kernel mapping. Experiments on object retrieval for landmark buildings in Oxford and Paris demonstrate that our method performs favorably compared to existing methods.
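
To make the idea of a pair-weighted, regularized discriminant projection concrete, the sketch below shows one possible reading of the description above. It is not the formulation used in this paper: the exponential distance weighting, the trace-scaled ridge regularizer, and the function names `pair_weights`, `pair_scatter`, and `learn_projection` are all illustrative assumptions. Only the linear case is shown.

```python
import numpy as np

def pair_weights(X, pairs, emphasize_far):
    """Assumed weighting: favor far pairs (positives) or near pairs (negatives)."""
    d = np.linalg.norm(X[pairs[:, 0]] - X[pairs[:, 1]], axis=1)
    scale = d.mean() + 1e-12
    # Relevant-far pairs get weights that grow with distance;
    # irrelevant-near pairs get weights that decay with distance.
    w = np.exp(d / scale) if emphasize_far else np.exp(-d / scale)
    return w / w.sum()

def pair_scatter(X, pairs, w):
    """Weighted scatter matrix of pairwise descriptor differences."""
    diff = X[pairs[:, 0]] - X[pairs[:, 1]]
    return (diff * w[:, None]).T @ diff

def learn_projection(X, pos_pairs, neg_pairs, dim, reg=1e-3):
    """Return a (D x dim) projection that keeps weighted negative scatter
    large relative to regularized weighted positive scatter."""
    D = X.shape[1]
    S_pos = pair_scatter(X, pos_pairs, pair_weights(X, pos_pairs, True))
    S_neg = pair_scatter(X, neg_pairs, pair_weights(X, neg_pairs, False))
    # Assumed regularizer: ridge term scaled by the average eigenvalue of S_pos.
    S_pos_reg = S_pos + reg * np.trace(S_pos) / D * np.eye(D)
    # Solve the generalized eigenproblem S_neg v = lambda * S_pos_reg v
    # by whitening S_pos_reg first.
    evals, evecs = np.linalg.eigh(S_pos_reg)
    whiten = evecs / np.sqrt(evals)          # evecs @ diag(evals ** -0.5)
    M = whiten.T @ S_neg @ whiten
    _, mvecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    top = mvecs[:, ::-1][:, :dim]            # keep the largest `dim` directions
    return whiten @ top
```

Given a descriptor matrix `X` of shape `(n, D)` and integer index arrays `pos_pairs` and `neg_pairs` of shape `(m, 2)`, the embedded descriptors would be `X @ learn_projection(X, pos_pairs, neg_pairs, dim)`. Whitening first keeps the solution to two symmetric eigendecompositions, which is a design convenience rather than something taken from the paper.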