Many application problems such as data visualization, document retrieval, image annotation, collaborative filtering, and machine translation can be formalized as a task that utilizes a similarity function between objects in two heterogeneous spaces. In this paper, we address the problem of automatically learning such a similarity function using labeled training data. Conventional metric learning can be viewed as learning of similarity function over one single space, while the ‘metric learning’ problem in this paper can be regarded as learning of similarity function over two different spaces. We assume that the objects in the two original spaces are linearly mapped into a new space and dot product in the new space is defined as the similarity function. The metric learning problem then becomes that of learning the two linear mapping functions from training data. We then give a general and theoretically sound solution to the learning problem. Specifically, we prove that although the learning problem is non-convex, the global optimal solution exists and one can find the optimal solution using Singular Value Decomposition (SVD).We also show that the solution is ‘generalizable’ to unobserved data and it is possible to kernelize the method. We conducted two experiments; one experiment shows that keywords and images can be visualized in the same space based on the similarity function learned with our method, and the other experiment shows that the accuracy of document retrieval can be improved with the similarity function (relevance function) learned with our method.