Information Retrieval using a Singular Value Decomposition Model of Latent Semantic Structure

  • George W. Furnas
  • Thomas K. Landauer
  • Richard A. Harshman
  • Lynn A. Streeter
  • Karen E. Lochbaum

Proceedings of SIGIR 1998

SIGIR Test-of-Time Award

In a new method for automatic indexing and retrieval, implicit higher-order structure in the association of terms with documents is modeled to improve estimates of term-document association, and therefore the detection of relevant documents on the basis of terms found in queries. Singular value decomposition is used to decompose a large term-by-document matrix into 50 to 150 orthogonal factors, from which the original matrix can be approximated by linear combination; both documents and terms are represented as vectors in a 50- to 150-dimensional space. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents are ordered by their similarity to the query. Initial tests find this automatic method very promising.
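The pipeline described in the abstract can be illustrated with a minimal sketch in Python with NumPy. It is not the authors' implementation. The toy vocabulary and documents, the function names (`lsi_fit`, `fold_in_query`, `rank_documents`), the raw-count term weighting, the fold-in formula (query vector times the term factors, scaled by the inverse singular values), and the use of cosine similarity are all assumptions chosen for illustration. Only 2 factors are kept here because the toy matrix is tiny, whereas the abstract reports 50 to 150.

```python
import numpy as np

def lsi_fit(A, k):
    """Truncated SVD of a term-by-document matrix A (rows: terms, cols: docs).

    Returns term vectors (terms x k), the k largest singular values,
    and document vectors (docs x k).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(q, U_k, s_k):
    """Map a weighted term vector q into the k-dim space as a pseudo-document.

    Assumed formulation: q_hat = q^T U_k S_k^{-1}.
    """
    return (q @ U_k) / s_k

def rank_documents(q_hat, D_k):
    """Order documents by cosine similarity to the query (assumed measure)."""
    norms = np.linalg.norm(D_k, axis=1) * np.linalg.norm(q_hat)
    sims = (D_k @ q_hat) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims

# Hypothetical toy collection: raw term counts, 5 terms x 4 documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

U_k, s_k, D_k = lsi_fit(A, k=2)

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

order, sims = rank_documents(fold_in_query(q, U_k, s_k), D_k)
print(order, np.round(sims, 3))
```

In this toy collection, document 1 contains "automobile" but not "car", yet it scores as high as document 0 for the query "car" because both terms co-occur with "engine". This is the kind of higher-order association the reduced space is meant to capture.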