Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books
- Gabriella Kazai
- Mike Taylor
Published by Springer
Through mass-digitization projects and with the use of OCR
technologies, digitized books are becoming available on the Web and in
digital libraries. The unprecedented scale of these efforts, the unique
characteristics of the digitized material, and the unexplored possibilities
of user interaction make full-text book search an exciting area of
information retrieval (IR) research. Emerging research questions include:
How appropriate and effective are traditional IR models when applied to
books? Which book-specific features (e.g., the back-of-book index) should
receive special attention during indexing and retrieval? How can book
search scale to large collections? To answer such questions, we developed
an experimental platform that supports both rapid prototyping of a book
search system and large-scale tests. Using this system,
we performed experiments on a collection of 10 000 books, evaluating both the
efficiency of a novel multi-field inverted index and the effectiveness of the
BM25F retrieval model adapted to books through book-specific fields.
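To make the two components concrete, the following is a minimal sketch, not the authors' system, of a multi-field inverted index feeding a common BM25F formulation: per-field term frequencies are length-normalized and weighted, summed into one pseudo-frequency, and only then saturated with k1. The field names (`title`, `index` for the back-of-book index, `body`), the weights in `FIELD_WEIGHTS`, the per-field `FIELD_B` values, and `K1` are all hypothetical choices for illustration; the abstract does not state which fields or parameters were used. Whitespace tokenization is a deliberate simplification.

```python
import math
from collections import Counter, defaultdict

# Hypothetical book fields and parameters (assumptions, not from the paper).
FIELD_WEIGHTS = {"title": 3.0, "index": 2.0, "body": 1.0}
FIELD_B = {"title": 0.5, "index": 0.75, "body": 0.75}
K1 = 1.2


class MultiFieldIndex:
    """Inverted index that keeps a separate term frequency per field of each book."""

    def __init__(self):
        # term -> {book_id -> Counter({field: term frequency})}
        self.postings = defaultdict(lambda: defaultdict(Counter))
        # book_id -> {field: number of tokens in that field}
        self.field_lengths = defaultdict(dict)
        self.num_books = 0

    def add_book(self, book_id, fields):
        self.num_books += 1
        for field, text in fields.items():
            tokens = text.lower().split()  # simplistic whitespace tokenizer
            self.field_lengths[book_id][field] = len(tokens)
            for term in tokens:
                self.postings[term][book_id][field] += 1

    def avg_field_length(self, field):
        lengths = [fl.get(field, 0) for fl in self.field_lengths.values()]
        return sum(lengths) / len(lengths) if lengths else 0.0

    def bm25f(self, query):
        """Return (book_id, score) pairs sorted by descending BM25F score."""
        avg_len = {f: self.avg_field_length(f) for f in FIELD_WEIGHTS}
        scores = defaultdict(float)
        for term in query.lower().split():
            books = self.postings.get(term)  # .get avoids creating empty entries
            if not books:
                continue
            df = len(books)
            # Non-negative BM25-style IDF.
            idf = math.log((self.num_books - df + 0.5) / (df + 0.5) + 1.0)
            for book_id, field_tfs in books.items():
                # Combine weighted, length-normalized field tfs before saturation.
                tf = 0.0
                for field, raw_tf in field_tfs.items():
                    weight = FIELD_WEIGHTS.get(field, 0.0)
                    if weight == 0.0 or avg_len[field] == 0:
                        continue
                    b = FIELD_B[field]
                    length_ratio = self.field_lengths[book_id][field] / avg_len[field]
                    tf += weight * raw_tf / (1.0 - b + b * length_ratio)
                scores[book_id] += idf * tf / (K1 + tf)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


index = MultiFieldIndex()
index.add_book("b1", {"title": "history of rome",
                      "index": "rome senate caesar",
                      "body": "rome was founded on seven hills"})
index.add_book("b2", {"title": "gardening basics",
                      "index": "soil roses",
                      "body": "roses need sun and rome is far"})
print(index.bm25f("rome"))
```

Here `b1` ranks above `b2` because "rome" occurs in its heavily weighted title and back-of-book index fields, whereas `b2` matches only in the body. Combining fields before saturation, rather than summing independent per-field BM25 scores, is the design choice that distinguishes BM25F: a term repeated across several fields cannot inflate the score without bound.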