{"id":155288,"date":"2008-04-01T00:00:00","date_gmt":"2008-04-01T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/book-search-experiments-investigating-ir-methods-for-the-indexing-and-retrieval-of-books\/"},"modified":"2019-06-11T14:26:43","modified_gmt":"2019-06-11T21:26:43","slug":"book-search-experiments-investigating-ir-methods-for-the-indexing-and-retrieval-of-books","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/book-search-experiments-investigating-ir-methods-for-the-indexing-and-retrieval-of-books\/","title":{"rendered":"Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books"},"content":{"rendered":"<div class=\"asset-content\">\n<p>Through mass-digitization projects and with the use of OCR<br \/>\ntechnologies, digitized books are becoming available on the Web and in<br \/>\ndigital libraries. The unprecedented scale of these efforts, the unique<br \/>\ncharacteristics of the digitized material as well as the unexplored possibilities<br \/>\nof user interactions make full-text book search an exciting area of<br \/>\ninformation retrieval (IR) research. Emerging research questions include:<br \/>\nHow appropriate and effective are traditional IR models when applied to<br \/>\nbooks? What book specific features (e.g., back-of-book index) should receive<br \/>\nspecial attention during the indexing and retrieval processes? How<br \/>\ncan we tackle scalability? In order to answer such questions, we developed<br \/>\nan experimental platform to facilitate rapid prototyping of a book<br \/>\nsearch system as well as to support large-scale tests. Using this system,<br \/>\nwe performed experiments on a collection of 10 000 books, evaluating the<br \/>\nefficiency of a novel multi-field inverted index and the effectiveness of the<br \/>\nBM25F retrieval model adapted to books, using book-specific fields.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Through mass-digitization projects and with the use of OCR technologies, digitized books are becoming available on the Web and in digital libraries. The unprecedented scale of these efforts, the unique characteristics of the digitized material as well as the unexplored possibilities of user interactions make full-text book search an exciting area of information retrieval (IR) [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"gabkaz"},{"type":"user_nicename","value":"mitaylor"}],"msr_publishername":"Springer","msr_publisher_other":"","msr_booktitle":"Advances in Information Retrieval, 30th European Conference on IR Research, ECIR 2008","msr_chapter":"","msr_edition":"Advances in Information Retrieval, 30th European Conference on IR Research, ECIR 2008","msr_editors":"","msr_how_published":"","msr_isbn":"978-3-540-78645-0","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"234-245","msr_page_range_start":"","msr_page_range_end":"","msr_series":"Lecture Notes in Computer Science","msr_volume":"4956","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":2008,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13555],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-155288","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-search-information-retrieval","msr-locale-en_us"],"msr_publishername":"Springer","msr_edition":"Advances in Information Retrieval, 30th European Conference on IR Research, ECIR 2008","msr_affiliation":"","msr_published_date":"","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"Advances in Information Retrieval, 30th European Conference on IR Research, ECIR 2008","msr_pages_string":"234-245","msr_chapter":"","msr_isbn":"978-3-540-78645-0","msr_journal":"","msr_volume":"4956","msr_number":"","msr_editors":"","msr_series":"Lecture Notes in Computer Science","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":"","msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"gabkaz","user_id":31838,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=gabkaz"},{"type":"user_nicename","value":"mitaylor","user_id":32956,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=mitaylor"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/155288","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/155288\/revisions"}],"predecessor-version":[{"id":592159,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/155288\/revisions\/592159"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=155288"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=155288"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=155288"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=155288"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=155288"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=155288"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=155288"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=155288"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=155288"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=155288"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=155288"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=155288"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=155288"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}