A Search Engine for Natural Language Applications

April 29, 2005
Michael Cafarella | University of Washington

Many modern natural-language-processing applications use search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search engines are designed and optimized for simple human queries, so they are not well suited to support such applications. As a result, these applications are forced to issue millions of successive queries, which places unnecessary load on the search engine and makes the applications slow and hard to scale.

As a replacement, we propose the Bindings Engine (BE), which supports queries that contain typed variables and string-processing functions.

These primitives are well suited to the needs of natural language applications. Further, BE’s novel neighborhood index enables it to process such queries very efficiently. As a result, BE can speed up large-scale applications by several orders of magnitude while incurring only a modest cost in index storage space and computation overhead.
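To make the idea of a typed-variable query concrete, here is a minimal Python sketch. It is not BE's implementation: the abstract does not describe the index layout, so the sketch assumes that each posting in an inverted index also stores the token immediately to its right, which lets a variable be bound without fetching the document. The names `build_index`, `bind`, and `is_proper_noun`, and the rule that a "ProperNoun" is any capitalized token, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def is_proper_noun(token):
    """Hypothetical type test: treat any capitalized token as a ProperNoun."""
    return token[:1].isupper()


def build_index(docs):
    """Map each lowercased term to postings of (doc_id, position, right_neighbor).

    Storing the right-hand neighbor in the posting is the assumption that
    stands in for a neighborhood index in this sketch.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        tokens = text.split()
        for pos, tok in enumerate(tokens):
            right = tokens[pos + 1] if pos + 1 < len(tokens) else None
            index[tok.lower()].append((doc_id, pos, right))
    return index


def bind(index, phrase):
    """Answer the query '<phrase> ProperNoun(x)' and count each binding of x."""
    terms = phrase.lower().split()
    # Positions of every term except the last, used to verify the phrase.
    earlier = [
        {(d, p) for d, p, _ in index.get(t, [])} for t in terms[:-1]
    ]
    counts = defaultdict(int)
    for doc_id, pos, right in index.get(terms[-1], []):
        # The i-th earlier term must sit (len(terms) - 1 - i) tokens before pos.
        phrase_matches = all(
            (doc_id, pos - (len(terms) - 1 - i)) in positions
            for i, positions in enumerate(earlier)
        )
        if phrase_matches and right is not None and is_proper_noun(right):
            counts[right] += 1  # bound from the index alone, no document fetch
    return dict(counts)


docs = [
    "cities such as Seattle and Portland",
    "I visited cities such as Boston last year",
    "cities such as Seattle are rainy",
]
print(bind(build_index(docs), "cities such as"))
```

In this toy setting a single call returns every binding of the variable together with its frequency, whereas an application using a conventional search engine would need one query per candidate string.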

Speaker Details

Michael Cafarella is a Ph.D. candidate in Computer Science at the University of Washington, under the supervision of Dan Suciu and Oren Etzioni. His research focuses on databases and artificial intelligence, with a particular emphasis on information extraction. He is especially interested in applying his research to Web data. In addition to his Ph.D. studies, Mike has worked as an intern at Google and as an engineer at two successful startups. He is also the co-creator of the Hadoop open-source project, which is deployed widely in both academia and industry.