Abstract

Web search engines incorporate results from structured data sources to answer semantically rich user queries, i.e. ‘Samsung 50 inch led tv’ can be answered from a table of television data. However, users are not domain experts and quite often enter values that do not match precisely the underlying data, so a literal execution will return zero results. A search engine would prefer to return at least a minimum number of results as close to the original query as possible while providing a time-bound execution guarantee. In this paper, we formalize these requirements, show the problem is NP-Hard and present approximation algorithms that produce rewrites that work in practice. We empirically validate our algorithms on large-scale data from a major search engine.