Instance-based Schema Matching for Web Databases by Domain-specific Query Probing
- Jiying Wang
- Ji-Rong Wen
- Fred Lochovsky
- Wei-Ying Ma
Published by Very Large Data Bases Endowment Inc.
A Web database that dynamically provides information in response to user queries presents two distinct schemas to users: the interface schema and the result schema. Each of them partially reflects the schema of the backend database. Most previous work has studied schema matching only across the query interfaces of Web databases. In this paper, we propose a novel schema model that explicitly distinguishes the interface schema (the schema users can query) from the result schema (the schema users can browse) of a Web database in a specific domain. Within this model, we address two significant schema matching problems for Web databases: intra-site schema matching and inter-site schema matching. The first problem is crucial for automatically extracting data from Web databases, while the second plays a significant role in meta-retrieving and integrating data from different Web databases. We also investigate the feasibility of a unified solution to the two problems based on query probing and instance-based schema matching techniques. Building on the model, we also propose a cross-validation technique to improve the accuracy of the various schema matchings. Our experiments on real Web databases demonstrate that the two problems can be solved at the same time with high precision and recall.
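As a rough illustration of how query probing can drive instance-based matching within a single site, the sketch below submits known values through each interface attribute and counts in which result column those values reappear. It is a toy under stated assumptions, not the method of the paper: the in-memory `BOOKS` table, the labels in `INTERFACE` and `RESULT`, the substring test, and the function names `probe` and `match_intra_site` are all made up for this example.

```python
from collections import Counter

# Made-up backend data standing in for a Web database (assumption).
BOOKS = [
    {"title": "Intro to Databases", "author": "Alice Smith"},
    {"title": "Web Data Mining", "author": "Bob Jones"},
    {"title": "Advanced Databases", "author": "Carol Lee"},
]

# Hidden wiring of the toy site: the matcher never reads these two dicts directly.
INTERFACE = {"Title keyword": "title", "Writer": "author"}   # query form labels
RESULT = {"Book name": "title", "By": "author"}              # result page labels


def probe(interface_label, value):
    """Simulate submitting `value` through one form field and reading the result rows."""
    field = INTERFACE[interface_label]
    hits = [b for b in BOOKS if value.lower() in b[field].lower()]
    return [{label: b[f] for label, f in RESULT.items()} for b in hits]


def match_intra_site(probe_values):
    """Map each interface label to the result column where probed values reappear most often."""
    matches = {}
    for label, values in probe_values.items():
        counts = Counter()
        for value in values:
            for row in probe(label, value):
                for column, text in row.items():
                    if value.lower() in text.lower():
                        counts[column] += 1
        matches[label] = counts.most_common(1)[0][0] if counts else None
    return matches


# Domain-specific probe values (assumed to be known in advance for the domain).
PROBE_VALUES = {
    "Title keyword": ["Databases", "Mining"],
    "Writer": ["Smith", "Lee"],
}
matches = match_intra_site(PROBE_VALUES)
print(matches)
```

The same counting idea could in principle be applied across two sites by sending identical probe values to both and comparing where they reappear, which is one way to picture how a single probing pass might serve both matching problems; that extension is not implemented here.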