Transformation-based Framework for Record Matching

Proceedings of the 24th International Conference on Data Engineering, ICDE 2008

Published by IEEE Computer Society

Today’s record matching infrastructure offers no flexible way to account for synonyms such as “Robert” and “Bob”, which refer to the same name, or for more general string transformations such as abbreviations. We propose a programmatic framework for record matching that takes such user-defined string transformations as input. To the best of our knowledge, this is the first proposal for such a framework. Although expressive, the transformational framework poses significant computational challenges, which we address. We empirically evaluate our techniques on real data.
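
To make the idea of matching under user-defined transformations concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes transformations are token-level rewrite rules and that similarity is the best Jaccard score over all rewritten variants; the names `TRANSFORMATIONS`, `variants`, and `transformed_similarity` are hypothetical and chosen only for this example.

```python
from itertools import product

# Hypothetical user-defined transformations: token -> set of equivalent tokens.
TRANSFORMATIONS = {
    "bob": {"robert"},
    "st": {"street"},
}

def variants(record, rules):
    """Return every token set obtainable by optionally rewriting each token."""
    tokens = record.lower().split()
    # Each token may stay as-is or be replaced by any of its rewrites.
    choices = [{t} | rules.get(t, set()) for t in tokens]
    return {frozenset(combo) for combo in product(*choices)}

def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def transformed_similarity(r1, r2, rules):
    """Best Jaccard similarity over all rewritten variants of both records."""
    return max(
        jaccard(a, b)
        for a in variants(r1, rules)
        for b in variants(r2, rules)
    )
```

In this sketch, "Bob Smith" and "Robert Smith" score 1/3 with no rules and 1.0 once the rule for "bob" is supplied. The brute-force enumeration grows exponentially with the number of rewritable tokens, which gives a sense of why a transformation-aware matcher is computationally demanding.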