Yeye He

Principal Researcher


I am a principal researcher in the Data Management, Exploration and Mining (DMX) group at Microsoft Research. I received my PhD from the University of Wisconsin-Madison, where I worked with Prof. Jeffrey Naughton.

Recently I have been working on Self-service Data Preparation, where we develop technologies that automate a variety of labor-intensive data-preparation tasks. These include transforming data by example (TDE), automatically joining tables (Auto-Join, Sema-Join), automatically detecting data errors in tables (Auto-Detect, Uni-Detect), automatically recognizing rich semantic data types (Auto-Type), splitting data into tables without examples (Auto-Split), producing mapping relationships (Auto-Map), and matching records across tables (Auto-EM).

Some of these technologies have shipped in Microsoft products such as Power Query (natively integrated into Excel under the “Data” tab and also available in Power BI), Azure Machine Learning Data Prep, and Microsoft Dynamics 365 Customer Insights (Record Matching).

Previously, I worked on Synonym-Mining (e.g., Entity-Synonym, Attribute-Synonym, and Acronym) using search engine query logs. These technologies are used in applications such as Bing Snapp and Bing Knowledge Widget.