I am a Sr. Principal Research Manager in the DKI (Data, Knowledge, Intelligence) area at Microsoft Research Asia. I have been working in the same research group since I joined Microsoft in April 2006. I now lead the Data Analytics Research team, whose research directions span machine learning, multi-dimensional data mining, explainable AI, causal inference, graph models, and their applications in tabular data intelligence, survey forms analytics, and software engineering. Key technologies have been or are being shipped to Office (Excel, Forms, Word), Power BI & Dynamics, Windows, and Bing Search. I would summarize my research over the past 15 years as using data-driven techniques to enable industry-leading features of Microsoft products. Before joining Microsoft Research, I received my M.E. and B.E. from Zhejiang University in 2006 and 2003, respectively.
- Machine learning, multi-dimensional data mining, explainable AI
- Applications in tabular data intelligence, survey forms analytics, software engineering
EAReco was the project name for our research and tech-transfer of HMM-based handwriting recognition for East-Asian languages (i.e., Simplified Chinese, Traditional Chinese, Korean, and Japanese). It was my first project after joining Microsoft Research in Beijing. We advanced the state-of-the-art recognition accuracy, especially for cursive writing, to an industry-leading level at the time. The project also involved research and engineering to achieve fast performance and a small model size. Our EAReco engine and models shipped with Windows 7.
As a continued collaboration with Windows after the EAReco project, we decided to further improve Windows quality and user experience at the core and fundamental levels. I started the StackMine project to help scale up the analysis used to identify Windows performance issues. Incorporating machine learning, data mining, large-scale computing, and system domain knowledge, StackMine was a technology suite and scalable system for automatic mining and recommendation of performance bottlenecks based on large-scale (i.e., millions of) execution traces. After its tech-transfer to Windows, StackMine identified 19 high-impact performance bugs for Windows 8.
Auto Insights (2014-present)
I have been leading the research and tech-transfer of Auto Insights since Nov 2014. Auto Insights is a research framework for automatic mining and recommendation of various insights from multi-dimensional data. It also involves research and engineering to allow near real-time experiences of insight mining based on commodity database systems, or even in cloud environments. As an enabling technique towards smart analytics, Auto Insights has been helping Microsoft demonstrate industry-leading vision and technical strengths in the Business Intelligence market, via a series of releases with Power BI and reviews with Gartner.
Tabular Data Intelligence (2017-present)
I have been leading a team of researchers working on spreadsheet intelligence to enable on-click intelligent experiences for Excel in Microsoft Office 365. Our vision is to solve the grand challenges behind such on-click intelligence for spreadsheets, including table range detection, table structure analysis, table metadata understanding, and table format recommendation. Combining our techniques for spreadsheet intelligence and auto insights, we collaborated with Excel and shipped Ideas in Excel on March 1, 2019. Our TableSense technology and SDK have been powering intelligent features of multiple key products in Microsoft Office 365.
Impact on Microsoft Products
- Windows 7 handwriting recognition engine and models for East-Asian languages (i.e., Simplified Chinese, Traditional Chinese, Korean, and Japanese)
- StackMine, transferred to Windows, helped identify and fix 19 high-impact performance bugs for Windows 8 before release
- Quick Insights of Power BI released on Dec 1, 2015, powered by Auto Insights
- Forms Ideas announced in Sept 2018, powered by Auto Insights
- Ideas in Excel released to GA (General Availability) on March 1, 2019, powered by Spreadsheet Intelligence and Auto Insights
Talks, Lectures, and Events
- Data, Knowledge, Intelligence Workshop – co-organized with Dr. Winnie Ycui at Microsoft Research Asia Innovation Partnership 2020
- Future of Spreadsheeting Workshop – co-organized with Prof. Andy Gordon and Dr. Ben Zorn at Microsoft Research Faculty Summit 2019
- Smart Analytics Workshop – co-organized with Prof. Andy Gordon at Microsoft Research Week, Mar 6th–10th, 2017
- “Auto Insights for Multi-dimensional Data Analysis”, a lecture at Microsoft TechFest, Mar 9th, 2016
- “Software Analytics” at the Dagstuhl Seminar “Programming with Big Code”, Nov 15th–18th, 2015
- “Software Analysis Technology”, (co-teaching) a graduate course at the School of Electronics Engineering and Computer Science, Peking University, 2014 and 2015
- “Data-Driven OS Performance Analysis”, a lecture at Microsoft TechFest, Mar 5th, 2014
- “Context-Sensitive OS Performance Analysis”, an invited talk at the 3rd International Symposium on High Confidence Software, Dec 21st–22nd, 2013
- “Software Analytics in Practice”, an invited talk at the 2nd Verified Software Workshop by Microsoft Research, Aug 23rd–24th, 2012; and a tutorial at the 25th Conference on Software Engineering Education and Training, Apr 17th–19th, 2012
- Microsoft Gold Star Award 2008
- 12 representative technologies in MSRA’s first 10 years (the first technology – digital ink)
- Thought Leadership Award, Microsoft Science Fair 2011
- 20 representative papers in MSRA’s first 20 years (the paper in the year 2012)