I am a Principal Researcher and Research Manager in the DKI (Data, Knowledge, Intelligence) area at Microsoft Research Asia. I have been working in the same research group since I joined Microsoft in April 2006. My research over the past ten-plus years can be summarized as using data-driven techniques (e.g., machine learning and data mining) to enable industry-leading features and quality in Microsoft products. Before joining Microsoft Research, I received my M.E. and B.E. from Zhejiang University in 2006 and 2003, respectively.
- Machine learning, especially for software/system quality, programming languages, (semi-)structured data analytics
- Data mining, especially for multi-dimensional data analysis
EAReco was our research and tech-transfer project on HMM-based handwriting recognition for East-Asian languages (i.e., Simplified Chinese, Traditional Chinese, Korean, and Japanese). It was my first project after joining Microsoft Research in Beijing. We advanced the state-of-the-art recognition accuracy, especially for the cursive writing style, to an industry-leading level at the time. The project also involved research and engineering to achieve fast performance and a small model size. Our EAReco engine and models shipped with Windows 7.
Continuing our collaboration with Windows after the EAReco project, we decided to further improve Windows quality and user experience at the core and fundamental levels. I started the StackMine project to help scale up the analysis for identifying Windows performance issues. Incorporating machine learning, data mining, large-scale computing, and system domain knowledge, StackMine was a technology suite and scalable system for automatic mining and recommendation of performance bottlenecks based on large-scale (i.e., millions of) execution traces. After its tech-transfer to Windows, StackMine identified 19 high-impact performance bugs for Windows 8.
Auto Insights (2014-present)
I have been leading the research and tech-transfer of Auto Insights since Nov 2014. Auto Insights is a research framework for automatic mining and recommendation of various insights from multi-dimensional data. It also involves research and engineering to enable near real-time insight mining on commodity database systems and even in cloud environments. As an enabling technique towards smart analytics, Auto Insights has been helping Microsoft demonstrate industry-leading vision and technical strengths in the Business Intelligence market, via a series of releases with Power BI and reviews with Gartner.
Spreadsheet Intelligence (2017-present)
I have been leading a team of researchers working on spreadsheet intelligence to enable on-click intelligent experiences in Excel for Microsoft Office 365. Our vision is to solve the grand challenges behind such on-click intelligence for spreadsheets, including table range detection, table structure analysis, table metadata understanding, and table format recommendation. Combining our techniques for spreadsheet intelligence and Auto Insights, we collaborated with Excel and shipped Ideas in Excel on March 1, 2019.
Impact on Microsoft Products
- Windows 7 handwriting recognition engine and models for East-Asian languages (i.e., Simplified Chinese, Traditional Chinese, Korean, and Japanese)
- StackMine transferred to Windows, having identified 19 high-impact performance bugs for Windows 8
- Quick Insights of Power BI released on Dec 1, 2015, powered by Auto Insights
- Ideas in Excel released to GA (General Availability) on March 1, 2019, powered by Spreadsheet Intelligence and Auto Insights
Talks, Lectures, and Events
- Future of Spreadsheeting Workshop – co-organized with Prof. Andy Gordon and Dr. Ben Zorn at Microsoft Research Faculty Summit 2019
- Smart Analytics Workshop – co-organized with Prof. Andy Gordon at Microsoft Research Week, Mar 6th–10th, 2017
- “Software Analytics” at the Dagstuhl Seminar “Programming with Big Code”, Nov 15th–18th, 2015
- “Software Analysis Technology”, (co-teaching) a graduate course at the School of Electronics Engineering and Computer Science, Peking University, 2014 and 2015
- “Data-Driven OS Performance Analysis”, a lecture at Microsoft TechFest, Mar 5th, 2014
- “Context-Sensitive OS Performance Analysis”, an invited talk at the 3rd International Symposium on High Confidence Software, Dec 21st–22nd, 2013
- “Software Analytics in Practice”, an invited talk at the 2nd Verified Software Workshop by Microsoft Research, Aug 23rd–24th, 2012; and a tutorial at the 25th Conference on Software Engineering Education and Training, Apr 17th–19th, 2012