I am a Lead Researcher in the Software Analytics & Data Intelligence group at Microsoft Research, Beijing. I have worked in the same research group since joining Microsoft in April 2006. Over the past 10+ years, my research has used data-driven techniques (e.g., machine learning and data mining) to enable industry-leading features and quality in Microsoft products. Before joining Microsoft Research, I received my B.E. and M.E. from Zhejiang University in 2003 and 2006, respectively.
Data mining, especially for multi-dimensional data analysis
Machine learning, especially for software/system quality and programming languages
EAReco was the project name for our research and tech-transfer of HMM-based handwriting recognition for East-Asian languages (i.e., Simplified Chinese, Traditional Chinese, Korean, and Japanese). It was my first project after joining Microsoft Research in Beijing. We advanced state-of-the-art recognition accuracy, especially for the cursive writing style, to an industry-leading level at that time. The project also involved research and engineering to achieve fast performance and a small model size. Our EAReco engine and models shipped with Windows 7.
As a continued collaboration with Windows after the EAReco project, we decided to further improve Windows quality and user experience at the core and fundamental levels. I started the StackMine project to help scale up the analysis for identifying Windows performance issues. Incorporating machine learning, data mining, large-scale computing, and system domain knowledge, StackMine was a technology suite and scalable system for automatic mining and recommendation of performance bottlenecks based on large-scale (i.e., millions of) execution traces. After its tech-transfer to Windows, StackMine identified 19 high-impact performance bugs for Windows 8.
Auto Insights (2014-present)
I have been leading the research and tech-transfer of Auto Insights since Nov 2014. Auto Insights is a research framework for automatic mining and recommendation of various insights from multi-dimensional data. It also involves research and engineering to allow near real-time experiences of insight mining based on commodity database systems, or even in cloud environments. As an enabling technique towards smart analytics, Auto Insights has been helping Microsoft demonstrate industry-leading vision and technical strengths in the Business Intelligence market, via a series of releases with Power BI and reviews with Gartner.
Impact on Microsoft Products
Windows 7 handwriting recognition engine and models for East-Asian languages (i.e., Simplified Chinese, Traditional Chinese, Korean, and Japanese)
StackMine transferred to Windows, having identified 19 high-impact performance bugs for Windows 8
Quick Insights of Power BI released on Dec 1st, 2015, powered by Auto Insights
Talks, Lectures, and Events
Smart Analytics Workshop – co-organized with Prof. Andy Gordon at Microsoft Research Week, Mar 6th–10th, 2017
“Software Analytics” at the Dagstuhl Seminar “Programming with Big Code”, Nov 15th–18th, 2015
“Software Analysis Technology”, (co-teaching) a graduate course at the School of Electronics Engineering and Computer Science, Peking University, 2014 and 2015
“Data-Driven OS Performance Analysis”, a lecture at Microsoft TechFest, Mar 5th, 2014
“Context-Sensitive OS Performance Analysis”, an invited talk at the 3rd International Symposium on High Confidence Software, Dec 21st–22nd, 2013
“Software Analytics in Practice”, an invited talk at the 2nd Verified Software Workshop by Microsoft Research, Aug 23rd–24th, 2012; and a tutorial at the 25th Conference on Software Engineering Education and Training, Apr 17th–19th, 2012
Microsoft Gold Star Award 2008
12 best techniques in MSRA’s first 10 years
Thought Leadership Award, Microsoft Science Fair 2011