I am a Researcher in the Data Management, Exploration and Mining (DMX) group at Microsoft Research. Before joining Microsoft, I completed my Ph.D. in Computer Science at the University of Illinois at Urbana-Champaign under the supervision of Prof. Jiawei Han, my M.Phil. at The Chinese University of Hong Kong, advised by Jeffery Xu Yu, and my B.S. at Renmin University of China, advised by Shan Wang and Qing Zhu.

Research Interests

My research spans several aspects of large-scale data management, including querying and exploring “big” data, optimizing database systems, data mining algorithms and applications, and privacy-preserving data analytics. I am particularly interested in algorithms (including randomized and approximation algorithms) that have performance guarantees in theory and are also effective and robust in practice. More recently, I have been interested in:

  • Searching and Exploring Big Data: a) processing analytical queries on large-scale data (e.g., with billions of rows) with approximate answers in interactive response time (e.g., one hundred milliseconds); b) inventing new search models and interfaces to help people explore structured/semi-structured (text) data more easily, and developing efficient algorithms and index structures to support them.
  • Query Processing in Database Systems: a) faster algorithms for building block components (e.g., set intersection); b) progress estimation in query processing.
  • Data Mining: developing data mining algorithms for various applications.
  • Data Privacy
  • Graphs in Databases

Recent Papers

[VLDB 2016] Online Minimum Matching in Real-Time Spatial Data: Experiments and Analysis
Yongxin Tong, Jieying She, Bolin Ding, Lei Chen, Tianyu Wo, and Ke Xu

[VLDB 2016] Design of Policy-Aware Differentially Private Algorithms
Samuel Haney, Ashwin Machanavajjhala, and Bolin Ding

[SIGMOD 2016] Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee
Bolin Ding, Silu Huang, Surajit Chaudhuri, Kaushik Chakrabarti, and Chi Wang

[SIGMOD 2016] Quickr: Lazily Approximating Complex Ad-Hoc Queries in Big Data Clusters
Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, and Bolin Ding

[SIGMOD 2016] Operator and Query Progress Estimation in Microsoft SQL Server Live Query Statistics
Kukjin Lee, Arnd Christian Konig, Vivek Narasayya, Bolin Ding, Surajit Chaudhuri, Brent Ellwein, Alexey Eksarevskiy, Manbeen Kohli, Jacob Wyant, Praneeta Prakash, Rimma Nehme, Jiexing Li, and Jeff Naughton

[ICDE 2016] Online Mobile Micro-Task Allocation in Spatial Crowdsourcing
Yongxin Tong, Jieying She, Bolin Ding, Libin Wang, and Lei Chen

[ICDCS 2016] Enabling Privacy-Preserving Incentives for Mobile Crowd Sensing Systems
Haiming Jin, Lu Su, Bolin Ding, Klara Nahrstedt, and Nikita Borisov

[SIGMOD 2015] S4: Top-k Spreadsheet-Style Search for Query Discovery
Fotis Psallidas, Bolin Ding, Kaushik Chakrabarti, and Surajit Chaudhuri

[VLDB 2015] Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
Mohan Yang, Bolin Ding, Surajit Chaudhuri, and Kaushik Chakrabarti

[KDD 2014] Scalable Near Real-Time Failure Localization of Data Center Networks
Herodotos Herodotou, Bolin Ding, Shobana Balakrishnan, Geoff Outhred, and Percy Fitter

[SIGMOD 2014] Discovering Queries based on Example Tuples
Yanyan Shen, Kaushik Chakrabarti, Surajit Chaudhuri, Bolin Ding, and Lev Novik

[SIGMOD 2014] Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies
Xi He, Ashwin Machanavajjhala, and Bolin Ding

[VLDB 2013] Attraction and Avoidance Detection from Movements
Zhenhui Li, Bolin Ding, Fei Wu, Tobias Kin Hou Lei, Roland Kays, and Margaret C. Crofoot

[KDD 2013] EventCube: Multi-Dimensional Search and Mining of Structured and Text Data
Fangbao Tao, et al.

More in DBLP and Google Scholar


Quickr: Cost-effective data analytics at scale

Established: March 8, 2016

We are inundated with data. Resources to analyze the data are finite and expensive. Approximate answers allow us to explore much larger amounts of data than otherwise possible given available resources. Reducing the cost, if doable for a large fraction of the complex queries that run on this data, is of strategic importance because the savings can be re-invested into more sophisticated algorithms or be used as a key differentiator for analytics-as-a-service offerings. Unfortunately, state-of-art…

Data Exploration

Established: June 8, 2004

This is a project area rather than a specific project. It focuses on novel ways to query, browse, extract, explore, mine and manage various kinds of data residing within the enterprise and on the web: structured data in relational databases, tabular data embedded in web pages, enterprise documents and spreadsheets, as well as unstructured data in query logs, text documents and social media. Our research is relevant to both enterprise and consumer scenarios…

Attraction and Avoidance Detection from Movements
Zhenhui Li, Bolin Ding, Fei Wu, Tobias Kin Hou Lei, Roland Kays, Margaret C. Crofoot, in Proceedings of the VLDB Endowment, the 40th International Conference on Very Large Data Bases (VLDB 2014), VLDB – Very Large Data Bases, September 1, 2013

EventCube: Multi-Dimensional Search and Mining of Structured and Text Data
Fangbao Tao, Kin Hou Lei, Jiawei Han, ChengXiang Zhai, Xiao Cheng, Marina Danilevsky, Nihit Desai, Bolin Ding, Jing Ge, Heng Ji, Rucha Kanade, Anne Kao, Qi Li, Yanen Li, Cindy Xide Lin, Jialiu Liu, Nikunj Oza, Ashok Srivastava, Rod Tjoelker, Chi Wang, Duo Zhang, Bo Zhao, in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2013), ACM – Association for Computing Machinery, August 1, 2013

Service, Interns, and Other

Professional Activities

  • Program Committee Memberships:
    • International Conference on Very Large Data Bases (PVLDB): 2017
    • Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD): 2016, 2015, 2014, 2013
    • International Workshop on Privacy-Preserving Data Publication and Analysis (PrivDB, in conjunction with ICDE): 2013
  • NSF Panelist: 2016
  • Reviewer for Journals: ACM Transactions on Database Systems, IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Knowledge Discovery from Data, Theoretical Computer Science, Pattern Recognition, Information Sciences, Knowledge and Information Systems


I have worked with some amazing interns: Fabian Hüske (2013), Yanyan Shen (2013), Mohan Yang (2013), Fotis Psallidas (2014), Saravanan Thirumuruganathan (2014), Silu Huang (2015), and Vasileios Verroios (2015).