I am a Researcher in the Data Management, Exploration and Mining (DMX) group at Microsoft Research. Before joining Microsoft, I completed my Ph.D. in Computer Science at the University of Illinois at Urbana-Champaign under the supervision of Prof. Jiawei Han, my M.Phil. at The Chinese University of Hong Kong, advised by Jeffery Xu Yu, and my B.S. at Renmin University of China, advised by Shan Wang and Qing Zhu.
Research Interests
My research goals and interests center on large-scale data management and analytics, including interactively querying and exploring “big” data, privacy-preserving data analytics, query processing and optimization, and data mining algorithms. I am particularly interested in randomized and approximation algorithms that have theoretical guarantees and are effective and implementable in practice. The comment I like the most from engineering groups about my research results is:
“it is pretty simple, but it works…”
More recently, I have enjoyed developing algorithms and building systems in the following projects:

Approximate query processing: how to process analytical queries on large-scale data (e.g., with billions of rows) with approximate answers in interactive response time (e.g., one hundred milliseconds). We build approximate query engines based on precomputed samples and indexes, and bound the errors in the approximate answers. We also insert online samplers into query plans, which requires no precomputation. There are both theoretical and systems challenges.
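
As a rough illustration of the sample-based idea above (not the engine or the estimators used in these projects), the following Python sketch estimates a SUM aggregate from a uniform random sample and attaches a normal-approximation error bound. The function name `approx_sum`, its parameters, and the choice of a 95% interval (`z=1.96`) are assumptions made for this example.

```python
import math
import random


def approx_sum(values, sample_size, z=1.96, seed=0):
    """Estimate sum(values) from a uniform sample drawn without replacement.

    Returns (estimate, half_width), where estimate +/- half_width is an
    approximate confidence interval under the central limit theorem.
    """
    n = len(values)
    if not 2 <= sample_size <= n:
        raise ValueError("sample_size must be between 2 and len(values)")

    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)

    mean = sum(sample) / sample_size
    # Unbiased sample variance.
    var = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
    # Finite population correction: the error shrinks to zero as the
    # sample approaches the full table.
    fpc = 1.0 - sample_size / n

    estimate = n * mean
    half_width = z * n * math.sqrt(var * fpc / sample_size)
    return estimate, half_width
```

Calling `approx_sum(rows, 1000)` on a list of numeric values scans only 1,000 of them and returns an estimate together with an error bar. Whether that bar is tight enough depends on the variance of the data, which is one reason a single sampling strategy does not fit every query.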

Privacy-preserving data analytics: how to process analytical queries and analytics tasks with precision guarantees while protecting data owners’ privacy under formal notions, e.g., differential privacy. We build differentially private data cubes as a private “API” to support OLAP queries. We revise the notion of differential privacy to tune privacy-utility trade-offs. Recently, under the local model of differential privacy, we have invented data collection mechanisms for different data types and paired them with estimation algorithms to support approximate data analytics (read this Microsoft Research Blog).
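
To make the local model concrete, here is a minimal Python sketch of classical randomized response for a single bit, paired with an unbiased estimator of the population fraction. This is the textbook mechanism, shown only to illustrate how a collection mechanism and an estimation algorithm fit together; it is not the mechanism from the papers listed below, and the names `randomize_bit` and `estimate_fraction` are hypothetical.

```python
import math
import random


def randomize_bit(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

    The ratio between the two report probabilities is e^eps, so a single
    report satisfies epsilon-local differential privacy.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_fraction(reports, epsilon):
    """Unbiased estimate of the fraction of users whose true bit is 1."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = f * p + (1 - f) * (1 - p); solve for f.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

Each user runs `randomize_bit` on their own device, so the collector never sees a raw value; the collector then calls `estimate_fraction` on the noisy reports. The estimate can fall slightly outside [0, 1] for small populations, and its variance grows as `epsilon` shrinks, which is the privacy-utility trade-off in its simplest form.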

Querying and searching large-scale data: search models, algorithms, and indexes to help people explore large-scale structured or semi-structured data, e.g., text and knowledge graphs; query optimization and query processing, e.g., a fast operator for set intersection and an estimator for query-processing progress.

Data mining: developing data mining algorithms for various applications, e.g., failure localization and pattern mining.
Recent Papers
* alphabetical ordering of authors

[AAAI 2018] Comparing Population Means under Local Differential Privacy: with Significance and Power (oral)
Bolin Ding, Harsha Nori, Paul Li, and Joshua Allen

[NIPS 2017] Collecting Telemetry Data Privately (and a Microsoft Research Blog post)
* Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin

[VLDB 2017] Flexible Online Task Assignment in Real-Time Spatial Data
Yongxin Tong, Libin Wang, Zimu Zhou, Bolin Ding, Lei Chen, Jieping Ye, and Ke Xu

[SIGMOD 2017] Approximate Query Processing: No Silver Bullet
Surajit Chaudhuri, Bolin Ding, and Srikanth Kandula

[CHI 2017] Trust, but Verify: Optimistic Visualizations of Approximate Queries for Exploring Big Data (video)
Dominik Moritz, Danyel Fisher, Bolin Ding, and Chi Wang

[VLDB 2016] Online Minimum Matching in Real-Time Spatial Data: Experiments and Analysis
Yongxin Tong, Jieying She, Bolin Ding, Lei Chen, Tianyu Wo, and Ke Xu

[VLDB 2016] Design of Policy-Aware Differentially Private Algorithms
Samuel Haney, Ashwin Machanavajjhala, and Bolin Ding

[SIGMOD 2016] Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee
Bolin Ding, Silu Huang, Surajit Chaudhuri, Kaushik Chakrabarti, and Chi Wang

[SIGMOD 2016] Quickr: Lazily Approximating Complex Ad-Hoc Queries in Big Data Clusters
Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, and Bolin Ding

[SIGMOD 2016] Operator and Query Progress Estimation in Microsoft SQL Server Live Query Statistics
Kukjin Lee, Arnd Christian Konig, Vivek Narasayya, Bolin Ding, Surajit Chaudhuri, Brent Ellwein, Alexey Eksarevskiy, Manbeen Kohli, Jacob Wyant, Praneeta Prakash, Rimma Nehme, Jiexing Li, and Jeff Naughton

[ICDE 2016] Online Mobile Micro-Task Allocation in Spatial Crowdsourcing
Yongxin Tong, Jieying She, Bolin Ding, Libin Wang, and Lei Chen

[ICDCS 2016] Enabling Privacy-Preserving Incentives for Mobile Crowd Sensing Systems
Haiming Jin, Lu Su, Bolin Ding, Klara Nahrstedt, and Nikita Borisov

[SIGMOD 2015] S4: Top-k Spreadsheet-Style Search for Query Discovery
Fotis Psallidas, Bolin Ding, Kaushik Chakrabarti, and Surajit Chaudhuri

[VLDB 2015] Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
Mohan Yang, Bolin Ding, Surajit Chaudhuri, and Kaushik Chakrabarti

[KDD 2014] Scalable Near Real-Time Failure Localization of Data Center Networks
Herodotos Herodotou, Bolin Ding, Shobana Balakrishnan, Geoff Outhred, and Percy Fitter

[SIGMOD 2014] Discovering Queries based on Example Tuples
Yanyan Shen, Kaushik Chakrabarti, Surajit Chaudhuri, Bolin Ding, and Lev Novik

[SIGMOD 2014] Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies
Xi He, Ashwin Machanavajjhala, and Bolin Ding

[VLDB 2013] Attraction and Avoidance Detection from Movements
Zhenhui Li, Bolin Ding, Fei Wu, Tobias Kin Hou Lei, Roland Kays, and Margaret C. Crofoot

[KDD 2013] EventCube: Multi-Dimensional Search and Mining of Structured and Text Data (demo)
Fangbao Tao, et al.

More on DBLP and Google Scholar




Selected Awards:

  • (2017) FY17 Technical Excellence, Microsoft Privacy
  • (2012) Gold, The 2nd Yahoo!-DAIS Research Excellence Award Competition
  • (2007) Best Student Paper Award, ICDE’07
  • (2007-2008) Richard T. Cheng Fellowship Award, University of Illinois at Urbana-Champaign
  • (2007) 1st place, TopCoder Programming Competition College Tour at University of Illinois
  • (2005) Honorable Mention, 2005 ACM-ICPC Programming Contest World Finals
  • (2004) 3rd place out of 255 teams, Gold Medal, ACM-ICPC Asia Regional Contest, Shanghai Site


I have worked with some amazing interns:


Program Committee Memberships:

  • International Conference on Very Large Data Bases (PVLDB): 2018, 2017
  • ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD): 2018, 2017
  • ACM International Conference on Information and Knowledge Management (CIKM): 2017
  • Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD): 2018, 2017, 2016, 2015, 2014, 2013
  • International Workshop on Privacy-Preserving Data Publication and Analysis (PrivDB, in conjunction with ICDE): 2013

NSF Panelist: 2016

Reviewer for Journals: ACM Transactions on Database Systems, IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Knowledge Discovery from Data, Theoretical Computer Science, Pattern Recognition, Information Sciences, Knowledge and Information Systems