About

I am a Researcher in the Data Management, Exploration and Mining (DMX) group at Microsoft Research. Before joining Microsoft, I completed my Ph.D. in Computer Science at the University of Illinois at Urbana-Champaign under the supervision of Prof. Jiawei Han, my M.Phil. at The Chinese University of Hong Kong, advised by Jeffrey Xu Yu, and my B.S. at Renmin University of China, advised by Shan Wang and Qing Zhu.

Research Interests

My research goals and interests center on large-scale data management, including interactively querying and exploring “big” data, privacy-preserving data analytics, query processing and optimization, and data mining algorithms. I am particularly interested in randomized and approximation algorithms that have theoretical performance guarantees and are also effective and robust in practice. More recently, I have been interested in:

Approximations in Big Data: “Approximation” has two meanings here. First, under resource budgets (e.g., storage cost and computation power), how to enable interactive analytics by trading accuracy for instant responses. Second, under data privacy constraints, how to enable data analytics with both privacy and precision guarantees.

  • Approximate query processing: how to process analytical queries on large-scale data (e.g., with billions of rows) with approximate answers in interactive response time (e.g., one hundred milliseconds).
  • Privacy-preserving data analytics: how to process analytical queries and analytics tasks with precision guarantees while protecting data owners’ privacy under formal notions (e.g., differential privacy).

Querying and Searching Large-Scale Data:

  • Inventing new search models and interfaces to help people explore structured/semi-structured data (e.g., text and knowledge graphs), and developing efficient search algorithms and index structures.
  • Query optimization and query processing (e.g., set intersection and progress estimation).

Data Mining: developing data mining algorithms for various applications (e.g., knowledge graphs and pattern mining).

Recent Papers

* alphabetical ordering of authors

[NIPS 2017] Collecting Telemetry Data Privately
* Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin

[VLDB 2017] Flexible Online Task Assignment in Real-Time Spatial Data
Yongxin Tong, Libin Wang, Zimu Zhou, Bolin Ding, Lei Chen, Jieping Ye, and Ke Xu

[SIGMOD 2017] Approximate Query Processing: No Silver Bullet
Surajit Chaudhuri, Bolin Ding, and Srikanth Kandula

[CHI 2017] Trust, but Verify: Optimistic Visualizations of Approximate Queries for Exploring Big Data (Video)
Dominik Moritz, Danyel Fisher, Bolin Ding, and Chi Wang

[VLDB 2016] Online Minimum Matching in Real-Time Spatial Data: Experiments and Analysis
Yongxin Tong, Jieying She, Bolin Ding, Lei Chen, Tianyu Wo, and Ke Xu

[VLDB 2016] Design of Policy-Aware Differentially Private Algorithms
Samuel Haney, Ashwin Machanavajjhala, and Bolin Ding

[SIGMOD 2016] Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee
Bolin Ding, Silu Huang, Surajit Chaudhuri, Kaushik Chakrabarti, and Chi Wang

[SIGMOD 2016] Quickr: Lazily Approximating Complex Ad-Hoc Queries in Big Data Clusters
Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, and Bolin Ding

[SIGMOD 2016] Operator and Query Progress Estimation in Microsoft SQL Server Live Query Statistics
Kukjin Lee, Arnd Christian Konig, Vivek Narasayya, Bolin Ding, Surajit Chaudhuri, Brent Ellwein, Alexey Eksarevskiy, Manbeen Kohli, Jacob Wyant, Praneeta Prakash, Rimma Nehme, Jiexing Li, and Jeff Naughton

[ICDE 2016] Online Mobile Micro-Task Allocation in Spatial Crowdsourcing
Yongxin Tong, Jieying She, Bolin Ding, Libin Wang, and Lei Chen

[ICDCS 2016] Enabling Privacy-Preserving Incentives for Mobile Crowd Sensing Systems
Haiming Jin, Lu Su, Bolin Ding, Klara Nahrstedt, and Nikita Borisov

[SIGMOD 2015] S4: Top-k Spreadsheet-Style Search for Query Discovery
Fotis Psallidas, Bolin Ding, Kaushik Chakrabarti, and Surajit Chaudhuri

[VLDB 2015] Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
Mohan Yang, Bolin Ding, Surajit Chaudhuri, and Kaushik Chakrabarti

[KDD 2014] Scalable Near Real-Time Failure Localization of Data Center Networks
Herodotos Herodotou, Bolin Ding, Shobana Balakrishnan, Geoff Outhred, and Percy Fitter

[SIGMOD 2014] Discovering Queries based on Example Tuples
Yanyan Shen, Kaushik Chakrabarti, Surajit Chaudhuri, Bolin Ding, and Lev Novik

[SIGMOD 2014] Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies
Xi He, Ashwin Machanavajjhala, and Bolin Ding

[VLDB 2013] Attraction and Avoidance Detection from Movements
Zhenhui Li, Bolin Ding, Fei Wu, Tobias Kin Hou Lei, Roland Kays, and Margaret C. Crofoot

[KDD 2013] EventCube: Multi-Dimensional Search and Mining of Structured and Text Data
Fangbao Tao, et al.

More in DBLP and Google Scholar

Projects

Quickr: Cost-Effective Data Analytics at Scale

Established: March 8, 2016

We are inundated with data. Resources to analyze the data are finite and expensive. Approximate answers allow us to explore much larger amounts of data than otherwise possible given available resources. Reducing the cost, if doable for a large fraction of the complex queries that run on this data, is of strategic importance because the savings can be re-invested into more sophisticated algorithms or be used as a key differentiator for analytics-as-a-service offerings. Unfortunately, state-of-art…

Data Exploration

Established: June 8, 2004

This is a project area rather than a specific project. This project area focuses on novel ways to query, browse, extract, explore, mine and manage various kinds of data residing within the enterprise and on the web: structured data in relational databases, tabular data embedded in web pages, enterprise documents and spreadsheets as well as unstructured data in query logs, text documents and social media. Our research is relevant to both enterprise and consumer scenarios…

Publications

2013

Attraction and Avoidance Detection from Movements
Zhenhui Li, Bolin Ding, Fei Wu, Tobias Kin Hou Lei, Roland Kays, Margaret C. Crofoot, in Proceedings of the VLDB Endowment, the 40th International Conference on Very Large Data Bases (VLDB 2014), VLDB – Very Large Data Bases, September 1, 2013
EventCube: Multi-Dimensional Search and Mining of Structured and Text Data
Fangbao Tao, Kin Hou Lei, Jiawei Han, ChengXiang Zhai, Xiao Cheng, Marina Danilevsky, Nihit Desai, Bolin Ding, Jing Ge, Heng Ji, Rucha Kanade, Anne Kao, Qi Li, Yanen Li, Cindy Xide Lin, Jialu Liu, Nikunj Oza, Ashok Srivastava, Rod Tjoelker, Chi Wang, Duo Zhang, Bo Zhao, in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2013), ACM – Association for Computing Machinery, August 1, 2013

Other

Selected Awards:

  • (2017) FY17 Technical Excellence, Microsoft Privacy
  • (2012) Gold, The 2nd Yahoo!-DAIS Research Excellence Award Competition
  • (2007) Best Student Paper Award, ICDE’07
  • (2007-2008) Richard T. Cheng Fellowship Award, University of Illinois at Urbana-Champaign
  • (2007) 1st place, TopCoder Programming Competition College Tour at University of Illinois
  • (2005) Honorable Mention, 2005 ACM-ICPC Programming Contest World Finals
  • (2004) 3rd place out of 255 teams, Gold Medal, ACM-ICPC Asia Regional Contest, Shanghai Site

Services

Program Committee Memberships:

  • International Conference on Very Large Data Bases (PVLDB): 2018, 2017
  • ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD): 2017
  • ACM International Conference on Information and Knowledge Management (CIKM): 2017
  • Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD): 2017, 2016, 2015, 2014, 2013
  • International Workshop on Privacy-Preserving Data Publication and Analysis (PrivDB, in conjunction with ICDE): 2013

NSF Panelist: 2016

Reviewer for Journals: ACM Transactions on Database Systems, IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Knowledge Discovery from Data, Theoretical Computer Science, Pattern Recognition, Information Sciences, Knowledge and Information Systems