Data Systems Group

We are hiring!

The Data Systems group at Microsoft Research works on problems in data management. Our current areas of focus include infrastructure for large-scale cloud database systems, storage and indexing in key-value stores and vector databases, leveraging modern hardware to accelerate performance in database systems, query optimization, reducing the total cost of ownership of databases through auto-tuning, and enabling flexible ways to discover, transform and clean data sets containing both structured and unstructured data.

Our research has had a significant impact on the industry. Technology developed in our projects has shipped in several Microsoft products and services and has found widespread adoption when released as open source. Examples include physical design tuning in the Database Tuning Advisor in Microsoft SQL Server; flexible resource allocation techniques in the Azure SQL Database cloud infrastructure; fuzzy matching and fuzzy deduplication in Power Query, Azure Data Factory, Dynamics 365, and Microsoft SQL Server Integration Services (SSIS); data transformation by example in Power Query; lock-free indexing for Microsoft SQL Server’s in-memory OLTP engine (“Hekaton”); technology for enabling real-time analytics in Microsoft SQL Server (“Apollo”); fast parsing of CSV and JSON data in Azure Synapse (“Mison”); the Trill event processing engine in Azure Stream Analytics; the FASTER key-value store, which is open source on GitHub and is used by Azure Stream Analytics, Azure Durable Functions, and Microsoft Teams; the Orleans open source actor framework for building distributed applications; and the mapping compiler for Microsoft’s open source Entity Framework.

Our research has also had a significant impact on the academic community. We have published in top conferences such as ACM SIGMOD, VLDB, ACM SIGKDD, ACM SIGIR, The Web Conference, IEEE ICDE, and CIDR. Our work has resulted in two VLDB 10-Year Best Paper Awards, an ICDE Influential Paper Award, and Best Paper Awards at ACM SIGMOD, VLDB, IEEE ICDE, and CIDR.