Schema-Agnostic Indexing with Azure DocumentDB

  • Dharma Shukla
  • Shireesh Thota
  • Karthik Raman
  • Madhan Gajendran
  • Ankur Shah
  • Sergii Ziuzin
  • Krishnan Sundaram
  • Miguel Gonzalez Guajardo
  • Anna Wawrzyniak
  • Samer Boshra
  • Renato Ferreira
  • Mohamed Nassar
  • Michael Koltachev
  • Ji Huang
  • Sudipta Sengupta
  • Justin Levandoski
  • David Lomet

Proceedings of the VLDB Endowment

Azure DocumentDB is Microsoft’s multi-tenant distributed database service for managing JSON documents at Internet scale. DocumentDB is now generally available to Azure developers. In this paper, we describe the DocumentDB indexing subsystem. DocumentDB indexing enables automatic indexing of documents without requiring a schema or secondary indices. Uniquely, DocumentDB provides real-time consistent queries in the face of very high rates of document updates. As a multi-tenant service, DocumentDB is designed to operate within extremely frugal resource budgets while providing predictable performance and robust resource isolation to its tenants. This paper describes the DocumentDB capabilities, including document representation, query language, document indexing approach, core index support, and early production experiences.
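The abstract does not spell out how indexing works without a schema. As a rough, hypothetical illustration of the idea only, the sketch below treats each JSON document as a tree and indexes every root-to-leaf path together with its leaf value in an inverted index mapping to document ids, so no schema or per-field secondary index has to be declared up front. The names `paths` and `ToyPathIndex` are invented for this example, and the structure is a deliberate simplification. It is not the DocumentDB index layout described in the paper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, Set, Tuple

# A path is the sequence of object keys / array positions from the root to a leaf.
Path = Tuple[Any, ...]


def paths(value: Any, prefix: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, leaf_value) for every leaf of a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from paths(child, prefix + (key,))
    elif isinstance(value, list):
        for position, child in enumerate(value):
            yield from paths(child, prefix + (position,))
    else:
        # Scalars (str, int, float, bool, None) are leaves of the document tree.
        yield prefix, value


class ToyPathIndex:
    """Hypothetical inverted index: (path, leaf value) -> set of document ids."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[Path, Any], Set[str]] = defaultdict(set)

    def add(self, doc_id: str, doc: Any) -> None:
        # No schema is declared: every leaf path of the incoming document is indexed.
        for path, leaf in paths(doc):
            self._postings[(path, leaf)].add(doc_id)

    def lookup(self, path: Path, value: Any) -> Set[str]:
        # .get avoids creating empty postings for terms that were never indexed.
        return set(self._postings.get((path, value), set()))


if __name__ == "__main__":
    idx = ToyPathIndex()
    idx.add("d1", {"name": "Ann", "tags": ["db", "json"]})
    idx.add("d2", {"name": "Bo", "address": {"city": "Seattle"}})
    print(idx.lookup(("address", "city"), "Seattle"))  # {'d2'}
    print(idx.lookup(("tags", 1), "json"))             # {'d1'}
```

The two documents have different shapes, yet both become queryable by any path they contain without a prior index definition. A production system would also have to handle range queries, compact storage of shared path prefixes, and index maintenance under concurrent updates, none of which this toy attempts.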