Cost-Effective, Low Latency Vector Search with Azure Cosmos DB

Nitish Upreti; Krishnan Sundaram; Hari Sudan Sundar; Samer Boshra; Balachandar Perumalswamy; S. Atri; Martin Chisholm; Revti Raman Singh; Greg Yang; Subramanyam Pattipaka; Tamara Hass; Nitesh Dudhey; James Codella; Mark Hildebrand; M. Manohar; Jack Moffitt; Haiyang Xu; Naren Datha; Suryansh Gupta; Ravishankar Krishnaswamy; Prashant Gupta; Abhishek Sahu; Ritika Mor; Santosh Kulkarni; Hemeswari Varada; Sudhanshu Barthwal; Amar Sagare; Dinesh Billa; Zishan Fu; Neil Deshpande; Shaun Cooper; Kevin Pilch; S. Moreno; Aayush Kataria; Vipul Vishal; H. Simhadri

VLDB | May 2025, Vol abs/2505.05885

Vector indexing enables semantic search over diverse corpora and has become an important interface to databases for both users and AI agents. Efficient vector search requires deep optimizations in database systems, which has motivated a new class of specialized vector databases that optimize for vector search quality and cost. We argue instead that a scalable, high-performance, and cost-efficient vector search system can be built inside a cloud-native operational database like Azure Cosmos DB while leveraging the benefits of a distributed database, such as high availability, durability, and scale. We do this by deeply integrating DiskANN, a state-of-the-art vector indexing library, inside Azure Cosmos DB NoSQL. The system uses a single vector index per partition, stored in the existing index trees and kept in sync with the underlying data. It supports query latency under 20 ms over an index spanning 10 million vectors, has stable recall over updates, and offers approximately 43× and 12× lower query cost than the Pinecone and Zilliz serverless enterprise products, respectively. It also scales out to billions of vectors via automatic partitioning. This convergent design presents a point in favor of integrating vector indices into operational databases in the context of recent debates on specialized vector databases, and offers a template for vector indexing in other databases.
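
To make the "one vector index per partition" and scale-out ideas concrete, the following is a minimal sketch of a scatter-gather top-k query over partitions. It is an illustration under stated assumptions, not the Azure Cosmos DB or DiskANN implementation: each partition is modeled as a plain in-memory dictionary, the per-partition search is an exact brute-force scan standing in for an approximate graph index, and the names `search_partition` and `search_all` are hypothetical.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_partition(partition, query, k):
    """Return the k nearest (distance, doc_id) pairs within one partition.

    Assumption: an exact scan stands in for the per-partition vector index.
    """
    scored = ((l2(vec, query), doc_id) for doc_id, vec in partition.items())
    return heapq.nsmallest(k, scored)


def search_all(partitions, query, k):
    """Query every partition for its local top-k, then merge to a global top-k."""
    candidates = []
    for partition in partitions:
        candidates.extend(search_partition(partition, query, k))
    return [doc_id for _, doc_id in heapq.nsmallest(k, candidates)]


# Hypothetical data: two partitions, each holding its own vectors.
partitions = [
    {"a": (0.0, 0.0), "b": (5.0, 5.0)},
    {"c": (1.0, 0.0), "d": (9.0, 9.0)},
]
print(search_all(partitions, (0.0, 0.0), 2))  # ['a', 'c']
```

Because each partition returns its own k best candidates, the merged result is the exact global top-k whenever the per-partition search is exact. With an approximate index per partition, overall recall depends on the recall achieved inside each partition.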