Azure Cosmos DB

Sumit Sengupta, Cloud Solution Architect, US One Commercial Partner

On May 10, 2017, Microsoft announced the general availability of Azure Cosmos DB, Microsoft’s globally distributed, massively scalable, multi-model database service, available in all Azure regions. Prior to that, the service was called DocumentDB. Databases that existed under DocumentDB are now accessible under the “SQL” API of Cosmos DB, with no change required from users. Since May 2017, we have added more APIs: MongoDB, Table (key-value), Cassandra, and Graph. More on the APIs later.

On the popular DB-Engines Ranking site, where databases are ranked in terms of external references, job postings, discussions, and general interest, Cosmos DB is the only one that takes a spot in the top 5 for all major categories: Key Value, Document, Wide Column, and Graph. In most cases you can “lift and shift” your application to point to Cosmos DB, without changing your code, and gain all the benefits it has to offer in the cloud.

It is a true Platform as a Service (PaaS) database in every sense. It can be used from any of the regions in the Azure cloud. High availability, fault tolerance, performance, and global distribution are included right out of the box, along with many features that are unique to Cosmos DB and not available in other NoSQL databases. It is PaaS in its service offering as well as in its billing, as we’ll explain later.

Cosmos DB is well-tested and proven. Microsoft uses it for many of its own internal systems. If you ever made a call using Skype, you used it perhaps without knowing.

Getting started

If you do not have an Azure account or just want to try the database, there is a free 7-day trial service. No credit card or subscription is needed. In addition, there is a Cosmos DB emulator that runs on Windows, or in Docker on Windows, if you want to try it locally.

If you do have an Azure account, the first step in getting started with Cosmos DB is in the portal, where you create a resource ID named “Azure Cosmos DB” under a resource group. The only thing you have to decide, besides the Azure region to host it, is the preferred API. This ID, or account, becomes the global owner of all databases created underneath it.

Think of the ID as a logical container, like a virtual machine that hosts databases. A database can have as many collections (tables) as needed. Collections contain documents, just like regular tables contain rows in relational databases. Collections can also have stored procedures, triggers, or user-defined functions. Just like regular databases, you can create users within a database, and each user can have separate access to collections.

What’s unique in Cosmos DB

Automatic Indexing

Cosmos DB is already set up to take care of all the administration challenges for a DBA. All fields in all collections, including nested and array fields, are indexed. In the JSON document sample below, we have an index on all fields, including “familyName” and “city”.

{
  "_id" : ObjectId("5a8722dd67bd4715ac660128"),
  "id" : "Shivangi",
  "parents" : [
    { "familyName" : "Wakefield", "givenName" : "Robin" },
    { "familyName" : "Miller", "givenName" : "Ben" }
  ],
  "children" : [
    { "familyName" : "Merriam", "gender" : "female", "grade" : 1 },
    { "familyName" : "Lia", "gender" : "female", "grade" : 8 }
  ],
  "address" : { "state" : "NY", "county" : "Manhattan", "city" : "NY" }
}

Sometimes you know all the expected queries for your database and do not want every field indexed, so that you avoid the cost of index updates for new and updated records. In that case, you can use a custom index policy and exclude the fields that do not need an index.
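As a rough illustration, a custom index policy that keeps automatic indexing but skips one subtree could look like the sketch below. The property names (indexingMode, automatic, includedPaths, excludedPaths) follow the Cosmos DB indexing policy format. Excluding the “children” subtree is an assumption made only for this example, and is_excluded is a hypothetical helper, not part of any SDK.

```python
import json

# Sketch of a custom indexing policy: index everything except the
# "children" subtree. Which path to exclude is an assumption for illustration.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/children/*"}],
}

def is_excluded(field_path, policy):
    """Hypothetical helper: True if field_path lies under an excluded subtree."""
    for entry in policy["excludedPaths"]:
        prefix = entry["path"].rstrip("*")  # "/children/*" -> "/children/"
        if field_path.startswith(prefix):
            return True
    return False

print(json.dumps(indexing_policy, indent=2))
print(is_excluded("/children/[]/grade", indexing_policy))  # True
print(is_excluded("/address/city", indexing_policy))       # False
```

With a policy like this, queries on “city” would still use the index, while writes would no longer pay for indexing the fields under “children”.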

Automatic scaling – storage

Cosmos DB is a “planet-scale” database, literally. If you set the size to unlimited when creating a collection, the collection can grow without bound. For small collections, you can choose a fixed maximum size of 10 GB.

Geo replication

Cosmos DB can be replicated globally across all regions, except the sovereign Azure regions, such as Azure Government, Azure China, or Azure Germany. Replicated regions can be set up with a failover priority, so that in the extreme case of multiple Azure regions failing, there is a priority order for which region becomes primary. You can also fail over manually to any of the replicated regions, and this operation is itself backed by an SLA.
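The failover priority idea can be sketched in a few lines. This is only a model of the selection rule described above, not the service’s implementation. The function name and the region list are assumptions chosen for the example.

```python
def next_write_region(failover_priority, unavailable):
    """Return the highest-priority region that is still available.

    failover_priority: region names ordered from highest to lowest priority.
    unavailable: set of region names that are currently down.
    """
    for region in failover_priority:
        if region not in unavailable:
            return region
    raise RuntimeError("no replicated region is available")

# Hypothetical priority order for an account replicated to three regions.
priority = ["East US", "West Europe", "Australia East"]

print(next_write_region(priority, unavailable=set()))                       # East US
print(next_write_region(priority, unavailable={"East US", "West Europe"}))  # Australia East
```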

Both read and write operations from an application can be served by a local region, given a proper choice of partition key. This enables applications such as IoT and gaming to scale without limit, without worrying about write latency or changing the primary write region for the database.

Guaranteed SLA for performance and high availability

Thanks to the scale of the Azure cloud, we can offer 99.999% read availability. There are SLAs for individual operations, such as reads or creating a database. Even when your database is not geo-replicated and is located in a single region, you have four copies of the data. Replication within the region is synchronous, and a write is acknowledged only when a majority of the copies within the region have written it successfully.
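To make the majority rule concrete, here is a minimal sketch that assumes the four copies mentioned above. The function names are hypothetical and the model ignores everything except counting acknowledgements.

```python
def majority(replica_count):
    """Smallest number of replicas that forms a strict majority."""
    return replica_count // 2 + 1

def write_acknowledged(successful_replicas, replica_count=4):
    """A write is acknowledged only once a majority of replicas have stored it."""
    return successful_replicas >= majority(replica_count)

print(majority(4))             # 3
print(write_acknowledged(2))   # False
print(write_acknowledged(3))   # True
```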

Consistency choices  

Relational databases offer strong consistency for a variety of situations, such as you and your spouse being able to withdraw money from different ATMs without overdrawing. For financial transactions, this “serializability” of transactions makes things accurate. But for global, distributed applications or for streaming applications, it also inhibits performance. If you are streaming video all over the world, it does not matter if people see the same picture at the same exact time, as long as they see it in the right order and without interruptions. This is where eventual consistency comes into play. If you update your Facebook status constantly, it does not matter if your mother across the globe sees your status instantly. What matters is that she gets the updates eventually, in the right order.

Many databases offer eventual consistency as an option, but Cosmos DB goes far beyond that. Besides eventual and strong (similar to RDBMS) consistency, it has three intermediate models: bounded staleness, session, and consistent prefix.

Cosmos DB consistency can be defined at the account/Resource ID level. All collections inherit the parent consistency level of the ID. However, the consistency can be overridden in an individual operation against any collection.

To explain this concept of the intermediary levels, let’s imagine a simple case of a single writer process writing data changes A, B, C, D, E in this order every 2 seconds. Let’s assume that all these changes are document inserts only. And for simplicity’s sake, let’s assume all these writes happened in the East US region and a reader wants to read the data in Australia.

With bounded staleness consistency, data in a remote region is strongly consistent except for the “bounded” limit, which can be expressed in two numbers: v (versions) or t (time in seconds). What that means is that, at most, the read in Australia can be v versions behind or t seconds behind the write. So, if v = 2 and t = 3, the read can be at most 2 versions behind (A, B, C) or 3 seconds behind (A, B, C, D). Note that the reader does not get the writes out of order at any time, i.e. never in A, C, B order.
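The example can be worked through with a small simulation. This is a simplified model, not the service’s algorithm: it assumes writes A to E land at 0, 2, 4, 6, and 8 seconds, and that both bounds apply at once, so the reader is guaranteed whichever prefix is longer. The function name guaranteed_visible is hypothetical.

```python
# (item, write time in seconds): one write every 2 seconds, as in the example.
WRITES = [("A", 0), ("B", 2), ("C", 4), ("D", 6), ("E", 8)]

def guaranteed_visible(writes, now, v, t):
    """Shortest prefix a remote reader is guaranteed to see at time `now`."""
    committed = [item for item, ts in writes if ts <= now]
    # Version bound: the reader may lag by at most v committed writes.
    by_version = max(len(committed) - v, 0)
    # Time bound: every write that is at least t seconds old must be visible.
    by_time = sum(1 for _, ts in writes if ts <= now - t)
    # Both bounds hold, so the stronger guarantee wins.
    return committed[:max(by_version, by_time)]

print(guaranteed_visible(WRITES, now=8, v=2, t=3))  # ['A', 'B', 'C']
print(guaranteed_visible(WRITES, now=9, v=2, t=3))  # ['A', 'B', 'C', 'D']
```

At second 8, just after E is written, the version bound guarantees A, B, C. One second later, D is already 3 seconds old, so the time bound extends the guarantee to A, B, C, D.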

Session consistency is the default in Cosmos DB. It guarantees strong consistency within a session. A writer process always reads its own writes, in the same order as they were written. This is the most popular consistency level, a nice compromise among performance, latency, and availability. When a single session connects to the same database through multiple clients, the session can preserve its identity across these clients. The identity is defined by a session token. The token can be saved as a browser cookie, so that a subsequent read preserves the session identity of the original writer and gets the same reads.

In consistent prefix, the only guarantee is that your writes will not appear out of order in another region. In other words, reads will always appear either A, B, C or A, B, but never A, C, B.

In the case of eventual consistency, the replica sets will eventually converge to the same data set, but in-between, the reads may appear out of order.
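The difference between the last two levels can be expressed as a simple check. The sketch below is illustrative only, and the function name is an assumption. It tests whether what a reader saw is a prefix of the write order, which consistent prefix guarantees and eventual consistency does not.

```python
def is_consistent_prefix(read, writes):
    """True if `read` is exactly the first len(read) items of `writes`."""
    return list(read) == list(writes[:len(read)])

writes = ["A", "B", "C", "D", "E"]

print(is_consistent_prefix(["A", "B"], writes))       # True
print(is_consistent_prefix(["A", "B", "C"], writes))  # True
print(is_consistent_prefix(["A", "C", "B"], writes))  # False: out of order
```

Under eventual consistency, a read such as A, C, B may show up temporarily before the replicas converge.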

Performance scaling – RU

The performance capacity of Cosmos DB is measured in Request Units (RUs). One RU is the normalized amount of resources (CPU, I/O, disk) consumed by reading 1 KB of data. RUs are consumed any time you perform an operation against Cosmos DB, whether it is a read, a write, or a stored procedure execution. You provision a certain number of RUs per second for a collection. The provisioned RUs can be changed at any time, up or down, either in the portal or programmatically through any SDK or API. If your application gets a sudden burst of read requests that exceeds the provisioned RUs, the excess requests are throttled and receive an exception. You can catch the exception to raise an alert or to increase the provisioned RUs.
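The throttling behavior can be modeled with a per-second budget. This is a toy model under stated assumptions, not the service’s rate limiter: RuBudget and RequestRateTooLarge are hypothetical names, and every read is assumed to cost exactly 1 RU. In the real service, a throttled request comes back as an HTTP 429 response.

```python
class RequestRateTooLarge(Exception):
    """Raised when a request would exceed the RU provisioned for the current second."""

class RuBudget:
    def __init__(self, provisioned_ru_per_sec):
        self.provisioned = provisioned_ru_per_sec
        self.second = None
        self.used = 0.0

    def consume(self, cost_ru, second):
        # A new second starts with a fresh budget.
        if second != self.second:
            self.second, self.used = second, 0.0
        if self.used + cost_ru > self.provisioned:
            raise RequestRateTooLarge(
                f"needs {cost_ru} RU, {self.provisioned - self.used} RU left")
        self.used += cost_ru

budget = RuBudget(provisioned_ru_per_sec=400)
throttled = 0
for _ in range(500):              # burst of 500 one-RU reads within second 0
    try:
        budget.consume(1, second=0)
    except RequestRateTooLarge:
        throttled += 1            # an application could alert or raise the provision here
print(throttled)                  # 100
```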

Be sure to note that Cosmos DB bills for the reserved RUs, not the RUs consumed. Part of the bill is for RUs, and the other part is for storage capacity. As of now, a collection can be either fixed, with a 10 GB cap (RU range 400–10,000), or unlimited, with no cap (RU range 1,000–100,000). You can get an estimate of your RU needs using the online RU calculator. When you run a query, you can see how many RUs it consumed, which helps you plan your capacity.
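A back-of-the-envelope capacity estimate might be sketched as follows. The RU ranges come from the paragraph above. The cost of 5 RU per write and the rounding to steps of 100 RU are assumptions made for this example, as are the function names. Measure the real per-operation cost from your own queries.

```python
import math

# RU ranges per collection type, as quoted in the text.
COLLECTION_RU_RANGE = {"fixed": (400, 10_000), "unlimited": (1_000, 100_000)}

def estimate_ru_per_sec(reads_per_sec, writes_per_sec,
                        ru_per_read=1.0, ru_per_write=5.0):
    """Rough RU/s needed. ru_per_write=5.0 is an assumed placeholder."""
    return reads_per_sec * ru_per_read + writes_per_sec * ru_per_write

def ru_to_provision(required_ru, collection_kind):
    """Round up to a step of 100 RU (assumed) and clamp to the minimum."""
    low, high = COLLECTION_RU_RANGE[collection_kind]
    rounded = math.ceil(required_ru / 100) * 100
    if rounded > high:
        raise ValueError(f"{rounded} RU/s exceeds the {collection_kind} maximum of {high}")
    return max(low, rounded)

needed = estimate_ru_per_sec(reads_per_sec=500, writes_per_sec=100)
print(needed)                               # 1000.0
print(ru_to_provision(needed, "fixed"))     # 1000
print(ru_to_provision(150, "fixed"))        # 400
```

Because billing follows the provisioned value, the second case is billed at 400 RU/s even though only about 150 RU/s would be consumed.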

Transaction Support

Unlike most NoSQL databases, Cosmos DB supports atomic transactions across multiple documents, as long as the documents belong to a single partition. This is one factor to consider when designing the partition key. More on that design below.
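The single-partition rule is easy to check at design time. The sketch below is a hypothetical helper, not an SDK call, and the choice of “state” as the partition key is an assumption for illustration.

```python
def can_run_as_one_transaction(documents, partition_key):
    """True if every document carries the same partition key value."""
    keys = {doc[partition_key] for doc in documents}
    return len(keys) == 1

# Hypothetical documents partitioned by "state".
same_partition = [{"id": "1", "state": "NY"}, {"id": "2", "state": "NY"}]
mixed_partitions = [{"id": "1", "state": "NY"}, {"id": "3", "state": "WA"}]

print(can_run_as_one_transaction(same_partition, "state"))    # True
print(can_run_as_one_transaction(mixed_partitions, "state"))  # False
```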

Database design

With database performance and high availability considerations taken care of, there is still one factor that you need to plan for in your design. For collections that are not capped in size, Cosmos DB requires you to choose a partition (shard) key. Your partition key is hashed into logical partitions, and on the backend, Cosmos DB maps them to physical partitions. This keeps the distribution of storage even across all the partitions. The database engine automatically redistributes the data across partitions, depending on the distribution of the incoming data.

As a designer, you have to choose the right partition key. A good key ensures that, in aggregate, queries target all partitions equally. In general, queries should include the partition key in the “where” clause, otherwise all partitions will need to be searched for data. For streaming IoT data, the event timestamp is a bad choice of partition key, as it leads to “hot” partitions for all new data with the same timestamp. In general, it is a good idea to choose a key that has many distinct values, as opposed to few. The number of distinct values sets the upper bound on the number of physical partitions, while the lower bound can be as low as one, depending on the volume of data.
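The hot partition effect can be shown with a small experiment. This sketch uses MD5 as a stand-in hash and a fixed count of 10 logical partitions. Both are assumptions for illustration and do not reflect the hash function Cosmos DB uses internally. The field names deviceId and timestamp are also hypothetical.

```python
import hashlib
from collections import Counter

def logical_partition(key_value, partition_count=10):
    """Map a partition key value to a bucket with a stable stand-in hash."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count

# 1,000 events from 100 devices, all arriving within the same second.
events = [{"deviceId": f"device-{i % 100}", "timestamp": 1522540800}
          for i in range(1000)]

by_timestamp = Counter(logical_partition(e["timestamp"]) for e in events)
by_device = Counter(logical_partition(e["deviceId"]) for e in events)

print(len(by_timestamp))   # 1: every event lands in the same partition
print(len(by_device))      # several partitions share the load
```

With the timestamp as the key, the whole burst hits one partition. With the device ID, the same events are spread over many.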

Some tips

Here are some of the cool things you can do, and this is just the tip of the iceberg. You can do machine learning and data science in Spark using the Spark connector for Cosmos DB, and get the benefit of the data being distributed.

You can use Cosmos DB’s change feed to trigger serverless code, including Azure Functions, based on data changes in Cosmos DB.

Besides viewing your data using the standard APIs and SDKs, you can use Azure Storage Explorer to view Cosmos DB data. Simply add the Cosmos DB account to Storage Explorer as if it were a storage account.

Run aggregate queries straight out of the database using SQL, LINQ, or even the MongoDB aggregation framework for the MongoDB API.

Typical Cosmos DB applications

As a globally distributed database offering guaranteed performance SLAs even under sustained writes, Cosmos DB is a natural fit for web, mobile, gaming, and IoT applications. With the choice of APIs, you can take a running MongoDB application, change its connection string to point to Cosmos DB, and, as long as the data has been copied over, run it without changing a single line of code. Check out the Cosmos DB use cases for more information.
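For the MongoDB case, the only change on the application side is the connection string. The sketch below builds one with the standard library. The host suffix, port 10255, and query options reflect the format shown in the portal for MongoDB API accounts at the time of writing, and should be treated as assumptions. In practice, copy the exact string from the portal. The account name and key are placeholders.

```python
from urllib.parse import quote_plus

def cosmos_mongo_uri(account, key):
    """Build a MongoDB connection string for a Cosmos DB account (assumed format)."""
    # The key may contain characters such as "/" or "=", so it is URL-encoded.
    return (f"mongodb://{account}:{quote_plus(key)}"
            f"@{account}.documents.azure.com:10255/"
            "?ssl=true&replicaSet=globaldb")

# Placeholder credentials, for illustration only.
uri = cosmos_mongo_uri("myaccount", "abc/def==")
print(uri)
```

An existing MongoDB driver would then be given this URI in place of the original server address.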


With Azure Cosmos DB, you can distribute your data to any number of Azure regions, which enables you to put your data where your users are, ensuring the lowest possible latency to your customers. It supports multiple data models and APIs, including SQL API, MongoDB API, Cassandra API, Table API, and Graph API, with more models and APIs coming soon. Azure Cosmos DB also guarantees end-to-end low latency at the 99th percentile, with a 99.99% availability SLA for all single-region database accounts and 99.999% read availability for all multi-region database accounts. It can provide this even while sustaining incoming data. With Cosmos DB, the possibilities are limitless, and all developers need to focus on is their application logic.

Join our Data & AI Partner community call on April 6, where we’ll be discussing key benefits, use cases, and more about Azure Cosmos DB.

Data & AI Partner Community