I wrote the first draft of this article half a year ago, and since then it has been sitting on my list of drafts. There always seemed to be another article that, for some reason, I chose to finish first. Now that I’ve written my previous article about the new free tier in Azure Cosmos DB, I guess it’s finally time to complete this one too. The free tier is just one more item on my list of things I love about Cosmos DB.
In this article, I’ll outline the main things in Azure Cosmos DB that I think make it such a great product. I will also, for the sake of completeness, cover a few things I am still missing from Cosmos DB.
Things I Love
First, let’s have a look at the things I love about Azure Cosmos DB. Here is a quick summary; I’ll describe each of them in more detail in the chapters below.
- The free tier
- No schema
- Simple SDK
- Flexible scaling options
- Powerful query support
The Free Tier
This probably does not come as a surprise. I really love the fact that you can now start with Cosmos DB for free. Best of all, the free tier gives you all the features of Cosmos DB. Of course, some features, such as multi-region support, by their nature don’t fit entirely within the free tier. Still, when the free tier discount is applied to your Cosmos DB account, you get one region for free, even in multi-region scenarios.
No Schema
The lack of an enforced schema is what separates NoSQL databases like Cosmos DB from traditional relational databases. That’s probably also why people either love them or hate them. One reason people often give for hating them is that you can’t create relations between documents in a NoSQL database. That’s wrong. Of course you can create relations; the database engine just won’t enforce them.
In my opinion, the lack of strict enforcement is also the beauty of not having an enforced schema. It’s so much easier to work with your data model when you don’t need to make sure that your database understands your model too.
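To show what an unenforced relation looks like in practice, here is a minimal Python sketch. It models a container as a plain list of dictionaries, which is an assumption made for illustration and not the Cosmos DB SDK. The documents have different shapes, and application code resolves the `customerId` reference on each order. Nothing prevents that reference from pointing at a customer that doesn’t exist.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a Cosmos DB container: a list of JSON-like dicts.
# Documents of different shapes live side by side; no schema is enforced.
container = [
    {"id": "c1", "type": "customer", "name": "Contoso"},
    {"id": "o1", "type": "order", "customerId": "c1", "total": 120.0},
    {"id": "o2", "type": "order", "customerId": "c9", "total": 35.5},  # dangling reference
]


def find_by_id(doc_id: str) -> Optional[dict]:
    """Return the document with the given id, or None if it does not exist."""
    return next((d for d in container if d["id"] == doc_id), None)


def customer_for_order(order: dict) -> Optional[dict]:
    """Resolve the order -> customer relation in application code.

    Nothing checks that customerId points at an existing customer,
    so the caller has to handle a missing target.
    """
    return find_by_id(order["customerId"])


orders = [d for d in container if d["type"] == "order"]
for order in orders:
    customer = customer_for_order(order)
    name = customer["name"] if customer else "<missing customer>"
    print(order["id"], "->", name)
```

Both relations are just data. Handling the dangling `c9` reference is the application’s job, not the database’s.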
Simple SDK
Version 2 of the Cosmos DB SDK was pretty simple to use, but the Cosmos DB SDK v3 is even simpler. Even with the new SDK, though, it’s still important to have a good data model to work with. You can read more about creating a data model in Part 3 of my Cosmos DB tutorial.
Flexible Scaling Options
Cosmos DB is very simple to scale up if your application grows and needs more performance from its data store. Basically, you just configure a higher throughput for your database or collection, and the effect is virtually immediate. The new autopilot feature, currently in public preview, also makes scaling a breeze.
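Here is a rough mental model of what autopilot does. The Python sketch below assumes the rule Microsoft documents for autoscale throughput: you choose a maximum RU/s, and the provisioned throughput follows demand between 10% of that maximum and the maximum itself. The function name and the hourly demand figures are hypothetical.

```python
def autoscaled_throughput(demand_rus: float, max_rus: int) -> float:
    """Return the RU/s an autoscale-style container would provision for a given demand.

    Assumption: throughput never drops below 10% of the configured maximum
    and never exceeds the maximum.
    """
    floor = 0.1 * max_rus
    return min(max(demand_rus, floor), max_rus)


# Hypothetical hourly demand, in RU/s, for a container configured with a maximum of 4,000 RU/s.
hourly_demand = [150, 900, 3200, 5000]
provisioned = [autoscaled_throughput(d, 4000) for d in hourly_demand]
print(provisioned)
```

Low demand is served at the 10% floor, and peaks above the maximum are capped. You don’t change any settings in between.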
Scaling out to additional regions, to provide geographical redundancy, is also quite simple. Just remember that the more regions you want to support, the more your Cosmos DB account will cost.
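The cost effect is easy to reason about, because provisioned throughput is billed in every region the account is replicated to. The sketch below multiplies it out. The price per 100 RU/s per hour is a placeholder parameter, not an actual Azure price. The sketch also ignores storage costs and the different rate that applies to multi-region writes.

```python
HOURS_PER_MONTH = 730  # Common approximation used for monthly estimates.


def monthly_throughput_cost(ru_per_second: int, regions: int, price_per_100_ru_hour: float) -> float:
    """Estimate the monthly throughput cost when the same RU/s is provisioned in each region.

    price_per_100_ru_hour is a placeholder; look up the real rate for your setup.
    """
    units_of_100 = ru_per_second / 100
    return units_of_100 * price_per_100_ru_hour * HOURS_PER_MONTH * regions


# Hypothetical price of 0.01 per 100 RU/s per hour, with 1,000 RU/s provisioned.
for regions in (1, 2, 3):
    print(regions, round(monthly_throughput_cost(1000, regions, 0.01), 2))
```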
Powerful Query Support
Here I’m mainly referring to the SQL API of Cosmos DB. Remember, though, that Cosmos DB supports other APIs as well, such as the Gremlin API, which allows you to create powerful graphs with Cosmos DB.
But back to the SQL API. Its query language is very close to what you’ve probably gotten to know if you’ve worked with SQL Server or other relational databases. You can specify the fields you want to include in the result, filter on virtually any content of your documents, and order the results based on both fields and built-in functions. You can also use joins (within a document), subqueries, and a lot of other familiar constructs.
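A JOIN in the SQL API behaves differently from a JOIN in SQL Server. It operates within a single document, typically to unfold an array, rather than across documents. The Python sketch below illustrates this by emulating, over in-memory documents, what a query like `QUERY` returns. The document shape and field names are hypothetical.

```python
# The SQL API query being emulated (shown for reference only; not executed here).
QUERY = "SELECT o.id, ol.product FROM orders o JOIN ol IN o.lines WHERE ol.quantity > 1"

orders = [
    {"id": "order-1", "lines": [
        {"product": "pen", "quantity": 3},
        {"product": "paper", "quantity": 1},
    ]},
    {"id": "order-2", "lines": [{"product": "ink", "quantity": 2}]},
]


def join_lines(docs: list) -> list:
    """Emulate the intra-document JOIN: pair each order only with its own lines."""
    results = []
    for o in docs:
        for ol in o["lines"]:  # JOIN ol IN o.lines iterates this document's array only
            if ol["quantity"] > 1:  # WHERE ol.quantity > 1
                results.append({"id": o["id"], "product": ol["product"]})
    return results


print(join_lines(orders))
```

Each result row combines an order with one of its own lines. No row pairs `order-1` with a line from `order-2`.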
To learn more about what Cosmos DB supports in SQL queries, have a look at the “Getting started with SQL queries” article on Microsoft Docs.
Things I’d Still Love to Have
I’m not saying that Cosmos DB is perfect. I would be worried if there wasn’t anything to improve on. In the chapters below I’ve summarized the things I found myself wishing for in many of the projects where I’ve worked with Cosmos DB.
Also have a look at the Cosmos DB feedback site to see what others have been looking for in Cosmos DB. You can vote on others’ suggestions, or file your own.
Better Support for Paging Through Results
Paging through a result set is something you probably do in every application that accesses a data store. Sure, Cosmos DB provides the continuation token, but it’s quite “bulky”, especially if you want to use it as a route parameter, for instance. Using the continuation token as a parameter in your custom API can also be a bit cumbersome.
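The minimal Python simulation below shows why the continuation token ends up in your API surface. The client has to hand the token back on every request to get the next page. `query_page` is a hypothetical stand-in for the SDK, and its token format is invented for the example. Real Cosmos DB continuation tokens are opaque strings that are often much longer. Encoding the token with URL-safe Base64 is one way to pass it through a URL.

```python
import base64
import json
from typing import List, Optional, Tuple

DOCUMENTS = [{"id": str(i)} for i in range(7)]


def query_page(page_size: int, continuation: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Hypothetical stand-in for a paged query.

    Returns one page of results plus an opaque token for the next page,
    or None when there are no more results.
    """
    start = json.loads(continuation)["next"] if continuation else 0
    page = DOCUMENTS[start:start + page_size]
    next_start = start + page_size
    token = json.dumps({"next": next_start}) if next_start < len(DOCUMENTS) else None
    return page, token


def to_url_param(token: str) -> str:
    """Make a token safe to use in a route or query string."""
    return base64.urlsafe_b64encode(token.encode("utf-8")).decode("ascii")


def from_url_param(param: str) -> str:
    """Reverse to_url_param."""
    return base64.urlsafe_b64decode(param.encode("ascii")).decode("utf-8")


pages = []
token = None
while True:
    page, token = query_page(3, token)
    pages.append([d["id"] for d in page])
    if token is None:
        break
    # Round-trip through the URL-safe form, as an API client would.
    token = from_url_param(to_url_param(token))

print(pages)
```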
Recently, Cosmos DB added support for OFFSET LIMIT. It allows you to skip a given number of items (the OFFSET) and then take the next given number of items (the LIMIT). For instance:
SELECT c.id FROM c ORDER BY c._ts DESC OFFSET 0 LIMIT 10
That query will return the id values from the 10 most recently modified documents.
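For page-based navigation, the page number maps directly onto OFFSET. The sketch below builds the query text for a given zero-based page, following the query above, and emulates its result on an in-memory list. `page_query`, `emulate_offset_limit`, and the sample documents are illustrative only. Both values are integers computed by the code, so no user text ends up in the query string.

```python
def page_query(page: int, page_size: int) -> str:
    """Build an OFFSET LIMIT query for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    offset = page * page_size
    return f"SELECT c.id FROM c ORDER BY c._ts DESC OFFSET {offset} LIMIT {page_size}"


def emulate_offset_limit(docs: list, page: int, page_size: int) -> list:
    """Emulate the query in memory: sort newest first, skip OFFSET items, take LIMIT items."""
    ordered = sorted(docs, key=lambda d: d["_ts"], reverse=True)
    offset = page * page_size
    return [d["id"] for d in ordered[offset:offset + page_size]]


# Hypothetical documents with increasing _ts values (modification time in epoch seconds).
docs = [{"id": f"doc{i}", "_ts": 1_585_000_000 + i} for i in range(25)]

print(page_query(2, 10))
print(emulate_offset_limit(docs, 2, 10))
```

The last page simply comes back shorter. Here, page 2 contains only the five oldest documents.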
The only problem with this is that the higher your OFFSET, the higher your request charge will be. This is because Cosmos DB also processes all the documents that it skips. In one of my accounts where I tried the query above, the initial request charge was 3.5 RUs. With OFFSET 100, the request charge was 12.13 RUs, and with OFFSET 1000 it went up to 89.74 RUs!
Still, that can be a valid option for you, but I’d love to see some improvement on this.
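The three measurements above suggest that the charge grows roughly linearly with the offset. The small calculation below derives the extra cost per skipped document from those measurements. Treating the growth as linear is an assumption based on only three data points, and the figures will differ between accounts, document sizes, and indexing policies.

```python
# Request charges measured for the query above, in RUs, keyed by OFFSET.
measured = {0: 3.5, 100: 12.13, 1000: 89.74}

base = measured[0]
for offset in (100, 1000):
    extra_per_skipped_doc = (measured[offset] - base) / offset
    print(offset, round(extra_per_skipped_doc, 4))
```

Both offsets come out at roughly 0.086 RUs per skipped document. That fits the explanation that Cosmos DB processes every document it skips.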
Support for Transactions
In some applications, if you store one document, you also need to make sure that another document is stored as well. These are typically documents that depend on each other: if one document fails to save, the others might end up orphaned.
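In the relational database world, this has traditionally been handled with transactions. However, Cosmos DB doesn’t support transactions the way a relational database does. Multi-document transactions are limited to documents within the same logical partition (for example, through stored procedures), and broader transaction support is currently not on the roadmap either.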
Still, there are things you can do to mitigate the lack of transactions. For instance, you can use the bulk support in the .NET SDK. Just remember that doing operations in bulk does not guarantee that all documents are reliably saved.
Another thing to keep in mind is that you are not storing rows in tables; you are storing JSON documents. So, if you have a parent-child relationship, such as an order with multiple order lines, you could store the child items in the parent document. Then you won’t have to worry about storing multiple documents, since you are only storing one.
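Here is a minimal Python sketch of that idea. The order and its lines are built as one JSON document, so a single write either stores all of it or none of it. The function and field names are hypothetical.

```python
import json


def build_order_document(order_id: str, customer_id: str, lines: list) -> dict:
    """Embed the order lines in the order itself instead of storing them as separate documents."""
    return {
        "id": order_id,
        "customerId": customer_id,
        "lines": lines,
        # Derived data can live alongside the child items it is computed from.
        "total": round(sum(line["quantity"] * line["unitPrice"] for line in lines), 2),
    }


order = build_order_document(
    "order-1001",
    "customer-42",
    [
        {"product": "pen", "quantity": 3, "unitPrice": 1.5},
        {"product": "paper", "quantity": 2, "unitPrice": 4.25},
    ],
)

# This single JSON payload is what you would write in one create or upsert call.
print(json.dumps(order, indent=2))
```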
At the end of the day, there’s a lot you can do just by applying proper architecture design to your systems.
On-Demand Backup and Restore
Backup is currently performed automatically. If you need to restore, I believe you need to file a support ticket with Azure support. It would be great if I could manage the backup schedule myself, just as you can with SQL Server, for instance. I’d also like to be able to restore manually, should I need to, without having to contact Microsoft.
This seems to be a planned feature that has been added to the roadmap for Cosmos DB. However, keep in mind that it has already been on the roadmap for about 18 months (as of the end of March 2020). Let’s just hope it gets implemented at some point.
Until then, you might be interested in this sample Azure Functions application I wrote a while ago. On a schedule, it copies the JSON documents stored in one or more of your collections.
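The core of such a scheduled copy is small. The Python sketch below is not the sample application mentioned above. It is a minimal illustration that assumes the documents have already been read from a collection into memory, and it writes them to a timestamped JSON Lines file. The function name and file naming scheme are hypothetical. In a real setup, something like a timer trigger would call it on a schedule.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_collection(docs: list, collection: str, target_dir: Path, now: datetime) -> Path:
    """Write a snapshot of the given documents to a timestamped JSON Lines file.

    The naming scheme <collection>-<UTC timestamp>.jsonl is a hypothetical choice.
    """
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = target_dir / f"{collection}-{stamp}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")  # one document per line
    return path


# Example run: two hypothetical documents from an "orders" collection, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    snapshot = backup_collection(
        [{"id": "1"}, {"id": "2"}],
        "orders",
        Path(tmp),
        datetime(2020, 3, 31, 12, 0, tzinfo=timezone.utc),
    )
    print(snapshot.name)
    print(snapshot.read_text(encoding="utf-8").splitlines())
```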
Support Database and Collection in Connection String
If you’re like me, with a long background in the SQL Server world, you are used to specifying all information needed to connect to your data in a single connection string.
In Cosmos DB, you store your JSON documents in collections that are contained in databases. It would be great if you could specify the database and collection in the connection string, in a standard way supported by the .NET SDK. To my knowledge, there is currently no such support. Today, you need the connection string to the Cosmos DB account plus the names of the database and collection before you can start working with your data. It would be great if the connection string were the only thing you needed in your application’s configuration.
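Here is a hypothetical sketch of what this could look like. A Cosmos DB account connection string has the form `AccountEndpoint=...;AccountKey=...;`. The `Database` and `Collection` keys below are invented for illustration and are not recognized by the SDK. The parser splits each pair on the first `=` only, because a Base64 account key usually ends with `=` padding.

```python
from typing import Dict


def parse_connection_string(conn: str) -> Dict[str, str]:
    """Parse 'Key=Value;Key=Value;' pairs into a dict.

    Splits each pair on the first '=' only, since Base64 keys contain '=' padding.
    """
    settings = {}
    for part in conn.split(";"):
        if not part.strip():
            continue  # skip the empty segment after a trailing ';'
        key, _, value = part.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Database= and Collection= are hypothetical extensions, not supported by the Cosmos DB SDK.
conn = (
    "AccountEndpoint=https://myaccount.documents.azure.com:443/;"
    "AccountKey=c2VjcmV0LWtleQ==;"
    "Database=shop;Collection=orders;"
)
settings = parse_connection_string(conn)
print(settings["Database"], settings["Collection"])
```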
Conclusion
So, as you’ve probably understood by now, I’m quite a fan of Azure Cosmos DB. Over the last couple of years, it has really become my preferred choice of data storage, at least for the systems I build in Azure.
If you want to learn more about Azure Cosmos DB and start using it in your applications, I’d like to invite you to have a look at my Getting Started with Cosmos DB tutorial. It will help you avoid the mistakes I made when I started out with Cosmos DB a couple of years back.