There’s no doubt DynamoDB has surged in popularity hand-in-hand with the growth in serverless adoption, helped along by developers fascinated and horrified by Rick Houlihan’s talks from AWS re:Invent explaining single-table design.
Here I run through some of the key advantages and disadvantages you might want to consider when evaluating DynamoDB for your next project.
Operating a DynamoDB database is about as simple as it gets. DynamoDB is a fully-managed service, so you are freed from the headaches of managing servers, capacity planning, scheduling and coordinating backups, multi-AZ setup and other operational concerns that come with self-managed databases. Compared to SQL, the operational overhead of schema management also disappears: you can declare your tables in CloudFormation or Terraform, and the (almost) schemaless document model lets you evolve your document structure in code, instead of as a generally forgotten operational concern to be addressed at release time.
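As a sketch of what “declare your tables” looks like, here is a minimal CloudFormation fragment for a single-table design. The table name, key names and billing mode are illustrative choices, not prescriptions:

```yaml
# Hypothetical single-table declaration; table and attribute names are illustrative.
# Only the key attributes are declared up front - everything else is schemaless.
Resources:
  AppTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: app-table
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: PK
          AttributeType: S
        - AttributeName: SK
          AttributeType: S
      KeySchema:
        - AttributeName: PK
          KeyType: HASH
        - AttributeName: SK
          KeyType: RANGE
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true
```

Note that only the partition and sort key attributes appear in the template; every other attribute lives purely in application code.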
DynamoDB automates a number of operational concerns, including:
- automatic data replication across availability zones for durability and availability
- automatic capacity scaling that tracks usage, so cost stays proportional to load
- data replication between AWS regions using global tables
- quick and simple backups
- point-in-time recovery
- built-in encryption at rest, including with customer-managed keys
A well-thought-out single-table design allows your schema to evolve with minimal fuss. Although data migrations may be required for some significant changes, adding fields and new collections to your table is simply a matter of defining them in your code. Provided the relationships between your core document types are well understood from the beginning, you do not find yourself having to manually run Data-Definition Language (DDL) statements on deploy, or put in place a scheme to apply them.
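To make that concrete, here is a hedged sketch of what “defining them in your code” can look like: the key attributes are fixed, and each document type is just a function that builds an item. The entity names, key prefixes and fields are hypothetical:

```python
# Hypothetical key builders for a single-table design. Only the key
# attributes (PK, SK) are fixed by the table; item shapes live in code,
# so adding a new document type needs no DDL.

def order_item(customer_id: str, order_id: str, total: int) -> dict:
    """Build an Order item keyed under its customer."""
    return {
        "PK": f"CUSTOMER#{customer_id}",
        "SK": f"ORDER#{order_id}",
        "Type": "Order",
        "Total": total,
    }

def invoice_item(customer_id: str, invoice_id: str) -> dict:
    """A collection added later is just another function - no migration."""
    return {
        "PK": f"CUSTOMER#{customer_id}",
        "SK": f"INVOICE#{invoice_id}",
        "Type": "Invoice",
    }
```

Because both types share the customer partition key, they can be fetched together in one query, which is the usual payoff of this layout.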
You get a consistent performance profile as your application grows. DynamoDB scales from hundreds of items to millions of items while delivering the same performance. Even so-called “small” applications tend to have one or two tables that grow far faster than the others, and that growth eventually drags down the performance of the entire application.
Provided you use DynamoDB properly, you never hit that ‘we need to optimise our queries’ moment or slowdowns from large tables affecting the rest of the application.
SQL databases let developers hide performance issues from themselves, which only surface as the usage of a system starts to grow. Designing for DynamoDB requires developers and system architects to understand their access patterns upfront, which means that most potential performance issues can be addressed from the beginning.
With SQL, problems only manifest as tables get bigger and queries no longer perform as well as when the system was small and lightly used. By that stage, you are rushing to understand bottlenecks and fix them while the user experience suffers.
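One common way to do this upfront design is to write down every access pattern before choosing keys. A minimal sketch of such a worksheet, with entirely hypothetical patterns and key choices, might be nothing more than a table in code:

```python
# Hypothetical access-pattern worksheet: each pattern the application
# needs is written down first, then mapped onto a key design that can
# serve it with a single Query. Index and key names are illustrative.
ACCESS_PATTERNS = {
    "get customer by id":       {"PK": "CUSTOMER#<id>", "SK": "PROFILE"},
    "list orders for customer": {"PK": "CUSTOMER#<id>", "SK": "begins_with(ORDER#)"},
    "get order by id":          {"index": "GSI1", "PK": "ORDER#<id>"},
}

def patterns_needing_index():
    """Patterns the base table cannot serve - each needs a GSI."""
    return [name for name, spec in ACCESS_PATTERNS.items() if "index" in spec]
```

If a new feature arrives with a pattern that cannot be expressed this way, that is the signal to redesign keys or add an index before shipping, rather than after the table is large.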
The performance profile of DynamoDB is also easier for developers to understand than SQL’s. Because SQL gives the illusion of flexible, performant access to your data while databases are small, the underlying complexity of indexes, joins and access plans (and the seemingly inscrutable decisions a database makes to implement them) can be a surprising and steep learning curve, one that becomes urgent at the worst times.
DynamoDB, on the other hand, has a very simple (i.e. restrictive) API and set of storage mechanisms, so understanding what does and doesn’t scale in DynamoDB is much easier. Add in the predictable latency of each operation at scale, governed by a simple set of rules, and the performance profile becomes accessible to most developers.
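The restrictiveness is visible in the query model itself: every Query must pin the partition key to one exact value, and only the sort key may be ranged. A hedged sketch, building the request payload as a plain dict (table and key names are hypothetical):

```python
# Minimal sketch of DynamoDB's restricted query model: the partition
# key must be an equality condition; only the sort key can use a range
# operator like begins_with. Table and key names are hypothetical.
def orders_query(customer_id: str) -> dict:
    return {
        "TableName": "app-table",
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"CUSTOMER#{customer_id}"},
            ":sk": {"S": "ORDER#"},
        },
    }
```

A payload of this shape could be passed to a DynamoDB client’s `query` call; the point here is that the API gives you no way to express a query the table cannot serve efficiently.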
DynamoDB for analytics use cases is hard. It’s highlighted again and again that DynamoDB excels at OLTP (transactional) use cases and is inappropriate for OLAP (analytical and data warehousing) use cases, which need to perform complex joins and run over large datasets in a single query. Without going any further into this: unless you understand your analytics use cases well and can build real-time analytics processing with DynamoDB streams, you should look elsewhere for your OLAP requirements.
Text search is inefficient. Another common problem case is free text search, or any situation where you need lots of different indexes. DynamoDB supports Global Secondary Indexes (with a default quota of 20 per table), but leaning on them for this comes with high cost and complexity. An even more awful solution would be to scan and read all the documents on demand.
As awful as it is, Elasticsearch is a better starting point, and a hybrid architecture can easily be achieved by using DynamoDB streams to feed updates into a paired Elasticsearch cluster.
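As a hedged sketch of that hybrid pattern: a Lambda-style handler receives DynamoDB Streams records and maps each `NewImage` onto a document for the search index. The record shape follows the Streams format; the index field names and key layout are hypothetical:

```python
# Hedged sketch: transform a DynamoDB Streams record into a document
# suitable for indexing in Elasticsearch. Field names are hypothetical;
# deletes would be handled separately as index removals.
def to_search_doc(record: dict):
    if record["eventName"] not in ("INSERT", "MODIFY"):
        return None  # skip REMOVE events in this sketch
    image = record["dynamodb"]["NewImage"]
    # DynamoDB attribute values are typed, e.g. {"S": "text"}
    return {
        "_id": image["PK"]["S"] + "|" + image["SK"]["S"],
        "title": image.get("Title", {}).get("S", ""),
    }
```

The stream handler stays small because DynamoDB remains the source of truth; the search cluster only holds a projection of the fields worth searching.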
Modelling your database upfront is difficult. Although you can get away with a small amount of upfront modelling, if your application is pivoting regularly and the business domain is genuinely uncertain, DynamoDB can be hard to work with while your schema is constantly shifting, doubly so if you already have something in production and need to migrate data.
This is where something like SQL may present some advantages as your schema evolves, though in the long run you will still be fighting schema changes, along with the performance issues caused by a lack of planning.
Forrest Brazeal has put together a more comprehensive set of reasons why you shouldn’t use DynamoDB, with some really important points, especially regarding developers not understanding how to use DynamoDB properly and attempting to replicate patterns from MongoDB and SQL databases and quickly running into trouble (a topic I have plenty of thoughts about and will write about soon).