I recently participated in some discussions about where a team should put its data for new, serverless micro-services. The usual suspects were all mentioned: MySQL, PostgreSQL, Aurora, and even MongoDB. I asked "Why?" a lot and received a bunch of answers that reflected a lack of first-hand experience with serverless. I was asked, "So, what's your first pick for a database?" and promptly replied, "I start with DynamoDB." As the conversation continued, I realized that what I was saying would make a good blog post or, at least, a good reference for me in the future.
Why does a serverless architecture affect your database choice? Serverless scales, both up and down, quickly and cleanly. Think of it as elastic architecture. If you have any part of your architecture that isn't as flexible, that is where your system will break down. Too often, the first area to fail is the database. What follows are the reasons I gave in that discussion for why I consider DynamoDB my first pick.
It is easy to own
DynamoDB is a fully managed service. That means you don't have to worry about the underlying hardware or virtual machines. There are no security patches to apply or operating systems to configure. Plus, to set up a table ready to scale up to millions of customers, you only need ten lines of CloudFormation. [1]
It doesn't let you do things that don't scale
Put another way, DynamoDB is built for scale. Or, as Alex DeBrie said, "DynamoDB won't let you write a bad query." When first encountering DynamoDB, RDBMS engineers often complain about the lack of aggregation (sum, count, avg, min, max, etc.) or cross-table joins/transactions. But these are exactly the kinds of operations that prevent an RDBMS from scaling. Under load, it will be these table-wide aggregation/joining/locking operations that will tip over your database. This isn't a bug of DynamoDB, it's a feature.
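One common way to live without a table-wide count is to maintain the aggregate at write time. The sketch below only builds the request parameters for DynamoDB's UpdateItem operation, so it runs without AWS credentials. The table name, the `id` key (borrowed from the CloudFormation footnote), and the `order_count` attribute are assumptions for illustration, not something from the original discussion.

```python
from typing import Any, Dict


def build_counter_update(table_name: str, counter_id: str, amount: int = 1) -> Dict[str, Any]:
    """Build keyword arguments for a low-level DynamoDB UpdateItem call.

    The ADD action increments a numeric attribute atomically and creates
    the attribute if it does not exist yet.
    """
    return {
        "TableName": table_name,
        # Assumes a table whose partition key is a string attribute named "id".
        "Key": {"id": {"S": counter_id}},
        # "#c" is a name placeholder; "order_count" is a hypothetical attribute.
        "UpdateExpression": "ADD #c :inc",
        "ExpressionAttributeNames": {"#c": "order_count"},
        # The low-level API encodes numbers as strings tagged with "N".
        "ExpressionAttributeValues": {":inc": {"N": str(amount)}},
        "ReturnValues": "UPDATED_NEW",
    }


if __name__ == "__main__":
    request = build_counter_update("test-table", "customer#42")
    print(request["UpdateExpression"])
```

With boto3 installed and credentials configured, the result could be passed along as `boto3.client("dynamodb").update_item(**request)`. Reading the count is then a single-item lookup instead of a scan over every record.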
It has strong, fine-grained authorization via IAM
If you are using AWS, you are familiar with the IAM service and how it provides fine-grained control over resources and which actions can be performed. When using DynamoDB, you don't need to learn a second (or third) set of database-specific permission systems. You can keep using IAM to manage your DynamoDB access.
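As a rough illustration of what "fine-grained" can look like, here is a minimal sketch that builds an IAM policy document granting a handful of DynamoDB actions on one table. The region, account ID, table name, and the chosen actions are placeholders I picked for the example; adjust them to your own setup.

```python
import json
from typing import Any, Dict, List


def build_table_policy(
    region: str, account_id: str, table_name: str, actions: List[str]
) -> Dict[str, Any]:
    """Return an IAM policy document scoped to a single DynamoDB table."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the listed actions are allowed; everything else is
                # denied by default.
                "Action": [f"dynamodb:{action}" for action in actions],
                "Resource": table_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical read-only role for a Lambda function.
    policy = build_table_policy(
        "us-east-1", "123456789012", "test-table", ["GetItem", "Query"]
    )
    print(json.dumps(policy, indent=2))
```

A function that only reads gets `GetItem` and `Query`; a function that writes gets `PutItem` or `UpdateItem`. Nothing here is specific to the database: it is the same policy language used for every other AWS resource.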
It supports an (almost) unlimited number of connections
I'm a serverless advocate and AWS Lambda is my compute of choice; it scales marvelously. I can support tens of thousands of concurrent Lambda executions without lifting a finger. But if your database is MySQL, PostgreSQL, RDS, or MongoDB, you will have a bad day. Why? Each one of these databases uses traditional, long-lived "connections" and is strictly limited in the number it can support at one time. 40K concurrent Lambdas all trying to read data? Good luck establishing a new database connection. Not so with DynamoDB. It uses HTTPS and provides connections that scale perfectly with Lambda. [2]
It doesn't require networking tricks to keep it secure
DynamoDB is a zero-trust service. It checks your access (via IAM) on every call. It doesn't assume that your location in a network means you are authorized to do what you ask. It also means you don't need a VPN or bastion host to reach the database.
It has an easily accessible event stream for publishing change events
With DynamoDB Streams, my database becomes an event source. I can easily emit change events after they have been written without having to manage both a database write and a publish in the same process space. I can separate concerns (write here; publish there) and simplify my code. And, using an architecture of DynamoDB Streams, Lambda, and EventBridge, I have a scalable, robust, configurable event system that is fully managed, with no servers to maintain.
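The "write here; publish there" split can be sketched as a pure function that a stream-triggered Lambda might call. It maps DynamoDB Streams records to entries shaped for EventBridge's PutEvents operation. The source name, the detail-type names, and the bus name are assumptions for illustration, and whether `NewImage` or `OldImage` is present depends on the stream view type configured on the table.

```python
import json
from typing import Any, Dict, List

# Hypothetical mapping from stream event names to domain event names.
DETAIL_TYPES = {
    "INSERT": "ItemCreated",
    "MODIFY": "ItemUpdated",
    "REMOVE": "ItemDeleted",
}


def to_eventbridge_entries(
    stream_event: Dict[str, Any],
    source: str = "example.orders",
    bus_name: str = "default",
) -> List[Dict[str, str]]:
    """Convert a DynamoDB Streams Lambda event into PutEvents entries."""
    entries = []
    for record in stream_event.get("Records", []):
        change = record["dynamodb"]
        detail = {
            "keys": change["Keys"],
            # Images may be absent, depending on the stream view type.
            "new_image": change.get("NewImage"),
            "old_image": change.get("OldImage"),
        }
        entries.append(
            {
                "Source": source,
                "DetailType": DETAIL_TYPES[record["eventName"]],
                # EventBridge expects Detail as a JSON string.
                "Detail": json.dumps(detail),
                "EventBusName": bus_name,
            }
        )
    return entries


if __name__ == "__main__":
    sample = {
        "Records": [
            {
                "eventName": "INSERT",
                "dynamodb": {
                    "Keys": {"id": {"S": "42"}},
                    "NewImage": {"id": {"S": "42"}},
                },
            }
        ]
    }
    print(to_eventbridge_entries(sample))
```

In a real handler the returned list could be sent with boto3 via `boto3.client("events").put_events(Entries=entries)`, in batches of at most ten entries per call. The code that writes to the table never needs to know that events are being published.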
It performs as fast with 100 records in a Table as it does with 1,000,000,000 records in a Table.
DynamoDB is built for scale, but it is also built to perform consistently fast. You can think of DynamoDB as a "giant, distributed hash table" (The DynamoDB Book, Alex DeBrie) that gives you predictably fast results at any size. Further, since DynamoDB doesn't let you do non-scalable things, you cannot crush the system with a bad query and impact its performance for your other users.
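To make the "giant, distributed hash table" picture concrete, here is a toy model: the key is hashed to choose a partition, and the lookup touches only that partition. This is a simplified sketch of the idea, not DynamoDB's actual implementation; the class name, the use of SHA-256, and the fixed partition count are all my own assumptions.

```python
import hashlib
from typing import Any, Dict, List, Optional


class ToyPartitionedTable:
    """A toy key-value table that routes each key to one partition by hash."""

    def __init__(self, partition_count: int = 4) -> None:
        self._partitions: List[Dict[str, Dict[str, Any]]] = [
            {} for _ in range(partition_count)
        ]

    def _partition_index(self, key: str) -> int:
        # Hash the key and reduce it to a partition number.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._partitions)

    def put_item(self, key: str, item: Dict[str, Any]) -> None:
        self._partitions[self._partition_index(key)][key] = item

    def get_item(self, key: str) -> Optional[Dict[str, Any]]:
        # Only one partition is consulted, however many items exist overall.
        return self._partitions[self._partition_index(key)].get(key)


if __name__ == "__main__":
    table = ToyPartitionedTable()
    table.put_item("customer#42", {"name": "Ada"})
    print(table.get_item("customer#42"))
```

The point of the toy is that a read by key does the same amount of work whether the table holds a hundred items or a billion: hash the key, go to one partition, fetch one item.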
It automatically scales to your use
DynamoDB scales out, and it does this automatically. It re-balances partitions behind the scenes and grows with your data, preserving consistent performance the whole time. Further, if you configure the table for on-demand billing, you don't have to manage provisioned read and write capacity either.
I never have to come in on a Sunday to "test the database" after the Ops team resizes all the instances underneath our database cluster.
Once, I observed a company's Operations team announce that they were planning on vertically scaling their RDBMS instances (over a "maintenance window" because of the downtime they were likely to induce) and they were looking for volunteers to validate that they didn't break anything. And, since this maintenance window was early Sunday morning, the volunteers needed to work the weekend. DynamoDB, with its built-in horizontal scaling, scales in real-time, under load, with no downtime. Plus, you never need to cancel your weekend plans to know it works. It works!
When you aren't using it, it costs nothing
If you use the on-demand pricing model, DynamoDB only charges you for the reads/writes performed. No reads or writes in the last hour? It costs you nothing. This allows you to have production-grade databases in every environment: dev, stage, and production. Try to do this with any other DB system and costs will start stacking up. Pretty soon you'll hear things like "We don't really need read replicas in staging" or "We can get by with a single-node cluster in dev" and you will find you are no longer testing against a production-like system. With DynamoDB and on-demand pricing, you never have this problem.
It's easy to own
Oh, you say I covered this already? First, thank you for reading the entire article; you rock. Second, this is so important that I wanted to mention it twice. DynamoDB is the easiest database for your engineering team to own. Ownership matters.
Further Reading
Alex DeBrie: SQL, NoSQL, and Scale: How DynamoDB scales where relational databases don't
AWS: DynamoDB Pricing Options
The DynamoDB Book
Pete Naylor: DynamoDB Data Modeling Series
Alex DeBrie’s inspiration for my title: Why I (Still) Like the Serverless Framework over the CDK
[1] The ten lines of CloudFormation:
Type: AWS::DynamoDB::Table
Properties:
  TableName: test-table
  BillingMode: PAY_PER_REQUEST
  AttributeDefinitions:
    - AttributeName: id
      AttributeType: S
  KeySchema:
    - AttributeName: id
      KeyType: HASH
[2] This connection restriction is what Amazon RDS Proxy is designed to solve. So, if you have to use a relational database in your application, RDS Proxy will let you pair it with a highly elastic compute source like AWS Lambda.