Predictable performance is the real killer feature of NoSQL. NoSQL databases aren’t necessarily faster, but it’s easier to understand what exactly a query is mechanically doing (e.g. scanning this table) and therefore easier to understand query performance under load. Performance degrades slowly and predictably under load instead of suddenly and without warning when the query planner changes its mind. The sharding story is also mechanically simpler, which makes it easier to understand how sharding and adding nodes impacts performance.
The trade-off is that lots of these concerns (how should my query work mechanically?) get pushed onto the application developer, and the application has to build its own joins, FKs, and integrity constraints. Your schemas end up being more tailored for your specific access patterns, and it’s harder to write new queries ad-hoc. For a new query to be performant, you’ll sometimes need to build a new set of tables.
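To make the "plan your access patterns up front" point concrete, here is a minimal single-table sketch in Python. All names (customers, orders, the key formats) are my own illustration, not from any real schema; the last function just models mechanically what a DynamoDB `Query` with a `BETWEEN` sort-key condition does, so you can see why there is no ad-hoc JOIN to fall back on.

```python
# Hypothetical single-table design for a customers-and-orders app.
# Each access pattern gets a key shape decided up front.

def customer_item(customer_id: str) -> dict:
    """Customer profile: fetched by an exact-key GetItem."""
    return {"PK": f"CUSTOMER#{customer_id}", "SK": "PROFILE"}

def order_item(customer_id: str, order_date: str, order_id: str) -> dict:
    """Orders share the customer's partition, so one Query returns a
    customer's orders in sort-key (date) order."""
    return {"PK": f"CUSTOMER#{customer_id}",
            "SK": f"ORDER#{order_date}#{order_id}"}

def orders_in_range(items: list, customer_id: str,
                    start: str, end: str) -> list:
    """Mechanically what a Query with
    'PK = :pk AND SK BETWEEN :a AND :b' does: a bounded read of one
    partition, results sorted by SK. Nothing else is possible without
    a new index or a new table."""
    pk = f"CUSTOMER#{customer_id}"
    lo, hi = f"ORDER#{start}", f"ORDER#{end}\uffff"
    return sorted(
        (it for it in items if it["PK"] == pk and lo <= it["SK"] <= hi),
        key=lambda it: it["SK"],
    )
```

A query like "all orders for a product, across customers" is simply not expressible against these keys; that is the new-set-of-tables (or new GSI) case.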
Postgres vs. Dynamo is a lot like functional vs. imperative: you’re trading expressiveness for more predictable and controllable performance. AWS is an example of an organization where this trade often makes sense - they gave a great talk about how they require teams to use Dynamo so that performance stays stable under high load.
I enjoy using DynamoDB both at work and in side projects. As long as you plan your queries and use cases ahead of time, DynamoDB rewards you with hassle-free, dependable performance. But, unlike with relational databases, a poor schema can hose you: it’s difficult to migrate or index your way out of, say, a poorly distributed partition key.
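As an illustration of why a poorly distributed partition key is hard to escape, here is a sketch of write sharding, a common workaround (names and shard count are my own assumptions): a deterministic suffix spreads a hot key across N partitions, at the cost of readers having to fan out across every shard - and note that N is baked into the keys, so changing it later is itself a migration.

```python
import hashlib

# Hypothetical: all events were written under PK="EVENTS", funneling
# every read and write into one partition. Write sharding spreads them.

N_SHARDS = 10  # chosen up front; changing it means rewriting keys

def sharded_pk(base: str, item_id: str, n_shards: int = N_SHARDS) -> str:
    """Deterministic shard suffix, so the same item always lands in
    the same partition and can be found again."""
    shard = int(hashlib.sha256(item_id.encode()).hexdigest(), 16) % n_shards
    return f"{base}#{shard}"

def read_keys(base: str, n_shards: int = N_SHARDS) -> list:
    """Readers must query every shard and merge - the price of the fix."""
    return [f"{base}#{i}" for i in range(n_shards)]
```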
Yes, AWS teams must justify using a relational database when launching a new service or feature; otherwise they are encouraged to use DynamoDB. But DynamoDB is not good for all workloads. There are no magic bullets for storing and accessing state at scale.
I also like the parts of DynamoDB that let you automatically back up the table and restore it to any point in time within a minute, or export it to S3. This is the part I work on. The problems being solved are quite interesting.
In terms of predictability, many different parts of DynamoDB work together to ensure that overload results in consistent throttling, followed by attempts to scale out. This is something I took away from the Roblox Consul outage: how hard it is to reason about distributed-system failures during overload, and how consistent, predictable throttling is the best outcome - one that is very difficult to deliver.
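Consistent throttling only pays off if clients respond to it sensibly. A minimal sketch (my own, not DynamoDB's SDK internals) of capped exponential backoff with full jitter, the retry strategy AWS generally recommends for throttled requests, so retries spread out instead of hammering a recovering partition in lockstep:

```python
import random

def backoff_delay(attempt: int, base: float = 0.05, cap: float = 20.0,
                  rng=None) -> float:
    """Seconds to sleep before retry number `attempt` (0-based):
    uniform in [0, min(cap, base * 2**attempt)]. The randomness
    ("full jitter") de-synchronizes clients; the cap keeps waits
    bounded during a long brownout."""
    rng = rng or random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))
```

In practice you would loop: attempt the request, and on a throttling error sleep for `backoff_delay(attempt)` before trying again, up to some retry budget.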
I work on a part of DynamoDB; my opinions are my own.