Related:
https://calpaterson.com/against-database-teardown.html
For multitenant systems, just not rolling back or tearing down the DB works almost for free. You just gotta have nice query wrappers that take a user key as the first parameter.
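Something like this, sketched in Python (the table, column, and function names are invented for illustration):

    import psycopg2  # assuming conn is a DB-API connection, e.g. psycopg2

    def fetch_orders(conn, user_key):
        # The tenant/user key is always the first data parameter, so
        # tests sharing one long-lived database only ever see rows
        # created under their own key.
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, total FROM orders WHERE user_key = %s",
                (user_key,),
            )
            return cur.fetchall()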
I would say that if you have good fixtures, you can also avoid a lot of “create one user per test” churn by sharing the user, and just querying in the “right way”. If you’re using something like pytest with proper tagging and fixture support, you can even port big test suites to this model through opt-ins. Why would you do it? Performance gains of course!
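A sketch of what the opt-in could look like with pytest (the db fixture, its methods, and the marker are all made-up names):

    import pytest

    @pytest.fixture(scope="session")
    def shared_user(db):
        # One user for the whole session instead of one per test.
        return db.create_user(name="shared-fixture-user")

    @pytest.mark.shared_db  # opt-in marker for tests ported to this model
    def test_order_listing(db, shared_user):
        # Querying “the right way”: scoped to the shared user’s key,
        # so leftover rows from other tests don’t matter.
        db.insert_order(user_key=shared_user.key, total=42)
        orders = db.fetch_orders(user_key=shared_user.key)
        assert [o.total for o in orders] == [42]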
I was thinking about this throughout the whole post (not this link specifically, but the general concept).
Isolation sounds good and has an elegance to it, but if your code (and tests) craps itself in the presence of slightly unexpected data, maybe you have bigger problems.
I don’t think it’s about the code crapping itself; it’s more about avoiding flaky tests and data pollution if you need to check certain properties.
That’s the primary ticket. IME data pollution means uncontrolled inputs, which means the tests have to be built to accommodate that. That either makes them much harder to write (e.g. you need a lot of precondition checking, since you don’t control the preconditions, and randomised test data to ensure you don’t collide with the existing garbage)… or leads to tests which barely test anything, because they can’t deterministically and reliably interact with the system or test its properties.
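The randomisation is usually something small like this (illustrative):

    import uuid

    def unique_email():
        # Random suffix so this test's rows can't collide with
        # whatever earlier runs left behind in the shared database.
        return f"user-{uuid.uuid4().hex[:8]}@example.test"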
The advantages are largely unconvincing: test data is not generally realistic (since the goal is to exercise edge cases) unless the tests are very weak, and while detecting slow access patterns is a good idea… previous test data is unlikely to reveal that, because the database is not in any sort of realistic shape or under realistic load; it pretty much just depends on what tests you ran. Setting up proper benchmarks, either using database presets or generative whole-system setup, actually has a chance of doing that.
If you use Python, testing.postgresql is a nice library.
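Basic usage is roughly this (worth double-checking against its README):

    import testing.postgresql
    from sqlalchemy import create_engine, text

    # Postgresql() boots a throwaway postgres instance in a temp
    # directory and deletes it, data and all, when the block exits.
    with testing.postgresql.Postgresql() as postgresql:
        engine = create_engine(postgresql.url())
        with engine.connect() as conn:
            conn.execute(text("CREATE TABLE items (id serial PRIMARY KEY)"))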
I will be surprised if nobody has mentioned Neon yet.
We use (and built) tempgres for the same purpose. It provides a simple REST API to obtain credentials for a fresh database, and is easy to run locally, as a shared service, or even a sidecar for each CI build.
https://github.com/ClockworkConsulting/tempgres-server
I’ve used https://testcontainers.com/ for a project and it’s really nice. It’s essentially the ‘run Postgres in a container’ option, but the code to create it and tear it down after are integrated into your tests (rspec for Rails for instance or Go) so there’s no extra/external scripting for the test environment. It has libs for most platforms/languages and was dead easy to get going.
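The Python flavour, for instance, is about this much code (same idea in the other languages):

    from testcontainers.postgres import PostgresContainer
    from sqlalchemy import create_engine, text

    # The container is started here and removed when the block exits;
    # no external scripting or compose file for the test environment.
    with PostgresContainer("postgres:16-alpine") as postgres:
        engine = create_engine(postgres.get_connection_url())
        with engine.connect() as conn:
            print(conn.execute(text("SELECT version()")).scalar())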
parallel testing can be done against a single postgresql (docker) container by having multiple databases or, more conveniently (at least with the official postgresql container image), multiple ‘schemas’ within the same database (postgresql docs).
test setup code creates a schema and creates the tables, etc, and each connection can be used as-is, except it needs to execute this query first, which hopefully can be done automagically at the db library level (python sqlalchemy example):

    SET search_path TO schema_test_1; -- 2, 3, etc.
the end result is parallelism without the complexity of having multiple postgresql instances: just a single container, simpler docker-compose / CI config, single db connection url, etc.
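the “automagically” part can be a connect-event hook in sqlalchemy, roughly like this (schema name hardcoded for brevity; in practice it would come from the test worker id):

    from sqlalchemy import create_engine, event

    engine = create_engine("postgresql://app:app@localhost/testdb")

    @event.listens_for(engine, "connect")
    def set_search_path(dbapi_conn, connection_record):
        # runs once for every new dbapi connection, so every pooled
        # connection is already pointed at this worker's schema
        cursor = dbapi_conn.cursor()
        cursor.execute("SET search_path TO schema_test_1")
        cursor.close()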
Interesting. I’ve always used the transaction method successfully, but it brings a bit of complexity to the tests.
I’ll give this method a try; it looks pretty good.
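For comparison, the transaction method is usually a fixture like this (sketched with pytest + SQLAlchemy; the complexity shows up once the code under test manages its own transactions):

    import pytest
    from sqlalchemy import create_engine
    from sqlalchemy.orm import Session

    engine = create_engine("postgresql://app:app@localhost/testdb")

    @pytest.fixture()
    def db_session():
        # Everything the test writes stays inside this transaction
        # and vanishes on rollback.
        connection = engine.connect()
        transaction = connection.begin()
        session = Session(bind=connection)
        try:
            yield session
        finally:
            session.close()
            transaction.rollback()
            connection.close()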
I’m using this pattern and love it. It is fantastic. I am a firm believer that incrementally adding tests across the life of a codebase should not linearly increase test run time, and this was the only way I could achieve that.
Using memory-backed disks hasn’t given me the benefits I would expect for Postgres, but NVMe drives are so fast that I haven’t found it necessary. However, I hadn’t tested this in conjunction with Docker, so I may try again.
I was also very discouraged to see that most modern solutions for Rails use a fixed (and by default low) number of parallel DBs. I have DB forking in use in a fairly small Rails codebase; if anyone is interested, I’d be happy to share it as a gist.
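(The forking here is presumably Postgres template databases; at its core it’s one statement per worker, something like:)

    import psycopg2

    # Assumption: "DB forking" means CREATE DATABASE ... TEMPLATE, which
    # copies a migrated template database file-for-file, which is far
    # faster than re-running migrations for every parallel worker.
    admin = psycopg2.connect(dbname="postgres")
    admin.autocommit = True  # CREATE DATABASE can't run in a transaction
    with admin.cursor() as cur:
        for i in range(8):  # one database per parallel test worker
            # the template must have no active connections while copying
            cur.execute(f"CREATE DATABASE test_{i} TEMPLATE test_template")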