A generic solution that doesn’t require Docker is a tool/library called eatmydata: https://github.com/stewartsmith/libeatmydata.
Using LD_PRELOAD or a wrapper executable, libeatmydata essentially turns fsync() and other APIs that try to ensure durability into no-ops. If you don’t care about data durability, you can aggressively enable eatmydata to get a substantial speedup for workloads that call into these expensive APIs.
eatmydata is also useful when testing other applications that [ab]use fsync including build systems.
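To see why no-op'ing fsync() pays off so much, here's a minimal Python microbenchmark (hypothetical, not from the article) that writes the same records with and without an fsync() per record — the durable version is typically orders of magnitude slower on a real disk:

```python
import os
import tempfile
import time

def write_records(path: str, n: int, durable: bool) -> float:
    """Append n small records, optionally fsync()ing after each one."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for i in range(n):
            f.write(b"record %d\n" % i)
            f.flush()
            if durable:
                os.fsync(f.fileno())  # the call libeatmydata turns into a no-op
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    fast = write_records(os.path.join(d, "fast.log"), 200, durable=False)
    slow = write_records(os.path.join(d, "slow.log"), 200, durable=True)
    print(f"no fsync: {fast:.4f}s, fsync per record: {slow:.4f}s")
```

The gap depends heavily on the underlying storage; on a tmpfs-backed filesystem both paths are fast, which is exactly the point of the techniques in this thread.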
fsync() is also the reason why people believe that ramfs is just always faster than drives. Very often the kernel does a good job of caching data in memory, and drives perform as well as ramfs… once you disable fsync.
At least on Linux (where I’ve measured it), tmpfs is in fact significantly faster than persistent filesystems even for cached (purely in-memory) operations.
Whether your application is filesystem-intensive enough for it to matter is another question.
…did you disable fsync() in your benchmarks?
My measurements were taken with purpose-built, hand-written microbenchmarks. There was no fsync to “disable”.
Another Dockerless alternative: put your data directory on a tmpfs file system. If you want to make setup a breeze, you can keep your pre-start state on a real disk and rsync it to tmpfs at the start of every test run.
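That pre-seeding step can be sketched in Python too (the function name and paths here are illustrative; `rsync -a` from a shell script works just as well):

```python
import os
import shutil
import tempfile

def reset_data_dir(seed_dir: str, ram_dir: str) -> str:
    """Copy the pristine pre-start state into RAM-backed storage
    (e.g. somewhere under /dev/shm on Linux) before each test run."""
    target = os.path.join(ram_dir, "db-data")
    if os.path.exists(target):
        shutil.rmtree(target)          # throw away the previous run's state
    shutil.copytree(seed_dir, target)  # the seed itself stays on real disk
    return target

# Illustration with throwaway directories standing in for the real paths:
seed = tempfile.mkdtemp()
with open(os.path.join(seed, "base.db"), "w") as f:
    f.write("pristine state")
ram = tempfile.mkdtemp(dir="/dev/shm" if os.path.isdir("/dev/shm") else None)
data_dir = reset_data_dir(seed, ram)
print(sorted(os.listdir(data_dir)))  # ['base.db']
```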
For mysql, other than what folks have already said, you can use the MEMORY engine for your database.
For testing the sled database a few thousand times per day, I do something similar by running most tests in /dev/shm.
In my experience, >99% of the performance benefit of running your database transactions on tmpfs comes from the disabling of fsync. Even copying databases to and from tmpfs, which still incurs the write load back to the hard disk, gave me a 100x speedup when migrating many small databases in parallel.
This doesn’t look specific to Docker, apart from Docker making it quick to set up a database.
Yeah, it’s just that Docker makes it much easier to both get the speedup, and more broadly makes it easier to just start your database as part of your unit test run.
I’m afraid the speedup doesn’t really happen on OSes where Docker runs in a VM like macOS. At my daily job, I’ve measured that our automated tests are around 30% slower when the database runs in Docker.
One should also consider using a ramdisk in addition to disabling fsync.
Does a ramdisk work for VM-based Docker environments? Curious whether you were able to find any speedups in that environment.
I don’t know, it’s been a long time since I stopped running databases in Docker on macOS for performance reasons. Nowadays, I simply open a terminal and start the plain old database binary in the foreground on a plain old macOS ramdisk with fsync disabled. I’m not sure the ramdisk really improves things, but I’ve a few gigabytes of memory to waste, so…
Did most of these at a previous job, except instead of Docker we used LXC. Another thing that is not mentioned in the article but probably should be is caching the empty state of the database schema. When you get into hundreds or thousands of migrations, setting them up on a blank database can take non-trivial time, especially if you use something like Django migrations.
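One way to sketch that caching (everything here is a hypothetical helper, assuming a file-based database for simplicity): fingerprint the migration files, run the migrations only on a cache miss, and hand each test run a cheap copy of the cached result.

```python
import hashlib
import os
import shutil

def migrations_fingerprint(migrations_dir: str) -> str:
    """Hash the migration files so the cache invalidates when they change."""
    h = hashlib.sha256()
    for name in sorted(os.listdir(migrations_dir)):
        with open(os.path.join(migrations_dir, name), "rb") as f:
            h.update(name.encode() + b"\0" + f.read())
    return h.hexdigest()

def cached_blank_db(migrations_dir: str, cache_dir: str, run_migrations) -> str:
    """Return a freshly-migrated database file, rebuilding it only
    when the set of migrations has changed."""
    cached = os.path.join(cache_dir, migrations_fingerprint(migrations_dir) + ".db")
    if not os.path.exists(cached):
        run_migrations(cached)   # slow path: hundreds of migrations, run once
    fresh = cached + ".working"
    shutil.copy(cached, fresh)   # cheap per-run copy the tests may dirty
    return fresh

# Illustration with throwaway directories:
import tempfile
mig, cache = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(mig, "0001_init.sql"), "w") as f:
    f.write("CREATE TABLE t (id INT);")
calls = []
def fake_migrate(path):  # stands in for e.g. `manage.py migrate`
    calls.append(path)
    with open(path, "w") as f:
        f.write("migrated schema")
db1 = cached_blank_db(mig, cache, fake_migrate)
db2 = cached_blank_db(mig, cache, fake_migrate)
print(len(calls))  # 1 — migrations ran once, the second run hit the cache
```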
Interesting, I like your point about realism. I’ve been writing some tests that use a database mock instead of a real database for performance reasons, even though the only databases supported in that codebase are memory-based. It could be argued that if you want to unit test a library, you don’t want to test its storage layer. Of course, the rules blend when some business logic relied on by the library is enforced in the storage layer (for example, uniqueness).
I wonder if there is an easy option in MySQL to turn off fsync?
docker run --mount type=tmpfs,destination=/var/lib/mysql mysql is one way: it stores the data on a RAM disk. There are a bunch of configuration options too, but this is easier.
docker run --mount type=tmpfs,destination=/var/lib/mysql mysql
Interesting technique. I wonder how it compares to running the database on a ramdisk?
Edit: derp, like three other people already proposed this.
Counterpoint: “Don’t set fsync=off if you want to keep your data”. fsync=off is the option the author uses in this article. You may want to keep this in mind for production.
If you’re running in production you care a lot more about keeping your data, and that’s why the default is to use fsync.
But if you’re running tests you don’t care about keeping your data, you care about speed.
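A minimal guard for that split (a Python sketch; `postgres -c fsync=off` is the setting the article uses, but the helper name and the test flag are assumptions):

```python
def server_command(testing: bool) -> list:
    """Build the postgres command line; fsync=off only ever in test mode."""
    cmd = ["postgres"]
    if testing:
        cmd += ["-c", "fsync=off"]  # safe here: test data is disposable
    return cmd

print(server_command(testing=True))   # ['postgres', '-c', 'fsync=off']
print(server_command(testing=False))  # ['postgres']
```

Wiring the unsafe setting through an explicit flag like this makes it much harder to ship it to production by accident.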
Best practice in verification and validation is to ensure that what you tested is what’s actually running. Turning a component on might create extra interactions in the system that lead to a failure that wouldn’t happen during testing while it’s off. With it off, you can still test most of the code, and do it faster. I think the ideal approach here is to do the majority of your tests with it off, followed by a batch of tests across the same components with it on.
That way you know you’ve tested the actual, running configuration. As I think on it, this pattern of risk and optimal strategy for testing appears to be same as mock-up vs real-system testing. The app with fsync off is a highly-accurate mock-up of the real thing. Just takes one change to become real thing.
@itamarst is suggesting to keep the test database in RAM, so I don’t think keeping it is a priority.
That said, definitely worth being aware of the risks, in case someone does try to push that to production :)
Skipping fsync just means you are going to lose data.
The context is for running tests. No one is suggesting turning it off in production.