I want to hear the dumbest, silliest, most “ahahaha what was I/they thinking” slowdowns you’ve had to fix.
Here’s mine: we had a test suite that took 30 minutes to finish on every build. Okay, split it across multiple machines, right? With two machines it dropped to 20 minutes each, then with four machines it dropped to… 19 each. Eight machines, 18. So something weird was going on.
After some looking at logs and reading the setup, I found the problem. The test runner was using Ruby to seed the database with two million sample organizations. Then it generated ten departments for each organization, and ten employees for each department, before finally dropping the whole database and starting the unit tests. The whole thing wasn’t even used!
Removing that line brought our test runs to a more manageable level.
It’s more of a you-had-to-be-there thing but off the top of my head, the time my boss turned off memcached and the system got faster was pretty funny.
I am pretty confident I am going to do that this week as well. There is this Redis cache with a single huge JSON key to avoid a straightforward indexed inner join. I believe that decoding that JSON costs more in terms of latency than the join would.
Sometimes people with only hammers see every problem as a nail. This is especially bad in web app development.
In this particular case, if I really wanted to avoid that join, I would have used an in-process cache. Sure, with multiple instances and a load balancer I would keep the cache multiple times, but it’s not that large and since I need to unpack it from JSON every time anyway, the memory usage would definitely be lower.
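Roughly this kind of sketch, assuming the data is small, read-mostly, and fine to refresh on a TTL (all names here are made up):

```python
import threading
import time

_CACHE_TTL = 300.0  # seconds; tune to how stale the data is allowed to be
_lock = threading.Lock()
_cached = None
_loaded_at = 0.0

def load_from_db():
    """Placeholder for the plain indexed join the Redis blob was avoiding."""
    ...

def get_lookup_data():
    """Return the lookup data, reloading it at most once per _CACHE_TTL per process."""
    global _cached, _loaded_at
    with _lock:
        if _cached is None or time.monotonic() - _loaded_at > _CACHE_TTL:
            _cached = load_from_db()
            _loaded_at = time.monotonic()
    return _cached
```

Each worker process keeps its own copy, but there is no JSON round-trip and no extra network hop.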
But web developers are sometimes stuck in the PHP mindset of not being able to keep any data outside the request handler and if they need some, they immediately grab an out-of-process storage. Which is weird when they write the application as a persistent Python process.
Sorry for the slightly OT “rant”. :-)
Hm, I can think of a couple.
One was a test of a regex engine. I wanted to test it against “big” inputs but I messed up and the automatically generated test inputs ended up being gigabytes in size instead of a couple of hundred KB.
There was a database issue where one of the columns had been set to Latin-1 encoding (or something, I don’t remember exactly) whereas everything else was UTF-8. The DB engine was falling into a state where it did an entire table scan, transcoding each row to compare to the key.
I also worked on a product where one developer insisted on a certain feature: logging events for replay later to rebuild the database if something happened. This was not a good fit for the product (a high speed network monitor) but they kept insisting and eventually threatened to quit if we didn’t give this idea a try and some office politics came into play so…every new TCP connection of interest, every alert detected by our system, every whatever, did a transactional insert of a JSON blob to a database table. The (large) JSON blob stored all the information about the event.
Nothing ever read from that table. Ever. And the table rapidly became gigs and gigs in size, causing our automated DB cleanup to run constantly. This stupid table had been prioritized over customer-visible tables, so customers’ old data was being deleted much sooner than it otherwise would have been to make room for this damn table. Performance tanked, etc. We had to add a flag to disable that subsystem, which we turned on by default, but politics prevented us from deleting the subsystem entirely.
The developer decided that our complaining was because the rest of the system couldn’t keep up with their awesome idea and finally ended up quitting in a huff a few months later.
Nuking all that code the day after they left was cathartic.
Sounds like that developer heard “event sourcing” and didn’t really think about the size of the events they were storing.
It was event sourcing and they also were not concerned with the frequency of these events, the latency introduced by logging them, the size of the events, or the extra database load.
Wow. I’m pretty nuts about logging the shit out of everything in my applications, but I always take performance and data storage into account. No application I write goes live without configured log rotation and retention policies. Insane amounts of logging are much more sustainable if they stay a constant size relative to load.
Not one of mine, but I heard this from some Apple folks:
They got a new version of an Apple CPU that should have been faster across the board but found it made one key framework a lot slower. It turned out that the old CPU was mispredicting a branch, which caused a pipeline bubble that cost a dozen or so cycles. During this incorrect speculation, it was loading some data that would be needed a thousand or so cycles later. Once the branch predictor was improved, the incorrect speculation stopped happening and they then stalled for a couple of hundred cycles on the load. Once they understood the cause, it was easy to add a prefetch in the right place.
In hindsight, this should have been when I realised that Spectre vulnerabilities were possible.
A common issue when trying to optimise systems with built-in prefetching: you find an expensive operation which does loads of prefetching, you optimise out the prefetching, and the operation gets slower.
Turns out the prefetching did a great job, the data was needed a bit later but now instead of being bulk prefetched it’s loaded on-demand, and the number of queries has increased by 3 orders of magnitude.
I’ve wasted a lot of time on profiler-guided deoptimisations.
We’d written our own code to read from a socket in Java. Someone who was most definitely probably maybe not me had, for no good reason, hardcoded 128-byte reads from a socket that regularly had several megabytes to read. We didn’t catch it for several releases, until someone else who was definitely not me introduced a bug that caused that socket to be read once instead of until it closed or we hit the number of bytes expected. We celebrated a significant performance increase in the next release, having increased to something like 128-kilobyte reads, if not megabyte reads.
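The original was Java, but the mistake translates directly; here is a hedged Python sketch of the same read loop, where the chunk size is the whole story:

```python
import socket

def read_exactly(sock: socket.socket, nbytes: int, chunk_size: int = 128 * 1024) -> bytes:
    """Read exactly nbytes from a connected socket, chunk_size bytes at a time."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = sock.recv(min(chunk_size, nbytes - len(buf)))
        if not chunk:  # peer closed early; return what we have
            break
        buf.extend(chunk)
    return bytes(buf)

# With chunk_size=128, a several-megabyte payload means tens of thousands of
# recv() calls; at 128 KiB it is a few dozen.
```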
Not necessarily an application performance increase, but a team/infra performance increase. Some of the details are foggy, but I worked at a company that wanted to add search over a long list of items in a separate PostgreSQL database. The main database was approximately 1.5 TB in size. The design the team came up with was a new service that would slurp up all the data into an Elasticsearch cluster and then build specific searches around templates. Seems reasonable, and often what is reached for, right? Sure, until our database guy chimed in one day and said, “You know… instead of spending $20k on this application’s infrastructure, you could just use the full-text search capabilities in PostgreSQL.”
Thankfully the team listened and not only did we save a bunch of cash, development time went from 3 months to build the service to a month or so to get the frontend assets right. It was awesome to see. PostgreSQL is great.
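For reference, the Postgres-only version is roughly this sketch (made-up items table, psycopg2 assumed; not the actual service):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
with conn, conn.cursor() as cur:
    # One-time setup: a GIN index over the searchable text.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS items_fts_idx ON items
        USING gin (to_tsvector('english', name || ' ' || coalesce(description, '')))
    """)
    # The search itself: no separate cluster to feed and keep in sync.
    cur.execute("""
        SELECT id, name
        FROM items
        WHERE to_tsvector('english', name || ' ' || coalesce(description, ''))
              @@ plainto_tsquery('english', %s)
        LIMIT 50
    """, ("search terms",))
    print(cur.fetchall())
```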
When people say “10x programmers don’t exist” I think of cases like these. They absolutely do exist: they are the people who take a step back, think about another solution, and avoid building the whole system altogether.
(And yeah, postgres is great)
I’ve started writing a blog on this exact topic, because I’ve been complaining about it for 20 years: tech companies focus on hiring problem solvers, they don’t focus on hiring people who can identify the right problems to solve.
Would be good to read! I am of the opinion that hiring practices are even less relevant than the above.
I think the companies are hiring ‘Jeopardy contestants’ – not book/novel writers. Both types deal with ‘words’ – but at a different level of composition, obviously.
Certainly, different tasks need a varying spectrum of composition skills – but testing for Jeopardy-like skills does not reveal much about composition skills.
However, these are the interviewing processes of today.
A 10x programmer isn’t someone who writes 10x more code. It’s someone who is willing to spend their time finding a solution that will suffice and take 10x less time to implement.
I think my favorite is the fact that Tailscale has to process each incoming WireGuard packet in its own goroutine for performance reasons. Many hours have been spent trying to work around this, but realistically it’s too fast in practice not to do it that way. Limiting the number of goroutines for processing incoming packets causes significant performance regressions, and it’s overall just kinda funny.
Wait, really? This goes against almost all conventional wisdom! The fact that the tailscale team has so many heavy hitters means, I trust this… but… uhhhh, woah.
It makes sense when you consider that goroutines don’t map 1:1 to threads. Sure it would be bad to have lots of big heavyweight goroutines, but if you’re just using a little stack space for each one then they are quite cheap.
The surprising part isn’t how cheap goroutines are. It’s that despite the complete lack of order in which they run / packets get processed, reassembly of the stream is cheaper than an initial send+receive on an idle goroutine.
the best part is that the OS will reassemble the stream for you most of the time!
Now I’m confused! How does the OS reassemble the stream for you? I recv 1,000 UDP packets from the Internet, the OS doesn’t know what order I should receive them in, because UDP Packets do not have any sequence number built in. Additionally, all the data should be encrypted, because WireGuard, and unless you’re using the OS’s WireGuard implementation (which makes this even more funny), I don’t see how the OS could do this.
Once upon a 2015 I was working on making Terraform’s graph code more debuggable, and I thought it would be nice for the vertices to always be visited in the same order. It didn’t functionally matter for any graph operations, but it made examining debug logs a bit easier.
So I dropped what I thought was an innocent sort operation into the graph walk.
That code sat there for over two years (v0.7.7–v0.10.8) before one of my colleagues decided to audit performance on nontrivial Terraform configs and discovered a ridiculous amount of time spent diligently sorting vertices for an operation that had zero user benefit.
I shudder to think about the sum total of time and energy cumulatively wasted by that single line of code. It’s burned into my memory as a reminder to always be cognizant of hot paths!
One time I was making sure our testing framework (custom written) was up to date with all dependencies, and it ended up being slower. It took a week to realize there was a breaking change in one API we used. Even worse: I made the breaking change in the API.
Me: Ugh, what idiot broke this code.
$ git blame
Me: Oh. Oh no.
Postmortems are blameless, but git is not 😒
I was building a web app and eventually tinkered with the “loading” state of the frontend (a spinner was displayed on an overlay). But my local machine was pretty responsive, so the spinner just flashed for a fraction of a second. So I decided to add sleep(3) to the backend code to actually test a meaningful loading state.
Yeah…
… and I left that sleep in the prod release :D So it was a 1000x speed-up when I removed it and the calls finished in 3ms :D
Fresh from the press: opening the emoji picker on NixOS takes 500 ms for me. It seems that’s due to accidentally quadratic PATH manipulation in a shell script?
Details: https://discourse.nixos.org/t/plasma-emojier-is-very-slow/27160
That’s nothing. It takes about five seconds on MacOS where, every time, I try various combinations of four different modifier keys and space until I get the right one.
Even after hitting the right one, it takes around 3 seconds for the tooltip to show up the first time you invoke it (it seems to be cached afterwards, not sure with which eviction policy).
Also, on the Google keyboard on Android, tapping the “Search emoji” text box the first time consistently displays a “Data for emoji search not available” error. Only a couple seconds later I can retry and actually search.
Once you find the fix you should send it to Microsoft, Windows 10’s emoji picker is also super slow! ;)
Years ago, there was a script set up as a cron job that needed to run semi-frequently and did a kind of expensive task.
And someone forgot to write the script to create/check for a lockfile to prevent a new instance from starting up while an old instance was still running (which was a possibility).
Have you ever seen a machine report a load average of 700? I did, that day.
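The missing guard is only a few lines; a sketch in Python using flock (the original script’s language isn’t stated, and the path is made up):

```python
import fcntl
import sys

LOCK_PATH = "/var/run/expensive-task.lock"  # hypothetical

def do_expensive_task():
    ...  # the actual work the cron job does

def main():
    lock = open(LOCK_PATH, "w")
    try:
        # Non-blocking exclusive lock: fail fast if a previous run is still going.
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        print("previous run still in progress; skipping", file=sys.stderr)
        return
    do_expensive_task()

if __name__ == "__main__":
    main()
```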
I’ve seen it in the 1000s when its hard NFS mount failed, fun times…
I have seen this happen twice in the past year. Or a variation, at least.
Scheduled email sender sometimes took too long and a second (third, fourth…) was started while the first one was still running. Sadly the code did not SELECT FOR UPDATE, with obvious results. Recipients were definitely not happy.
In a similar vein, committing after sending a batch where one piece of the batch would always fail, with automatic restarts enabled, leads to this funny situation where a couple of recipients are super unhappy.
Funniest thing about those situations is that they are usually time bombs that explode long after the deploy.
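A sketch of the row-claiming pattern that was missing, assuming a Postgres outbox table (schema and names invented):

```python
import psycopg2

def send_email(recipient, body):
    ...  # placeholder for the real delivery call

conn = psycopg2.connect("dbname=app")
with conn, conn.cursor() as cur:
    # Claim a batch of unsent rows. FOR UPDATE SKIP LOCKED means a second
    # (third, fourth…) sender grabs different rows instead of the same ones.
    cur.execute("""
        SELECT id, recipient, body
        FROM outbox
        WHERE sent_at IS NULL
        ORDER BY id
        FOR UPDATE SKIP LOCKED
        LIMIT 100
    """)
    for row_id, recipient, body in cur.fetchall():
        send_email(recipient, body)
        # Mark the row inside the same transaction that locked it.
        cur.execute("UPDATE outbox SET sent_at = now() WHERE id = %s", (row_id,))
```

Marking rows as sent in the same transaction that claimed them also limits how many duplicates a poison message can trigger on restart.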
I worked at a backup provider in a previous job, and the dev team pushed a beta build to the beta hardware platform that hosted the company’s backups. They were trying a new on-disk metadata format based on a few RocksDB databases instead of a single h2 instance. We (the ops team) were testing a hardware configuration that did softraid instead of using the hated, finicky hardware RAID. The platform change meant that the drives no longer lied about fsync. The real fsyncs happening over a few thousand instances (a few hundred users * a few DBs) on a single filesystem made the system load shoot up over 10k, which was a fun afternoon of disbelief and debugging.
This was in early-stage-startup land, so take that as part of the context.
The time that comes to mind for me was when my CoFounder (CEO), who has a CS degree and knows just enough to be dangerous, decided to add a one line drop shadow CSS animation before a product demo to “zest” it up.
The demo bombed, his laptop was basically turned into a space heater in front of the people he was demoing to, and locked up mid-demo.
He later came to me and asked what I had done to cause such a massive performance reduction in the system. I suspected some video encoding tweaks I had made on the C++/native layer, but couldn’t fathom how they could have caused such a massive intermittent spike in CPU, and spent hours digging on them.
Later, through bisecting, another engineer realized this CSS drop shadow animation in the UI layer was eating up 300% CPU all on its own, and tracked it to this single change from the CEO, which was buried among a hundred other trivial changes.
I recently swapped DBs in a project from SQLite to PostgreSQL and all of my N+1 queries suddenly became relevant.
It was fun though to see the number of queries in one of the pages drop from 4,000 to 8 or so.
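The ORM involved isn’t stated; here’s the shape of an N+1 with plain sqlite3 and a made-up schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
""")

# N+1: one query for the posts, then one more query per post.
for post_id, author_id, title in conn.execute("SELECT id, author_id, title FROM posts"):
    (name,) = conn.execute("SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()
    print(title, name)

# One query: the join the database is good at anyway.
for title, name in conn.execute(
    "SELECT p.title, a.name FROM posts p JOIN authors a ON a.id = p.author_id"
):
    print(title, name)
```

Against a local SQLite file the first loop is barely slower; against a networked PostgreSQL, every extra query is a round-trip, which is how 4,000 queries hide in plain sight.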
I once had a job where^[1], for reasons of politics and not wanting to do anything, they put “speed-up loops” in the code. That way, when a manager came around and said, “customers are complaining our app is slow! Speed it up!”, our genius engineers could remove a million iterations from the speed-up loop, and we’d have a huge factor performance speed-up!
^[1]: I didn’t, but I like this story.
I had something close to a real-world speed-up loop. An app I worked on had a bit of code which, years later, was completely irrelevant, and an easy perf boost.
I used GTK3 for a GUI with more than a handful of elements. Accidentally quadratic and #wontfix. In their defense, GTK4 introduced new APIs that perform better. Either way, just use Qt if you can.
We have these little edge gateway things. Late one Friday evening, one of them started using an order of magnitude more CPU than all its peers. We tried a few things off and on over a week but we just couldn’t figure out what had changed. The software was the same, the kernel was the same, ambient temperature was the same, and the workload was the same. The gateway just happened to decide all of a sudden that life was hard. This box was deployed far away so we couldn’t just swap it out.
The following Friday afternoon, I went spelunking through metrics and noticed that the CPU frequency had throttled down from ~2 GHz to 200 MHz and gotten stuck there.
We were using 10x more CPU because the gateway all of a sudden had 10x less CPU to go around.
Here’s one that is probably affecting several percent of you right now:
In https://pyfound.blogspot.com/2023/04/the-eus-proposed-cra-law-may-have.html the Python Foundation says that “a version of Python is downloaded over 300 million times per day” and “10 billion packages in an average month”.
This is, of course, ridiculous. It has to be the product of a thoughtless automation. It also means that there are supply-chain attacks available. And it won’t be just a Python problem: Node and NPM, Ruby and gems, whatever: all of them likely have a few tens of thousands of users causing stupid amounts of network traffic.
Go install a local repository. Ask it to check for updates every 12 or 24 hours, instead of mindlessly downloading on every request. Keep a few old versions around in case you suddenly need to revert; for extra points, keep a separate ‘stable’ repository where you only have the last version that passed all of your tests, and make that the way you deploy to production.
You are right of course, but the instructions for a local mirror are usually so convoluted, hard to find, or just plain complicated that no one bothers.
At every single company it has usually been either “we already have it in place” or, for a new ecosystem, “we’ll fix it later”. If it were just as easy to get started as to not use it, more people would do it.
11 and half years ago: https://stackoverflow.com/questions/7575627/can-you-host-a-private-repository-for-your-organization-to-use-with-npm (Yes, several methods given)
12 years ago: https://stackoverflow.com/questions/5677433/private-ruby-gem-server-with-authentication (Yes, use geminabox or artifactory)
14 years ago: https://stackoverflow.com/questions/77695/how-do-i-set-up-a-local-cpan-mirror (use CPAN::Mini)
10 years ago: https://stackoverflow.com/questions/15556147/pypi-is-slow-how-do-i-run-my-own-server (several options, devpi looks good to me)
Assuming you have enough disk space and bandwidth – and by definition, you do, somewhere – none of these appear to be more than a couple of hours to implement and one email message to dev-all telling them what to change.
Two common classes of database query you might do in Python are “pull one row into a tuple” and “pull a column of IDs or such as a list”.
Our DB utility library handled these two situations with 1) a function that would accumulate all the values from all the rows in a query into one big tuple (so that one function could handle a single-row or single-column query), and 2) a wrapper to call the tuple-returning function and convert its result to a list.
In retrospect it’d’ve made more sense for those two use cases to be handled with totally independent functions, and the row one enforces that the query returns exactly one row and the column one enforces that it returns one column. But ten years ago I was–uh, we were capricious and foolish.
Unfortunately, adding lots of values to a tuple one-by-one is O(n^2). Retrieving lots of values through this code still completed quickly enough that it took surprisingly long to notice the problem–it might add a few seconds pulling a million IDs, and often those retrievals were the type of situation where a legitimate few-second runtime was plausible.
When we did fix it, it was a very small diff.
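Not their actual utility functions, but the difference that made it quadratic looks like this:

```python
# Quadratic: each += builds a brand-new tuple and copies everything collected so far.
def collect_as_tuple(rows):
    out = ()
    for row in rows:
        out += tuple(row)  # O(len(out)) copy on every iteration
    return out

# Linear: accumulate in a list, convert once at the end if a tuple is really needed.
def collect_as_list(rows):
    out = []
    for row in rows:
        out.extend(row)
    return out

rows = [(i,) for i in range(1_000_000)]  # e.g. a million single-column rows
ids = collect_as_list(rows)  # stays fast; the tuple version degrades sharply as rows grow
```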
Not sure I can think of any, tbh. Most of my noteworthy bugs involve things breaking horrifically, not performance regressions.
Using unbuffered IO when sequentially parsing / serializing a big file line by line. This is the funniest performance regression I’ve seen, since it is encountered very frequently.
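A Python illustration of the gap (file name made up); the same applies on the read side:

```python
N = 1_000_000

# Unbuffered: every write() goes straight to the OS as its own syscall.
with open("out.txt", "wb", buffering=0) as f:
    for i in range(N):
        f.write(b"%d\n" % i)

# Default buffering: the same million lines coalesce into a few thousand writes.
with open("out.txt", "wb") as f:
    for i in range(N):
        f.write(b"%d\n" % i)
```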
A few years ago I traced Apache Cassandra while creating tables and discovered that it issues thousands of 1-byte writes. Someone forgot their BufferedWriter!

i was looking at the codebase of an internal network analysis tool (2014ish), and realized that the tool was just shelling out to some linux utility. well, it turns out this utility had a -z flag, which was responsible for enabling quick packet processing. i turned the flag on and boop - ~200% performance gain in every case.

another one - i was responsible for a very complex ZFS pool that served the entire metric database that the company used for… basically everything. every setting and parameter was finely tuned - despite that it was having severe performance problems, and it was crashing nearly every day because ZFS couldn’t handle the load.
i flipped the system to XFS, and performance (& reliability) dramatically improved - it stopped crashing completely.
the zfs one makes me wonder why people are inclined to implement such complicated stuff preemptively. x__x i never wanna read what a ZFS ZIL SLOG is ever again
I noticed that the web site I maintain seemed to be pretty slow returning pages - taking over 1 second for really simple pages that required almost nothing from the database. I tracked it down and realised the cache wasn’t working at all due to a misconfiguration. This meant that the static files pre-processing (LESS or SCSS compiled to CSS) which would normally have happened about once per deploy and then be cached forever was instead happening on every single request. The fix was extremely simple, and got pages back to about 0.2s instead of 1.2s.
The really embarrassing thing was it had been broken for about a year and I somehow hadn’t noticed - I think I had just chalked it up to slow Internet when I did spot checks, and personally wasn’t using the site enough to notice the issue.
Was this a Rails site by any chance?
No, Django, using django-compressor for static file bundling
Ah, sounded similar to the Rails asset pipeline. A one line change in an environment specific file would do the same thing in Rails.
There was this application the translators at the company would use to interact with the database that stored all the translations for every piece of copy in all the applications in the company. We heard it had been built a while ago under a strict deadline by a sort of superteam of “the best” people at the company, and it bounced around many teams before ending up in our hands.
The application had many problems, the most aggravating one being that any change to a translation would take tens of seconds to save, and about half of the time it would time out.
I take a look at the db code and I think it was inserting or updating every key individually in a loop that went like for t in translations { repository.add(t) }, and it’s adding the key to multiple tables because… that’s the way the application is structured, apparently? Anyway, it could probably use a batch insert, but that alone can’t be why it’s running so slowly, right? After all, it takes several tens of seconds even when just updating a single translation.

Next I take a look at the frontend code; there’s a page which is loaded with all the translations for one or more applications in one or more languages, and it turns out that instead of only updating the translations that changed, it’s always updating all the translations on the page, so even for a single change it’ll run that loop a thousand times multiplied by 3 or something. The fix is really easy because the frontend already tracked which keys had changed and the more reasonable behavior was already implemented but commented out because of another bug. Fix that, test it, nope, it’s still slow.
I dive into the backend again and this time I actually step into a debugger, and I notice that it’s spending most of the time in a piece of code that runs after writing to the db: it’s sending some simple POST requests with two parameters to a list of URLs that seemed to point to some internal caching services, and the reason it was taking so long is that most of those services were not running anymore and the request timeout was a couple of seconds long. The fix was literally running a DELETE on the table that held the list of those URLs (and making the rest of the requests run in a background thread).
One day, after bringing up enough of a new Python runtime to run a webserver, we saw abysmal performance. A quick bit of profiling showed that string processing was a huge bottleneck. Not super unexpected, but this looked especially bad. We did some investigation and found this commit to be at fault, authored by yours truly. Turns out str.rpartition is used a lot to look at file extensions and parse HTTP requests (IIRC), and my one-liner from early development was murdering performance. Rewriting it to not be… that… brought performance to somewhere near normal.
In one case, tests for a small Python backend took way too long to run for what it did. Some debugging pointed to the tests that used Minio to exercise our S3 interactions as the main hog; those tests took just over a minute each. It turned out to be due to an incompatibility between botocore and Minio, which meant Minio didn’t handle the Expect request header right, making it wait a while for the rest of the request, which never arrived (details). The fix was effectively a one-liner and sped up the tests from 10+ minutes to only 10 seconds.

Another interesting case: Pylint gets much slower if you enable concurrency: https://github.com/pylint-dev/pylint/issues/2525.
A new shiny survey system was implemented in-house. It featured a nice editor done using Angular. One of the first users called in and said that it wasn’t working for him. The developer opened the same survey on his machine and said “it works for me”. I asked for a screenshot of the error the user got and was greeted with a hard crash of Chrome. I asked the developer to inspect the DOM elements and memory usage of the page… the poor user had only 2GB of RAM :P
A deploy was made into production and contained a migration that created an index during startup. The app started crashing as the new pod came up: the index creation locked everything.
Many years ago we were developing a thing for a customer and in the contract were certain performance criteria (expressed as benchmark numbers, let’s say “must satisfy X requests per second”).
We’d been deploying on EC2 with more of a pet than cattle setup (it’s been a long time ago) and something with our nodes seemed off so we petitioned to replace them (no biggie, we had good deployment) but the customer head tech honcho was against it.
Anyway, at some point they noticed that the thing was not performing at all, and their CEO was loudly complaining about what bullshit the team had produced, etc. We were confident we should be fast enough, but they still wouldn’t let us replace the EC2 instances.
I then procured a private old laptop of mine, some consumer grade i5 with ~8GB memory, and installed the whole stack in a virtualized repro of the production setup. 2x varnish, 2x nginx + php backend, postgres, redis, solr, everything. (it barely fit into ram). And then I did the performance test and we were SO much better than what was agreed on. They let us redo the EC2 setup and suddenly it worked.
Sometimes it’s not the visible part of the software.
Not a regression since it was basically greenfield code, but the first protobufs pass on Riak Time Series didn’t have a float64 field, just an ASCII-encoded numeric field. I begged and begged for feedback from existing protobufs experts since it was my first work on that, but didn’t hear anything until after it was merged into develop with field numbers assigned. We did fix it before it made it to customer-world, but still.

One company I worked for had a system with a read-only database replica. One of my co-workers noticed that the primary read/write db was very lightly loaded, so he and I ripped out the replication code. Net result: the speed was the same and the system as a whole was simpler and easier to run. It turns out that the read-only replica was specified by an architect because “you always need one.”
For about a year or two I had constant 12% CPU usage on my MacBook. The fans were always on, battery life was atrocious, etc. iStat Menus and Activity Monitor both told me the culprit was WindowServer, so I went on a months-long quest to figure out what the problem might be. I tried quitting some “obvious” programs, investigating driver issues, and even reducing animations in MacOS to no avail. Finally, some StackOverflow posts showed me that iStat Menus and Activity Monitor were masking the problem – their usage created the high WindowServer usage itself, and my actual problem lay elsewhere.
Using top I saw that taskgated was using a weirdly high amount of CPU, so I looked in Console.app and saw that it was spewing logs about… bash, about once a second. It slowly dawned on me – I used a tmux overlay called byobu, which updates some widgets once a second. I had turned off those widgets, but the clock was still updating once a second. I guess that aggressive update schedule mixed poorly with unsigned Nix and Pyenv binaries, causing taskgated to run constantly. Quitting byobu and running regular tmux immediately solved the problem, silencing the fans and adding about an hour and a half to my battery life.

I can’t imagine the amount of energy I wasted just to update a clock.
Way back, I used a Java SWT app on my PowerBook. Over the course of an hour, it would gradually creep up to 100% CPU usage. It turned out that it had a callback to update UI things once per second. The SWT API exposed a one-shot event, but accidentally registered a recurring one with the underlying CFRunLoop. So, every second, one more timer event was added to the run loop. Gradually, the time spent handling these crept up (they never did anything, the handler on the Java side was gone, they just checked that there was nothing to do and exited).
In the same era, I would connect that machine to a 1 GigE university network for browsing. If I clicked on a large file to download, the machine became unusable for a few minutes. It turned out that Safari’s downloads window was updating based on the amount downloaded, not the time elapsed. For typical MODEMs, updating every 1 KiB was fine. For home broadband (1 Mb/s was fast then), it was a bit too aggressive, but fine. For a system that could happily do 300 MB/s downloads, it was flooding the system with display update events and didn’t ever recover until the download had finished and the backlog had cleared.
Specifically on the clock, XNU has some nice extensions for kqueue for this. They let you specify the slack that you are happy to accept. For a user-visible clock, you probably want a notification every second, but users are unlikely to notice 50-100ms jitter. This lets the scheduler coalesce timer events so that it can wake up once per second, do a bunch of things, and then sleep for 900ms, which is pretty good for power. I think you can do a first notification with a 1 second acceptable error to line up with other things that wake up periodically and then schedule the recurring one then, so you’ll wake up the same time as the system clock for your second ticks.
Our main app runs our build tool (webpack) multiple times over the same code to remove dead code in conditional branches on each pass of the run. This was getting very slow, and the dev made the choice to run those passes over a process pool. That didn’t help much, and worse, it impacted every other build running on the shared infrastructure.
Two problems:
By default most pooling libraries use n(cpu) workers, which on a shared container-based runtime was returning the host CPU count rather than what this container was supposed to be restricted to (see the sketch below).
Each process reading and compiling JS code was wasteful
Working to avoid that need to pool by running webpack just once eliminated both these at once, giving about 90% improvement overall. Sometimes not threading/pooling helps.
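A Python illustration of the first problem (the build ran under Node, but the trap is the same in most runtimes): the convenient CPU-count call reports the host, not the container.

```python
import multiprocessing
import os

# On a 64-core host running a container restricted to 4 CPUs:
print(multiprocessing.cpu_count())   # 64: the host's core count
print(len(os.sched_getaffinity(0)))  # 4 if the limit is a cpuset (Linux only);
                                     # a CFS quota limit has to be read from cgroups instead
```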