One thing that this article doesn’t really get into (it alludes to it in its note at the end about v2) is that these are all massive rewrites of already successful systems, in which real-world experience guides the new architecture.
You could try to write the most aggressively optimized version 1 you can make, but even then there will be unforeseen performance bottlenecks that may require a rewrite. There will also be parts you spent ages optimizing that get cut completely when the business “pivots”, or simply because you learn that actual users never touch them. And by going down that path you may also paint yourself into a corner and never even reach the “success” phase. Like Brooks said, “plan to throw one away; you will, anyhow”.
Of course, none of that is a free pass to write idiotically stupid and slow code…
Yeah, I think a reasonable approach is to start with a performance budget and then work to the budget while also avoiding stupid performance hits that aren’t trading off against something else. For a video game, the budget is very strict, 60FPS. For a website, the budget is a comparatively bloated 200ms.
For a video game, the budget is very strict, 60FPS
Ironically, just earlier today I saw this pop up when browsing the general Reddit front page. And a couple months ago there were similar reactions to the recommended system specs for Kerbal Space Program 2. And on and on back through time to things like the infamous “but can it run Crysis” meme back in the day.
But every time these threads come up I see people pushing the notion that game dev is somehow the domain of people who uniquely Care About Performance™ and treat every cycle and every byte as a sacred irreplaceable resource never to be wasted… which just does not hold up at all in the face of observed reality. Even the console game developers routinely ship stuff that’s just bad and broken and requires launch-day patching, and though their hardware upgrade treadmill is slower it’s still very much there.
For a website, the budget is a comparatively bloated 200ms.
For the teams and services I work with at the day job, p99 response time of 100ms is the usual performance budget, and with a little common sense and caching generally the reality is closer to 10ms or less (in fact I’m annoyed right now with one of my services because a change in requirements took away the ability to cache as efficiently and now it’s inched up closer to 20ms).
But the last time one of these threads came up someone else on this site just dismissed me as not being able to provide useful insight on performance due to my association with “webshit”.
But every time these threads come up I see people pushing the notion that game dev is somehow the domain of people who uniquely Care About Performance™ and treat every cycle and every byte as a sacred irreplaceable resource never to be wasted…
As a professional Unreal Engine game developer I can testify that indeed game dev does not actually care that much about squeezing every cycle out of a CPU. Unreal’s editor is incredibly pessimized performance-wise.
That’s a great term 😁
I’d take that even further and make the bold claim that unless you’re hiring already-experienced engineers, your version 1 is always going to suck performance-wise, simply because you don’t yet know the problem domain and don’t know how to write the thing in question most optimally.
This. Maybe where it’s so easy to go wrong is in ignoring the performance budget while exploring the problem? That way you still don’t know how to write something to perform quickly in your version 2, and have to spend your version 3 rethinking a lot of your assumptions.
While I’m generally on the perf-is-good side of this argument, I don’t really find these examples compelling. The smallest examples cited have over a hundred million users. At that kind of scale, a 1% performance improvement translates to huge real-world savings (especially for server-side things where the provider is paying directly), let alone the 50%+ things that it’s discussing. Generalising from these examples is not so obvious. If I am running a desktop app that consumes 30% of my CPU under load and a later version optimises it to consume 15%, that’s a similar saving to the first big win at Facebook that the article mentions. Whereas the Facebook optimisation saves millions of dollars in hardware purchases for Facebook, this saving makes no difference to me that I will notice (I might be spending very slightly more money on power, but my CPU is probably still in a low-power state either way).
This ignores the opportunity cost. Every month a developer spends optimising is a month spent not doing something else. If you write your app in a GC’d language and pay a 30% perf hit (totally made-up number) but are able to ship two compelling features in the time that a competitor ships one, then guess who will win in the market.
I really wish we could do something about this messed-up incentive to screw over our users and potential users. When I lived in the Seattle area, I became friends with a poor person who had (maybe still has?) a PowerPC iMac and a mid-2000s PC in storage, and no current desktop or laptop (only a phone). Ideally, she should still be able to do the same things with those computers now that she did when they were new. But our industry’s upgrade treadmill left her behind. (Admittedly, the company I was working for at the time was and still is part of the problem.)
I guess I need to put my money where my mouth is and translate my current Electron app to Rust, on my own time.
When I lived in the Seattle area, I became friends with a poor person who had (maybe still has?) a PowerPC iMac and a mid-2000s PC in storage, and no current desktop or laptop (only a phone). Ideally, she should still be able to do the same things with those computers now that she did when they were new.
Those computers can do the same things they did when they were new. They can run the same office suites and the same web browsers and render the same web pages and play the same games. And since they likely have actual spinning-platter hard drives rather than SSDs, their storage probably hasn’t degraded over time either.
What they can’t do is run a bunch of things that were developed in the intervening decades. And that should be expected – we don’t stand still, and as faster hardware is developed we find ways to take advantage of it.
To see the point, consider a feature of the phone I carry in my pocket. I shop sometimes at a local Japanese market, and a lot of the products on their shelves do not have English labels. And I don’t speak Japanese. But I can point my phone’s camera at some packaging and it will do an OCR of the Japanese text, translate it to English, and overlay the translation onto the image in, if not real time, then close enough as makes no difference. There is simply no amount of caring about performance that could backport that experience to the very first smartphone I ever owned – its hardware would never be capable of doing this. Even trying to take a still photo and upload it to a server for the processing would be problematic due to both the much worse camera (and lack of onboard hardware-and-software-assisted image stabilization and sharpening and so on) and the low-speed cellular connection it had.
And there are tons and tons of examples of things like this in daily life. Modern tech isn’t just bloated bloatware bloating its bloaty bloat for the sake of bloat; there are tons of things we not only do but take for granted today on devices whose size/form factor would not have been possible 15-20 years ago. And those things are useful. But there’s no way to backport them to sufficiently-old hardware, and it would not be good to try to stand athwart that progress and yell “Stop!” in order to ensure that hardware never is perceived as obsolete.
It’s easy to see that kind of progress on our smartphones. But on my PC, at first glance, it feels like I’m doing mostly the same tasks that I did on the laptop I bought for college in 1999 – browse the web, do email, listen to music, write some documents, and program. Back then, I did all those things on a laptop with a 366 MHz single-core processor and 64 MB of RAM. To avoid burying the lead, I’ve concluded that that feeling is misguided, though I don’t know how much processing power and RAM we’d actually need given ideally efficient software.
The web certainly demands more of our computers than it did back then. For content websites, this is IMO a definite step backwards. One definite advancement has been the addition of streaming video; I’m sure my 1999 laptop wouldn’t have been able to handle that. Another definite advancement is the multi-process model of modern browsers, which enables stronger sandboxing. (For that matter, my 1999 laptop was never upgraded beyond Windows Me, though it also ran various Linux distros over its lifetime.) The strong security of modern browsers also benefits email clients, assuming we count the ability to render HTML email as a necessity, which I do.
Playing a local music library has probably changed the least since 1999. In fact, I do so using foobar2000, the most lean-and-mean Win32 application I routinely use. Such a lightweight player could also support streaming music services, if any of them were open to it. Instead, AFAIK all the streaming services use web apps, or web apps in native wrappers (e.g. Electron or CEF), for their desktop clients. But then, while that might be a bad thing for resource usage, it can be a good thing for accessibility; Spotify’s original highly custom UI, for example, was inaccessible with screen readers. But of course, there were accessible native desktop apps before Electron, so we should be able to have accessibility without the extreme bloat.
Microsoft Office was already getting bloated in 1999. I remember one of my classmates in 1997 complaining about how sluggish Office 97 was on his computer. But again, to some extent, one person’s bloat might have been another person’s accessibility. By 1999, at least one Windows screen reader was using the Office COM object models, which were originally implemented for VBA, to provide access to those applications. I don’t see how the additional bloat of the intervening years has made things better, though.
I certainly appreciate the power of the Rust compiler, and other modern optimizing compilers that we didn’t have in 1999. I’m tempted to give that a pass because programming is a niche activity, but part of me still believes, as Guido van Rossum did (does?), that computer programming should be for everybody. And even on modern computers, Rust is sometimes criticized for its compilation time and resource requirements. Personally, I’m willing to accept that tradeoff to get both developer productivity and efficient generated code. Luckily there are other language/compiler/runtime designs that have different tradeoffs, and some of them, e.g. a CPython or Lua-style bytecode interpreter, could even still be as usable on that 1999 laptop as they were back then.
One task that’s definitely new since then is remote meetings, e.g. Zoom. The useful life of my old laptop coincided with the bad old days of VoIP (and by the end, I had upgraded the RAM, primarily because of a resource-hungry Java app I was working on). By the time Skype landed in late 2003, I had moved on to a more powerful computer, so I don’t know if it would have run on the previous one. I suspect not. And video calling? Forget it.
One thing that has definitely changed for me since then is that I rely much more on a screen reader now (I have low vision). One of the Windows screen readers I use, NVDA, is written primarily in Python, and this enables a very flexible add-on system. The NVDA project didn’t start until 2006, and it never ran on anything older than Windows XP, so it’s safe to say that NVDA wouldn’t have run well, if at all, on my old laptop, possibly even after the RAM upgrade (to the odd quantity of 192 MB). The other major third-party Windows screen reader (JAWS) started in the 90s, and it’s bifurcated between a C++ core and a scripting layer on top. Perhaps as a result of that separation, its scripting layer isn’t as powerful as NVDA’s add-on system.
So, where does that leave us? A single core at 366 MHz and 64 MB of RAM is clearly not enough for a modern PC. But do we really need a minimum of 8 GB of RAM and whatever amount of minimum processing power is now practically required?
About 10 years ago, a novel called Off to Be the Wizard had this exchange between two time-travelers:
Phillip (from 1984): What on earth can a person do with four gigabytes of RAM?
Martin (from 2012): Upgrade it immediately.
I’m not sure if the author meant that to be funny or sad, though the former would be more in keeping with the tone of the book. But I’m still inclined to interpret it as a sad commentary.
There are also plenty of things my laptop computer can do today that a laptop of 15-20 years ago couldn’t, and plenty of software taking advantage of those capabilities.
But you seem to have decided that it’s all “bloat” and don’t particularly seem open to being persuaded otherwise, so I won’t bother trying.
I think my response was more complicated than that, though it certainly ended negatively. I realize that to some extent, increased hardware requirements are an inevitable result of real progress. But I still wonder how much better we could do if we didn’t so heavily prioritize developer convenience, and racing to implement more features, above runtime efficiency. I’m sure there’s no going back to 64 MB of RAM for a general-purpose computer, but maybe we don’t need 4 GB or higher as a minimum. I don’t know though; I’m open to being persuaded that I’m wrong.
Every month a developer spends optimising is a month spent not doing something else.
Casey’s whole point in the “clean code, horrible performance” thing is that this is not how it works. You don’t write garbage code and then spend a month optimizing it. You just keep performance in mind and don’t write stupid code in the first place. There are some common practices (OOP abuse, for example) which don’t improve code quality but cost a lot in terms of performance. Instead of choosing those approaches, just write the code in a way that’s as good (maintainable, readable, etc) but without the performance pitfalls.
Maybe you want to spend a 30% performance hit for the productivity improvements of a GC, but maybe you don’t want to spend a 1000% performance hit for the dubious productivity improvements of writing your compute-heavy application in Python, for example, or a 500% performance hit for the “benefit” of modelling your data as a class hierarchy with virtual methods Clean Code style rather than plain structs or sum types.
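To make that contrast concrete, here is a rough Python sketch of the two styles being compared. Casey’s own examples are in C++ and the quoted percentages refer to compiled code, so the shapes, names, and numbers here are purely illustrative of the structure, not of the cost in Python:

```python
import math
from dataclasses import dataclass

# "Clean Code" style: a class hierarchy where every operation is an
# overridden method on each subclass (virtual dispatch in a compiled language).
class Shape:
    def area(self) -> float:
        raise NotImplementedError

class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius
    def area(self) -> float:
        return math.pi * self.radius ** 2

class Rectangle(Shape):
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height
    def area(self) -> float:
        return self.width * self.height

# Plain-data style: dumb records plus one function that branches on the
# variant -- a rough Python analogue of "plain structs or sum types".
@dataclass
class CircleData:
    radius: float

@dataclass
class RectangleData:
    width: float
    height: float

def area(shape) -> float:
    # Python 3.10+ structural pattern matching
    match shape:
        case CircleData(radius):
            return math.pi * radius ** 2
        case RectangleData(width, height):
            return width * height
        case _:
            raise TypeError(f"unknown shape: {shape!r}")

print(Circle(2.0).area(), area(CircleData(2.0)))  # both print 12.566...
```

In a compiled language the second form also lets the compiler lay the variants out as flat data and branch without indirect calls, which is where the claimed savings in that argument come from.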
I’m kind of noticing a trend of Casey overstating how critical this is while ignoring context; I remember a kerfuffle a while back over a perf issue in the Windows Terminal app, where he showed how a “simple” rendering optimization could get a huge speedup (it took him an afternoon), and he thought it was ridiculous how slowly the terminal was scrolling. I’ll grant the Microsoft devs overstated how hard it would be, but without the optimization the terminal was still scrolling way, way faster than you could read. If that’s not fast enough, you should probably just redirect the output to a file and skip the need to render it entirely; tuning that kind of thing is a waste of an afternoon.
Good performance engineering requires an awareness of what performance “tier” you’re targeting; being able to show gigabytes/second of text to a human is completely pointless, as is worrying about a single small memory allocation right before an HTTP request – even if in some other situation heap allocation is something to pay attention to. Context.
I haven’t had problems with the rendering speed of the Windows Terminal for a while, but when it launched I had quite a few experiences where I ran a command that produced a load of output and then had to wait 30 seconds or more for the terminal to catch up (in one case I gave up, killed the terminal and restarted it). That was a real productivity hit and not something I’d experienced with any other terminal. I quite often run commands that spit out a load of diagnostic stuff that I normally ignore but want to be able to search back through if something goes wrong. Before joining Microsoft, I typically used the Apple Terminal (which still has a few features I miss on Windows) and I had never had that problem, going back to my first Mac (a G4 PowerBook with 1 GiB of RAM).
If that’s not fast enough, you should probably just redirect the output to a file and skip the need to render it entirely; tuning that kind of thing is a waste of an afternoon.
This is something he goes into in his lectures about refterm - being able to dump text into the terminal quickly is actually kind of a good thing to have, y’know.
I’m working on a compiler. Given that it’s still in its early stages and can’t compile most of your code, it throws a lot of errors - to me it’s important that I can scroll back up and read them. It’s also important to me that I can read its IR if I need to, and there’s a lot of it. Meanwhile, printing the IR out in Windows Terminal takes something like four seconds of extra runtime. Compare that to any Linux terminal emulator, where dumping it all out takes almost no time.
Redirecting the output to a file is not a very good solution because you lose color coding, which aids readability. So how about just let me read them in my darn terminal without having to wait unreasonably long. That’s what it’s made for, after all - presenting the output of programs you run.
Redirecting the output to a file is not a very good solution because you lose color coding,
Most programs nowadays let you do --color=always, which dumps those control codes to the file as well, so the colors can be viewed upon reopening (sometimes with another flag to interpret them, for example ls --color=always > file.txt; cat file.txt | less -r will keep them as color codes).
Of course, faster programs are better anyway, but there are solutions.
BTW what I personally would do if terminal rendering is actually a bottleneck is to just skip frames. A skipped frame and a blur of text are equally useless; you can’t read either of them. So focus on keeping the internal state correct as quickly as you can, then catch up on drawing just often enough to show the user that something is still happening; they can read the text later.
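A minimal sketch of that frame-skipping idea, with a made-up redraw interval and made-up names; it just treats stdin as the stream of output to display:

```python
import sys
import time

REDRAW_INTERVAL = 1 / 30           # assumed budget: at most ~30 redraws per second

scrollback = []                    # stand-in for the terminal's internal state

def render():
    # Stand-in for the expensive paint; a real terminal would rasterise glyphs here.
    sys.stdout.write(f"\r{len(scrollback)} lines buffered")
    sys.stdout.flush()

def consume(stream):
    """Update state for every line, but skip redraws that land inside the interval."""
    last_draw = 0.0
    for line in stream:
        scrollback.append(line)            # cheap: keep the model correct
        now = time.monotonic()
        if now - last_draw >= REDRAW_INTERVAL:
            render()                       # only draw when a human could notice
            last_draw = now
    render()                               # final catch-up draw so the last output shows

if __name__ == "__main__":
    consume(sys.stdin)
```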
BTW what I personally would do if terminal rendering is actually a bottleneck is to just skip frames.
Yeah! 😄
Casey does a similar thing, though arguably even better because it doesn’t cause FPS drops - just don’t render the entire scrolling blur of text at all; only the part you can see on the screen at the given time.
This feels fairly straw-man-y.
For example, the law of diminishing returns means that at some point “it’s not worth it” is guaranteed to be true, but exactly when is highly context-dependent. It could only be “ridiculous” if people were saying “it’s never worth worrying about performance”, but no one is saying that.
The same is true of most of these “excuses” - the weak form of them is always true at some point, and no one is actually saying the strong form.
EDIT: another example
“Performance only matters in small, isolated sectors of the software industry. If you don’t work on game engines or embedded systems, you don’t have to care about performance, because it doesn’t matter in your industry.”
The weak form of this is “most software does not have to care about performance to the same degree as niches like gaming”, and this is true. There are millions of devs writing business apps and web sites who can happily not worry about the cost of virtual method calls and all the other things that upset Casey.
The strong form is “only gaming has to care about performance at all”, and no one is really saying that - every web developer knows that a web page that loads in 10sec is not acceptable.
every web developer knows that a web page that loads in 10sec is not acceptable.
And yet it is still a frequent occurrence.
If you pay attention to how non-technical people talk about computers (or at least, consumer-facing software), you start to notice a trend. Typically it is complaints that the computer is fickle, unreliable, and slow. “The computer is thinking” is a common idiom. And yet I would guess that something like 90% of the software that most people use on a daily basis is not IO- or compute-intensive; that is, it has no basis for being slow. And it’s slow anyway.
I hear quite frequently colleagues and collaborators say things like “premature optimization is the root of all evil”, “computers are so fast now, we don’t need to worry about how fast this is”, or responding to any form of performance suggestion with “let’s wait until we do some profiling before we make performance decisions”. But rarely do I ever see that profiling take place. These phrases are all used as cover to say, essentially, “we don’t need to care about performance”. So while no one comes out and says it out loud, in practice people’s actions often do say “it’s never worth worrying about performance”.
I appreciate Casey’s takes, even if they are a little hot sometimes, because it’s good to have the lone voice in the wilderness that counterbalances the broader industry’s tendency towards performance apathy.
every web developer knows that a web page that loads in 10sec is not acceptable.
And yet it is still a frequent occurrence
I would submit that most of such web sites are made by amateur web designers who just cobble together a site from Wordpress plugins and/or businesses who choose to use bottom-of-the-barrel cheap shared hosting. I’ve certainly seen this in practice with friends who can design and decided to build websites for other friends, and there’s little you can do about it besides advising them to ask a proper web development agency to build the site and pay more for hosting (which small businesses might not want to do).
I would submit that most of such web sites are made by amateur web designers who just cobble together a site from Wordpress plugins and/or businesses who choose to use bottom-of-the-barrel cheap shared hosting.
Except Atlassian. Jira and Confluence - big websites backed by a big budget - still manage to frustrate me with how slow they are on a daily basis.
Jira is a bit like Windows, though, if to a lesser extent. Yeah, it kinda sucks, and it’s full of just plain weird behavior, but it’s also kinda impressive in how it serves a billion use cases that most people have never heard of, and is basically essential to a whole bunch of industries; the first thing is kind of a consequence of the second.
The weak form of this is “most software does not have to care about performance to the same degree as niches like gaming”, and this is true. There are millions of devs writing business apps and web sites who can happily not worry about the cost of virtual methods calls and all the other things that upset Casey.
As I bring up every time Casey goes on a rant, and have already brought up elsewhere in this thread, game developers are empirically at least no better than other fields of programming, and often are worse because gamers are on average much more willing to buy top-end hardware and stay on a fast upgrade treadmill. So they can more easily just tell users to buy a faster SSD, buy more RAM, buy the latest video card, etc. rather than actually set and stick to a performance budget. There’s perhaps an argument that console game dev does better with this just because the hardware upgrade treadmill is slower there, but modern console titles have an iffy track record on other measures of quality (like “does the supposed release build actually work at all or does it require a multi-gigabyte patch on launch day”).
The difference for game developers, I suspect, is the binary nature of performance failures. If a game runs at under a certain frame rate and jitter rate, you cannot play it. If another desktop application pauses periodically, you can still use it; it’s just annoying. I use a few desktop apps that regularly pause for no obvious reason (yes, Thunderbird, I’m looking at you doing blocking IO on the main thread); if these things were games then I just couldn’t use them.
This probably gives people a skewed opinion because they never play games that fail to meet the required performance bar for their hardware, whereas they do use other kinds of program that fail to meet the desired performance target. For consoles, this testing is easy and a game that can’t meet the perf target for a particular console is never supported on that console. Or, in quite a few cases I’ve seen recently, is launched on the console a year after the PC version once they’ve made it fast enough.
As to thinking about performance in other domains, I have a couple of anecdotes that I think contradict Casey’s world view:
Many years ago now, I was working on a Smalltalk compiler and writing some GUI apps using it. For debugging, I added a simple AST interpreter. To improve startup times, I moved the JIT to a shared library so that it could be loaded after process start and we could shift over to the JIT’d code later. At some point, I had a version mismatch in the shared library that prevented it from loading. For about two weeks, all of my code was running in the slow (probably two orders of magnitude slower than the JIT) interpreter. I did not notice; performance was fine. This was on a 1 GHz Celeron M.
When I got the ePub version of my first book back, I realised that they’d lost all of the semantic markup on their conversion so I wrote a tool for my second book that would parse the LaTeX subset that I use and generate good HTML. I intentionally wrote this in a clear and easy to debug style, aiming to optimise it later. The first time I ran it, it took around 200ms to process the entire book (typesetting it in LaTeX to generate the PDF took about two minutes). I did find one loop where a load of short-lived objects were created and stuck an autorelease pool around it, which dropped peak memory usage by about 90%, but I never bothered to do anything to improve performance beyond that.
The difference for game developers, I suspect, is the binary nature of performance failures.
People always say things like this, and then I go look again at Minecraft, which is the best-selling video game of all time, and I scratch my head a bit. It has a whole third-party industry of mods whose sole purpose is to bring the game’s performance up to basic playable levels, because Minecraft’s performance on average hardware (i.e., not “gaming rigs”) is so abysmal.
So I still don’t really buy into the idea that there’s some unique level of caring-about-performance in game dev.
First off, these companies are all making at least hundreds of millions of dollars in revenue, and serving millions of users daily. If your software isn’t, the advice in this article doesn’t apply to you.
Second, every single example in this article is a company-wide effort. None of them come from a single heroic programmer just really caring about the performance of their assigned user story in the current sprint. The decision to undertake such efforts might have been pitched by developers, but I bet it was authorized by a C-something-O, or at least a vice-president or director of something.
Last, but not least: the phrasing of every single “debunked” excuse is a straw man. Like, no one says those things, not in the way they’re presented in the article. People use much weaker, balanced, context-aware versions of these arguments, and those versions are absolutely worthy of consideration when discussing performance. They can’t be debunked because they’re not even statements of fact; they’re tradeoffs to be weighed.
I’m glad I read the comments first, so I could be less angry reading the article, knowing I’m not the only one that disagrees with it.
Same here. I would like to introduce the author to the concepts of “opportunity cost” and “point of diminishing returns”.
My best counterpoint to this whole article is web frameworks. Ruby on Rails and its ORM ActiveRecord are hilariously inefficient. Doing select * on every query? Absolute madness. Does it work? Oh heck yes. It works pretty damn well for most applications.
Now if someone could fix how RoR loads view templates… the lookup cost on those is a travesty.
Another interesting example is Django and SQLAlchemy ORMs. Both are implemented using descriptors and metaclasses, which add quite a bit of overhead. The actual fastest way of doing the kind of thing they do is something like namedtuple: code generation with strings + eval.
But in both cases, that would be a nightmare to maintain, and it’s unlikely that it would bring significant performance gains, since the performance of most code using ORMs will be dominated by database IO. Optimizations in the driver or the infrastructure will easily make a bigger impact.
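For the curious, here is a toy version of the strings-plus-eval code generation mentioned above. The record factory and field names are invented for the example; collections.namedtuple does something in this spirit internally, with far more care:

```python
# Toy record factory: generate an __init__ from a string and exec() it, so
# construction and field access avoid per-attribute descriptor machinery.
def make_record(typename, field_names):
    args = ", ".join(field_names)
    assignments = "\n".join(f"    self.{name} = {name}" for name in field_names)
    source = f"def __init__(self, {args}):\n{assignments}\n"
    namespace = {}
    exec(source, namespace)                       # the "strings + eval" part
    return type(typename, (), {
        "__slots__": tuple(field_names),          # fixed layout, no per-instance dict
        "__init__": namespace["__init__"],
    })

# Hypothetical usage:
Point = make_record("Point", ["x", "y"])
p = Point(1, 2)
print(p.x, p.y)   # -> 1 2
```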
Yeah, the best way to optimize DB-using code is and always has been to head for fewer and better queries. Even if you gave someone ten years and unlimited budget to sit down and optimize the heck out of the code in the Django ORM, you would not get even within a couple orders of magnitude of the performance gain the average Django app would get from auditing for N+1, checking query plans to see where indexes are needed, caching the most expensive queries, etc.
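To make the N+1 point concrete, here is a sketch against a hypothetical pair of Django models. The models and fields are invented; select_related and db_index are standard Django:

```python
# Sketch only: assumes a hypothetical Django app with these two models.
from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100)

class Book(models.Model):
    title = models.CharField(max_length=200, db_index=True)  # index a frequently filtered column
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

def author_names_naive():
    # N+1 pattern: one query for the books, then one more query per book.
    return [book.author.name for book in Book.objects.all()]

def author_names_audited():
    # One JOINed query; fixes like this dwarf any micro-optimisation inside the ORM itself.
    return [book.author.name for book in Book.objects.select_related("author")]
```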