1. 33

  2. 16

    The point Knuth was making was to avoid micro-optimizations throughout the code. Such optimizations tend to make the code harder to reason about, and often aren’t even where the program spends its time. Write the program to be correct, then profile it if it’s still slow. By profiling, you will find where the actual bottleneck is.

    Over the years, I’ve been correct only about 50% of the time when I thought I knew where the bottleneck was. The last time, it turned out some parsing code was doing more than I wanted, but I didn’t find that out until I profiled the code (and up until I did, the code was fast enough for our uses, until it wasn’t).
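
    For what it’s worth, here’s a minimal sketch of that workflow with Python’s built-in profiler (`parse_file` and the file name are placeholders, not anyone’s actual code):

    ```python
    # Minimal sketch: profile the suspect code path with the stdlib profiler.
    import cProfile
    import pstats

    def parse_file(path):
        # Placeholder for whatever code you suspect is slow.
        with open(path) as f:
            return [line.split() for line in f]

    profiler = cProfile.Profile()
    profiler.enable()
    parse_file("input.txt")
    profiler.disable()

    # Sort by cumulative time so the real hot spots float to the top.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
    ```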

    1. 2

      People really should read the whole paper, “Structured Programming with go to Statements”.

      Here is a great post that includes the surrounding text of that often-cited quote:

      I’ve become convinced that all compilers written from now on should be designed to provide all programmers with feedback indicating what parts of their programs are costing the most; indeed, this feedback should be supplied automatically unless it has been specifically turned off.

      Doing little optimization tricks is an easy distraction; that kind of feedback really should come automatically from the CI/CD system.

      1. 2

        Also right before it:

        The conventional wisdom shared by many of today’s software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can’t debug or maintain their “optimized” programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal.

        EDIT: oh whoops, it was in your link too

    2. 11

      Vectorization in Python can give you a 25×-40× speedup over naive Python for loops. If your job takes 1 minute in NumPy, it will take 30 minutes or more with plain old CPython. That means going from being able to run your code 60 times an hour to only twice an hour, making it significantly harder to experiment and iterate on your code.

      This is also a good argument for parameterization: if your job takes 1 minute on NumPy, it will probably take 1 second on NumPy with a small data set. So you want to easily be able to swap between sample data and the actual data you care about. Most science people I know don’t do this.
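
      To make that concrete, here’s a sketch of the two styles (timings will vary by machine; the sample slice at the end is the parameterization idea):

      ```python
      # Sketch: plain-Python loop vs. vectorized NumPy, plus a sample toggle.
      import numpy as np

      data = np.random.rand(10_000_000)

      def loop_sum(xs):
          total = 0.0
          for x in xs:        # one interpreter dispatch per element
              total += x
          return total

      def vector_sum(xs):
          return xs.sum()     # a single call into optimized C code

      # Parameterization: iterating on a small sample is a one-line change.
      sample = data[:10_000]
      ```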

      1. 8

        Your comment reminds me of something I read today:

        We had a word for those speedy & terse analyses - “university code”. Smart, compact, quickly written. Completely impenetrable, impossible to change or maintain, sometimes opaque even to the author.

        There’s a place for this approach, but not near anything business critical

        @agapow on Twitter

        1. 1

          Heh, this is 99% of the code I interact with on a daily basis

        2. 2

          Yeah, I always did this on “big data” jobs but found my coworkers didn’t … I’m still puzzled by that. I always aim for a test loop that’s less than 10 seconds, and a minute can be intolerable. Sometimes sampling is nontrivial, but it almost always pays for itself.
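
          For what it’s worth, the toggle can be tiny. A sketch, assuming a pandas job (the SAMPLE_SIZE convention and file name are made up):

          ```python
          # Run against a sample by default; set SAMPLE_SIZE=0 for full data.
          import os
          import pandas as pd

          df = pd.read_parquet("events.parquet")  # hypothetical input

          sample_size = int(os.environ.get("SAMPLE_SIZE", "10000"))
          if sample_size:
              # Fixed seed keeps test runs reproducible.
              df = df.sample(n=min(sample_size, len(df)), random_state=42)
          ```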

        3. 11

          The problem with “premature optimization is the root of all evil” is that it is not the full quote, and if you leave out the rest you get people arguing for “simple code” over correct design. The full quote is:

          “Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

          So Knuth is saying that

          1. too much time is spent optimizing code that isn’t on an actual performance sensitive path
          2. we should not be looking at micro-optimizations as much as we do: Knuth is not saying “don’t think about performance”, he’s saying that prematurely optimizing for things like branch prediction or cache lines is not warranted unless you have exhausted the macro optimizations (architecture or algorithms).

          Neither (1) nor (2) says what many people claim, which is that you should not consider performance when designing a system.

          When I was trapped in a group project at university, one of the group members would fight (and even revert) “complicated” code because it was “premature”, and “premature optimization is the root of all evil”. He would (and did) revert changes like “this code should be O(N) and not O(N^2)”. He only had this fixation because all anyone ever quoted (incl. professors) was the tiny bit in the middle. The way this quote is taught was literally causing him to actively fight sensible design, because he’d basically been taught that “code that makes things faster is inherently bad”.
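
          For anyone who hasn’t seen this pattern in the wild, a generic example of the kind of fix he was reverting (not the actual project code, obviously):

          ```python
          # Accidentally quadratic: `in` on a list is a linear scan,
          # so the whole loop is O(N^2).
          def find_dupes_quadratic(items):
              seen, dupes = [], []
              for item in items:
                  if item in seen:       # O(N) scan per element
                      dupes.append(item)
                  else:
                      seen.append(item)
              return dupes

          # O(N): set membership is constant time on average.
          def find_dupes_linear(items):
              seen, dupes = set(), []
              for item in items:
                  if item in seen:       # O(1) on average
                      dupes.append(item)
                  else:
                      seen.add(item)
              return dupes
          ```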

          1. 9

            There’s an important corollary to Knuth’s observation: because you don’t know a priori which 3% of the code will benefit from optimisation, you should write cleanly abstracted code so that you can easily micro-optimise any 3% later.

            1. 2

              cleanly abstracted

              That’s a bit of an oxymoron, when you’re talking about performance.

              1. 5

                Absolutely untrue and I’m struggling to understand how you would come to believe that.

              2. 2

                Right, I’m merely saying that there is a world of difference between “premature optimization” and “choosing sensible base architecture”.

            2. 5

              “Writing faster software doesn’t always take more time” glosses over the fact that writing the NumPy-enabled code requires knowledge on top of regular Python syntax. Also, as someone who’s barely seen NumPy, it’s not obvious which parts of my existing code bases can be optimised by using it.

              That said, excellent article, and I’d love to look into NumPy more to see whether it can improve some stuff I’m working on. Thank you!

              1. 4

                My MacBook is increasingly slow now, but I have mixed feelings about upgrading, because every time it’s slow it’s because the software is misbehaving. If I upgrade, I’m just rewarding poor software quality. Accidentally quadratic is the devil.

                1. 14

                  Another way to put it: Using the right Big O is never a premature optimization.

                  1. 3

                    I don’t know that accidentally quadratic algorithms are the most common cause of slow software. If I had to guess, I’d say that cache misses, branch mispredictions, and unbuffered IO are probably near the top.
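
                    The unbuffered-IO one is easy to demonstrate in Python, for what it’s worth (the file name is invented; buffering=0 requires binary mode):

                    ```python
                    # Toy demo: byte-at-a-time reads, buffered vs. unbuffered.
                    def count_bytes(path, buffering):
                        n = 0
                        with open(path, "rb", buffering=buffering) as f:
                            while f.read(1):
                                n += 1
                        return n

                    # count_bytes("big.bin", buffering=-1)  # default buffer: few syscalls
                    # count_bytes("big.bin", buffering=0)   # one read(2) per byte: slow
                    ```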

                    1. 3

                      Oh and unnecessary intermediary data structures. In fact, that one might be at the top of my list for why a lot of software is slower than it could be.
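
                      A sketch of what that looks like in Python: every stage in the first version materializes a full list, while the generator version streams one item at a time:

                      ```python
                      # Every stage allocates a full intermediate list.
                      def pipeline_lists(lines):
                          stripped = [line.strip() for line in lines]
                          nonempty = [line for line in stripped if line]
                          return [line.upper() for line in nonempty]

                      # Generators stream items through, no intermediate structures.
                      def pipeline_lazy(lines):
                          stripped = (line.strip() for line in lines)
                          nonempty = (line for line in stripped if line)
                          return (line.upper() for line in nonempty)
                      ```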

                      1. 3

                        I’d disagree. Cache misses and branch mispredictions can only tank the performance so much. Applications often have quadratic explosions, and developers usually know this. I really enjoyed reading blog posts from Aras Pranckevičius about speeding up Blender imports - it’s all quadratic algorithms where there should be none!

                        1. 2

                          In the last 12 months, how many accidentally quadratic algorithms have you fixed in your code that made it faster vs. other ways of making the code faster?

                          1. 6

                            So far every performance issue I had to tackle this year has been due to accidentally quadratic (or even worse complexity) code.

                            1. 1

                              The most significant speed improvement I made to one of my projects was replacing a linear scan with a hash table.
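
                              Presumably something like this generic pattern (a sketch, not the actual project code):

                              ```python
                              records = [{"id": i, "value": i * i} for i in range(100_000)]

                              # Linear scan: O(N) per lookup.
                              def find_scan(key):
                                  for r in records:
                                      if r["id"] == key:
                                          return r
                                  return None

                              # Build the index once; each lookup is then O(1) on average.
                              index = {r["id"]: r for r in records}

                              find_scan(99_999)     # walks the whole list
                              index.get(99_999)     # single hash lookup
                              ```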

                        2. 3

                            I disagree, unless the word ‘right’ is doing a lot of work in that sentence. I’ve written some quadratic code in a build system recently. It’s a lot more readable than the linear version, and I know that, in our use case, n is always going to be <5 and the constant is a lot smaller than it would be for the linear version. The entire build system code is less than one percent of the total build time (most of it is the compiler and linker, as it should be).

                          Big O tells you asymptotic worst-case performance. Whether this matters depends a lot on the size of the input set and on the size of the constant.
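
                            One way to sanity-check that at the real input size (the dedupe functions are just stand-ins, and the numbers are machine-dependent):

                            ```python
                            # Measure both versions at the n you actually have.
                            import timeit

                            def dedupe_quadratic(xs):     # O(n^2), but dead simple
                                out = []
                                for x in xs:
                                    if x not in out:
                                        out.append(x)
                                return out

                            def dedupe_linear(xs):        # O(n), also preserves order
                                return list(dict.fromkeys(xs))

                            xs = ["a", "b", "c", "a", "b"]   # n < 5, as in the build-system case
                            print(timeit.timeit(lambda: dedupe_quadratic(xs)))
                            print(timeit.timeit(lambda: dedupe_linear(xs)))
                            ```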

                          1. 1

                              If the input size is fixed, it’s not Big O. Big O assumes unbounded input.

                            1. 2

                              Nothing in the real world has unbounded input. This is why thinking about big-O notation can be misleading. The upper bound on n may be single digits, hundreds, or hundreds of millions, but it’s always there. Often, you reach the asymptotic bounds for values of n lower than the ones that you see in production and then big-O notation helps in understanding the complexity, but it still hides the value of the constant. For sufficiently small values of n, the constant factor dominates the algorithmic complexity when running an algorithm on a real computer. Doing optimisation well depends a lot on understanding, for your specific task, what a ‘sufficiently small value’ means.

                              1. 1

                                Unbounded is not the same as infinite.

                                1. 1

                                  I neither said, nor implied, that it was.

                      2. 4

                        (I just pushed some edits to hopefully make the point a bit clearer.)

                        1. 4

                          Kudos on the excellent article. I couldn’t agree more and, as a newcomer to Python, I found the practical examples very helpful.

                          One nit:

                          One big 100MB read, which requires you to stored the data in one big blob, using a suitable data format like Parquet.

                          “Stored” should be “store.”

                          1. 1

                            Thanks, will fix soon.

                        2. 1

                          I think it’s not just the “premature” but also the “optimization” part. There’s a difference between, let’s call it, applying tricks that might be confusing and bring their own complexity, and writing simple, easy-to-understand code that is also fast and doesn’t waste resources.

                          Not unnecessarily calculating the same thing over and over when it’s easily avoidable, not having calls pass through giant stacks of some framework for no reason or benefit, and not applying the latest pattern you learned about just for the sake of it are things that might make your code faster, but also simpler, clearer, and more maintainable.

                          I think there’s a difference between having an eye on speed and actually optimizing specific parts, where you would think about algorithms, add fast paths, etc. These can be premature when you don’t know they are even a bottleneck, or when you didn’t benchmark and are blindly assuming something will make things faster. Worse, you might implement something in C/C++ (or nowadays maybe Rust) only for someone else to come along and point out that the overhead of the foreign function call itself makes the code slower than it was, or could be.

                          It’s also a good idea to think about whether something actually has any benefit at all. If you have a script running once a day with a fixed, or fairly fixed, run time, shaving a few milliseconds off it might not be worth it. The same goes for things that only happen once.

                          I think something people forget when they realize they need to make something faster is to benchmark it first, both at the micro level and in some kind of “overall performance” measurement, so you don’t accidentally end up just moving time spent from one part of the system to a part you are not benchmarking.

                          Another important thing is doing these things with data as close to real life as possible. Way too often there are situations where huge increases in performance (or even improvements in data sizes) are announced, only to find there isn’t actually a real-life change when used in production, simply because the data it’s applied to is different. There is a huge opportunity to fool yourself with optimizations. Measuring can also be non-trivial.
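
                          As a toy example of the “not calculating the same thing over and over” point, measured end to end as suggested above (the function and workload are invented):

                          ```python
                          import timeit
                          from functools import lru_cache

                          def expensive(user_id):            # stand-in for a repeated, costly call
                              return sum(i * i for i in range(50_000))

                          @lru_cache(maxsize=None)
                          def expensive_cached(user_id):     # same work, computed once per input
                              return sum(i * i for i in range(50_000))

                          # Benchmark the whole workload, not just the function in isolation.
                          workload = [7, 7, 7, 42, 7]        # the same inputs requested repeatedly
                          print(timeit.timeit(lambda: [expensive(u) for u in workload], number=20))
                          print(timeit.timeit(lambda: [expensive_cached(u) for u in workload], number=20))
                          ```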

                          1. 1

                            Note the original version of this article (just fixed now) had the wrong code; I copy/pasted the wrong implementation.

                            1. 1

                              Indeed, looking at speed early is important. Right now I’m working on some Big Data™, and even though I did start with some fairly fast foundations, I find it’s still important to focus on speed. Naive implementations of some processes run in a workable time (10-15 min) with truncated data, but since I know there’s going to be 100x more than I use for development iteration, that’s a very good motivation to get it down to a couple of seconds through some optimization earlier rather than later. It both makes my iteration times a lot quicker, and I won’t need to improve it later when it ends up too slow for the full dataset.

                              1. 1

                                Glad someone wrote this! Yes, that old mantra is so misunderstood and overused.

                                Its premise is that you can, and even will, fix it later. While that’s at least possible for most things or aspects, there are going to be those where you just know it will never happen. Especially design decisions, which tend to require fundamental rewrites to undo – the kind that have ended many businesses. I’m saying it because it surprisingly has to be said: don’t make design decisions that you know are stupid. I’m talking about memory hierarchy and orders of magnitude here. Think about it: all work done after that point will be influenced by that decision.

                                It’s time to recognise that it’s not “premature optimization” to take your time to get the fateful decisions right. If you want to compete on speed, then absolutely, design for performance.