1. 8
  1. 7

    That’s more like “the impact of unidiomatic <= loops on performance”: with <, both Rust and C++ optimize this to an O(1) computation. With <=, there’s an edge case in C++ where the program loops infinitely when n is the maximal value of the type, and never computes the sum (*). Interestingly, the Rust version with ..= terminates but fails to optimize, because Rust’s inclusive ranges optimize poorly. Although (1..=n).sum() does give an O(1) solution as well: godbolt.
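
    To make the edge case concrete, here is a minimal C++ sketch (function names are mine, not from the original post). With <, the trip count is exactly n, so the loop can be folded to a closed form; with <= and n at the type’s maximum, the condition is always true because i wraps around.

    ```cpp
    #include <cstdio>

    // `<` version: sums 0..n-1. The trip count is exactly n, so the
    // compiler can fold the loop to the closed form n*(n-1)/2.
    unsigned sum_below(unsigned n) {
        unsigned total = 0;
        for (unsigned i = 0; i < n; ++i) total += i;
        return total;
    }

    // `<=` version: sums 1..n. If n == UINT_MAX, then i <= n is always
    // true (i wraps from UINT_MAX back to 0), so the loop never exits.
    unsigned sum_upto(unsigned n) {
        unsigned total = 0;
        for (unsigned i = 1; i <= n; ++i) total += i;
        return total;
    }

    int main() {
        std::printf("%u %u\n", sum_below(10), sum_upto(10));  // prints "45 55"
        // sum_upto(UINT_MAX) never terminates in the unoptimized program.
        return 0;
    }
    ```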

    I also have a feeling that the author knows exactly what’s going on, but deliberately leaves the explanation out to make the example seem more generalizable than it really is.

    (*) I believe technically C++ is allowed to optimize this to the Gauss trick even for unsigned numbers, as infinite loops without side effects are UB.
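
    Concretely, the rewrite this rule licenses looks something like the sketch below (what a compiler *may* emit, not what any particular compiler does): since a side-effect-free infinite loop is UB, the compiler may assume the <= loop terminates and substitute Gauss’s formula.

    ```cpp
    // What a compiler may emit under C++'s forward-progress rule
    // ([intro.progress]): assume the `<=` loop terminates and use
    // Gauss's formula. Widening to 64 bits keeps n*(n+1) from
    // overflowing; truncating back matches the wrapping loop result.
    unsigned gauss_sum(unsigned n) {
        unsigned long long m = n;
        return (unsigned)(m * (m + 1) / 2);
    }
    ```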

    1. 2

      (*) I believe technically C++ is allowed to optimize this to the Gauss trick even for unsigned numbers, as infinite loops without side effects are UB.

      In fact, I think this is why they’re UB.

      1. 1

        (*) I believe technically C++ is allowed to optimize this to the Gauss trick even for unsigned numbers, as infinite loops without side effects are UB.

        A comment on the blog entry points out that clang does this optimization; it’s just a missed optimization in GCC.

        1. 2

          Not really, clang doesn’t optimize the case with <=. That comment changes <= to <.

      2. 3

        Oh, my pet peeve!

        Rant on

        Compilers should not silently optimise a loop I’ve written to sum integers to the closed formula version.

        Never ever. No.

        If they are smart enough to figure out that there’s a closed formula, then issue a warning telling me about that formula, or maybe tell me about a library function that does it.

        So I can change the code.

        Rant off

        Thank you for your attention.

        1. 3

          Compilers should not silently optimise a loop I’ve written to sum integers to the closed formula version.

          Serious question: why not? If it produces the same result under all possible inputs, what’s your argument against it?

          1. 1

            Well, I already wrote what the compiler should do instead.

            There are essentially 2 possibilities why the loop solution is in there:

            1. I don’t know the closed form solution
            2. I do know the closed form solution, but decided not to use it

            In neither case is silently replacing what I wrote the right answer.

            If I do know the closed-form solution and I put the other solution in there, that was probably for a reason, and the compiler should not override me. For example, I may have wanted to time the CPU doing arithmetic. And of course the timing is another “output” of this computation, so the two versions do not produce exactly the same result. So in this case, please don’t replace my loop, or if you do, at least tell me about it.
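
            For the timing case, the usual workaround is an optimization barrier. Here is a sketch using volatile (a blunt but portable tool; benchmark libraries offer finer ones, such as Google Benchmark’s DoNotOptimize):

            ```cpp
            // Micro-benchmark sketch: without a barrier, the compiler may
            // fold the loop to a constant, and the "timing" measures
            // nothing. The volatile sink forces a real store on every
            // iteration, keeping the arithmetic loop alive.
            unsigned long long summed_with_barrier(unsigned n) {
                volatile unsigned long long sink = 0;
                for (unsigned i = 0; i < n; ++i) sink += i + 1;  // sums 1..n safely
                return sink;
            }
            ```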

            If I don’t know the closed form, then I should probably learn about it, because the code would be better if it used the closed-form solution instead of the loop. That would make sure the closed form is always used, rather than leaving it to the whims of the optimiser, which we can’t really control. It also makes compile times faster and makes the code more intention-revealing.

            1. 3

              There’s a third reason that is far more common:

              The closed form would lead to incomprehensible, difficult-to-modify, source code.

              The entire point of an optimising compiler is to allow the programmer to write clean, readable, maintainable source code and generate assembly that they might be able to write by hand but would probably never want to modify.

              The transform pass that replaces the loop with the closed form doesn’t know whether that’s what you wrote in a single function or whether it’s the result of a load of earlier passes. Typically it will see this kind of thing after inlining and constant propagation. The source code isn’t trying to sum all integers in a range; it’s trying to do something more generic, and it’s only after multiple rounds of transformation and analysis that the compiler can determine that it can aggressively specialise the loop for this particular use.

              In particular, you typically see integer summing as a single accumulator in a loop that does a load of other things, with code after the loop consuming the total. If you want to use the accumulated value inside the loop, that’s the correct form; otherwise the right thing is to pull it out and compute it at the end with the closed form. Imagine you do that by hand. Now you want to debug the loop and know the cumulative total, so in debug builds you end up keeping an accumulator around as well, then discarding it and using the closed form. If your closed-form computation had bugs, you wouldn’t see them in the debug build. You could stick in an assert that checks that the two are the same, but now your code is a mess. Or you could just use the accumulator for both, have the same code paths for debug and release builds, and let the compiler generate good code for the release builds.
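
              The debug-build mess described above might look something like this sketch (names are illustrative):

              ```cpp
              #include <cassert>

              // The hand-optimised version the parent describes: the closed
              // form for release builds, plus a debug-only accumulator and an
              // assert checking that the two agree. Just writing the plain
              // loop and letting the compiler optimise it avoids all this.
              unsigned long long range_total(unsigned n) {
                  unsigned long long closed = (unsigned long long)n * (n + 1) / 2;
              #ifndef NDEBUG
                  unsigned long long acc = 0;
                  for (unsigned i = 0; i < n; ++i) acc += i + 1;  // debug accumulator
                  assert(acc == closed && "closed form disagrees with the loop");
              #endif
                  return closed;
              }
              ```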

              If you want your compiler to warn you explicitly, then it needs to keep a lot more state around. Compilers are written as sequences of transform passes (with analyses off to the side). A pass doesn’t know anything about prior passes; it just takes some input and produces some output. If you want passes to warn, they need to be able to explain to the programmer why they are warning. Are you happy with compilers requiring an increase in memory by a polynomial factor?

              1. -1

                The closed form would lead to incomprehensible, difficult-to-modify, source code.

                Hard disagree. Trying to infer what an iterative loop will actually compute is what’s hard to figure out, because you have to play computer and keep iterative state in your head, which leads to subtle bugs when you go back and modify that code later.

                See also: Go To Statement Considered Harmful:

                My second remark is that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.

                Minimising the need to trace the dynamic execution of code should always be a goal.

                allow the programmer to write clean, readable, maintainable source code

                Exactly. But here it is doing the opposite: allowing the programmer to write low-level, mucky iterative code and inferring a clean, higher-level semantic description from that lower-level code. That I don’t want.

                Typically it will see…

                This seems highly contrived and speculative.

                In particular [computing a sum alongside doing other computation]

                Again, that example seems extremely contrived. Furthermore, if the loop is doing anything else of substance, keeping the sum around will be entirely negligible.

                If you want your compiler to warn you explicitly…[the world will implode]

                Again, lots of speculation.

                I am also fine with them simply not doing this “optimisation”.

                1. 3

                  This reply reads like a troll, I have flagged it as such.

                  A trivial(?) counterexample -

                  Fresh from being hired, you’re tasked with the following business-critical code: add all numbers from 1 to 1,000 except those that match FizzBuzz.

                  What I’d do, and what any normal person would do, is write a loop and check numbers against the rule, not adding them to the total if they match. Presumably the compiler will derive a closed form (which I cannot provide; Wolfram Alpha failed me, but the OEIS sequence hints that it’s complicated).
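
                  For reference, here is a sketch of the loop version, assuming “matches FizzBuzz” means divisible by 3 or by 5 (the classic rules; the parent doesn’t pin this down). The enhancement two paragraphs down is then a change of arguments, not a rewrite:

                  ```cpp
                  // Sum 1..limit, skipping numbers divisible by a or by b.
                  // Assumes the FizzBuzz rule is "divisible by 3 or 5"; the
                  // enhancement becomes sum_skipping(1000, 5, 7).
                  long long sum_skipping(int limit, int a, int b) {
                      long long total = 0;
                      for (int i = 1; i <= limit; ++i)
                          if (i % a != 0 && i % b != 0)
                              total += i;
                      return total;
                  }
                  ```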

                  What you would do is write the above, get an answer from the compiler which is the closed form, delete the loop, insert the closed form, add a comment saying “closed issue #14495982” and go on with your day.

                  2 years later, your successor has to implement an urgent enhancement: only exclude those numbers for which mod 5 == 0 or mod 7 == 0. Your closed form is inscrutable and hardly trivial to amend, as opposed to ours, which requires changing one variable. (Of course, our code might implement Enterprise FizzBuzz, where the moduli are parametrized, but that way madness lies.)

                  In summary and conclusion: you should write code for other people to understand, not to make the compiler’s life easier.

                  1. 1

                    I think there can be a middle ground: better tooling to show the developer what “no optimization” vs. “optimization” will do to their code.

                    Like… annotated godbolt inline in your IDE, for those of us who have used it to look at C++ code.

          2. 2

            Fully agree. I would say this about UB:

            If correctness is critical, you want the compiler to warn about UB, not use it to infer things: for example, inferring that a branch is unreachable and deleting it (insert the Linus rant about deleted null-pointer checks), or, as in this case, “optimising” away something that is a bug in the source code (an infinite loop without side effects, on overflow) into something else that is neither prescribed by the source code nor exhibited by the unoptimised program.
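
            The null-pointer case mentioned above has a classic shape. This is a sketch of the pattern (not the actual kernel code from the well-known incident):

            ```cpp
            // Classic UB-driven check deletion: the dereference happens
            // before the null check, so the compiler may assume p is
            // non-null and delete the check as dead code, even though the
            // programmer clearly intended it as a guard.
            int read_or_default(const int *p) {
                int v = *p;          // UB if p is null; licenses inferring p != nullptr
                if (p == nullptr)    // may be silently removed by the optimizer
                    return -1;
                return v;
            }
            ```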

            In my own programs, I mostly use natural numbers (unsigned integer is such a misnomer for natural number) – I don’t expect to have much UB, so it should be no problem to turn on such a warning.

            1. 2

              Here are some of the reasons why most compilers don’t do this sort of diagnostic:

              • Adding diagnostics breaks the build if users are compiling with warnings as errors.
                This is why gcc -Wall hasn’t changed for years.
              • The user may not understand, care, or be willing to change their code.
                You can’t, for example, modify the SPEC benchmarks to improve performance.
                Detecting the loop and optimizing it improves performance for everyone, not just the few who fix their code.
              • Diagnostics are done by an earlier pass; there isn’t enough information to issue a diagnostic.
                Many optimizations may have modified the intermediate representation, e.g. the loop might have been restructured, contained in an inlined or cloned function, etc.
              1. 1

                What a great summary of the sorry state of compilers today … for users.

                All of these “reasons” are barely more than excuses as to why doing this would be inconvenient for the creators of compilers.

                None of them invalidate my reason for why doing this is the right thing for users of compilers.