1. 41
  1. 8

    I think this would be an interesting post for many people (I was a spec author and worked on a JS engine for many years so it’s difficult to distinguish stuff I know that everyone knows and stuff I know because of insane career choices :D)

    It’s also well written and coherent, which can be rare :D

    (I would like it if they filed bugs on WebKit and Gecko when they hit a case where X improves in Chrome but regresses in WebKit/Gecko, as that’s generally considered a bug by the relevant engine teams)

    1. 6

      Thank you :) About filing bugs, do you mean the people trying to optimize code? The way that I understand it (when doing my explorations at least), is that it’s mostly that certain engines optimize different things differently.

      For instance, when implementing a function under the hood with a for loop instead of a while loop, it’s going to be faster on browser X because they optimized for loops more than while loops, but slower on browser Y because they optimized for while loops instead (since in my understanding performance is often about trade-offs). This is probably an over-simplified example but I hope you catch my drift.

      Is this way of thinking too naive? I didn’t think about reporting issues to browsers. Any advice on how to do this?

      1. 13

        (I used to work on a browser perf team.)

        It’s true that browser vendors optimize for different things, but sometimes something is optimized in Browser X and Browser Y hasn’t gotten around to it or isn’t aware of it. In those cases, it doesn’t hurt to file a bug.

        A great way to phrase it is “If you use <Pattern A>, the performance is roughly equivalent in all 3 browser engines. But if you tweak it into <Pattern B>, suddenly Browser X is way slower than Browser Y or Browser Z.” If nothing else, it gives Browser X a reduced test case they can use to measure their progress on fixing it.

        1. 4

          That’s helpful advice, thank you very much!

        2. 6

          Thank you :) About filing bugs, do you mean the people trying to optimize code? The way that I understand it (when doing my explorations at least), is that it’s mostly that certain engines optimize different things differently.

          The engine optimization heuristics aren’t specified and differ between engines, but the various JS teams generally consider any code change that improves performance in other engines but regresses it in theirs to be a bug (I can’t really speak for SpiderMonkey, but that is absolutely the model that JSC uses). Bug reports for JSC go through webkit.org (you can use the Apple bug reporter, but bugs.webkit.org has the benefit of not being a black hole).

          For your example of for loops vs while loops, they would be expected to have no performance delta. The only “obvious” exception (speaking as a former implementer) is

          for (let I = 0; I < N; I++) { // upper case I because autocorrect fights me, and I have surrendered
            let f = function() { return I; }
            console.log(f())
          }
          

          vs

          for (var I = 0; I < N; I++) {
            let f = function() { return I; }
            console.log(f())
          }
          

          vs

          var I = 0
          while (I < N) {
            let f = function() { return I; }
            console.log(f())
          }
          

          In principle this would result in the for(let) loop being slower than the other two. In JSC, if you’re running in a process that isn’t able to allocate a memory region for the JIT (on Intel machines this is a writable mapping and an executable mapping of the same physical memory, and on Apple’s ARM architectures it’s RW^RX memory that toggles as needed), then you’ll only get the interpreter. Now the interpreter is very fast compared to the interpreters of old, but it doesn’t do optimizations at the level of inlining, so in that case the for(let) loop above will definitely be slower than the while loop. So we can start there for a discussion of performance.

          The reason for this behavior, which may seem weird as you might think “but let is modern JS, surely it’s made faster?”, is the function inside the loop body. The function captures the I being used to iterate, which means that when the function object is created, the function captures a reference to the current scope. The problem for the for(let) is that let behaves sanely, and so as the for loop iterates the body of each loop gets a different I. This is where the performance sadness happens, as it means that every iteration through the loop has to create a new object to hold the definition of the I variable. You can logically think of the for(let) being something like this:

          for (secretInternalValue = 0; secretInternalValue < N; secretInternalValue++) {
            with({I: secretInternalValue}) {
              let f = function() { return I; }
              console.log(f())
            }
          }
          

          Obviously the semantics aren’t quite the same as that intermediate object and the “with” are much much more heavily constrained than an actual with statement*. Now in theory a sufficiently clever JIT could optimise this case by inlining f, and then discovering that it no longer needs to create a closure but I’m unsure if that ever happens - it would be quite expensive to do, and would only kick in if you have code that is this trivial. Also the modern VMs all do free variable/capture tracking, and if the function in the above examples did not capture I, then they would know that they don’t need to create a scope object for the function.
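
          To illustrate that last point, here’s a minimal sketch of mine (reusing the hypothetical N from the snippets above) contrasting a capturing and a non-capturing function:

          // Captures I: each iteration needs its own binding of I, so a
          // scope object has to be created per iteration
          for (let I = 0; I < N; I++) {
            let f = function() { return I; }
            console.log(f())
          }

          // Does not capture I: an engine that tracks captures knows it
          // doesn't need a scope object for f at all
          for (let I = 0; I < N; I++) {
            let f = function(x) { return x; }
            console.log(f(I))
          }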

          Anyway, the performance difference here is not necessarily a prioritization of for vs while (in my experience the bias in what people use isn’t enough to favor one over the other). I do know that a lot of work has gone into for(of) as it has rapidly become very common, but is also much more complex than simply incrementing a number (it can technically be calling an arbitrary function, has object allocation, etc, etc).

          But in the general case there are differing priorities, the most basic being memory use vs runtime. Back when I was working on them, WebKit+JSC paid much more attention to memory use than V8, which means certain tradeoffs. JSC historically was designed to support more backends (though that’s mercifully reduced; no one needs a JS JIT on SH4, MIPS, etc), it has to be able to run without a JIT, etc. There are things like how expensive object allocation is, or how frequently an object needs to be allocated - for example, in early Chrome times V8 was GC-allocating objects for doubles, so 1.5 + 2.5 + 3.5 could result in two object allocations, whereas JSC was doing terrible things with the NaN bits so didn’t need any allocations. That means that at the time, the V8 team had more reason to optimize numeric code to avoid intermediate allocations, so that was a bigger focus for them than for JSC.

          So things like that are what colour the exact heuristics each engine uses, but all the engines expect their performance to be at least in the same ball park as each other for any given construct, and in general they aren’t making that many decisions purely on the basis of the higher level constructs (syntax, library functions). Hence bug reports (esp. with test cases) when they do diverge greatly are always appreciated.

          * The scope chain is just a linked list of objects. When you look up a variable that’s in the scope chain, the VM can do that statically as long as there isn’t an entry in the scope chain that can change dynamically: the VM just goes “ok, that variable is declared X levels up the scope chain from here, and it is stored at offset Y”. But if there’s a dynamic object in the scope chain, as you would get with with(...) or a non-strict function containing an eval, the VM can’t generate code that statically looks past it, because it can’t know whether the variable it’s searching for will be present at that level.
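
          To make that footnote concrete, a small illustrative sketch (mine, not engine code):

          function staticLookup() {
            let x = 1;
            // x lives at a known depth/offset; the VM can resolve it statically
            return function() { return x; }
          }

          function dynamicLookup(obj) {
            let x = 1;
            with (obj) {
              // if obj happens to have a property "x", it shadows the outer x,
              // so the VM must search the scope chain at runtime instead
              return function() { return x; }
            }
          }
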
      2. 5

        (edit to add: I spent about an hour writing this post, doing other things in between. After I hit post, I saw a lot of other comments had come in in the meantime. So apologies for writing anything that was already said.)

        Hah, not five minutes ago I was listening to an (old) episode of Elm Radio, a podcast by Jeroen and Dillon, so it was a neat little double-take seeing your name on Lobsters immediately after, Jeroen. (I recommend the podcast, by the way. They have lovely gentle voices, and are really good about letting their guests speak.)

        I believe we could make Elm code perform much faster if we had direct access to these building blocks, or if we could give some hints to the runtime that these are numbers for sure.

        Perhaps you already know this, but I’ll mention it just in case. (I skimmed Robin’s articles that you link, but couldn’t find any references to the below.) For the JavaScript JIT you can take a we-know-it’s-numeric x and produce a JIT-knows-it’s-integer value by writing x | 0 (bitwise ‘or’ always produces an integer, and ‘or 0’ is an identity operation). A JIT-knows-it’s-Boolean is obtained via !!x. Asm.js also uses +x (double-precision float), or fround(x) (float). [1]

        Example from Microsoft’s docs [2,3]:

        // JIT knows after this function, this.x and this.y are integers
        Point.prototype.halve = function() {
            this.x = (this.x / 2) | 0;
            this.y = (this.y / 2) | 0;
        }
        
        // JIT knows every value this function writes to the array is integer
        function halveArray(a) {
            for (var i = 0, al = a.length; i < al; i++) {
                a[i] = (a[i] / 2) | 0;
            }
        }
        

        But probably your problem is not limited to numeric values, but extends to many kinds of type annotations? I wonder what kind of JavaScript TypeScript emits. After all TS, like Elm, has variables with known types, as well as variables with sum types (that may be narrowed by case checks).

        Anyway, all of this is just to say: lovely article, lovely podcast, many thanks for both.

        [1]: Tips and tricks for writing asm.js as a human
        [2]: Writing efficient Javascript
        [3]: The next section, Avoid floating point value boxing, explains how to use Float32Array for floats.

        1. 3

          I wonder what kind of JavaScript Typescript emits.

          It doesn’t emit any of asm.js’ “type hints”; what you see is what you get (except for enums, which are compiled as an object).
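
          For example, as far as I recall tsc’s output, a plain (non-const) enum compiles to roughly this (const enums are instead inlined away):

          // TypeScript source:  enum Color { Red, Green }
          // Emitted JavaScript (roughly):
          var Color;
          (function (Color) {
              Color[Color["Red"] = 0] = "Red";
              Color[Color["Green"] = 1] = "Green";
          })(Color || (Color = {}));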

          I was also thinking you could add | 0 to optimize for integer addition while reading the post but I never found anything to suggest that those hints are valid outside of asm.js. Does anybody know whether the browser JS engines out there also use it as a hint to optimize code outside of asm.js blocks?

          1. 3

            Non-Gecko browsers never cared about “asm.js”; the only real difference was in SpiderMonkey, which would IIRC adopt some cheaper semantics for a few operations. Unfortunately this was so long ago I have no recollection of which semantics they changed, or where, but it always struck us as a fairly questionable endeavor, as almost everything that they said asm.js would allow, JSC at least was already doing.

            The “type hints” like |0, etc (or in fact any of the bit ops, logical ops, etc) were tracked in JSC in all JS, all of the time, even in the days when it was just a hilariously optimized AST interpreter, well before “asm.js” ever happened. In the era of the AST, the big saving was avoiding GC allocation of values that couldn’t be represented in 31 bits, or possibly 30? Again, this was a long time ago. Once you get to the bytecode interpreter, a lot of the costs inherent in the AST went away, which meant that int<->double conversions became proportionally more significant, so continuing to track int vs. double remained valuable. Similar logic exists for all the other numeric and logical operations - or more generally anything that would result in a known type (in the VM’s int, number, bool sense, not the JS sense).
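
            As a minimal sketch of the idiom in ordinary JS (my example, not JSC code):

            function sum(a, b) {
              // a + b may produce a double; | 0 truncates to int32, so the
              // engine statically knows the type of this whole expression
              return (a + b) | 0;
            }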

            Obviously the value of knowing number vs int vs bool vs object etc increased continuously as the various engines became more powerful, and the JITs were able to perform ever more complex optimizations, but even the early JITs were doing some fun heuristics. The first generation of the JSC JIT had math operations implemented essentially as

            if (both values are ints)
                do int logic
                do overflow checks
            if (left is double and right is int)
                convert right to double
            if (left is int and right is double)
               convert left to double
            if (both values are doubles)
                do double logic
                do overflow checks
            otherwise
                call helper function to do really expensive things
            

            Different sets of checks would be elided if you statically knew what the type would be, e.g. a|b|c knows a|b will produce an int, a-b will produce a number of some kind. The codegen would also reorder things if it thought it was worthwhile; for example, in general you would assume a/b is more likely to involve doubles, so you do the double checks earlier or even just remove the int paths entirely. This was before there was any kind of “optimizing” JIT or layers of JITs at all, because the memory usage of the generated code was a real problem: all those type checks and overflow checks and setting up for and calling helper functions produced some horrific code that was very large.

            The slow-case logic was so significant that in JSC all of the slow paths were emitted at the end of the generated code for a function (or global code, eval, etc), such that codegen would walk the bytecode twice: the first time it would emit fast paths with uninitialized jumps, and the second time it would stamp down slow paths and back-patch the appropriate labels. This was a particularly painful thing to manage, as you had to ensure that the way the slow path emitted code matched what the fast path was doing.
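
            A small illustrative sketch of that kind of static knowledge (my example):

            function mix(a, b, c) {
              var t = a | b;  // | always yields an int32, so t is statically known to be an int
              return t + c;   // the int-vs-double checks are only needed for c, not t
            }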

        2. 4

          The first part, about changing the “shape” of the object messing up the optimization of functions being called with similar arguments, kind of reinforces something I’ve been arguing for the last year or two.

          (This is slightly off-topic and really has almost nothing to do with Elm, itself)

          To be clear, I’m not saying that this is “proof” that my opinion is right, or that this kind of problem is unsolvable, etc. I’m merely claiming that I came to have an opinion (which I’ll express in a second), and that this information seems to be consistent with that opinion.

          I’ve been saying for a while now that trying to transplant functional programming styles and techniques into languages that are not designed for functional programming is a mistake. Especially ones with “heavy” runtimes that do runtime optimizations/JITing, etc.

          JavaScript’s design is clearly expecting programmers to mutate data in-place. That’s the reason that the standard array and map types’ APIs have mutable APIs, and it’s the reason there hasn’t been a way to “deep copy” things until just recently (or in the near future?). It’s why “records” and “tuples” are just now being added, etc.

          As someone who fell deep into the FP rabbit hole, I love functional programming languages. I like Scala, OCaml, even Clojure despite my strong preference for static types. But, FP is not the only programming style that can offer correct/safe code, despite the hype.

          My advice is to go with the flow. Sure, minimizing mutation is naturally going to make your code easier to understand, but minimizing mutation by taking a copy of something and never using the old copy ever again is effectively the same thing as mutating it when it comes to a human trying to understand the interaction of the system. In fact, your inner mental model as you’re skimming the code is probably thinking something like: “Okay, then we take the Foo object and update its bar field…”

          There’s also a contradiction I see in the online programming communities. I see people (rightly) complain about how slow and bloated software has become; whether it’s web-apps or even local apps, yet I also see people repeat the advice that computers are fast and it doesn’t matter if you make umpteen copies of an array in your function by calling arr.map().filter().reduce() over and over. I’m not sure we can reconcile those two viewpoints without some mental gymnastics (e.g., “It’s not the code, it’s just that the software is doing more stuff.”).
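
          For instance (a deliberately mundane sketch of mine, with a hypothetical arr):

          const arr = [1, 5, 6, 7];
          const total = arr
            .map(x => x * 2)        // allocates a new array
            .filter(x => x > 10)    // allocates another new array
            .reduce((acc, x) => acc + x, 0);

          // The single-pass equivalent allocates no intermediate arrays:
          let total2 = 0;
          for (const x of arr) {
            const y = x * 2;
            if (y > 10) total2 += y;
          }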

          Now, after rambling about all that, I want to add that I think my “advice” here doesn’t apply (as much) to people who are writing programming languages, like Elm. It’s more about people writing libraries and applications in a non-functional language trying to force it to be functional. But, it does apply a little bit to Elm, too, because it’s always going to be an uphill battle to write a language to target a runtime that somewhat hates your language’s programming paradigm…

          1. 2

            I see people (rightly) complain about how slow and bloated software has become

            I’d say it’s mostly dependent on who says it. The developers who have to optimize their code will say that it’s fine to make copies of things (until they measure it and find out it’s making the app slow), and the rest just notice that the app is slow.

            it’s always going to be an uphill battle to write a language to target a runtime that somewhat hates your language’s programming paradigm

            At very low levels (machine code) the paradigm is very imperative and non-functional. Languages always build down to lower-level languages (in some sense; I’m not saying JS is lower-level than Elm), meaning that there will necessarily be a level where the paradigms don’t match.

            JavaScript is actually not such a bad target, because it allows passing functions as arguments and things like that, which can be harder in some other targets. In fact, you can kind of see Elm as a subset of JavaScript, so there’s not really much that is too hard to translate. And the performance is usually very good for a JavaScript app (because the language already optimizes plenty of things, including the keys that I mentioned). It’s just that we always want the language to be faster than it is today, so we try to find new optimizations, and they’re hard to benchmark.

            1. 1

              I’d say it’s mostly dependent on who says it. The developers who have to optimize their code will say that it’s fine to make copies of things (until they measure it and find out it’s making the app slow), and the rest just notice that the app is slow.

              I agree and disagree. I agree because a lot of the time, the people commenting on the perceived speed or memory consumption of a piece of software don’t know enough to be informed about whether good or bad engineering went into making it that way.

              On the other hand, I think it’s also valid for even a non-tech-savvy individual to sit down at a computer today and wonder why it takes longer to load the god-damned calculator app than it took in the 90s on their Pentium II PC.

              Obviously there’s more to it than making extra copies of objects and arrays in whatever programming language- that was just an obvious/simple example I chose to pick on. But, as a developer myself, I can’t help but notice in my own work that there is an attitude that there’s almost no limit to how algorithmically poor your code can be, because “Computers are fast and IO dwarfs everything,” (which fails to account for the fact that many code paths have very few IO calls and potentially very many allocations and GC garbage).

              As a specific example, I challenge anyone here to find a typical Java project that uses Hibernate to talk to a SQL database, and compare that to a version where you take out Hibernate completely and just shoot raw query strings and use the (horrible) ResultSet API to extract the data returned from the DB call. Again, this is just a random concrete example, but I would bet real currency that the speed up will be statistically significant, even including the time for the IO.

              Admittedly, this has nothing to do with the original topic. But my point here is that we, as developers, keep doing things that are slow, and then there’s a contingent of us gaslighting the rest into believing that what we’re doing isn’t actually making our software slower.

              Also, benchmarking/measuring isn’t a silver bullet. If all of your code is 10% slower than it should be, then your measurements won’t find any bottlenecks because it’s all relative.

              At very low levels (machine code) the paradigm is very imperative and non-functional. Languages always build down to lower-level languages (in some sense; I’m not saying JS is lower-level than Elm), meaning that there will necessarily be a level where the paradigms don’t match.

              Definitely true. But it’s also true that adding more layers means more “friction”. So we already have the machine code layer, and we’ll lose some maximum performance by translating a functional language to machine code. But then we have a virtual machine on top of the machine code, so that VM loses some performance, but has its own stuff that it optimizes for. Then we have Elm, which targets the VM, which targets the machine code, so there will be losses at every level.

              Your article points out another example of it: the addition operator. Since JS doesn’t expose type-specific addition functions, every time we add two numbers in Elm, the runtime is checking whether our numbers are Strings or BigIntegers or whatever else, even though we already know for sure that they’re numbers. That’s the cost of targeting a runtime that’s designed for more-or-less the opposite of what you’re trying to do. I’m not saying that you shouldn’t do it. Goodness knows I’d rather never have to write actual JavaScript ever again if I can help it…
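
              To make that concrete, JS’s + has to be prepared for all of these at runtime:

              1 + 2        // number addition
              "a" + "b"    // string concatenation
              1n + 2n      // BigInt addition
              1 + "2"      // coercion, producing "12"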

          2. 4

            Nice post

            I didn’t know about object key order.

            I wonder, in general for such an optimized target, whether the best thing to do is to produce JavaScript as close to “modern idiomatic” as possible. I get that this is probably not super well defined either, but I imagine it would be the most optimized path, right? I never looked at Elm output, but I wonder how much, if any, really esoteric JS output there is?

            1. 16

              Object key order is super annoying, and historically it also applied to indexed properties, essentially until Google said f-this and shipped a browser that didn’t respect that, and through pure market pressure made it safe for other browsers to also switch to enumerating indexed properties on non-array objects in numeric order. We used to have sites where a huge amount of the runtime was consumed by integer->string conversions due to non-array objects being used in array-like ways, and so we had to perform the conversion to have any kind of sane lookup logic. Similarly, for(in) is terrible on arrays as the index is a string, so enumerating the indexed properties on an array means {0,1,2,3...} each does an integer to string conversion \o/.
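
              The modern, post-Chrome ordering is easy to observe: integer-like keys come first, in numeric order, then string keys in insertion order:

              const o = {};
              o[2] = "two"; o.name = "x"; o[0] = "zero";
              console.log(Object.keys(o)); // ["0", "2", "name"]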

              Even better were the semantics of for(in) in the face of the object you’re enumerating being modified during enumeration. The behavior is that if a key is removed during enumeration you will not see that key during enumeration, and fair enough you might say, but then it gets insane. If the property still exists in the prototype chain it is still visited. If the property is re-added during the enumeration, before it would have been visited in the original enumeration order the property still gets enumerated in the same place it would have been originally. If the property is re-added after the point where you would have visited it originally then you won’t see it.

              for(in) enumeration also visits all the enumerable properties on the prototype chain, but does not visit a property name multiple times, even if the object being enumerated has properties that shadow prototype properties.
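
              A small demonstration of the specified parts (delete-during-enumeration and shadowing):

              const proto = { shared: 1, shadowed: 2 };
              const obj = Object.create(proto);
              obj.shadowed = 3;
              obj.a = 4;
              obj.b = 5;
              for (const key in obj) {
                if (key === "a") delete obj.b;  // "b" will not be visited
                console.log(key);               // "shadowed", "a", then "shared" - each name once
              }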

              The result of this behaviour is that a correct implementation starts for(in) enumeration by creating an array of every property that could be enumerated, and then enumeration means iterating that list, looking up the property, checking whether the property still exists, and then entering the body if appropriate. Note that this also applies to arrays. Basically something along the lines of:

              let alreadySeen : Set<String> = new Set<String>();
              let propertyNames : [String] = [];
              let currentObject : Object = subjectObject
              do {
                let objectProperties : [String] = GetAllPropertyNames(currentObject)
                for (let i = 0; i < objectProperties.length; i++) {
                  let propertyName = objectProperties[i]
                  if (alreadySeen.contains(propertyName)) continue;
                  alreadySeen.add(propertyName)
                  propertyNames.append(propertyName)
                }
                currentObject = GetPrototype(currentObject)
              } while (currentObject != null);
              for (let i = 0; i < propertyNames.length; i++) {
                let propertyValue = subjectObject.getProperty(propertyNames[i])
                if (propertyValue == empty) continue;
                doForBody(propertyValue)
              }
              

              The behavior of property enumeration in for(in) when the prototype chain is mutated is not specified by the standard, because by the time of ES3.1 - which was the first edition of the spec where we really started trying to fix the unspecified or simply incorrect parts of the spec - the major JS engines of the time (SpiderMonkey, JSC, whatever Trident used) had different behavior as the prototype chain was mutated, and despite our best efforts we couldn’t create a specification that wouldn’t break real content for at least one engine, due to the gratuitous single-engine targeting and/or UA-sniffing code. Given Chrome’s stronger-than-IE control over what webdevs have decided is correct behavior, and the reduction in user agent sniffing (because the only UA sniffing now is used to tell non-Chrome users that the site requires Chrome), the behaviour could probably now be specified as being whatever Chrome does.

              All that said, enumerating properties in insertion order is useful for many reasons - there are lots of cases where code is much simpler because you don’t need to manually maintain your own ordering. But also, from a spec writer’s point of view, you need to be able to define an enumeration order, as the alternative is each JS engine trying to reverse engineer the others’ behavior due to websites that end up relying on enumeration order and similar. So while fundamentally the specified enumeration order of object keys is insertion order because that’s what Netscape and IE did, even modern objects like Map and Set define insertion-order enumeration of entries.
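
              For instance, Map enumerates in insertion order regardless of the keys:

              const m = new Map();
              m.set("b", 1);
              m.set("a", 2);
              console.log([...m.keys()]); // ["b", "a"] - insertion order, not sorted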

              1. 8

                This all sounds… horrendous. I’m glad I don’t have to navigate in that madness :D Thank you for the insight!

                1. 2

                  I believe for(in) was discouraged in JavaScript: The Good Parts for these reasons. At the time, if you wanted any kind of decent performance, you had to navigate that stuff.

                  1. 4

                    It’s still discouraged :D

                    We did a lot of work to improve performance, but enumerating arrays (and array-likes) requires us to do an int->string conversion for every index (that isn’t empty). For object property enumeration we cache the property names and key use of that list off of the structure (the name for hidden classes in JSC; others use “shape”, etc), then we attempt to convert each subsequent index into that object with the iterated property name into a direct offset access into that object’s storage.

                    But seriously @zladuric is correct, don’t use for(in) - use for(of) as it’s more likely to give you what you want, even though it can technically be slower than for(in) in some cases.
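
                    The difference is easy to see (plain JS, nothing engine-specific):

                    const arr = [10, 20, 30];
                    for (const i in arr) console.log(typeof i); // "string" x3 - indices are strings
                    for (const v of arr) console.log(typeof v); // "number" x3 - values directly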

            2. 1

              I’d probably benchmark against a simple, non-JIT JavaScript interpreter, and treat statistically significant improvements in that setting as successful optimisations (even if they cause performance regressions in a particular browser).

              At the end of the day you’re shipping a .js file, so you should use the cost model implied by the JavaScript standard. I think it’s unreasonable to expect someone to optimise against the intersection of multiple undocumented browser JIT cost models.

              1. 7

                This is not a good plan. The cost model implied by the standards does not remotely correspond to the cost model of any production JS engine, not even if you force them to only run with their interpreter.

                The ECMAScript spec specifies behaviour such that this super simple code

                function f(n) {
                  var x = 1;
                  for (let I = 0; I < n; I++) {
                    x = x - I
                  }
                  return x
                }
                

                requires behaviour that no production JS engine would consider, even in non-JIT mode. Let’s go through what the spec requires but modern JS engines would not actually do:

                function f(n) {
                  // 1. Allocate a new activation object (I think the latest iteration of the spec doesn't call them activations?)
                  // 2. push that object onto the top of the scope chain
                  // 3. Insert a new DontDelete property labeled "n" with the value of the first
                  //    parameter into the activation object
                  //    This means allocating property storage on the activation object (as
                  //    far as the spec is concerned it's
                  //    basically a normal JS object, so the [[PutDirect]] operation here does
                  //    the same pile of hash lookups and
                  //    allocation of storage space
                  // 4. Insert a new DontDelete property labeled "x" and set it to the value 
                  //    undefined, with the same costs as for the "n" above.
                  var x = 1;
                  // 5. Allocate a new value of type Number containing the value 1
                  // 6. Walk up the current scope chain to find the activation that has the property "x" - that is
                  //    a. for (currentScope = top of scope chain; currentScope != empty; currentScope = currentScope.parent)
                  //    b.    if [[HasOwnProperty]](currentScope, "x") break;
                  // 7. Perform [[GetOwnPropertySlot]](currentScope, "x")
                  // 8. Test that the slot has not been initialized, and throw an exception if it has been
                  // 9. Perform [[PutOwnProperty]] to store the object containing the value 1 into the "x" slot of currentScope
                  // 10. allocate an object of type number, containing the value 0
                  // 11. Repeat steps 1 and 2
                  // 12. Repeat step 3 for the name "I", assigning the value from 10
                  //     for (let I = 0; I < n; I++) {
                  // 13. Repeat step 6 for "I"
                  // 14. Repeat step 7 for "I"
                  // 15. Test the slot from 14 to ensure that it has been initialized,
                  //     and throw an exception if not
                  // 16. Load the value from 15
                  // 17. Repeat steps 1 and 2
                  // 18. Repeat step 6 for "I"
                  // 19. Repeat step 7 for "I"
                  // 20. Test the slot from 19 to ensure that it has not been initialized,
                  //     and throw an exception if it has
                  // 21. Store the value from 16 to the slot from 19
                  // 22. Repeat step 3 for the name "I", assigning the value from 10
                  // 23. Repeat step 6 for "I"
                  // 24. Repeat step 7 for "I"
                  // 25. Test the slot from 24 to ensure that it has been initialized,
                  //     and throw an exception if not
                  // 26. Repeat steps 23-25 for "n"
                  // 27. Allocate an object of type Boolean to contain the result of 
                  //     performing the < operation on 25 and 26
                  //     The logic to determine how < is performed is based on the types
                  //     of the input and its not unreasonable for an interpreter to
                  //     perform those checks, so we'll exclude these from our "things
                  //     that ES says happen but don't actually happen in reality"
                  // 28. Perform [[ToBoolean]] on the object from 27, then if false
                  //     a. pop the top scope from the scope chain
                  //     b. terminate the loop
                    x = x - I
                  // 29. Repeat step 6 for "x" -- this is looking up the activation
                  //     we'll be storing to
                  // 30. repeat steps 23-24 for "x"
                  // 31. repeat steps 23-25 for "I"
                  // 32. perform [[ToNumber]] on the value from step 30
                  // 33. perform [[ToNumber]] on the value from step 31
                  // 34. allocate a new object of type number, containing the result
                  //     of subtracting <33> from <32>
                  // 35. Look up the property slot for "x" from the object you found 
                  //     in step 29
                  // 36. Perform [[Put]] using the object from 19, the slot from 25,
                  //     and the value from 34. [[Put]] checks for readonly properties
                  //     and setters
                  // 37. Repeat steps 6 and 7 for "I"
                  // 38. If I is not initialized throw an exception
                  // 39. load the value from the slot in 38
                  // 40. Perform [[ToNumber]] on the value from 39
                  // 41. Create a new object of type number containing the result
                  //     of adding 1 to the value from step 40
                  // 42. Pop the top off the scope chain
                  // 43. Repeat 6&7 for I
                  // 44. store the result of 41 to the slot from 43
                  // 45. loop back to 13
                  }
                  // 46. pop the top off the scope chain
                  return x
                }
                

                Now, this isn’t 100% accurate as I’m going entirely off memory, and this isn’t a super great code writing environment :D, but the general gist is correct. The important thing though isn’t “oh look how explicit and/or long-winded the spec is”; it’s to look at what the cost model of the spec is.

                In this model there are at least 4 object allocations on each iteration of the loop, plus a few outside of it. None of those objects would be created in a modern JS engine, so that will dramatically skew what your performance looks like.

                There are also I think more than 12 table lookups per iteration according to the spec model, and those lookups are expensive. 15+ years ago there was a significant performance difference between

                var a = expr1
                var b = expr2
                return a + b
                

                and

                return expr1 + expr2
                

                Entirely due to the repeated property lookups.

                The cost of property access in ECMAScript is similarly a matter of table lookups.

                None of these costs is remotely representative of how engines work, so trying to apply the results of performance tests you have against this hypothetical “spec comparable” engine, to any other engine would not be remotely correct or usable.

                We can look at what the non-optimizing interpreter from jsc does for the above:

                [   0] enter              
                [   1] get_scope          dst:loc4
                [   3] mov                dst:loc5, src:loc4
                [   6] check_traps        
                [   7] mov                dst:loc6, src:<JSValue()>(const0)
                [  10] mov                dst:loc6, src:Int32: 1(const1)
                [  13] mov                dst:loc7, src:<JSValue()>(const0)
                [  16] mov                dst:loc7, src:Int32: 0(const2)
                [  19] jnless             lhs:loc7, rhs:arg1, targetLabel:19(->38)
                [  23] loop_hint          
                [  24] check_traps        
                [  25] sub                dst:loc6, lhs:loc6, rhs:loc7, profileIndex:0, operandTypes:OperandTypes(126, 126)
                [  31] inc                srcDst:loc7, profileIndex:0
                [  34] jless              lhs:loc7, rhs:arg1, targetLabel:-11(->23)
                [  38] ret                value:loc6
                

                You can see that there is no property lookup, and no allocation is needed to represent numbers in JSC.

                Again, this is the absolute earliest layer of codegen, where the latency from source being downloaded to having started execution is absolutely critical.

                We can do something a bit silly to try and show what the ES model is

                function f(n) {
                  with ({n: n, x: 1}) {
                    with ({I: 0}) {
                      for (let I = 0; I < n; I++) {
                        with ({I: I}) {
                          x = x - I
                        }
                      }
                    }
                    return x
                  }
                }
                

                Now, a “spec like” cost model would likely not predict a significant performance impact from this horror, but the code the interpreter executes is too insane to fit in a comment, so I put it here: https://pastebin.com/REZpCSty

                A simple test of the above shows that if we force JSC to operate without a JIT, the “ES like” example is approximately 50x slower. With the JIT enabled, the first run of the insane case is around 78x slower than the normal case. The next time round - e.g. after the optimizing stages have kicked in - the absurd case is around 430x slower than the normal version. Logging the GCs, I found that the ES-like path allocated enough objects to trigger 66 garbage collection cycles. The sane version did not trigger a single GC.

                Again the point of all this is not that the spec cost model is slower, it’s that it is very simply not relevant to real world performance.

              2. 1

                The whole sum operation starts performing 4 times slower than before on Firefox (and 7% slower on Chrome).

                Well. Optimise only for Chrome. Nobody uses Firefox anyway. (You using FF doesn’t mean any significant percentage of users will; for my app, FF users are <1%.)

                1. 9

                  Optimise only for Chrome. Nobody uses Firefox anyway.

                  Setting aside Firefox, Safari exists and is the only browser a lot of people use. Just optimising for one browser is a good way to create a monoculture, which is a situation the web doesn’t want to be in. You should also maybe mention you work for Google when making suggestions like this.

                  1. 4

                    And yet people say Safari is the new IE :-/

                    Making your code only work in Chrome, or only caring about behaviour and performance in Chrome, is a surefire way to make Chrome the only browser. This is exactly how IE became such a dominant browser, and how MS functionally gained so much control over what expected browser behaviour was.

                    Chrome is the OS-included browser on the absolutely dominant mobile OS/platform. That makes it essentially the only browser on that platform (look at the usage statistics; it’s no different from IE on Windows - technically people could install other browsers, but statistically that was a rounding error). This time we have an IE that is also dominant on Linux, has a massive share on Mac (>30%), etc. We already have developers taking Chrome shipping features as making them mandatory standards, and forcing other browsers to support them in spite of privacy concerns, etc.

                    Google also leverages its position as operator of the two most visited sites in the world, and its absolute dominance as an email provider, to aggressively push Chrome on every user, repeatedly directing people to install it.

                    I would hope that what you’re currently doing, and what you are explicitly encouraging, might give you some insight into why all those “terrible developers” (quoting general commentary, not you) kept writing IE-specific code: IE was obviously the only browser that statistically mattered. It is really important to understand that there was a period where not only was IE’s technology better for webdevs than Netscape’s, it was also a better browser for end users: it was faster, used less memory, didn’t have a super weird UI, etc. MS didn’t really start leveraging its monopoly position until after that point, because once it got there it was able to rely on devs targeting primarily IE, so the built-in browser also “Just Worked”.