1. 32
  1.  

  2. 13

    It’s actually worse. A named element can overwrite (clobber) a native property.

    E.g., two radio buttons with the name childNodes will shadow form.childNodes, so an iteration over it will only see the radio buttons and none of the form’s other child nodes.

    This attack is called DOM Clobbering.
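
    A rough sketch of the effect in plain JavaScript. This is a simulation, not real DOM code: makeFormLike and the plain objects standing in for elements are made up for illustration, and it only assumes that named controls are consulted before the form’s built-in properties.

    ```javascript
    // Simulation only: mimics a form resolving named controls before its
    // own built-in properties. makeFormLike is a hypothetical helper.
    function makeFormLike(builtIns, controls) {
      return new Proxy(builtIns, {
        get(target, key, receiver) {
          const named = controls.filter((c) => c.name === key);
          if (named.length === 1) return named[0];   // single match: the control itself
          if (named.length > 1) return named;        // several matches: a list of them
          return Reflect.get(target, key, receiver); // otherwise: the built-in property
        },
      });
    }

    const radios = [
      { tag: "input", type: "radio", name: "childNodes", value: "a" },
      { tag: "input", type: "radio", name: "childNodes", value: "b" },
    ];
    const otherChildren = [{ tag: "label" }, { tag: "button" }];

    const form = makeFormLike(
      { childNodes: [...radios, ...otherChildren], method: "get" },
      radios
    );

    const seen = [];
    for (const node of form.childNodes) seen.push(node.tag + ":" + (node.value ?? ""));
    // seen holds only the two radio buttons; the label and button are hidden
    ```

    Properties that no control is named after, like method here, still resolve to the built-in value.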

    1. 4

      You’re right. I wanted to also write a DOM clobbering specific follow-up and forgot to mention it in the article.

    2. 9

      Yup, it contributes to the cost of non-let/var properties on the global object being slow, as the semantics of caching lookups of regular properties become much more unpleasant. Yet another reason to actually declare your variables I guess? :)

      1. 4

        as the semantics of caching lookups of regular properties become much more unpleasant

        Can you elaborate what you mean by this?

        1. 8

          It boils down to how lexical property lookup works in JS engines. My knowledge is specific to JSC, but I believe the other major engines do things similarly.

          We’ll start with property access, and then move over to scoped lookup. High performance JS engines do not model properties exactly as the ES spec does; instead they have a set of distinct logical property access modes:

          1. Regular property accesses. e.g. a = {x:…}; …a.x…,a.x = …; etc

          2. Property creations: a={}; a.y=…;

          3. Property accessors: a = {get z() {…}, set z(v) {…}}; a.z += 5

          4. Host driven properties - Good examples of properties that aren’t considered “special” would be function.arguments, function.name, etc. These more or less match the behavior of the accessors from (3), in that they have a slot in the object storage that references a GC object. They’re just very light internal VM objects, not full JS ones (e.g. they have no concept of properties, etc). In JSC they’re something close to struct GetterSetter { StructureID structure; JSValue (*get)(ArgumentBuffer&); JSValue (*set)(ArgumentBuffer&); }

          5. Integer properties. This is kind of complicated, but also kind of not: the ES spec does not distinguish between indexed property access (obj["foo"]) and direct access (obj.foo), nor does it generally distinguish between integral and string properties, e.g. obj[1] is the same as obj["1"]. Historically, other than for arrays, strings, etc, that was how engines worked: someObject[i] would result in someObject[i.toString()], because the ES spec mandated observable behaviour (mostly around property enumeration) that essentially required you to do it this way, at least until Google’s clout made whatever V8 does == the spec. When V8 was released they said screw that behavior, integer properties always go first, and then had two storage buffers so integer property access could always be treated as an array index. Once V8 shipped, breaking the godawful old behavior, JSC and SM were able to also drop the terrible semantics and switch to a similar “all JS objects can be array-like-ish” model.

          6. “Magic” properties: these are properties that aren’t represented as part of the actual JS Object’s concept of a slot. Things like indexed properties in a string or a NodeList, but also things like the properties created for elements with an id. The critical part is that because these values aren’t really JS properties they can change without actually mutating the object.

          7. Finally there is lexical lookup. Lexical lookup is special because, until you reach the global object, the JS VM can statically resolve variable accesses. That assumes a lack of |with| (because it’s awful, generally no one used it in perf sensitive code back then, and in modern code people simply do not use it at all) and of non-strict-mode |eval| (because it can insert new variables into the scope chain). So any lexical lookup of a declared variable becomes “1. step N times up the scope chain. 2. load/store the Mth element in that flat buffer”. For the global object we can do even better and say “load the Mth variable declared on the lexical global object”. But that only works for var or let declarations, because their behavior is semantically different from a normal property (they’re not deletable, enumerable, etc). For normal properties added (say globalThis.x = 5), we fall back to standard property lookup, e.g. someUndeclaredIdentifier += 5 is equivalent to globalThis.someUndeclaredIdentifier += 5.
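
          To make the “step N times up the scope chain, then load the Mth element” idea from (7) concrete, here is a toy sketch. It is not how JSC is actually written; Scope, resolve, and load are hypothetical names, and scopes are modelled as flat arrays of slots.

          ```javascript
          // Sketch only: scopes are flat slot arrays, and every declared name is
          // resolved once, ahead of time, to a (hops, index) pair.
          // Scope, resolve, and load are hypothetical names, not engine APIs.
          class Scope {
            constructor(names, parent = null) {
              this.names = names;                                   // known statically
              this.slots = new Array(names.length).fill(undefined); // runtime storage
              this.parent = parent;
            }
          }

          // "Compile time": walk the static scope chain once and record the position.
          function resolve(scope, name) {
            for (let hops = 0; scope !== null; scope = scope.parent, hops++) {
              const index = scope.names.indexOf(name);
              if (index !== -1) return { hops, index };
            }
            return null; // undeclared: falls back to a property lookup on the global object
          }

          // "Run time": no name comparison at all, just N hops and one indexed load.
          function load(scope, { hops, index }) {
            for (let i = 0; i < hops; i++) scope = scope.parent;
            return scope.slots[index];
          }

          const outer = new Scope(["a", "b"]);
          const inner = new Scope(["c"], outer);
          outer.slots[1] = 42; // b = 42

          const bRef = resolve(inner, "b");      // one hop up, slot 1
          const value = load(inner, bRef);       // reads b without searching by name
          const missing = resolve(inner, "foo"); // null: not a declared variable
          ```

          The null case is the interesting one here: an identifier that was never declared has no precomputed position, so it has to go through the slower property lookup path.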

          So I think everyone understands the general idea of how property lookup caching works in modern JS engines. A basic recap though is that every object has some “class” or “structure” such that if two normal JS objects have the same properties, with the same attributes, then those objects will have the same structure. This means you can cache lookups: if an object of structure S has a property P at offset O, then I can store that, and the next time I hit this property access, if the new object also has structure S, I can skip looking up the property and simply reuse offset O to get it. We can also do the inverse: if the object does not contain that property we can cache that and then repeat this logic going up the prototype chain. This is useful because it allows caching of the lookups for things like [].map, where map is on the array prototype.
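
          The “structure S, property P, offset O” caching can be sketched roughly like this. It is a deliberately simplified model with one cached entry per access site and no prototype chain; makeObject and makeAccessSite are hypothetical names, not anything an engine exposes.

          ```javascript
          // Sketch only: a "structure" maps property names to slot offsets and is
          // shared by every object with the same layout. All names are hypothetical.
          const pointStructure = { offsets: new Map([["x", 0], ["y", 1]]) };
          const otherStructure = { offsets: new Map([["y", 0], ["x", 1]]) };

          const makeObject = (structure, slots) => ({ structure, slots });

          // One cache per property access site in the program.
          function makeAccessSite(property) {
            const site = { structure: null, offset: -1, hits: 0, misses: 0 };
            site.get = (obj) => {
              if (obj.structure === site.structure) {
                site.hits++;                  // fast path: same structure S, reuse offset O
                return obj.slots[site.offset];
              }
              site.misses++;                  // slow path: full lookup, then remember it
              const offset = obj.structure.offsets.get(property);
              if (offset === undefined) return undefined;
              site.structure = obj.structure;
              site.offset = offset;
              return obj.slots[offset];
            };
            return site;
          }

          const readX = makeAccessSite("x");
          const results = [
            readX.get(makeObject(pointStructure, [1, 2])), // miss, caches (S, offset 0)
            readX.get(makeObject(pointStructure, [3, 4])), // hit, same structure
            readX.get(makeObject(otherStructure, [5, 6])), // miss, structure differs
          ];
          ```

          The fast path never looks at the property name, which is the whole point: the structure check alone says where the value lives.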

          Hopefully we can see how the magic properties from (6) can screw with things. If we have an object with magic properties we can’t simply say “the last time we were here with structure S we found P at offset O, this object also has structure S, so let’s load that property from that offset”, because if you changed (for example) the id of an element in the DOM, the object’s structure as JS understands it does not change. Similarly, in the inverted case where the property is missing from the object itself, we can’t necessarily cache the prototype access, as might happen in someNodeList.forEach(…).
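
          A small simulation of why a structure-keyed cache goes stale for magic properties. The elements array stands in for the DOM, and namedLookup and cachedLookup are hypothetical names; the only assumption is that changing an id does not change the structure the cache is keyed on.

          ```javascript
          // Sketch only: "magic" named properties are answered from external state
          // (here, a list standing in for the DOM), not from the object's own slots.
          // elements, namedLookup, and cachedLookup are hypothetical names.
          const elements = [
            { id: "foo", text: "Hi!" },
            { id: "bar", text: "Wat" },
          ];
          const globalStructure = {}; // never changes when an element's id changes

          // Correct but uncacheable: search the external state on every access.
          const namedLookup = (id) => elements.find((el) => el.id === id);

          // A naive cache keyed on the (unchanged) structure, as for normal properties.
          const cache = new Map();
          function cachedLookup(structure, id) {
            const entry = cache.get(id);
            if (entry !== undefined && entry.structure === structure) return entry.value;
            const value = namedLookup(id);
            cache.set(id, { structure, value });
            return value;
          }

          const before = cachedLookup(globalStructure, "foo").text; // first element
          elements[0].id = "womp";
          elements[1].id = "foo";
          const stale = cachedLookup(globalStructure, "foo").text;  // still the old element
          const fresh = namedLookup("foo").text;                    // the element now named foo
          ```

          After the ids are swapped, the cached path keeps returning the old element because nothing it checks has changed, while the uncached lookup returns the right one.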

          So now we get to a lexical lookup of an element by its id. These properties aren’t explicitly declared variables, so we don’t get to do our high speed direct access and instead fall back to the normal property lookup. So

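          ```html
          <div id=foo>Hi!</div>
          <div id=bar>Wat</div>
          <script>
            function getInnerText() {
              // foo is not a declared variable; it resolves to the element with id "foo"
              return foo.innerText
            }
            console.log(getInnerText()) // "Hi!"
            foo.id = "womp"
            bar.id = "foo"
            console.log(getInnerText()) // "Wat"
          </script>
          ```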
          

          Can’t cache the lookup by id: generally JSC will know that it should cache the property lookup function itself, but it can’t cache the actual value for the specific property.

          The final - very long winded - result is that

          `

          `

          is on the order of 8-10x faster than

          `

          `

          If you’re forcing real work, like the decidedly non-trivial .innerText, the variable-based lookup drops to merely 2x faster. Given the cost of innerText, this will hopefully give some idea of the cost of the uncached (uncacheable?) lookup.

      2. 5

        The document object has a number of collections on it that work like this too: embeds, forms, images, links, and scripts, as well as direct references to the head and body elements. Interestingly it also has styleSheets, but I don’t believe the stylesheets are exposed as properties on the returned object, and I’m not at a computer to check.

        1. 1

          But please don’t :D

          Reasonable indexed access for those elements is much easier to maintain than the named access

        2. 4

          I remember use of this being discouraged multiple decades ago, and am astonished to find that it’s still around.

          I wonder if anyone who runs a search engine has ever done a census of sites that still actually depend on ancient things like this, and quantified the effect of just dropping support – I have to imagine that a huge percentage of things that used to use this back in the day have just silently linkrotted their way off the internet since then.

          1. 7

            The Chrome web platform security team is tracking this as an anti-pattern they would like to deprecate.

            See “DOM clobbered variable accessed” in the “Bad Markup” section at https://deprecate.it/ - access is currently at 10% of page views, though it’s unclear and hard to measure if missing access would actually break anything.

            1. 3

              TIL about https://deprecate.it/, thanks for sharing.

          2. 2

            They can, but you and whoever has to maintain your mess (probably also you) will be happier if’n ya don’t.

            1. 0

              It might not return what you think. According to the spec, when there are multiple instances of the same named element in the DOM — say, two instances of div class=“cool” — the browser should return an HTMLCollection with an array of the instances. Firefox, however, only returns the first instance. Then again, the spec says we ought to use one instance of an id in an element’s tree anyway

              Someone is confused between class and id…

              1. 1

                A minor detail :D

                1. 1

                  Ouch, yeah, haha. Let me see if I can fix that.