1. 24
  1. 17

    Modern browsers don’t cache requests to CDNs across multiple domains, since that can be used to track users — this means that even if someone has already downloaded the library you’re including from the CDN on one website, they’ll have to download it again when they visit your website.

    This is still news to a lot of developers. It bears repeating.

    What I do is just download the library that I want and include the files in my repo, just like any other source file. I usually do this in a directory called vendored or 3p, and make sure to include the version number of the package that I downloaded in the filename. This takes maybe 60 seconds more work than including the CDN version, which seems worth it to me for the privacy, robustness, and speed benefits.

    I’m with you up to here. I’ve heard the treat-dependencies-as-source approach before. I don’t think it’s realistic for a couple of reasons:

    1. JavaScript dependencies listed on npm are a common malware vector. Expecting everyone to read all of the source of all their dependencies themselves and understand it before they copy/paste it into their source is not going to be as effective as an npm audit report.
    2. I’ve inherited a couple of projects that are set up like this. It tends to be fairly error-prone. If someone leaves off a version number or forgets a sub-dependency, you spend a day tearing your hair out trying to fix a version conflict or just trying to find the right version of the API documentation. Tools like LibMan for .NET automate this process for a marginally better developer experience, but it isn’t as robust as npm, which makes sure the right versions of sub-dependencies and peer dependencies are installed.

    npm can be a frustrating tool to use, to be sure, but your proposal strikes me as only marginally more secure than using a CDN, and with some significant drawbacks.

    1. 5

      I agree that vendoring dependencies isn’t always the right way to go about things, but as an alternative to pulling things in from a CDN, it’s usually what people want. If I’m pulling in more than a couple dependencies, or dealing with a dependency graph, then I use a real package manager. But for something like pulling in jQuery or lodash and building on top of it, NPM doesn’t really buy you anything but complexity.

      I don’t think there are people out there loading a giant dependency graph from CDNs — typically that’s something people only do for small projects, and so my suggestion here was implicitly assuming that scale — it’s “what you should do instead of pulling from a CDN”, not “how you should manage your dependencies in general” :)

      1. 4

        Installing directly from NPM can also improve your JavaScript dev experience since TypeScript can load type definitions automatically for many packages. If those packages are not in a standardized place, then you likely won’t get as much help from your other tools.

        I have a tendency to use a bundling tool and then just specify which packages are part of the vendored chunk. It usually has very similar caching benefits, but it can mean fewer bytes overall and fewer open concurrent connections, and it works with more standardized tooling.
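
        A rough sketch of that setup with webpack (the bundler isn’t named above, so this is purely illustrative) would put the hand-picked packages into their own long-cached chunk:

        // webpack.config.js (minimal sketch, assuming webpack)
        module.exports = {
          // ...entry/output as usual...
          optimization: {
            splitChunks: {
              cacheGroups: {
                vendored: {
                  // bundle jquery and lodash (and their deps) into one "vendored" chunk
                  test: /[\\/]node_modules[\\/](jquery|lodash)[\\/]/,
                  name: 'vendored',
                  chunks: 'all',
                },
              },
            },
          },
        };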

        1. 2

          I don’t think there are people out there loading a giant dependency graph from CDNs

          They do, unfortunately. WordPress and ecommerce sites are especially liable to ballooning JS dependencies. What starts out as just “jQuery or lodash” multiplies with plugins and other common dependencies like Bootstrap. It’s what jsdelivr’s combine feature is for.
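
          For reference, a combined request looks something like this (the exact packages and versions are just an example):

          https://cdn.jsdelivr.net/combine/npm/jquery@3.7.1/dist/jquery.min.js,npm/bootstrap@5.3.3/dist/js/bootstrap.bundle.min.js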

          Context is everything. If it’s just you or you and a team of people who have agreed to JavaScript austerity, it makes sense to do as you say. But as general advice, I still disagree.

          1. 2

            NPM doesn’t really buy you anything but complexity.

            It just does the thing you’re manually doing. The problem with NPM/Yarn IMHO is that it has a shitty default: it always uses ^ for the version constraint rather than letting you configure it or (a far better strategy) defaulting to ~. Caret means “give me whatever the latest of this is, short of the next major, because everyone uses SemVer properly and nothing ever goes wrong”. Tilde means “I want patch updates but not major or minor versions, because I’m a realist and people don’t use SemVer correctly and/or cannot predict every possible breaking change”.
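
            Concretely, for a dependency currently at 2.0.0, the two prefixes resolve like this (ranges per the semver spec):

            ~2.0.0   →  >=2.0.0 <2.1.0   (patch updates only)
            ^2.0.0   →  >=2.0.0 <3.0.0   (minor and patch updates)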

            Why can’t I do this?

            yarn add jquery --tilde
            

            and then it could end up like

            {
              "dependencies": {
                "jquery": "~2.0.0"
              }
            }
            

            The JS ecosystem makes package management harder than it needs to be. Even Bundler does a better job at this.

            1. 1

              Totally. You can use the @ syntax, but you still have to look up the version number on https://semver.npmjs.com and then:

              npm install --save jquery@~2.0.0
              

              Would love to be able to:

              npm install --save jquery@~latest
              

              …or something like it to install the latest version and pin it to patch-level updates. Another crappy default is not adding an engines field automatically. I’ve seen so many newcomers to npm waste time on npm installation errors that turn out to be caused by environment incompatibility.
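
              For reference, declaring the supported environment is only a few lines in package.json (the ranges here are just an example):

              {
                "engines": {
                  "node": ">=18.0.0",
                  "npm": ">=9.0.0"
                }
              }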

              I cringe that I’m recommending npm at all, but I have yet to encounter anything that’s substantially better. Bower found a middle road of complexity long ago, before it got shamed for overlapping with npm and fell into disuse.

        2. 5

          Given that browsers stopped caching CDN content globally (it’s now double-keyed per top site), I’d generally agree that using a CDN may not be worth the complexity anymore.

          P.S. Editor of the Subresource Integrity spec and maintainer of https://www.srihash.org/ here. Thanks for the plug. :)
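
          For anyone who does keep a CDN reference, an SRI-protected tag looks roughly like this (the URL is made up and the hash is a placeholder; a tool like srihash.org generates the real value):

          <script src="https://cdn.example.com/jquery-3.7.1.min.js"
                  integrity="sha384-REPLACE_WITH_REAL_HASH"
                  crossorigin="anonymous"></script>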

          1. 3

            We vendor our JS (except for Amazon’s Payment Gateway script, which I’m not 100% sure is safe to store ‘offline’ like this, since I recall some part of it being dynamically generated). However, we don’t rely on versions in file names.

            Instead, we use a backend route that combines the resources required for a given page and generates a hash for them. This way, if any of the resulting assets (JS files) change, the hash changes, and thus the URL changes. The pages that load the JS are themselves cached via Varnish, so there’s no speed penalty for hashing those files on every request.
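
            A rough sketch of that pattern (shown in Node/Express purely for illustration; the file names are hypothetical):

            // combine the page's JS files, hash the result, and serve it at a hash-based URL
            const crypto = require('crypto');
            const fs = require('fs');
            const express = require('express');

            const app = express();
            const files = ['vendored/jquery-3.7.1.min.js', 'src/app.js']; // per-page asset list

            function combined() {
              const body = files.map((f) => fs.readFileSync(f, 'utf8')).join('\n');
              const hash = crypto.createHash('sha256').update(body).digest('hex').slice(0, 12);
              return { body, hash };
            }

            // the page template asks for this URL; it changes whenever any of the files change
            app.get('/assets/bundle-url', (req, res) => res.send(`/assets/${combined().hash}.js`));

            // the hash-named bundle itself can be cached aggressively by browsers (and by Varnish)
            app.get('/assets/:name', (req, res) => {
              res.set('Cache-Control', 'public, max-age=31536000, immutable');
              res.type('application/javascript').send(combined().body);
            });

            app.listen(3000);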

            1. 1

              I’m using nginx already. Should I just proxy the packages I use?

              1. 1

                What advantages would proxying bring you? Wouldn’t you just add more latency compared to mirroring them?

                1. 3

                  Well, it avoids the “customer says the site doesn’t work because they have a browser plugin that blocks 3rd party JS” issue, I guess.

                  1. 1

                    …yes, but that wouldn’t be a problem with mirroring - storing and serving the JS from the same host as the website itself - either, right? Naive proxying would add an extra roundtrip between your host and upstream, and I don’t understand why that would be preferable for JavaScript. It would make sense for data that is too big to store on the same host, of course. And it can become pretty similar to mirroring with the right caching strategy.

                    1. 3

                      No, you’re right: the problem is better solved by storing and serving the JS yourself.

                      But if someone says they’re going to proxy stuff like this, I have to assume they’re knowledgeable enough to proxy it with a cache, thus avoiding the extra roundtrip and serving from the same domain.

                      Part of the work that went into the caching method I mention in my other comment was greatly reducing our third party JS use, but I can see how proxying (again, with a cache) a few scripts that we (a) trust to be reliable and (b) have to use a third party resource for - for us this would mostly be Braintree’s JS and Amazon Pay’s JS - would actually be a benefit in terms of developer workload. We get the benefit of loading the script that the vendor provides (i.e. fixes for issues they find) but without the downsides of a third party domain load on the page.
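
                      A caching proxy for one of those scripts would look roughly like this in nginx (host names, paths, and TTLs are illustrative; proxy_cache_path belongs in the http block):

                      proxy_cache_path /var/cache/nginx/thirdparty keys_zone=thirdparty:10m max_size=100m inactive=30d;

                      location /thirdparty/ {
                          proxy_pass https://cdn.example.com/;
                          proxy_set_header Host cdn.example.com;
                          proxy_ssl_server_name on;
                          proxy_cache thirdparty;
                          proxy_cache_valid 200 30d;
                          proxy_ignore_headers Set-Cookie;
                          add_header X-Cache-Status $upstream_cache_status;
                      }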

                      I don’t know that we’ll actually ever do that, but it’s a scenario I can see that might be worth it.

                      1. 1

                        Yes, this is what I meant. Proxying it with a cache. This would achieve the “privacy” and “speed” objectives of the original post. I don’t know if the security part would work.

                        1. 1

                          Thanks for the insights! I was wondering why one would voluntarily introduce cache invalidation as a potential problem to solve, for a few MB of JavaScript on disk - but your example of third-party JS resources which might not be packaged, like those payment providers, makes perfect sense.

                    2. 2

                      It reduces tracking because the customer’s IP isn’t passed to the CDN, and it can improve latency, as your proxy may already have a connection open to the CDN while the client won’t.

                      But I agree, why not just download it at build time and remove the runtime dependency on the CDN?
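
                      For example, a single fetch step in the build could cover it (the URL and version here are just an example):

                      curl -fsSL https://cdn.jsdelivr.net/npm/jquery@3.7.1/dist/jquery.min.js \
                        -o static/vendored/jquery-3.7.1.min.js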

                      1. 1

                        why not just download it at build time and remove the runtime dependency on the CDN?

                        Well, nothing else gets downloaded at build time right now. In fact, for a long time my “build” was git push.