  2. 26

    I co-authored the Subresource Integrity specification and agree with almost everything this article is saying. The benefits of a public CDN are mostly gone now that browsers have split caches per first-party origin (which I agree is the right thing to do!). I think SRI is mostly useful when you are using a big paid edge CDN that you can’t fully trust.

    I don’t agree with the IPFS bits. I don’t think that’s realistically usable for any website out there.
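
    For anyone unfamiliar: SRI works by putting a hash of the expected file in the script tag’s integrity attribute, so the browser refuses to run a tampered copy. A rough sketch of computing such a value (assuming SHA-384, a commonly used choice; the file name is hypothetical):

    ```python
    import base64
    import hashlib

    # Compute an SRI integrity value for a local copy of the script.
    # "lib.min.js" is a hypothetical file name used for illustration.
    with open("lib.min.js", "rb") as f:
        digest = hashlib.sha384(f.read()).digest()

    integrity = "sha384-" + base64.b64encode(digest).decode()
    # The result goes into <script src="..." integrity="..." crossorigin="anonymous">
    print(integrity)
    ```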

    1. 3

      As someone who hosts a lot of client websites with IPFS I wonder what you think makes it harder?

      1. 2

        How do you do that?

        1. 5

          Pin the site to at least two IPFS nodes (usually one I run, plus pinata.cloud), then set the dnslink TXT record to /ipfs/<hash>, and then A or CNAME-flatten to an IPFS gateway (often Cloudflare, because why not make them pay the bills? But pinata.cloud is also a great option if you are paying them anyway).
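
          Concretely, the DNS side ends up looking something like this (the CID and hostnames are placeholders; substitute whichever gateway you actually point at):

          ```
          ; illustrative zone entries only: replace <cid> with the pinned root CID
          _dnslink.example.com.  IN  TXT    "dnslink=/ipfs/<cid>"
          example.com.           IN  CNAME  cloudflare-ipfs.com.  ; CNAME-flattened, or an A record to the gateway
          ```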

          1. 6

            So you still need to host a copy, and you need a regular CDN to serve and scale it. That’s exactly like old-school HTTP, but with extra steps.

            1. 3

              No, CloudFlare offers a public IPFS gateway, you don’t need a “regular CDN”

              1. 8

                Cloudflare abstracts IPFS away, so that users don’t talk to the IPFS network at all. They just connect over HTTP to the company’s own CDN servers running nginx, just like everyone else who isn’t using IPFS in any way. IPFS here is not handling the traffic nor distributing the content in any meaningful way.

                Such a setup makes the protocol behind the CDN entirely irrelevant. It could just as well have been RFC 1149, or an Apache in someone’s basement, and it wouldn’t make a difference.

                1. 2

                  Yeah but you don’t have to run anything except the IPFS node. No exposing ports via a firewall, no configuring a public CDN, etc.

                  Pin your files in IPFS and point DNS to CloudFlare. Done!

            2. 2

              Can you explain what you are gaining from this? If it’s routed through a third-party gateway, how do you get baked-in integrity checks? Not for the user, only somewhere in the backend, no? I’m inexperienced with IPFS and I might be misunderstanding bits, but I’d be happy to learn more about your setup.

              1. 2

                I get redundancy (because my content is in 2+ places), so if my box is down, or even if both of the nodes I pin on are down, the site keeps serving due to caching at the edge; and even when the cache expires, usually one of my two pins is up.

                I get simplicity. Some content I can pin on machines in my house without any additional port forwarding, etc. My client can pin the content on their machine or upload it to pinata.cloud for free, and I just add the hash to my deployment and it streams live from them. No more WeTransfer.

                And I get the future. If a user wants integrity checking or p2p serving, they just need the browser extension and it will load my sites from IPFS directly and not use the CDN proxy.

              2. 1

                Very cool! Thank you!

        2. 16

          > cached content is no longer shared between domains. This is known as cache partitioning and has been the default in Chrome since October 2020 (v86), Firefox since January 2021 (v85), and Safari since 2013 (v6.1). That means if a visitor visits site A and site B, and both of them load https://public-cdn.example/my-script.js, the script will be loaded from scratch both times.

          > This means the primary benefit of shared public CDNs is no longer relevant for any modern browsers.

          Huh, interesting, I didn’t know about that. I assumed the reason that would be given was that many people bundle their dependencies together, which has made public CDNs redundant because you’re not trying to load many individual <script>s anymore.

          1. 10

            Another, very tertiary, reason is that there are so many different versions of libraries that the hit rate wasn’t that high anyway.

          2. 13

            So many pages don’t work for me because they have important stuff on a CDN, and third-party scripts are blocked by default on my main browser. Then I open the list to temporarily allow it, assuming it’s some site-name-hash.cdn-co.com thing, see there are 34 different domains, mostly gibberish, and just close the tab as too hard.

            1. 8

              Library version fragmentation made public CDNs only marginally useful a long time ago. When there was only a handful of versions of jQuery or Angular or whatever, it made a little more sense. With websites referring to a massive spectrum of libraries, the chances of finding yours already cached are low.

              1. 3

                I wish I could upvote this twice. The author addresses the clickbait-iness of the title in an interesting and ultimately informative way, and the coverage of the topic is balanced, even offering some cases where using a public CDN makes a lot of sense.

                1. 3

                  Isn’t the whole point of unpkg and cdnjs availability and ease of access? The caching bit seems like a cherry on top compared to having modules readily loadable from the browser with a script tag. Sadly, you can’t ES6-import from all of these sites, but that would be great. Deno does that well (https imports).

                  1. 3

                    I wouldn’t say public CDNs are completely obsolete. What this article does not take into consideration is the positive impact of geographic locality (i.e. reduced RTT and packet loss probability) on transport-layer performance. If you want to avoid page load times on the order of seconds (e.g. several MB worth of JavaScript over a transatlantic connection), you have to either rely on a public CDN or run your own content delivery on EC2 et al. Of course this involves more work and potentially money.

                    1. 2

                      This would only apply if whatever you’re fetching from the CDN is really huge. For any reasonably small file the transport performance is irrelevant compared to the extra handshake overhead.

                      1. 1

                        It does apply for smallish file sizes (on the order of a few megabytes). It mainly depends on how far you have progressed the congestion window of the connection. Even with an initial window of 10 MSS, it would take several RTTs to transfer the first megabyte.
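
                        Back-of-the-envelope, assuming a 1460-byte MSS, an initial window of 10 segments, the window doubling every RTT, and no losses (handshake round trips not counted):

                        ```python
                        # Toy slow-start model; all parameters are assumptions for illustration.
                        MSS = 1460
                        cwnd = 10            # congestion window, in segments
                        delivered = 0        # bytes delivered so far
                        rtts = 0
                        target = 1_000_000   # "the first megabyte"

                        while delivered < target:
                            delivered += cwnd * MSS
                            cwnd *= 2
                            rtts += 1

                        print(rtts, "RTTs to deliver ~1 MB")  # 7 under these assumptions
                        ```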

                        1. 3

                          There’s a benefit if you use a single CDN for everything, but if you add a CDN only for some URLs, it’s most likely to be net negative.

                          Even though CDNs have low latency, connecting to a CDN in addition to the other host only adds more latency, never decreases it.

                          It’s unlikely to help with download speed either. When you host your main site off-CDN, users pay the cost of TCP slow start anyway. Subsequent requests will have an already warmed-up connection to use, and just going with it is likely to be faster than setting up a brand-new connection and suffering TCP slow start all over again from the CDN.

                          1. 1

                            That is definitely interesting. I never realized how expensive TLS handshakes really are. I’d always assumed that the number of RTTs required for the crypto handshake was the issue, not the computational part.

                            I wonder if this is going to change with QUIC’s ability to perform 0-RTT connection setup.

                            1. 1

                              No, the CPU cost of TLS is not that big. For clients, the cost is mainly in round trips for DNS, the TCP/IP handshake, and the TLS handshake, and then in TCP starting with a small window size.

                              A secondary problem is that HTTP/2 prioritization works only within a single connection, so when you mix in third-party domains you don’t have much control over which resources load first.

                              QUIC 0-RTT may indeed help, reducing the additional cost to just an extra DNS lookup. It won’t solve the prioritization problem, though.
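
                              A rough tally of what a cold connection to an extra host costs before the first response byte, with illustrative numbers (TLS 1.3 assumed; TLS 1.2 would add one more round trip):

                              ```python
                              # Rough round-trip accounting for one resource from a cold, extra origin.
                              # Every number here is an assumption for illustration, not a measurement.
                              rtt_ms = 80      # e.g. a long-ish round trip

                              dns     = 1      # resolve the extra hostname
                              tcp     = 1      # SYN / SYN-ACK
                              tls13   = 1      # TLS 1.3 handshake (TLS 1.2 needs 2)
                              request = 1      # the HTTP request/response itself

                              cold      = (dns + tcp + tls13 + request) * rtt_ms
                              warm      = request * rtt_ms          # reuse the already-open connection
                              quic_0rtt = (dns + request) * rtt_ms  # resumed QUIC: request rides in the first flight

                              print(f"cold connection:      ~{cold} ms before slow start even matters")
                              print(f"warm connection:      ~{warm} ms")
                              print(f"QUIC 0-RTT (resumed): ~{quic_0rtt} ms")
                              ```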

                    2. 1

                      > A last brief tangent to finish: one technology that I do see being very promising here in future is IPFS.

                      Look at the source and your tune will change quickly.

                      1. 12

                        IPFS is not alone, though; there are other “browsable torrent”-style things like Dat.

                        The problem with all of them is that getting anything P2P involved in browsing is a huge privacy downgrade. Instead of having a private conversation with the server over HTTPS, in the worst case you’re basically screaming “I’m downloading this hash” to the world for every little thing you see.

                        Maybe we need some other way to leverage content addressing, one that would e.g. default to only communicating with the original server, but also allow fetching from other places as configured (so as a user you could have preferences like “use archive.org to get content when it’s unavailable on the origin” or “prefer fetching from the cache on my server, then my friend’s server, otherwise fetch from origin”). On the other hand, that’s probably a doomed idea, since most people don’t understand enough of this to have any preferences like that :/

                        1. 5

                          Use a truncated hash to attain k-anonymity when querying broadly, then return a mapping of full hashes and peers that have the source data.

                          You could probably also embed peer public keys in the process so the actual transmission has end-to-end encryption.
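
                          Roughly something like this (a toy sketch of the lookup side; the index and peer names are made up for illustration):

                          ```python
                          import hashlib

                          # Toy in-memory "network index": full content hash -> peers claiming to hold it.
                          # Entirely made up for illustration.
                          INDEX = {
                              hashlib.sha256(b"cat.jpg").hexdigest(): ["peerA", "peerB"],
                              hashlib.sha256(b"app.js").hexdigest(): ["peerC"],
                              hashlib.sha256(b"style.css").hexdigest(): ["peerA"],
                          }

                          PREFIX_LEN = 6  # hex chars actually broadcast; shorter prefix = larger anonymity set

                          def lookup(prefix: str) -> dict:
                              """What the network answers: every full hash sharing the prefix,
                              plus the peers that claim to have it."""
                              return {h: peers for h, peers in INDEX.items() if h.startswith(prefix)}

                          wanted = hashlib.sha256(b"cat.jpg").hexdigest()
                          candidates = lookup(wanted[:PREFIX_LEN])  # observers only see the short prefix
                          peers = candidates.get(wanted, [])        # the full hash is resolved client-side
                          print(peers)                              # ['peerA', 'peerB']
                          ```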

                          1. 6

                            A truncated hash helps against broadcasting to everyone, but there is still the possibility of an “everything-knowing” adversary that claims it has most of the resources people download, even if it doesn’t, and as such gathers what everyone is downloading. Or it could have all the resources of one specific site it wants to monitor. Sadly, P2P doesn’t really work for anonymity as long as we have consistent identifiers. It could probably work over Tor, though, but you’re not getting any latency improvements there.

                            1. 3

                              > A truncated hash helps against broadcasting to everyone

                              Yep, that was the specific threat model my suggestion was meant to mitigate.

                              > but there is still the possibility of an “everything-knowing” adversary that claims it has most of the resources people download, even if it doesn’t, and as such gathers what everyone is downloading. Or it could have all the resources of one specific site it wants to monitor.

                              Okay, but that’s a different threat model entirely.

                              Truncated hashes solve the “tell everyone in the network what you’re looking for” problem. They can’t stop someone from doing Sybil attacks to respond to post-broadcast download requests. (Such an attacker doesn’t even need to host the content; it can merely MitM the request by having enough nodes near the user to win the race.)

                              It may sound like the same thing, superficially, but it’s distinct. You can solve the broadcast problem without bogging down the Tor network with a crapton of P2P traffic. Remember, “Don’t make perfect the enemy of good” is a thing.

                              (You could even let the requestor choose the truncation length and/or send a challenge-response nonce that serves as an HMAC key for calculating the response over the actual content hash if you wanted to be cheeky.)
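
                              The cheeky variant could look roughly like this (a toy sketch; the nonce scheme and names are assumptions):

                              ```python
                              import hashlib
                              import hmac
                              import os

                              # Requester: broadcast only a short prefix plus a fresh nonce (the challenge).
                              content_hash = hashlib.sha256(b"cat.jpg").digest()
                              prefix = content_hash[:3]      # requester-chosen truncation length
                              nonce = os.urandom(16)         # doubles as the HMAC key for the response

                              # Responder: for each stored item matching the prefix, answer with
                              # HMAC(nonce, full_hash) instead of revealing the full hash itself.
                              def respond(stored_hashes, prefix, nonce):
                                  return [hmac.new(nonce, h, hashlib.sha256).digest()
                                          for h in stored_hashes if h.startswith(prefix)]

                              # Requester: only someone who already knows the full content hash can
                              # recognise which entry in the response is the one they asked about.
                              expected = hmac.new(nonce, content_hash, hashlib.sha256).digest()
                              answers = respond([content_hash], prefix, nonce)
                              print(any(hmac.compare_digest(a, expected) for a in answers))  # True
                              ```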

                              1. 3

                                I believe it’s exactly these kinds of attacks that people care most about. People don’t particularly care about nuggets of information that occasionally leak to several parties; they care about bad actors accumulating such information and processing it for themselves. Denying the little snippets that leak to everyone doesn’t really do anything if there’s still someone who can harvest all of your data.

                            2. 3

                              So the idea would be to send, say, the first 6 bytes of a hash, and then every node would send a list of the hashes of all their files which start with those 6 bytes?

                              Presumably, the popularity of content on IPFS follows some sort of power law. So out of the, say, 100 (or however many) pieces of content which match your truncated hash, the most popular item would be vastly more popular than the second most popular item, etc. I feel like you could build up a fairly good profile of what content someone has probably downloaded just by monitoring the truncated hashes they query. How do you protect against that?

                              1. 2

                                Websites usually need several resources, so I suspect in aggregate you couldn’t make the hashes truncated enough to avoid identifying sites from the n different hashes they use.

                                There are attacks that can detect popular sites from HTTPS packet sizes alone, which is why HTTP/2 added the ability to send random padding.

                        2. 1

                        Not even a mention of the only real use case for CDNs? Being close to users so page loads are faster.

                          1. 0

                          With growing application sizes, the minor improvements CDNs provide are already becoming irrelevant.