
  2. 36

    When reading this article I wanted to echo basically the same thing that Daniel Stenberg said.

    DoH is necessary because the DNS community messed up over the past two decades. Instead of privacy plus hop-to-hop authentication and integrity, they picked end-to-end integrity protection with no privacy (DNSSEC) and wasted two decades heaping an unbelievable amount of complexity on top of DNS. Meanwhile, cleanup of basic protocol issues in DNS crawled along at a glacial pace.

    This is why DNSSEC will never get browser support, but DoH is getting there rapidly. It solves the right problems.

    1. 4

      I haven’t studied DoH or DoT enough to feel comfortable talking about the solutions, but on the requirements side, intuitively I don’t get where this all-consuming “privacy” boundary is supposed to be. Is the next step that all browsers will just ship with mandatory VPNs so nobody can see what IP address I’m talking to? (Based on history, that wouldn’t really surprise me.) So then there’s a massive invisible overlay network just for the WWW?

      And by “nobody” I mean nobody who doesn’t really matter anyway, since I’d think no corporation with an extensive network, nor any country with extensive human rights problems, is going to let you use either protocol anyway (or they’ll require a MITM CA).

      1. 5

        The end game is that all traffic is protected by a sort of ad hoc, point-to-point VPN between endpoints. There can be traffic analysis, but no content analysis.

        1. 6

          We’re slowly moving towards “Tor”. It seems all the privacy enhancements being implemented slowly build up to something that Tor has already provided for a long time…

          1. 4

            “Tor all the things” would be awesome… if it could be fast.

            1. 2

              Or what DNSCurve did years ago.

            2. 1

              But the point of this seems to be making the “endpoint” private as well. The line between “traffic” and “content” is ever blurrier — I wouldn’t have thought DNS is “content”. If it is, then I don’t know why IP addresses aren’t “content” just as much. Is this only supposed to improve privacy for shared servers?

              1. 8

                I’ve never thought of the content of DNS packets as anything other than content. Every packet has a header containing addresses and some data. The data should be encrypted.

                1. 1

                  I don’t think the argument is that simple. ICMP and ARP packets are also headers and data, but that data surely isn’t “content”. I would have made your statement just about application UDP and TCP.

                  I think of “content” as what applications exchange, and “traffic” (aka “metadata”) as what the network that connects applications needs to exchange to get them connected. Given that both DNS names and IP addresses identify endpoints, it’s not obvious to me why DNS names are more sensitive than IP addresses. The end result of a DNS lookup is that you immediately send a packet to the resulting IP address, which quite often identifies who you’re talking to just as clearly as the DNS name.

                  No doubt I’m just uneducated on this — my point was I don’t understand where that line is being drawn. When I try to follow this line of reasoning I end up needing a complete layer-3 VPN (so you can’t even see the IP addresses), not just some revisions to the DNS protocol.

                  1. 2

                    The end result of a DNS lookup is that you immediately send a packet to the resulting IP address

                    This is a very limited view of DNS.

                    1. 1

                      Is there another usage of DNS that’s relevant to this privacy discussion that’s going on?

                      1. 3

                        Most browsers do DNS prefetching, which reveals page content even for links you don’t visit.

                        1. 1

                          Good point! It makes me think that perhaps we should make browsers continually prefetch random websites that the users don’t visit, which would improve privacy in much the same way as the CDNs do. (Actually, I feel like that has been proposed, though I can’t find a reference.)

                          iTerm had a bug in which it was making DNS requests for bits of terminal output to see if they were links it should highlight. So sometimes content does leak into DNS — by either definition.

                        2. 1

                          CNAME records, quite obviously, for one

                          1. 1

                            OK, obviously, but then is there something relevant to privacy that you do with CNAME records, other than simply looking up the corresponding A record and then immediately going to that IP address?

                            If the argument is “ah, but the A address is for a CDN”, that thread is below…I only get “privacy” if I use a CDN of sufficient size to obscure my endpoint?

                            1. 3

                              OK, obviously, but then is there something relevant to privacy that you do with CNAME records, other than simply looking up the corresponding A record and then immediately going to that IP address

                              I resolve some-controversial-site-in-my-country.com to CNAME blah.squarespace.com. I resolve that to A {some squarespace IP}.

                              Without DoH or equivalent, it’s obvious to a network observer who I’m talking to. With it, that lookup is impossible to distinguish from thousands of other sites.

                              If the argument is “ah, but the A address is for a CDN”, that thread is below…I only get “privacy” if I use a CDN of sufficient size to obscure my endpoint?

                              Yes, this doesn’t fix every single privacy issue. No, that doesn’t mean it doesn’t improve the situation for a lot of things.
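
                              To make the chain above concrete, here’s a rough sketch of that lookup using the dnspython package (the library choice and the placeholder domain are my assumptions, not something from this thread):

                              ```python
                              # Hypothetical sketch: follow a CNAME chain the way a stub resolver would.
                              import dns.resolver  # pip install dnspython

                              name = "some-controversial-site-in-my-country.com"  # placeholder domain
                              answer = dns.resolver.resolve(name, "A")

                              # The answer section carries both the CNAME and the final A record, so a
                              # passive observer of plaintext DNS sees the original hostname even though
                              # the TCP connection only goes to a shared hosting/CDN IP.
                              for rrset in answer.response.answer:
                                  print(rrset)
                              ```

                              With DoH, that whole exchange rides inside an ordinary HTTPS connection instead of plaintext port 53.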

                  2. 5

                    IP addresses are content when they are A records to your-strange-porno-site.cx or bombmaking-101.su.

                    They are metadata when they point to *.cloudfront.net, akamaiedge.net, cdn.cloudflare.com, …, and huge swaths of the Internet are behind giant CDNs. Widespread DoH and ESNI adoption will basically mean that anyone between you and that CDN will be essentially blind to what you are accessing.

                    Is this better? That’s for you to decide ;)

                    1. 6

                      Well, here again I don’t quite get the requirements. I’m not sure it’s a good goal to achieve “privacy” by routing everything through three giant commercial CDNs.

                      1. 3

                        Because three CDNs are literally the only uses of Virtual Hosting and SNI on the entire internet?

                        I’d venture to say that the overwhelming majority of non-corporate, user-generated content (and a large number of smaller business sites) is not hosted at a dedicated IP. It’s all Shopify equivalents and hundreds of blog and CMS hosting services.

                        1. 1

                          Well, the smaller the host is, the weaker the “security” becomes.

                          Anyway, I was just trying to understand the requirements behind this protocol, not make a value judgment. Seems like the goal is increased obscurity for a large, but undefined and unstable, set of websites.

                          If I were afraid of my website access being discovered, I personally wouldn’t rely on this mechanism for my security, without some other mechanism to guarantee the quantity and irrelevance of other websites on the same host/proxy. But others might find it useful. It seems to me like an inelegant hack that is partially effective, and I agree it’s disappointing if this is the best practical solution the internet engineering community has come up with.

                          1. 2

                            I have multiple subdomains on a fairly small site. Some of them are less public than others, so it would be nice to not reveal their presence.

            3. 4

              here be rants:

              everything gets shoehorned atop webby stuff. it’s getting harder (socially and technically) to use things in ways your several big brothers don’t want you to. got the wrong ip? solve 50 captchas to help cars not run over too many people, affecting stock value. interoperability? well, we’ve got a “rest” api which is slower than parsing the website with beautiful soup. tables are represented as JSON arrays, that’s the only thing we know! hell, let’s put markup in javascript in html because we use a platform for rendering static documents as a remote-GUI toolkit. want to use a port different from 80/443? well, too bad, we don’t allow those filthy things here. want a real connection? well, we have web sockets in store! soon there will be a requirement for a tunnel to your next friendly cdn to do things. which effectively is like dialing into a remote mainframe, only that everything is http now. then we’ll finally have the love child of all the best parts: the blazing speed of browsers paired with the privacy of centralized data storage.

              1. 1

                I understand your point. But as hacky as DoH is: unlike all the other crypto protocols, HTTPS actually works well at large scale and is somewhat battle-tested regarding protocol attacks (HTTP, TLS), cryptography, and fuzzing.

                I’d much rather use Firefox’s HTTPS stack for DNS than, say, any other application using openssl.

                As an aside, this happening on port 443 is a feature, not a bug: Your DNS traffic can blend in. It’s known to be allowed in most, if not all setups.
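
                To illustrate the blending-in point: on the wire, a DoH query is just another HTTPS request to port 443. A rough sketch against Cloudflare’s public JSON endpoint (my example; the thread doesn’t prescribe any particular resolver):

                ```python
                # Sketch of a DoH lookup via Cloudflare's JSON API; any DoH resolver
                # would do, this endpoint is just a convenient public example.
                import requests

                resp = requests.get(
                    "https://cloudflare-dns.com/dns-query",
                    params={"name": "example.com", "type": "A"},
                    headers={"accept": "application/dns-json"},
                )
                print(resp.json().get("Answer"))
                ```

                To a middlebox, this is indistinguishable from fetching any other resource over HTTPS.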

                1. 2

                  I understand your point. But as hacky as DoH is: unlike all the other crypto protocols, HTTPS actually works well at large scale and is somewhat battle-tested regarding protocol attacks (HTTP, TLS), cryptography, and fuzzing.

                  why is it better than DNS over TLS?

                  edit: the crypto used is the same (TLS); fuzzing is a problem for individual implementations. there may be undefined behaviours in the protocol definition, but DNS should be well aged by now, so these are known.

                  I’d much rather use Firefox’s HTTPS stack for DNS than, say, any other application using openssl.

                  this is a problem of using openssl and not a feature inherent to DoH.

                  As an aside, this happening on port 443 is a feature, not a bug: Your DNS traffic can blend in. It’s known to be allowed in most, if not all setups.

                  encryption should happen some layers further down, not on the application layer.

                  even if unpopular: people should trust those who’ve created much of the infrastructure we use today if they say something is a bad idea.

              2. 3

                “The Net interprets censorship as damage and routes around it.” – John Gilmore (1993)

                1. 2

                  I’m going to give a prediction here. The ability to mess with the DNS is too important to let go, so lawmakers are going to interfere with this effort. A mundane example, completely removed from what DoH is supposedly meant to prevent: a copyright holder (rightly) wants to block an illegal site distributing pirated content. The typical way to achieve this is that the country’s ISPs are ordered to suppress resolution of the corresponding DNS entry. With DoH, this isn’t going to be possible anymore. Thus, lawmakers are going to forbid its use in browsers with the argument that it hinders copyright protection.

                  Another point: I remember from somewhere that Mozilla wants to send all your DNS queries to Cloudflare. Since Cloudflare is a US-based company, Mozilla is effectively transferring all the DNS queries to the USA. Doing so requires some effort under the EU’s GDPR. I don’t want to go into the details here, but I do have doubts about what the legal ground is for the transmission as such, and for the transmission specifically into the US, if Mozilla doesn’t want to ask each and every Firefox user for their consent. If the CJEU at some point cancels the “EU-US Privacy Shield” (likely), it’s going to get even harder to legalise the data transmission. And that is before one even considers that the NSA is going to be highly interested in Cloudflare’s global DNS resolver.

                  With all that, I think the honourable goal DoH sets out for will not be reached; the effort harms privacy more than it helps by ringing lawmakers’ warning bells and provoking a collision with the GDPR.

                  1. 2

                    On one hand: I agree that DNS-over-HTTPS is a silly and convoluted solution.

                    On the other hand: DNS-over-TLS is a bad solution for the reason pointed out: it lives on its own port.

                    Question: Why do we need ports anymore at all? It seems like if we didn’t have dedicated port numbers, but instead referred to resources by subdomain or subdirectory beneath the main hostname, then all traffic would be indistinguishable when secured by TLS.

                    1. 4

                      Could it have been possible for DNS-over-TLS to use 443 and make the server able to route DNS and HTTP requests appropriately? I’m not very knowledgeable about TLS. From what I understand it’s just a transport layer, so a server could simply read the beginning of an incoming message and easily detect whether it is an HTTP or DNS header?

                      1. 9

                        Yes, much like HTTP/2 works. It complicates the TLS connection because the client now passes a hint about the service it wants (ALPN), but that bridge has already been crossed.
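
                        For a concrete picture, here is a minimal sketch of that negotiation with Python’s ssl module; the hostname, and the idea that this particular server offers DNS alongside HTTP/2 on 443, are assumptions for illustration, and “dot” is (as I understand it) the ALPN label used for DNS over TLS:

                        ```python
                        # Minimal ALPN sketch: the client offers the protocols it can speak and
                        # the server picks one, which is how one port can front several services.
                        import socket, ssl

                        ctx = ssl.create_default_context()
                        ctx.set_alpn_protocols(["h2", "http/1.1", "dot"])  # "dot" = DNS over TLS

                        with socket.create_connection(("example.net", 443)) as sock:
                            with ctx.wrap_socket(sock, server_hostname="example.net") as tls:
                                print("negotiated:", tls.selected_alpn_protocol())
                        ```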

                      2. 4

                        IP addresses allow two arbitrary computers to exchange information [1], whereas ports allow two arbitrary programs (or processes) to exchange information. Also, it’s TCP and UDP that have ports. There are other protocols that ride on top of IP (not that anyone cares anymore).

                        [1] Well, in theory anyway, NAT breaks that to some degree.

                        1. 3

                          Ports are kinda central to packet routing as it has been deployed, if my understanding is correct.

                          1. 5

                            You need the concept of ports to route packets to the appropriate process, certainly. However, with DNS SRV records, you don’t need globally-agreed-upon port assignments (a la “HTTP goes to port 80”). You could assign arbitrary ports to services and direct clients accordingly with SRV.

                            Support for this is very incomplete (e.g. browsers go to port 80/443 on the A/AAAA record for a domain rather than querying for SRVs), but the infrastructure is in place.
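
                            A hedged sketch of what that would look like with dnspython (the service name is a made-up example, not something from this thread): the client asks for the SRV record and learns both the host and the port, so nothing needs to be globally agreed upon.

                            ```python
                            # Discover host and port for a service via SRV instead of a well-known port.
                            import dns.resolver  # pip install dnspython

                            for srv in dns.resolver.resolve("_imaps._tcp.example.com", "SRV"):
                                print(f"connect to {srv.target} on port {srv.port}")
                            ```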

                            1. 5

                              On what port do I send the DNS query for the SRV record of my DNS server?

                              1. 1

                                Obviously, you look up an SRV record to determine which port DNS is served over. ;)

                                I don’t know if anyone has thought about the bootstrapping problem. In theory, you could deal with it the same way you already bootstrap your DNS (DHCP or including the port with the IP address in static configurations), but I don’t know if this is actually possible.

                              2. 2

                                You need the concept of ports to route packets to the appropriate process

                                Unless we assign an IP address to every web facing process.

                            2. 1

                              Problem: both solutions to private DNS queries have downsides related to the DNS protocol fundamentally having failed to envision a need for privacy

                              Solution: radically overhaul the transport layer by replacing both TCP and UDP with something portless?

                              The suggested cure is worse than the disease in this case, in terms of the sheer amount of work (and wholesale replacement of hardware and software) it would require.

                              1. 2

                                I don’t think DNS is the right place to do privacy. If I’m on someone’s network, he can see what IP addresses I’m talking to. I can hide my DNS traffic, but he still gets to see the IP addresses I ultimately end up contacting.

                                Trying to add privacy at the DNS stage is doing it at the wrong layer. If I want privacy, I need it at the IP layer.

                                1. 4

                                  Assuming that looking up an A record and making a connection to that IP is the only thing DNS is used for.

                                  1. 3

                                    Think of CDN or “big websites” traffic. If you hit Google, Amazon, or Cloudflare datacenters, nobody will be able to tell if you were reaching google.com, amazon.com, cloudflare.com or any of their customers.

                                    Currently, this is leaking through SNI and DNS. DoH and Encrypted SNI (ESNI) will improve on the status quo.
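
                                    For context on the SNI half of that leak, a small illustration (the hostname is just an example): with plain TLS, the name passed below goes out unencrypted in the ClientHello, so an on-path observer learns which CDN customer you are visiting even though everything after the handshake is encrypted. ESNI/ECH is what would encrypt that last field.

                                    ```python
                                    # The server_hostname argument becomes the SNI extension, which
                                    # ordinary TLS sends in cleartext during the handshake.
                                    import socket, ssl

                                    ctx = ssl.create_default_context()
                                    with socket.create_connection(("example.com", 443)) as sock:
                                        with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
                                            print(tls.version(), tls.cipher())
                                    ```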

                                    1. 2

                                      And totally screws small sites. Or is the end game centralization of all web sites to a few hosts to “protect” the privacy of users?

                                      1. 2

                                        You can also self-host more than one domain on your site. In fact, I do too. It’s just a smaller set :-)

                                        1. 1

                                          End game would be VPNs or Tor.

                                        2. 2

                                          Is that really true? I thought request/response metadata and timing analysis could tell them who we were connecting to.

                                          1. 2

                                            Depends who they are. I’m not going to do a full traffic dump, then try to correlate packet timings to discover whether you were loading gmail or facebook. But tcpdump port 53 is something I’ve actually done to discover what’s talking to where.

                                            1. 1

                                              True. Maybe ESNI and DoH are only increasing the required work. Needs more research?

                                              1. 1

                                                Probably, to be on the safe side. I’d run it by experts in correlation analysis of network traffic. They might already have something for it.

                                            2. 2

                                              nobody will be able to tell if you were reaching google.com, amazon.com, cloudflare.com or any of their customers.

                                              except for GOOGL, AMZN, et al., which will happily give away your data without even flinching.

                                              1. 1

                                                Yeah, depends on who you want to exclude from snooping on your traffic. The ISP, I assumed. The Googles and Amazons of the world have your data regardless of DNS/DoH.

                                                I acknowledge that the circumstances are different in every country, but in the US, the major ISPs actually own ad networks and thus have a strong incentive not to ever encrypt DNS traffic.

                                                1. 1

                                                  Yeah, depends on who you want to exclude from snooping on your traffic. The ISP, I assumed. The Googles and Amazons of the world have your data regardless of DNS/DoH.

                                                  so i’m supposed to just give them full access over the remaining part which isn’t served by them?

                                                  I acknowledge that the circumstances are different in every country, but in the US, the major ISPs actually own ad networks and thus have a strong incentive not to ever encrypt DNS traffic.

                                                  ISPs in the rest of the world aren’t better, but this still isn’t a reason to shoehorn DNS into HTTP.

                                                  1. 1

                                                    No, you’re misreading the first bit. You’re already giving it to them, most likely, because of all those cloud customers. This makes their main web property indistinguishable from their clients, once SNI and DNS are encrypted.

                                                    No need to give more than before.

                                                    1. 1

                                                      You’re already giving it to them, most likely, because of all those cloud customers.

                                                      this is a faux reason. i try not to use these things when possible. just because many things are there, it doesn’t mean that i have to use even more of their stuff, quite the opposite. this may be an inconvenience for me, but it is one i’m willing to take.

                                                      This makes their main web property indistinguishable from their clients, once SNI and DNS are encrypted.

                                                      indistinguishable for everybody on the way, but not for the big ad companies on whose systems things are. those are what i’m worried about.

                                                      1. 1

                                                        Hm, I feel we’re going in circles here.

                                                        For those people who do use those services, there is an immediate gain in terms of hostname privacy (towards their ISP), once DoH and ESNI are shipped.

                                                        That’s all I’m saying. I’m not implying you do or you should.

                                                        1. 1

                                                          I’m not implying you do or you should.

                                                          no, but the implications of DoH are that i’ll end up using it, even if i don’t want to. it’ll be baked into the browsers, from there it’s only a small step to mandatory usage in systemd. regarding DoH in general: if you only have http, everything looks like a nail.

                                          2. 1

                                            Alternative solution: don’t use DNS anymore.

                                            Still lots of work since we need to ditch HTTP, HTTPS, FTP, and a host of other host-oriented protocols. But, for many of these, we’ve got well-supported alternatives already. The question of how to slightly improve a horribly-flawed system stuck in a set of political deadlocks becomes totally obviated.

                                            1. 3

                                              That’s the biggest change of all of them. The whole reason for using DoH is to have a small change that improves things and that doesn’t require literally replacing the entire web.

                                              1. 1

                                                Sure, but it’s sort of a waste of time to try to preserve the web. The biggest problem with DNS is that most of the time the actual hostname is totally irrelevant to our purposes & we only care about it because the application-layer protocol we’re using was poorly designed.

                                                We’re going to need to fix that eventually, so why not do it now, IPv6-style (i.e., make a parallel set of protocols that actually do the right thing & hang out there for a couple decades while the people using the old ones slowly run out of incremental fixes and start to notice the dead end they’re heading toward).

                                                Myopic folks aren’t going to adopt large-scale improvements until they have no other choice, but as soon as they have no other choice they’re quick to adopt an existing solution. We’re better off already having made one they can adopt, because if we let them design their own it’s not going to last any longer than the last one.

                                                DNS is baked into everything, despite being a clearly bad idea, because it was well-established. Well, IPFS is well-established now, so we can start using it for new projects and treating DNS as legacy for everything that’s not basically ssh.

                                                1. 8

                                                  Well, IPFS is well-established now

                                                  No it’s not. Even by computer standards, IPFS is still a baby.

                                                  Skype was probably the most well-established P2P application in the world before they switched to being a reskinned MSN Messenger, and the Skype P2P network had disasters just like centralized services have, caused by netsplits, client bugs, and introduction point issues. BitTorrent probably holds the crown for most well-established P2P network now, and since it’s shared-nothing (the DHT isn’t, but BitTorrent can operate without it), has never had network-wide disasters. IPFS relies on the DHT, so it’s more like Skype than BitTorrent for reliability.

                                                  1. 0

                                                    It’s only ten years old, sure. I haven’t seen any reliability problems with it. Have you?

                                                    DHT tech, on top of being an actually appropriate solution to the problem of addressing static chunks of data (one that eliminates whole classes of attacks by its very nature), is more reliable now than DNS is. And, we have plenty of implementations and protocols to choose from.

                                                    Dropping IPFS or some other DHT into an existing system (like a browser) is straightforward. Opera did it years ago. Beaker does it now. There are pure-javascript implementations of DAT and IPFS for folks who can’t integrate it into their browser.

                                                    Skype isn’t a good comparison to a DHT, because Skype connects a pair of dynamic streams together. In other words, it can’t take advantage of redundant caching, so being P2P doesn’t really do it any favors aside from eliminating a single point of failure from the initial negotiation steps.

                                                    For transferring documents (or scripts, or blobs, or whatever), dynamism is a bug – and one we eliminate with named data. Static data is the norm for most of what we use the web for, and should be the norm for substantially more of it. We can trivially eliminate hostnames from all asset fetches, replace database blobs with similar asset fetches, use one-time pads for keeping secret resources secret while allowing anyone to fetch them, & start to look at ways of making services portable between machines. (I hear DAT has a solution to this last one.) All of this is stuff any random front-end developer can figure out without much nudging, because the hard work has been done & open sourced already.

                                                    1. 4

                                                      IPFS is not ten years old. Its initial commit was five years ago, and that was the start of the paper, not the implementation.

                                                      1. 1

                                                        Huh. I could have sworn it was presented back in 2010. I must be getting it confused with another DHT system.

                                                  2. 7

                                                    Sure, but it’s sort of a waste of time to try to preserve the web.

                                                    This is letting Perfect be the enemy of Good thinking. We can incrementally improve (imperfectly, true) privacy now. Throwing out everything and starting over with a completely new set of protocols is a multi-decade effort before we start seeing the benefits. We should improve the situation we’re in, not ignore it while fantasizing about being in some other situation that won’t arrive for many years.

                                                    The biggest problem with DNS is that most of the time the actual hostname is totally irrelevant to our purposes & we only care about it because the application-layer protocol we’re using was poorly designed.

                                                    This hasn’t been true since Virtual Hosting and SNI became a thing. DNS contains (and leaks) information about exactly who we’re talking to that an IP address doesn’t.

                                                    1. 2

                                                      This is letting Perfect be the enemy of Good thinking. We can incrementally improve (imperfectly, true) privacy now.

                                                      We can also take advantage of low-hanging fruit that circumvent the tarpit that is incremental improvements to DNS now.

                                                      The perfect isn’t the enemy of the good here. This is merely a matter of what looks like a good idea on a six month timeline versus what looks like a good idea on a two year timeline. And, we can guarantee that folks will work on incremental improvements to DNS endlessly, even if we are not those folks.

                                                      Throwing out everything and starting over with a completely new set of protocols is a multi-decade effort before we start seeing the benefits.

                                                      Luckily, it’s an effort that started almost two decades ago, & we’re ready to reap the benefits of it.

                                                      DNS contains (and leaks) information about exactly who we’re talking to that an IP address doesn’t.

                                                      That’s not a reason to keep it.

                                                      Permanently associating any kind of host information (be it hostname or DNS name or IP) with a chunk of data & exposing that association to the user is a mistake. It’s an entanglement of distinct concerns based on false assumptions about DNS permanence, and it makes the whole domain name & data center rent-seeking complex inevitable. The fact that DNS is insecure is among its lesser problems; it should not have been relied upon in the first place.

                                                      The faster we make it irrelevant the better, & this can be done incrementally and from the application layer.

                                                    2. 2

                                                      But why would IPFS solve it?

                                                      Replacing every hostname with a hash doesn’t seem very user-friendly to me, and last I checked, you can trivially sniff out what content someone is loading by inspecting the requested hashes on the network.

                                                      IPFS isn’t mature either; it’s not even a decade old, and most middleboxes will start blocking it once people start using it for illegitimate purposes. There is no plan to circumvent blocking by middleboxes, not even after that stunt with putting Wikipedia on IPFS.

                                                      1. 1

                                                        IPFS doesn’t replace hostnames with hashes. It uses hashes as host-agnostic document addresses.

                                                        Identifying hosts is not directly relevant to grabbing documents, and so baking hostnames into document addresses mixes two levels of abstractions, with undesirable side effects (like dependence upon DNS and server farms to provide absurd uptime guarantees).

                                                        IPFS is one example of distributed permanent addressing. There are a lot of implementations – most relying upon hashes, since hashes provide a convenient mechanism for producing practically-unique addresses without collusion, but some using other mechanisms.

                                                        The point is that once you have permanent addresses for static documents, all clients can become servers & you start getting situations where accidentally slashdotting a site is impossible because the more people try to access it the more redundancy there is in its hosting. You remove some of the hairiest problems with caching, because while you can flush things out of a cache, the copy in cache is never invalidated by changes, because the object at a particular permanent address is immutable.

                                                        Problems (particularly with web-tech) that smart folks have been trying to solve with elaborate hacks for decades become trivial when we make addresses permanent, because complications like DNS become irrelevant.
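
                                                        A toy illustration of the content-addressing idea (my own sketch, not the actual IPFS CID format): the address is derived from the bytes themselves, so any peer can serve the document and the client can verify what it received.

                                                        ```python
                                                        import hashlib

                                                        document = b"hello, permanent web"
                                                        address = hashlib.sha256(document).hexdigest()

                                                        def fetch(addr, peers):
                                                            """Ask peers (here: plain callables) for the blob, verifying it against the address."""
                                                            for get in peers:
                                                                blob = get(addr)
                                                                if blob is not None and hashlib.sha256(blob).hexdigest() == addr:
                                                                    return blob
                                                            raise LookupError("no peer had verifiable content for " + addr)

                                                        # Any copy anywhere satisfies the request; there is no authoritative host to slashdot.
                                                        print(fetch(address, [lambda a: document]))
                                                        ```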

                                                        1. 1

                                                          And other problems become hard, like “how do I have my content still online in 20 years?”.

                                                          IPFS doesn’t address the issues it should be addressing; using hashes everywhere is one of them, making it particularly user-unfriendly (possibly even user-hostile).

                                                          IPFS doesn’t act like a proper cache either (unless their eviction strategy has significantly evolved to be more cooperative) in addition to leaking data everywhere.

                                                          Torrent and dat:// solve the problem much better and don’t over-advertise their capabilities.

                                                          Nobody really needs permanent addressing, what they really need is either a Tor onion address or actually cashing out for a proper webserver (where IPFS also won’t help if your content is dynamic, it’ll make things only more javascript heavy than they already are).

                                                          1. 1

                                                            how do I have my content still online in 20 years?

                                                            If you want to guarantee persistence of content over long periods, you will need to continue to host it (or have it hosted on your behalf), just as you would with host-based addressing. The difference is that your host machine can be puny because it’s no longer a single point of failure under traffic load: as requests increase linearly, the likelihood of any request being directed to your host decreases geometrically (with a slow decay via cache eviction).

                                                            IPFS doesn’t address the issues it should be addressing; using hashes everywhere is one of them, making it particularly user-unfriendly (possibly even user-hostile).

                                                            I would absolutely support a pet-name system on top of IPFS. Hashes are convenient for a number of reasons, but IPFS is only one example of a relatively-mature named-data-oriented solution to permanent addressing. It’s minimal & has good support for putting new policies on top of it, so integrating it into applications that have their own caching and name policies is convenient.

                                                            IPFS doesn’t act like a proper cache either (unless their eviction strategy has significantly evolved to be more cooperative) in addition to leaking data everywhere.

                                                            Most caches have forced eviction based on mutability. Mutability is not a feature of systems that use permanent addressing. That said, I would like to see IPFS clients outfitted with a replication system that forces peers to cache copies of a hash when it is being automatically flushed if an insufficient number of peers already have it (in order to address problem #1) as well as a store-and-forward mode (likewise).

                                                            Torrent and dat:// solve the problem much better and don’t over-advertise their capabilities.

                                                            Torrent has unfortunately already become a popular target for blocking. I would personally welcome sharing caches over DHT by default over heavy adoption of IPFS since it requires less additional work to solve certain technical problems (or, better yet, DHT sharing of IPFS pinned items – we get permanent addresses and seed/leech metrics), but for political reasons that ship has probably sailed. DAT seems not to solve the permanent address problem at all, although it at least decentralizes services; I haven’t looked too deeply into it, but it could be viable.

                                                            Nobody really needs permanent addressing,

                                                            Early web standards assume but do not enforce that addresses are permanent. Every 404 is a fundamental violation of the promise of hypertext. The fact that we can’t depend upon addresses to be truly permanent has made the absurd superstructure of web tech inevitable – and it’s unnecessary.

                                                            what they really need is either a Tor onion address

                                                            An onion address just hides traffic. It doesn’t address the single point of failure in terms of a single set of hosts.

                                                            or actually cashing out for a proper webserver

                                                            A proper web server, though relatively cheap, is more expensive and requires more technical skill to run than is necessary or desirable. It also represents a chain of single points of failure: a domain can be seized (by a state or by anybody who can social-engineer GoDaddy or perform DNS poisoning attacks), while a host will go down under high load (or have its contents changed if somebody gets write access to the disk). Permanent addresses solve the availability problem in the case of load or active threat, while hash-based permanent addresses solve the correctness problem.

                                                            where IPFS also won’t help if your content is dynamic,

                                                            Truly dynamic content is relatively rare (hence the popularity of cloudflare and akamai), and even less dynamic content actually needs to be dynamic. We ought to minimize it for the same reasons we minimize mutability in functional-style code. Mutability creates all manner of complications that make certain kinds of desirable guarantees difficult or impossible.

                                                            Signature chains provide a convenient way of adding simulated mutability to immutable objects (sort of like how monads do) in a distributed way. A more radical way of handling mutability – one that would require more infrastructure on top of IPFS but would probably be amenable to use with other protocols – is to support append-only streams & construct objects from slices of that append-only stream (what was called a ‘permascroll’ in Xanadu from 2006-2014). This stuff would need to be implemented, but it would not need to be invented – and inventing is the hard part.

                                                            it’ll make things only more javascript heavy than they already are

                                                            Only if we stick to web tech, and then only if we don’t think carefully and clearly about how best to design these systems. (Unfortunately, endemic lack of forethought is really the underlying problem here, rather than any particular technology. It’s possible to use even complete trash in a sensible and productive way.)
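
                                                            As a sketch of the append-only-stream idea a few paragraphs up (my own toy illustration, not Xanadu’s actual permascroll format): each entry commits to the hash of the previous one, so a single signed “head” value lets readers verify the entire history of an otherwise immutable object.

                                                            ```python
                                                            import hashlib
                                                            from dataclasses import dataclass

                                                            @dataclass
                                                            class Entry:
                                                                prev: str    # hex digest of the previous entry, "" for the first
                                                                data: bytes

                                                                def digest(self) -> str:
                                                                    return hashlib.sha256(self.prev.encode() + self.data).hexdigest()

                                                            log, head = [], ""
                                                            for chunk in [b"v1 of the document", b"edit: fixed a typo"]:
                                                                entry = Entry(prev=head, data=chunk)
                                                                log.append(entry)
                                                                head = entry.digest()

                                                            # A reader holding the (signed) head can replay and verify the whole stream.
                                                            check = ""
                                                            for entry in log:
                                                                assert entry.prev == check
                                                                check = entry.digest()
                                                            assert check == head
                                                            ```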

                                                            1. 1

                                                              The difference is that your host machine can be puny because it’s no longer a single point of failure under traffic load: as requests increase linearly, the likelihood of any request being directed to your host decreases geometrically (with a slow decay via cache eviction).

                                                              I don’t think this is a problem that needs addressing. Static content like the type that IPFS serves can be cheaply served to a lot of customers without needing a fancy CDN. An RPi on a home connection should be able to handle 4 million visitors a month easily with purely static content.

                                                              Dynamic content, ie the content that needs bigger nodes, isn’t compatible with IPFS to begin with.

                                                              Most caches have forced eviction based on mutability

                                                              Caches also evict based on a number of different strategies that have nothing to do with mutability, though; IPFS’s strategy for loading content (FIFO, last I checked) behaves poorly with most internet browsing behaviour.

                                                              DAT seems not to solve the permanent address problem at all, although it at least decentralizes services; I haven’t looked too deeply into it, but it could be viable.

                                                              The public key of a DAT share is essentially like an IPFS target, with the added bonus of having a tracked and replicated history and mutability, offering everything an IPNS or IPFS hash does. Additionally, it’s more private and doesn’t try to sell itself as censorship-resistant (just look at the stunt with putting Wikipedia on IPFS).

                                                              Every 404 is a fundamental violation of the promise of hypertext.

                                                              I would disagree with that. It’s more important that we archive valuable content (i.e., via archive.org, the ArchiveTeam, etc.) than that we have a permanent addressing method.

                                                              Additionally, permanent addressing still does not solve content going offline. Once it’s lost, it’s lost, and no amount of throwing blockchains, hashes and P2P at it will ever solve this.

                                                              You cannot stop a 404 from happening.

                                                              The hash might be the same, but for 99.999% of content on the internet, it’ll be lost within a decade regardless.

                                                              Truly dynamic content is relatively rare

                                                              I would also disagree with that; on the modern internet, mutable and dynamic content are becoming more common as people become more connected.

                                                              CF and Ak allow hosters to cache pages that are mostly static, like the reddit front page, as well as reducing the need for geo-replicated servers and the load on the existing servers.

                                                              is to support append-only streams & construct objects from slices of that append-only stream

                                                              See DAT, that’s what it does. It’s an append-only log of changes. You can go back and look at previous versions of the DAT URL provided that all the chunks are available in the P2P network.

                                                              Only if we stick to web tech, and then only if we don’t think carefully and clearly about how best to design these systems.

                                                              IPFS in its current form is largely provided as a Node.js library, with bindings to some other languages. It’s being heavily marketed for browsers. The amount of JS in websites would only increase with IPFS and likely slow everything down even further until it scales up to global or, as it promises, interplanetary scale (though interplanetary is a pipe dream; the protocol can’t even handle satellite internet properly).

                                                              Instead of looking to cryptographic pipe dreams for the solution, we ought to improve the infrastructure and reduce the amount of CPU needed for dynamic content; this is an easier and more viable option than switching the entire internet to a protocol that forgets data if nobody requests it often enough.