1. 30

  2. 22


    1. 5

      Currently, the // slashes have the meaning described in RFC 7595:

      Schemes SHOULD avoid improper use of "//". The use of double slashes in the first part of a URI is not a stylistic indicator that what follows is a URI: double slashes are intended for use ONLY when the syntax of the <scheme-specific-part> contains a hierarchical structure. In URIs from such schemes, the use of double slashes indicates that what follows is the top hierarchical element for a naming authority (Section 3.2 of RFC 3986 has more details). Schemes that do not contain a conformant hierarchical structure in their <scheme-specific-part> SHOULD NOT use double slashes following the "<scheme>:" string.

      1. 1

        Angle-bracketed content in the quote got swallowed by the Markdown parser.

        The RFC reads “double slashes are intended for use ONLY when the syntax of the <scheme-specific-part> contains a hierarchical structure.” A “scheme-specific-part” is defined in terms of “hier-part”… which doesn’t appear to be defined in the grammar.

        1. 0

          I fixed it by adding backticks.

          This is exactly why I prefer XML-based markup languages for anything serious.

          1. 1

            I’m not sure which side of the seriousness boundary we’re on here, but I’m glad I don’t have to write HTML in these comment boxes :-)

            Do you know what hier-part is supposed to mean? I think it’s everything following the : and preceding an optional ? or # delimiter, so it would include the host. But is it the DNS hierarchy that’s being referred to, or the filesystem hierarchy? Because ever since the advent of dynamic routing, the existence of the latter is no longer the common case for web URIs, although dynamic routes are often styled to suggest a hierarchy.

            Anyway, that RFC is from 2015, and the OP is from 2009, so it seems like a bit of a semantic retrofit.

            1. 4

              I’m glad I don’t have to write HTML in these comment boxes

              Though LaTeX is probably my favourite, I’d honestly prefer HTML to Markdown. It’s much saner; e.g., nesting in Markdown makes no sense.

              1. 1

                It’s referring to both, although the host portion is optional. It’s defined in the BNF in the RFC:

                hier-part     = "//" authority path-abempty
                              / path-absolute
                              / path-rootless
                              / path-empty

                Here, authority refers to the DNS portion (which may include user information), and the various path-* rules are local to a host (think filesystem, but they don’t have to refer to actual files).
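
                The sketch below makes that split concrete. It uses Python's standard `urllib.parse.urlsplit`, which is my choice of tool and not something the RFC prescribes. The URL, including its `user:pw` userinfo, is made up for illustration.

                ```python
                from urllib.parse import urlsplit

                # A made-up URL with every authority component present.
                parts = urlsplit("http://user:pw@www.example.com:8080/foo/bar?q=1#frag")

                # The authority (netloc) runs from after "//" up to the next "/", "?" or "#".
                print(parts.netloc)    # user:pw@www.example.com:8080
                print(parts.username)  # user
                print(parts.hostname)  # www.example.com
                print(parts.port)      # 8080
                # The path is local to that host; it need not name a real file.
                print(parts.path)      # /foo/bar
                ```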

                1. 1

                  You mean RFC3986… OK. But why is a literal "//" included? I don’t see how this grammar (spread between two RFCs) can ever produce a URL without the leading double slashes.

                  1. 2

                    Check out RFC 2234 (the ABNF RFC; the latest version is RFC 5234). The bare ‘/’ in this rule means “or”, so hier-part is defined as four alternative forms, and only the first of them begins with a literal "//".
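
                    Here is a rough sketch of how the four alternatives are told apart, assuming the URI already has a valid scheme. `hier_part_form` is a hypothetical helper written for this comment. It only inspects the first characters after the colon and validates nothing else.

                    ```python
                    def hier_part_form(uri: str) -> str:
                        """Report which RFC 3986 hier-part alternative a URI uses.

                        Hypothetical helper for illustration only: it looks at the start
                        of the text after the scheme and does not validate the rest.
                        """
                        scheme, sep, rest = uri.partition(":")
                        if not sep:
                            raise ValueError("no scheme delimiter")
                        # hier-part ends where an optional query or fragment begins.
                        for delim in "?#":
                            rest = rest.split(delim, 1)[0]
                        if rest.startswith("//"):
                            return '"//" authority path-abempty'
                        if rest.startswith("/"):
                            return "path-absolute"
                        if rest:
                            return "path-rootless"
                        return "path-empty"


                    for example in ["http://lobste.rs/t/historical", "file:/etc/hosts",
                                    "mailto:someone@example.com", "about:"]:
                        print(example, "->", hier_part_form(example))
                    ```

                    Since only the first branch needs "//", a URI like smtp:lobste.rs is still well-formed. It simply takes the path-rootless branch.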

          2. 1

            I wonder how many things choke when given URIs like smtp:lobste.rs

            1. 2

              I would hope they fail with something like “unsupported URL scheme”. I ran your example through a URL parser and it handled it just fine, although it assigned “lobste.rs” to the path portion because it’s missing the ‘//’.

              1. 1

                it assigned “lobste.rs” to the path portion because it’s missing the ‘//’.

                FWIW, this is correct. If there are no slashes, then there’s no authority component, which means anything after the colon is the path.
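
                Python's `urllib.parse.urlsplit` shows the same behavior. This is only an illustration with the standard-library parser, which may not be the parser mentioned above:

                ```python
                from urllib.parse import urlsplit

                with_slashes = urlsplit("smtp://lobste.rs")
                without_slashes = urlsplit("smtp:lobste.rs")

                # With "//", the host lands in the authority (netloc) component.
                print(with_slashes.netloc, repr(with_slashes.path))        # lobste.rs ''
                # Without it, there is no authority, and the rest is the path.
                print(repr(without_slashes.netloc), without_slashes.path)  # '' lobste.rs
                ```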

          3. 3

            Suggest tag: historical

            1. 3

              Slashes aren’t a problem. Browsers don’t display them, and in common use “www.” works well enough as a recognizable web URL prefix.

              What has been a bigger mistake, and is probably going to lead to URLs’ demise, is the inconsistency over whether the hierarchy goes right-to-left (host) or left-to-right (path). This leaves the most important part sitting in the middle, where it’s hard for an average person to parse.

              1. 4

                I love a good demise! URLs seem pretty entrenched, though. What could possibly replace them?

                1. 1

                  Browsers don’t display them

                  They don’t?

                  1. 1

                    Browsers can help with that though, so it shouldn’t be the job of the average joe to skim through the path to figure out where they are.

                    1. 1

                      That’s what I mean by URL demise, e.g. Safari doesn’t display URLs any more, only the domain name.

                      OTOH, if URLs were in the form com/example/yada/yada, the average joe could parse them, and they could be truncated in a straightforward way if they didn’t fit, so browser vendors wouldn’t have a good reason to stop displaying URLs.
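
                      Here is a minimal sketch of that left-to-right form, assuming host labels are simply reversed and the path appended. `to_big_endian` and `truncate` are hypothetical names. Unlike the com/example/yada/yada example, this version keeps every host label (including www) and ignores ports, queries and fragments.

                      ```python
                      from urllib.parse import urlsplit


                      def to_big_endian(url: str) -> str:
                          """Hypothetical notation: reverse host labels, then append path segments."""
                          parts = urlsplit(url)
                          host_labels = (parts.hostname or "").split(".")[::-1]
                          path_segments = [s for s in parts.path.split("/") if s]
                          return "/".join(host_labels + path_segments)


                      def truncate(big_endian: str, max_segments: int) -> str:
                          """Keep the most significant segments and mark the dropped tail."""
                          segments = big_endian.split("/")
                          if len(segments) <= max_segments:
                              return big_endian
                          return "/".join(segments[:max_segments]) + "/..."


                      print(to_big_endian("https://www.example.com/yada/yada"))
                      print(truncate(to_big_endian("https://www.example.com/yada/yada"), 3))
                      ```

                      With everything ordered from most to least significant, cutting from the right always keeps the part that identifies the site.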

                  2. 3

                    FWIW, browsers recognize URLs without the //. Try e.g. https:lobste.rs.

                    (Lobsters apparently doesn’t recognize it, though.)

                    1. 3
                      1. Wrap it in angle brackets: <http:lobste.rs> becomes a clickable http:lobste.rs link.

                      2. I’m pretty sure that’s specified in the WHATWG URL specification, which considers HTTP plus a few others to be “special” schemes, but not in the IETF URL specification, which has no such concept.
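
                      Here is a toy imitation of the WHATWG rule being described, for illustration only. `normalize_missing_slashes` is a hypothetical helper, not a real URL parser. It assumes the URL is parsed without a base URL; with a same-scheme base, the standard treats https:foo as a relative reference instead. Host validation, host lowercasing, backslash-separated paths and percent-encoding are all left out.

                      ```python
                      import re

                      # Schemes the WHATWG URL Standard treats as "special". "file" is special too,
                      # but it follows separate rules and is left out of this sketch.
                      SPECIAL_WITH_HOST = {"ftp", "http", "https", "ws", "wss"}


                      def normalize_missing_slashes(url: str) -> str:
                          """Toy imitation of one WHATWG parsing rule; not a real URL parser."""
                          scheme, sep, rest = url.partition(":")
                          if not sep or scheme.lower() not in SPECIAL_WITH_HOST:
                              # Non-special schemes are returned untouched; an RFC 3986 parser
                              # would treat everything after the colon as the path.
                              return url
                          # For special schemes, any run of "/" or "\" before the host is skipped,
                          # so zero, one or three slashes all end up meaning "//".
                          rest = rest.lstrip("/\\")
                          # The host runs until the first "/", "?" or "#".
                          host, remainder = re.match(r"([^/?#]*)(.*)", rest, re.DOTALL).groups()
                          # Special URLs always have a path, so "/" is the minimum.
                          if not remainder.startswith("/"):
                              remainder = "/" + remainder
                          return f"{scheme.lower()}://{host}{remainder}"


                      for example in ["https:lobste.rs", "http:/lobste.rs/t/historical", "smtp:lobste.rs"]:
                          print(example, "->", normalize_missing_slashes(example))
                      ```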

                    2. 3

                      I wonder how to take the article. Leaving out forward slashes seems like the kind of decision you could have made at the time.

                      However, I’m tempted to say that a lot more things in the scheme are unnecessary today. Including the scheme makes sense if you’re using some kind of client that connects to servers using a wide variety of protocols. But that’s not the case any more: you’re overwhelmingly using a browser that connects to things over HTTPS. Similarly, the idea of www.example.com denoting something different from example.com seems like a shibboleth for old-timers. I’m aware that you can treat them differently, but it’s not clear why you’d want to foist that complexity onto end users.

                      I don’t dispute that this perspective would’ve seemed absurd in 1989. That’s why I don’t know how to take it: is Berners-Lee just describing the things that could’ve been identified as mistakes at the time, or does he think the forward slashes are the only thing about the scheme that was unnecessary?

                      1. 5

                        Users can just type a host name and path into a browser, and it works, so the end-user problem no longer exists. Browser address bars don’t accept URLs as such; they accept an undefined, constantly changing heuristic syntax. They can even automatically derive a search URL from what’s entered. But there are certainly non-HTTP URLs in common use, like file:, mailto:, and lots of custom schemes used for “deep linking” on mobile platforms, so the full URL syntax is still very much alive and useful.

                        1. 2

                          Keep in mind that it wasn’t obvious how the internet would take shape. The idea that everyone would be using browsers, or their phones, was far from inevitable. It could just as easily have been, for example, that the “protocol” ended up as the signifier for the app to use, and therefore not redundant (like file extensions).

                          But I agree that the // has no real utility, and could have been easily avoided.

                          1. 1

                            Agreed. I tried to say that the idea probably made sense at the time. Working with a scheme so that one client could access things via multiple protocols was logical when HTTP was just a budding idea. It’s just that today, browsers exist, HTTP itself has become more complex, and so dedicated tools that don’t need a scheme are the norm.

                        2. 3

                          In a dark corner of my brain, I remember someone once telling me that the original intent was to place some additional information between the two slashes. I don’t have time to check that right now, but maybe someone will find it entertaining to dig that out…

                          1. 2

                            For anyone wondering “why?” (year 2000; the BBC was late to the party!):

                            Q: What is the history of the // ?

                            A: I wanted the syntax of the URI to separate the bit which the web browser has to know about (www.example.com) from the rest (the opaque string which is blindly requested by the client from the server). Within the rest of the URI, slashes (/) were the clear choice to separate parts of a hierarchical system, and I wanted to be able to make a link without having to know the name of the service (www.example.com) which was publishing the data. The relative URI syntax is just unix pathname syntax reused without apology. Anyone who had used unix would find it quite obvious. Then I needed an extension to add the service name (hostname). In fact this was similar to the problem the Apollo domain system had had when they created a network file system. They had extended the filename syntax to allow //computername/file/path/as/usual. So I just copied Apollo. Apollo was a brand of unix workstation. (The Apollo folks, who invented domain and Apollo’s Remote procedure call system later I think went largely to Microsoft, and rumor has it that much of Microsoft’s RPC system was).

                            I have to say that now I regret that the syntax is so clumsy. I would like http://www.example.com/foo/bar/baz to be just written http:com/example/foo/bar/baz where the client would figure out that www.example.com existed and was the server to contact. But it is too late now. It turned out the shorthand “//www.example.com/foo/bar/baz” is rarely used and so we could dispense with the “//”.
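
                            You can try the relative-reference behavior described in the quote with Python's `urllib.parse.urljoin`, which resolves references along the lines of RFC 3986. The base URL is the one from the quote; other.example.org is a made-up host.

                            ```python
                            from urllib.parse import urljoin

                            base = "http://www.example.com/foo/bar/baz"

                            # Relative paths reuse Unix pathname rules, including "..".
                            print(urljoin(base, "qux"))     # http://www.example.com/foo/bar/qux
                            print(urljoin(base, "../qux"))  # http://www.example.com/foo/qux
                            print(urljoin(base, "/qux"))    # http://www.example.com/qux
                            # The "//" shorthand (a network-path reference) keeps only the scheme.
                            print(urljoin(base, "//other.example.org/foo"))  # http://other.example.org/foo
                            ```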

                            1. 1

                              I mean, it’s a syntax for multi-protocol resource locators. It’s hard for me to think of what alternative syntax might be more palatable to people.

                              IMO the actual issue is that people want to ignore everything besides HTTP and find the ‘uniform’ aspect of URLs cumbersome :)