1. 72
  1. 13

    All the parsers that get it “wrong” parse it the same way, which actually demonstrates interoperability. I’ve checked Rust’s url parser based on the WHATWG spec, and it also sees the path as //:http://@http://http://.

    So I guess the conclusion is that it isn’t a valid URL, as the username:password part can’t contain literal slashes. The closest valid one is http://http:http:@http://http://?http://#http://.
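
    For anyone who wants to poke at this, here is a minimal sketch of how that "closest valid" URL splits into components. It uses Python's `urllib.parse`, which loosely follows RFC 3986 and is not a WHATWG parser, so other libraries may disagree on the details.

    ```python
    from urllib.parse import urlsplit

    # The "closest valid" URL from the comment above.
    url = "http://http:http:@http://http://?http://#http://"
    parts = urlsplit(url)

    # Userinfo is everything before the last "@" in the authority;
    # the first ":" inside it separates username from password.
    print("scheme:  ", parts.scheme)
    print("username:", parts.username)
    print("password:", parts.password)
    print("hostname:", parts.hostname)
    print("port:    ", parts.port)      # empty port after the host's ":" is reported as None
    print("path:    ", parts.path)
    print("query:   ", parts.query)
    print("fragment:", parts.fragment)
    ```

    With this parser the username is `http`, the password is `http:`, the host is `http` with an empty port, and the path is `//http://`.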

    1. 11

      Inasmuch as user:pass@host gets translated into HTTP Basic auth, this seems correct: RFC 7617 specifies that the password (and username) cannot contain CTL characters, which includes /.

      1. 4

        The user agent is permitted to escape any characters within a portion of the URI that are illegal if unescaped, but also may choose to continue to display the unescaped form of those characters. For example, you can have ‘/’ as part of a path segment, and ‘&’ as part of a query parameter name, as long as they are escaped.
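
        A small sketch of that escaping, using Python's `urllib.parse` as a stand-in for whatever the user agent does internally (the segment and parameter names are made up for illustration):

        ```python
        from urllib.parse import parse_qsl, quote, unquote, urlencode

        # A single path segment that contains a literal "/".
        # safe="" forces "/" to be percent-encoded instead of left alone.
        segment = quote("a/b", safe="")

        # A query parameter whose *name* contains "&".
        query = urlencode({"a&b": "1"})

        # Decoding recovers the original characters.
        decoded_segment = unquote(segment)
        decoded_query = parse_qsl(query)

        print(segment, query, decoded_segment, decoded_query)
        ```

        The escaped forms are `a%2Fb` and `a%26b=1`, and both decode back to the original text.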

      2. 8

        Different parsers get it wrong in the same way because the spec is garbage: any time an implementer has a question about how to interpret the vagueness of the URI spec, they go to a different user agent implementation and try to reverse engineer its behavior into their own personal spec addendum.

        I’m personally guilty of doing this several times now, with RFC 3986 in particular (which I’ve implemented in 3 different languages over the past 30 years, most recently two months ago).

        The spec, as it is written, is unimplementable. There are simply too many errors in the spec. And too many catch-alls (“get out of jail free” expressions) in the pseudo-BNF.


        1. 5

          The thing is that there are two conflicting standards and nobody ever bothered to combine them.

          1. 2

            Which ones are conflicting? https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.1 doesn’t allow slashes either.

            As for combining the URL/URI/IRI mess, WHATWG did just that.

            1. 1

              Well, and then there’s newline handling in WHATWG. E.g. http://example\n.org is parsed as http://example.org. I think it comes from HTML being written in terminals with max column widths. For some reason that tolerance has gone into the URL parser rather than being handled in the HTML parser, though I don’t know why.
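
              A rough sketch of that step, assuming the WHATWG behavior of dropping every ASCII tab and newline from the input before parsing. The helper name `strip_tab_newline` is made up here, and `urlsplit` is only used to read the host back out:

              ```python
              from urllib.parse import urlsplit

              def strip_tab_newline(raw: str) -> str:
                  """Remove ASCII tab, LF, and CR anywhere in the input."""
                  return raw.translate({ord(c): None for c in "\t\n\r"})

              # A URL that was wrapped in the middle of the host name.
              cleaned = strip_tab_newline("http://example\n.org")
              host = urlsplit(cleaned).hostname

              print(cleaned, host)
              ```

              After stripping, the input is `http://example.org` and the host comes out as `example.org`.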

          2. 2

            Being more forgiving with special characters in the password definitely seems like something that curl might be asked to do in a bug report.

          3. 4

            “As if any standard meant much here”

            I felt the weight of this sentence.

            1. 2

              …as if millions of voices suddenly cried out in terror and were suddenly silenced.

            2. 5