1. 27
  1. 5

    I wonder if they were aware of systemd’s existing dash encoding scheme.

      1. 1

        That’s it! Or at least that’s a tool to work with them.

        It’s domain specific, but for paths, having to escape dashes is a real pain in the bum :-)

    1. 4

      hmm, my immediate inclination is that I’m not a big fan of this: it breaks the assumption that path.split("/") gives a useful list of components, so parsing Datasette paths requires custom code with special cases for certain prefixes. (Or maybe you can unconditionally dash-decode the path when you split it into components, regardless of whether the path actually contains a dash-encoded component? That’s a little better, I guess, but it still means you can’t use normal path-parsing code for Datasette paths.)

      (This also affects the little path-parser in our head looking at Datasette URLs.)

      Maybe this isn’t a big deal - presumably you’ve already implemented the parsing for Datasette itself, and I guess parsing Datasette paths isn’t something other software is likely to need to do. But it still seems a little ugly - I’d prefer an encoding scheme that didn’t keep literal slashes in the path.
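To make the concern concrete, here is a minimal sketch of what a dash-aware splitter might look like. It assumes a toy scheme in which a dash makes the following character literal (so `--` is a dash and `-/` is a slash), which matches the examples elsewhere in this thread but is not taken from Datasette's code. The names `dash_decode` and `split_path` are hypothetical.

```python
def dash_decode(component: str) -> str:
    """Undo the toy escape: a dash makes the next character literal."""
    out = []
    i = 0
    while i < len(component):
        if component[i] == "-" and i + 1 < len(component):
            out.append(component[i + 1])  # take the escaped character as-is
            i += 2
        else:
            out.append(component[i])
            i += 1
    return "".join(out)


def split_path(path: str) -> list[str]:
    """Split on slashes that are not escaped by a preceding dash."""
    parts, current = [], []
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "-" and i + 1 < len(path):
            current.append(path[i:i + 2])  # keep the escape pair together
            i += 2
        elif ch == "/":
            parts.append("".join(current))  # unescaped slash ends a component
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    # Drop empty components (e.g. from the leading slash), then decode.
    return [dash_decode(p) for p in parts if p]


path = "/web--hits/a-/b"
naive = path.split("/")       # cuts the escaped slash in half
aware = split_path(path)      # treats "-/" as part of one component
```

With this input, the naive split yields four pieces, including a dangling `a-`, while the dash-aware split yields the two intended components.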

      1. 3

        The only thing that should be parsing the path section of a URL (with allowances given for relative paths and concatenation) is some component of the application that generated it. Otherwise, things get dicey.

      2. 2

        Interesting that, unlike percent-encoding, this has exponential properties if nested. So, for example, if the primary key of your table “web-hits” is “datasette url”, you view it at /web--hits, which generates /web--hits/-/web----hits, which generates /web--hits/--/web--------hits---/web--hits, which generates /web--hits/--/web----------------hits-------/web----hits. (Example done by hand, probably wrong.) In contrast, percent-encoding would produce /web%2Dhits, then /web%2Dhits/%2Fweb%252dhits, then /web%252Dhits/%252Fweb%252dhits, then /web%25252Dhits/%25252Fweb%25252dhits.

        It is a rarely considered property of encodings, but I find it interesting. Much like C escaping \ as \\, escaping - as -- gives a worst-case encoding overhead of 2x on every pass, so the cost compounds when nested. The worst case for percent-encoding is 3x the first time, but only 5/3 growth the next time (%2D becomes %252D), and it keeps improving from there.
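The growth rates described above can be checked with a short sketch. It uses a toy dash-doubling function (`dash_double`, a hypothetical stand-in, not Datasette's actual encoder) and the standard library's `urllib.parse.quote`. Because `quote` never escapes `-` (it is an unreserved character), the percent-encoding side uses `/` with `safe=""` as its worst-case character instead.

```python
from urllib.parse import quote


def dash_double(s: str) -> str:
    """Toy escape that doubles every dash (an assumption for illustration)."""
    return s.replace("-", "--")


def percent(s: str) -> str:
    """Percent-encode everything except unreserved characters."""
    return quote(s, safe="")


def lengths(encode, s: str, rounds: int) -> list[int]:
    """Length of s before encoding and after each repeated encoding pass."""
    out = [len(s)]
    for _ in range(rounds):
        s = encode(s)
        out.append(len(s))
    return out


dash_lengths = lengths(dash_double, "-", 4)   # doubles on every pass
percent_lengths = lengths(percent, "/", 4)    # adds two characters per pass
```

Dash doubling grows geometrically, while repeated percent-encoding only re-escapes the single `%`, so each pass after the first adds a constant two characters.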

        1. 1

          I once tried to percent encode URLs after a path like /api/http%whatever but also ran into the double escape problem, so this was relatable content.

          1. 1

            Did you consider using url-safe base64 encoding?

            1. 3

              He covers that in his article. He had a somewhat narrow and niche use case. This approach makes sense in that light.

              1. 1

                You’re right, I missed that point.

                I feel some of those requirements are self-imposed, with questionable real-world implications. For example, is it really that important for users to be able to read the URL?

                Maybe in his case it truly is, but in general, I would always prefer to keep things simple and as “standard” as possible.

                1. 2

                  For this kind of application (or any API really) the URL is a key part of the user interface. From a UI perspective, this solution is much simpler - certainly much less confusing than “URL is a garbled mess for these tables and not for others”. And looking at the code, it’s much simpler than base64 encoding too.

                  Whip up an RFC and send it to IETF and then it’ll be “standard” too…

              2. 1

                I had the same thought. Article doesn’t address it explicitly, but does say that one of their goals was to modify the encoded data as little as possible.