1. 15
  1. 5

    We used to rely a lot on CDB (which is a really great piece of software). It performed blazingly fast and produced compact files. We only stopped using it when our data started growing close to the 4 GB limit.

    I’m confused about CDB. You don’t really describe it but it seems to be important based on the title you submitted. You link to a Spotify page where they say they don’t use it. They don’t describe it further.

    1. 9

      CDB is the venerable embedded key-value database implemented by D. J. Bernstein, https://cr.yp.to/cdb.html, and it was used in many software solutions written by him, notably qmail and tinydns.

      From there it was picked up by many other network services that need mostly-static, low-overhead lookup tables; Postfix, for example, supports it as an alternative for its user database.
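
      To make the above more concrete, here is a rough illustrative sketch, in Go, of how a CDB lookup works according to the format description at https://cr.yp.to/cdb.html (a 256-entry pointer table at the start, then hash tables of (hash, position) slots; all integers are little-endian uint32). This is not Kawipiko's actual code, just a summary of the format:

        // Illustrative CDB lookup over an in-memory copy of the file.
        // Error handling and bounds checks are omitted for brevity.
        package cdbsketch

        import "encoding/binary"

        // cdbHash is D. J. Bernstein's hash: h starts at 5381, then
        // h = ((h << 5) + h) ^ c for every byte c of the key.
        func cdbHash(key []byte) uint32 {
            h := uint32(5381)
            for _, c := range key {
                h = ((h << 5) + h) ^ uint32(c)
            }
            return h
        }

        func u32(b []byte, off uint32) uint32 {
            return binary.LittleEndian.Uint32(b[off : off+4])
        }

        // Get returns the value stored under key, or nil if the key is absent.
        func Get(file []byte, key []byte) []byte {
            h := cdbHash(key)
            // The first 2048 bytes hold 256 (position, slot-count) pairs.
            tablePos := u32(file, (h%256)*8)
            tableLen := u32(file, (h%256)*8+4)
            if tableLen == 0 {
                return nil
            }
            for i := uint32(0); i < tableLen; i++ {
                // Probing starts at slot (h >> 8) % tableLen and wraps around.
                slot := tablePos + (((h>>8)+i)%tableLen)*8
                slotHash, recordPos := u32(file, slot), u32(file, slot+4)
                if recordPos == 0 {
                    return nil // empty slot: the key is not present
                }
                if slotHash != h {
                    continue
                }
                keyLen, dataLen := u32(file, recordPos), u32(file, recordPos+4)
                candidate := file[recordPos+8 : recordPos+8+keyLen]
                if string(candidate) == string(key) {
                    return file[recordPos+8+keyLen : recordPos+8+keyLen+dataLen]
                }
            }
            return nil
        }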


      Thanks for pointing out this oversight. I’ve updated the Why CDB? section with the text above.

      1. 2

        Thanks!

    2. 4

      I was thinking that something like this should exist. Nice to see it implemented, it’s looking pretty good! I’m just a bit confused about this part:

      the server does not support per-request decompression / recompression

      the server will serve all resources compressed (i.e. Content-Encoding: brotli), regardless of what the browser accepts

      Wouldn’t it be possible to store multiple versions of the files and choose according to Accept-Encoding header? Sending content in an encoding that the UA doesn’t claim to support seems wrong to me.

      1. 1

        Wouldn’t it be possible to store multiple versions of the files and choose according to Accept-Encoding header?

        Yes, that would be a possibility if such a requirement were really needed.

        At the moment the site operator has three choices:

        • bundle two separate archives, one compressed and one without compression (provided each fits in 4 GiB), and use something like HAProxy to choose between the two backends based on the Accept-Encoding header (a sketch of this routing follows this list);
        • (most likely) use a CDN service (like CloudFlare) that would do caching and re-encoding based on what the actual client supports;
        • choose to just support compressed contents, based on the next observation.
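
        A minimal sketch of the first option's routing, written in Go (the local ports and addresses are made up for illustration; the same selection could just as well be expressed as an HAProxy configuration):

          package main

          import (
              "log"
              "net/http"
              "net/http/httputil"
              "net/url"
              "strings"
          )

          func main() {
              // Hypothetical local ports: one kawipiko-server instance per archive.
              brotli, _ := url.Parse("http://127.0.0.1:8081")   // archive built with Brotli compression
              identity, _ := url.Parse("http://127.0.0.1:8082") // archive built without compression

              toBrotli := httputil.NewSingleHostReverseProxy(brotli)
              toIdentity := httputil.NewSingleHostReverseProxy(identity)

              http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
                  // Naive check; a stricter implementation would also parse q-values.
                  if strings.Contains(r.Header.Get("Accept-Encoding"), "br") {
                      toBrotli.ServeHTTP(w, r)
                  } else {
                      toIdentity.ServeHTTP(w, r)
                  }
              })
              log.Fatal(http.ListenAndServe(":8080", nil))
          }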

        Sending content in an encoding that the UA doesn’t claim to support seems wrong to me.

        Any “modern enough” browser (say from 2018 onward, and here I include things like NetSurf) supports even Brotli compression, let alone gzip. Granted, Lynx (the console browser) doesn’t support Brotli, but it does still support gzip.

        Thus, for all “real clients” out there, serving only compressed contents is not an issue, and one could choose to just ignore the Accept-Encoding header.

        1. 3

          Real clients are not limited to browsers. For example, Python’s standard library does not support Brotli.

          1. 1

            Yes, and if such a client were among the intended clients for one’s use-case, then one could switch to gzip compression, or even disable compression.

            However, as said, Kawipiko’s main use-case is serving mostly-static web-sites (blogs, products, etc.) that are meant to be consumed by real browsers (Firefox, Chrome, Safari, etc.), and thus it provides optimizations towards this goal.

          2. 2

            It’s still against the spec. I don’t see a reason why an archive couldn’t have multiple compression types bundled within it, or why kawipiko couldn’t handle choosing the correct one. FWIW, even the identity encoding isn’t guaranteed to be acceptable.

            1. 1

              Well, I’ve just looked over the HTTP specification regarding Accept-Encoding (RFC 7231 – HTTP/1.1 Semantics and Content – Section 5.3.4, Accept-Encoding), and from what I see I would say Kawipiko’s behavior (of just sending the compressed version it has) is compliant with the specification:

              An Accept-Encoding header field with a combined field-value that is empty implies that the user agent does not want any content-coding in response. If an Accept-Encoding header field is present in a request and none of the available representations for the response have a content-coding that is listed as acceptable, the origin server SHOULD send a response without any content-coding.

              The emphasis is on SHOULD, which means that a well-behaved server should send the response without compression; but in the end it’s not a hard requirement, and the server may just ignore the requested encoding.

              Granted, this is by applying the “letter of the law”, and a well-behaved server should do things differently. However, as stated many other times, Kawipiko tries to be practical in its choices, at least for the use-cases it was meant for.
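
              For comparison, what the quoted SHOULD describes amounts to roughly the following sketch (in Go, ignoring q-values for brevity; purely illustrative, and not something Kawipiko implements): pick the first stored representation whose coding the client lists as acceptable, and fall back to an uncompressed one otherwise.

                package negotiate

                import "strings"

                // chooseCoding picks a stored coding that the client lists as
                // acceptable; if none matches, it falls back to identity, which
                // is what RFC 7231 says the server SHOULD do.
                func chooseCoding(acceptEncoding string, available map[string]bool) string {
                    for _, part := range strings.Split(acceptEncoding, ",") {
                        coding := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
                        if available[coding] {
                            return coding
                        }
                    }
                    return "identity"
                }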

              1. 3

                If you’re going to talk about the definition of SHOULD, then it also includes the expectation that you document a very good reason for breaking the rule.

                IMO, “everyone implements brotli, right?” is not a sufficient reason at the current time, but “everyone implements gzip, right?” does rise to the threshold of SHOULD-breaking. In particular, older WebKit versions (e.g. on iPhones) are something you should be concerned about.

                1. 1

                  You are right: when one chooses to “bend” the rules (as is the case with this SHOULD), one should also clearly document the reasoning. So I’ll try to summarize here my reasoning for choosing to ignore the Accept-Encoding header:

                  • first of all, the website developer has to make this choice! He can instruct kawipiko-archiver to use no compression, gzip (perhaps with the Zopfli compressor), or Brotli; thus the choice of breaking the rules is his (by default kawipiko-archiver doesn’t use any compression);

                  • however, according to caniuse.com and the Mozilla Developer Network, one can expect gzip compression to work in all browsers, so the website developer could choose gzip compression without expecting compatibility issues (see https://caniuse.com/sr_content-encoding-gzip and https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding#browser_compatibility);

                  • with regard to Brotli compression, it’s a tradeoff between storage (and bandwidth) reduction (which is sometimes negligible compared with gzip) and compatibility;

                  • and finally, the recommended way to deploy any serious web-site these days is behind a CDN, which can go one of two ways for older clients: either they won’t be supported by the CDN itself (say, due to TLS compatibility issues), in which case it’s pointless to talk about compression, or the CDN will take care of decompressing in case the client doesn’t support it (see here), in which case it’s again pointless to talk about compression on the origin side (granted, CloudFlare only supports gzip compression on the origin side, and will just pass Brotli through untouched, thus the website developer should choose gzip).


                  However, to put things into perspective: how about TLS 1.0 and 1.1? Security experts strongly suggest disabling them, and some even suggest dropping TLS 1.2 in favour of TLS 1.3. Should we keep serving TLS 1.0 just because at some point in time someone used a browser that doesn’t support at least TLS 1.2?

                  Just to be clear, I’m not for planned obsolescence; however, I am for giving users the opportunity to sometimes take the most practical approach and bend some rules.