  2. 4

    Require query parameters to be sorted alphabetically

    I understand the benefit here, but this is pretty user-unfriendly, especially because the default dictionary type in many languages is unordered, so callers have to sort explicitly. Why not sort these under the hood, in a client library or similar?
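
    For what it's worth, sorting under the hood is nearly a one-liner in many languages. A sketch in Python (the parameter names are made up for illustration):

    ```python
    # Build a query string with keys sorted alphabetically, so identical
    # queries always serialize to identical URLs (and thus cache keys).
    from urllib.parse import urlencode

    def sorted_query(params):
        return urlencode(sorted(params.items()))

    print(sorted_query({"size": "M", "color": "red", "brand": "acme"}))
    # brand=acme&color=red&size=M
    ```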

    1. 2

      If we provided client libraries we would probably do that. But there’s only two of us who’ve worked on this API, and we have client teams using Perl, Java, Scala, JavaScript & Objective-C. We don’t have capacity to provide & support good native clients for all of those. And as I pointed out, it might look like a hassle, but it helps ensure that clients get as much benefit from caching as they can, so they are not complaining. (At least not to me ;-)

    2. 3

      +1 for rejecting unknown params. People don’t like it; I love it. :)

      I do param normalization in the app though. Sometimes similar but slightly different queries have the same result. Memcache is a good fit. E.g. I cache the result of the query (rows 1,2,4,8,11) and their final output.
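
      To make the normalization concrete, here is a sketch in Python of deriving one cache key from any arrangement of the same query (the key prefix is invented; this is not my actual code):

      ```python
      # Derive a single cache key from any ordering of the same query
      # string, so "b=2&a=1" and "a=1&b=2" hit the same memcache entry.
      from urllib.parse import parse_qsl, urlencode

      def cache_key(query_string):
          return "result:" + urlencode(sorted(parse_qsl(query_string)))

      assert cache_key("b=2&a=1") == cache_key("a=1&b=2")
      ```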

      I think “query is well formed” is a good rule. “Query is well formed and like so” is a bit much.

      Have rearranged but identical queries been a problem in practice? I’d like to hear more about the circumstances under which you could say “we received X reqs but only X/10 uniques”.

      1. 1

        Yes, rearranged but identical queries were (and still are) a problem. We run an e-commerce site, and the main product listing would just append filter options to the end of the URL. Thus two identical queries would have different-looking URLs depending on the order people selected the filters. I don’t have numbers on how much it hurt us, but that wasn’t really the point. (Given the number of filter options we have, you can do trillions of distinct queries.) The point was to avoid clients accidentally getting bad cache performance when they can easily avoid it.
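
        To illustrate the aliasing (the filter names here are invented): the two URLs below describe the same query but are distinct strings, and therefore distinct cache entries unless something normalizes them:

        ```python
        from urllib.parse import urlencode

        # Same two filters, selected in a different order -> different URLs.
        url_a = "/products?" + urlencode([("color", "red"), ("size", "m")])
        url_b = "/products?" + urlencode([("size", "m"), ("color", "red")])
        assert url_a != url_b  # distinct strings, hence distinct cache keys
        ```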

      2. 2

        On OpenResty one could do some preprocessing on inbound requests to normalize them before caching. Relying on sorted parameters when they are specifically allowed to be unsorted in the standard means you are actually breaking the standard.

        Normalizing the keys is the correct solution.
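
        In OpenResty the rewrite hook itself would be written in Lua; as a language-neutral sketch of just the normalization step (in Python, with invented names):

        ```python
        # What an inbound-request hook would do before the cache lookup:
        # rewrite the request URI so equivalent queries become byte-identical.
        from urllib.parse import parse_qsl, urlencode

        def normalize_request_uri(path, query_string):
            canonical = urlencode(sorted(parse_qsl(query_string)))
            return path + ("?" + canonical if canonical else "")

        print(normalize_request_uri("/search", "q=shoes&page=2"))
        # /search?page=2&q=shoes
        ```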

        1. 1

          Sure, using a caching proxy that can do normalisation would be an alternative. (And thanks for the pointer; I wasn’t aware of OpenResty.) But you won’t get maximal benefit from local client or CDN caches that way.

          RFC 3986 does indeed mention that you can get unnecessary aliases for the same resource, and has this to say about avoiding the problem:

          False negatives are caused by the production and use of URI aliases. Unnecessary aliases can be reduced, regardless of the comparison method, by consistently providing URI references in an already-normalized form (i.e., a form identical to what would be produced after normalization is applied, as described below).

          Protocols and data formats often limit some URI comparisons to simple string comparison, based on the theory that people and implementations will, in their own best interest, be consistent in providing URI references, or at least consistent enough to negate any efficiency that might be obtained from further normalization.

          I haven’t found anything in the RFC saying I can’t enforce this normalisation, however. Did I miss something?

          1. 2

            Of course you are free to do what you want. I understand you have less control over intermediaries, but all caching should be done on a normalized URL. My pedantic point is that you could be altering semantics in a bad way. I have seen plenty of web apps that required the parameters in the URL to be in a specific order; this is bad.

            As for the intermediaries you don’t have control over: if they are doing caching, I would consider them broken if they weren’t doing normalization first.

            Your server side should assume random order and normalize.