If you want to run an API behind Cloudflare, you need to be aware that Cloudflare does not obey the Vary header for anything other than images (and for images only on a Pro or higher plan).
This means you can’t safely vary the content you send back based on the Accept header, or you may risk serving cached HTML to a JSON client or vice-versa.
vary — Cloudflare does not consider vary values in caching decisions. Nevertheless, vary values are respected when Vary for images is configured and when the vary header is vary: accept-encoding.
Here’s their documentation on this (which took me quite some time to find): https://developers.cloudflare.com/cache/concepts/cache-control/#other
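To make the failure mode concrete, here is a toy sketch of a shared cache that either includes or ignores the Accept header in its cache key. Everything in it (the `origin` function, the `ToyCache` class, the `/items/1` path) is made up for illustration and is not how Cloudflare is implemented; a real cache would read the Vary response header to decide which request headers to key on, whereas this sketch simply assumes the origin always varies on Accept.

```python
from typing import Dict, Tuple

Response = Tuple[Dict[str, str], str]


def origin(url: str, accept: str) -> Response:
    """Hypothetical origin that picks a representation from the Accept header."""
    if "application/json" in accept:
        return {"Content-Type": "application/json", "Vary": "Accept"}, '{"id": 1}'
    return {"Content-Type": "text/html", "Vary": "Accept"}, "<p>item 1</p>"


class ToyCache:
    """Toy shared cache; honor_vary decides whether Accept is part of the key."""

    def __init__(self, honor_vary: bool) -> None:
        self.honor_vary = honor_vary
        self.store: Dict[Tuple[str, str], Response] = {}

    def fetch(self, url: str, accept: str) -> Response:
        # Simplification: we assume the origin varies on Accept only.
        key = (url, accept if self.honor_vary else "")
        if key not in self.store:
            self.store[key] = origin(url, accept)
        return self.store[key]


naive = ToyCache(honor_vary=False)
naive.fetch("/items/1", "text/html")  # a browser warms the cache
naive_headers, _ = naive.fetch("/items/1", "application/json")
print(naive_headers["Content-Type"])  # text/html: the JSON client gets cached HTML

careful = ToyCache(honor_vary=True)
careful.fetch("/items/1", "text/html")
careful_headers, _ = careful.fetch("/items/1", "application/json")
print(careful_headers["Content-Type"])  # application/json
```

This only bites if the response is cached in the first place; an uncached response always comes straight from the origin.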
Later, later edit: I think that maybe the “does not obey the vary header” might mean that it doesn’t even bother to try to cache resources that make use of it? Because otherwise I can’t understand why my examples work fine, in direct contradiction to the documentation parent provided.
I’m pretty confident this is incorrect. I have an ActivityPub service behind Cloudflare and it can serve three types of content for the same URL: JSON payload, raw image, HTML presentation for said image.
I just realized I’m proving your point, as the object is indeed an image. :D
But no, the same can be done for non image URLs with json/html as alternatives:
are any of those cached? If there is no caching, the vary header is irrelevant
EDIT: tried myself, none of these are cached by CF:
Yep, I realized belatedly that that was the case. :)
Regarding your edit, the CF default config only caches files with specific file extensions. Your paths don’t have file extensions, and thus don’t qualify for being cached anyways, unless you’ve set up specific rules telling it differently.
The various Fediverse projects have been an example where conneg has caused issues in practice: they usually provide posts both as HTML for browsers and as activitypub-JSON under the same URL. It turns out in practice many implementations do not do full parsing of the Accept: headers and return the wrong thing if they get a slightly unusual-but-valid request.
I’m glad to tell you that my implementation of a fediverse server does this correctly. The content is served based on what user-agents send in the Accept header (currently html, json, or binary). :))
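For anyone wondering what "full parsing" involves, here is a rough sketch of Accept handling with q-values and wildcards. It is not taken from any of the implementations mentioned here; the function names are invented, and it makes some simplifying assumptions: media type parameters other than q are ignored, quoted parameter values containcommas or semicolons are not handled, a malformed q counts as 0, and an empty header is treated like `*/*`.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # assumption: malformed q means "not acceptable"
        ranges.append((media_range, q))
    return ranges


def specificity(media_range: str) -> int:
    """Rank */* below type/* below type/subtype."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def negotiate(header: str, offered: List[str]) -> Optional[str]:
    """Return the offered type with the highest q, or None if none is acceptable.

    Ties go to whichever type appears first in `offered` (server preference).
    """
    ranges = parse_accept(header) or [("*/*", 1.0)]
    best, best_q = None, 0.0
    for media_type in offered:
        # The most specific matching range decides the q for this type.
        candidates = [
            (specificity(r), q) for r, q in ranges if matches(r, media_type.lower())
        ]
        if not candidates:
            continue
        q = max(candidates)[1]
        if q > best_q:
            best, best_q = media_type, q
    return best
```

With this, `negotiate("application/activity+json, */*;q=0.1", ["text/html", "application/activity+json"])` picks the ActivityPub type even though HTML is listed first on the server side, and a `None` result is the case where a server would typically answer 406.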
I can see the logic of separating HTML output and JSON output, but there is still a use for content negotiation, as there are more output options than just JSON, like CBOR, YAML, s-exprs, and so on.
Another reason is that hypermedia endpoints can very easily represent multiple types of resource. For example, your “members” page might also have a list that shows “invitations” (i.e. future members), which for good reason will be a different database table with a different schema.
There are lots of other examples of this, such as a “category” page which displays both the details of a “category” record, and items of a completely different type of thing, such as “product”, that belong to the category.
The pattern of “one type of resource per endpoint” is nowhere close to covering the majority of pages, which is why I find the Rails patterns for this unhelpful; they will in fact exert a negative influence on the usability of a web site that organises its endpoints around this structure. In Django, the Class Based Views have the same problem - just because my page has a list of things, that doesn’t make it a ListView - it has a list, that doesn’t mean it is a list.
Agreed about data vs. hypermedia versioning needs. Data APIs that don’t include version numbers in their URLs are a nightmare to maintain. Eventually, one needs to redesign some existing data resource but can’t for backward compatibility reasons. A typical solution is to create new ones with nearly identical names and leave the old ones up for backward compatibility’s sake. Newcomers have no idea which ones to use and so proper use of the API becomes a matter of tribal knowledge. Data APIs, especially public ones, really ought to be versioned.
I found using content negotiation for versioning instead to work quite well. Consumers can even decide to only accept newer media type versions for a select subset of the API, allowing them to migrate gradually and gracefully at their own pace. Technically this would of course also work for APIs versioned via the URL, but often this is precluded for implementation reasons (e.g. a new version is provided by a different service which isn’t interoperable with an older version). Also, version updates on the URL level tend to be perceived as a bigger deal (probably due to their higher visibility), so they appear to be done less frequently and, if so, with bigger impact.
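A minimal sketch of what media-type versioning could look like on the server side. The vendor type `application/vnd.example.v<N>+json`, the supported versions, and the choice to fall back to version 1 for plain `application/json` or `*/*` are all assumptions for illustration; q-values are ignored here to keep it short.

```python
import re
from typing import Optional

# Hypothetical vendor media type: application/vnd.example.v<N>+json
VERSIONED = re.compile(r"^application/vnd\.example\.v(\d+)\+json$")
SUPPORTED = {1, 2}
DEFAULT_VERSION = 1  # assumption: unversioned requests get the oldest version


def requested_version(accept: str) -> Optional[int]:
    """Return the highest supported version the client lists.

    Returns None when the client lists only types we cannot serve,
    which a server would typically turn into a 406 response.
    """
    listed = []
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in ("", "*/*", "application/json"):
            listed.append(DEFAULT_VERSION)
            continue
        match = VERSIONED.match(media_type)
        if match and int(match.group(1)) in SUPPORTED:
            listed.append(int(match.group(1)))
    return max(listed) if listed else None
```

A client migrating gradually would send `Accept: application/vnd.example.v2+json` only on the requests it has already ported and keep sending the older type everywhere else.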
Thanks for sharing my post, I was going to reply something similar.
However, I also keep meaning to write a “.. But it’s not necessarily what I’d recommend” post because it’s often poorly understood/implemented.
For instance, I spent a good year or so getting really stuck into the weeds, even writing https://gitlab.com/jamietanna/content-negotiation-test-cases/ but now I can’t fully remember why certain cases are the way they are 😂 (also on me to document it better, but still)
Aren’t Accept headers expected to be IANA MIME types?
They can be vnd. vendor MIME types. GitHub, for example, does this both for their general RESTish API and for previewing new features in their GraphQL API and their RESTish API.
It’s perfectly sensible to split your data and hypermedia APIs into separate resources, but framing this as a content negotiation concern is a non-sequitur.
Resources define a conceptual thing in your system. If you want to represent separate things, design separate resources. This is very simple.
Within a resource, you should still negotiate, even in data APIs.