Browsers used to care about RSS and Atom, and they’d put that little yellow feed icon somewhere in the top bar when they spotted this sort of thing in a page.
That was me! At least in the case of Safari c.2007. And clicking the icon fetched and rendered the feed in the browser; and bookmarking that subscribed to the feed; and navigating to a subscribed feed rendered the cached version unless you did a force-reload. Quite a nice system we thought, but it was ripped out a few years later. Sigh…
I feel for Rachel because we had the mirror-image experience, having to make our feed reader deal with websites’ fscked-up feed implementations. Invalid XML, broken date formats, misquoted HTML bodies, missing Date headers…
RSS/Atom are pretty simple standards, but it’s impressive how people manage to screw up implementing them.
PS: the reader test service idea is brilliant. There’s been a feed validator service for publishers for a long time, but I’m not aware of one that lets you check readers for conformance.
Disagree. Atom is a simple standard. RSS is four different standards of varying degrees of quality and mutual incompatibility. They don’t even agree on what “RSS” stands for.
How come the responsibility ended up on the consuming side? I’ve heard the same thing about HTML over and over, but this sounds like a chance to begin anew.
Only because of competitors bending, forcing everybody to join in?
Because the users naturally want every site’s feed to work properly in their reader app, and if it doesn’t, they tend to blame the app, not the website.
This was exacerbated with feeds because:
- XML, especially with namespaces, is complicated and easy to make mistakes in;
- a lot of sites reused their HTML templating engine for generating XML feeds, and these often produced invalid XML because people were too used to the loosey-goosey nature of HTML;
- there were multiple confusingly-similar specs (I think it was Mark Pilgrim who once identified 11 different flavors of RSS);
- Dave Winer was — how do I put this? — not good at writing clear specifications, so RSS 2.x was quite vague, handwavey, and often ambiguous. He also had a habit of making changes in the spec document without touching the version number.
Evergreen blog post by Mark Pilgrim about why “client refuses to render invalid data” is probably wrong: https://web.archive.org/web/20060420051806/http:/diveintomark.org/archives/2004/01/14/thought_experiment
RSS and Atom suffer basically the same problem as XHTML: the authoring tools that are supposed to emit well-formed XML are not designed in such a way that they can do so reliably. For instance, you can’t simply wrap a feed entry’s contents in a CDATA block if the entry might contain its own CDATA. The original sin being that these tools are based on templated text, not on composing DOM fragments.
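To make the DOM-composition point concrete, here is a minimal sketch (assuming Python and ElementTree, not what any particular CMS actually does) of building an Atom entry by composing elements, so the serializer handles escaping instead of the template author reaching for CDATA:

    # Hedged sketch: build the entry as a tree and let the library escape the
    # body, rather than wrapping it in CDATA inside a string template.
    import xml.etree.ElementTree as ET

    ATOM = "http://www.w3.org/2005/Atom"
    ET.register_namespace("", ATOM)

    def atom_entry(title, link, html_body, updated):
        entry = ET.Element(f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = title
        ET.SubElement(entry, f"{{{ATOM}}}link", href=link)
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
        content = ET.SubElement(entry, f"{{{ATOM}}}content", type="html")
        # The serializer turns &, < and > into entities, so a body that itself
        # contains "]]>" or broken markup cannot terminate anything early.
        content.text = html_body
        return ET.tostring(entry, encoding="unicode")

    print(atom_entry("Example", "https://example.com/post",
                     "<p>Hello & goodbye ]]></p>", "2024-05-29T00:00:00Z"))

Any XML library that composes a tree works the same way; the point is that escaping stops being the template’s job.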
Decoding content was fun. It might contain plain text, HTML with XML-escaped meta characters, unescaped HTML that wasn’t valid XML due to missing close tags etc., or valid XHTML. Deducing which often required piles of flaky heuristics.
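For flavor, a toy version of that kind of heuristic (purely illustrative Python; real readers accumulate far more special cases than this):

    import re
    import xml.etree.ElementTree as ET

    def classify_content(text):
        stripped = text.strip()
        if not re.search(r"<[a-zA-Z!/]", stripped):
            if re.search(r"&lt;[a-zA-Z]", stripped):
                return "escaped-html"   # HTML hiding behind XML entity escaping
            return "text"               # no markup in sight: treat as plain text
        try:
            # Wrap in a dummy root so fragments with several top-level tags parse.
            ET.fromstring(f"<div>{stripped}</div>")
            return "xhtml"              # well-formed markup
        except ET.ParseError:
            return "tag-soup-html"      # tags present, but not valid XML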
Bonus fun: the content might have a text encoding different from that specified for the XML document, because maybe their RSS template’s header specified UTF-8 but the rest of their CMS used ISO-8859.
Mostly that. But also because many websites with broken feeds feel less approachable than feed reader developers when it comes to reporting bugs.
You would be surprised how many professional web developers have zero idea about conditional requests and how they work. So many people will just slap a CDN in front of a website and assume it alone can fix issues (and sure it can help a lot) without understanding the basics of conditional requests.
Recommended resources?
I haven’t worked with conditional requests myself, but MDN is generally a good starting point: MDN: HTTP conditional requests.
I see that Ruby on Rails has its own docs about implementing caching using conditional requests: Caching with Rails: An Overview > Conditional GET Support. So I’d guess that the steps to make a website start benefiting from conditional requests depend on your web framework or web server.
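For a rough sense of what supporting conditional requests looks like on the server side (Flask used here purely as an illustration, alongside the Rails docs above; none of this is from the original site), the handler computes an ETag for the feed and answers 304 Not Modified when the client already has that version:

    import hashlib
    from flask import Flask, Response, request

    app = Flask(__name__)

    def render_feed():
        return "<feed>...</feed>"   # placeholder for however the feed is built

    @app.route("/feed.xml")
    def feed():
        body = render_feed()
        etag = hashlib.sha256(body.encode("utf-8")).hexdigest()

        # Client already has this version: skip sending the body entirely.
        if etag in request.if_none_match:
            resp = Response(status=304)
            resp.set_etag(etag)
            return resp

        resp = Response(body, mimetype="application/atom+xml")
        resp.set_etag(etag)
        return resp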
Most people just build stuff for fun; I don’t think it’s particularly bizarre that most feed readers out there don’t strictly follow standards.
Sure, but it can still be annoying when you’re responsible for the infrastructure on the other end of dozens of hobby projects, as I imagine the OP is. Running a hobby project for yourself doesn’t automatically give you a pass for wasting others’ resources.
Ninja edit: Or to frame it another way, I don’t get the impression that the OP is annoyed at feed readers just because they don’t follow “the standard” (not that there is a standard for being a feed reader, but you know what I mean). I think she’s annoyed at readers that make an excessive number of requests to her server as a result of that non-standard-ness.
Update: looks like she is actually building the thing! https://rachelbythebay.com/w/2024/05/29/score/
Once upon a time, circa 2008-2010, a little e-commerce site called Backcountry.com had some Woot-style, one-deal-at-a-time sites like steepandcheap.com and whiskeymilitia.com.
These sites had RSS feeds, and a couple of contractors wrote desktop widgets based on them. People would wait for the deal to change every 20 minutes and just hit refresh over and over.
Somehow, as a junior engineer, I became aware of this problem and learned about the magic of conditional GET and 304 Not Modified. It became my project to go corral all the various semi-first-party and third-party widgets to stop hammering our servers day and night and wasting bandwidth. I ended up saving a bunch of bandwidth, and strangely, updating one of those Firefox extensions is how I first got involved with addons.mozilla.org and volunteered to review new extensions for a while.
I would definitely use the (imaginary) feed reader behavior test service. Not as a reader author, but as the user of one, to check that the software I run is well behaved.
As someone who is implementing a feed reader right now, I’m gonna take that advice to heart.
It seems that she doesn’t set a caching header on the feed, so there is no machine-readable indication of how often the author would like the feed to be polled.
I think adding a Cache-Control: max-age=3600 would go a long way. Even if it’s poorly supported today, those browser-extension readers she was complaining about may accidentally support it, some readers do already consider it, and it is a reasonable protocol that people can adopt rather than an expectation to special-case her feed.
Of course the deluxe option is WebSub, but these aren’t mutually exclusive.
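As a hedged sketch of what honoring that header could look like in a reader (the function name and the one-poll-per-minute floor are invented here):

    import re
    import time

    DEFAULT_INTERVAL = 3600  # fall back to hourly when the server says nothing

    def next_poll_at(response_headers, fetched_at=None):
        """Return the earliest time we should fetch this feed again."""
        fetched_at = fetched_at or time.time()
        cache_control = response_headers.get("Cache-Control", "")
        match = re.search(r"max-age=(\d+)", cache_control)
        max_age = int(match.group(1)) if match else DEFAULT_INTERVAL
        return fetched_at + max(max_age, 60)

    # With "Cache-Control: max-age=3600" the feed is left alone for an hour.
    print(next_poll_at({"Cache-Control": "public, max-age=3600"}))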
Can confirm from nitter: RSS readers do not care about 429s, 404s, or any other indicator of when to come back. The only things you can do are to either block them or limit their requests per second (because they will make them all at once), which they also don’t handle very well.
As a former nitter operator: their RSS feeds were a special case. Those were 99% bots trying to scrape Twitter through the RSS feed, not regular feed readers.
I verified some of these requests, and they were legitimate Android apps. The spammy ones got removed from my systems pretty fast.
I am not one of the RSS subscribers to this blog, but the feed reader I wrote and use personally does its fetches just based on a cron job. That means I can’t really support Retry-After, but I could probably implement conditional requests. None of the RSS sources update more than once an hour, so it hasn’t seemed like a priority to fix.
Even if the fetches happen on a fixed interval, you could still make sure you don’t fetch a given feed before its Retry-After delay has passed, right?
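A hedged sketch of that suggestion: the cron job keeps its fixed schedule, but each feed records a “not before” time taken from Retry-After and is simply skipped until then (FEEDS, not_before, and fetch_feed are all made-up names):

    import time
    import urllib.request
    from urllib.error import HTTPError

    FEEDS = ["https://example.com/feed.xml"]   # whatever the reader subscribes to
    not_before = {}                            # feed URL -> unix time we may retry

    def fetch_feed(url):
        now = time.time()
        if now < not_before.get(url, 0):
            return  # the server asked us to wait; skip until a later cron run
        try:
            with urllib.request.urlopen(url) as resp:
                handle(resp.read())
        except HTTPError as err:
            if err.code in (429, 503):
                retry_after = err.headers.get("Retry-After")
                delay = int(retry_after) if retry_after and retry_after.isdigit() else 3600
                not_before[url] = now + delay

    def handle(body):
        pass  # parse and store the feed

    for url in FEEDS:
        fetch_feed(url)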
IMO as a grizzled elder, sending conditional requests should be mandatory for feed readers. It’s showing courtesy to the publisher and their finite server resources.
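In the same spirit, a minimal sketch of the client side (the cache dict and its keys are invented for illustration): keep the validators from the previous response and send them back, so an unchanged feed costs a 304 instead of a full download:

    import urllib.request
    from urllib.error import HTTPError

    cache = {}  # url -> {"etag": ..., "last_modified": ..., "body": ...}

    def conditional_fetch(url):
        saved = cache.get(url, {})
        req = urllib.request.Request(url)
        if saved.get("etag"):
            req.add_header("If-None-Match", saved["etag"])
        if saved.get("last_modified"):
            req.add_header("If-Modified-Since", saved["last_modified"])
        try:
            with urllib.request.urlopen(req) as resp:
                cache[url] = {
                    "etag": resp.headers.get("ETag"),
                    "last_modified": resp.headers.get("Last-Modified"),
                    "body": resp.read(),
                }
        except HTTPError as err:
            if err.code != 304:
                raise
            # 304 Not Modified: nothing new, reuse what we already have.
        return cache[url]["body"]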
Sure you can.