1. 25
  1. 3

    Yes, for my own site the RSS feed is the most downloaded thing by far. I don’t include the page content, so bandwidth isn’t a problem, but it’s weird that many readers clearly ignore the TTL and just request it over and over.

    1. 4

      Misconfigured readers/central aggregators have been a bane of RSS almost since the beginning. Developers don’t always read specs, or test.

      1. 9

        Even worse, commonly used readers like ttrss adamantly refuse to do the bare minimum, like using ETags. ttrss makes requests every 15 minutes to a feed that updates at most once per day. I’d have to convince every one of my readers who uses ttrss to fix their settings, instead of the client being a good net citizen and following the recommendations that make the protocol work better.
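
For reference, the bare minimum a polite reader can do is a conditional GET: save the ETag and Last-Modified values from the last response and send them back as If-None-Match / If-Modified-Since, so the server can answer 304 Not Modified with an empty body. A minimal sketch (the function and field names are illustrative, not from any particular reader):

```python
def conditional_headers(cached):
    """Build conditional request headers from validators saved off the
    previous response. `cached` is a dict like
    {"etag": ..., "last_modified": ...}; missing entries are omitted."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers

# Validators remembered from a previous 200 response:
saved = {"etag": '"abc123"', "last_modified": "Wed, 01 Dec 2021 00:00:00 GMT"}
print(conditional_headers(saved))
# A first-ever fetch has nothing cached, so no conditional headers:
print(conditional_headers({}))
```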

        1. 6

          That’s horrific. I would even say antisocial. Shit like this degrades the entire ecosystem — RSS is already kludgy enough as it is without developers petulantly refusing to adopt the few scalability measures available.

          Plus, the developer’s response is childish:

          … get a better hosting provider or something. your sperging out over literal bytes wasted on http redirects on your blog and stuff is incredibly pathetic.

          Pathetic, indeed.

            1. 2

              I kind of liked ttrss until I understood how the developer acts in general. I’ve moved to miniflux since.

              I’ve also learned that ttrss had some security issues that the developer simply refused to address or fix, for reasons.

            2. 1

              haha! tiny tiny rss happens to be the #1 useragent for my site!

              1. 1

                It’s tempting to vary the response on a request header so one could prepend a bad-netizen warning post to the top of the list for those readers that are being problematic.
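
A hypothetical sketch of that idea (all names are made up; one would also send `Vary: User-Agent` so caches keep the per-client variants apart):

```python
def feed_items_for(user_agent, items, bad_agents=("Tiny Tiny RSS",)):
    """Prepend a warning entry when the requesting client is a known
    bad netizen; otherwise return the feed items unchanged."""
    if any(bad in user_agent for bad in bad_agents):
        warning = {
            "title": "Your feed reader is hammering this site",
            "description": "Please enable conditional requests "
                           "(ETag / If-Modified-Since).",
        }
        return [warning] + items
    return items

posts = [{"title": "a normal post", "description": "hello"}]
print(feed_items_for("Tiny Tiny RSS/21.03", posts)[0]["title"])
print(feed_items_for("Mozilla/5.0", posts)[0]["title"])
```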

                1. 1

                  Having HTTP features (cache-control and last-modified) duplicated in the RSS spec is really annoying. Developers don’t want to write a custom cache just for RSS feeds, and I don’t see why supporting redundant caching measures encoded in XML would make a piece of software better. Why wouldn’t an HTTP cache be sufficient?

                  1. 3

                    AFAIK we are talking about HTTP caching in this thread. There are sites that don’t send headers like ETag or even Last-Modified, and there are clients that ignore them and just send unconditional requests every time.

                    There are one or two RSS properties related to this, but the only one I remember specifies a suggested poll interval. That does overlap to some degree with Cache-Control, though I don’t remember the reasoning behind it. I’m certainly not going to defend RSS as a format; it’s pretty crude, and the “official” specs are awful.
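
For what it’s worth, the property in question is RSS 2.0’s `<ttl>` element: the number of minutes a feed may be cached before refreshing. A sketch of a client reading it, with a fallback default when it’s absent (the function name and default are assumptions):

```python
import xml.etree.ElementTree as ET

def poll_interval_minutes(feed_xml, default=60):
    """Return the feed's suggested poll interval from <channel><ttl>,
    falling back to `default` minutes when the element is missing."""
    root = ET.fromstring(feed_xml)
    ttl = root.findtext("channel/ttl")
    return int(ttl) if ttl and ttl.isdigit() else default

feed = """<rss version="2.0"><channel>
  <title>Example</title>
  <ttl>1440</ttl>
</channel></rss>"""
print(poll_interval_minutes(feed))  # 1440 minutes, i.e. once per day
```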

            3. 3

              Another thing that helps, at least with marginally well-behaved clients, is to add the header Cache-Control: public, max-age=3600.
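
On the client side, honoring that header just means remembering when the response was fetched and not re-requesting until max-age seconds have passed. A simplified sketch (the parsing ignores other directives, and the function name is illustrative):

```python
import re
import time

def is_fresh(cache_control, fetched_at, now=None):
    """True if a response fetched at `fetched_at` (epoch seconds) is
    still fresh according to the Cache-Control max-age directive."""
    now = time.time() if now is None else now
    m = re.search(r"max-age=(\d+)", cache_control or "")
    if not m:
        return False  # no max-age: treat as immediately stale
    return (now - fetched_at) < int(m.group(1))

# With Cache-Control: public, max-age=3600, a response fetched
# 10 minutes ago is still fresh; one fetched 2 hours ago is not.
print(is_fresh("public, max-age=3600", fetched_at=0, now=600))   # True
print(is_fresh("public, max-age=3600", fetched_at=0, now=7200))  # False
```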

              1. 2

                I have this:

                cache-control: public, max-age=86400, stale-if-error=60

                Is this sufficient? My feed isn’t updated more than once per day.

                1. 1

                  Is this sufficient? My feed isn’t updated more than once per day.

                  I think that should be plenty! It blows my mind how clients can fall down on simple stuff like this.

              2. 3

                This is a nice essay on how to tune up a site that’s actually getting enough traffic to need a tune-up. One of the newer things here is the expectation, lately, that any non-trivial popular site use a CDN. It’s simply too much of a waste to do otherwise.

                1. 3

                  The amount of traffic I get continues to surprise me! I was surprised by a few details involving the size of the RSS feed (and how much it probably ended up costing me on the Kubernetes cluster).

                  I also threw Cloudflare into hyper-aggressive mode (as opposed to the slightly aggressive mode it was in previously). Hopefully that should make the bandwidth costs even lower (Hetzner has supposedly unlimited bandwidth, so let’s see if that’s actually true or not). Within a day or two I should have every asset on my site cached in Cloudflare.

                  1. 1

                    Very cool.

                    I went back to self-hosting just after the holidays. I got tired of signing up for “free” platforms and then having the rules change later.

                    I’m on EC2 using CloudFront (and Let’s Encrypt for SSL). Ghost and commento seem to be working fine, and I moved all of my image storage up to the CDN, but I’m still serving text and RSS myself. I don’t know how it’s going to turn out for me. I just installed goaccess for a rudimentary bit of telemetry. I’m interested in following along to see how things work out for you. We definitely should compare notes.

                2. 3

                  Wow, I went to check my site and found a single IP requesting my RSS feed every minute. That IP alone has transferred 13GB of data since December. Damn. Thanks for raising awareness about such bad actors @cadey.

                  1. 2

                    Yes, that graph is showing in gigabytes. We’re so lucky that bandwidth is free on Hetzner.

                    But it says 300 Mil on the left. And “bytes” on top. So I guess Mil stands for million, and 300 million bytes is 300 decimal megabytes, not gigabytes, unless my math is all wrong. Is my math all wrong?

                    1. 1

                      You’re correct that 300 million bytes is 300 MB (or around 286 MiB).
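
The conversion, for anyone double-checking (decimal megabytes vs binary mebibytes):

```python
bytes_total = 300_000_000       # 300 million bytes, as read off the graph
mb = bytes_total / 1_000_000    # decimal megabytes (MB)
mib = bytes_total / 2**20       # binary mebibytes (MiB)
print(mb)             # 300.0 MB
print(round(mib, 1))  # 286.1 MiB
```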

                      1. 1

                        My bad. I was reading the cloudflare graph when I wrote that. I think I uploaded the wrong image to Twitter. Oops. I’ll fix it.

                        1. 1

                          I think your scale of “this could get expensive” would nevertheless only be right if you were on a very expensive provider like Google Cloud, or maybe if this were 15 years ago. Hetzner considers 20TB/mo completely normal, and you are nowhere near that! A gigabyte in 2021 is a small thing.

                          Of course, it’s fine to plan far ahead and optimize things, but this may give people the wrong idea that it’s absolutely necessary to put Cloudflare or a caching CDN in front of their website, or to cut down RSS feeds, when even at your level of popularity it isn’t really needed.

                    2. 1

                      It’s good to spread this information, but really this is RSS 101 and I’m kind of sad to see people apparently just discovering it.

                      (Actually it’s REST 101.)

                      1. 1

                        For how simple this is, it’s pretty amazing how many sites don’t do it. In my feed reader I support ETags and Last-Modified, but so many sites support neither that I also hash the bytes of each feed so I can skip reparsing it unless it has actually changed (of course that only saves me CPU and I/O, not bandwidth).

                        There’s still that one site with a timestamp in a comment at the top of the feed that changes on every fetch, though. For that there’s adaptive scheduling.
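
The hashing trick is simple enough to sketch (illustrative code; a real reader would persist the digest between fetches):

```python
import hashlib

def feed_changed(feed_bytes, last_digest):
    """Return (changed, new_digest). When the digest matches the one
    saved from the previous fetch, the reader can skip reparsing."""
    digest = hashlib.sha256(feed_bytes).hexdigest()
    return digest != last_digest, digest

body = b"<rss>...same bytes as last time...</rss>"
changed, d = feed_changed(body, None)  # first fetch: always "changed"
changed2, _ = feed_changed(body, d)    # identical bytes: skip reparse
print(changed, changed2)  # True False
```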

                      2. 1

                        You should consider reworking your description fields. You should not be including the full post in the description.

                        My website landing page is a feed, and as you can see, it includes all posts I’ve ever made and remains tiny: http://len.falken.ink . My description fields are one sentence describing my content.

                        1. 18

                          You should not be including the full post in the description

                          Why not? I prefer sites I can read completely in my aggregator (so I don’t have to deal with whatever fonts, colors, and font sizes the “real” site uses). The feed doesn’t need to include every article ever posted, though; the last few are fine. Whether to keep old articles around is up to the aggregator.

                          1. 7

                            This is exactly why I put the article text in the description. I don’t think readers handle the Mara stickers that well though :(

                            1. -1

                              And that’s why you had problems: you used it for something it wasn’t intended for.

                            2. 1

                              Because it’s for a description, what else do I have to say?

                              1. 5

                                Neither common practice nor the RSS 2.0 spec support your assertion that the description element should only be used for a short description.

                                1. 1

                                  It literally says “The item synopsis”….

                                  Are we reading the same thing?

                                  1. 5

                                    I am reading: “An item may also be complete in itself”, which I interpret to mean that the whole post is allowed to be in there.

                                    But even if you were technically right, it feels unnecessary and wasteful to require that the user fire up a browser to get the remaining two paragraphs of a three-paragraph post, because the first one was regarded as the intro/synopsis and is the only one allowed in the feed. When people do that, I always get the sense they’re forcing you to visit just to pump up their pageview counters.

                                    Text is easy to compress. If it is still too much, one can always limit the number of items in the feed and possibly refer to a sitemap as the last item, for the users who really use the feed to learn about everything on the site.

                                    1. 1

                                      If you read in the description field, it says what I wrote…

                                      I agree with the logic that if you include some of the text but then require launching a browser to read the rest, it’s a waste.

                                      If you’re delivering web content, though, you’ll need a browser; you just can’t get around that. On my website I don’t serve web content, only plain text, for exactly the purpose you mention: I don’t want my readers to have to launch a browser to read my content.

                            3. 5

                              You should not be including the full post in the description.

                              Your root-page-as-feed idea is nifty, and I think there are plenty of scenarios where concise descriptions along those lines make good sense. Still, for probably the majority of blog-like things, the full content of posts in the feed offers a better user experience.

                            4. 1

                              I dropped all but the 5 latest posts from my feed. I didn’t do it for bandwidth reasons, but to minimise the impact of flooding Planet aggregators if (when) I make a change that bumps old articles into the upper end of the “new articles” feed by mistake. Saving bandwidth is a happy accident.
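
Trimming a feed that way is usually just a sort-and-slice at generation time. A sketch with hypothetical item dicts (field names are assumptions):

```python
def latest_items(items, n=5):
    """Keep only the n most recently published items in the generated feed."""
    return sorted(items, key=lambda i: i["published"], reverse=True)[:n]

# Ten posts with increasing publish timestamps; only the five newest survive.
posts = [{"title": f"post {k}", "published": k} for k in range(10)]
print([p["title"] for p in latest_items(posts)])
# ['post 9', 'post 8', 'post 7', 'post 6', 'post 5']
```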