1. 67
  1. 10

    I think on this site, everyone knows what RSS is and why people want to use it, so it’s a bit like preaching to the choir. It would be more instructive if it theorized why most people don’t really use RSS anymore.

    1. 5

      I don’t think the article was aimed only at the lobste.rs crowd. Outside tech nerdery, I don’t think many people have even heard of RSS.

      1. 2

        One problem I have with RSS is that whenever a feed entry doesn’t have the <guid isPermaLink="true"> attribute set, the reader (yarr in my case) is going to display old entries as unread from time to time. Of course, strictly speaking, that’s not a fault of the RSS reader but of the server generating the feed, but I find that many, many feeds have this problem.
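
        A reader can soften this by deriving its own identity for entries that lack a guid. The following is only a sketch of that idea, not how yarr or any particular reader does it; the fallback order (guid, then link, then a hash of title and date) and the names `entry_key` and `unread_items` are assumptions made for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

def entry_key(item):
    """Return a stable identity for an RSS 2.0 <item> (assumed fallback order)."""
    guid = (item.findtext("guid") or "").strip()
    if guid:
        return "guid:" + guid
    link = (item.findtext("link") or "").strip()
    if link:
        return "link:" + link
    # Last resort: hash whatever is least likely to change between fetches.
    title = (item.findtext("title") or "").strip()
    pub = (item.findtext("pubDate") or "").strip()
    digest = hashlib.sha256((title + "\n" + pub).encode("utf-8")).hexdigest()
    return "hash:" + digest

def unread_items(feed_xml, seen):
    """Return keys of items not seen before, and record them in `seen`."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        key = entry_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(key)
    return fresh

# Made-up feed: one item with a guid, one without.
FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>With guid</title><link>https://example.org/a</link>
<guid isPermaLink="true">https://example.org/a</guid></item>
<item><title>No guid</title><link>https://example.org/b</link></item>
</channel></rss>"""

seen = set()
first_fetch = unread_items(FEED, seen)
second_fetch = unread_items(FEED, seen)  # same feed again: nothing new
```

        The hash fallback still breaks if the publisher edits a title, which is why a proper guid on the server side remains the real fix.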

      2. 8

        “RSS” is like four distinct formats, the majority of which are underspecified garbage. When people say “RSS”, do they usually mean “RSS 2.0”, or is it still a mishmash like it was in the early aughts?

        1. 9

          At this point it’s mostly RSS 2.0 and Atom 1.0 from what I’ve seen.

          1. 6

            You got me thinking about this, so I did a quick check of my own subscribed feeds. Out of 451:

            • 359 are RSS 2.0
            • 81 are Atom 1.0 (2 were Atom 0.3, which I updated)
            • 11 are RSS 1.x (RDF)

            (It’s interesting that I can almost correlate formats. Atom has Google properties and self-publishers, RSS 1.x are mostly advisories, and RSS 2.0 is the de facto default).
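
            For anyone who wants to repeat that kind of tally, the format can usually be read off the root element. This is a rough sketch rather than the script used for the numbers above: the function name `feed_format` is made up, and lumping everything rooted at `rdf:RDF` under "RSS 1.x (RDF)" is a simplification.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM03_NS = "http://purl.org/atom/ns#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def feed_format(xml_text):
    """Guess the syndication format from the document's root element."""
    root = ET.fromstring(xml_text)
    tag = root.tag
    if tag == "rss":
        # RSS 0.9x and 2.0 share an un-namespaced <rss> root.
        return "RSS " + root.get("version", "unknown")
    if tag == "{%s}feed" % ATOM_NS:
        return "Atom 1.0"
    if tag == "{%s}feed" % ATOM03_NS:
        return "Atom 0.3"
    if tag == "{%s}RDF" % RDF_NS:
        return "RSS 1.x (RDF)"  # simplification: any RDF-rooted feed
    return "unknown"

# Tiny made-up documents standing in for fetched feeds.
samples = [
    '<rss version="2.0"><channel/></rss>',
    '<feed xmlns="http://www.w3.org/2005/Atom"/>',
    '<feed version="0.3" xmlns="http://purl.org/atom/ns#"/>',
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>',
]
tally = Counter(feed_format(s) for s in samples)
```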

            1. 2

              I was a big proponent of RSS2 for years, but these days it surprises me how popular the least specified option became.

              1. 1

                Is RSS2 XML? It looks as if it is, from a quick skim of some docs. This was the big reason I preferred Atom to RSS back in the day. RSS allowed embedding arbitrary HTML, which meant that you needed an HTML-aware parser to handle it. In particular, I could always embed Atom in XMPP and anything that could parse XMPP could handle it, even if it didn’t know what to do with it, whereas embedding RSS would cause the server to see invalid XML and drop the connection.

                On the other hand, not being XML was one of the reasons that RSS was more popular than Atom: you could embed whatever tag-soup HTML you’d written in an RSS feed and it would be valid. These days, the XML vision of allowing arbitrary document types to be embedded in others with graceful fallback if you didn’t understand them is largely gone. I haven’t used XMPP for years, HTML5 is not XML and so you can’t embed HTML5 in SVG, and even SVG seems to be largely dying in favour of imperative JavaScript and canvas.

                1. 4

                  Every RSS variant is XML and requires well-formed XML. In all variants except Atom the only way to embed HTML is to escape it. I worked at a company ingesting every RSS or Atom feed that could be found and I don’t recall running across one that tried to embed tag soup unescaped as though that would work.

                  RSS2 has two big problems: it doesn’t specify any way to tell if a description contains escaped HTML or plain text, and in practice people did both; and as an especial sin against XML it does not use a namespace, so you cannot easily embed it in other XML contexts. The latter is why all XMPP feed protocols use Atom, but Atom was created to fix both of these.

                  RSS1 of course uses namespacing properly, being RDF. In practice it had the same problem about not knowing what kind of content people were embedding. I think probably there could have been an RSS1.5 that kept the RDF and put in the features from Atom that were needed, but back then (and sometimes today) even XML nerds didn’t always appreciate RDF enough to care about that.

                  1. 2

                    I do believe RSS (2.0) is XML, but the issue is how it handles embedded content. There were some subtle ambiguities regarding embedded HTML that Dave Winer simply wasn’t interested in addressing.

                    For example, in the RSS 0.92[1] spec at Userland, we can read:

                    Further, 0.92 allows entity-encoded HTML in the <description> of an item, to reflect actual practice by bloggers, who are often proficient HTML coders.


                    This is obviously not a future-proof spec.

                    Atom did the technically correct thing by insisting on unambiguous standards, damnit, but then they ran into the buzz-saw of “countercultural” bloggers who were on Winer’s side and disliked any restrictions or rewrites as akin to demands from The Man.

                    In the end, the Syndication Wars were a technical culture war, like the parallel XHTML vs HTML. Feed readers and parsers quickly learned to deal with feeds heuristically, just like HTML, and the requirements for strict standards compliance were seen as overly onerous in the real world.

                    I prided myself in having a standards-compliant Atom feed but it turns out my elements are apparently not up to spec. No-one has complained so far.

                    As an aside, Atom has no problem handling Gemini content, even if the intrusion of strict XML into the plaintext paradise of Gemini does feel a bit weird.

                    [1] RSS 2.0 is just a light re-skin of RSS 0.92 along with a version bump to indicate it’s “final”.

                    1. 2

                      That’s a shame. XMPP was a great transport for Atom (reliable delivery, push semantics). I hoped that it would eventually displace HTTP as the aggregator -> client transport at least, and ideally as the originator -> aggregator transport. From what I remember, neither Atom nor RSS on their own provided a good mechanism for handling large feeds: either they expired old entries and so aggregators had to cache everything or they kept old things and ended up with huge files. With XMPP, you could use HTTP to fetch historical entries and XMPP to get tiny updates.

                      I suppose you could still transport RSS over XMPP in CDATA elements, but that feels like missing the point somewhat. The nice thing about Atom over XMPP was that your XMPP client could pull out a couple of core elements (e.g. title) for a notification and also push the rest to a dedicated reader application.

                      1. 3

                        XMPP is still a great protocol for Atom, and we’re starting to see movement in the client space.

                        1. 1

                          Atom’s ambitions were always bigger than blogging. I keep forgetting XMPP (mostly because I never use it). Wasn’t Atom to be a part of Pub/Sub too? Lots of bright ideas, lots of ambitions about computer-to-computer communications, when it all really just settled on… syndication. Before the silos took over, slinging JSON internally.

                          RSS at least kept it simple. Maybe that’s the lesson to take from all this…

                        2. 2

                          Note that Atom also allows embedded escaped HTML, it just requires that you say it is HTML instead of making the parser guess.
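
                          To make the difference concrete, here is a small sketch of what a consumer ends up doing. Atom’s `type` attribute (defaulting to "text") comes from the Atom spec; the RSS side is a made-up regex heuristic standing in for the kind of guessing real parsers do, and both function names are hypothetical. XHTML content is ignored here for brevity.

```python
import re
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def atom_content_kind(entry):
    """Atom: the feed declares the kind of content, so just read it."""
    content = entry.find(ATOM + "content")
    if content is None:
        return ("none", "")
    kind = content.get("type", "text")  # "text" is the default when omitted
    return (kind, content.text or "")

def rss_description_kind(item):
    """RSS 2.0: nothing is declared, so guess (illustrative heuristic only)."""
    text = item.findtext("description") or ""
    looks_like_html = re.search(r"<[a-zA-Z/][^>]*>|&[a-zA-Z]+;|&#\d+;", text)
    return ("html" if looks_like_html else "text", text)

entry = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<content type="html">&lt;p&gt;Hi&lt;/p&gt;</content></entry>'
)
item_html = ET.fromstring("<item><description>&lt;b&gt;bold&lt;/b&gt;</description></item>")
item_text = ET.fromstring("<item><description>plain words</description></item>")

atom_result = atom_content_kind(entry)
rss_html_result = rss_description_kind(item_html)
rss_text_result = rss_description_kind(item_text)
```

                          Note that the XML parser has already undone the entity escaping by the time either function sees the text, so the only real difference is whether the kind is declared or guessed.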

                          1. 1

                            Correct, I was unclear about that.

                  2. 2

                    I’ve self-hosted both tt-rss and freshrss (went back to tt-rss because it works with rssguard), and most readers like them will be able to use any RSS or Atom format. Some can even automatically get the feed for a YouTube channel (which is now sorta hidden / difficult to find).

                    1. 1

                      Yeah, as a publisher, RSS and Atom both stink. JSON-Feed is better, but no one uses it. NinJS is great, but even fewer people use it.

                      1. 2

                        What makes JSON-Feed better than RSS or Atom? I implement all three for my blog, and I found each just variations on a theme (with Atom being at least better described).

                        1. 1

                          Atom and RSS both have incomplete metadata and to actually describe a story in a reasonable way, you need to use one with namespaced tags from the other. JSON Feed has all the fields of both out of the box and a couple more, like link post support.

                          See the table in http://www.intertwingly.net/wiki/pie/Rss20AndAtom10Compared and notice that neither is a superset, whereas JSON Feed is https://www.jsonfeed.org/mappingrssandatom/

                          1. 1

                            I used a pre-existing library from CPAN when I added it to my blog engine, but I suspect the major appeal is that it’d be a lot easier to roll by hand for most devs than anything built on XML. I was hoping it’d take off for that reason, but it pretty much seems to have stalled out.

                          2. 1

                            I’ve been using Atom for over a decade and I’ve always been happy with it. You can definitely tell it was standardized by someone for whom it wasn’t their first spec.

                            1. 1

                              It has pub date but not modified date. Seems really obvious that you want both.

                              1. 3

                                This seems incorrect to me.

                                From the link you posted in a sibling comment: https://www.jsonfeed.org/mappingrssandatom/

                                Atom has an array of entry objects. In JSON Feed these are item objects.

                                • Atom’s published and updated dates map to date_published and date_modified in JSON. Both Atom and JSON Feed use the same date format.
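
                                As a rough illustration of that mapping, here is a sketch that converts one Atom entry into a JSON Feed item. It only covers a handful of fields (id, title, link, and the two dates), ignores things like the `type` attribute on titles, and the function name `atom_entry_to_json_item` and the sample entry are made up.

```python
import json
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Atom element name -> JSON Feed item field (subset only)
FIELD_MAP = {
    "title": "title",
    "published": "date_published",
    "updated": "date_modified",
}

def atom_entry_to_json_item(entry):
    """Map one Atom <entry> to a JSON Feed item dict."""
    item = {"id": (entry.findtext(ATOM + "id") or "").strip()}
    link = entry.find(ATOM + "link")
    if link is not None and link.get("href"):
        item["url"] = link.get("href")
    for atom_name, json_name in FIELD_MAP.items():
        value = entry.findtext(ATOM + atom_name)
        if value:
            # Both formats use RFC 3339 timestamps, so dates copy over as-is.
            item[json_name] = value.strip()
    return item

ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2021:post-1</id>
  <title>Hello</title>
  <link href="https://example.org/post-1"/>
  <published>2021-10-01T12:00:00Z</published>
  <updated>2021-10-02T08:30:00Z</updated>
</entry>"""

item = atom_entry_to_json_item(ET.fromstring(ENTRY))
print(json.dumps(item, indent=2))
```
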
                                1. 2

                                  Okay, it’s been a while since I implemented this. Looks like Atom does have updated. In general, Atom is better than RSS, for sure. I remember there being some missing tags in Atom, but I can’t recall what they were if it’s not updated time.

                        2. 4

                          Nice tool! My newsreader of choice, Feedly, has a similar feature, but it’s great to see a standalone FOSS tool for this.

                          Some feedback:

                          • It should allow you to omit the “https://”, as a browser does.
                          • It feels wrong to me that the site whose URL you entered shows up as a search result, and sometimes not even the first one. It’s good to show the title of the site as confirmation you got the URL right, but maybe put it directly below the search field with a divider separating it from the results.
                          • When I tried it with https://cdm.link/ (Create Digital Music), it didn’t list any other sites at all.
                          • For https://arstechnica.com most of the recommendations were reasonable, like Wired, but the top one was to True Tiger Recordings, “CHECK FOR INFO ON SCANDALOUS UNLTD., MISTY DUBS, ONE DARK MARTIAN, CARLY BOND, PURPLE, SHYAM & DJ WEC”, a Blogspot site that hasn’t been updated since 2005. Kinda seems like a glitch, the sort of WTF result I remember from the days of AltaVista… 🧐
                          1. 2

                            Thank you for this feedback! Right now, it’s the most minimal of MVPs, and I was hoping to get feedback like this.

                          2. 4

                            My problem with RSS/Atom has always been the lack of prioritization. Using RSS is like drinking from a firehose. For example, if I’m interested in the “programming” tag RSS of Lobsters, I can fetch https://lobste.rs/t/programming.rss which gives me 25 programming-tagged posts on this site. Because there’s no in-band way to indicate scoring information, I don’t have any of the community ranking features associated with this site. Even if I were to score these myself based on some keyword scorer, NB classifier, or some RNN there’s no way to persist my scoring onto the original RSS feed itself, unless I modify the DOM with a new metadata attribute and then build/modify a reader to respect this new attribute.

                            Combine one RSS feed like this with something from, say, HackerNews and a couple other tech sites, and I’m looking at hundreds of articles per day with no way of understanding what’s good and what’s bad. For now I’ve been playing around with a workflow where I use a classifier and a lot of human tagging to pick articles I want to read later which then get resyndicated into a feed that Wallabag ingests and adds into its Unread queue, which I then read in my own time. But this setup takes intentionality and maintenance. If there were an in-band way to indicate priority, that would go a long way to fixing RSS workflow issues for me.
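
                            For what the "new metadata attribute" route might look like, here is a minimal sketch that resyndicates a feed with a score attached to each item as a namespaced extension element. Everything specific is an assumption for illustration: the `score:priority` element, its namespace URI, and the keyword weights are invented, and no existing reader understands them.

```python
import xml.etree.ElementTree as ET

SCORE_NS = "https://example.org/ns/score"  # hypothetical namespace, not a standard
ET.register_namespace("score", SCORE_NS)
PRIORITY = "{%s}priority" % SCORE_NS

KEYWORDS = {"rust": 2.0, "compiler": 1.5, "blockchain": -3.0}  # made-up weights

def keyword_score(title):
    """Toy scorer: sum the weights of known words in the title."""
    return sum(KEYWORDS.get(word, 0.0) for word in title.lower().split())

def annotate(feed_xml):
    """Return the feed with a <score:priority> child added to every <item>."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        el = ET.SubElement(item, PRIORITY)
        el.text = "%.1f" % keyword_score(item.findtext("title") or "")
    return ET.tostring(root, encoding="unicode")

def ranked_titles(feed_xml):
    """What a score-aware reader could do: sort items by the extension element."""
    root = ET.fromstring(feed_xml)
    scored = []
    for item in root.iter("item"):
        raw = item.findtext(PRIORITY)
        scored.append((float(raw) if raw else 0.0, item.findtext("title") or ""))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]

FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>Blockchain news</title></item>
<item><title>Misc</title></item>
<item><title>A Rust compiler</title></item>
</channel></rss>"""

annotated = annotate(FEED)
order = ranked_titles(annotated)
```

                            Readers that don’t know the namespace should simply ignore the extra element, which is the one part of this that the feed formats do support out of the box.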

                            1. 4

                              My first tech job was at a company that had some nice solutions to this. Unfortunately they got bought by Google and all the tech shelved so they could be integrated into the Google Reader team, which then got axed.

                              1. 2

                                Because there’s no in-band way to indicate scoring information, I don’t have any of the community ranking features associated with this site. Even if I were to score these myself based on some keyword scorer, NB classifier, or some RNN there’s no way to persist my scoring onto the original RSS feed itself, unless I modify the DOM with a new metadata attribute and then build/modify a reader to respect this new attribute.

                                This feels like something that should be built into the protocol, rather than the format, but it’s very difficult to do in any kind of generic way. First, it’s hard to differentiate between a signal in the client of ‘I’m not interested in this subject’ and ‘this is a bad article and you should feel bad’.

                                To make use of the first on the server, you need to do some clustering to identify users with similar interests and silo their ratings (people who like the things you like, also like this). This requires tracking users, which is fine for something like lobste.rs because it requires account creation, but if you wanted to embed this in an arbitrary feed protocol then you’d need some kind of user-authentication protocol. That’s fairly tractable if it’s a protocol for aggregators to talk to clients but it’s much harder if you don’t have an aggregator in the middle (and if you do then this means that the feed providers don’t ever get this feedback and the value in the ecosystem is shifted away from the folks who produce the source material).

                                Making use of the second is much simpler in theory but requires you to have a solution to review spam. If I can create 10,000 accounts and mod-bomb articles that I do / don’t like then I can massively influence the outcome. Again, lobste.rs does this by having an audit trail for account creation and requiring referral, but that doesn’t easily scale up to a protocol that communicates feedback with arbitrary sites.

                                Both of these have the problems of providing useful ratings for things with different numbers of readers. A news article about an MP being stabbed to death (to pick something recent from the news) probably had millions of readers, whereas the article from yesterday about the magic_enum C++ library probably had hundreds. Being able to scale these usefully so that people like me still see low-circulation geeky stories in the middle of a news feed is incredibly hard unless you completely isolate different kinds of story. Even within a publication, the number of readers can vary hugely. A typical article in Communications of the ACM gets 10-50K downloads, my most popular one got over 400K (and is definitely not ten times as interesting, I just picked an incredibly clickbaity title). When I was writing for InformIT, I think my least-read article got about 3K downloads in the first month, the most read got around 200K.
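
                                One naive way to compare ratings across very different audience sizes is to rank by a smoothed per-reader rate instead of raw counts. The sketch below is only that, a sketch: the prior values and the reader/upvote numbers are invented, and it does nothing about the interest-versus-quality or spam problems above.

```python
def smoothed_rate(upvotes, readers, prior_rate=0.01, prior_weight=500):
    """Upvotes per reader, pulled toward prior_rate when readers are few.

    prior_rate and prior_weight are made-up tuning knobs: the estimate
    behaves as if prior_weight extra readers had voted at prior_rate.
    """
    return (upvotes + prior_rate * prior_weight) / (readers + prior_weight)

# Invented numbers for illustration only.
mass_news = smoothed_rate(upvotes=10_000, readers=2_000_000)  # huge audience
niche_post = smoothed_rate(upvotes=30, readers=300)           # small audience
one_vote = smoothed_rate(upvotes=1, readers=1)                # almost no data

ranking = sorted(
    [("mass_news", mass_news), ("niche_post", niche_post), ("one_vote", one_vote)],
    key=lambda pair: pair[1],
    reverse=True,
)
```

                                With these numbers the niche post outranks the mass-market story, while a single vote from a single reader barely moves off the prior. Picking priors that hold up across real publications is exactly the hard part.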

                                I’d love to see something like this that works at scale but I have no idea how you’d even start to design it.

                                1. 2

                                  It helps to realize that finding the “perfect” feed, that captures all and only the most relevant content, is not possible. Nor is it even desirable. It’s the same as with movies and books. Each year, more good books are written than I could read in a lifetime. More good movies come out than I could see in a lifetime, even if I were to completely stop reading books. It makes no sense to get frustrated over that. Instead, that is a fact that should make me happy. Because that means that I can fill all my time with reading and watching the good stuff and never worry that it stops. From my personal perspective, the supply of good stuff is endless.

                                  So if I enjoy reading a book, then I’ll continue reading it. If not, I’ll move to the next one. Who cares if I missed a book that could theoretically have scored 0.5 higher on my personal enjoyment scale on that particular day in my life? I am still enjoying myself, right?

                                  The same applies to RSS feeds. Like you, I don’t enjoy fire hoses, so Lobsters, Hackernews, Reddit, etc are not part of my list. Same goes for newspapers and other high volume sites. Except for the ones that are actually able to curate their feed and send only a limited amount of articles per day. Whenever I read something interesting, I’ll check whether the site has RSS and I add it. Whenever I notice that I skip a lot of articles of a specific feed I’ll remove it. Over time this means that I have grown a list of almost 200 feeds that I can easily process during a relaxing morning coffee. And if there is time left, I’ll go over to the big sites and amuse myself with watching the hype trains.

                                  1. 2

                                    Except for the ones that are actually able to curate their feed and send only a limited amount of articles per day. Whenever I read something interesting, I’ll check whether the site has RSS and I add it.

                                    Right, I’ve tried that workflow also, but then RSS just becomes another link aggregator that I have to check, one with less breadth than a link aggregator like Lobste.rs or HackerNews. My interest in RSS is to be my “one stop” for news. A lot of social networks saw this issue and implemented either community scoring (like Reddit, Lobsters, and HN) or algorithmic filtering (like FB and Twitter) for this purpose.

                                    That’s the motivation behind my current setup. The thing is, if I don’t have a solution for breadth, I find myself trawling news aggregators for the breadth, and then I get distracted reading comment trees and flamewars that take up my time/energy. RSS is part of my strategy to be intentional about where my time and focus goes online, especially as I’m getting older and busier and my backlog of projects I want to get done is getting larger.

                                2. 2

                                  I recently started using the “newsboat” reader for RSS feeds, and find it much less distracting. My primary feeds are mostly security bulletins, but it is much quicker and more efficient to find a newly posted bulletin and the required info via newsboat than on the actual website. I have become a fan, again, of RSS!

                                  1. 1

                                    I recently installed ttrss. I had been using rss2email for arXiv, which was nice since searching for names brought up both email and articles.

                                    In any case, I find RSS is helping reduce my compulsive checking. I wish that lobsters’ article ranking would somehow appear in the feed.

                                  2. 2

                                    Great post! I’ll admit I’ve never had a pressing need for a recommendation engine for RSS; if anything, I’ve always had too much to read. But it’s something we’ll need to consider if we want to make the format viable and public again for regular people.

                                    There is so much pushback against RSS, not least by social media companies who’d rather keep us to themselves. The best thing we can do is keep talking about it. You have a new subscriber :)