1. 9

  2. 12

    Using XML at all leads inevitably to centralization.

    This is a logical leap of epic proportions. I fail to see how the argument put forth backs it up.

    HTTP is also bloated and over-complex in the same way. And don’t get me started on certificates and certificate chains. Or fucking DNS.

    It doesn’t take a genius to come up with better formats than these for the same applications. These aren’t inherently-complex problems.

    Hint: the formats are not the complicated (or even interesting) part. They could all be XML right now and it wouldn’t change much of anything.

    1. 3

      If you use XML, you are either writing your own XML parser or using someone else’s.

      Writing your own is hard – much harder than the project you’re planning to use XML for, probably.

      Using someone else’s is centralization: there aren’t many good XML parsers available, so you’re placing more power and trust in the hands of Jackson or libxml (which already underpin practically everything).

      A non-centralizing format is one where rolling your own implementation is not an obviously terrible idea.

      A format where any developer on the team can implement it more or less correctly in an afternoon is also a format that isn’t going to become a problem later – one where developers can reason about the behavior of parsers and generators, where different implementations can be expected to generally agree, and where corner cases that might be handled differently between implementations can be identified and avoided. It’s a format where auditing third party implementations is also potentially straightforward (and you can reject difficult-to-audit ones out of hand). So there are more reasons to prefer them than just avoiding power consolidation.

      (Maybe I should rewrite this essay, seeing as how I need to rephrase its core thesis in response to practically every response.)

      1. 6

        Using someone else’s is centralization:…

        A non-centralizing format is one where rolling your own implementation is not an obviously terrible idea.

        By this logic, code reuse is “centralization”. Is this actually what you’re arguing? Because it sounds pretty ridiculous.

        I, for one, would prefer to use a well-tested, auditable implementation than roll my own in an afternoon.

        1. 3

          As a practical example of the difference between XML and JSON in a situation where using popular libraries isn’t possible, let’s look at Monte’s JSON support. Monte doesn’t have any XML parsers. It does have a JSON parser, which started out as a handwritten lexer and parser. Later, we were able to write basic parser combinators, and that led to the current combinator-driven parser, which is smaller and faster. JSON is quicker than XML to support, even considering the effort required to write a parser library.

          Parsing even a comfortable and reasonable subset of XML is hard enough that I’ve simply not parsed XML in-process; instead, I’ll look for external tools like relpipe (XML example) which will do it for me, handling escaping and CDATA and entities and all of that fun stuff. Calling subprocesses is safer than XML to support, even considering the security risks.
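
          A minimal sketch of that subprocess approach (using xmllint from libxml2 rather than relpipe, purely because it’s widely installed; the XPath and file layout are made-up examples):

          ```python
          import subprocess

          def channel_title(path):
              # Delegate escaping, CDATA, and entity handling to an external,
              # battle-tested tool rather than parsing XML in-process.
              # string(...) yields a single text value, which keeps the output
              # trivially parseable.
              result = subprocess.run(
                  ["xmllint", "--xpath", "string(//channel/title)", path],
                  capture_output=True, text=True, check=True,
              )
              return result.stdout.strip()
          ```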

          I could imagine JSON-specific arguments against this, so as a tiebreaker, we could consider Monte’s support for Cap’n Proto. We implement the recommended convention, and maintain a tool which compiles Cap’n Proto schemata into Monte ASTs. This requires a lot of binary parsing, recursive algorithms for nested data structures, deferred/lazy parsing, and other hallmarks of XML parsers. Cap’n Proto is much easier than XML to support, even considering the code generation that is required.

          To exemplify a point from the author’s sibling response:

          If correct implementation is such a problem that you would rather have a third party do it for you, then you are probably not qualified to independently audit or test a third party implementation! Sometimes (as with encryption) this is unavoidable: the problem domain is so complex that you need to trust expert opinion, and rolling your own is never going to be a good idea.

          Monte is a language where we generally don’t have FFI; we require binding libsodium and are prepared to pay the price for vulnerabilities in that particular C library, but we’re not willing to make that trade for a JSON or XML parser, and we won’t let users try to make that choice for themselves, either.

          1. 3

            This matches my experience.

            I’m not coming to this from a place of ignorance of XML (though I had a fair amount of ignorance of RSS while implementing it); I’m coming from a place of having spent a whole lot of time dealing with Jackson, libxml, and various implementations of XPath and XSLT, in multiple languages, for work – and often discovering that, because the other end actually uses a tiny fraction of the expressive domain of XML, the easiest and most reliable way to get the data I needed was to use sed. At the same time, hand-processing subsets of JSON (or even all of MSGPACK) was a lot easier than doing fairly simple things with existing XML parsers.

          2. 2

            If you’re reusing your own code? I wouldn’t call that centralization of power (although sometimes poor factoring will lead to some of the same kinds of concerns – ex., if you have a module intended to be general purpose but the needs of the functions that use it cause it to become too tightly coupled, the behavior that must be implemented in order to make this ‘general purpose module’ work for one function may break the other in subtle ways). So, shared code is a shared point of failure.

            But dependencies, on top of being a shared point of failure, also need to be trusted or audited. Core complexity is both a major factor in implementation difficulty and a major factor in auditing & testing difficulty. Doing a complete audit (where you have absolute confidence that the behavior is correct) can sometimes be more difficult than a full reimplementation, since you need to have a complete model of the correct behavior (and if you are using the right tools, having a complete and correct model of the problem is the most difficult part of implementation); testing can help, at the cost of adding complexity and corner cases (ex., where is it safe and appropriate to add mocks – and must you modify the natural structure of the code to do so?).

            If correct implementation is such a problem that you would rather have a third party do it for you, then you are probably not qualified to independently audit or test a third party implementation! Sometimes (as with encryption) this is unavoidable: the problem domain is so complex that you need to trust expert opinion, and rolling your own is never going to be a good idea. But if you are, for instance, looking to serialize a key-value table or some kind of nested tree structure, there are no ‘essential’ gotchas (anybody with a CS degree knows everything you need to watch out for in both these cases to guarantee correctness) – so why deal with complexity introduced by the format? And on the other hand, why put trust into unknown third parties when the only thing you need to do to make checking correctness easy is to roll your own (simpler) format?

            Depending on your needs, it is often going to be easier to roll your own format than to use an existing parser implementation for a more complex one (especially with XML, where the format is awkward and all of the parsers are awkward too, simply due to the way the spec is defined); if you are using XML, odds are there’s an existing simpler format (with easier-to-audit implementations) that cleanly handles whatever you need to handle, and furthermore, a subset of that format that supports all your data without ugly corner cases.

            The correct format to use is the simplest format that is rich enough to express everything you need to express. Don’t use XML when you can use JSON and don’t use JSON when you can use TSV, and don’t use TSV with quoting if you can use TSV without quoting – and if you can use TSV without quoting, implement it with string split.
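
            A sketch of what that last clause amounts to in practice; this only works under the stated restriction (no tabs or newlines inside values):

            ```python
            def parse_tsv(text):
                # With no quoting layer, parsing really is just string split –
                # there are no corner cases left for implementations to disagree on.
                return [line.split("\t") for line in text.split("\n") if line]

            def emit_tsv(rows):
                # The generator is the same operation in reverse.
                return "".join("\t".join(row) + "\n" for row in rows)
            ```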

            If you use a third party dependency, every bug in that dependency is a bug you are responsible for but can’t promise to resolve – something you’re not in the right position to identify, and don’t have permission to fix. That’s a really ethically dicey proposition. Your users have no choice but to trust you to some degree (the best they can do is trust you or the most trustworthy of your peers – and often, that is no choice at all), and you have chosen to expose them to a risk that you know that you may be unable to protect them against. Most of the time, you’ve done this without it being truly necessary: you could take full responsibility for that functionality, but it’s boring, or you’re on a deadline that makes it impossible (in which case your project manager is morally culpable).

            Often, all that it takes to move from “I can’t possibly take responsibility for a hundred thousand lines of complex logic in a third party library” to “I can keep an accurate model of every operation this code performs in my head” is replacing large third party dependencies whose features you don’t need with a simpler hand-rolled implementation that has none of those features.

            1. 4

              I find it amusing that you claim that RSS is “webtech mentality” and yet unironically advocate the use of JSON, which is just as much, if not more, “webtech mentality” than RSS. And JSON probably has more corner cases than RSS.

              I’m not sure what you mean by “TSV with quoting”—as long as the values don’t contain the tab character, there’s no quoting required, unlike with CSV. I do wish the ASCII separator characters were used, but as Loup-Vailant said:

              it was mostly short term convenience: since basically forever, we had this thing we call a text editor, that displays ASCII text. So if your format is based on that, it’s easy to debug input and output by just displaying the text in an editor, or even modifying it manually. It is then very tempting to optimise the text for human readability over a standard text editor… next thing you know, you’re using text for everything.

              1. 3

                I find it amusing that you claim that RSS is “webtech mentality” and yet unironically advocate the use of JSON, which is just as much, if not more, “webtech mentality” than RSS. And JSON probably has more corner cases than RSS.

                I use the term “webtech mentality” to mean a really specific thing here: the idea (produced in part by postel’s law) that interoperability can be guaranteed without simple and consistent design, because the onus of properly interpreting even impossibly-complex or internally-inconsistent specs is upon the individual implementor. This is like the tech equivalent of neoliberalism, and has very similar results.

                I advocate JSON over XML as the container language for something like RSS because it has fewer practically-meaningful corner cases for this purpose. I’d advocate for MSGPACK over JSON in basically any case where JSON is appropriate. In many cases where JSON is used, it would make more sense to have a line-based or tabular format (though if you are working inside a web browser or are interacting with code running in a web browser, your choices are seriously limited and it’s not really possible to do anything in a completely sensible way).

                I’m not sure what you mean by “TSV with quoting”—as long as the values don’t contain the tab character, there’s no quoting required

                You just answered your own question :)

      2. 6

        On the other hand, no RSS client or RSS-generating application does any of this work until somebody complains, because webtech culture loves postel’s law — which, put into practical terms, really means “just wing it, try to support whatever trash other applications emit, and if somebody can’t handle the trash you emit then it’s their problem”.

        The author complains that the world does exactly what they did.

        The author didn’t read the RSS specification, which states with attendant matching samples, “all date-times in RSS conform to the Date and Time Specification of RFC 822.”

        The author didn’t run a validator against their RSS!

        RSS has its problems, e.g. which RSS is RSS? But replacing XML with JSON? Or TSV? Or CSV?

        The author doesn’t realize the world is full of JSON. And making a production JSON, TSV, or CSV parser is not easy.

        […] there are basically no barriers to using existing JSON parsers and generators either.

        The same is true of XML. 🤦🏾‍♂️

        1. 6

          RSS specification is such a gem. Why omit the best part?

          All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).

          This really is a case of webtech mentality the article is talking about. I really don’t understand what they were thinking.

          1. 3

            It is such a gem.

            As you note in another thread, Atom is well specified. But there’s no connection to the “webtech mentality.” The author has an issue with applying Postel’s Law and flails about ahistorically blaming that for things they don’t like.

            1. 1

              This really is a case of webtech mentality the article is talking about. I really don’t understand what they were thinking.

              What they were thinking is that RFC-822 (dd mm yy) was written in 1982 and not Y2K compliant. Its update (RFC-2822) wasn’t finalised until 2001.

              HTTP/1.1 supported three different datetime formats, but preferred RFC-1123 … which is … drumroll RFC-822, where “the syntax for the date is hereby changed to: date = 1*2DIGIT month 2*4DIGIT”.

              And lest I be accused of cutting out the juicy hilarity of the time:

              All mail software SHOULD use 4-digit years in dates, to ease the transition to the next century.

              In short, they didn’t want to add yet another format to the mix.

            2. 4

              the author complains that the world did exactly what they did

              No, I complained that the world is incentivized into doing what I did by poor design practices.

              I took one look at the RSS spec and said, “it isn’t worth fully reading and understanding this document for the sake of a ten line shell script”. This is a totally reasonable thing to do. Delivering a list of URLs should not require a standard dozens of pages long. So I read examples & tested against the least bloated RSS reader application I could find.

              The same is true of XML.

              Not really?

              JSON, CSV, and TSV have complicated edge cases that matter for “production” but can be avoided if you choose an appropriate format for your data (ex., if your data never includes newlines, tabs, or nulls, and it is tabular, TSV can be used without quoting & can be processed with tools like cut & awk).

              XML also has complicated edge cases, but it has no simple cases: you must use an XML parser, and all existing XML parsers are awkward. So maybe you buy in totally & try to use some of the other tools intended to make serializing in and out of XML easier – and then you’re stuck with specifying schemas (so that your XML data, no smaller and no less complex, is now less flexible, and you have also had to learn a second language for defining those schemas). Pretty soon, you are stuck juggling the differences between XSLT 1 and XSLT 2, or worse, you are using some off-the-shelf framework that generates completely alien-looking schemas out of an attempt to cover over the differences between XML and any sane way of organizing data.

              A sensible standard lets you write reliable, compatible code without many dependencies.

              The author didn’t read the RSS specification, which states with attendant matching samples, “all date-times in RSS conform to the Date and Time Specification of RFC 822.”

              I skimmed the RSS specification, read RFC 822, implemented RFC 822 date formats, and discovered that it would not work because (as sanxiyn notes) RSS doesn’t actually support RFC 822 date formats but instead a nonexistent variant of RFC 822 with four digit dates. Years after the original implementation – which, based on existing examples of RSS feeds, used the default human-readable locale date format provided by date & worked on most clients.

              If somebody was paying me to generate an RSS feed – sure, of course, I would read the entire RSS spec and use a big bloated XML library and spend weeks testing it against dozens of feed readers. If some stranger on the internet asks me to add RSS generation to a short shell script, spending more than four hours on it is a failure, because working with XML and RSS is miserable.

              1. 2

                No, I complained that the world is incentivized into doing what I did by poor design practices.

                The same process that you (and the world) follow is the process that produces those “poor” designs.

                JSON, CSV, and TSV have complicated edge cases that matter for “production” but can be avoided if you choose an appropriate format for your data […] XML also has complicated edge cases, but it has no simple cases: you must use an XML parser, and all existing XML parsers are awkward. […]

                The first category has complicated edge cases, but you get to ignore them by changing the requirements. The second category has complicated edge cases… but you’ve arbitrarily argued that you can’t change the requirements to ignore them?

                For example, if quoting is hard, then don’t do it? Item descriptions are optional! And your proposed alternate-history RSS formats don’t include one.

            3. 6

              There is one mistake that underpins much of web culture, and it started way back with UNIX: textual formats. At some level, we could say that text (ASCII, XML, JSON, C++…) is just a special kind of binary format. In practice, however, textual formats have one thing that fundamentally sets them apart from binary formats: their use of delimiters.

              Text is all about delimiters: newlines to separate lines and paragraphs, tabs to separate fields, space to separate words… Same goes for formal languages: XML has opening and closing tags, C++ has brackets and semicolons… Binary formats on the other hand tend to use length fields instead of separators. This makes them much easier to parse: just read the length field, and if it’s small enough put the field in a buffer, which you can then decompose further. Text however requires us to read input until we see a particular delimiter, which by the way can be a sequence of characters instead of just a character. As a result, text is traditionally way, way harder to parse than binary formats, to the point where doing that in an unsafe language like C today is heavily frowned upon. Not to mention it’s generally slower as well.
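
              To illustrate the length-field point, here is a minimal tag-length-value reader; the 1-byte tag / 4-byte big-endian length layout is an assumption for the sketch, not any particular standard:

              ```python
              import struct

              MAX_FIELD = 1 << 20  # reject absurd lengths before buffering anything

              def read_tlv(buf):
                  # No delimiter scanning: each length field says exactly how many
                  # bytes to slice off for the value that follows.
                  records, offset = [], 0
                  while offset < len(buf):
                      if offset + 5 > len(buf):
                          raise ValueError("truncated header")
                      tag = buf[offset]
                      (length,) = struct.unpack_from(">I", buf, offset + 1)
                      if length > MAX_FIELD or offset + 5 + length > len(buf):
                          raise ValueError("bad or truncated value")
                      records.append((tag, buf[offset + 5 : offset + 5 + length]))
                      offset += 5 + length
                  return records
              ```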

              Why did they make such a huge mistake? As far as I know, it was mostly short term convenience: since basically forever, we had this thing we call a text editor, that displays ASCII text. So if your format is based on that, it’s easy to debug input and output by just displaying the text in an editor, or even modifying it manually. It is then very tempting to optimise the text for human readability over a standard text editor… next thing you know, you’re using text for everything.

              Mind you, when we use it to manipulate human written data, text is generally good. Programming languages, books, or comments like this one are a perfect fit for text (maybe not that perfect for programming languages, but it’s close). The problem is extending that to machine generated data. Using text for this is just a bad idea. Sure, it will interact with existing textual interfaces better, but it makes everything more complicated in the long run.

              For people complaining that a binary format is harder to debug: no they ain’t. Once we have a parser for it (and we do, because without it we couldn’t consume that data in the first place), it’s easy to write a textual representation or similar that we can display for human eyes, or even a full blown editor (it will be a very specialised editor, so it ought to be simple). Sure, that’s work, but consider that writing a correct parser for a textual representation might be just as expensive… unless you’re relying on some generic structure like XML or JSON, that makes everything even more complicated (sure it’s easier in the short term, but the total size and complexity of your system just went up a notch or three).

              My opinion on the matter: throwaway formats can definitely use textual representation. If it’s a specific application that will run for a limited time and affect a limited number of people, by all means, a quick and dirty job is the right thing to do. But if it’s a protocol intended for non-trivial adoption, textual formats should definitely be relegated to actual human text. Machine to machine communication should stick to much simpler binary formats. TLV alone goes a long way.

              1. 5

                Yes, the RSS specification has problems, but why wouldn’t one use Atom instead? Atom support is widespread and Atom is very well specified. The article does not mention Atom at all.

                1. 3

                  I’m surprised anyone would seriously consider generating RSS now that every remaining feed reader supports Atom and has supported it for decades.

                  JSONFeed looks somewhat appealing, but it’s not really free of ambiguities either. For example, I tried to make a patch to fix one of those that arises from having both “publication date” and “modification date” in the protocol, which still lingers there.

                  1. 1

                    TBH, I tried generating RSS because somebody asked for it & I had it working, in about ten lines, in the reader I was testing with within a few minutes. “Serious consideration” is not on the table – this whole project (including fetching titles, generating HTML, posting digests to SSB & twitter) is maybe 150 lines of shell and should never be substantially more complicated than it is.

                    At least a year later, several people complained that my RSS feed didn’t work in their reader, after which I took a look at the spec again and started testing against several readers.

                    I didn’t know much about either RSS or Atom at the time, since my only prior experience was writing a “stupidrss” reader that converted both formats into a newline-separated list of URLs with sed.

                2. 8

                  Winer chose XML for RSS because he used XML for OPML, and Winer hasn’t met any problem that cannot be solved by outlining.

                  RSS does suck, which was why a team developed Atom instead (which fixed the encoding issues and date formats). Of course Winer saw this as an attack and the Syndication Wars were on! Good times.

                  The decision to use XML for RSS led inevitably to both Google Reader and Google’s decision to kill Google Reader, and that has been a huge setback for the “open web” (which, while it was never really open — basically for exactly these reasons — has never been as close to open since).

                  I miss Google Reader too, but c’mon.

                  This essay originally appeared on secure scuttlebutt at %3k6qAo85Q/1hjMW6xc3S0MNt+PsBCM00S354HeXOUco=.sha256

                  I’m sure it did, but at least I can’t read it there. Looks like the big bad evil centralized web still has a leg up.

                  1. 4

                    This essay originally appeared on secure scuttlebutt at %3k6qAo85Q/1hjMW6xc3S0MNt+PsBCM00S354HeXOUco=.sha256

                    I’m sure it did, but at least I can’t read it there. Looks like the big bad evil centralized web still has a leg up.

                    It gets more ironic. SSB’s core protocol is tied to the formatting rules of JSON.stringify.
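
                    To make that concrete, a small Python sketch, with json.dumps standing in for a second, equally spec-compliant serializer (the message contents are hypothetical):

                    ```python
                    import json

                    msg = {"author": "@abc...", "sequence": 2}  # hypothetical message
                    compact = json.dumps(msg, separators=(",", ":"))
                    pretty = json.dumps(msg, indent=2)

                    # Same JSON value, different bytes – so any signature computed
                    # over the serialized form differs. Pinning a protocol to one
                    # serializer's exact output locks every client to that serializer.
                    assert json.loads(compact) == json.loads(pretty)
                    assert compact != pretty
                    ```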

                    Talk about the webtech mentality…

                    1. 3

                      That’s my number 1 issue with SSB (which I generally think is really cool!) - essentially all clients are stuck on the same library to handle the protocol stuff because it was badly designed and nobody wants to implement a compatible client (yes, I know someone did; they tied the reimplementation to the same obscure database format as the canonical one).

                      1. 2

                        I posted it to SSB as a criticism of SSB’s own webtech mentality – which I believe was understood there :)

                        1. 1

                          Fair enough. It was a bit too subtle for me.

                          1. 1

                            It’s a shame that you’re not able to see the full thread over on SSB. It’s full of good discussion that really can’t be easily summarized. (We get into JSON vs BSON vs MSGPACK vs protobuf vs s-expressions, and gopher vs gemini). I may rework some of the points there into a new essay – SSB is not for public posts, after all.

                            1. 1

                              Very cool!

                    2. 3

                      The author’s core point seems to be:

                      Technology decisions have a genuine moral valence. Every temporary hack, as soon as more than one person uses it, becomes effectively permanent. This means that if you are creating something that multiple people will use, you are in a relationship of power over those people & part of your responsibility is to make sure you don’t harm them in the long run.

                      I agree.

                      But the argument that complex formats lead to centralisation? The evidence that XML/RSS/HTTP/DNS/blahblahblah are simultaneously too easy to get wrong and also too hard to get right, so that only large well-funded organisations can manage them well? No mate. The existing counter-examples are literally countless.

                      Ironically, webtech people have written some of the most cogent criticisms of Postel’s Law: RFC 3117 and The Harmful Consequences of the Robustness Principle.

                      The author again:

                      It doesn’t take a genius to come up with better formats than these for the same applications. These aren’t inherently-complex problems. These are relatively simple problems with obvious and straightforward solutions that the industry collectively ignored in favor of obviously-bad solutions that a couple famous people promoted.

                      Astounding arrogance and hindsight bias.

                      I strongly recommend reading about the history of some of these formats.

                      1. 2

                        The existing counter-examples are literally countless.

                        Would you care to list some? (Just enough to demonstrate that there is no trend in the opposite direction)

                        Also: I think you misunderstood how DNS & HTTP centralize. It’s not, in that case, so much that implementation is hard (though HTTP is more complicated to implement than a properly designed protocol that performs the same tasks would be), but that having a host name in a data address means that the data is inherently tied to the host – third party mirrors cannot help ease the load of lots of requests, and so it is the responsibility of the original host to build up more capacity than he would ever use, lest the data become unavailable. The host (or somewhere on the path to the host, since the usual solutions to this problem involve creating layers of load balancing) is a single point of failure. This is implicit in the address format.

                        I strongly recommend reading about the history of some of these formats.

                        If there’s any reason why Dave Winer could not have possibly imagined a format like < URL > < tab > < ISO date > < newline > for a reverse-chronological list of URLs in 1996, let me know :)

                        1. 2

                          Would you care to list some? (Just enough to demonstrate that there is no trend in the opposite direction)

                          You re-explained your argument about centralisation, so the following is me answering your question, though I realise it isn’t germane to your point:

                          For XML, RSS, HTTP, and DNS in specific? There are countless serialisers, deserialisers, tools, clients, servers, applications, books, courses, and institutions – ranging from neighborhood house parties to international standards organisations – that every single day and night are rehearsing and rebuilding these technologies from scratch. SV could be nuked, every country’s top three tech companies could have their charters revoked, and these technologies would live on.

                          For complex formats in general? waves hand around in all directions Complex things are easier to make than simple things. That Antoine de Saint-Exupéry quote. That Rich Hickey talk. Because of that, the world is chock full of complex things. Every natural human language. LIFE

                          1. 2

                            If there’s any reason why Dave Winer could not have possibly imagined a format like < URL > < tab > < ISO date > < newline > for a reverse-chronological list of URLs in 1996, let me know :)

                            What if the content contains newlines (or tabs, such as might be present if the content is source code)?

                            What does the ISO date refer to? Created date? Modified date? What if you want both?

                            RSS is more than a list of URLs. You generally have the full text of an entry, author name (important in the case of multiple authors on a source), modification times, other media (podcasts are built on RSS, and were Winer’s “next big thing”). You want stuff like when the feed was last modified, and so on.

                            Atom tightened up RSS a bit, but had to create an entirely new format of its own for reasons that are now lost in the mists of time but can generally be summarized as Winer being a dick.

                            Edit: this article about the rise and fall of RSS is fascinating and a lot more nuanced than my crude summary. Winer is still a dick though.

                            https://twobithistory.org/2018/12/18/rss.html

                            1. 2

                              What if the content contains newlines (or tabs, such as might be present if the content is source code)?

                              URLs and ISO dates are both specified to never contain tabs or newlines.

                              What does the ISO date refer to? Created date? Modified date? What if you want both?

                              The date the owner of the RSS feed added that link to his feed.

                              RSS is more than a list of URLs.

                              Winer’s first mistake with the format, and probably one of the reasons he thought XML might be a good idea for the container structure. He’s packed in a whole lot of stuff that people might want and in the process, made the core function a lot messier.

                              I might be convinced that it is justifiable to have a short “title” field – one specified to contain no tabs, newlines, or formatting. This lets somebody decide whether or not they want to open the link (which may be, as you noted, a link to a media file and therefore possibly quite large). But, once you’re allowing people to put arbitrarily large formatted commentary into a syndication feed, you are essentially delivering duplicates of the websites you are linking to, and so you have screwed up the basic value proposition of RSS (to be able to fetch the stuff you haven’t seen and not fetch the stuff you have).

                              With a line-based format that has the date of a post in each line, a reader can fetch each line and then stop if the date in the line is not newer than the newest date in its previous fetch cycle. That’s a whole lot better than a “feed modified date”.
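
                              A sketch of that reader loop, assuming the hypothetical URL-tab-ISO-date line format from upthread, newest entries first:

                              ```python
                              from datetime import datetime

                              def new_entries(feed_lines, newest_seen):
                                  # Entries are newest-first: stop at the first one that is
                                  # not newer than anything from the previous fetch cycle.
                                  fresh = []
                                  for line in feed_lines:
                                      url, stamp = line.rstrip("\r\n").split("\t")
                                      when = datetime.fromisoformat(stamp)
                                      if when <= newest_seen:
                                          break
                                      fresh.append((when, url))
                                  return fresh
                              ```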

                              1. 2

                                It sounds like Winer’s goals for RSS and your goals for RSS are different.

                                You might prefer OPML. 😉

                            2. 1

                              Also: I think you misunderstood how DNS & HTTP centralize. It’s not, in that case, so much that implementation is hard (though HTTP is more complicated to implement than a properly designed protocol that performs the same tasks would be), but that having a host name in a data address means that the data is inherently tied to the host – third party mirrors cannot help ease the load of lots of requests, and so it is the responsibility of the original host to build up more capacity than he would ever use, lest the data become unavailable. The host (or somewhere on the path to the host, since the usual solutions to this problem involve creating layers of load balancing) is a single point of failure. This is implicit in the address format.

                              You just got me in the feels. I spent 2010 giving presentations on this very topic. One was half jokingly entitled “SSL is racist” because it explained how the middlebox caching then common in the developing world was being killed.

                              I think we both love content centric networking technologies; but, for the most part, they didn’t really exist until recently!

                              If there’s any reason why Dave Winer could not have possibly imagined a format like < URL > < tab > < ISO date > < newline > for a reverse-chronological list of URLs in 1996, let me know :)

                              I guess no reason except we were busy litigating things like:

                              • How to encode URLs into URIs into US-ASCII (percent encoding?)
                              • Which ISO date time to use
                              • What even is a newline? (IIRC UserLand was Windows, so I guess it’s \r\n)

                              Also, UTF-8 hadn’t won yet. (See all the corner cases of encoding UTF-8 in URLs and vice versa.) So your format would have a heap of fun there too; it doesn’t even have the advantage that documents get in providing context for auto-detection.

                              1. 1

                                Which ISO date time to use

                                RSS specifies RFC 822, so the date is in the format “Mon, 29 Mar 2021 15:26:45 +0200”.
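
                                For what it’s worth, Python’s stdlib emits exactly that shape (strictly the four-digit-year RFC 2822 form):

                                ```python
                                from datetime import datetime, timedelta, timezone
                                from email.utils import format_datetime, parsedate_to_datetime

                                dt = datetime(2021, 3, 29, 15, 26, 45,
                                              tzinfo=timezone(timedelta(hours=2)))
                                print(format_datetime(dt))  # Mon, 29 Mar 2021 15:26:45 +0200
                                print(parsedate_to_datetime("Mon, 29 Mar 2021 15:26:45 +0200"))
                                ```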

                                What even is a newline? (IIRC UserLand was Windows, so I guess it’s \r\n)

                                I’m pretty sure it was originally developed for the classic Mac, so \r would have been more appropriate.

                                1. 1

                                  RFC-822 is two digit years. You might be thinking of RFC-2822, published in 2001.

                                  1. 1

                                    Why not both?

                                    All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).

                                    https://validator.w3.org/feed/docs/rss2.html

                                    RSS 2.0 is of course the rebranding of RSS 0.92 and is basically an end-run around RSS 1.0, which uses real ISO dates, being derived from RDF.

                                    1. 1

                                      I addressed this in another thread.

                                      But you’ve remade my point: there was no brain-dead, obviously correct date-and-time format in 1996.

                                      1. 1

                                        Maybe not in the US. I’ve been using ISO 8601 since I learned to write.

                                        1. 1

                                          🙄 Have you read ISO-8601:1988?

                                          1. 1

                                            Nope, can’t afford to.

                                            But Sweden has used (YY)YY-MM-DD as a date format for as long as I’ve been aware there’s been a date format… The Swedish standard SIS 10211 was replaced by ISO-8601, but it dates from 1972.

                          2. 3

                            On the other hand, no RSS client or RSS-generating application does any of this work until somebody complains, because webtech culture loves postel’s law — which, put into practical terms, really means “just wing it, try to support whatever trash other applications emit, and if somebody can’t handle the trash you emit then it’s their problem”. No point in actually specifying the date format for RSS — everybody should just guess, and if some client can’t handle every possible date format then fuck ’em (unless your valuable user happens to prefer that client).

                            Postel’s law isn’t a choice; it’s a force of nature. Any widely used system or protocol has to interoperate with misbehaving implementations like your RSS feed. RSS is poorly specified but the same issues are always present.

                            Quoting from IPv4, IPv6, and a sudden change in attitude

                            Postel’s Law is the principle the Internet is based on. Not because Jon Postel was such a great salesperson and talked everyone into it, but because that is the only winning evolutionary strategy when internets are competing. Nature doesn’t care what you think about Postel’s Law, because the only Internet that happens will be the one that follows Postel’s Law. Every other internet will, without exception, eventually be joined to The Internet by some goofball who does it wrong, but just well enough that it adds value, so that eventually nobody will be willing to break the connection. And then to maintain that connection will require further application of Postel’s Law.

                            RSS, TSV, CSV

                            I mean, RSS is a product of its time. I understand why RSS didn’t use JSON — because JSON didn’t exist yet. But RSS could have used… TSV. Or CSV. Or a line-based key-value format with section separators. Using XML for anything should have been immediately and obviously a bad idea to any professional developer, even in the early 90s, considering just how many problems one immediately encounters in trying to work with it.

                            TSV and CSV are riddled with interoperability issues; CSV has an RFC but noncompliant files and readers are a regular occurrence. The timing is also wrong; XML was standardized in 1998.

                            I spend a depressing amount of time coping with CSV import issues; here are some issues I would expect in a CSV or TSV based RSS.

                            • Are line endings permitted within a field?
                            • Are quotes, tabs, or commas escaped?
                            • How are nested quotes handled?
                            • Are quote marks mandatory?
                            • Are trailing spaces after a comma permitted?
                            • Are empty lines between records permitted?
                            • How are missing fields addressed?
                            • Which line endings are used – \r\n, \n, or \r?
                            • Encoding issues - receiving a localized character set instead of UTF-8.

                            New web browsers cannot be written, nor can web browsers be maintained except by the three largest tech companies in the world, because postel’s law (along with the IETF policy of “loose consensus and running code”) has doomed all web standards to being enormous baroque messes of corner cases that can only be navigated by the chinese-army technique of throwing waves of cheap contractors at it. Since no single person can completely understand any W3C standard, no person can be sure that they are generating W3C-compliant code or markup, so they test using an existing dominant browser (or maybe two or three of them). Any new browser, even in the event that it happens to adhere to the W3C standard, is unlikely to behave exactly the same way as these dominant browsers, and so actually-existing code will look “broken” (even if it’s actually being interpreted correctly according to the standard). This is a moral failing: it leads inevitably to centralization.

                            How is interoperability a moral failing? It isn’t realistic to expect that people will read the standards or fix their code. XHTML tried ignoring Postel’s law - that’s why we’re using HTML5. There are only two kinds of protocols: the ones people complain about and the ones nobody uses.

                            1. 4

                              I’m going to try to repeat my main point here, with more emphasis:

                              Belief in postel’s law takes the onus off of protocol designers to design simple, clean protocols that can be easily implemented and audited, and instead puts the onus on individual developers to deal with whatever nonsense happens to exist. This, in turn, means that common kinds of failures to properly implement unnecessarily complicated specs are rolled into the folk understanding of what it means to “properly implement” those specs (and sometimes into the specs themselves), until (quite quickly) interoperability becomes impossibly difficult.

                              You can stop this process by:

                              1. making protocols that are easy to implement correctly
                              2. making it easy to identify implementation errors

                              If you do this, then nobody has a good excuse for an implementation error, and you can feel free to reject invalid input. Postel’s law becomes unnecessary.

                              If you don’t do this, every popular protocol eventually becomes like HTML5: something that nobody other than Google and Microsoft can fully implement.

                              1. 2

                                Do you have any examples of protocols with multiple implementations and at least 1k users that meet your criteria? The IETF protocols are widely implemented and used; it’s a given that there are implementation and specification errors.

                                If you do this, then nobody has a good excuse for an implementation error, and you can feel free to reject invalid input. Postel’s law becomes unnecessary.

                                I have yet to work at a place where rejecting invalid input was an option; you can’t choose which systems you interoperate with. Has your experience been different?

                            2. 2

                              Could you explain, or point me to an explanation of, the term “webtech mentality”? I haven’t heard of it before.

                              1. 2

                                In response to some of the misunderstandings I’ve seen in this comments section, I’ve written a different essay making the same points in a way that may be more clear.

                                1. 0

                                  This article is one of the most insanely idiotic things I’ve ever read. At no point in this rambling, incoherent tirade does the the author even come close to a rational thought. Everybody in here is now dumber for having read it. I award you no points, and may god have mercy on your soul.

                                  1. 2

                                    Please don’t use the comments section to insult the author without saying anything about the article. Just hide the story.