1. 60
  1.  

    1. 45

      for signing events and requests to work, Matrix expects the JSON to be in canonical form, except the spec doesn’t actually strictly define what that canonical form is

      I’m astonished by how often this mistake is repeated. I’ve been yelling into the void about it for what feels like an eternity, but I’ll yell once more, here and now: JSON doesn’t define, specify, guarantee, or even in practice reliably offer any kind of stable, deterministic, or (ha!) bijective encoding. Which means any signature you make on a JSON payload is never gonna be sound. You can’t sign JSON.

      If you want to enforce some kind of canonicalization of JSON bytes, that’s fine!! and you can (maybe) sign those bytes. But that means that those bytes are no longer JSON! They’re a separate protocol, or type, or whatever, which is subject to the rules of your canonical spec. You can’t send them over HTTP with Content-Type: application/json, you can’t parse them with a JSON parser, etc. etc. with the assumption that the payload will be stable over time and space.

      1. 10

        Oh god, I thought we had learned our lesson from Secure Scuttlebutt. Come on people.

        1. 8

          For anyone else who didn’t know what this referred to, a bit of searching led me to this post, which I did a Find for “JSON” in.

          Edit: adding quotes around “JSON”.

      2. 9

        canonical json is actually pretty well defined in some matrix spec appendix if i recall?

        1. 9

          Matrix Specification - Appendices § 3.1. Canonical JSON. I haven’t reviewed to see just how “canonical” it is/whether it truly excludes all but one interpretation/production of a given object etc., but that’s been part of the spec since no later than v1.1 (November 2021), maybe earlier.

      3. 6

        Doesn’t https://www.rfc-editor.org/rfc/rfc8785 specify a good enough canonical form?

        1. 18

          It’s a perfectly lovely canonical form, but it’s not mandatory. JSON parsers will still happily accept any other non-canonical form, as long as it remains spec-compliant. Which means the JSON payloads {"a":1} and { "a": 1 } represent exactly the same value, and that parsers must treat them as equivalent.
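          To make that concrete, a quick sketch in Node:

          ```javascript
          const a = '{"a":1}';
          const b = '{ "a": 1 }';

          // Different bytes on the wire...
          console.log(a === b); // false

          // ...but any compliant parser must extract the same value:
          console.log(JSON.parse(a).a === JSON.parse(b).a); // true
          ```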

          If you want well-defined and deterministic encoding, which produces payloads that can be e.g. signed, then you need guarantees at the spec level, like what e.g. CBOR provides. There are others. (Protobuf is explicitly not one!!)

          1. 8

            Of course other forms are equivalent, but only one is canonical. That’s what the word means.

            1. 7

              Sure, if you want to parse received JSON payloads and then re-encode them in your canonical form, you can trust that output to be stable. Just as long as you don’t sign the payload you received directly!

          2. 4

            That reminds me, I wonder what the status is on low bandwidth Matrix, which uses CBOR.

            When I read about it, I was wondering why a high bandwidth Matrix would be the default, if they do the same thing. Now I wonder for more reasons.

          3. 1

            Can you say more about Protobuf not guaranteeing a deterministic encoding at the spec level? Is it that the encoding is deterministic for a given library, but this is left to the implementation rather than the spec? Does the spec say something about purposely leaving this open?

            1. 2

              Protobuf encoding is explicitly defined to be nondeterministic and unstable.

              https://protobuf.dev/programming-guides/encoding/#order

              When a message is serialized, there is no guaranteed order for how its known or unknown fields will be written. Serialization order is an implementation detail, and the details of any particular implementation may change in the future.

              Do not assume the byte output of a serialized message is stable.

              By default, repeated invocations of serialization methods on the same protocol buffer message instance may not produce the same byte output. That is, the default serialization is not deterministic.

              https://protobuf.dev/programming-guides/dos-donts/#serialization-stability

              Never Rely on Serialization Stability

              Needless to say, you should never sign a protobuf payload :)

              1. 1

                Thanks I think I’m gonna use Rivest’s S-expressions.

              2. 1

                Serialization order is an implementation detail, and the details of any particular implementation may change in the future.

                The implementation is explicitly allowed to have a deterministic order.

                At the spec level it’s undefined.

                At the implementation level, it may be defined. That’s typical for all such technologies across the industry.

                Never Rely on Serialization Stability Across Builds

                An important detail was omitted.

                1. 1

                  The implementation is explicitly allowed to have a deterministic order.

                  At the spec level it’s undefined.

                  Yes, which is my point — unless you’re operating in a hermetically sealed environment, senders can’t assume anything about the implementation of receivers, and vice versa. You can maybe rely on the same implementation in an e.g. unit test, but not in a running process. The only guarantees that can be assumed in general are those established by the spec.

                  Across Builds

                  Exact same thing here — modulo hermetically sealed environments, senders can’t assume anything about the build used by receivers, and vice versa.

        2. 1

          Tangentially, reading the spec,

          Sorting of Object Properties […] formatted as arrays of UTF-16

          JSON is UTF-8, not UTF-16.

          That spec should specify Unicode code point order (ASCII, UTF-8, and UTF-32 all share this sort order), not UTF-16, since UTF-16 sorts some code points out of order. Preserving code point order was one of the reasons UTF-8 was created.

          Also, we don’t sort JSON object keys for cryptography. Order is inherited from the UTF-8 serialization for verification. Afterward, the object may be unmarshalled however seen fit. This allows arbitrary order.
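          JavaScript compares strings by UTF-16 code unit, so the misordering is easy to demonstrate in Node:

          ```javascript
          const a = "\uFF61";    // U+FF61, a BMP code point above the surrogate range
          const b = "\u{10000}"; // U+10000, encoded in UTF-16 as the surrogate pair D800 DC00

          // UTF-16 code-unit order sorts b first, because 0xD800 < 0xFF61...
          console.log(a < b); // false

          // ...but code point order (what UTF-8/UTF-32 byte order gives you) sorts a first:
          console.log(a.codePointAt(0) < b.codePointAt(0)); // true
          ```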

      4. 6

        One does not sign JSON, one signs a bytearray. That multiple JSON serializations can have the same content does not matter. One could even argue that it’s a feature: the hash of the bytearray is less predictable which makes it more secure.

        I do not get the hangup on canonicalization. Just keep the original bytearray with the signature: done.

        Lower in this thread a base64 encoding is proposed. Nonsense, just use the bytearray of the message. What the internal format is, is irrelevant. It might be JSON-LD, RDF/XML, Turtle, it does not matter for the validity of the signature. The signature applies to the bytearray: this specific serialization.

        Trying to deal with canonicalization is a non-productive intellectual hobby that makes specifications far too long, complex and error prone. It hinders adoption of digital signatures.

        1. 5

          Nonsense, just use the bytearray of the message.

          A JSON payload (byte array) is explicitly not guaranteed to be consistent between sender and receiver.

          What the internal format is, is irrelevant. It might be JSON-LD, RDF/XML, Turtle, it does not matter for the validity of the signature. The signature applies to the bytearray: this specific serialization.

          This is very difficult to enforce in practice, for JSON payloads particularly.

          1. 5

            Of course a bytearray is consistent. There’s a bytearray. It has a hash. The bytearray can be digitally signed. Perhaps the bytearray can be parsed as a JSON document. That makes it a digitally signed JSON document. It’s very simple.

            Data sent from sender to receiver is sent as a bytearray. The signature will remain valid for the bytearray. Just don’t try to parse and serialize it and hope to get back the same bytearray. That’s a pointless exercise. Why would you do that? If you know it will not work, don’t do it. Keep the bytearray.

            What is hard to enforce? When I send someone a bytearray with a digital signature, they can check the signature. If they want to play some convoluted exercise of parsing, normalizing, serializing and hoping for the same bytearray, you can do so, but don’t write such silliness in specifications. It just makes them fragile.

            Sending bytearrays is not hard to do, it’s all that computers do. Even in browsers, there is access to the bytearray.

            Canonicalization is premature optimization.

            1. 5

              Of course a bytearray is consistent. There’s a bytearray. It has a hash. The bytearray can be digitally signed. Perhaps the bytearray can be parsed as a JSON document. That makes it a digitally signed JSON document. It’s very simple.

              If you send that byte array in an HTTP body with e.g. Content-Type: application/octet-stream, yes — that marks the bytes as opaque, and prevents middleboxes from parsing and manipulating them. But with Content-Type: application/json, it’s a different story — that marks the bytes as representing a JSON object, which means they’re free to be parsed and re-encoded by any middlebox that satisfies the rules laid out by JSON. This is not uncommon: CDNs will sometimes compact JSON as an optimization. And it’s this case I’m mostly speaking about.

              I’m not trying to be difficult, or speculating about theoreticals, or looking for any kind of argument. I’m speaking from experience: this is real stuff that actually happens and breaks critical assumptions made by a lot of software.

              If you sign a JSON encoding of something, and include the bytes you signed directly alongside the signature as opaque bytes — i.e. explicitly not as a sibling or child object in the JSON message that includes the signature — then no problem at all.

              tl;dr: sending signatures with JSON gotta be like {"sig":"XXX", "msg":"XXX"}
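              A sketch of that shape in Node, with an HMAC standing in for a real signature scheme (the key and field names are made up):

              ```javascript
              const crypto = require("crypto");
              const key = "secret"; // hypothetical shared key

              // Sender: serialize once, sign those exact bytes, ship them opaquely.
              const msgBytes = Buffer.from(JSON.stringify({ user: "alice", amount: 5 }), "utf8");
              const sig = crypto.createHmac("sha256", key).update(msgBytes).digest("hex");
              const wire = JSON.stringify({ sig, msg: msgBytes.toString("base64") });

              // Receiver: verify over the exact bytes received, then parse.
              const { sig: gotSig, msg } = JSON.parse(wire);
              const gotBytes = Buffer.from(msg, "base64");
              const want = crypto.createHmac("sha256", key).update(gotBytes).digest("hex");
              console.log(want === gotSig); // true
              console.log(JSON.parse(gotBytes.toString("utf8")).user); // alice
              ```

              The base64 string survives any middlebox that respects JSON string semantics, so the signed bytes arrive untouched.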

              1. 5

                Such CDNs would break Subresource Integrity and etag caching. Compression is a much more powerful optimization than removing a bit of whitespace, so it’s broken and inefficient. Changing the content in any way based on a mimetype is dangerous. If a publisher uses a CDN with such features, they should know to disable them when the integrity of the content matters.

                I’m sending all my mails with a digital signature (RFC 4880 and 3156). That signature is not applied to a canonicalized form of the mail apart from having standardized line endings. It’s applied to the bytes. Mail servers should not touch the content other than adding headers.

                1. 3

                  Changing the content in any way based on a mimetype is dangerous.

                  Dangerous or not, if something says it’s JSON, it’s subject to the rules defined by JSON. A proxy that transforms the payload according to those rules might have to intermediate on lower-level concerns, like ETag (as you mention). But doing so would be perfectly valid.

                  And it’s not limited to CDNs. If I write a program that sends or receives JSON over HTTP, any third-party middleware I wire into my stack can do the same kind of thing, often without my knowledge.

                  I’m sending all my mails with a digital signature (RFC 4880 and 3156). That signature is not applied to a canonicalized form of the mail apart from having standardized line endings. It’s applied to the bytes. Mail servers should not touch the content other than adding headers.

                  Yes, sure. But AFAIK there is no concept of a “mail object” that’s analogous to a JSON object, is there?

                  1. 2

                    Dangerous or not, if something says it’s JSON, it’s subject to the rules defined by JSON.

                    A digital signature does not apply to JSON. It applies to a bytearray. If an intermediary is in a position to modify the data it transmits and does not pass along a bytearray unchanged, it’s broken for the purpose of passing on data reliably and should not be used.

                    Canonicalization cannot work sustainably because as soon as it does some new ruleset is thought up by people that enjoy designing puzzles more than creating useful software. Canonicalization has a use when you want to compare documents, but is a liability in the context of digital signatures.

                    A digital signature is meant to prove that a bytearray was endorsed by an entity with a private key.

                    If any intermediary mangles the bytearray, the signature becomes useless and the intermediary should be avoided. An algorithm that tries to undo the damage done by broken intermediaries is not the solution. Either the signature matches the bytearray or it does not.

                    1. 2

                      A digital signature does not apply to JSON. It applies to a bytearray.

                      100% agreement.

                      If an intermediary is in a position to modify the data it transmits and does not pass along a bytearray unchanged, it’s broken for the purpose of passing on data reliably and should not be used.

                      Again 100% agreement, which supports my point that you can’t sign JSON payloads, because JSON explicitly does not guarantee that any encoded form will be preserved reliably over any transport!

                      1. 2

                        JSON explicitly does not guarantee that any encoded form will be preserved reliably over any transport!

                        Citation needed. I can read nothing about this in RFC 8259. Perhaps your observation is a fatalist attitude that springs from working with broken software. Once you allow this for JSON, what’s next? Re-encoding JPEGs, adding tracking watermarks to documents? No transport should modify the payload that it is transporting. If it does, it’s broken.

                        There is no guarantee about the behavior of transports in the JSON RFC 8259. There is also no text that allows the serialization to be changed by certain transports.

                        1. 1

                          Once you allow this for JSON, what’s next? Re-encoding JPEGs, adding tracking watermarks to documents?

                          Yes, sure. If the payloads are tagged as specific things with defined specs, intermediaries are free to modify them in any way that doesn’t violate the spec. This isn’t my speculation, or fatalism, it’s direct real-world experience.

                          No transport should modify the payload that it is transporting. If it does, it’s broken.

                          If you want to ensure that your payload bytes aren’t modified, then you need to make sure they’re opaque. If you want to send such bytes in a JSON payload, you need to mark the payload as something other than JSON, or encode those bytes in a JSON string.

        2. 4

          You might be missing the core info about why many signed JSON APIs are trash: they include the signature in the same JSON document as the thing they sign:

          {
              "username": "Colin",
              "message": "Hi!",
              "signature": "some base 64 string"
          }
          

          The signature is calculated for a JSON serialization of a dict with, in this example, the keys username and message, then the signature key is added to the dict. This modified dict is serialised again and sent over the network.

          This means that the client doesn’t have the original byte array. It needs to parse the JSON it was given, remove the signature key, and then serialize again in some way that generates exactly the same bytes; only then can it verify the signature over those bytes and validate the message.
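          That fragility is easy to reproduce; here’s a sketch in Node where a hypothetical sender signed a pretty-printed serialization:

          ```javascript
          // Sender signs the pretty-printed bytes, then embeds the signature and minifies:
          const signedBytes = JSON.stringify({ username: "Colin", message: "Hi!" }, null, 2);
          const sent = JSON.stringify({ ...JSON.parse(signedBytes), signature: "..." });

          // Receiver strips the signature key and re-serializes with its own defaults:
          const obj = JSON.parse(sent);
          delete obj.signature;
          const reconstructed = JSON.stringify(obj);

          console.log(reconstructed === signedBytes); // false: same object, different bytes
          ```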

          This is clearly completely bonkers, but several protocols do variations on this, including Matrix, Secure Scuttlebutt, and whatever this is https://cyberphone.github.io/doc/security/jsf.html#Sample_Object

          The PayPal APIs do the thing you’re thinking of: they generate some bytes (which you can parse to JSON) and provide the signature as a separate value (as an HTTP header, I think).

          @peterbourgon’s suggestion also avoids the core issue and additionally protects against middle boxes messing with the bytes (which I agree they shouldn’t do, but they do so 🤷) and makes the easiest way of validating the signature also the correct way.

          (If the application developer’s web framework automatically parses JSON then you just know that some of them are going to remove the signature key, reserialise and hash that (I’ve seen several people on GitHub try to do this with the JSON PayPal produces))

          The PayPal way is fine, but you then get into the question of how to transmit two values instead of one. You can use HTTP headers or multipart encoding, but now your protocol is tied to HTTP and users need to understand those things as well as JSON. Peter’s suggestion requires users only to understand JSON and some encoding like base64.

          A final practical point: webservers sometimes want to consume the request body and throw it away if they can parse it into another format (elixir phoenix does this, for efficiency, they say), so your users may need to provide a custom middleware for your protocol and get it to run before the default JSON middleware, which is likely to be more difficult for them than turning a base64 string back into JSON.

      5. 5

        likewise, it really frustrates me. I’m not surprised, just annoyed, because it’s an aspect of things that always gets fixed as an afterthought in cryptography-related standards…

        nobody likes ASN.1, especially not the experts in it, but it exists for a reason. text-based serialization formats don’t canonicalize easily and specifying a canonicalization is extra work. even some binary formats, such as protocol buffers, don’t necessarily use a canonical form (varints are the culprit there).

        1. 5

          ASN.1 does not help with canonicalization either. It has loads of different wire encodings, e.g. BER, PER. For cryptographic purposes you must use DER, which is BER with extra rules to say which of the many alternative forms in BER must be used, e.g. forbidding encodings of integers with leading zeroes.

          1. 2

            yes, that’s fair.

      6. 4

        Huge Cosmos SDK vibes.

        Signing messages was an entire procedure involving ordering JSON fields alphanumerically, minifying and then signing the hash.

        So many hours have been spent because a client, typically not written in Go, would order a field differently, yielding a different hash.

        Good times.

        1. 3

          Brother, I’ve got some stories. I’ve actually filed a CVE to the Cosmos SDK for a signing-related issue. (Spoiler: closed without action.)

          1. 3

            Yup, sounds like an SDK episode to me.

            I think I remember seeing your name on a GitHub issue conversation, with the same couple of “adversaries” justifying their actions lol.

            I distanced myself from that ecosystem both professionally and hobby-wise because I did not like how the tech stack was implemented, and how the governance behaved.

            Although most of the bad decisions have been inherited from a rather… peculiar previous leadership.

      7. 2

        A solution that I like for this is base64 encoding the json, and signing the base64 blob.

        Which is a roundabout way to agree: don’t sign json.

        1. 9

          …but this has the same problem? If you reorder the keys in an object in the JSON, you’re going to get a different base64 string.

          1. 7

            No. The point is that you get a different base64 string. It makes it obvious that the message was tampered with.

            The problem is that when canonicalizing json, there are multiple json byte sequences that can be validated with a given signature.

            A bug in canonicalizing may lead to accepting a message that should not have been accepted. For example, you may have duplicate fields. One json parser may take the first duplicate, one may take the last, and if you canonicalized after parsing and passed the message along, now you can inject malicious values:

            {
               "signed-field": "good-value",
               "signed-field": "malicious-payload"
            }
            

            You may say “but if you follow the RFC, don’t use the stock json libraries that try to make things convenient, and are really careful, you’re protected”. You’d be right, but it’s a tall order.

            With base64, there’s only one message that will validate with a given signature (second-preimage attacks aside). It’s much harder to get wrong.
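            A sketch of that divergence in Node, whose JSON.parse keeps the last duplicate (other parsers keep the first, or reject the document outright):

            ```javascript
            const evil = '{"signed-field":"good-value","signed-field":"malicious-payload"}';
            console.log(JSON.parse(evil)["signed-field"]); // malicious-payload
            ```

            If a verifier canonicalized over the first value while a downstream consumer read the last, the malicious value rides in under a valid signature.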

          2. 4

            Well, not exactly. {"a":1} and { "a": 1 } are different byte sequences, and equivalent JSON payloads. But the base64 encodings of those payloads are different byte sequences, and different base64 payloads – base64 is bijective. (Or, at least, some versions of base64.)

            1. 5

              Another way to phrase this is that it makes it hard to shoot yourself in the foot. If you get straight JSON over the wire, what do you do? You need to parse it in order to canonicalize it, but your JSON parser probably doesn’t parse it the way you need it to in order to canonicalize it for verification, so now you have to do a bunch of weird stuff to try and parse it yourself, and maybe serialize a canonicalized version again just for verification, etc.

              The advantage of using base64 or something like it (e.g. straight hex encoding as mentioned in your sibling comment) is that it makes it obvious that you should stop pretending that you can reasonably sign a format that can’t be treated as “just a stream of bytes” (because you can’t - a signature over a stream of bytes is the only cryptographic primitive we have, so what you’re actually doing by “canonicalizing JSON” is turning the JSON into a stream of bytes, poorly) and just sign something that is directly and solely a stream of bytes.

              Edit: the problem with this is that you’ve now doubled your storage cost. The advantage of signing JSON is that you can deserialize, store that in a database alongside the signature, and reconstruct functionally the same thing if you need to retransmit the original message (for example to sync a room up to a newly-joined Matrix server). If you’re signing base64/hex-encoded blobs, you now need to store the original message that was signed, rather than being able to reconstruct it on-the-fly. But a stream of bits isn’t conducive to e.g. database searches, so you still have to store the deserialized version too. Hence: 2x storage.

              1. 3

                Another way to phrase this is that it makes it hard to shoot yourself in the foot. If you get straight JSON over the wire, what do you do? You need to parse it in order to canonicalize it,

                Even doing that much I would consider to be a success!

                One, it’s rare that a canonical form is even defined, and rarer still that it’s defined in a way that’s actually unambiguous. I’m dubious that Matrix’s canonical JSON spec (linked elsewhere) qualifies.

                Two, even if you have those rules, it’s rare that I’ve ever seen code that follows them. Usually a project will assume the straight JSON from the wire is canonical, and sign/verify those wire bytes directly. Or, it might parse the wire bytes into a value, but then it will sign/verify the bytes produced by the language default JSON encoder, assuming those bytes will be canonical.

            2. 4

              I don’t understand why a distinction between reordering keys and changing whitespace needs to be made. Are they treated differently in the JSON RFC?

              equivalent JSON payloads

              Equivalent according to whom? The JSON RFC doesn’t define equality.

              Are you simply saying that defining a canonical key ordering wouldn’t be sufficient since you’d need to define canonical whitespace too? If so, I don’t understand why it contradicts bdesham’s comment, since they just gave a single example of what base64 doesn’t canonicalize.

              1. 4

                I don’t understand why a distinction between reordering keys and changing whitespace needs to be made. Are they treated differently in the JSON RFC?

                I didn’t mean to distinguish key order and whitespace. Both are equally and explicitly defined to be arbitrary by the JSON spec.

                Equivalent according to whom? The JSON RFC doesn’t define equality.

                Let me rephrase: {"a":1,"b":2} and {"b":2,"a":1} and { "a": 1, "b": 2 } are all different byte sequences, but represent exactly the same JSON object. The RFC specifies JSON object equality to at least this degree — we’ll ignore stuff like IEEE float precision 😉 If you defined a canonical encoding, your parser would reject non-canonical input, which isn’t permitted by the JSON spec, and means you’re no longer speaking JSON.
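                Concretely, a quick check in Node:

                ```javascript
                const texts = ['{"a":1,"b":2}', '{"b":2,"a":1}', '{ "a": 1, "b": 2 }'];

                // Three different byte sequences...
                console.log(new Set(texts).size); // 3

                // ...that every compliant parser maps to the same name/value mappings:
                console.log(texts.every(t => {
                  const v = JSON.parse(t);
                  return v.a === 1 && v.b === 2;
                })); // true
                ```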

                1. 4

                  The RFC specifies JSON object equality to at least this degree

                  I don’t think so. At least RFC 8259 doesn’t identify any (!) of those terms. (It can’t for at least two reasons: it doesn’t know how to compare strings, and it explicitly says ordering of kv pairs may be exposed as semantically meaningful to consumers.)

                  JSON is semantically hopeless.

                  1. 2

                    RFC 8259 … explicitly says ordering of kv pairs may be exposed as semantically meaningful to consumers

                    Where? I searched for “order” and didn’t find anything that would imply this conclusion, AFAICT.

                    Here’s what I did find:

                    An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.

                    and

                    JSON parsing libraries … differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences

                    which to me seems to pretty clearly say that order can’t matter to implementations. Maybe I’m misreading.

                    JSON is semantically hopeless.

                    JSON is an encoding format that’s human-readable, basically ubiquitous, and more or less able to express what most people need to express. These benefits hugely outweigh the semantic hopelessness you point out, I think.

                    1. 4

                      I think you did misread it, I’m afraid.

                      Those are the quotes I mean, particularly the latter one:

                      JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.

                      Left unsaid is that implementations that do depend on or expose member ordering may not be interoperable in that sense. And we know they are still implementations of JSON because of the first sentence there. (“Left unsaid” in that one can infer that anything goes from the first sentence taken with the contrapositive of the second.) Slightly weaselly language like this exists throughout the RFC, including in areas related to string and number comparison. If I understand correctly, while many of those involved wanted to pin down JSON’s semantics somewhat, they could not reach agreement.

                      JSON is an encoding format that’s human-readable, basically ubiquitous, and more or less able to express what most people need to express. These benefits hugely outweigh the semantic hopelessness you point out, I think.

                      You might be right. That “more or less” gives me the heebie-jeebies though, because without semantics, the well-known security and interoperability problems will just keep happening. People never really just use JSON, there’s always some often-unspoken understanding about a semantics for JSON involved. Otherwise they couldn’t communicate at all. (The JSON texts would have to remain uninterpreted blobs.) And where parties differ in the fine detail of that understanding, they will reliably miscommunicate.

                      1. 1

                        Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.

                        I read this as supporting my interpretation, rather than refuting it. I read it as saying that implementations must be interoperable (i.e. produce equivalent outcomes) regardless of ordering.

                        Slightly weaselly language like this exists throughout the RFC, including in areas related to string and number comparison.

                        Totally agreed! And in these cases, implementations have no choice but to treat the full range of possibilities as possibilities, they can’t make narrower assumptions while still remaining compliant with the spec as written.

                        1. 2

                          Implementations whose behavior does not depend on member ordering […] will not be affected by these differences.

                          It’s a tautology. If you don’t depend on the ordering, you won’t be affected by the ordering. It doesn’t anywhere say that an implementation must not depend on the ordering.

                          The wording is very similar to the wording in sections regarding string comparison, which if I understand you correctly, you believe is an underdefined area. From section 8.3:

                          Implementations that [pick a certain strategy] are interoperable in the sense that implementations will agree in all cases on equality or inequality of two strings

                          Again unsaid: those that don’t may not so agree.

                          1. 1

                            It’s a tautology. If you don’t depend on the ordering, you won’t be affected by the ordering. It doesn’t anywhere say that an implementation must not depend on the ordering.

                            It says that

                            An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings.

                            Meaning, as long as object keys are unique, two JSON payloads with the same set of name-value mappings must be “interoperable” (i.e. semantically equivalent JSON objects) regardless of key order, whitespace, etc.

                            1. 2

                              No, it says they’ll agree on the name-value mappings. It doesn’t say anything there about whether they can observe or will agree on the ordering - that’s the purpose of the following paragraph, talking about ordering.

                              1. 1

                                Agreeing on name-value mappings is necessarily order-invariant. If this weren’t the case, then the object represented by {"a":1,"b":2} wouldn’t be interoperable with (i.e. equivalent to) the object represented by {"b":2,"a":1} — which is explicitly not the case.

                                1. 2

                                  Where does it say those objects are equivalent?

                                  I put it to you that the RFC does not equate those objects, but says that JSON implementations that choose certain additional constraints - order-independence, a method of comparing strings, a method of comparing numbers - not required by the specification will equate those objects.

                                  The RFC is very carefully written to avoid giving an equivalence relation over objects.

                                  1. 1

                                    I understand “interoperable” to mean “[semantically] equivalent”.

                                    If this weren’t the case, then JSON would be practically useless, AFAICT.

                                    It’s not so complicated. The JSON payloads {"a":1,"b":2} and {"b":2,"a":1} must be parsed by every valid implementation into JSON objects which are equivalent. I hope (!) this isn’t controversial.

                                    1. 2

                                      The JSON payloads {"a":1,"b":2} and {"b":2,"a":1} must be parsed by every valid implementation into JSON objects which are equivalent

                                      Does JavaScript include a valid implementation of JSON? How would we test your assertion above in JavaScript?

                                      My proposal for testing this assertion would be this:

                                      const x = "{\"a\":1,\"b\":2}";
                                      const y = "{\"b\":2,\"a\":1}";
                                      if (JSON.parse(x) == JSON.parse(y)) {
                                        console.log("JSON implementation valid");
                                      } else {
                                        console.log("JSON implementation invalid");
                                      }
                                      

                                      Would you agree that this constitutes a valid test of the assertion?

                                      1. 1

                                        I’m no Javascript expert, so there may be details or corner cases at play in this specific bit of code. But, to generalize to pseudocode

                                        const x = `{"a":1,"b":2}`
                                        const y = `{"b":2,"a":1}`
                                        if parse(x) == parse(y) {
                                            log("valid")
                                        } else {
                                            log("invalid")
                                        }
                                        

                                        then yes I’d say this is exactly what I mean.

                                        edit: Yeah, of course JS defines == and === and etc. equality in very narrow terms, so those specific operators would say “false” and therefore wouldn’t apply. I’m referring to semantic equality, which I guess is particularly tricky in JS.
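                                        In Node, one way to make “semantic equality” concrete is util.isDeepStrictEqual, which compares parsed structures rather than references (a sketch; other definitions of deep equality exist):

                                        ```javascript
                                        const util = require("util");

                                        const x = '{"a":1,"b":2}';
                                        const y = '{"b":2,"a":1}';

                                        // Reference equality fails: JSON.parse returns two distinct objects.
                                        const refEqual = JSON.parse(x) === JSON.parse(y); // false

                                        // Structural ("semantic") equality ignores object key order,
                                        // matching the spec's notion of agreeing on name-value mappings.
                                        const semEqual = util.isDeepStrictEqual(JSON.parse(x), JSON.parse(y)); // true
                                        ```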

                                    2. 2

                                      I understand “interoperable” to mean “[semantically] equivalent”. If this weren’t the case, then JSON would be practically useless, AFAICT

                                      Exactly! Me too. I’m saying that every example of interoperability the spec talks about is couched in terms of “if your implementation chooses to do this, …”, i.e. adherence to the letter of the spec alone isn’t enough to get that interoperability. And the practical uselessness - yes, that’s what I believe. It’s fine when parties explicitly contract into a semantics overlaying the syntax of the RFC but all bets are off in cases of middleboxes, databases, query languages etc as far as the standard is concerned.

                                      The JSON payloads {"a":1,"b":2} and {"b":2,"a":1} must be parsed by every valid implementation into JSON objects which are equivalent. I hope (!) this isn’t controversial.

                                      This is of course a very sensible position, but it goes beyond the requirements of the RFC.

                                      1. 1

                                        This is of course a very sensible position, but it goes beyond the requirements of the RFC.

                                        I read the RFC as very unambiguously requiring the thing that I said, so if we don’t agree on that point, I guess we’ll agree to disagree.

                                2. 2

                                  A nitpick - if we wrote an encoding of a map as [["a",1],["b",2]] and another with the elements swapped I hope we should agree that the two lists contain the same set of name value mappings. Agreeing on the mappings when keys are disjoint (as required by the spec) is a different relation than equivalence of terms (carefully not defined by the spec), is what I’m trying to say.

                                  1. 2

                                    if we wrote an encoding of a map as [["a",1],["b",2]] and another with the elements swapped I hope we should agree that the two lists contain the same set of name value mappings.

                                    No, why would they? A name/value mapping clearly describes key: value pairs in an object, e.g. {"name":"value"}, nothing else.

                                    Maps (objects) are unordered by definition; arrays (lists, etc.) are ordered by definition. [["a",1],["b",2]] and [["b",2],["a",1]] are distinct; {"a":1,"b":2} and {"b":2,"a":1} are equivalent.

                                    1. 2

                                      They should be equivalent, on that we agree; but the standard on its own does not establish their equivalence. It explicitly allows for them to be distinguished.

                                      1. 1

                                        The RFC says that implementations must parse {"a":1,"b":2} and {"b":2,"a":1} to values which are interoperable. Of course implementations can keep the raw bytes and use them to differentiate the one from the other on that basis, but that’s unrelated to interoperability as expressed by the RFC. You know this isn’t really an interesting point to get into the weeds on, so I’ll bow out.

                                        edit: that’s from

                                        An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings.

                                        1. 2

                                          I wish you’d point me to where in the RFC it says it “must” parse them identically, but fair enough.

        2. 3

          Yeah, something like this is necessary, but unfortunately there are multiple base64 encoding schemes 🥲 I like straight up hex encoding for this reason. No ambiguity, and not really that much bigger than base64, especially given that this stuff is almost always going through a gzipped HTTP pipe, anyway.

          1. 2

            I’ve done a lot of work in the area of base conversion (for example).

            For projects implementing a base 64, we suggest b64ut which is shorthand for RFC 4648 base 64 URI canonical with padding truncated.

            Base 64 is ~33% smaller than Hex. That savings was the chief motivating factor for Coze to migrate away from the less efficient Hex to base64. To address the issues with base 64, the stricter b64ut was defined.

            Here’s a small Go library that uses b64ut.

            base64 encoding schemes

            Here are some notes comparing Hex and base 64 and the rationale justifying b64ut, and a GitHub issue concerning non-canonical base 64.

            A little more on b64ut

            b64ut (RFC 4648 base 64 URI canonical with padding truncated) is:

            1. RFC 4648 uses bucket conversion and not iterative divide by radix conversion.
            2. The RFC specifies two alphabets, URI unsafe and URI safe, respectively: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ and ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/. b64ut uses the safe alphabet.

            2.1. On a tangent, the RFC’s alphabets are “out of order”. A more natural order, from a number perspective but also an ASCII perspective, is to start with 0, so e.g. 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz would have been a more natural alphabet. Regardless, one of the two RFC alphabets is employed by b64ut. I use the more natural alphabet for all my bases when not using RFC base 64.
            3. b64ut does not use padding characters, but since the encoding method adds padding, they are subsequently “truncated”.
            4. b64ut uses canonical encoding. There is only a single valid canonical encoding and decoding, and they align. For example, non-canonical systems may interpret hOk and hOl as the same value. Canonical decoding errors on the non-canonical encoding.
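            The hOk/hOl example can be sketched in a few lines of Node (using the standard alphabet for simplicity; b64ut would use the URL-safe one):

            ```javascript
            // Node's Buffer decoder is permissive: "hOk" and "hOl" decode to
            // the same two bytes, because the trailing bits of the final
            // character are silently discarded. A canonical decoder can
            // reject the non-canonical form by round-tripping.
            function canonicalB64Decode(s) {
              const buf = Buffer.from(s, "base64");
              // Re-encode and compare, with padding truncated on both sides.
              const reencoded = buf.toString("base64").replace(/=+$/, "");
              if (reencoded !== s.replace(/=+$/, "")) {
                throw new Error("non-canonical base 64: " + s);
              }
              return buf;
            }
            ```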

            There are multiple RFC 4648 encoding schemes, and RFC 4648 only uses a single conversion method that we’ve termed a “bucket conversion” method. There is also the natural base conversion, which is produced by the “iterative divide by radix” method. Thankfully, natural and bucket conversion align when “buckets” (another technical term) are full and alphabets are in order. Otherwise, they do not align and encodings are mismatched.

            I made a tool to play with natural base conversions; the RFC is available under the “extras” tab.
            https://convert.zamicol.com

            Here’s an example converting a binary string to a non-RFC 4648 base 64: https://convert.zamicol.com/#?inAlph=01&in=10111010100010111010&outAlph=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!%2523

            1. 2

              To my eyes, the two alphabets in point 2 in your comment look identical. What am I missing?

              1. 2

                You’re right!

                1: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_

                2: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

                1 being URI safe and 2 URI unsafe.

            2. 1

              Base 64 is ~33% smaller than Hex.

              And what’s the difference when those strings are gzipped (as effectively every such string will be)?

              1. 2

                Gzip isn’t always available, and when it is, it requires extra processing. Yes, I’d try to use gzip when available.
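                As a rough check of this, here is a sketch using Node’s zlib on random (incompressible) bytes. After gzip, the hex/base 64 size gap mostly disappears:

                ```javascript
                const crypto = require("crypto");
                const zlib = require("zlib");

                // 10 KB of random payload bytes (no redundancy to exploit).
                const raw = crypto.randomBytes(10000);
                const hex = raw.toString("hex");    // 2.00x the raw size
                const b64 = raw.toString("base64"); // ~1.33x the raw size

                // Gzip re-packs both alphabets back toward the ~10 KB entropy
                // floor (4-bit symbols for hex, 6-bit symbols for base 64), so
                // the encoded-size gap largely vanishes on a compressed pipe.
                const gzHex = zlib.gzipSync(Buffer.from(hex)).length;
                const gzB64 = zlib.gzipSync(Buffer.from(b64)).length;
                ```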

        3. 1

          JSON is marshaled into UTF-8 which is easily signed or verified.

          1. 1

            Again, UTF-8 doesn’t guarantee what you’re suggesting, here. UTF-8 guarantees properties of individual runes (characters), not anything about the specific order of runes in a string.

            1. 1

              UTF-8 by definition is a series of ordered bytes.

              1. 1

                Yes, for individual characters (runes). And a UTF-8 string is a sequence of zero or more valid UTF-8 characters (runes). But the order of those runes in a string is not relevant to the UTF-8 validity of that string.

                1. 1

                  Validity is tangential; the point is order, and UTF-8 is a series of ordered bytes.

                  I believe the following abstraction layer diagram fairly characterizes your view:

                  bits (ordered) -> 
                  byte (ordered) -> 
                  string/bytes (ordered) -> 
                  ASCII/UTF-8 (ordered) -> 
                  JSON (ordered except for object keys) 
                  

                  The jump from UTF-8 to JSON is where some order information may be considered to be lost in the narrow scope of object keys, while acknowledging all the rest of JSON is still ordered, including the explicitly ordered arrays.

                  Order information is present and is passed along this abstraction chain. Order information can only be considered absent after the UTF-8 abstraction layer. At the UTF-8 layer, all relevant order information is fully present.

                  1. 1

                    UTF-8 is a series of ordered bytes

                    This isn’t true in the sense that you mean. UTF-8 is an encoding format that guarantees a valid series of ordered bytes for individual characters (i.e. runes) — it doesn’t guarantee anything about the order of valid runes in a sequence of valid runes (i.e. a string).

                    At the UTF-8 layer, all relevant order information is fully present.

                    Within each individual character (rune), yes. Across multiple characters (runes) that form the string, no. That a string is UTF-8 provides guarantees about individual elements of that string only, it doesn’t provide any guarantee about the string as a whole, beyond that each element of the string is a valid UTF-8 character (rune).

                    Sending JSON payload bytes {"a":1} does not guarantee the receiver will receive bytes {"a":1} exactly; they can just as well receive { "a": 1 }, and the receiver must treat those payloads the same.

                    edit: This sub-thread is a great example of what I meant in my OP, for the record 😞
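                    The whitespace point is easy to demonstrate (a sketch in Node):

                    ```javascript
                    // Two byte-different payloads, one JSON value: a compliant
                    // parser treats them identically, so the raw bytes can't be
                    // assumed stable end to end.
                    const sent = '{"a":1}';
                    const received = ' { "a" : 1 } ';
                    const sameValue =
                      JSON.stringify(JSON.parse(sent)) ===
                      JSON.stringify(JSON.parse(received)); // true
                    ```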

                    1. 1

                      UTF-8 is a series of ordered bytes. UTF-8 contains order information by definition.

                      That is the point: Order is present for UTF-8. Only after UTF-8 can order information finally start to be subtracted. Omitting order information at the UTF-8 abstraction layer is against UTF-8’s specification and is simply not permitted. Order information can only be subtracted after UTF-8.

                      JSON, by specification, marshals to and from UTF-8. At the very least, we have to acknowledge order information is available at the UTF-8 layer even if it is subtracted for JSON objects.

                      1. 1

                        UTF-8 is a series of ordered bytes. UTF-8 contains order information by definition.

                        You keep repeating this, but it isn’t true in the sense that you mean.

                        See

                        UTF-8 is an encoding for individual characters (runes). It defines a set of valid byte sequences for valid runes, and contains order information for the bytes comprising those valid runes. It does not define or guarantee or assert any kind of order information for strings, except insofar as a UTF-8 string is comprised of valid UTF-8 runes.

                        That JSON marshals to a UTF-8 encoded byte sequence does not mean that UTF-8 somehow enforces the order of all of the bytes in that byte sequence. Bytes in individual runes, yes; all the bytes in the complete byte sequence, no.

                        Order is present for UTF-8. Only after UTF-8 can order information finally start to be subtracted. Omitting order information at the UTF-8 abstraction layer is against UTF-8’s specification and is simply not permitted. Order information can only be subtracted after UTF-8.

                        I’m not sure what this means. UTF-8 asserts “order information” at the level of individual runes, not complete strings.

                        At the very least, we have to acknowledge order information is available at the UTF-8 layer even if it is subtracted for JSON objects.

                        UTF-8 does not provide any order information which is relevant to JSON payloads, except insofar that JSON payloads can reliably assume their keys and values are valid UTF-8 byte sequences.

                        1. 1

                          If UTF-8 was not ordered, the letters in this sentence would be out of order as this sentence itself is encoded in UTF-8.

                          UTF-8 by definition is ordered. This is a fundamental aspect of UTF-8. There’s nothing simpler that can be said because fundamental properties are the simplest bits of truth: UTF-8 is ordered. UTF-8 strings are a series of ordered bytes.

                          UTF-8 is a string. Order is significant for all strings. All strings are a series of ordered bytes.

                          UTF-8 does not provide any order information which is relevant to JSON payloads

                          Yes, it has order information.

                          JSON inherits order, especially arrays, from the previous abstraction layer, in this case, UTF-8. If this were not the case, how is order information known to JSON arrays, which are ordered? Where is the order information inherited from if not from the previous abstraction layer?

                          Edit:

                          UTF-8 asserts “order information” at the level of individual runes, not complete strings.

                          That is incorrect. UTF-8 by definition is a series of ordered bytes, which is the definition of a string. UTF-8 already exists in that paradigm. It does not need to further confine a property it already inherits. UTF-8 is a string encoding format.

                          1. 2

                            UTF-8 is a string encoding format.

                            https://en.wikipedia.org/wiki/UTF-8

                            UTF-8 is a variable-length character encoding standard used for electronic communication.

                            JSON inherits order, especially arrays, from the previous abstraction layer, in this case, UTF-8. If this were not the case, how is order information known to JSON arrays, which are ordered? Where is the order information inherited from if not from the previous abstraction layer?

                            The order of JSON arrays is part of the JSON specification. It’s completely unrelated to how JSON objects are marshaled to bytes, whether that’s in UTF-8 or any other encoding format.

                            Is the order of fields in a CSV file “inherited from” the encoding of that file?

                            If UTF-8 was not ordered, the letters in this sentence would be out of order as this sentence itself is encoded in UTF-8.

                            At this point I’m not sure how to respond in a way that will be productive. Apologies, and good luck.

                            1. 1

                              character encoding standard

                              Is in the context of strings. JSON doesn’t define UTF-8 as its encoding format for a single character. JSON defines UTF-8 as the character encoding format for strings. Strings are ordered. The entirety of UTF-8 is defined in the context of string encoding.

                              The order of JSON arrays is part of the JSON specification

                              When parsing a JSON array, where is the array’s order information known from? Of course, the source string contains the order. JSON parsers must store this order information for arrays as required by the spec. JSON inherits order from the incoming string.

                              1. 2

                                JSON defines arrays as ordered, and objects as unordered. The specific order of array elements in a JSON payload is meaningful (per the spec) and is guaranteed to be preserved, but the specific order of object keys is not meaningful and is not guaranteed to be preserved.

                                1. 1

                                  When JSON is unmarshalled from a string, where does an array’s order information come from? Does it come from the incoming string?

                                  1. 2

                                    When JSON is unmarshalled from a string, where does an array’s order information come from? Does it come from the incoming string?

                                    Yes, it does. But the important detail here is that JSON arrays have an ordering, whereas JSON maps don’t have an ordering. So when you encode (or transcode) a JSON payload, you have to preserve the order of values in arrays, but you don’t have to preserve the order of keys in objects.

                                    If you unmarshal the JSON payload {"a":[1,2]} to some value x, and the JSON payload {"a":[2,1]} to some value y of the same type, then x != y. But if you unmarshal the JSON payload {"a":1,"b":2} to some value x, and the JSON payload {"b":2,"a":1} to some value y of the same type, then x == y.

                                    Coze models the Pay field as a json.RawMessage, which is just the raw bytes as received. It also produces hashes over those bytes directly. But that means different pay object key order produces different hashes, which means key order impacts equivalence, which is no bueno.
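                                    A sketch of the difference between hashing raw payload bytes and hashing a canonical form. The canonicalize function here is a hypothetical sorted-keys serializer, not Coze’s canon-based one:

                                    ```javascript
                                    const crypto = require("crypto");

                                    // Hypothetical canonicalizer: recursively sort object keys and
                                    // serialize with no insignificant whitespace, so equivalent
                                    // objects map to identical bytes.
                                    function canonicalize(v) {
                                      if (Array.isArray(v)) return "[" + v.map(canonicalize).join(",") + "]";
                                      if (v !== null && typeof v === "object") {
                                        return "{" + Object.keys(v).sort()
                                          .map(k => JSON.stringify(k) + ":" + canonicalize(v[k]))
                                          .join(",") + "}";
                                      }
                                      return JSON.stringify(v);
                                    }

                                    const sha256 = s => crypto.createHash("sha256").update(s).digest("hex");

                                    // Hashing raw bytes: key order changes the hash.
                                    const rawDiffer = sha256('{"a":1,"b":2}') !== sha256('{"b":2,"a":1}');

                                    // Hashing the canonical form: key order no longer matters.
                                    const canonAgree =
                                      sha256(canonicalize(JSON.parse('{"a":1,"b":2}'))) ===
                                      sha256(canonicalize(JSON.parse('{"b":2,"a":1}')));
                                    ```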

                                    1. 1

                                      You can’t have it both ways. You can’t argue for JSON being both the pure abstract form and also a concrete string. JSON is not a string, JSON is an abstraction that’s serialized into a string; I agree with that. The abstract JSON is parsed from a concrete string, and strings carry order information. Obviously JSON is inheriting order from the abstraction layer above, which in this case is string (UTF-8). The order is there, as shown by arrays being ordered.

                                      When JSON is parsed from UTF-8, it is now in an abstract JSON form. When it’s serialized into UTF-8, it’s not the abstract JSON, it is now a string. It’s not both. I don’t see any issue categorizing JSON as a pure abstraction, however, the abstraction is solidified when serialized.

                                      JOSE, Matrix, Coze, PASETO all use UTF-8 ordering, and not only does it work well, but it is idiomatic.

                                      These tools do not verify or sign JSON; they sign and verify strings, a critical distinction. After that processing, the string may then be interpreted into JSON. These tools are a logical layer around JSON, and the JSON these tools process is JSON. In the example of Coze, not all JSON is Coze, but all Coze is JSON. That’s a logical hierarchy without a hint of logical conflict. As I like to say, that makes too much sense.

                                      I fully acknowledge your “JSON objects are unordered” standpoint, but after all this time I have no hesitation saying it’s without merit. Even if that were the case, in that viewpoint these tools are not signing JSON, they’re signing strings. All cryptographic primitives sign strings, not abstract unserialized formats. And that too is no problem; far better, JSON defines the exact serialization format. That’s the idiomatic bridge permitting signing. It’s logical, idiomatic, ergonomic, it works, but most of all, it’s pragmatic.

                                      If JSON said in its spec, “JSON is an abstract data format that prohibits serialization”, this would be a problem. But what use would such a tool be? If JSON said, “JSON objects are unordered and the JSON spec prohibits any order information being transmitted in its serialized form”, that too would be a problem, but why would it ever have such a silly prohibition? To say “you can’t sign JSON because it’s unordered” is exactly that silly prohibition.

                                      1. 1

                                        When JSON is parsed from UTF-8, it is now in an abstract JSON form. When it’s serialized into UTF-8, it’s not the abstract JSON, it is now a string. It’s not both. I don’t see any issue categorizing JSON as a pure abstraction, however, the abstraction is solidified when serialized.

                                        My understanding of your position is: if user A serializes a JSON object to a specific sequence of (let’s say UTF-8 encoded) bytes (or, as you say, a string) and sends those bytes to user B, then — no matter how they are sent — the bytes that are received by B can be safely assumed to be identical to the bytes that were sent by A.

                                        Is that accurate?

                                        This assumption is true most of the time, but it’s not true always. How the bytes are sent is relevant. Bytes are not just bytes, they’re interpreted at every step along the way, based on one thing or another.

                                        If JSON serialized bytes are sent via a ZeroMQ connection without annotation, or over raw TCP, or whatever, then sure, it’s reasonable to assume they are opaque and won’t be modified.

                                        But if they’re sent as the body of an HTTP request with a Content-Type of application/json, then those bytes are no longer opaque, they are explicitly designated as JSON, and that changes the rules. Any intermediary is free to transform those bytes in any way which doesn’t violate the JSON spec and results in a payload which represents an equivalent abstract JSON object.

                                        These transformations are perfectly valid and acceptable and common, and they’re effectively impossible to detect or prevent by either the sender or the receiver.

                                        JOSE, Matrix, Coze, PASETO all use UTF-8 ordering, and not only does it work well, but it is idiomatic.

                                        The JSON form defined by JOSE represents signed/verifiable payloads as base64 encoded strings in the JSON object, not as JSON objects directly. This is a valid approach which I’m advocating for.

                                        Matrix says

                                        Signing an object … requires it to be encoded … using Canonical JSON, computing the signature for that sequence and then adding the signature to the original JSON object.

                                        Which means signatures are not made (or verified) over the raw JSON bytes produced by a stdlib encoder or received from the wire. Instead, those raw wire bytes are parsed into an abstract JSON object, that object is serialized via the canonical encoding by every signer/verifier, and those canonical serialized bytes are signed/verified. That’s another valid approach that I’m advocating for.

                                        The problem is when you treat the raw bytes from the wire as canonical, and sign/verify them directly. That isn’t valid, because those bytes are not stable.

                                        1. 1

                                          Coze speaks to Coze. Coze is JSON; JSON is not necessarily Coze. Coze is a subset, not a superset. Coze explicitly says that if a JSON parser ignores Coze and makes a Coze-invalid transformation, the Coze may be invalid.

                                          This is true for JOSE, Matrix, Coze, PASETO

                                          https://i.imgur.com/JYS7SFI.png

                                          The JSON form defined by JOSE represents signed/verifiable payloads as base64 encoded strings in the JSON object,

                                          Incorrect. There’s no logical difference between encoding to UTF-8 or base 64.

                                          This is exactly the mismatch. Since “JSON objects don’t define order”, any JWT implementation may serialize payloads into any order. Base 64 isn’t a magic fix for this.

                                          Of course, all implementations serialize into an order. That’s what serialization does by definition. And it doesn’t matter what the serialization encoding is, by definition, any serialization performs exactly this operation.

                                          It’s so obvious, so foundational, so implicitly taken for granted, that the fact is being overlooked.

      8. 1

        Regarding signing JSON, Peter and I have had a discussion going since March of this year.

        I think it’s fair to say Peter’s position is that he’s concerned about signing JSON.

        Our position is that signing JSON is not problematic at all. We sign JSON (Coze) without incident using simple canonicalization, which is straightforward and easy to implement (Go implementation and JavaScript implementation).

      9. 1

        Do you have a recommendation for a (relatively) painless serialization format that is bijective without having to jump through too many hoops?

        1. 5

          Doesn’t CBOR provide this, as mentioned in this comment by @peterbourgon ?

          https://lobste.rs/s/wvi9xw/why_not_matrix#c_eh9ogd

          1. 1

            Yeah, it’s probably as good as it gets. I guess I still need to sort maps manually, and be careful which types I use, in order to get the same output for equivalent input data, but I might be misremembering things. I’ll have another look at the details, I remember that dag-cbor was pretty close to what I needed when I looked last time, but it only allows a very limited set of types.

        2. 4

          It’s really hard! Bijectivity itself is easy, just take the in-memory representation of a value, dump the bytes to a hex string, and Bob’s your uncle. But that assumes two things (at least) which probably aren’t gonna fly.

          First, that in-memory representation is probably only useful in the language you produced it from — and maybe even the specific version of that language you were using at the time. That makes it impractical to do any kind of SDK in any other language.

          Second, if you extend or refactor your type in any way, backwards compatibility (newer versions can use older values) requires an adapter for that original type. Annoying, but feasible. But forwards compatibility (older versions can use newer values) is only possible if you plan for it from the beginning.

          There are plenty of serialization formats which solve these problems: Thrift, Protobuf, Avro, even JSON (if you squint), many others. But throw in bijective as another requirement, and I think CBOR is the only one that comes to mind. I would love to learn about some others, if anyone knows of some!

          But it’s a properly hard problem. So hard, in fact, that any security-sensitive projects worth its salt will solve it by not having it in the first place. If you produce the signed (msg) bytes with a stable and deterministic encoder, and — critically — you send those bytes directly alongside the signature (sig) bytes as values in your messages, then there’s no ambiguity about which bytes have been signed, or which bytes need to be verified. Which means you can use whatever encoder you want for the messages themselves — JSON can re-order fields, insert or remove whitespace between elements, etc., but it can’t change the value of a (properly-encoded) string. And because you don’t need to decode the msg bytes in order to verify the sig, you don’t need full bijectivity, in either encoder.
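          A minimal sketch of this detached approach in Node, assuming Ed25519 (the key names and payload here are illustrative):

          ```javascript
          const crypto = require("crypto");

          const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

          // Sign the serialized bytes once, then carry them verbatim as a
          // base64 string value. Any JSON re-encoding of the envelope may
          // reorder keys or change whitespace, but a properly-encoded
          // string value survives intact.
          const msg = Buffer.from(JSON.stringify({ user: "alice", amount: 10 }));
          const sig = crypto.sign(null, msg, privateKey);

          const envelope = JSON.stringify({
            msg: msg.toString("base64"),
            sig: sig.toString("base64"),
          });

          // Receiver verifies the decoded msg bytes directly; no
          // canonicalization or bijective re-encoding is needed.
          const parsed = JSON.parse(envelope);
          const ok = crypto.verify(
            null,
            Buffer.from(parsed.msg, "base64"),
            publicKey,
            Buffer.from(parsed.sig, "base64"),
          );
          ```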

        3. 2

          https://preserves.dev/ (Disclaimer: it’s something I started)

          1. 2

            Thanks! This looks quite interesting! I’ll have a play with the Rust bindings and see what it can do. I haven’t looked in detail yet, but it looks like it plugs into serde, so it should be easy and cheap to try it out.

        4. 1

          I consider Coze’s approach simple.

      10. 1

        We sign JSON and it works just fine.

        Coze uses strict base 64 encoding and canonicalization. That’s all that’s needed to make JSON and signing work.

        In Coze, the canonical form is generated by three steps:

        1. Omit fields not present in canon.
        2. Order fields by canon.
        3. Omit insignificant whitespace.

        That’s it.

        JSON + Canonicalization allows signing/verification. Canonicalization is the key.
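        Those three steps can be sketched in a few lines; canonicalForm and the canon argument are hypothetical helpers, not the Coze library API:

        ```javascript
        // `canon` is an ordered list of field names. Note: JavaScript
        // objects preserve insertion order for non-integer string keys,
        // which JSON.stringify then reflects.
        function canonicalForm(obj, canon) {
          const out = {};
          for (const key of canon) {             // 1. omit fields not in canon
            if (key in obj) out[key] = obj[key]; // 2. order fields by canon
          }
          return JSON.stringify(out);            // 3. omit insignificant whitespace
        }
        ```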

    2. 20

      Why not punctuation? (/snark)

      I found Matrix usability to be quite frustrating.

      • Relentless spam in channels, that is hard to combat with people being blocked and coming back repeatedly.
      • A brutal user experience for using encrypted channels on multiple devices - the upstream matrix developers are aware of this. But if I login to Matrix, I get nagged about encrypted channels, and the user interface for accessing them is complete bobbins.
      • Bridges that only benefit the Matrix side of the bridge, and make all other communication tools objectively worse when a Matrix bridge is added. Every IRC, Telegram, Discord, or whatever channel I’ve been on has been degraded when Matrix bridges are added. Forcing people to leave, or forcing people over to Matrix.
      1. 9

        Relentless spam in channels, that is hard to combat with people being blocked and coming back repeatedly.

        I find this unsurprising but interesting to hear, after Mozilla’s stated big reason for shutting down its IRC server in favor of Matrix was that they thought Matrix would be easier to moderate.

        Bridges that only benefit the Matrix side of the bridge, and make all other communication tools objectively worse when a Matrix bridge is added. Every IRC, Telegram, Discord, or whatever channel I’ve been on has been degraded when Matrix bridges are added.

        I’m an IRC holdout in channels that are bridged with Matrix. Mostly it seems to work fine, and I’ve only ever seen it cause problems for the Matrix users and not for us IRC holdouts. ¯⁠\⁠_⁠(⁠ツ⁠)⁠_⁠/⁠¯

        1. 5

          I find this unsurprising but interesting to hear, after Mozilla’s stated big reason for shutting down its IRC server in favor of Matrix was that they thought Matrix would be easier to moderate.

          I can’t speak to Mozilla’s reasoning but IME the spam is more pernicious in Matrix. One issue is that you block people by user@homeserver whereas on IRC you can +b *!*@someaddr to target a specific person. Sure, IP blocks are imperfect but in practice it seems easier for a malicious human to get another open signup on some homeserver than a new IP address.

          Surprisingly enough there isn’t a room permission concerning who is allowed to upload images or attachments, so anybody can do it. Shock images just show right up rather than hiding behind a link. Zip files whose filenames allege that they contain very illegal content will have links just sitting there, hosted not on some dodgy domain but proxied via your own homeserver. If I had to pick, I’d take the IRC spam.

          1. 5

            Surprisingly enough there isn’t a room permission concerning who is allowed to upload images or attachments, so anybody can do it.

            You can’t even block them altogether. I wrote a spec to do that, but my implementation of the backend part is stuck in code review since March, and I haven’t even tried to write the frontend yet :/

          2. 1

            Knocking is in the spec and, if I’ve understood correctly, coming soon to Element clients. That can help some with spam.

      2. 7

        I don’t see a lack of punctuation, just capital letters - a common cause of RSI. Carolingian minuscule only used capital letters for arbitrary emphasis: the “rules” are baseless and ignoring them therefore valid. I only use capitals here because otherwise people bias against what you’ve written (I actually stopped upcasing letters and correcting typos when I noticed Ken Thompson, Rob Pike et al. didn’t bother in emails and such. If they don’t care, why should I?).

        1. 16

          the “rules” are baseless and ignoring them therefore valid. I only [follow the rules] because otherwise people bias against [you]

          This is pretty much the fundamental principle of human communication. The rules may all be made up, but if you are unable or unwilling to follow them then you shouldn’t be too surprised when people don’t want to communicate with you.

          1. 4

            While I agree that effective communication requires (some) consensus, I don’t see how choice of capitalization affects understanding. I may find all-lowercase prose aesthetically objectionable, but that doesn’t mean that I can’t read something written in lowercase.

            I feel like a lot of prescriptivist arguments take for granted the notion that prose which defies convention is inherently less understandable. Part of their high ground is that you must do things As They Should Be Done to be understood. On the contrary, I find prescriptivist arguments tend to concern prose that is understood, but looks bad. Only rarely do you get things that matter in some edge cases, e.g. the Oxford comma.

            1. 17

              The article provides some immediate examples - sentences like:

              first of all, a quick primer on what matrix actually is. even though element market matrix as the foundation of a chat app, it’s far more complicated than that.

              I literally had to go back and read multiple times because:

              • with no capital on “even” I mistook the full stop for a comma and misparsed the second sentence.
              • no capitals on proper nouns like Matrix and Element meant I had to do extra work to realise these were proper nouns. I literally got lost in what looked like a sea of normal nouns: “element market matrix”.

              Of course, capitals don’t remove all ambiguities. But every little helps.

              I then decided that reading the article wasn’t worth it. There are other things to read that don’t make me do this extra work.

              1. 4

                I’m pretty sure that’s a typo; “element markets matrix” would make it clear as day.

                1. 8

                  It could be treating Element as a plural entity - https://english.stackexchange.com/questions/133105/organisation-singular-or-plural - as a Brit, “Element market Matrix” doesn’t seem wrong and wouldn’t have tripped me up.

                2. 3

                  An element markets matrix sounds like something a quantitative analyst at a hedge fund might use in their code.

              2. 2

                Thank you! I was wondering what “element market matrix” meant. I assumed it was a typo.

            2. 4

              Oh, sure. I wasn’t attacking the choice of all-lowercase specifically, and I wasn’t trying to be prescriptive; I meant to point out that, pragmatically, if your communication style differs enough from others’ then you may have trouble getting them to listen to you. It’s irrelevant whether or not there’s an objective basis for the communication rules. These things evolve organically; it’s some kind of category error to treat them as if there’s an RFC somewhere listing the things that you MUST do when communicating in English.

              1. 4

                I think we agree on all points (presumably including aesthetics given how we’re writing these comments). I mostly am trying to say that as a society we tend to overvalue the importance of aesthetics in communication when the message can still be well understood. And I think we can do better.

                It would be nice if we could separate the message from the format so to speak, rather than tutting quietly when we see comic sans in a presentation.

                That said, I am being a little hypocritical here. I am guilty of disliking all-lowercase prose, word art, the use of the phrase “comprised of,” and so on.

            3. 3

              I mostly agree with you, but I do think it’s true that defiance of convention is inherently paying an understandability cost, at least among those who are used to the convention. Of course it can make up for that cost in other ways, including by being easier to write, or by being easier to read by those who are not used to the convention.

              Understandability isn’t a binary thing; interpretive labor is required for any communication. You can make your writing easier or harder for certain audiences to understand, partly by making choices about which conventions to follow.

              Այս մեկնաբանության մնացած մասը տեղադրել եմ հայերեն: Տարօրինակ ընտրություն է, բայց դա ընտրություն է, որը ես ազատ եմ կատարել: Եթե միալեզու հայերեն խոսող լինեիք, գուցե երախտապարտ կլինեիք, որ հեշտացրի ձեզ կարդալ սա:

              1. 4

                Այս մեկնաբանության […]

                I had to use Google Translate’s autodetect feature to figure out that this was Armenian…

                It reminds me of the time when Armenia decided to change their DST schedule (to align with the big Russian Federation shift to permanent summer time) and posted it as an official government pronouncement in Armenian, with no translation. IIRC it took a while for the TZ maintainers to cotton on that the change was coming.

        2. 9

          Besides English, I also read Hindi and a bit of Punjabi. The scripts used to write Hindi and Punjabi – Devanagari and Gurumukhi, respectively – have no notion of letter casing. Most of the time this is not a problem. However, I sometimes come across unfamiliar words in Hindi/Punjabi that I can’t figure out how to parse, most often in technical or political writing, and it’s here that not having a notion of letter casing hurts my reading comprehension.

          Is that unfamiliar word the name of a person? A place? A brand? A language? Somebody’s title? The name of day or month or time period? Or just a word I haven’t come across before?

          So I think that in modern English written using the Latin script, letter cases carry a lot of meaning and aid comprehension. The rules might seem arbitrary if English is all you read, but they make a lot of sense. I love Hindi and Punjabi as much as I love English – and in some cases those other languages have more sensible defaults compared to English – but letter casing is one area where English has them beat.

          1. 2

            Another kind of semantic typography is using italics for foreign words, tho as I am writing this comment it occurs to me that it’s less common than it used to be, maybe? I know almost nothing about Japanese, but I gather that katakana is used in a similar way to italics.

            One of the things I don’t like about HTML is replacing the <i> tag with the <em> tag, because not all uses of italics are for emphasis.

            1. 3

              not all uses of italics are for emphasis.

              That is precisely why <i> was never removed from HTML. It was never even deprecated.

              1. 1

                True, though there was a big debate about it http://lachy.id.au/log/2007/05/b-and-i

            2. 2

              Funny you should mention that. Hindi and Punjabi don’t do italics either. You can italicize your words if you’re typing on a computer, but you never see italics in print.

            3. 1

              Surely you can still use <i> even if it’s frowned upon semantically?

              (Cue someone demanding semantic markup for titles, foreign words etc etc)

              1. 4

                <i> is the semantically correct element for titles and foreign words. It’s not frowned upon at all.

                1. 1

                  Certainly this is true for foreign words, but the element for titles of books etc. is cite. (Titles of persons have no semantic markup.)

              2. 2

                That is, in fact, what I do :-) if I am actually writing raw HTML which is pretty rare. Tho similar halfarsed “semantic” markup tends to occur in other formats too, sigh.

        3. 9

          i have a similar story!

          i read a lot of poetry, and always noticed that poems felt warmer and less formal when written in all-lowercase - some of my favorite poets frequently write in lowercase. bukowski, ee cummings.

          it requires more careful composition, because Capital Letters serve as sentence indicators, but i find that breaking up text and speaking concisely in lowercase offers a superior rhythm. it also enhances capitalization’s use as an emphasis tool, rather than a tool used out of strict adherence to rule.

          writing is a fluid process & choosing a style is valid. i (a ~30 year old person) have been mistaken for a teenager many times because of my writing style. perhaps writing in lowercase is a font of youth! ;)

          1. 4

            perhaps writing in lowercase is a font of youth!

            This is accurate. If you type with capital letters and complete punctuation and it hasn’t been set in stone as your style of typing, zoomers will think you are weird, or angry at them.

            1. 5

              Even if it is your style of typing they may think you’re angry at them, I’ve found… and I’m not even much older than them.

              I go back and forth, and it’s often context-dependent. On IRC I’ve never heard of a shift key, ever. Over SMS and Messenger and Matrix and etc. it often depends on what device I’m writing from (phone auto-capitalizes, desktop of course does not) and what mood I’m in/how “proper” of a message I’m writing. In Slack it depends: DMs are always lowercased, company-wide blasts in public channels are written more like blog posts where I use Full Punctuation Minus This Style Of “Proper Noun”-esque Capitalization as emphasis.

              And that, friends, is how you really weird out the folks under about 25 /s (I don’t actually mind their all-lowercase ways; I sit dead-smack on the millennial-genz cusp and associate mostly with the older side of it, but whatever, do what makes you happy. That said I still don’t understand much of the slang vernacular of the folks who solidly fit into the Zoomer years - like age 21 and under. I’ve finally reached “get off my lawn” years, eh?)

              1. 11

                To me, it’s simple politeness. Humans read by doing pattern recognition through a neural network. The rules for sentence structure and grammar are somewhat arbitrary (in English more so than many human languages), but having a set of rules improves the pattern matching. If you use uncommon abbreviations or slang, don’t punctuate properly, or don’t capitalise, then you are saving yourself some effort at the expense of the reader. That is the fundamental core of bad manners: deciding that saving you time is worth costing someone else time. This is far worse in broadcast or multicast communication media, where the cost to the reader is multiplied by a large number of readers.

                If someone doesn’t put the effort in then this tells me either that they don’t care about the impact that it has on their readers or that they haven’t thought about it. Neither reflects well on them. There have been a lot of studies on the impact of punctuation, capitalisation and correct grammar on reading speed and comprehension, so it’s not like this is an area where you can just say that it doesn’t matter. The impact is often much larger on non-native speakers and people on various neurodiversity axes.

              2. 3

                As a zoomer I can relate to that. I mean, look at me now forming Proper Sentences with Proper Punctuation.

                Capital letters serve as anchor points in large bodies of text, improving readability. Hence why they’re good in Lobsters comments or blog posts, where you usually elaborate on a topic in many, many sentences with some amount of fluff to make yourself seem more serious. (yes, this is a self-deprecating joke.)

                In my treehouse (website) I tend to avoid capital letters when starting sentences to create a more informal and friendly atmosphere. They wouldn’t improve readability that much anyways due to the tree-like structure, with each branch being maybe a sentence or two long.

                What I can add to the discussion is that I’m bilingual: I speak Polish and English, and Polish has diacritics. Just like capital letters, I tend to omit those when DMing friends (or when I’m too lazy to tap-and-hold letters on my phone; I don’t use autocorrect or auto-capitalization because it annoys me.) In Polish this can create some ambiguities - in zoomer circles we often joke about “sąd” vs “sad” which is “court” and “orchard” respectively. “spotkamy sie w sadzie” - did they mean “we’ll meet in court” or “we’ll meet in the orchard”?

                I do use diacritics at work though, because effective, unambiguous communication there is much more important than when DMing friends about silly things.

            2. 3

              zoomers

              As a millennial, we acted the same way on AIM, ICQ, and YIM over capitalization + punctuation. I would venture this changes as you mature, since a) you have to type with coworkers, bosses, and non-friendlies, so it forms a habit, and b) you realize what you say is largely easier for a broader audience to understand.

              A possible addition/alternative could be that phone keyboards generally capitalize automatically & you got used to the aesthetic.

        4. 4

          Many scripts aren’t bicameral either. Bicameral writing is limited to the Latin, Cyrillic, Greek, Coptic, Armenian, Glagolitic, Adlam, Warang Citi, Cherokee, Garay, Zaghawa, Osage, Vithkuqi, Old Hungarian, and Deseret scripts.

          Interestingly a lot of programming languages differentiate code based on casing–which is fine for reading in these scripts, but would prevent usage of other scripts in code that should otherwise be considered valid (which is to say a language like PureScript can allow you to only have variable names in the aforementioned scripts, otherwise without a case, it can’t know if it’s a variable foo versus type Foo).

          In written languages tho, English’s capitalization of proper nouns quickly let you know it’s not vocab you need to know but just a name; when I read Thai, I have to ask what words mean just to be told, “it’s just a name” which does make reading harder. Andit’seventrickiertoreadsincethereisneitherspacesbetweenwordsnorpunctuation (altho folks are supposed to use zero-width spaces, almost no one does since you can’t see them and mainly helps spelling correction, newlining, etc.; spaces separate clauses).

          1. 1

            Interesting. Thank you.

            a language like PureScript can allow you to only have variable names in the aforementioned scripts, otherwise without a case, it can’t know if it’s a variable foo versus type Foo

            Doesn’t it have a type-level subsystem that’s Turing complete? So then you don’t need variables.

      3. 3

        imo matrix introducing end-to-end encryption was a huge mistake. building a distributed modern chat platform is hard enough without having to worry about the insane amount of complexity e2ee represents. that complexity is also passed down to clients, which must implement the e2e spec: https://spec.matrix.org/v1.8/client-server-api/#end-to-end-encryption

        read through the spec - it’s huge and plumb full of caveats. and in the end, your matrix server provider could easily drop a little javascript into your client session to exfiltrate your messages anyways.

        it reminds me of “end to end encrypted email”, which is mostly a sham too. all of the security in the world doesn’t matter when you’re ultimately trusting a service provider. the only way to assure real end-to-end security is to do it yourself (i use age these days, but you might use PGP if you’re a masochist).

        EDIT: i say this as an avid user & maintainer of the cyberia.club matrix system for ~4 years. i like matrix & want it to flourish.

        1. 4

          Are there actually multiple implementations of E2EE matrix clients? Last I checked there was one half-baked-looking weechat plugin that was impossible to compile, and Element.

          1. 6

            It all just works for me between Cinny, FluffyChat and Fractal.

            1. 3

              same with nheko on top

    3. 14

      I guess I’m very late to the party here - sorry for not spotting it earlier. I’ll try to quickly cover the points raised:

      • Canonical JSON isn’t the disaster it’s made out to be. The “the spec doesn’t actually define what the canonical json form is strictly” statement is just false: https://spec.matrix.org/v1.8/appendices/#canonical-json is the link. Sure it can be frustrating to find that different languages’ JSON emitters are hard to canonicalise (and we certainly had some dramas back when we allowed floats in Canonical JSON, given different precision etc), but it’s a wart rather than a catastrophe. The Dendrite devs fixed their bugs on this years ago (although one of them still likes to kvetch about it). In future we can and will switch to a canonical binary format (eg the MIMI IETF work is rather quaintly fixated on using TLS Presentation Layer as a binary format).

      • Decentralised rooms are a feature, not a bug. Just like decentralised VCS repositories are a feature of Git and friends. Yes, this means that buggy implementations can get splitbrained, and rooms can be splitbrained due to netsplits, but these days problematic splits are very rare indeed. We fixed the primary mistakes which caused these over 5 years ago: https://github.com/matrix-org/matrix-spec-proposals/blob/f714aaadd011ac736d779f8460202a8d95799123/proposals/1442-state-resolution.md. The complaints about “not being able to guarantee deletion of data in a decentralised system!” are asinine, obviously: there is no way to force data to be deleted on other folks’ computers, short of utterly evil DRM.

      • In terms of room memberships not being deletable: this is mitigated by MSC4014, which provides per-room pseudo IDs so that the memberships are pseudonymous; it then matters much less that you can’t delete them. This has now been implemented in Dendrite.

      • Meanwhile, other state can be deleted by upgrading the room and discarding the previous version of the room (upgrading rooms between versions is a fairly common operation, albeit one we need to make more seamless). We’re also working on encrypting room state e2ee via MSC3414, which then makes the lack of finegrained deletion less important.

      • Stuff about the DAG being hard to linearise because it is deliberately allowed to be split into discontiguous chunks is just not true, and misses one of the nicest bits of Matrix: that you don’t have to replicate the full DAG to participate in a room (thus allowing fast lazyloaded joins etc). The “depth” parameter is obsolete since 2018 and is marked as such in the SS spec, and tiebreaking on forgeable timestamps is only done as a totally arbitrary, non-security related deterministic tiebreaker.

      • Similarly, the fact remote servers can send old messages into a room (which may or may not be “fake”; who’s to say?) is a feature. Just like it’s a feature for email queues to be able to be flushed, or for users to send email from 1971 if they try hard enough.

      • A valid criticism (at last) is that E2EE is fragile. We’re fixing this both by making the current implementations more robust (adopting the decent and audited matrix-rust-sdk implementation) - and reworking how devicelists work in the context of MLS and MIMI in the long run. Definite mea culpa on this one.

      • Another valid criticism is about lack of authed media and the fact that remote media gets cached on a user’s server when they view it. We’re working on this currently (despite the accusations that we’re ignoring GH issues…)

      • Finally: yes, moderation needs more work. Not for the reason mentioned here (state resets causing moderation problems are incredibly rare since we fixed state res in 2018), but because we need things like IP- or CIDR-based banning still, and better tooling for moderators rather than server admins. We’ve just added someone from the Synapse team to work full-time on moderation tooling (a few weeks ago) though, so expect progress there. Also, the Mjolnir moderation bot (while feeling alarmingly eggdrop-like) does work pretty well - but it’s not exactly mass-market yet.

      Hope this provides some clarity on the mix of questionable and legit points raised in the post. Sorry if any of it is incoherent; has been written off the top of my head on mobile.

      1. 2

        The “the spec doesn’t actually define what the canonical json form is strictly” statement is just false: https://spec.matrix.org/v1.8/appendices/#canonical-json is the link. Sure it can be frustrating to find that different language’s JSON emitters are hard to canonicalise …

        That link says that e.g.

        Numbers in the JSON must be integers in the range [-(2**53)+1, (2**53)-1], represented without exponents or decimal places, and negative zero -0 MUST NOT appear.

        which suggests that the subsequent example

        {
           "a": -0,
           "b": 1e10
        }
        

        should fail to parse. But instead, it’s said that from that JSON payload

        The following canonical JSON should be produced:

        {"a":0,"b":10000000000}
        

        The literal 1e10 may represent the same value as the literal 10000000000, so maybe we can infer that exponents and decimal places can be allowed as input, so long as the output transforms them to a valid value per the spec. But the literal -0 does not represent the same value as the literal 0, they are different values. An implementation that rejected input with a number literal -0 would be reasonable; why should it be transformed to 0? And what about the number literal 1.0? Should it be rejected, or transformed to 1? What about 1.00000001?

        This is why I say that the spec is at least somewhat ambiguous.
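        To see how much of this normalization a stock parser silently decides for you, here’s a quick Python check (illustrative only, not a conformant canonical-JSON implementation):

        ```python
        import json

        def compact(value):
            # Sorted keys, no insignificant whitespace -- but the numeric
            # normalization is whatever Python's parser already chose.
            return json.dumps(value, sort_keys=True, separators=(",", ":"))

        parsed = json.loads('{"a": -0, "b": 1e10}')
        # Python parses -0 as the integer 0, but 1e10 as the *float*
        # 10000000000.0 -- so the output disagrees with the spec's expected
        # {"a":0,"b":10000000000} in its number formatting.
        print(compact(parsed))  # {"a":0,"b":10000000000.0}
        ```

        So even before getting to -0 semantics, two implementations that both accept the spec’s example input can emit different bytes for the “same” payload.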

      2. 1

        The complaints about “not being able to guarantee deletion of data in a decentralised system!” are asinine, obviously: there is no way to force data to be deleted on other folks’ computers, short of utterly evil DRM.

        Technical details aside, do people not have the right to delete stuff they’ve created?

        1. 3

          Of course people have the right to delete stuff they’ve created! They’re even guaranteed it under GDPR. Which is why Matrix supports deleting messages (aka redactions), and why all well-behaved Matrix implementations uphold deletions, as explained below: https://lobste.rs/s/wvi9xw/why_not_matrix#c_2eqdof.

          The quibbling point from the original post is that you can’t guarantee that there aren’t malicious servers in the room who will ignore deletion requests. Just as you can’t guarantee that there aren’t malicious users busily publishing screenshots of your undeleted conversations too.

          1. 2

            Good to hear!

            The quibbling point from the original post is that you can’t guarantee that there aren’t malicious servers in the room who will ignore deletion requests. Just as you can’t guarantee that there aren’t malicious users busily publishing screenshots of your undeleted conversations too.

            Couldn’t you do a kind of check?

            Say someone sends a (valid) delete request to server S1 for entity ID 123. The server S1 dutifully deletes entity ID 123, and (verifiably) forwards that delete request to peer servers S2 and S3. After some time, S1 could query S2 and S3 for entity ID 123, or for some higher order collection that would have included entity ID 123, if it still existed. If the response included entity ID 123, then the corresponding server would be marked as malicious, penalized, and, eventually, maybe, removed from the network altogether.
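            The audit step might look like this toy sketch (all names hypothetical; a real federation protocol would need signed, verifiable responses rather than trusting what a peer reports):

            ```python
            class Peer:
                """Toy stand-in for a federated server; illustrative only."""
                def __init__(self, name, store):
                    self.name = name
                    self.store = set(store)

                def delete(self, entity_id):
                    # Honest peers comply with forwarded delete requests.
                    self.store.discard(entity_id)

                def has(self, entity_id):
                    return entity_id in self.store

            def audit_delete(entity_id, peers):
                # After propagating a delete, query each peer: any peer still
                # serving the entity gets flagged as potentially malicious.
                return {p.name for p in peers if p.has(entity_id)}

            s2, s3 = Peer("S2", {"123"}), Peer("S3", {"123"})
            s2.delete("123")  # S2 complies; S3 ignores the request
            print(audit_delete("123", [s2, s3]))  # {'S3'}
            ```

            Of course a hostile server could answer the audit query dishonestly while keeping the data, so this would only catch buggy or lazy implementations, not determined adversaries.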

            edit: But, yeah, this reflects a core problem with decentralized systems, I guess! Users both want and need to delegate their trust to a well-defined authority, so that issues like this one (and others) can be authoritatively resolved.

            1. 1

              To be fair, a hostile server could lie about this to you while still keeping the record for the nasties.

              But yes, that’s a great idea, and something Matrix home servers could definitely do to at least deal with bad implementations.

              In my experience though, Matrix doesn’t even seem to support deleting events on the same home server. Now maybe that’s my client not implementing redaction correctly, or it’s the server, but that’s exactly the problem Matrix is constantly facing here: it’s moving fast and changing fundamentals all the time. Clients and servers are barely catching up and there isn’t actually an end-to-end implementation that fixes all the issues that @arathorn described here (e.g. that MSC being implemented in Dendrite and not Synapse). In my case, it was FluffyChat failing to redact an event on a Synapse server, but I suspect the level of chaos out there is much worse than being described here.

              1. 1

                To be fair, a hostile server could lie about this to you while still keeping the record for the nasties.

                Sure. At the end of the day, once you send data to a node you don’t control, it’s, well, outside of your control. Anything you try to do to control it is gonna be best-effort at, well, at best.

                issues … chaos

                Of course! Opting in to decentralization necessarily means opting out of anything that relies on a central authority. Deletable content is one example — no decentralized system can ever provide true deletes just by definition — but there are countless others. No point trying to paper over the chaos!

      3. 1

        Similarly, the fact remote servers can send old messages into a room (which may or may not be “fake”; who’s to say?) is a feature. Just like it’s a feature for email queues to be able to be flushed, or for users to send email from 1971 if they try hard enough.

        There’s so much to unpack here…

        First, wait what? It’s “a feature […] for users to send email from 1971 if they try hard enough”?? What possible use case is that supposed to represent? As someone who’s been running mail servers for decades (but not since 1971, thank god), I sure wish we could just say “sorry, you can’t send 10 year old emails” or something. That’s a ludicrous feature to have. And yes, I do think offline access is a valuable feature, just not that we should be allowed to inject supposedly 50 year old messages in a stream and expect things to make any sort of sense.

        Also, I find this sort of tone concerning:

        (which may or may not be “fake”; who’s to say?)

        well… surely we should have some way of authenticating users somewhere, somehow, no? Isn’t that what E2EE is supposed to cover?


        To take a step back here, I find Matrix’s approach to those problems and critiques in general to be quite cavalier and, to a certain extent, paternalistic, in the sense that you assume we don’t know what we’re talking about. For years now I’ve been hearing similar responses: “yes, E2EE is fragile, my bad, gotta fix this soon pinky promise”, “moderation needs work”, “there’s a MSC implemented in Dendrite that fixes everything, no biggie”. I’m sorry, but that just doesn’t fly for me anymore.

        Just last week I sent myself an attachment from FluffyChat on Android to my own avatar from OFTC, in the feeble hope I might have been able to get a HTTP link for a file in a print shop (don’t ask). It worked in the sense that my IRC client got it but I didn’t see the link from FluffyChat, so it actually failed to do what I needed. I was hoping I could then delete this message and get rid of that (public!) file (even though it’s hidden behind a secret URL). That didn’t work either: I deleted the message in FluffyChat, and it’s still out there, on that home server.

        In other words, I tried to redact a message from a well-known, mainstream client, and it didn’t work: the message is still there, on that very home server (not the federation!).

        It’s one thing to argue that decentralized systems are hard, and that moderation on a federated system is really hard. But this, this is different. Those are basic interoperability issues for basic features that currently Do Not Work in Matrix, and they make me unlikely to recommend people switch to Matrix.

        It actually makes me even more uncomfortable with the whole thing seeing how the Matrix lead responds to such criticism, barely acknowledging any of those issues, while, I suspect, being painfully aware of how accurate many of those are internally.

        How honest are we being with ourselves here?

        1. 1

          Missed this at the time; responding for posterity: I’m not trying to be cavalier here. In my opinion there are three legit points that this article raised, which I listed at the end of my response: fragile E2EE (fixed by matrix-rust-sdk), media repo problems (lack of auth, lack of DELETE, lack of ability to disable caching) and lack of moderation tooling (although that’s improved in the last 15 days thanks to community efforts: https://matrix.org/blog/2023/09/15/this-week-in-matrix-2023-09-15/#department-of-trust-safety-shield).

          I continue to believe that the canonical JSON complaints are a wart rather than a serious defect; meanwhile state resolution is generally reliable these days; and if I appear paternalistic and imply that the author doesn’t know what they’re talking about, it’s because they lead with stupid points like “you can’t guarantee that other servers will delete your data”.

          So, I’m trying to be honest with myself, and prioritise appropriately. For instance, the media repo issues were already actively being worked on, but have been bumped still higher in priority.

      4. 1

        The complaints about “not being able to guarantee that data is deleted in a decentralised system!” are asinine, obviously: there is no way to force data to be deleted on other folks’ computers, short of utterly evil DRM.

        There is comfort in knowing a system at least tries to guarantee deleting something, versus hoping it can delete something. People like knowing they can remove things from well-intentioned people’s servers. It’s not asinine to hope for the best, even if you’re not planning for the worst.

        1. 4

          Totally, well-behaved Matrix servers do delete data on request! The way it works is that the DAG signs a hash of the data, not the data itself, so if folks want to delete nodes in the DAG then they send a “redaction” event, which the servers in the room apply to discard the underlying data in question. The details are at https://spec.matrix.org/v1.8/client-server-api/#redactions, and Matrix servers implement these by default. You’d have to maliciously tweak the server as an admin not to uphold them.
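
          The mechanism can be sketched roughly like this (a hedged illustration only: the protected-key lists below are simplified from the spec’s redaction algorithm, not the full rules):

```python
# Sketch of Matrix-style redaction: signatures and hashes protect only
# a subset of each event, so a server can discard the bulky content
# while keeping the event DAG intact. Key lists simplified from the
# spec (spec.matrix.org) for illustration.

PROTECTED_KEYS = {
    "event_id", "type", "room_id", "sender", "state_key",
    "hashes", "signatures", "depth",
    "prev_events", "auth_events", "origin_server_ts",
}

# Content keys that survive redaction, per event type (simplified).
PROTECTED_CONTENT = {
    "m.room.member": {"membership"},
}

def redact(event: dict) -> dict:
    """Return the redacted form of an event: drop unprotected keys."""
    out = {k: v for k, v in event.items() if k in PROTECTED_KEYS}
    keep = PROTECTED_CONTENT.get(event.get("type"), set())
    out["content"] = {k: v for k, v in event.get("content", {}).items()
                      if k in keep}
    return out

msg = {"event_id": "$abc", "type": "m.room.message", "room_id": "!r:hs",
       "sender": "@a:hs", "origin_server_ts": 0,
       "content": {"msgtype": "m.text", "body": "secret attachment"}}
redacted = redact(msg)
assert "body" not in redacted["content"]   # payload is gone
assert redacted["event_id"] == "$abc"      # DAG linkage survives
```

          Because the signature covers (roughly) this redacted form, a server can honour a redaction without invalidating the room’s hash chain.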

    4. 10

      I’ve seen Matrix in use for work and whole campus networks without any of the really bad problems described here appearing, especially the DoS or room-loss problems. What I did observe were people getting into a joined-but-not-joined state via their own homeservers, and some problems when one server lost its whole server key during a total drive failure (which was pretty much their fault); it took some time for others to be able to decrypt any of their messages again. For public matrix.org channels (or federated ones) of FOSS projects I actually had a pretty good experience. Reporting possible CSAM spammers to the matrix.org team worked pretty well too. Maybe disable federation for communities where you don’t need it and thus avoid the content and account problems - something already happening for stuff like the Debian GitLab instance.

      Does the event graph need the whole event or just the ID? Because just sending fake events with IDs (or just pure IDs) for erased messages should then be fine to keep the graph history.

      I can’t speak on the topic of illegal media, but I do think that matrix does give you a good option to build communities and selfhost things. Obviously discord, telegram, slack, IRC & co have solved some of these problems, but many times simply by not allowing selfhosting (discord, telegram, slack), federation, alternative clients or not having a lot of the desired features (all of them for E2E).

      The whole signed-JSON ordering thing is a mess. Sadly the author said they follow multiple issues around the Matrix project for all the mentioned problems, but linked none of them.
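
      For reference, the canonical form the Matrix spec appendix asks for (keys sorted lexicographically, no insignificant whitespace, UTF-8 output) is close to what a stock JSON serializer can produce. A minimal sketch in Python, glossing over the spec’s extra rules such as rejecting floats:

```python
import json

def canonical_json(value) -> bytes:
    # Sort keys lexicographically, drop insignificant whitespace,
    # and emit UTF-8 rather than \uXXXX escapes.
    return json.dumps(value, sort_keys=True,
                      separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

# Differently-ordered inputs collapse to the same bytes:
assert canonical_json({"b": 2, "a": 1}) == b'{"a":1,"b":2}'
assert canonical_json({"a": 1, "b": 2}) == canonical_json({"b": 2, "a": 1})
```

      The fragility the thread complains about is that signing happens over these exact bytes, so every implementation’s re-serialization must be byte-identical before signatures can be checked.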

    5. 10

      Honestly, Matrix is a solid replacement for XMPP, but using it is still more annoying than just running IRC. Especially when using its safety features like encryption, things go wrong more often than not.

      At least it provides an open API, making it less risky to use unofficial clients, unlike Discord.

      1. 13

        Up until a month ago, I was really glad Matrix existed, not because I want to use it, but because it gives people fewer reasons to use brain-meltingly bad systems like Discord and Slack, and it did a great job interoperating with chat systems I actually did enjoy using, like Libera.

        Unfortunately the bridging to Libera broke recently, so everyone I enthusiastically invited to the Matrix-side versions of my channels, promising they could use it to keep in touch with the community, is now disconnected from the actual chat.

        1. 2

          You can set up your own bot or ask others; projects like heisenbridge are used in some of the channels I frequent. That’s one more thing to operate, but it’s definitely doable to get bridging between Matrix and IRC back.

          1. 4

            Yeah, the problem is they’ve said that the outage is “temporary”; I’m not sure if I believe it, but I’d feel silly if I spent all this time setting up my own bridge and then the main one came back. =)

      2. 5

        It’s not a solid replacement though: as some of the post alluded to, issues like mirroring the entire history plus all attachments of all users can make it prohibitively costly to self-host. A lot of indie Matrix servers shut down after a few months because the bills didn’t justify it. With Matrix.org, the ones that control the spec, being almost de facto (even the Mozilla server is hosted by Matrix.org), and all metadata being mirrored to them, there are centralization concerns as well.

        XMPP still has a more decentralized vibe, and its beginnings a decade ago meant it could run, and still runs, fast on hardware of that era (server and clients). There are some missing features, but it’s still a great platform that deserves more love.

        1. 1

          You don’t have to mirror the entire history of content. Just state events for the auth chain. There are APIs for removing attachments older than x.

      3. 4

        On a protocol vibes level, IRC and XMPP feel like C and C++ respectively. Both are classics that work pretty well, though they have pretty different approaches to adding features and their own historical burdens.

        Matrix feels like something out of the last frames of a galaxy brain meme. I suppose this could be a good thing at least some of the time, though the original post sounds like materializing the benefits has proved difficult. (Disclaimer: I have looked through some of the specs but never used a Matrix client.)

    6. 9

      I ran a Synapse instance for a couple years and it was a ton of admin overhead. I had around 400 local users and the resource use explosion was pretty bad. Also, every time I had to do something in Postgres, I was reminded of how poor the model is in the first place (of course, as a “database guy”, I probably complain too much about this stuff).

      It was difficult to actually compress or remove anything during a pruning process to claim back unused disk space, db storage, cache, etc., so I cobbled together scripts and commands from other folks and wrote my own over time. I appointed a few admins and we were able to combat spam and other abuse using tools external to Matrix itself, whose built-in tooling is woefully inadequate. I could go on; it was not a positive experience, but I am happy that so many people were able to connect on there, coordinate IRL events and meetups, and share millions of GIFs at each other.

    7. 8

      I really want matrix to succeed, but the issues are plentiful.

      The fact that self-hosting Synapse in a performant manner is no trivial feat (this is slowly improving), compounded by the fact that no mobile client yet supports sliding sync (ElementX when?), makes my user experience in general very miserable. Even the element-desktop client has horrible performance, unable to make use of GPU acceleration on nearly all of my devices.

      1. 12

        unable to make use of GPU acceleration on nearly all of my devices

        As an IRC user, do I want to know why an instant messaging client would need GPU acceleration? :x

        1. 8

          It’s nothing particularly novel to matrix: rendering UIs on the CPU tends to use more battery than the hardware component whose entire goal is rendering, and it’s hard to hit the increasingly-high refresh rates expected solely via CPU rendering.

          1. 3

            A chat application ought to do very infrequent redraws, basically when a new message comes in or whenever the user is composing, worst case when a 10fps GIF is being displayed. I find it concerning that we now need GPU acceleration for something as simple as a chat to render itself without feeling sluggish.

            1. 8

              Rendering text is one of the most processor-intensive things that a modern GUI does. If you can, grab an early Mac OS X machine some time. Almost all of the fancy visual effects that you get today were already there and were mostly smooth, but rendering a window full of text would have noticeable lag. You can’t easily offload the glyph placement to the GPU, but you can render the individual glyphs and you definitely can composite the rendered glyphs and cache pre-composited text blocks in textures. Unless you’re doing some very fancy crypto, that will probably drop the power consumption of a client for a plain text chat protocol by 50%. If you’re doing rich text and rendering images, the saving will be more.

              1. 4

                The downside with the rugged texture-atlas approach is that the distribution of glyphs in the various cached atlases in every process tends to be substantially re-invented across multiple graphics sources and makes up quite a bit of your local and GPU RAM use. The number of different sizes, styles and so on aren’t that varied unless you dip into some kind of opinionated networked document, and even then the default is default.

                My point is that there is quite some gain to be had by somehow segmenting off the subsurfaces and somewhat split the load – a line packing format in lieu of the pixel buffer one with the LTR/RTL toggles, codepoint or glyph-index lookup, (so the client need to know at least GSUB of the specific font-set) and attributes (bold, italic, colour, …) one way and kerning feedback for picking/selection the other.

                That’s actually the setup (albeit there’s work to be done specifically in the feedback / shaping / substitution area) done in arcan-tui. The initial connection populates font slots and preferred size with a rough ‘how does this fit a monospaced grid w/h’ hint. Clients using the same drawing properties share a glyph cache. We’re not even at the atlas (or worse, SDF) stage, yet the savings are substantial.

                1. 3

                  The downside with the rugged texture-atlas approach is that the distribution of glyphs in the various cached atlases in every process tends to be substantially re-invented across multiple graphics sources and makes up quite a bit of your local and GPU RAM use

                  I’m quite surprised by this. I’d assume you wouldn’t render an entire font, but maybe blocks of 128 glyphs at a time. If you’re not doing sub-pixel AA (which seems to have gone out of fashion these days), it’s 8 bits per pixel. I’d guess a typical character size is no more than 50x50 pixels, so that’s around 300 KiB per block. You’d need quite a lot of blocks to make a noticeable dent in the > 1GiB of GPU memory on a modern system. Possibly less if you render individual glyphs as needed into larger blocks (maybe the ff ligature is the only one that you need in that 128-character range, for example). I’d be really surprised if this used up more than a few tens of MiBs, but you’ve probably done the actual experiments so I’d be very curious what the numbers are.
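
                  The estimate above checks out as rough arithmetic (using the hypothetical 50×50 px glyph size and 8-bit coverage assumed in the comment):

```python
# Atlas-size napkin math: 128 glyphs, 50x50 px each, 1 byte per pixel
# (8-bit alpha coverage, no sub-pixel AA).
glyphs, width, height, bytes_per_px = 128, 50, 50, 1
block_bytes = glyphs * width * height * bytes_per_px
print(block_bytes / 1024)          # 312.5 KiB per 128-glyph block

# Even ~100 such blocks (several scripts, sizes and styles) land
# around 30 MiB -- small next to >1 GiB of GPU memory.
print(100 * block_bytes / 2**20)   # ~30.5 MiB
```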

                  That’s actually the setup (albeit there’s work to be done specifically in the feedback / shaping / substitution area) done in arcan-tui. The initial connection populates font slots and preferred size with a rough ‘how does this fit a monospaced grid w/h’ hint. Clients using the same drawing properties share a glyph cache. We’re not even at the atlas (or worse, SDF) stage, yet the savings are substantial.

                  That sounds like an interesting set of optimisations. Can you quantify ‘substantial’ at all? Do you know if Quartz does anything similar? I suspect it’s a bit tricky if you’ve got multiple rounds of compositing, since you need to render text to some texture that the app then renders into a window (possibly via multiple rounds of render-to-texture) that the compositor composes onto the final display. How does Arcan handle this? And how does it play with the network transparency?

                  I recall seeing a paper from MSR at SIGGRAPH around 2005ish that rendered fonts entirely on the GPU by turning each bezier curve into two triangles (formed from the four control points) and then using a pixel shader to fill them with transparent or coloured pixels on rendering. That always seemed like a better approach since you just stored a fairly small vertex list per glyph, rather than a bitmap per glyph per size, but I’m not aware of any rendering system actually using this approach. Do you know why not? I presume things like font hinting made it a bit more complex than the cases that the paper handled, but they showed some very impressive performance numbers back then.

                  1. 3

                    I’m quite surprised by this. I’d assume you wouldn’t render an entire font, but maybe blocks of 128 glyphs at a time. If you’re not doing sub-pixel AA (which seems to have gone out of fashion these days), it’s 8 bits per pixel.

                    You could’ve gotten away with an alpha-coverage-only 8-bit texture had it not been for those little emoji fellows; someone gave acid to the LOGO turtles and now it’s all technicolour rainbow – so full RGBA it is. While it is formally not a requirement anymore, there are old GPUs around and you can still get a noticeable difference when textures are a nice power-of-two (POT), so you align to that as well. Then come the quality nuances when rendering scaled: since accessibility tools zoom in and out, you want those to look pretty and not alias or shimmer too badly. The better way for that is still mip-mapping, so there is a point to rasterising at a higher resolution, flipping that mipmap toggle and having the GPU sort out which sampling level to use.

                    That sounds like an interesting set of optimisations. Can you quantify ‘substantial’ at all? Do you know if Quartz does anything similar?

                    There was already a big leap for the TUI cases in not having W*H*BPP*2 or so pixels to juggle around, render-to-texture or buffer-to-texture and pass onwards (that could be another *4 because of GPU pipelines and locking semantics: you easily get drawing-to, in-flight, queued, presenting).

                    The rest was that the font rendering code we have is mediocre (it was 2003 and all that…) and some choices that don’t fit here. We cache on fonts, then the rasterizer caches on resolved glyphs, and the outliner/shaper caches on glyph lookup. I don’t have the numbers available, but at napkin level I got it to around 50-75% overhead versus the uncompressed size of the font. Multiply that by the number of windows open (I drift towards the upper two digits of active CLI shells).

                    The size of a TPACK cell is somewhere around 8 bytes or so, using UCS4 even (you already needed the 32 bits due to having font-index addressing for literal substitution), then add some per-line headers. It also does I and P frames, so certain changes (albeit not scrolling yet) are more compact. I opted against trying to be overly tightly packed, as that has punished people in the past, and for the network case ZSTD just chews that up into nothing. It’s also nice having an annotation-compact, text-only intermediate representation to juggle around. We have some subprojects about to leverage that.

                    Do you know if Quartz does anything similar? I suspect it’s a bit tricky if you’ve got multiple rounds of compositing, since you need to render text to some texture that the app then renders into a window (possibly via multiple rounds of render-to-texture) that the compositor composes onto the final display. How does Arcan handle this? And how does it play with the network transparency?

                    I don’t remember what Quartz did, or how their current *Kits work, sorry.

                    For Arcan itself it gets much more complicated and is a larger story, as we are also our own intermediate representation for UI components and nest recursively. The venerable format-string-based ‘render_text’ call at the Lua layer forces text rasterisation locally, as some genius thought it a good idea to allow arbitrary embedding of images and other video objects. There’s a long checklist of things to clean up, but that’s after I close down the network track. Thankfully a much more plastic youngling is poking around in those parts.

                    Speaking of networking – depending on the network conditions we outperform SSH when it starts to sting. The backpressure from things like ‘find /’ or ‘cat /dev/random’ resolves and renders locally and with actual synch in the protocol you have control over tearing.

                    I recall seeing a paper from MSR at SIGGRAPH around 2005ish that rendered fonts entirely on the GPU by turning each bezier curve into two triangles (formed from the four control points) and then using a pixel shader to fill them with transparent or coloured pixels on rendering.

                    AFAIR @moonchild has researched this more than me as to the current glowing standards. Back in ‘05 there was still a struggle getting the text part to behave, especially in 3D. Weighted channel-based hinting was much more useful for tolerable quality as well, and that was easier as a raster preprocess. Eventually Valve set the standard with SDFs, which is still(?) the dominant solution today (it recently made its way natively into FreeType), along with quality optimisations like multi-channel SDFs.

                    1. 1

                      Thanks. I’m more curious about the absolute sizes than the relative savings. Even with emoji, I wouldn’t expect it to be a huge proportion of video memory on a modern system (even my 10-year-old laptop has 2 GiB of video memory). I guess it’s more relevant on mobile devices, which may have only this much total memory.

                      1. 1

                        I will try and remember to actually measure those bits myself, can’t find the thread where C-pharius posted it on Discord because well, Discord.

                        The savings are even more relevant if you hope to either a. at least drive some machines from an FPGA’d DIY graphics adapter instead of the modern monstrosities, b. accept a 10-15 year rollback in terms of available compute should certain conflicts escalate, or c. try to consolidate GPU processing to a few victim machines or even VMs (though the latter are problematic, see below) – all of which I eventually hope for.

                        I layered things such that the Lua API looks like a balance between ‘animated display postscript’ and ‘basic for graphics’ so that packing the calls in a wire format is doable and asynchronous enough for de-coupling. The internal graphics pipeline also goes through an intermediate-representation layer intended for a wire format before that gets translated to GL calls for the same reason – at any time, these two critical junctions (+ the clients themselves) cannot be assumed/relied upon to be running on the same device / security domain.

                        Public security researchers (CVE/bounty hunters) have in my experience been pack animals as far as targeting goes. Mobile GPUs barely did their normal job correctly, and absolutely not securely, for a long time, and little to nothing could be heard. From DRM (as in corporate malware) unfriendly friends I’ve heard of continuous success bindiffing Nvidia blobs. Fast > Features > Correct > Secure seems generally to be the priority.

                        With DRM (as in direct rendering manager) the same codebase hits the BSDs and Linux alike, and for any VM compartmentation, VirGL cuts through it. The whole setup is massive. It evolves at a glacial pace and its main job is different forms of memcpy where the rules for src, dst, size and what happens to the data in transit are murky at best. “Wayland” (as it is apparently now the common intersection for several bad IPC systems) alone would’ve had CVEs coming out the wazoo had there been an actual culture around it; we are still waiting around for conformance tests, much less anything requiring more hygiene. Fuzzing is non-existent. I am plenty sure there are people harvesting and filling their barns.

                      2. 1

                        An amusing related curiosity I ran across while revisiting a few notes on some related topic: https://cgit.freedesktop.org/wayland/wayland/tree/NOTES?id=33a52bd07d28853dbdc19a1426be45f17e573c6b

                        “How do apps share the glyph cache?”

                        That’s the notes from the first Wayland commit covering their design axioms. Seems like they never figured that one out :-)

          2. 3

            Ah, that makes sense, thanks. I’m definitely sympathetic to the first problem.

        2. 1

          With irssi I’m using GPU acceleration, because my terminal emulator is OpenGL-based.

      2. 4
        1. 1

          Sadly I’m blocked by no support for SSO

          1. 4

            as your link says:

            Account creation and SSO will come with OIDC. OIDC will come in September.

            the code’s there and works; it just needs to be released and documented. NB that shifting to native OIDC will be a slightly painful migration though; some of the old auth features may disappear until reimplemented in native OIDC, which may or may not be a problem for you.

      3. 4

        If you’re on Android, note that an early release of Element X just hit the Play Store yesterday: https://element.io/blog/element-x-android-preview/.

    8. 7

      There’s a whole class of complaints around the inability to reliably delete stuff on remote computers we don’t control… and… well… nobody can reliably delete stuff on remote computers they don’t control. I mean how could they?

      Why complain about this inevitability as if it was Matrix’s fault?

    9. 5

      Biggest issue I’ve had with Matrix is that it’s extremely slow to sync when I haven’t used it in a while, which then decreases my likelihood to use it again, which means the next time I sign in I’ll have the same problem. It’s a cycle.

      1. 5

        this is being fixed in element x with “sliding sync”

        1. 1

          Can a server sliding sync with the ‘main’ server or is this just clients?

          1. 4

            Sliding sync is just for the client-server API, doesn’t affect server-server API, AFAIK

            1. 1

              Bummer. Then the issue of needing to duplicate the entire history isn’t solved for servers. This makes self-hosting an expensive endeavor if you want to join some big groups unless you want to constantly combat the storage issues with scripts as an admin.

              1. 7

                Matrix has never needed to replicate the entire history when you join a room. It does the last ~20 messages and then pulls in others on demand. It does however need to replicate the “room state” - i.e. who’s in it, what the room permissions are, etc. We switched to lazyloading for this back in ~March; the project was imaginatively called Faster Remote Room Joins and has shipped in Synapse. (It doesn’t lazyload as much as it could or should, but the infrastructure is there now).
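
                The on-demand pull described above maps to the client-server `/messages` pagination endpoint. A hedged sketch of how a client pages backwards through history only when needed (the homeserver URL and room ID are placeholders):

```python
# Build a paginated history request against the Matrix client-server
# API (GET /_matrix/client/v3/rooms/{roomId}/messages). Only the URL
# construction is shown; auth headers etc. are omitted.
from typing import Optional
from urllib.parse import quote, urlencode

def messages_url(homeserver: str, room_id: str,
                 from_token: Optional[str] = None,
                 limit: int = 20) -> str:
    # dir=b pages backwards in time from the most recent events;
    # `from` resumes from the `end` token of the previous response.
    params = {"dir": "b", "limit": limit}
    if from_token:
        params["from"] = from_token
    return (f"{homeserver}/_matrix/client/v3/rooms/"
            f"{quote(room_id)}/messages?{urlencode(params)}")

url = messages_url("https://example.org", "!room:example.org")
# Each response includes an `end` token; feeding it back as from_token
# fetches the next (older) page, so full history is never required.
```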

                1. 1

                  That is great to hear. I’ve been hearing differently from other sources when I asked in the past.

              2. 3

                IIRC federation doesn’t need to duplicate the entire history. It can fetch old messages from another server as needed. But state events are needed so that the currently valid auth state can be resolved.

    10. 4

      I’m not saying any of this is untrue, but I think a lot of it has to do with scale and certain setup, even if those certain setups might be a majority of Matrix usage.

      I’m part of a group of people who have switched to Matrix from IRC and we’re still 95% happy, many years after. It might count as a basic setup: the biggest room has 20 people, and the rooms have all been created on one server, but people have joined from anywhere, including their own one-person homeservers.

      So for me it is still the perfect replacement for any of these: IRC, Slack, Discord, Signal, WhatsApp, XMPP, Twitter, Fediverse, Telegram

      Stuff I know of but have not tried in a meaningful capacity: Rocket Chat, Mattermost, Zulip

      The only times I joined huge public rooms it didn’t feel great, but it’s been a while since I’ve been active there. Guess I should try again to get a better feeling, but I’m not here to defend Matrix, I’m just saying despite all flaws I may or may not notice, it’s great for this.

      1. 1

        I’m part of a group of people who have switched to Matrix from IRC

        Why?

        1. 5

          I’m not ~wink, but at one of my companies we switched to Matrix from IRC because IRC was an atrocious experience for non-technical people and on mobile. It’s been running for years, non-federated, without major issues.

          1. 3

            That’s why we switched from IRC to XMPP to Zulip. Zulip is overwhelmingly good for both tech and non-tech people, as long as you don’t need any federation.

            Federation would be very useful.

            1. 2

              Zulip is by far my favourite of these kinds of things. The hybrid forum design is a bit strange, but it’s just a really useful model.

              The main downsides for me are that its screen sharing, streaming, video call and voice comms experience is worse than discord (zulip just does it through jitsi, which is fine when it works but is more fragile) and the presentation style is less fun than discord’s.

              OTOH, voice call through discord isn’t great anyway. Real phone calls, teamspeak3 and mumble have noticeably better latency and quality.

          2. 3

            Web and mobile IRC clients exist, but I agree that they are (mostly) slightly harder to use than Matrix. But signing up is much easier IMO.

            1. 5

              It seems it’d be really useful to adopt a Discord- or Matrix-like UX for some IRC clients. IIRC IRCCloud does this, but I might be wrong. Server icons and whatnot could easily be added to IRCv3 as a capability.

              I think right now there’s a complete lack of IRC developers.

              1. 8

                I know a bunch of the people who used to do IRCv3, and they gave up with how little progress was made. I think IRCv3 had its chance, and that chance is gone.

                1. 2

                  Yeah I heard v3 has been stagnant / hard to move… :/

                  Do we need a more natural approach? Like have an IRC server implement a capability, and with popularity, becomes adopted by others? Might be better. Could be a fun discussion elsewhere.

                  1. 2

                    IRCv3 is already based on capabilities like these to allow progressive enhancement. See the support tables: https://ircv3.net/software/servers.html and https://ircv3.net/software/clients.html

                2. 2

                  I know a bunch of the people who used to do IRCv3, and they gave up with how little progress was made.

                  Any idea why it was so hard?

                  1. 3

                    I’m an active contributor to IRCv3 and honestly I don’t know. My guess is that anyone who cares about frontend development moved on to platforms with more modern frontends, and few people volunteer to work on the GUI clients.

        2. 5

          You’d have to ask the others for specifics, I’m one of the very few holdouts who still use IRC, in my case most days. I’d say the usual stuff: no history without a bouncer, no media sharing.

          It was a channel of people who’d known each other in RL for years, plus a few additions. From time to time someone vanished, and at some point we gave up the channel. Matrix managed to reignite interest; my guess is because it has a decent mobile app and is not horrible on a computer. Or maybe the time was right and I’m just projecting, but unlike with other (small) communities, stuff like Facebook or WhatsApp would never have worked out with this crowd.

          1. 2

            In theory, IRCv3 supports server-side chat histories: https://ircv3.net/irc/#chathistory

            I agree about the media sharing (and the mobile apps) though. Android has a few decent IRC apps, iOS does as well, but they’re clunky in some aspects.

            1. 9

              Good luck finding a server and client pair that supports any of the new features. That’s the biggest problem with the IRCv3 ecosystem.

            2. 8

              My biggest problem with IRC on mobile is the horrendous battery life problems because you need to maintain a persistent connection that’s constantly sending you data you may or may not want. That needs deep, serious protocol-level changes to fix and those are simply never going to happen. That, or you need a server-side support component that probably doesn’t speak IRC to the mobile device.

              1. 3

                My experience with Matrix on mobile has been horrendous battery life problems. But I guess those are app issues, not protocol issues.

              2. 1

                What’s “horrendous” for you, and what is “data you […] may not want” that gets sent for IRC but not other IM systems?

                From looking at the system battery stats and Accubattery, Quassel on my Android 12 device has used 10%, but I have spent a lot of time in the app with screen on so I’d guess the background battery use part is <5%. Not great, not terrible. (Yes, Quassel uses its own protocol to talk to its bouncer aka “core”, but the “persistent connection” part still applies.)

    11. 4

      I really want Matrix to be a viable option for safe, private hosting of small communities. There really, really needs to be a solid non-corporate option, there’s been a lot of distressing stuff going on with the corporate chat platforms lately.

      It is not a viable option, and this essay lists a large slice of the serious issues that would have to be addressed for me to see it as one. Furthermore most of these issues are baked into the spec and cannot be changed at this point.

      1. 7

        Many of these issues are either non-issues or grossly overstated; guess I’ll have to go through responding to them. Meanwhile, the spec is mutable (we’re on room version 11 already, and pretty much everything can change between room versions), and we’ve been steadily fixing stuff over the years (including some of the stuff incorrectly flagged here as still being problematic).

        1. 5

          you are of course welcome to reply here, and I want to say that I support your project and I very much want it to be better than it is, but like, I’ve dug into the details of a lot of this myself. wishing it to be better doesn’t actually make it better. I hope that, as you suggest, your migration path is good enough to get out of this hole. the world will be significantly brighter if you do someday.

    12. 3

      Agree with quite a few of these points, but I see the lack of deniability in chat as a feature. Also, malicious servers not deleting events is a very small problem if you consider that any client could keep history as well. If you are going to say something that you will regret, definitely don’t write it down.

      1. 6

        Ironically, Matrix has deniability. Encrypted messages aren’t signed; there’s no way to prove that the other user in the conversation didn’t spoof the transcript if they could have colluded with the server to put a copy of the spoofed transcript there too.

        1. 3

          Encrypted messages are signed by the sender’s homeserver. With the move to Matrix P2P this will be equivalent to being signed by the sender’s client.

        2. [Comment removed by author]