1. 42
  1. 9

    Very interesting, but I would love to hear more details on why Matrix is not as scalable. They hint at the merge operations but I don’t understand why that is a problem.

    1. 18

      We’d like to know too :) Matrix as protocol isn’t inherently unscalable at all. It’s true that every time you send a message in Matrix you effectively are merging the state of one chatroom with the state of another one - very similar to how you push a commit in Git. Generally this is trivial, but if there’s a merge conflict, it’s heavier to resolve. The Synapse Python implementation was historically terrible at this, but has been optimised a lot in the last 12 months. The Dendrite Go implementation is pretty fast too.

      There’s an interesting optimisation that we designed back in 2018 where you incrementally resolve state (so called ‘delta state res’), where you only resolve the state which has changed rather than considering all the room state (i.e. all key-value pairs of data associated with the room) en masse. https://matrix.org/_matrix/media/v1/download/jki.re/ubNfLtrmXZMmlGjJZYPnlHHy and https://github.com/matrix-org/synapse/pull/3122 give a bit of an idea of how that works. It would be really cool if Process One is doing something like that with ejabberd, but in practice we suspect that they’ve just done an efficient implementation of the current state res algorithm. We’ve pinged them on Twitter to see if they want to discuss what they’re up to :) https://twitter.com/matrixdotorg/status/1580549591807975430
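
      To illustrate the difference between resolving all room state and resolving only what changed, here is a toy sketch. It is not Synapse's state res algorithm: the real one weighs power levels, auth rules and event ordering, whereas `pick_winner`, `resolve_full` and `resolve_delta` are hypothetical names with a made-up tie-break rule. The only point is that the delta variant touches just the keys that changed.

```python
from typing import Dict, Tuple

StateKey = Tuple[str, str]    # (event type, state key)
State = Dict[StateKey, str]   # maps each key to the winning event ID


def pick_winner(a: str, b: str) -> str:
    """Hypothetical conflict rule (assumption): the larger event ID wins."""
    return max(a, b)


def resolve_full(ours: State, theirs: State) -> State:
    """Visit every key in both snapshots; cost grows with total room state."""
    merged: State = {}
    for key in ours.keys() | theirs.keys():
        if key in ours and key in theirs:
            merged[key] = pick_winner(ours[key], theirs[key])
        else:
            merged[key] = ours[key] if key in ours else theirs[key]
    return merged


def resolve_delta(resolved: State, delta: State) -> State:
    """Update `resolved` in place, visiting only the keys present in `delta`.

    Returns the entries whose winner actually changed.
    """
    changed: State = {}
    for key, event_id in delta.items():
        current = resolved.get(key)
        winner = event_id if current is None else pick_winner(current, event_id)
        if winner != current:
            resolved[key] = winner
            changed[key] = winner
    return changed


ours = {
    ("m.room.name", ""): "$a1",
    ("m.room.topic", ""): "$b1",
    ("m.room.member", "@alice:example.org"): "$c1",
}
theirs = dict(ours)
theirs[("m.room.topic", "")] = "$b2"   # the only key that differs

full = resolve_full(ours, theirs)

incremental = dict(ours)
changed = resolve_delta(incremental, {("m.room.topic", ""): "$b2"})
```

      Both paths end in the same state here, but `resolve_delta` looks at one key while `resolve_full` walks all three. In a room with tens of thousands of membership entries that gap is the whole optimisation.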

      1. 11

        There’s an interesting optimisation that we designed back in 2018 where you incrementally resolve state

        Is it really so hard to see why a protocol that cares about conversation state is more difficult to scale than a protocol that completely ignores it? Seems almost tautological to me.

        1. 15

          Matrix is certainly more complex to scale (as our inefficient first gen implementations demonstrated), but I think folks are conflating “it’s complex to write an efficient implementation” with “it doesn’t scale”. It’s like pointing out that writing an efficient Git implementation is harder than writing an efficient CVS implementation; hardly surprising given the difference in semantics.

          In practice, you can definitely write a Matrix implementation where all operations (joining, sending, receiving, etc) are O(1) per destination, and don’t scale with the amount of state (i.e. key value pairs) in a room. And to be clear, Matrix never scales with the amount of history in a room; history is always lazyloaded so it doesn’t matter how much scrollback there is.

          Historically, joining rooms in Matrix was O(N) in the number of users in that room, but we’ve recently fixed this with “faster remote joins”, which allows the room state to get lazily synced in the background, thus making it O(1) in the size of the room, as it should be. https://github.com/matrix-org/matrix.org/blob/80b36d13c3097ffb5ba33572d9011e71940f1486/gatsby/content/blog/2022/10/2022-10-04-faster-joins.mdx is a shortly-to-be-published blog post giving more context, fwiw.

          1. 9

            The post doesn’t say “Matrix doesn’t scale”, just that XMPP and MQTT scale better. This is because they’re solving dramatically simpler problems. I don’t see any problems with that claim.

            1. 4

              As an aside, from that draft,

              whereas it used to take upwards of 12 minutes to join Matrix HQ […] this is now down to about 30 seconds (and we’re confident that we can reduce this even further).

              Holy cow they did it! Woo! So proud of the Synapse team :)

              1. 2

                On the technical side, that’s genuinely impressive work. On the product side, I can’t help but compare with iMessage, Signal, WhatsApp and Discord being closer to one second.

                1. 3

                  the target is indeed <1s, and should still be viable. we’ve shaved the number of events needed to join #matrix:matrix.org from ~70K to ~148 iirc, which should be transferred rapidly.

          2. 11

            We’d like to know too :) Matrix as protocol isn’t inherently unscalable at all

            I suspect that this is a question of relative scale. A lot of users of ejabberd are using it as a messaging bus rather than a chat protocol, so sending a message is likely to be on the order of a few hundred BEAM VM instructions. This is especially true of MQTT, where you don’t have the XML parsing overhead of XMPP and you can parse the packet entirely with Erlang pattern matching. If it’s a deferred message then you may write to Mnesia, but otherwise it’s very fast. In contrast, something that keeps persistent state and does server-side merging is incredibly heavy. That doesn’t mean that it isn’t the right trade-off for the group collaboration scale, but it definitely means that you wouldn’t want to use Matrix as the control plane for a massively networked factory, for example.
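
            To give a feel for how little work MQTT framing needs, here is a minimal sketch of decoding the MQTT fixed header: packet type and flags in the first byte, then a variable-length "remaining length" of up to four bytes. It is written in Python rather than Erlang, and `parse_fixed_header` is a hypothetical helper, not ejabberd code; Erlang would express the same thing as a binary pattern match.

```python
from typing import Tuple


def parse_fixed_header(data: bytes) -> Tuple[int, int, int, int]:
    """Decode an MQTT fixed header.

    Returns (packet_type, flags, remaining_length, header_size).
    """
    if len(data) < 2:
        raise ValueError("need at least two bytes")

    packet_type = data[0] >> 4   # high nibble, e.g. 3 = PUBLISH
    flags = data[0] & 0x0F       # low nibble (DUP / QoS / RETAIN for PUBLISH)

    remaining = 0
    multiplier = 1
    # The remaining length uses 7 bits per byte; the top bit means "more follows".
    for i in range(1, 5):
        if i >= len(data):
            raise ValueError("truncated remaining length")
        byte = data[i]
        remaining += (byte & 0x7F) * multiplier
        if byte & 0x80 == 0:
            return packet_type, flags, remaining, i + 1
        multiplier *= 128

    raise ValueError("remaining length longer than four bytes")


short = parse_fixed_header(bytes([0x30, 0x0A]))        # PUBLISH, 10-byte body
longer = parse_fixed_header(bytes([0x30, 0xC1, 0x02])) # PUBLISH, 321-byte body
```

            There is no tree to build and no state to consult: a couple of shifts and masks, then the payload can be routed. That is the kind of per-message cost being compared against server-side state merging.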

            1. 5

              I guess it will be interesting to benchmark. To use the Git vs. CVS example again, I think it’s possible to have an efficient (but complex) merging system like Git which outperforms a simple “it’s just a set of diffs” VCS. We certainly use Matrix successfully in some places as a general purpose message bus, although when we need faster throughput we typically negotiate a WebRTC datachannel over Matrix (e.g. how thirdroom.io exchanges its world data).

              1. 5

                The analogy isn’t really matched to this context though. SIP or XMPP or MQTT doesn’t involve diffs or storage or really even state in the basic use case, whereas Matrix is always diffs and merges.

                1. 4

                  Also git and CVS are programs and file formats with (roughly) one implementation, whereas MQTT and Matrix are protocols. The semantics of protocols place an upper bound on the efficiency of any potential implementation.

            2. 9

              No one said it was unscalable, just that it was harder. If it takes a dedicated team multiple years and a full reimplementation to scale it, and even just joining a room is still slow, that says something.

              I currently run Dendrite unfederated, in part (though not solely) because I don’t want someone to accidentally bring down my small server by joining a large channel somewhere else. I still think Matrix is a good idea, but “scaling Matrix is hard” should be a pretty uncontroversial statement.

              1. 0

                The OP said “Matrix is not as scalable”. My point is that yes, it’s harder to scale, but the actual scalability is not intrinsically worse. It’s the same complexity (these days), and the constants are not that much worse.

          3. 2

            This is big for XMPP and ejabberd. Curious how Matrix corp will start changing the protocol so that only their (slow and resource hungry) server works. Also they took VC money. Gj ejabberd!

            1. 4

              Why would they do that? It’s not like there was only one matrix server before this announcement. And element (company) is separate from matrix (foundation)

              1. 3

                It’s not like there was only one matrix server before this announcement.

                Yeah, but ejabberd is deployed on thousands of servers and is serving millions of users.

                And element (company) is separate from matrix (foundation)

                Wow, you still believe in the separate foundation with the corp main shareholder O_o

                Once matrix integration is in the community edition, no idea why one would want to install matrix official server or anything else besides ejabberd.

                1. 4

                  There are no shareholders in the foundation. In fact element employees don’t even have the majority vote in the foundation.

            2. 1

              “added” seems very humble.