1. 25
  1. 5

    I was at the original announcement of Matrix at FOSDEM several years back and have been following the project since then. It sounds like they learned some lessons the hard way. I’m glad they’re being transparent about what went wrong and how they’re addressing the failures. Given how rarely conversations like this are exposed to the public, it is very valuable to have a case study like this open for the community to learn from.

    1. 3

      Reminding me it’s free isn’t really a way to be endearing during a post-mortem of an incident like this. Having run Synapse from early on until recently, I can attest that the “bad” practices, albeit being addressed over time, show up in more than just devops for the project.

      It is altruistic, and it is a project with good goals. However, many view it as a panacea, and people are not registering issues and PRs against the project. It still needs a lot of love before folks can run homeservers for their families and friends without them becoming maintenance nightmares.

      1. 13

        Hi storrgie. FWIW, from my perspective, our failure to handle your GH issues is certainly one of the biggest screwups over the last few years on Synapse. Your main one (https://github.com/matrix-org/synapse/issues/2419) has been brought up time and time again; if you recall, I fixed it myself in https://github.com/matrix-org/synapse/pull/2421, only for it to get derailed by overzealous review. I then eventually fixed it again in https://github.com/matrix-org/synapse/pull/5083 a few weeks ago, which this time has been finished off properly and was merged 6 hours ago. For what it’s worth, I can’t think of any other bug in Synapse (or Matrix) that has had such a bumpy ride, but it’s finally been put to bed. It is excruciatingly embarrassing that it took so long, and doubly so that it sounds like it came too late for your use case.

        In terms of adminability of Synapse: the thing is still not at 1.0, thanks to being t-boned by things like the security incident in the original post here. Yes, there are still some major admin challenges (a thin admin API, no admin GUI, memory usage, and room fragmentation being the main ones), but we are still plugging away to fix them. After that, I’m hoping better servers will emerge.

        In terms of reminding people that the matrix.org server is a best-effort free service: the intention was more to justify why we invested our ops time in building out the paid services (to try to keep the project funded) rather than to be endearing or to say ‘you get what you pay for’. Sorry if it jarred.

        Hopefully Matrix will eventually be something you’ll consider running again once we finally escape beta for Synapse.

        1. 3

          > I can attest that the “bad” practices, (…) show up in more than just devops for the project.

          This piqued my interest. Could you expand on what other areas of the project have “bad” practices?

          1. 0

            Until Synapse is replaced with something written in a sane language that isn’t single-threaded and doesn’t have the dreaded GIL, it will not go anywhere.

            Their database schema sucks too.

          2. 2

            > SSH should not be exposed to the general internet

            > We are rolling out a VPN as the main access to dev network

            Isn’t that just trading one thing for another? What makes this mysterious VPN implementation they moved to more secure than a properly set-up SSH server (assuming the other SSH concerns in the article are addressed)?

            They are doing a lot of stuff to harden SSH access and then introducing a completely new remote access interface. That seems like an odd move, given that they didn’t take steps to harden the original interface (SSH) until now.

            1. 2

              Author here. The point is more that the VPN adds defense in depth. Access to hosts is still via SSH (and production access is via SSH + jump boxes), but now you also need to have VPNed in first. The VPN can then be used for accessing other intranet services (e.g. our internal Matrix servers) rather than exposing them to the ’net.

            2. 2

              > Needless to say, SSH is no longer exposed to the general internet. We are rolling out a VPN as the main access to dev network

              I see this often with SSH and RDP, and it baffles me. It’s as if people think VPN services cannot have security bugs, be brute-forced, or otherwise be abused. I have dismantled several VPN solutions that were ‘protecting’ much safer services.

              Bastion hosts, however, are a fine way of reducing the attack surface, and users can have one key for the bastion hosts and another key for the internal services they need. The ProxyJump feature is too often overlooked.
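A minimal sketch of the bastion-host setup described above, assuming OpenSSH 7.3 or later (the release that added `ProxyJump`). The Python below only renders an OpenSSH client config; the host aliases (`bastion`, `app1`), hostnames, user `alice`, key paths, and the `SshHost`/`render_ssh_config` names are all hypothetical placeholders, not anything from the Matrix infrastructure. It shows the one-key-per-hop split: the bastion block uses one key, and the internal host uses a different key and is reached through the bastion via `ProxyJump`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SshHost:
    """One Host block in an OpenSSH client config (hypothetical example values)."""
    alias: str                        # name you type: `ssh <alias>`
    hostname: str                     # real DNS name or IP of the machine
    user: str
    identity_file: str                # private key used for THIS hop only
    proxy_jump: Optional[str] = None  # alias of the bastion to tunnel through


def render_ssh_config(hosts: List[SshHost]) -> str:
    """Render ssh_config text: one key per host, internal hosts reached via ProxyJump."""
    blocks = []
    for h in hosts:
        lines = [
            f"Host {h.alias}",
            f"    HostName {h.hostname}",
            f"    User {h.user}",
            f"    IdentityFile {h.identity_file}",
            # Offer only the configured key, not every key loaded in ssh-agent.
            "    IdentitiesOnly yes",
        ]
        if h.proxy_jump:
            # The client tunnels through the bastion and authenticates to the
            # internal host end to end, so the internal key never sits on the bastion.
            lines.append(f"    ProxyJump {h.proxy_jump}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    hosts = [
        SshHost("bastion", "bastion.example.org", "alice", "~/.ssh/id_ed25519_bastion"),
        SshHost("app1", "app1.internal", "alice", "~/.ssh/id_ed25519_internal",
                proxy_jump="bastion"),
    ]
    print(render_ssh_config(hosts), end="")
```

With the generated text in `~/.ssh/config`, `ssh app1` would hop through the bastion automatically, and the bastion connection picks up its own `Host bastion` block (and therefore its own key). For a one-off connection, OpenSSH's `-J` flag (for example `ssh -J alice@bastion.example.org alice@app1.internal`) performs a similar hop without a config file. Unlike agent forwarding, this approach never exposes the internal key or agent to the bastion.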