1. 29
  1.  

  2. 4

    “under thirty (virtual) servers, a very few large PostgreSQL database servers and a single file-server/storage”

    I’d love to see the back of the envelope math behind the estimation that you could probably fit 6 million users into this. I’m not saying this claim is wrong or anything, just that the capacity planning would be interesting.

    1. 4

      Not directly an answer to your question, but years ago I found this post about SO infrastructure quite interesting. Certainly, a Relational DBMS can get you quite far.

      1. 1

        I should re-read that now that very fast SSDs are cheap and commonplace instead of being an unusual expensive investment.

        Edit: having reread it, oh gosh remember when a TB of SSD was tremendously expensive? Dang. The detail that stuck out to me was their having to have overkill LAN bandwidth to support backups and other similar bulk data transfers.

    2. 4

      Scattered thoughts:

      • Sending a post to a few thousand federated servers feels like a lot, but is going to be dwarfed by sending posts from servers to end users. As long as there are far fewer nodes than users, iffy scaling in that network is just not as big a problem.
      • The “wasted” traffic is content sent server-to-server that no user ever views: user1@A is subscribed to user2@B’s posts, but they don’t happen to log in at a time they’d ever see this particular post.
      • Even that traffic isn’t that much, given the small posts, without media. Federating media serving is possible in theory: you embed a link to your buddy’s server or a mutually trusted CDN like you’d embed a YouTube video. If no operator pays another for the cost, somebody is subsidizing somebody else, but that doesn’t make it impossible.
        • Nobody is going to do a Twitter clone’s media serving as a charity but if some CDN wanted to get some attention now, they could do worse than a first-taste-free kind of deal on CDN’ing for nodes in this alternative social network that’s growing like wild.
      • It seems worth thinking about what the next couple steps at the data layer are when a node gets too big to be backed by One Big RDBMS (“next couple steps” very different from “rebuilding Twitter”). Caching everywhere? Partitioning old data off? Super-optimized timeline updates when you follow thousands of users of whom only 10 posted since your last check?
        • Big Nodes™️ presumably get some economies of scale the small ones don’t, for all the reasons being discussed.
        • I don’t think having a lot of users on Big Nodes breaks the federation idea. Even a dozen big nodes is different from one Facebook or Twitter. And it’s plausible lots of people pick the easy big-node route and a few pick the option that gives them the most control, like in lots of other areas.
      • It kind of sounds as if optimizing the server code itself would really be useful right now? Would be fun to work on GoToSocial at a time that more-efficient frontends could really help some folks.
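To make the "follow thousands of users of whom only 10 posted since your last check" idea from the list above a bit more concrete, here is a minimal sketch. It assumes the node keeps a small per-author "last posted at" index; the function name and data shapes are hypothetical and not taken from Mastodon or GoToSocial.

```python
from typing import Dict, Iterable, List


def authors_with_new_posts(
    following: Iterable[str],
    last_post_at: Dict[str, float],
    last_check: float,
) -> List[str]:
    """Return the followed authors who posted after `last_check`.

    `last_post_at` is a hypothetical index mapping author -> timestamp of
    their newest post. Authors missing from the index are treated as
    never having posted.
    """
    never = float("-inf")
    return sorted(
        author
        for author in following
        if last_post_at.get(author, never) > last_check
    )


# Only the authors returned here need a real "fetch posts" query;
# everyone else can be skipped when rebuilding the timeline.
```

The point of the sketch is only that the expensive per-author query runs for the handful of active authors instead of for everyone followed; a real server would likely keep this index in the database or a cache.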

      When looking for Twitter alternatives started to get real, I got an account on Cohost. It’s fun and quirky and I like the Tumblr-ish design, community, and the HTML and CSS crimes, and I figured when I signed up that one team running one instance of the app might have less trouble scaling. All the nice design is still nice, but Mastodon, partly by virtue of being around a while, is just running better for me and has great tools for, say, finding people, which makes it hard for me to count it out despite arguments about what theoretically should or shouldn’t work.

      1. 8

        Some of the criticisms are weird because something like Twitter or Facebook is also a distributed system. The difference is that it’s one where the topology is (mostly) under the control of a single organisation. It’s not like there’s one big computer somewhere that runs Twitter, there are a bunch of data centers that all run instances of parts of the logical ‘Twitter server’.

        Use cases like Twitter are perfect for distributed systems for a few key reasons:

        • Very weak consistency is fine: no one really cares if two people send two messages to other people and they arrive in the wrong order.
        • Eventual consistency is also fine - no one cares if two people look at the same feed and one sees an older snapshot.
        • Latency that users care about is measured in minutes (or, at most, tens of seconds), not milliseconds.

        All of this is very different from the ‘economies of scale’ argument. You can benefit a lot from economies of scale from other people by buying cloud services rather than running your own. Mastodon does this, optionally using cloud storage for large data. I’m not sure if it proxies these back, but (at least with Azure storage, not sure about other services, but I presume they’re similar) you can trivially mint short-lived tokens that allow a user to read some resource, so you can punt serving things like images and videos to a cloud service and just have to give them updated URLs periodically. You could implement most of the control messaging with Azure Functions or AWS Lambda and probably end up with something where anyone can deploy an instance but the cost is much lower than needing to host a full VM.
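As a rough illustration of the short-lived token idea above, here is a generic HMAC-signed URL sketch. It is not the Azure SAS format or any real cloud API; the secret, the parameter names (`expires`, `sig`) and both function names are assumptions made up for the example.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical shared secret, for illustration only


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and expiry time, hex encoded."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return `path` with an expiry time and signature appended."""
    current = time.time() if now is None else now
    expires = int(current) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, _signature(parsed.path, expires))
```

The instance would hand out a fresh `sign_url("/media/cat.png", 300)` whenever a client needs the image, and whatever serves the media only has to run `verify_url`, so the instance itself never streams the bytes.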

        “And it does this over HTTP, hence TCP/IP. All of which is chatty and relatively inefficient.”

        This is not necessarily true with HTTP/3. Then each s2s connection can be a single long-lived QUIC connection and each request can be a separate stream. The overhead there is very low.

        My biggest issue with Mastodon is scalability in the opposite direction. If I want to run a single-user instance, or one for my family, then the cost is very high per user. I’d love to see a ‘cloud native’ (I hate that phrase) version that I could deploy for $1/user/year on any major cloud provider.

        1. 1

          Scaling down is a way harder problem than scaling up. For one, you basically can’t get a VPS for less than ~$5/month, and that number has been pretty much constant for as long as I can remember. That immediately puts the minimum number of users for the limit you give at 60. You’d need instances at least 10 times cheaper than what is currently available to make it work.
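The arithmetic behind the "60 users" figure, as a small sketch. The $5/month and $1/user/year numbers come from the thread; the function names are just for illustration.

```python
import math


def min_users_for_target(server_cost_per_month: float,
                         target_per_user_per_year: float) -> int:
    """Smallest number of users that brings the per-user yearly cost
    of one server down to the target."""
    yearly_cost = server_cost_per_month * 12
    return math.ceil(yearly_cost / target_per_user_per_year)


def max_server_cost_per_month(users: int,
                              target_per_user_per_year: float) -> float:
    """Highest monthly server price that still meets the target
    for a given number of users."""
    return users * target_per_user_per_year / 12


print(min_users_for_target(5.0, 1.0))  # 60
```

Run the other way, `max_server_cost_per_month(1, 1.0)` shows what a single-user instance could afford to pay per month at that target.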

          If you want to go the serverless way, I don’t think you’d get much savings there either. My single-user instance handles about 31k incoming requests per day, which adds up to close to a million requests per month. That does not include outgoing requests, which for me have averaged roughly 500 per day, ignoring retries. So I’d say that’s at least 1 million function invocations per month, which is exactly the number of invocations that AWS offers for free on Lambda. But then you also need to add object storage for media (grows depending on usage, I’d guess ~$0.10/month after a while), queuing for outgoing and some incoming requests (would probably fit into SQS’s 1M message limit), and, importantly, the main database. For maximum cheapness that would probably be DynamoDB. You might be able to fit into its free tier limits, but I’m not that sure, because some operations commonly done by fediverse servers aren’t that efficient on key-value databases.
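A quick sanity check of the request numbers above. The 31k and 500 per-day figures are from the comment; the 30-day month and the one-invocation-per-request mapping are assumptions, and retries are left out just as in the text.

```python
INCOMING_PER_DAY = 31_000   # from the comment
OUTGOING_PER_DAY = 500      # from the comment, ignoring retries
DAYS_PER_MONTH = 30         # assumption
FREE_INVOCATIONS = 1_000_000  # free monthly invocations mentioned in the comment


def monthly_invocations(incoming_per_day: int,
                        outgoing_per_day: int,
                        days: int = DAYS_PER_MONTH) -> int:
    """Assume one function invocation per request, incoming or outgoing."""
    return (incoming_per_day + outgoing_per_day) * days


total = monthly_invocations(INCOMING_PER_DAY, OUTGOING_PER_DAY)
print(total)                     # 945000
print(FREE_INVOCATIONS - total)  # 55000
```

So before retries the instance sits just under the free allowance, and any retry traffic or a 31-day month eats into the remaining headroom quickly.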

          So, you could probably fit it into the free limits of various cloud providers. But if you took the free limits away, I’m fairly certain you’d soon see the costs grow way higher than the $5/month of a VPS.

          1. 1

            “Scaling down is a way harder problem than scaling up. For one, you basically can’t get a VPS for less than ~$5/month - and that number has been pretty much constant for as long as I can remember”

            This is why you don’t implement it as something that needs a VM.

            “So I’d say, that’s at least 1 million function invocations per month”

            Azure Functions cost $0.000016/GB-s of memory and $0.20 per million executions, so I’d expect your usage to come in below $1/month.
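A back-of-the-envelope version of that estimate. The two prices are the ones quoted above and the request counts come from the parent comment; the memory size and run time per invocation are pure assumptions, and any free grant is ignored.

```python
PRICE_PER_GB_SECOND = 0.000016        # quoted above
PRICE_PER_MILLION_EXECUTIONS = 0.20   # quoted above

# Assumptions, not measurements:
MEMORY_GB = 0.125          # 128 MB per invocation
SECONDS_PER_CALL = 0.2     # 200 ms average run time
DAYS_PER_MONTH = 30


def monthly_cost(executions: int,
                 memory_gb: float = MEMORY_GB,
                 seconds_per_call: float = SECONDS_PER_CALL) -> float:
    """Compute charge (GB-seconds) plus per-execution charge, in dollars."""
    gb_seconds = executions * memory_gb * seconds_per_call
    compute = gb_seconds * PRICE_PER_GB_SECOND
    per_call = executions / 1_000_000 * PRICE_PER_MILLION_EXECUTIONS
    return compute + per_call


# 31k incoming plus 500 outgoing requests per day, from the parent comment.
executions = (31_000 + 500) * DAYS_PER_MONTH
print(f"${monthly_cost(executions):.2f} per month")
```

Under these assumptions the total stays under a dollar, but it scales linearly with both memory and run time, so a handler that needs 1 GB or several seconds per request would change the picture.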

            “But then you also need to add object storage for media (grows depending on usage, I’d guess ~0.1$/month after a while)”

            “and importantly, the main database, which for maximum cheapness, would probably be DynamoDB, and I think you might be able to fit into free tier limits, but I’m not that sure because some operations commonly done by fediverse servers aren’t that efficient on key-value databases.”

            Azure Data Lake has some rich querying interfaces and charges per-GB. I presume AWS has something similar.

            1. 1

              I don’t think Azure Data Lake is fit for this purpose. I mostly see it mentioned as big-data storage, meant for OLAP workloads rather than OLTP workloads. I think Cosmos DB would be the better example on Azure.

        2. 1

          Good assumptions but need data. :-P

        3. 3

          I had a lot of thoughts about “Scaling [the fediverse] is impossible” which I dumped on my fedi account, but in summary:

          • I don’t think “distrust of governments/taxation” or “cryptocurrency” are the main drivers for interest in decentralization. I think the majority of us are disillusioned by large corporations with no oversight and terrible intentions (Twitter, Facebook) and believe that we can do better with small trust-based communities cooperating toward a larger goal: that’s federation.

          • Moderation, human relationships, and managing social interactions are hard. I don’t think a centralized service like Twitter solved any of those problems. It just handed the authority to a group of investment bankers who (shock!) turned out to make bad decisions on your behalf, with no recourse.

          • We’ve had federated online systems before: BBSs and usenet, at the very least. They don’t require everyone to run a server, and they don’t require everyone to be good at moderating. They build on “centralization in the small”: your town may only have a few BBSs, but they require the volunteer time of several people. It’s messy at times but it can and does work.