Threads for msun

    1. 2

      Kind of surprised about a lot of these points, in particular the fact that they have design decisions that seem to fall over given the bimodal nature of discord servers.

      Static time buckets, plus a database with cheap writes and more expensive reads, is something most systems can get away with (and they are getting away with it for the most part, IMO). But given that this is their core system, it feels like by now it would be very important to add a different sort of indexing for scrollback, one that lets certain Discord servers have different buckets.

      EDIT: honestly it looks like the work is almost there. Bucket sizes being channel dependent seems like an easy win, and maybe you have two bucket fields just built-in so you can have auto-migration to different bucket sizes and re-compact data over time, depending on activity.

      I don’t know about Cassandra’s storage mechanisms, but I do know that a lot of people running multitenant systems on Postgres get bitten by how the data is laid out on disk (namely, a query ends up fetching a lot of stuff from disk that is mostly data you don’t need). It feels essential for Discord that related data be stored as close together as possible.
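To make the channel-dependent bucket idea from the EDIT above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `ChannelBucketing` setting, the bucket sizes, and the helper names are hypothetical and are not taken from Discord's schema. It only shows how a per-channel bucket size changes the number of buckets a scrollback query has to touch.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class ChannelBucketing:
    """Hypothetical per-channel setting; not Discord's actual schema."""
    bucket_ms: int  # width of one time bucket for this channel


def bucket_for(timestamp_ms: int, cfg: ChannelBucketing) -> int:
    """Map a message timestamp to a bucket number for this channel."""
    return timestamp_ms // cfg.bucket_ms


def buckets_for_range(start_ms: int, end_ms: int, cfg: ChannelBucketing) -> list[int]:
    """Buckets a scrollback query over [start_ms, end_ms] has to touch."""
    return list(range(bucket_for(start_ms, cfg), bucket_for(end_ms, cfg) + 1))


# Assumed sizes: wide buckets for a quiet channel, narrow ones for a busy channel.
quiet = ChannelBucketing(bucket_ms=90 * DAY_MS)
busy = ChannelBucketing(bucket_ms=1 * DAY_MS)

# Scrolling back through the same 30 days of history:
start, end = 0, 30 * DAY_MS - 1
quiet_buckets = buckets_for_range(start, end, quiet)  # one wide bucket
busy_buckets = buckets_for_range(start, end, busy)    # thirty narrow buckets
```

The trade-off this illustrates: a quiet channel with wide buckets answers the query from a single bucket instead of probing many mostly empty ones, while a busy channel with narrow buckets keeps each bucket from growing too large. Migrating a channel between sizes (the "two bucket fields" part) is not covered here.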

      1. 3

        I’m also surprised that they would bias for writes. My intuition is that chat messages are always written exactly once, and are read at least once (by the writer), usually many times (e.g. average channel membership), and with no upper bound. That would seem to be a better match for a read-biased DB. But I’m probably missing something!

        1. 8

          It’s a great question and the answer isn’t straightforward. Theoretically, B-tree-based storage is better than LSM for read-heavy workloads, but almost all distributed DBs (Cassandra, BigTable, Spanner, CockroachDB) use LSM-based storage.

          Some explanations for why LSM has been preferred for distributed DBs:

        2. 3

          I’m assuming they are broadcasting the message immediately to everyone online in the channel, and you only read from the database when you either scroll back far enough for the cache to be empty, or when you open a channel you haven’t opened in a while. That would avoid costly reads except for when you need bulk reads from the DB.

          1. 6

            Yes, we distribute messages in real time via our Elixir services:

          2. 2

            I’d be very surprised if the realtime broadcast of a message represented more than a tiny fraction of its total reads. I’d expect almost all reads to come from a database (or cache) — but who knows!

            1. 2

              It’s a chatroom and people don’t scroll way up super often. They only need to check the last 50 messages in the channel unless the user deliberately wants to see more. There might be a cache to help that? But you can stop caching at a relatively small upper bound. That said I am curious how this interacts with search.
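A rough sketch of the read path described in this sub-thread, assuming a small bounded cache per channel: live messages are appended as they are broadcast, recent history is served from memory, and only deeper scrollback reaches the database. The `ChannelCache` class and the `load_from_db` callback are hypothetical names for illustration; the limit of 50 is the figure from the comment above, not a known Discord setting.

```python
from collections import deque

RECENT_LIMIT = 50  # the "last 50 messages" figure from the comment above


class ChannelCache:
    """Hypothetical bounded cache of a channel's most recent messages."""

    def __init__(self, limit=RECENT_LIMIT):
        # deque(maxlen=...) silently drops the oldest entry once full.
        self.recent = deque(maxlen=limit)
        self.db_reads = 0

    def on_message(self, msg):
        """Called when a message is broadcast live to the channel."""
        self.recent.append(msg)

    def latest(self, n, load_from_db):
        """Return the newest n messages, hitting the DB only when the cache is too small."""
        if n <= 0:
            return []
        if n <= len(self.recent):
            return list(self.recent)[-n:]
        self.db_reads += 1
        return load_from_db(n)  # placeholder for the real (expensive) read


cache = ChannelCache()
for i in range(120):
    cache.on_message(f"msg {i}")

# Opening the channel: served from memory, no database read.
page = cache.latest(50, load_from_db=lambda n: [])
```

Because the cache stops at a small upper bound, memory per channel stays constant no matter how long the history is. Search is a separate question that this sketch does not address.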

    2. 9

      I’m astonished that they can store and search all the messages on all Discord instances in just 72 nodes with 9TB storage each. I’m on a few and it seems like some people post thousands of messages a day! And they appear to stay forever.

      1. 18

        (I work at Discord on the Persistence Infra team)

        We also run many Elasticsearch clusters to handle searching through all of the messages.

        1. 6

          I would love to hear your thoughts on managing such a big ES cluster. The few times I’ve bumped into ES I felt like it was a very capable database but always a total bear from the operations side, and I imagine your cluster is 10x bigger than anything I’ve dealt with.

        2. 4

          Is that also why you don’t allow exact search (the lack of which can be extremely annoying), since it would need a different kind of index?

      2. 5

        I’m quite surprised it’s that big. Text is small. Wikipedia is 86 GiB for all of the text of the current English version. You could easily index and search that in RAM on a single moderately powerful server. I hadn’t realised how much more people type in Discord.

        1. 13

          I would definitely expect Discord to have more text than Wikipedia. Chat is append-only, whereas Wikipedia articles are edited, so they don’t necessarily grow with the number of edits. People also socialize more than they write encyclopedia articles.

          1. 3

            IIRC Wikipedia stores all edit history and lets you diff the versions on the website? But yeah that wouldn’t count against the downloadable dump size.

        2. 4

          Discord has over 100 million active users, and you also have overhead per message. That adds up fast with that many people.
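For a sense of scale, here is a back-of-envelope calculation using only the figures quoted in this thread: 72 nodes with 9 TB each, over 100 million active users, and 86 GiB for English Wikipedia's text. It assumes decimal terabytes and treats raw disk capacity as if it were all message data, which overstates things, since capacity includes free space and whatever replication the cluster uses. The bytes-per-message value in the last line is a made-up placeholder, not a measured number.

```python
NODES = 72
TB_PER_NODE = 9
ACTIVE_USERS = 100_000_000   # "over 100 million", so treat as a lower bound
WIKIPEDIA_GIB = 86

TB = 10**12    # assuming decimal terabytes
GIB = 2**30

total_bytes = NODES * TB_PER_NODE * TB          # raw cluster capacity (648 TB)
per_user_bytes = total_bytes // ACTIVE_USERS    # capacity per active user
wikipedia_bytes = WIKIPEDIA_GIB * GIB
ratio_to_wikipedia = total_bytes / wikipedia_bytes


def messages_per_user(bytes_per_message: int) -> int:
    """How many messages fit in one user's share, for an assumed message size."""
    return per_user_bytes // bytes_per_message


# Placeholder size of 1,000 bytes per stored message, overhead included.
example = messages_per_user(1_000)
```

Under these assumptions the cluster's raw capacity is 648 TB, about 6.48 MB per active user, and roughly 7,000 times the size of the Wikipedia text dump. At the placeholder size that is a few thousand messages per user, so the totals in the article look less surprising once they are divided by the user count.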