1. 61
    1. 11

      Very nice to see them backtrack on proposed and (at least partly) implemented solutions that they thought might work, but which turned out not to provide enough value compared with the added complexity. Kudos!

      1. 9

        It’s amusing (and unsurprising) to see that actually looking at the data-on-the-wire and making some fairly-obvious upgrades (“another win: passive sessions v2”) beats upgrading the compression protocol ;-).

        In any case: it’s very nice to see this writeup from Discord!

        1. 7

          Did it though? The vast majority of the downwards movement on the graph happened after deployment of zstandard.

          And from the wording of the article the total savings were about 40%, with sessions v2 accounting for about 20%, so at most the two were equivalent.

          1. 1

            It being roughly equivalent is still funny! Months of effort spent changing compression were matched by a relatively simple design change.

            1. 7

              I’m not sure your comment is in any way correct though?

              • much of the time went into setting up infrastructure, rolling things out without breaking everything, and evaluating the new option(s); much of that work is a long-term asset
              • this infrastructure and upstream work is what allowed the issues with passive updates being noticed in the first place
              • and we have no idea whatsoever how much work went into v2 sessions, only that zstandard was ready to go live 6 weeks before them, even though “a few months” were spent progressively rolling out client support for zstd. That casts doubt on v2 sessions being “a relatively simple design change”.
              1. 1

                Look, I’ll start by saying that it being funny in no way makes light of the effort. I think it’s good to experiment! It can still be amusing that a smaller change moved the needle just as much.

                Either way, you’re wrong here:

                • Your first point isn’t true. They didn’t have to set up this infrastructure for zstandard. Client Experiments have been a part of Discord’s infra platform for a long time now (you can even tweak them yourself on the client), and they don’t say they had to set up the metrics observations just for this. Given their size and scale, I’d be truly shocked to find out they weren’t measuring these at all before.

                • Your second point is also not true. The wording they use is:

                  the metrics that guided us during the dark launch phase of this project revealed a surprising behavior

                  Emphasis mine. That doesn’t mean the infra or upstream work here was required - it just means the zstandard work made them take a look at the metrics and figure out this was a problem. If they’d started from the metrics to begin with, whether or not they experimented with zstandard they’d have gotten there.

                • Your third point is your interpretation, and they themselves don’t agree:

                  While the passive sessions optimization was an unintended side-effect of the zstandard experimentation, it shows that with the right instrumentation and by looking at graphs with a critical eye, big savings can be achieved with a reasonable amount of effort.

                  Again, emphasis mine. “Reasonable amount of effort” suggests anything but work. When you compare this to what they had to do for zstandard - experimenting with it twice, once in 2019 and once in 2024; writing their own bindings for iOS; experimenting for months with and without dictionaries; tweaking the allocator; etc. - calling the passive sessions work a “relatively simple design change” is more than fair. That they waited 6 weeks to deploy it could be for any reason. Hell, I’d wait a bit before deploying a change like that, especially while also introducing a new compression system; otherwise the metrics would be confounded and it would be hard to pinpoint the reason for the improvements.

                Once again, I’m not trying to beat down on you or make light of their effort, and I’m glad this blog post exists, but I’ll continue to find it funny that the passive sessions change was at least half of their improvement.

                1. 3

                  Your first point isn’t true. They didn’t have to set up this infrastructure for zstandard. Client Experiments

                  are irrelevant to this until rollout, per the article’s introduction:

                  Without this experiment, we would have to add zstandard support for our clients — desktop, iOS, and Android — which would require about a month’s lead time before we could fully determine the effects of zstandard.

                  and they don’t say they had to set up the metrics observations just for this

                  Yes they do:

                  We opted to do a “dark launch” of plain zstandard: the plan was to compress a small percentage of production traffic both with zlib and zstandard, collect a bunch of metrics, then discard the zstandard data.

                  Why would they need to collect these metrics if they already had those?
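
                  To make that concrete: the dark launch they describe is essentially “compress the same payload both ways, record the sizes, throw the zstandard result away”. A rough sketch of what that could look like (not Discord’s actual code; the sampling rate and the emit_metric hook are made up for illustration):

                      import random
                      import zlib

                      import zstandard  # python-zstandard, assumed here for the sketch

                      SAMPLE_RATE = 0.01  # only a small slice of production traffic is double-compressed
                      zstd_compressor = zstandard.ZstdCompressor(level=6)

                      def send_payload(payload: bytes, emit_metric) -> bytes:
                          # zlib remains the codec that actually goes out on the wire
                          zlib_data = zlib.compress(payload)

                          if random.random() < SAMPLE_RATE:
                              # Dark launch: also compress with zstandard, record how the two
                              # compare, then discard the zstandard output entirely.
                              zstd_data = zstd_compressor.compress(payload)
                              emit_metric("compression.zlib_bytes", len(zlib_data))
                              emit_metric("compression.zstd_bytes", len(zstd_data))

                          return zlib_data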

                  Your second point is also not true. The wording they use is:

                  the metrics that guided us during the dark launch phase of this project revealed a surprising behavior

                  See above: they collected metrics they didn’t previously collect, for the purpose of the zstd experiment.

                  The entire dictionaries experiment is about type-specific metrics, which dovetail nicely with the observation that one type of message has a much larger presence than expected.

                  Again, emphasis mine. “Reasonable amount of effort” suggests anything but work.

                  That is an insane interpretation; “effort” and “work” are literally synonymous.

                  1. 1

                    Why would they need to collect these metrics if they already had those?

                    “collect a bunch of metrics” could equally mean that they had the metrics but weren’t looking at them in any kind of coherent way. Same way you could “collect your children” - that doesn’t imply in any way you created the children purely to perform this act of collection. The phrasing here is too ambiguous to know.

                    (But, on balance, I’d interpret it as meaning they probably had to create a single metric for the zstd compression results in order to compare them to the zlib compression results, charitably assuming they were already monitoring their zlib efficiency. Also, I don’t think there’s nearly enough information or context in the article to argue either way about how much work was actually done, over what time period, for either half of the comparison.)

                    1. -1

                      Obviously you’re taking this personally somehow so I’m not going to bother replying after this, but once again, you’re reading too far into things for no real reason.

                      Client Experiments are irrelevant to this until rollout, per the article’s introduction:

                      Yes, but you mentioned that in your response:

                      much of the time went into setting up infrastructure, rolling things out without breaking everything, and evaluating the new option(s), much of that work is a long-term asset

                      That’s not work they had to do for this; it already existed, and they’ve used it before.

                      Why would they need to collect these metrics if they already had those?

                      Easy: they’re using their existing measurement infrastructure to collect samples for a new experiment. Even assuming they had to add these specific metrics to their infra for this experiment (which they don’t say! so this is your interpretation!), that doesn’t mean they had to do any additional infra work.

                      That is an insane interpretation, “effort” and “work” are literally synonymous.

                      That was a typo on my part - I meant “a lot of work” - but regardless, the point is that it is relatively less work (which is exactly what I said in my first post). You cannot deny that they say it is less work than what they had to do for zstandard. There is no other way to interpret this.

                      Again, nowhere in the post do they explicitly say they had to set that up for the zstandard experiment, and nowhere do I say their effort is worthless - you are reading that into the post and into my comments, and attacking me for some reason.

          2. 5

            So much performance engineering goes into systems to measure experiments. Excellent post

            1. 2

              I would love to see how they actually roll out these tests in code.
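
              It’s not in the post, but the usual shape for this kind of client experiment is deterministic bucketing behind a server-controlled percentage. A rough sketch of what that might look like (hypothetical names, not Discord’s actual code):

                  import hashlib

                  ROLLOUT_PERCENT = 5  # hypothetical knob, dialed up (or back down) server-side

                  def in_experiment(user_id: str, experiment: str, percent: int) -> bool:
                      # Deterministic bucketing: the same user always lands in the same bucket,
                      # so raising the percentage only adds users, never reshuffles them.
                      digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
                      return int(digest, 16) % 100 < percent

                  def pick_codec(user_id: str) -> str:
                      # Clients in the experiment negotiate zstd; everyone else stays on zlib.
                      if in_experiment(user_id, "gateway_zstd", ROLLOUT_PERCENT):
                          return "zstd"
                      return "zlib"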