1. 31
  1. 15

    I’m reminded of a conversation I had at work one day about scalable infrastructure. A co-worker was talking about “big data” and I asked: how big is a big database? He said “idk, maybe gigabytes”.

    I wonder how much of the big cloud companies’ sales are based on misconceptions like this.

    1. 13

      The joke I always heard is that data is big when you can’t load it in Excel :-)

      One of my coworkers got me with this early in my career. He was a senior developer and I was the junior. I had a task to do a very custom data transformation. He managed to stop me before I had a whole mapreduce cluster set up – then showed me how grep, awk, and a little Python glue would solve it on my laptop. I asked him what would happen if the data grew by a factor of 100. He told me to go type man split and learn something.
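      The glue he describes might translate to a few lines of Python that stream the file instead of loading it (the tab-separated layout and the “keep rows whose third field is ok” rule here are made up for illustration):

      ```python
      import sys

      # Stream a large file line by line: constant memory, no cluster needed.
      # Hypothetical rule: keep tab-separated rows whose third field is "ok"
      # and emit the first two fields joined by a comma.
      def transform(lines):
          for line in lines:
              fields = line.rstrip("\n").split("\t")
              if len(fields) >= 3 and fields[2] == "ok":
                  yield f"{fields[0]},{fields[1]}"

      if __name__ == "__main__":
          for row in transform(sys.stdin):
              print(row)
      ```

      And if the data does grow 100x, man split is still the answer: shard the input with split -l and run the same script over the chunks in parallel.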

      Since then I think a lot about Gunpei Yokoi, Nintendo, and their philosophy of “lateral thinking with withered technologies” – and how it often isn’t the tools we need to worry about.

      1. 3

        The joke I always heard is that data is big when you can’t load it in Excel :-)

        I’ve heard the joke, but it dramatically underestimates the willingness of Excel users to find ways to scale… 😉

        1. 1

          Yeah, it’s getting really hard to find data big enough to actually need a distributed system for processing. But serving systems are a slightly different use case: you want redundancy for availability even with very low traffic, so you need at least three machines that aren’t all behind the same network endpoint.

      2. 6

        It is very cool to see this work, and I think it is interesting that she has (a) proved her point against interpreted languages (her number is bigger) and (b) is simultaneously attracting comments of “but of course it can be done better in X way” (by those standards her number is low).

        The described system (doing actual transport and message serialisation, doing some real work per request) seems a good approximation to real load to me. My only comment would be that it sounds like the client maintains persistent connections, so it isn’t measuring the connection setup/teardown costs.

        The C10K article was 1999 - https://en.wikipedia.org/wiki/C10k_problem. By the time of the 2011 hardware, it was the C10M problem, so it does seem that other approaches might give higher numbers.

        It would be a fun kind of golf to have the protocol and server as a spec to see how different approaches/languages could compare. (The random number generator could perhaps be based on a seed read from the request so the response becomes deterministic and a reference client could check for accuracy).
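        The deterministic-response idea could be sketched like this (the wire format — client sends an integer seed — is invented here, and Python’s random.Random stands in for whatever PRNG a real spec would pin down so other languages could reproduce it):

        ```python
        import random

        def response_for_seed(seed: int, n: int = 8) -> bytes:
            # Same seed always yields the same bytes, so a reference
            # client can check the server's answer exactly.
            rng = random.Random(seed)
            return bytes(rng.randrange(256) for _ in range(n))

        def verify(seed: int, payload: bytes) -> bool:
            # The reference client just recomputes and compares.
            return payload == response_for_seed(seed, len(payload))
        ```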

        Additionally, having a running system like this and asking candidates to identify and optimise bottlenecks would be a fantastic devops interview flow :-)

        1. 1

          Related, I’ve always found the whatsapp numbers per server to be super impressive. Here’s 2M connections per machine back in 2012: https://blog.whatsapp.com/1-million-is-so-2011

        2. 6

          Great blog post. I think a lot of people really underestimate just how fast computers are when we don’t get in their way. I recall working in Node doing some async streaming stuff, and the number of connections our Node servers were handling with zero optimizations was pretty surprising.

          1. 4

            I can’t speak to Rachel’s intent, but the phrase that comes to mind is “easy things should be easy”.

            This post seems to be saying “10k connections is easy if you don’t shoot yourself in the foot with your language/framework.” That 10M connections is possible (but perhaps not easy) is beside the point. Whether threads are the optimal choice is also beside the point.

            From that perspective, even the details of the code are (arguably) beside the point. If the claim is “this can be done with bog-standard techniques”, who cares what her code looks like?
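            For a sense of what “bog-standard” can mean, a blocking thread-per-connection echo server is about as plain as it gets — this Python sketch is illustrative only, not her (unpublished) code:

            ```python
            import socket
            import threading

            def handle(conn: socket.socket) -> None:
                # Bog-standard blocking I/O: one thread per connection.
                with conn:
                    while True:
                        data = conn.recv(4096)
                        if not data:
                            break
                        conn.sendall(data)  # echo back

            def accept_loop(srv: socket.socket) -> None:
                while True:
                    conn, _addr = srv.accept()
                    threading.Thread(target=handle, args=(conn,), daemon=True).start()

            def serve(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
                srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
                srv.bind((host, port))  # port 0: let the OS pick one
                srv.listen(1024)
                threading.Thread(target=accept_loop, args=(srv,), daemon=True).start()
                return srv
            ```

            The per-thread cost people fear is mostly virtual stack space (often 8 MB reserved, far less resident), which is why 10k threads sits comfortably inside a modern kernel.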

            1. 4

              The author starts off by offering to share some knowledge to newer people but then more or less shares a story, with some benchmark numbers, which I think are from wrk. (Or maybe some other tool?)

              This would be a lot more interesting if there was actual code. Otherwise, it’s frustrating because it’s just story time. Stories are fine but sometimes to really understand what’s going on it’s helpful to see code.

              1. 14

                I think if she shared code it would just turn into a nit pick festival.

                The point is just to explain in general terms that machines can handle a lot of connections with fairly low effort.

                Many of her stories feature lazy or careless devs torturing machines, maybe hinting that just a bit of planning and testing your code will get you pretty far.

                1. 3

                  I think if she shared code it would just turn into a nit pick festival.

                  I kinda disagree. She had enough disclaimers pointing out that the code was thrown together without a large amount of careful thought. I also think it would be more interesting if there were code provided, not even to verify her claims, just to learn what kind of code causes that kind of performance beyond what she describes.

                2. 4

                  Not the author of the post, but at work I wrote a somewhat more capable server than Rachel’s in Lua, based around an event loop framework. The sockets are UDP, so the networking side is a bit easier to deal with, but we’re processing millions of SIP messages per day with it. Not bad for something I intended to be a “proof of concept” (and which was put into production without my knowledge) about five years ago – and only last year did I actually profile the code.

                  So yeah, I think computers are more powerful than we realize.
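                  That shape — one event loop draining a UDP socket — is easy to sketch; here’s a minimal Python version with selectors (not the commenter’s Lua, and the handler argument is a placeholder for real SIP parsing):

                  ```python
                  import selectors
                  import socket

                  # Single-threaded event loop over one UDP socket: the same
                  # shape as an event-loop framework, minus the framework.
                  def make_server(host: str = "127.0.0.1", port: int = 0):
                      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
                      sock.bind((host, port))
                      sock.setblocking(False)
                      sel = selectors.DefaultSelector()
                      sel.register(sock, selectors.EVENT_READ)
                      return sock, sel

                  def run_once(sock, sel, handler, timeout=1.0):
                      # One turn of the loop: answer whichever datagrams are ready.
                      for _key, _mask in sel.select(timeout):
                          data, addr = sock.recvfrom(65535)
                          reply = handler(data)
                          if reply is not None:
                              sock.sendto(reply, addr)
                  ```

                  Millions of messages a day is only tens per second on average, which is why a loop this simple holds up in practice.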

                  1. 1

                    This would be a lot more interesting if there was actual code. Otherwise, it’s frustrating because it’s just story time.

                    Given that people have demonstrated again and again that you can get this kind of performance out of a server, I don’t think proof is necessary.

                    (On the other hand, I’m not sure what she’s trying to illustrate that couldn’t have just been “hey look at all these other tests people have done”.)

                  2. 3

                    Only 10,000 connections? A 2011 box isn’t that old; it should handle way more than that, and at sub-millisecond latencies. This article clearly demonstrates the overhead of having so many threads running. Just don’t do that.

                    1. 2

                      congrats, you’ve met, but not exceeded, the 21-year-old C10k problem - https://en.wikipedia.org/wiki/C10k_problem - on 10-year-old hardware.

                      I guess I agree with the author that it is very sad if most people don’t know that this is possible. It has been possible for 21 years.