Threads for 01walid

    1. 6

      I like to tinker with abstract concepts, and for that the underlying system is more often a hindrance than a help. I don’t disagree with the general thrust of the article: as a full-stack developer I have had many hours wasted because of insufficient systems knowledge, and I can also attribute any success I have to the ability to learn such systems. I just feel like there are plenty of exceptions where this does not apply. Sometimes I want to tinker with something abstract like a new general path-finding algorithm. Sometimes I just want a working desktop application that does what I want it to do. In such cases, having an IDE that writes half the code for you, with a button that says ‘build’ and another that says ‘deploy’, is extremely valuable, and having to learn additional systems knowledge is merely a hindrance. I am currently experimenting with Godot as an application development platform and I am loving it for exactly this reason. It is like a less stupid version of Electron: it allows me to make desktop GUI applications that work on most devices and can be deployed as a web application, without any knowledge of the underlying systems.

      In the end there is more to learn in the human sphere than one person can ever learn in a lifetime; what you should focus on depends on your goals. If you wish to be a better developer, one who is adaptable, competent, and able to solve problems that other developers get stuck on, then the advice in this article is excellent.

      1. 1

        That’s a good point. The author seems a bit biased toward tinkering with system-level *nix stuff, but I think the advice to favor tinkering over a ship-at-any-cost mentality/learning strategy works at any level of abstraction.

        1. 1

          Author here. Yes, I’m biased, but that was also an example from personal experience. It’s not always about *nix or system-level stuff; it’s more about uncovering certain levels of abstraction and tinkering with them.

          1. 1

            Yes of course, everyone is biased in one way or another. I didn’t mean that to be derisive. :)

    2. 12

      The post sorely misses a benchmark measuring the actual performance difference. Given that message_is_compressed(message) probably amounts to not message.startswith("{"), the actual JSON parsing should completely dominate the check, and I would be surprised if this made a measurable difference.
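
      A rough way to sanity-check this (a sketch only; the payload and the assumed message_is_compressed implementation are illustrative, not taken from the post):

      ```python
      # Microbenchmark sketch: cost of a startswith-style "is it compressed?" check
      # versus the json.loads call that follows it anyway. Payload is made up.
      import json
      import timeit

      payload = json.dumps({"id": 1, "body": "x" * 1024})  # ~1 KB plain-JSON message

      def message_is_compressed(message: str) -> bool:
          # assumed implementation: compressed+base64'd payloads won't start with "{"
          return not message.startswith("{")

      check = timeit.timeit(lambda: message_is_compressed(payload), number=1_000_000)
      parse = timeit.timeit(lambda: json.loads(payload), number=1_000_000)
      print(f"check: {check:.3f}s  parse: {parse:.3f}s  parse/check: {parse / check:.0f}x")
      ```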

      1. 3

        True, I felt the same writing the post. But since it’s from a year ago, and the reason for blogging it was a friend’s chat, I didn’t have the chance to measure it.

        Will try to measure it next time. But the point is that, even if not measured, a string scan easily adds up in overhead over millions of messages. And for what, ~0.02% of the cases?

        1. 4

          What do you mean by “a string scan”? What was the nature of the check to see if a message was compressed? Was it trialing an unpacking of the message?

          I ask because you’ve brought it up a lot of times, once in reference to checking a single char at the start of the string, and it feels like you’ve been bitten by something related hard enough that you’ve got a visceral reaction here, rather than measuring things out?

    3. 10

      Clever trick, but I think that down this path lies madness. The 256 KB limit hasn’t gone away; it will just creep up on you when you least expect it. Suppose you switch compression algorithms to buy more time: now every message detected as base64 will have to rotate through a bunch of decompression algorithms to find one that generates reasonable-looking output. There goes your performance improvement.

      Some alternatives:

      1. Spin up an alternative SNS topic for compressed messages, and have different lambdas processing compressed/uncompressed data. Possibly causes a maintenance explosion if you need to have multiple consumers of both topics.
      2. Prefix your payloads with some metadata (at least a version number), in a way that’s fast to parse (see the sketch after this list).
      3. Use a faster language if speed really is that important, and probably pick up safety guarantees too given recent advances in popular typesafe languages.
      4. Store large bodies elsewhere, like S3 or DynamoDB. Adds a lot more complexity to your messaging system.
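
      For alternative 2, a minimal sketch of what a version prefix could look like (the "v1:"/"v2:" scheme and the size threshold are illustrative assumptions, not from the post):

      ```python
      # Sketch of alternative 2: a tiny fixed prefix the consumer can dispatch on
      # without scanning or guessing. The "v1:"/"v2:" scheme is made up.
      import base64
      import gzip
      import json

      def encode(obj: dict) -> str:
          raw = json.dumps(obj)
          if len(raw) > 200_000:  # assumed threshold, safely below the 256 KB limit
              return "v2:" + base64.b64encode(gzip.compress(raw.encode())).decode()
          return "v1:" + raw

      def decode(message: str) -> dict:
          version, _, body = message.partition(":")
          if version == "v1":
              return json.loads(body)
          if version == "v2":
              return json.loads(gzip.decompress(base64.b64decode(body)))
          raise ValueError(f"unknown message version: {version!r}")
      ```
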
      1. 8

        Devil’s advocate: why impose a maintenance or complexity burden up front? If you can use a good-enough solution right now, you can attack the complexity later on, when you have a better idea of how the problem is going to change. If it never comes up, then you’ve avoided over-designing your system.

        That said, I don’t totally disagree with you.

        1. 10

          Alternative 2 is the best idea imo. Encoding the message “kind” in a header can be very cheap and very fast.

      2. 4

        We were already considering option #4, but as mentioned in other comments, we didn’t want to add that complexity upfront at the moment.

      3. 3

        FWIW, I worked on a system dealing with SQS’s similar limit, and we’ve also dealt with compression and multiple formats. Observations, though I’m not sure they change anything for anyone:

        • We could be confident that compression does what we need because we know the kind of content we put in. For us it’s batches of similar items and we get much more than gzip’s typical ~3:1 for text or such.
        • We temporarily ran a fork of the system where individual items could be huge, and implemented option 4: a JSON struct in SQS with some metadata and a pointer to S3. (We also switched to zstd, which can better handle repetition across large items.) There was enough work to do on each batch that the S3 latency didn’t really matter.
        • Hacking in a second format without an explicit version number was easier in practice than in theory. We run both the producer and the consumer. Our compressed messages predictably started with gzip’s magic bytes base64’d, and our JSON pointing to S3 started with {. A msg.startswith is negligible even compared to just the SQS fetch, and we still have a lot of magic-number (or other format-distinguishing) space open (a rough sketch follows at the end of this comment).
        • SQS and SNS also both have message attributes you could use for version numbers, S3 pointers, etc. Seems like you’re essentially having your AWS library do a serialize/deserialize for you, with no advantage in terms of size limit or billing, but they still may be convenient.
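
        As a sketch of that last point, publishing with message attributes via boto3 could look roughly like this (the attribute names, e.g. schema_version, are made up for illustration):

        ```python
        # Sketch: out-of-band metadata via SNS message attributes (names are made up).
        from typing import Optional

        import boto3

        sns = boto3.client("sns")

        def publish(topic_arn: str, body: str, version: str, s3_pointer: Optional[str] = None) -> None:
            attrs = {"schema_version": {"DataType": "String", "StringValue": version}}
            if s3_pointer:
                attrs["s3_pointer"] = {"DataType": "String", "StringValue": s3_pointer}
            # attributes ride alongside the message body, so the payload itself stays untouched
            sns.publish(TopicArn=topic_arn, Message=body, MessageAttributes=attrs)
        ```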

        Enough of the code was the application itself (as opposed to the fetching/decoding) that I’m not 100% sure whether we used msg.startswith or try/except; either way, the difference is small compared to the other work being done and the other code.
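
        A minimal sketch of that magic-number dispatch (roughly the shape, not our exact code; base64’d gzip bodies start with "H4sI" because gzip’s 1f 8b 08 header bytes encode to that prefix):

        ```python
        # Sketch of the msg.startswith dispatch described above (details assumed).
        import base64
        import gzip
        import json

        def decode_body(msg: str) -> dict:
            if msg.startswith("{"):
                return json.loads(msg)  # JSON envelope: metadata plus a pointer to S3
            if msg.startswith("H4sI"):  # base64 of gzip's magic bytes \x1f\x8b\x08
                return json.loads(gzip.decompress(base64.b64decode(msg)))
            raise ValueError("unrecognized message format")
        ```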

    4. 2

      Most of the time I would choose clarity over performance if I can’t have both, and IMHO the if version is clearer here. Moreover, in the case of a Lambda function, I think you will spend way more time starting the Python interpreter than evaluating an if statement!

      1. 6

        That happens mostly with cold starts; most of the time this Lambda function is warm/hot, continuously processing a considerable load of messages. A string scan easily adds up in overhead over millions of messages. And for what, ~0.02% of the cases?

        Readability counts, but it’s also a trade-off worth making in such cases. And I don’t personally find a try/except that much less readable.
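
        For concreteness, the two shapes being compared look roughly like this (a sketch assuming gzip+base64 for the compressed path; not the exact code from the post):

        ```python
        # Sketch of the two styles under discussion (assumed details, not the post's code).
        import base64
        import gzip
        import json

        def handle_with_if(message: str) -> dict:
            # check the string first, then parse
            if not message.startswith("{"):
                message = gzip.decompress(base64.b64decode(message)).decode()
            return json.loads(message)

        def handle_with_try(message: str) -> dict:
            # assume the common case (plain JSON) and fall back to decompression
            # only for the rare (~0.02%) compressed messages
            try:
                return json.loads(message)
            except json.JSONDecodeError:
                return json.loads(gzip.decompress(base64.b64decode(message)))
        ```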