1. 3

    I’m pretty sure Kernighan does a toy make in awk in this classic book:

    https://www.amazon.com/AWK-Programming-Language-Alfred-Aho/dp/020107981X

    (maybe you can find it at the library, etc.)

    I transcribed it into Python many years ago (and thus learned a bunch of awk).

    1. 2

      Very nice. Looks like internet archive has several copies:

      https://archive.org/details/pdfy-MgN0H1joIoDVoIC7

      1. 2

        Oh cool… I definitely recommend this book, it’s thin and packed with information. One of the few programming books I can say I’ve read more or less cover to cover. Although it’s aged, Awk is still useful, and it’s one of the few languages that fits in your head!

    1. 2

      Are there similar methods or SIMD tricks for multi-way merges? I have some 32-way merges that take long enough that I’ve spent a lot of time wishing they were faster, but I haven’t yet found any articles on speeding them up. :)

      1. 2

        Good question! I have no idea, but I did find this on Google; it covers set intersection: http://www.adms-conf.org/p1-SCHLEGEL.pdf
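For reference, the scalar baseline that a faster multi-way merge would be measured against is the usual heap-based k-way merge. Below is a minimal sketch using Python's standard `heapq.merge`; the function name `merge_sorted_runs` and the 32 runs of integers are made up for illustration, and nothing here uses SIMD.

```python
import heapq


def merge_sorted_runs(runs):
    """Merge any number of already-sorted iterables into one sorted list."""
    # heapq.merge keeps one head element per run in a heap, so each output
    # element costs roughly O(log k) comparisons for k input runs.
    return list(heapq.merge(*runs))


# Hypothetical input: 32 sorted runs that interleave to cover 0..319.
runs = [list(range(start, 320, 32)) for start in range(32)]
merged = merge_sorted_runs(runs)
```

The per-element heap adjustment is the part that vectorized or tournament-tree approaches typically try to speed up, so a baseline like this is mostly useful for checking correctness of a faster version.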

      1. 1

        I would add some examples for each defined term (at the end of each definition). I always get tripped up by this vocab when reading about parsing, so every little hint helps.

        1. 1

          Thanks! I will do that.

        1. 1

          Did you have any batching requirements (primarily to cut down on radio time), or does it always send updates in ~real-time?

          1. 2

            If mobile data and Wi-Fi are not available, the app stores the data and uploads it once connectivity returns. That’s why the timestamp comes from a query parameter instead of the server timestamp.
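The store-and-forward behaviour described above could look roughly like the sketch below. It is not the app's actual code: the class name `OfflineBuffer`, the URL, and the `value` and `timestamp` parameter names are all assumptions. The point is that the timestamp is captured when the reading is recorded and travels in the query string, so a delayed upload still carries the original time.

```python
import time
from collections import deque
from urllib.parse import urlencode


class OfflineBuffer:
    """Queue readings while offline and upload them in order once online."""

    def __init__(self, base_url, send, is_online):
        self.base_url = base_url
        self.send = send            # callable(url) that performs the upload
        self.is_online = is_online  # callable() -> bool
        self.pending = deque()

    def record(self, value, now=None):
        # Capture the timestamp on the device, at recording time.
        ts = int(time.time() if now is None else now)
        self.pending.append((ts, value))
        self.flush()

    def flush(self):
        # Send the oldest readings first; stop as soon as we are offline.
        while self.pending and self.is_online():
            ts, value = self.pending[0]
            query = urlencode({"value": value, "timestamp": ts})
            self.send(f"{self.base_url}?{query}")
            self.pending.popleft()


# Hypothetical usage: the first reading is taken offline, the second online.
sent = []
state = {"online": False}
buf = OfflineBuffer("https://example.invalid/update", sent.append,
                    lambda: state["online"])
buf.record(21.5, now=1000)   # stored, nothing sent yet
state["online"] = True
buf.record(22.0, now=1060)   # both readings are uploaded, oldest first
```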

          1. 16

            Hmm. I can’t help but wonder why virtualenv + Docker is necessary in 95% of cases? Why not just install the requirements globally to the Python install… given, you’re in a container and running likely only 1 app… ?

            1. 7
              1. System python packages might occasionally conflict with packages in your virtualenv (see https://hynek.me/articles/virtualenv-lives/).

              2. Multi-stage Docker builds: the first stage has the compiler etc., and you then copy the compiled code (Python C extensions etc.) into a second-stage image that doesn’t have gcc etc., which gives you a smaller image. In this case, installing directly with pip means some files end up in /usr/bin and others in site-packages, so it’s hard to copy everything over. Virtualenv solves that since everything is in one directory.

              1. 1

                This would be great context to add to the top of the post!

                1. 1

                  Yeah. People asked this a lot, so I’m going to write another article about that specifically and then link to it from this article.

              2. 6

                … and why not use the already great Python images: https://docs.docker.com/samples/library/python/ ?

                1. 3

                  Or on top of that, why bother activating at all? You can always just give the full path to your virtualenv python binary and it’ll know where everything else is.

                  1. 3

                    As I discuss in the article, this is definitely an option. However:

                    1. This is repetitive, so it’s easy to forget when you add the 10th call to Python, in cases where you have complex setup.
                    2. It doesn’t affect Python subprocesses, which some programs will run.

                    The proposed solution suffers from neither problem.
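The two objections above come down to where the activation state lives. Here is a minimal sketch, assuming activation amounts to setting `VIRTUAL_ENV` and putting the env's `bin` directory first on `PATH` (which is mainly what the `activate` script does; whether this matches the article's proposed solution exactly is an assumption). The path `/opt/venv` and the helper `activated_env` are hypothetical.

```python
import os
import subprocess
import sys


def activated_env(venv_dir, base_env=None):
    """Return a copy of the environment with the virtualenv 'activated'."""
    env = dict(os.environ if base_env is None else base_env)
    env["VIRTUAL_ENV"] = venv_dir
    bin_dir = os.path.join(venv_dir, "bin")  # "Scripts" on Windows
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    return env


env = activated_env("/opt/venv")  # hypothetical virtualenv location

# A child process inherits the modified environment, so anything it looks up
# on PATH (including a bare "python") resolves inside the virtualenv first.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['PATH'].split(os.pathsep)[0])"],
    env=env, capture_output=True, text=True, check=True,
)
first_path_entry = child.stdout.strip()
```

Because the setting lives in the environment and not in each command line, it is written once and carries over to subprocesses, which addresses both points in the list.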

                1. 1

                  This is great. Are there any similar docs for Presto or Spark? Or docs that compare trade-offs between them?

                  1. 2

                    The video presentation of the paper is excellent:

                    https://www.youtube.com/watch?v=yL_-1d9OSdk