1. 46
    1. 19

      Oh hello. Looks like someone posted my article. I’d love to hear your feedback!

      1. 3

        The writing is great, I have no complaints. 😎👍👍

        One small CSS thing: on my Android phone the numbers next to your footnotes disappear into the left side of the screen.

        I have more questions that would make sense for another post. What I’m now curious about is the state of the AOT-friendly dialects like Static Python:

        • do any of them let you largely eliminate Python’s slow startup times?
        • how is the runtime perf, anyway?
        1. 2

          I think I fixed the footnotes?

          1. 2

            Looks better to me. (Pixel 4a)

            1. 2

              Yep. Thank you.

            2. 1

              Thank you!

              Yeah, the footnotes thing happens for me too and I am not enough of a wizard to fix it. Maybe I will get bored later and deep dive…

              Do any of them let you largely eliminate Python’s slow startup times?

              Sort of. Maybe. A lot of the slow startup is importing modules, so if you still have a bunch of Python code in your project that does not get compiled, or your C extension import still takes a while, no. But I think Nuitka is supposed to improve startup quite a bit, and any tool that does a lot of partial evaluation for initialization, like SubstrateVM (which does not work on Python (yet?)), should help.
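
              For a quick sanity check on where that startup time goes, CPython’s python -X importtime prints a per-import breakdown; a rough manual version looks like this (the stdlib modules here are just stand-ins for a project’s heavier imports):

              ```python
              # Rough sketch: time the imports themselves to see how much of
              # startup they account for.
              import time

              start = time.perf_counter()
              import json
              import email
              import http.client
              print(f"imports took {time.perf_counter() - start:.3f}s")
              ```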

              how is the runtime perf, anyway?

              Pretty darn good. Instagram replaced almost all of their Cython with Static Python and it helped performance for two reasons: 1) the team has control over the JIT and SP semantics, and type knowledge flows into the JIT; 2) there is no more big FFI crossing for each call to specialized/compiled code. It’s all just JITed and maybe even inlined.

              All of these compilers will get you good runtime perf if you pick the compiler that best suits your needs. That’s a topic for another blog post :)

              1. 3

                One neat trick PyOxidizer uses that allows it to generate static executables is to encode the content of pure Python code into the data segment of the executable. One side effect is that startup is very fast because there’s no need to hit the filesystem for imports.
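
                A minimal sketch of the idea (this is not PyOxidizer’s actual machinery; the EMBEDDED dict and module name are made up): serve module source from memory through a custom importer, so imports never touch the filesystem.

                ```python
                # Sketch of serving module source from memory via the standard import
                # machinery. EMBEDDED stands in for data baked into the executable.
                import importlib.abc
                import importlib.util
                import sys

                EMBEDDED = {
                    "greeting": "def hello():\n    return 'hi from an in-memory module'\n",
                }

                class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
                    def find_spec(self, name, path=None, target=None):
                        if name in EMBEDDED:
                            return importlib.util.spec_from_loader(name, self)
                        return None  # fall back to the normal filesystem finders

                    def create_module(self, spec):
                        return None  # use the default module object

                    def exec_module(self, module):
                        exec(compile(EMBEDDED[module.__name__], module.__name__, "exec"),
                             module.__dict__)

                sys.meta_path.insert(0, InMemoryFinder())

                import greeting  # resolved from memory, no filesystem access
                print(greeting.hello())
                ```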

                1. 2

                  Oh yes this is such a good thing to mention. Thank you!

            3. 2

              This post was amazing — thank you!

              1. 1

                Yay, I’m glad you liked ^_^ Anything you would change?

                1. 2

                  There are not enough posts like this. More — that is all I would change 🍻

                  Separately I wonder about the parallels between what Python is going through now and what Lisp went through with safety/performance declarations. Seems to me Python is in a worse position actually because it’s so dynamic.

                  1. 2

                    I’m flattered. Thank you. I have other stuff in the pipeline but it’s notably more on the technical side, like doing abstract interpretation over Python bytecode.

                    I can’t much speak to Lisp but I can confirm that Python is uniquely tricky, even compared to JS.

                    EDIT: Eh, I guess I have two other “big picture”/less-gritty things I want to write about: 1) what I want to see in VM observability and performance engineering UX 2) a demonstration of turning an interpreter into a compiler to make people re-think where their “line” is

            4. 8

              While this article is of course right about stuff like PyLong_Add, you can still make a lot of improvements within those constraints.

              mypyc compiles your code down to CPython stuff like PyLong_Add, essentially turning your Python code into CPython API calls that still support all of this dynamism. But because it’s generating C code, suddenly you don’t have to deal with things like reference counting on locals. And with the types (and some type guards of course), you can totally get very very good results.
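
              To make that concrete, here’s a hedged sketch of the kind of annotated code mypyc rewards (the function and file name are made up; you’d compile it with something like mypyc fib.py and import the resulting extension as usual):

              ```python
              # fib.py -- plain type-annotated Python; mypyc turns it into a C
              # extension that calls CPython APIs but skips most interpreter
              # dispatch overhead. It also still runs unmodified under CPython.
              def fib(n: int) -> int:
                  a: int = 0
                  b: int = 1
                  for _ in range(n):
                      a, b = b, a + b
                  return a
              ```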

              From the mypyc docs:

              Existing code with type annotations is often 1.5x to 5x faster when compiled. Code tuned for mypyc can be 5x to 10x faster.

              Mypyc is one of those projects that almost feels like a silver bullet. You get all of the dynamism you want, don’t need a lot of ceremony for most of your code, and you can easily upgrade your hot code into code that plays well with this compiler.

              It’s not magic, but seeing results from mypyc convinced me that there are a lot of advantages to be had from the low-hanging fruit that is “execute a fixed body of code with some assignments and simple conditionals”.

              EDIT: here is a good overview of what the compiled classes can’t do, which is pretty close to the set of things that tend not to be done in perf-critical code. But you’re mostly within the space of “I can write code for CPython and code that works in mypyc”.

              1. 3

                Yes! I hope I wasn’t too doom-and-gloom in the post. You get a lot of mileage out of compiling! But people expect a pretty broad array of optimizations that just don’t happen. For example, the runtime doesn’t just magically disappear and your numbers don’t magically become machine words. Mypyc and SP are very similar and there’s a world in which we might have just stuck with mypyc had it started earlier. The other thing SP does that mypyc doesn’t, though, is skip the AOT compile step. Part of the reason for SP compiling to bytecode was developer friction with Cython being Python->C->native.
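
                To illustrate the “machine words” point with a toy example (sizes are approximate and CPython-specific): annotations don’t change how integers are represented at runtime.

                ```python
                # CPython ints stay boxed, arbitrary-precision heap objects even
                # when annotated; they never become raw 64-bit machine words.
                import sys

                x: int = 1
                print(sys.getsizeof(x))        # ~28 bytes on a 64-bit build
                print(sys.getsizeof(2 ** 64))  # larger still -- it's a bigint
                ```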

                1. 4

                  your numbers don’t magically become machine words

                  I’ll confess I assumed that this would happen when Python added optional type annotations! I did not know much about how Python worked under the hood at the time. The reason I assumed it would happen is that the first place I’d seen optional type annotations is in Common Lisp, and that is one of the common uses of type annotations there – with some fixnum declarations sprinkled around, SBCL often manages to stick the numbers in machine registers, open-code the arithmetic, etc., and people do this when optimizing numerical loops.

                  In trying to think of why the issues in the post don’t apply to Common Lisp, one difference is that, while CL has all the late-binding OOP wizardry you could possibly want, it’s a separate opt-in facility (CLOS) rather than the default. And, importantly, the standard library has not opted in. Some Lispers consider this a language wart (why is + polymorphic but only to a fixed, non-user-extendable set of types?), but it does avoid the issue you walk through in Python, where you have to consider the possibility that the user has subclassed int and written their own __add__ that you need to dispatch to.

                  Since you seem pretty knowledgeable about Lisp implementation, I wonder if you had more thoughts on the differences here.

                  1. 2

                    with some fixnum declarations sprinkled around, SBCL often manages to stick the numbers in machine registers, open-code the arithmetic, etc., and people do this when optimizing numerical loops.

                    In SP and I think soon mypyc you can also do this by specifying a machine integer type (int8, for example) that is not the bigint type int.
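
                    A hedged sketch using mypyc’s spelling (i64 from mypy_extensions, which needs a recent version; Static Python’s int8/int64 names differ): with the annotation the compiler can keep the accumulator in a machine word, while plain CPython just runs it as ordinary code.

                    ```python
                    # Native integer annotations for mypyc. Under mypyc the accumulator
                    # can be unboxed; under plain CPython this still runs, just slower.
                    from mypy_extensions import i64

                    def checksum(data: bytes) -> i64:
                        total: i64 = 0
                        for b in data:
                            total += b
                        return total
                    ```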

                    And, importantly, the standard library has not opted in.

                    Yep – have the packages opted in? What do dependency trees look like—comparable to Python or JS?

                    Since you seem pretty knowledgeable about Lisp implementation,

                    Haha, I am only on the periphery of Lisp knowledge. I think Python is more dynamic than people think (even more than CL, JS, Lua, etc. people realize) and unusually C-API constrained. Those are probably the main differences.

                2. 3

                  mypyc has started adding support for native integer, float, and other types: https://mypyc.readthedocs.io/en/latest/int_operations.html, https://mypyc.readthedocs.io/en/latest/float_operations.html and the other related documents.

                  1. 1

                    Oh very cool. I did not know they used tagged pointers by default!

                3. 2

                  As usual, my typing.Any is ready.

                  1. 2

                    Related: Oils could hit most of these issues in theory, but we took a shortcut and semi-automatically translated the code to C++, with a bunch of rewrites and codegen to replace Python reflection:

                    Brief Descriptions of a Python to C++ Translator - closest projects are mypyc and Shed Skin

                    The “trick” is that Python just looks like a dynamic C++ :) They are surprisingly similar. It would be harder to translate Python to, say, Rust, and more challenging to translate it to Zig. So in that sense it’s kind of a happy accident.

                    Though it was pretty strongly inspired by Shed Skin, which also uses that “trick” or “hack”, unlike mypyc.
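
                    Purely as illustration (this is not actual Oils code): the statically-typed subset that maps almost one-to-one onto C++, where a class becomes a C++ class and a typed function becomes an ordinary function.

                    ```python
                    # Typed Python of the kind that translates nearly mechanically to
                    # C++: class -> class, List[Token] -> a vector-like container.
                    from typing import List

                    class Token:
                        def __init__(self, kind: str, text: str) -> None:
                            self.kind = kind
                            self.text = text

                    def lex(line: str) -> List[Token]:
                        return [Token('word', w) for w in line.split()]
                    ```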


                    The key difference is whether you’re trying to speed up all Python code, or just our 40K lines of code. We limit it to the latter, and get some huge speedups, e.g.:

                    https://www.oilshell.org/release/0.16.0/benchmarks.wwz/mycpp-examples/

                    It also helps to have a lot of tests, so you know that the translation hasn’t broken the program!

                    1. 1

                      Speaking of compiling typed Python, the “lpython” project does that. https://github.com/lcompilers/lpython

                      1. 2

                        lpython is indeed mentioned in the post :)

                        1. 2

                          Yup! That’s what I get for skimming.

                        2. 2

                          My apologies…I swear I looked at the front-page, but I don’t know how I didn’t see the other instance.

                          1. 1

                            No reason to apologize! Just gardening