1. 30

  2. 7

    Guido announced funding is coming from Microsoft. You can also check out the repo they will be working in.

    1. 5

      Meanwhile, PyPy is around 4x faster than CPython.

      1. 6

        Annecdote ain’t data, but I’ve never been successful at getting PyPy to provide improved performance. My use cases have been things like running tooling (Pylint is extremely slow under PyPy, much moreso than CPython), just running web apps, and a lot of other things that aren’t benchmarks.

        I don’t want to be too critical of PyPy, I imagine it gets a lot of what a lot of people want. But I don’t know what real workloads end up benefiting from it.

        1. 4

          PyPy upstream generally treats slowness as a bug and is willing to expend resources to fix it, if you’re willing to file issues with minimal test cases. (Here is a recent example bug about slowness.)

          Anecdotes aren’t data, but about a decade ago, I ported a Minecraft server from Numpy and CPython to array.array and PyPy, and at the time, I recorded a 60x speedup on a microbenchmark, and around a 20x speedup for typical gameplay interactions, resulting in a backend that spent most of its time sleeping and waiting for I/O.

          As long as we’re on the topic, it’s worth knowing that PyPy comes with a toolkit, RPython, which allows folks to generate their own JITs from Python. So, if one wanted more speed than was available with Python’s language design, then PyPy provides a route for forking the interpreter and standard library, and making arbitrarily distant departures from Python while still having high performance. For example, if we can agree that Dolphin implements “real workloads”, then PyGirl (code, paper) probably does as well.

          1. 3

            Yeah to me it helps to think of workloads in these categories (even if there are obviously way more than this, and way more dimensions)

            1. String / hash / object workloads (similar to web apps. Similar to a linter, and similar to Oil’s parser)
            2. Numeric workloads (what people write Cython extensions for; note that NumPy is written largely in Cython.)

            JITs are a lot better at the second type of workload than the first. My experience matches yours – when I tried running Oil with PyPy, it was slower and used more memory, not faster.

            Also, I think that workload 1 is the more important one for Python. If I want to write fast numeric code, it’s not painful to do in C++. On the other hand, doing string/hash/object graph workloads in C++ is very painful. It’s also less-than-great in Rust, particularly graphs.

            So while I think PyPy is an astonishing project (and that impression grows after learning more about how it works), I also think it doesn’t speed up the most important workloads in Python. Not that I think any other effort will do so – the problems are pretty fundamental and there have been a couple decades of attempts.

            (In contrast I got much better performance results adding static types manually, and semi-automatically translating Oil to C++. This is not a general solution as its labor intensive and restricts the language, although there are some other benefits to that.)

            1. 1

              I see the outline of your point, but I’m not sure on the specifics. In particular, a mechanism is missing: What makes strings, dictionaries, and user-customized classes inherently hard to JIT, particularly with a PyPy-style tracing metainterpreter?

              Edit: Discussion in #pypy on Freenode yielded the insight that CPUs have trouble with anything which is not in their own list of primitive types, requiring composite operations for composite types. Since JITs compile to CPU instructions, they must struggle with instruction selection for composite types. A lesson for language designers is to look for opportunities to provide new primitive object implementations, using the CPU’s existing types in novel ways.

              Our experience in the Monte world is that our RPython-generated JIT successfully speeds up workloads like parsing and compiling Monte modules to bytecode, a task which is string- and map-heavy. Our string and map objects are immutable, and this helps the JIT remove work.

              1. 1

                Yes the JITs do a lot better on integers and floats because they’re machine types.

                The performance of strings and hash tables is sort of “one level up”, and the JITs don’t seem to help much at that level (and for some reason lots of people seem to misunderstand this.)

                As an anecdote, when Go was released, there were some benchmarks where it was slower than Python, just because Python’s hash tables were more optimized. And obviously Go is compiled and Python is interpreted, but that was still true. So that is a similar issue.

                So there are many dimensions to performance, and many workloads. Saying “4x faster” is doing violence to reality. In some cases it’s the difference between being able to use PyPy and not being able to use it.

              2. 1

                SciPy has some cython code along with a bunch of fortran code but NumPy is all C.

                1. 1

                  Ah sorry you are right, I think I was remembering Pandas, which has a lot of Cython in its core:


                2. 1

                  cython is also a translator to C. why didn’t you use cython for oil?

                  1. 1

                    It generates code that depends on the Python runtime, and Cython is a different language than statically-typed Python. I don’t want to be locked into the former, and translating the code is probably even more labor intensive than what I’m doing (I leveraged MyPy team work on automatic type annotation etc.). It also wouldn’t be fast enough as far as I can tell.

                3. 3

                  pypy is 4x faster…. for long-running tasks that allow the jit to warm up. Lots of python workloads (e.g. pylint) run the interpreter as a one-off so pypy won’t help there. Interpreter startup speed is also critical for one-off workflows and pypy isn’t optimized for that either.

                  1. 3

                    I think it’s more like 10x-100x faster OR 10% slower for different workloads – “4x” doesn’t really capture it. See my sibling comment about string/hash/object vs. numeric workloads.

                  2. 2

                    I used PyPy recently, for the first time and I had a nice experience. I am experimenting with SQLite and trying to figure out the fast ways to insert 1B rows. My CPython version was able to insert 100M rows in 500 is seconds, same in PyPy took 150 seconds.

                    The best part was, I did not have to change anything in my original code. It was just drop in, as advertised. Ran it with PyPy and got the speed bumps.

                  3. 2

                    Specifically, we want to achieve these performance goals with CPython to benefit all users of Python including those unable to use PyPy or other alternative virtual machines.

                    1. 1

                      Apparently the goal is a 2x speed up by 3.11 and a 5x speed up in 4 years.

                      1. 4

                        Yes. Assuming that those numbers are not exaggerated, I expect that PyPy will still be faster than CPython year after year. The reasoning is due to the underlying principle that most improvements to CPython can be ported to PyPy since they have similar internal structure.

                        In GvR’s slides, they say that they “can’t change base layout, object layout”. This is the only part of PyPy’s interpreter which is structurally different from CPython. The same slide lists components which PyPy derived directly from CPython: the bytecode, the stack frames, the bytecode compiler, and bytecode interpreter.

                        Specializing bytecode has been tried for Python before; I recall a paper which monomorphized integers and other common builtin types. These approaches tend to fail unless they can remove some interpretative overhead. I expect that a more useful product of this effort will be a better memory model and simpler bytecodes, rather than Shannon’s grand explosion of possible bytecode arrangements.

                        1. 1

                          I’m curious about mypyc personally. Seems to me like (c)python is just hard to optimize and depends too much on implementation details (the C API) to be changed; to get a significant leap in performance it seems like using a statically typed, less dynamic subset, would give significantly higher speedups. Of course the downside is that it doesn’t work for old code (unless it happens to be in this fragment).

                          1. 1

                            Monomorphizing code does not always speed it up. There are times when tags/types can be checked for free, thanks to the dominating effects of cache thrashing, and so the cost of dynamically-typed and statically-typed traversals ends up being similar.

                            It’s not an accident that some half-dozen attempts to monomorphize CPython internals have failed, while PyPy’s tracing JIT is generally effective. Monomorphization can remove inner-interpreter work, but not interpretative overhead.

                            1. 2

                              Well by “less dynamic” I also mean not having a dictionary per class and this kind of stuff :-). I should have been clearer. tag checks is one thing, but performing dictionary lookups all the time to resolve identifiers or fields is also very heavy. The statically typed aspect, I have no idea if it’s truly necessary, but it’d make it easier to implement, right?