Why a new implementation instead of PyPy:
FYI, GvR is currently working at Dropbox. GitHub repo.
I wish it targeted v3.x and not v2.7 though.
For me the biggest draw is their intention of making C extension modules just work. PyPy has slowly started getting C extensions working, but even when it works it’s slower than CPython (due to the compatibility layer involved). Having a drop-in JIT'ing replacement that still works with the likes of numpy/scipy is very exciting to me.
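To illustrate the Python-to-C boundary that any alternative implementation has to support, here is a minimal sketch using only the standard library's ctypes (note: this is not the full C-API that numpy-style extension modules rely on, just the simplest way to cross into C):

```python
import ctypes
import ctypes.util

# Load the platform's standard C library; the name resolution
# varies by OS, so find_library handles the lookup.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature of strlen so ctypes converts
# arguments and the return value correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5
```

Any runtime that wants numpy/scipy to "just work" has to support far more of CPython's internals than this, but even this simple call shows the boundary an implementation must preserve.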
I know people like to rag on the GIL and Python, but I honestly don’t think it’s that big of a deal, and they don’t really have any concrete plan for how to get rid of the GIL (the README at least outlines a plan of approach for C extensions, but says nothing about how they plan to remove the GIL).
Obviously it’s liable to change, but listing GIL removal as a primary goal is a good sign. There are plenty of CPU-bound Python programs, both at work and in my personal projects, whose bottlenecks can’t be solved with multiprocessing, and Gearman or Celery is overkill for these projects.
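As a rough sketch of the problem being described: pure-Python CPU-bound work does not speed up with threads under the GIL, which is exactly why multiprocessing (or an external work queue) is the usual fallback:

```python
import threading
import time

def countdown(n):
    # Pure-Python CPU-bound loop: the GIL lets only one thread
    # execute bytecode at a time, so adding threads cannot
    # parallelize this work.
    while n > 0:
        n -= 1

N = 3000000

# Single-threaded baseline.
start = time.time()
countdown(N)
single = time.time() - start

# Two threads, half the work each: roughly the same (often worse)
# wall-clock time, because the threads serialize on the GIL.
start = time.time()
threads = [threading.Thread(target=countdown, args=(N // 2,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

print("single: %.2fs, two threads: %.2fs" % (single, threaded))
```

The exact numbers depend on the machine, but on CPython the threaded version shows no speedup, which is the behavior GIL removal is meant to fix.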
Long ago, my experience with Rubinius was that LLVM-backed JITs take a long time to start: faster than the JVM, but way slower than the plain C interpreter. One of the reasons I love LuaJIT is that I don’t have to guess whether my program is going to run for more than ten milliseconds and then choose my interpreter based on that.
This feels like Dropbox really sticking it to the Python community, PyPy in particular. They have a lot of lofty goals, but there is zero indication they know how to achieve any of them, or that any of what they want to do is necessary. Not sure why they didn’t just become major contributors to PyPy for a while, until it became clear PyPy couldn’t accomplish what they wanted.
I was always under the impression PyPy was strictly a research project.
The difference between a research project and a product is money.
Really interesting and I’m glad that there now are other possibilities instead of only PyPy.
I wonder why they chose to target Python 2 instead of Python 3, or both versions. In theory, if Python 3 succeeds and takes off, then by the time Pyston is production-ready, Python 2 will be obsolete.
I wonder why people repeatedly fail to implement a production-ready runtime for Python.
Is there some magic bit that makes Python incredibly more difficult to support than comparable languages?
Or is there a lack of technical competence in the Python community to get things done? (The best Python hackers might not be the best C/C++/ASM hackers…)
What exactly do you mean by ‘production-ready runtime’? Many people are running CPython in production, of course. If you mean “Why is it so hard to implement a performant Python?” then I think there are a few factors at play:
For most of Python’s life, CPUs got faster more quickly than people developed serious performance problems with Python.
Python has made it fairly easy to drop to C for performance critical things.
The semantics of Python make it difficult to provide good performance without a lot of work. Specifically: pretty much anything can change at any time, and operations on the built-in collections are effectively guaranteed to be thread-safe, which makes getting rid of the GIL difficult.
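A small illustration of the "anything can change at any time" point: method and attribute bindings can be rewritten at runtime, so an optimizing implementation has to guard every assumption it makes about a call site.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
print(p.norm())  # 5.0

# Any code, at any time, can rebind the method on the class...
Point.norm = lambda self: abs(self.x) + abs(self.y)
print(p.norm())  # 7

# ...or attach a shadowing attribute to a single instance,
# so a JIT can never assume a method lookup is stable
# without inserting runtime guards.
p.norm = lambda: 0
print(p.norm())  # 0
```

Tracing and method JITs handle this with guard checks and deoptimization, but that machinery is a large part of why a fast Python is so much work.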
Sorry for the late reply.
I’m not really familiar with how the Python community deals with issues, but I would have kind of expected that people would have sat down in the last 10 years and discussed how issues can be solved in a standardized way, e.g.:
- Having some PEP which specifies Python’s memory model with consideration of today’s concurrency/multi-threading requirements and establishing clear requirements regarding FFI and C code
- Having some discussion about how to improve the language to make it possible to write high-performance Python code without having to circumvent the language with non-standardized approaches
- Having some discussion about how to solve the GIL issue
Maybe I’m just wrong, but from my perspective not enough has happened here. The concurrency story is basically non-existent, portability is a joke, and people seem to have given up on the GIL completely.
Fixing those things would be what I consider production-ready. No magic, just what has been supported elsewhere for the last 4 decades.
The only news I hear about new/independent implementations of Python is basically that yet another one is almost dead.
Can you share your insights here?
I’m not really familiar with how the Python community deals with issues, but I would have kind of expected that people would have sat down in the last 10 years and discussed how issues can be solved in a standardized way
Given the breakdown of the Python community, performant computing has been a problem for only a small fraction of it, and for the most part that fraction has been able to solve it on its own rather than forcing changes on the entire community.
Having some PEP which specifies Python’s memory model with consideration of today’s concurrency/multi-threading requirements and establishing clear requirements regarding FFI and C code
Can you name one language that has done this successfully? The memory model of C++ is nearly impossible to understand. C has a minefield of undefined behaviour even without multithreading. Java’s memory model requires some intricate knowledge to use properly. Languages with a reasonable multithreaded memory model (Erlang, Oz) are either completely incompatible with Python or slow as well. Python’s memory model is fairly simple to understand and does not contain undefined behaviour, but unfortunately it requires a very slow implementation. On top of that, changing the memory model is a backwards-compatibility-breaking change that would break a lot of code, for a problem that most Python developers don’t have (or don’t realize they have).
Having some discussion about how to improve the language to make it possible to write high-performance Python code without having to circumvent the language with non-standardized approaches.
These discussions have gone on before. Result: it’s really hard to do it. And not many people have a performance problem.
Having some discussion about how to solve the GIL issue
There has been a ton of discussion on this. The result: it’s hard without breaking a lot of things.
The concurrency story is basically non-existing,
There are multiple concurrency solutions out there: Tornado, Twisted, eventlet, gevent, all with pros and cons. But what languages have solved the concurrency story in a way you like? Depending on your definition of concurrency, C++ and Java offer little beyond raw threads and locks. C# has await/async, which is decent but has some drawbacks. Ruby’s concurrency support is about as good (or bad) as Python’s.
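It is worth noting that the GIL is released during blocking I/O, which is why plain threads (and the frameworks listed above) remain a workable concurrency story for I/O-bound code. A minimal sketch, using time.sleep as a stand-in for a blocking I/O call:

```python
import threading
import time

def fake_io(seconds):
    # time.sleep releases the GIL, as do most blocking I/O
    # calls, so the threads can overlap their waits.
    time.sleep(seconds)

start = time.time()
threads = [threading.Thread(target=fake_io, args=(0.2,))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# Five 0.2s waits overlap: total is ~0.2s, not ~1.0s.
print("elapsed: %.2fs" % elapsed)
```

This is why the GIL complaint is mostly about CPU-bound parallelism; for network servers, threads and the event-loop frameworks both get real concurrency today.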
portability is a joke
I can’t speak to portability, it has not been an issue for me.
and people seem to have given up on the GIL completely.
I don’t know what this means. The GIL is mostly a fact of life, not something you give up on. People have complained about it for ages though and there have been multiple attempts to remove it, none of them having too much success given the drawbacks.
You’ll have to back up this claim. Mainstream languages have zero high-quality concurrency support. Mainstream languages have a very hard-to-understand memory model in the face of multithreading. Your definition of ‘production ready’ is very biased towards a reality that I don’t think exists. And the fact is: tons of people are running Python in production.
New/independent implementations of C are almost always dying too.
I’m getting the impression here that you’re getting quite defensive about this topic now … anyway, looks like my first comment was more accurate than I thought (regarding the lack of competence/interest in the Python community to tackle these issues).
Thanks, I learned a lot!
I think that is a rather unfair characterization of my response. I provided both technical and social reasons why the CPython runtime is in its current state. I also pointed out that CPython is production-ready for a large number of users. I also challenged you to provide an example of a language which Python should emulate. Your only contribution back has been that I am getting defensive.