To save people from misunderstanding based on just the title: this proposal would not remove or turn off the GIL by default, and it would not let you selectively enable or remove the GIL. It would be a compile-time flag you could set when building a Python interpreter from source, and if used it would cause some deeply invasive changes to the way the interpreter is built and run, which the PEP goes over in detail.
It would also mean that if you use any package with compiled extensions, you would need to obtain or build a version compiled specifically against the (different) ABI of a Python interpreter built without the GIL. And, as expected, the prototype already shows a significant (~10%) performance regression on single-threaded code.
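For a concrete sense of what the ABI coupling means in practice, here is a small standard-library sketch: every compiled extension is built against a tag the interpreter advertises, and a no-GIL build would advertise a different tag, so wheels built for a stock interpreter would not be importable. The sysconfig calls are standard; the exact tag a no-GIL build would use is not specified here.

    # Rough illustration of why compiled extensions are tied to the interpreter ABI:
    # the build-specific tag is baked into every extension module's filename.
    import sysconfig

    # e.g. 'cpython-311-x86_64-linux-gnu' on a stock CPython 3.11 build;
    # a GIL-free build would advertise something different.
    print(sysconfig.get_config_var("SOABI"))

    # e.g. '.cpython-311-x86_64-linux-gnu.so': the suffix an extension module
    # must carry to be importable by this particular interpreter.
    print(sysconfig.get_config_var("EXT_SUFFIX"))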
It’s pretty clear this is aimed at scientific computing applications, and no doubt the expectation is that major scientific libraries will distribute no-GIL versions. I suspect that users who fall within the motivating use case will accept the slowdown in Python execution in order to benefit from the ability to exchange data in-process and use multiple cores.
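To make the "exchange data in-process and use multiple cores" point concrete, here is a minimal standard-library sketch; the function names and data sizes are made up for illustration. Threads read a shared object directly, while multiprocessing has to pickle each chunk into a worker process. On a stock interpreter the threaded version gets no parallelism for pure-Python work; removing the GIL is what would change that, without giving up the shared memory.

    import threading
    import multiprocessing

    def chunk_sum(data, start, stop, out, i):
        # Reads the shared list directly: nothing is pickled or copied
        # across a process boundary.
        out[i] = sum(data[start:stop])

    def threaded_sum(data, n=4):
        out = [0] * n
        step = len(data) // n
        threads = [
            threading.Thread(target=chunk_sum, args=(data, k * step, (k + 1) * step, out, k))
            for k in range(n)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sum(out)

    def multiprocess_sum(data, n=4):
        # Each chunk is pickled and copied into a worker process; that
        # serialization cost is what in-process threading avoids.
        step = len(data) // n
        chunks = [data[k * step:(k + 1) * step] for k in range(n)]
        with multiprocessing.Pool(n) as pool:
            return sum(pool.map(sum, chunks))

    if __name__ == "__main__":
        big_data = list(range(4_000_000))  # stand-in for a large in-memory dataset
        assert threaded_sum(big_data) == multiprocess_sum(big_data) == sum(big_data)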
The ABI break is a pretty big burden. It being a build option is certainly good, but any org that decides to onboard will have a lot of work on its hands.
Frankly, this sounds like a tremendously bad idea. Not only because it would split the ecosystem, but also because Python libraries are almost all thread-unsafe even with the GIL. Just a couple of months ago I ran into a deadlock that was caused by the requests library, and had to drop down to using urllib3 directly.
Writing robust multi-threaded code is hard enough for seasoned C/C++ veterans.
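The thread-unsafety point is worth spelling out: the GIL only serializes individual interpreter steps, not compound operations, so code like the sketch below has never been protected by it. Whether the lost updates actually show up in a given run depends on exactly where that interpreter version chooses to switch threads, which is precisely the problem: correctness becomes an accident of scheduling rather than a guarantee.

    import threading

    counter = 0

    def bump(n):
        global counter
        for _ in range(n):
            counter += 1  # load, add, store: a compound operation, not atomic

    threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # May print 800000 or something smaller, depending on the interpreter's
    # thread-switching points, which have changed between CPython versions.
    # Either way, the GIL gives no guarantee for this kind of code.
    print(counter)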
This wouldn’t add any additional difficulties to the Python side, though.
I agree with everything but your last sentence.
Being a “seasoned C/C++ veteran” has little to do with writing robust multi-threaded code.
Why do you say that? From my, perhaps naive, perspective, seasoned C/C++ developers typically have significant experience writing multi-threaded code. I agree that having that experience doesn’t mean they’re able to write robust multi-threaded code, but that was kinda my point.
It just doesn’t follow. “Writing C/C++” and “Writing multithreaded software” are orthogonal.
It’s similar to saying that Python developers always have significant database experience, or that JavaScript developers always have significant experience with React. Maybe true in some cases, but the two aren’t generally correlated at all.
There are probably more C++ devs with multithreaded experience than Python devs with it (because the GIL makes multithreading for parallelism all but pointless in Python), but I wouldn’t trust an average C++ programmer to know any more about multithreading than anybody else.
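A quick way to see why CPU-bound threading experience is rare among Python developers: on a stock build, extra threads running pure-Python work buy no speedup at all. A minimal timing sketch using only the standard library:

    import threading
    import time

    def burn(n):
        # Pure-Python busy work; holds the GIL for essentially the whole loop.
        total = 0
        for i in range(n):
            total += i * i
        return total

    N = 5_000_000

    start = time.perf_counter()
    burn(N * 4)
    print(f"single thread: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    threads = [threading.Thread(target=burn, args=(N,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # On a stock interpreter this takes roughly as long as the single-threaded
    # run; a no-GIL build is what could bring it closer to a 4x reduction.
    print(f"four threads:  {time.perf_counter() - start:.2f}s")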
A while ago I ran some experiments with Sam Gross’s no-GIL branch of Python and got some stunning results in terms of increased performance for threaded code in a pretty complex application. My notes here: https://github.com/simonw/datasette/issues/1727#issuecomment-1112889800
Hopefully this gets denied.
The changes aren’t for general consumption, though some of them, such as biased reference counting, could probably be a boon to Python both with and without the GIL once the root causes of the 10% slowdown are resolved.
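For readers unfamiliar with the term: biased reference counting keeps a fast, unsynchronized count for the thread that owns an object and a separate, synchronized count for every other thread, so the common single-threaded case pays nothing for atomicity. The toy class below is only a conceptual sketch of that idea, with invented names and a lock standing in for atomic operations; it is not how CPython implements it.

    import threading

    class BiasedRefCount:
        def __init__(self):
            self.owner = threading.get_ident()  # thread that created the object
            self.local = 1          # touched only by the owner: no synchronization
            self.shared = 0         # touched by every other thread, under a lock
            self._lock = threading.Lock()   # stands in for an atomic counter

        def incref(self):
            if threading.get_ident() == self.owner:
                self.local += 1                 # fast path, uncontended
            else:
                with self._lock:
                    self.shared += 1            # slow path, synchronized

        def decref(self):
            if threading.get_ident() == self.owner:
                self.local -= 1
            else:
                with self._lock:
                    self.shared -= 1

        def refcount(self):
            # The true count is the sum of both halves; merging them back
            # together is where the real scheme gets subtle.
            with self._lock:
                return self.local + self.shared

The payoff is that objects touched by only one thread, which is the overwhelmingly common case, never pay for synchronization at all.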
I see this as a stalking horse for a possible future PEP that introduces an improved C API, perhaps along the lines of Lua’s, which would be a good thing.
I have to wonder why people are choosing Python for this type of work. The GIL’s been a known problem for a long time, and if a workload is highly parallel or concurrent then Python’s probably not a great choice.
I guess if Facebook is paying developers to fix it, then why not? I personally think “machine learning models” are a lame use case, but if it gets the problem fixed…
Python has a great ecosystem for machine learning and lots of data science stuff. The mature alternatives (basically just R?) are also slow. There are faster languages but it takes a hell of a lot of effort to build the libraries, documentation and training resources.