Python, which predates Java, originally did not have threading, because threading didn’t become super popular until Java (which was originally designed to run on systems without true process-based multitasking) came along and popularized it.
So Python was forced to adopt threading in a bit of a rush, and faced the problem that many popular “Python” libraries in that era (multiple decades ago) were actually wrappers around C libraries, and were almost certainly not thread-safe. The compromise solution was that any thread which wants to execute Python bytecode, or call the Python interpreter’s C API, must obtain a lock to do so (the Global Interpreter Lock, or GIL). Thus, only one thread at a time can be executing bytecode or calling the interpreter API.
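To make that concrete, here is a minimal sketch (not part of the original comment) of what the lock means in practice: two CPU-bound threads finish no faster than one, because only one of them can be executing bytecode at any moment.

```python
# Minimal sketch (not from the thread): under the GIL, two CPU-bound threads
# take turns holding the lock, so they finish no faster than running the same
# work sequentially on one thread.
import threading
import time

def count_down(n):
    while n:
        n -= 1

N = 20_000_000

start = time.perf_counter()
count_down(N)
count_down(N)
print(f"sequential:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two threads: {time.perf_counter() - start:.2f}s")  # typically no faster, often slower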
At the time this was a not-unreasonable compromise, since most people who wanted to do threading wanted it for things like network daemons which have most of their threads spend most of their time waiting on I/O (so you don’t feel the lock contention as much), and hardly anybody actually had multiprocessor or multicore hardware capable of truly executing multiple threads at the same time.
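Again just an illustrative sketch, with time.sleep standing in for blocking network I/O: because a thread drops the GIL while it waits, I/O-bound threads overlap almost perfectly, which is why the lock was bearable for the network-daemon use case.

```python
# Sketch of the I/O-bound case: time.sleep stands in for a blocking socket
# read. The GIL is released while a thread blocks, so ten waits overlap and
# the whole thing takes roughly one second rather than ten.
import threading
import time

def fake_io():
    time.sleep(1)  # GIL is released for the duration of the blocking call

start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"10 blocking waits took {time.perf_counter() - start:.1f}s")
```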
Now, of course, we all have multiprocessor/multicore hardware, and many people use Python (or Python wrappers around libraries in other languages) for CPU-bound number-crunching tasks. So there’s pressure to remove the GIL. But it’s always been understood that the cost of doing so would be a hit to single-threaded performance of the Python interpreter, since the extra work required to keep the interpreter thread-safe does not come for free.
And so you have two camps: one which wants the GIL gone because they believe the benefits (free threading) will outweigh the single-threaded performance costs; and another which is skeptical of that tradeoff because most Python programs are, and almost certainly will continue to be, single-threaded, and people who use those programs probably will not be happy with a 15-20% performance drop, especially if it comes at a time when Python was finally starting to make significant performance gains.
There was a question from Shannon about “what people think is an acceptable slowdown for single-threaded code”. To a large extent, that question went unanswered in the thread, but he had estimated an impact “in the 15-20% range, but it could be more, depending on the impact on PEP 659”.
15-20% is a lot. That’s definitely in the “noticeable even without benchmarking” category.
This feels like the sticking point for me personally. Given that I use Python mostly for scripts and the occasional webserver:
it doesn’t feel like there’s a lot of upside to be had for me by removing the GIL, and
my code would get noticeably slower.
So while it will (?) benefit the ecosystem and language as a whole, it makes my personal use of Python worse, and I’m not sure why I’d support the initiative.
Python isn’t a new interpreter starting with a blank slate. How would you propose making an interpreter that supports GIL and NoGIL code running in the same process, both Python code and C extensions? It’s not an easy problem to solve.
Multiprocessing with Python quickly leads to memory issues, as each process takes a significant amount of memory. This is partly because it is so hard to share things like code objects between processes.
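For illustration, a small sketch of the usual workaround and of where that memory cost comes from (the function name is made up for the example):

```python
# Sketch of the usual GIL workaround and its cost: multiprocessing really does
# use all cores, but every worker is a separate interpreter process with its
# own copy of each imported module (and the code objects inside it), which is
# where the memory overhead described above comes from.
import multiprocessing as mp

def crunch(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool:  # four interpreters, four copies of module state
        results = pool.map(crunch, [5_000_000] * 4)
    print(results)
```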
If I remember correctly from articles and blog posts (e.g. on the PyPy blog) that I read during previous attempts, another non-trivial contributing factor is the particular way Python approaches metaprogramming: there are a ridiculous number of places where it’s difficult to avoid data races and/or undefined behaviour in a multi-threaded environment without wrapping locking around runtime lookups inside the interpreter.
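A deliberately contrived sketch (not from the original posts) of the kind of dynamism meant here: method calls resolve names through class dictionaries that any other thread may rewrite mid-run, which is exactly the sort of interpreter-internal lookup that would otherwise need locking.

```python
# Contrived sketch of that dynamism: every g.greet() call re-resolves the name
# 'greet' in the class dict, and another thread is free to rewrite that dict
# at any moment. The GIL makes each individual lookup and assignment safe;
# without it, the interpreter needs its own synchronization.
import threading

class Greeter:
    def greet(self):
        return "hello"

def patch_forever(stop):
    while not stop.is_set():
        Greeter.greet = lambda self: "hi"      # mutate the class dict...
        Greeter.greet = lambda self: "hello"   # ...over and over

stop = threading.Event()
patcher = threading.Thread(target=patch_forever, args=(stop,))
patcher.start()

g = Greeter()
for _ in range(100_000):
    g.greet()  # dynamic lookup races against the patcher thread

stop.set()
patcher.join()
print("done")
```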
Why is multiprocessing so controversial in 2023?
It’s Python, which is already 10x slower than most other mainstream languages. Is a 15-20% slowdown really a significant difference at that point?
Yes, of course.
do you mean multi-threading?
I interpreted their message as asking why multi-threading would be necessary to add to Python, since it already features multi-processing.
But maybe you’re right and they just meant multi-threading 😅
It’s a Python-specific problem. It was controversial over a decade ago, too.