This only seems to define the issue, rather than really argue against such changes, or suggest alternatives.
What should have been done in both Python examples? Would it have been better for the community in the long run to stay with Python 2 or the IO model? Could the changes have been introduced in a way that’s less “traumatic” to the community?
The really weird thing here is that arguing against introducing “weak trauma” appears to be indistinguishable from arguing against making any kind of widespread improvement to a language or ecosystem. Is that really the takeaway? Coroutines shouldn’t have been added, because making new good code makes old code seem worse by comparison? I don’t think that’s the intention here, but it’s hard to see the difference.
I think a fairer interpretation is that coroutines shouldn’t have been added in the way they were, if it were at all possible to add them in a way that would cause less of a disconnect from how things worked before.
And I have to say, although I disagreed with the post when I first read it, that I totally agree with that sentiment. The way that ‘coroutines’ work in Python is really annoying. Everything needs to be rewritten to handle coroutines, in a totally trivial syntactic way: add ‘await’ all over the place. The compiler could do that for me.
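To make the “add ‘await’ all over the place” complaint concrete, here is a minimal sketch (the function names are made up for illustration) of how one coroutine at the bottom of a call chain forces a purely mechanical rewrite of every caller above it:

```python
import asyncio

# Once the lowest-level call becomes a coroutine, every caller up the
# chain must itself become "async" and sprinkle "await" at each step.
async def fetch_value():
    await asyncio.sleep(0)  # stand-in for real I/O
    return 42

async def business_logic():
    return await fetch_value()  # was: return fetch_value()

async def handler():
    return await business_logic()  # was: return business_logic()

print(asyncio.run(handler()))  # prints 42
```

Drop the `await` from any one layer and its caller gets an unawaited coroutine object instead of a value, which is why the rewrite has to touch the whole chain.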
Unfortunately, this has a chilling effect on existing Python code. The introduction of asyncio has made large amounts of code idiomatically obsolete. Requests, the darling of the Python world, is effectively useless in a theoretical idiomatic post-asyncio world. The same is true of Flask, SQLAlchemy, and many, many other projects. Just about anything that does I/O is unidiomatic now.

That’s a really odd stance to take, imo. Asyncio is not a panacea by any means. Specifically, there will always be a need for thread/worker pools whenever “non-async” services are being called, tasks are long-running, or calls are made into the host language (which may themselves be long-running).
I don’t understand the focus on “idiomatic” code. Unidiomatic code is not incorrect code.
Unidiomatic code is likely to violate the Principle of Least Surprise. In turn, it introduces a gaping vector for bugs. Devs will assume the unidiomatic code works like the idiomatic code and has the same semantics.
That’s not to say that devs are excused from understanding the code, but when you’re moving fast and breaking things, the additional cognitive load of something unidiomatic is a burden.
Author here, I disagree. Unidiomatic code is a code smell, and needs to be corrected - especially if it has consequences on code which depends on it.
I think it’s a little more complicated.
Flask is given as an example. Most Flask projects I know are small MVPs: the kind where you quickly need a web endpoint for a Python library, more than solving a problem per se. If your important library were in another language, you wouldn’t choose Flask (it’s awesome, but not indisputably the best web framework in the world). So suddenly your perfectly good, idiomatic Python code (in a small project) is not idiomatic anymore, and no one cares. Why would you rewrite a perfectly fine Python 3.3 project just because everything has to be async in 3.7? (I didn’t look up the exact versions; you get my point.)
And unless a new Swiss Army knife of nice and easy web applications comes along, people will probably continue to use Flask, and that’s a good thing.
Agreed. If all new code that a team writes starts using the new idiomatic approach, the unidiomatic code is now a piece of tech debt that needs to either be paid, or will forever result in extra cognitive load, however small or large it may be.
And it will happen again, with more and more idioms, over time. This kind of divergence takes a toll. The barrier to entry for new developers becomes a bit higher each time.
As someone who dealt with an 18 year old codebase with a lot of this - and also as someone who contributed to that mess - it’s exhausting.
In my opinion asyncio makes little sense for Python; they could have instead made an async runtime and kept the code exactly the same.
It’s not quite that simple, though I actually do agree with you that cooperative multitasking makes little sense for a dynamically typed, interpreted language with little concern for speed (yes, I also think Node is stupid). The problem is that changing Python to a language with a concurrent runtime will affect FFI. And Python uses FFI heavily enough that alternative implementations of the language have to implement the FFI layer the same way to be usable.
Here’s a good piece on why you really want asynchronous call explicitly visible in your code: https://glyph.twistedmatrix.com/2014/02/unyielding.html
(It starts out slow, but gets to the point in the end.)
Following that to the logical extreme gets you effect based programming, which is so unusable in practice that not even Haskell people try to pretend it’s ready for prime time.
And it’s certainly not appropriate in Python. In theory, being able to tell where every possible effect could happen is useful. In practice, it means having to write the same function for every combination of effects, which is awful.
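For what it’s worth, glyph’s point from the linked post can be shown with a toy counter (a sketch, not taken from the article): under asyncio, a read-modify-write with no await inside it cannot be interleaved, so it needs no lock, and the only places where interleaving can happen are visible in the source:

```python
import asyncio

counter = 0

async def bump(n):
    global counter
    for _ in range(n):
        # The read-modify-write below contains no "await", so no other
        # coroutine can run in the middle of it: it is effectively
        # atomic, with no lock needed.
        counter = counter + 1
        # Control can switch to another coroutine ONLY here, at the
        # explicit await. That visibility is glyph's argument.
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(*(bump(1000) for _ in range(10)))

asyncio.run(main())
print(counter)  # deterministically 10000; a threaded version could race
```

With preemptive threads the same increment could be interrupted between the read and the write, and you would need a lock to guarantee the final count.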
Regarding concurrency:

You’ll see me reach for threads to solve a problem when hell freezes over, and no earlier,
I prefer threads for almost all problems… with minimal and explicitly shared state.
In other words, threads aren’t bad by themselves. I mean, they’re the ONLY way to use all your cores (besides processes)! They’re only bad if you have “uncontrolled” sharing.
If you use threads well, it rapidly leads you to a style of ZERO global variables. Unfortunately a lot of frameworks don’t follow this style. But it can be done well in Python.
If you’re thinking of moving to Go, that would be weird, because Go’s concurrency model is logically “threads”: goroutines can be interrupted at any point. Go just provides you with a lot of tools to control the sharing, and Python has some of those things too. It’s definitely not as smooth and idiomatic in Python, but it’s possible.
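As a sketch of that style with nothing but the stdlib (the squaring workload is an arbitrary placeholder): workers own their local state and communicate only over queues, much like goroutines and channels:

```python
import queue
import threading

# "Share by communicating": each worker keeps its state local and talks
# to the rest of the program only through queues, so there are no
# globals and no ad-hoc shared mutation to lock.
def worker(jobs, results):
    while True:
        item = jobs.get()
        if item is None:  # sentinel: no more work
            return
        results.put(item * item)

jobs = queue.Queue()
results = queue.Queue()
threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(4)]
for t in threads:
    t.start()
for n in range(10):
    jobs.put(n)
for _ in threads:
    jobs.put(None)  # one sentinel per worker
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(10))
print(squares)
```

`queue.Queue` is thread-safe, so the only shared objects are the two queues themselves.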
clearly it’s an improvement for Python, whose previous attempts at concurrency have fallen completely flat.

I don’t think that’s fair… tons of people use Tornado and Twisted, including major companies with customers, revenue, etc. (I played with Tornado but never used it in production, and never got into Twisted.)
That parenthetical is the most important part of this statement. I hate threads, and will never use them, opting for coroutines or processes instead.
I fully understand Go’s concurrency model, and I also think it’s one of the weakest parts of the language, if not the single worst part.
My higher level point is that there’s an assumption in your post that threads are the “old way” in Python and asyncio is the “new way”, and everybody is going to migrate from one to the other, as they will from Python 2 to 3.
But that’s not true – people will be using threads forever in Python, and that’s intended by the language designers.
They have well-known limitations but they also work great for many use cases (e.g. simple concurrent I/O and computation with C extensions that release the GIL)
I generally agree with your point about platform stability, but Python concurrency isn’t a good example to illustrate that.
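For instance, the simple-concurrent-I/O case needs nothing more than a thread pool. In this sketch, `time.sleep` stands in for any blocking call that releases the GIL (a socket read, a DB query, a C extension doing real work):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    # Stand-in for a blocking call that waits without holding the GIL.
    time.sleep(0.1)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, range(10)))
elapsed = time.perf_counter() - start

print(results)
print(f"{elapsed:.2f}s")  # the ten 0.1s waits overlap: ~0.1s total, not 1s
```

No event loop, no rewrite of the call chain: the blocking functions stay exactly as they were.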
How about a CPU-bound task involving a lot of data? You can’t use async stuff because there’s no significant I/O wait to rely on. And marshalling a lot of data between processes is a huge hit. And using shared memory makes processes indistinguishable from threads, only now your locking primitives are more expensive.
Case in point: I once tried to split my JSON parser into two parallel processes (a lexer and a parser). It failed miserably because they had to copy all the data between each other.
But really, what I’m trying to say is that the position “I hate it and will never use it” is not very constructive. Sooner or later you’ll find a use case.
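A rough sketch of that trade-off (the million-element list is an arbitrary stand-in for “a lot of data”): a thread can read the caller’s data in place, while handing the same data to a worker process means pickling all of it through a pipe:

```python
import threading
import multiprocessing

def sum_into(data, out):
    out.append(sum(data))

def compare():
    data = list(range(1_000_000))

    # Threads: the worker sees the very same list object; nothing is copied.
    out = []
    t = threading.Thread(target=sum_into, args=(data, out))
    t.start()
    t.join()

    # Processes: the identical call must pickle a million ints through a
    # pipe just to hand them to the child, then pickle the result back.
    with multiprocessing.Pool(1) as pool:
        copied = pool.apply(sum, (data,))

    return out[0], copied

if __name__ == "__main__":
    print(compare())
```

Both return the same sum; the difference is that the process version pays serialization costs proportional to the size of the data.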
What was your use case where parsing JSON was the bottleneck?
Parsing JSON is not the point, it was an example of a CPU-bound data-heavy use case where threads + shared data is a good approach.
But… I mean… no one’s bottleneck is ever parsing JSON. I don’t think micro-optimising a JSON parser is ever going to be worth it. Do you have any more practical examples?
Never say never. My logging pipeline’s bottleneck is absolutely parsing JSON! It’s really expensive in Python. I’m looking forward to giving PyPy’s new parser a try.
Model checkers are both CPU and RAM intensive, as you have to keep track of visited states. For large state spaces, this can easily hit gigabytes of RAM per process.
This article sort of touches on why I am mostly allergic to JavaScript. It’s not my primary or even secondary language, and the ecosystem changes so fast that it’s a full-time job just trying to keep up with what is idiomatic. I guess it’s not the JavaScript language itself that is the issue; it’s more the libraries and tooling that create this problem.
Nah, I disagree. Evolving syntax and other (missing) language features are part of the problem. Lisp languages avoid that trauma…
Does that mean sr.ht will soon be rewritten? Maybe in Go?
Is there no way to automate some of these changes (asyncio namely)? For instance, tools exist to automate (part of) the translation of C code to Rust (corrode).
Isn’t this possible, or desirable, for Python code?
The Python 2 to 3 transition had tools for it, since py2 vs. py3 was less a paradigm change and more a technical one. But threads to asyncio is quite different: it changes the way you are supposed to think about the code, which makes it a lot harder to translate automatically.
The worst part is that traumatic changes in Python (and nearly any other language) will appear again… My alternative is a Lisp language. You can add any language feature you want with the same syntax. And in the case of CL, you can even program the syntax reader: you can make it so that, in your source files, { } will denote a dictionary, @-decorators work, etc., without ever waiting for a language update or breaking existing libraries. CL is famous for its stability; code written in the 90s (and probably before) can still run as is.

Then, async support depends on the implementation. That’s what is possible in CL: https://github.com/CodyReichert/awesome-cl#parallelism-and-concurrency
In my opinion, CL avoids this problem in the worst way possible: no code is actually idiomatic in the ecosystem. It can be idiomatic within a project, but taking a piece of code from one project and putting it into another will result in code that is idiomatic for one and not for the other.
I don’t understand. We don’t copy&paste code from one project to another. The ecosystem is actually pretty conservative. And if we use features from other libraries, it’s like importing a library of pattern matching or futures into Python: not idiomatic, just incorporated into your project for your own use.

We don’t copy&paste code from one project to another.

Yes, that is the consequence of the choices Lisp makes. I argue it’s a bad thing. You need to learn what is idiomatic for every project if you want to contribute to it. Meanwhile, in Python, or Go, or C, or almost any other language, if you know what is idiomatic within the language, you know what is idiomatic within a project. This reduces the initial thinking overhead for new contributors and makes it easy to reuse solutions from other projects. When you understand what it does, copy-pasting code is one of the more powerful tools in a programmer’s toolbox.
Well, even in Python I don’t copy and paste code, so we’ll have a short discussion. But regarding Lisp: once again, the community is quite conservative, meaning you don’t encounter crazy new languages on top of it, and one has to learn what imported libraries do anyway, exactly like in Python.