What a great post! It’s a simple issue (relative to lots of monster bugs), but it’s clearly explained with a lot of background so anyone can understand. Switch to Python 3!
I have a pet hypothesis: libraries matter more than programs, and the lack of a mutually compatible subset of py2.6 and py3 made things worse than they needed to be.
It’s relatively easy to port a program from py2 to py3 provided the library support is there. You change all of your code to py3, and drop py2 support. You only have to touch your own code.
For library code and particularly frameworks, it’s not so easy. They want to maintain compatibility with both during a transition period. If most devs haven’t yet moved to py3 and Django drops py2, Django will die. So they need to get all of their users to move over. Convincing other people is way harder than changing your own code.
For roughly four years after py3 was released, it wasn’t really feasible to ship one source code base that worked on both. It took until py2.7 and py3.3 for both versions to be fixed so that there was a workable common subset. For example, one of the trivial-to-solve but very big problems was that if you wanted a Unicode string literal in py2, you had to write it as u"hello" to avoid getting a byte string, whereas in py3.0 this syntax was not legal. In 3.3, PEP 414 was merged, which made u"hello" and "hello" both be accepted and handled identically. I can’t find the patch right now, but I suspect this may have been a one- or two-line change to the lexer.
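To make the “common subset” idea concrete, here is a minimal sketch of single-source code that should run unchanged on both Python 2.7 and 3.3+. The names `PY2`, `text_type`, `binary_type`, and `to_text` are illustrative assumptions, loosely modeled on what compatibility libraries such as `six` provide; this is not code from any particular framework.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 -- this builtin only exists on Python 2
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def to_text(value, encoding="utf-8"):
    """Return value as a text (Unicode) string on both Python 2.7 and 3.3+."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected bytes or text, got %r" % type(value).__name__)


# The u prefix is legal on 2.x and, thanks to PEP 414, legal again on 3.3+,
# where it is a no-op: this is a text string on both versions.
GREETING = u"hello"
assert to_text(b"hello") == GREETING
```

The design choice is to decode bytes once, at the boundary, and work with text everywhere else, which is what Python 3 effectively forces on you anyway.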
There’s always going to be a multi-year porting effort when you’re looking at frameworks with hundreds of kLoCs, and the infeasibility of maintaining a single code base supporting both delayed that process from even beginning for years. Then even once that’s finished, it takes years more for downstream users to port their own stuff over.
This all exacerbated a bootstrapping problem: most devs weren’t on py3 so the value proposition for frameworks to support it was initially poor. Most frameworks weren’t on py3 so the value proposition for devs to switch to it was initially poor. Py3 is nicer than py2 but py2 wasn’t all that bad in the first place so it took a while for the gap in functionality to be really noticeable.
(Note that tools for automatically rewriting py2 to py3, such as 2to3, did exist. My secondhand understanding is that they were not good enough for library code in practice.)
My belief now is that making sure a mutually compatible subset exists is a really good idea when breaking backwards compatibility. IMO, the fact that many py2 programs didn’t run unmodified on py3 would have been much less harmful if there had been, out of the gate, a subset of the language that worked on both.
I can’t give you anything “objective” because anyone looking at what I write would be able to pin me as more of an insider (which is partly true, partly not). I did write something earlier this year that got a bit of traction on this site, if you want to read it.
But I feel like there are a few main things.
One is that it was never possible to go “right” in the sense of a nice smooth transition of everybody, or even nearly everybody, all in a short time. Some languages have it easier here – Ruby, for example, has diversified but is still so dominated by one domain (web development) and even one specific framework within that domain (Rails) that it can be a relatively simpler matter to drag everyone across a breaking change (and they did make it across a breaking-ish string change). Python, once upon a time, was mostly a Unix-y scripting language, but it’s now used so widely and for so many different things that getting all the constituencies on board makes herding cats look downright easy. And the old-guard folks who have been around the longest and using it for Unix-y scripting things the longest had some of the loudest objections to Python 3 (more in a minute).
Another is that organizational resistance to “maintenance programming” should not be underestimated. When that story was going around recently about the horrors Uber went through rewriting their iOS app in Swift, a lot of people seemed surprised that the company even tried such a thing, but it really makes perfect sense once you have experience with a certain type of regrettably common environment. In many organizations, programming work that doesn’t directly ship new features to customers, or that doesn’t otherwise have an immediately quantifiable payoff, is effectively forbidden, to such a degree that a rewrite in a new language, sold to management via the expected quantifiable payoffs of important new features or better performance or whatever, is often the only way to get even basic maintenance done on existing code. This is also why I largely don’t begrudge people whose response to Python 3 was “well, time to rewrite in Go/Rust/whatever”. They probably have huge deferred maintenance burdens in their codebases, and can’t sell “we’ll rewrite in the new version of our existing language” but can sell “we’ll rewrite in $NEW_LANG and look at all the nice stuff we get”.
From a purely technical perspective, the early Python 3 releases reminded me a bit of what happened with the KDE 3 -> 4 transition, where the initial releases still were effectively technology previews rather than production-ready. Python did a better job of messaging about this, but a lot of people didn’t get or didn’t pay attention to that messaging and concluded from the earliest releases that Python 3 was a dud. The early Python 3 releases were still stabilizing APIs and still had some critical things – like a lot of the new I/O system – written in pure Python rather than C, which tanked performance, and were lacking some of the later porting conveniences (like being able to u-prefix string literals, which didn’t become legal again until Python 3.3).
And then there was the filesystem thing. Plenty of code – and a lot of critical code in things like Linux distros – that needed to move over to Python 3 was still effectively “Unix-y scripting”. And Unix-y filesystems are a horrid mess. You may have seen some of the things that go around once in a while about how there’s no portable reliable way to, say, ask to have a file actually written out to disk and get back an indication of whether the write succeeded or failed; that’s an example, but not the one that Python snagged on. Python ran into the problem of encoding: Python 3 strings are Unicode, and to get strings from bytes you have to decode from whatever encoding they’re in. Many Unix-y systems allow you to do this, but not all systems and not all configurations, and no matter how careful you are, a surprising number of people will come out of the woodwork waving copies of old specs and saying that what they are doing is technically allowed, so it’s your problem to stay compatible with what they’re doing. The only portable, reliable description of Unix filesystem paths is “bags of bytes that make no guarantees whatsoever about being decodable”. And so for a long time, Unix filesystem path encoding on Python 3 was not a great story. That has gotten better (at the cost of some hackiness in Python itself), but it was a sore point for a fair number of people for a while.
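As far as I know, the “hackiness” here is largely PEP 383’s `surrogateescape` error handler: bytes that don’t decode are smuggled into the string as lone surrogate code points, so the exact original bytes can be recovered when the path is encoded again. A minimal sketch, assuming a UTF-8 filesystem encoding (the standard library’s `os.fsdecode` and `os.fsencode` apply whatever encoding and error handler the interpreter is actually configured with); the filename and helper names are hypothetical:

```python
# Hypothetical filename: Latin-1 bytes for "café.txt", which are NOT valid UTF-8.
RAW_NAME = b"caf\xe9.txt"


def decode_path(raw: bytes) -> str:
    """Decode a path as bytes-that-may-not-decode, assuming UTF-8 (PEP 383 style)."""
    return raw.decode("utf-8", "surrogateescape")


def encode_path(name: str) -> bytes:
    """Reverse decode_path, restoring the original bytes exactly."""
    return name.encode("utf-8", "surrogateescape")


name = decode_path(RAW_NAME)
print(repr(name))                     # 'caf\udce9.txt'
assert encode_path(name) == RAW_NAME  # lossless round trip

# The catch: the resulting str is not "real" text. Any code that encodes it
# strictly (printing, logging, sending it over the network) can blow up.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)
```

This is the trade-off described above: paths can always round-trip, but the strings you get back may not be safely usable as ordinary text.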
And plenty of people either jumped ship to other languages or were dragged to Python 3 by the libraries and frameworks they use. Others are just continuing to use Python 2, often on Linux distros that have committed to long support cycles anyway. But regardless, they have made peace with the transition in their own way, and so the furor has died down a bit.
I don’t have one handy, and can’t even think of a good one that I’ve read. As someone who has been using Python at different levels since 2001, but who has never been personally involved in the design or maintenance of the language or standard library, here are some quick but arm’s-length observations:
- Unicode support needed fixing. Badly. Most everyone knew it.
- No one had a strong, widely accepted, non-breaking way to accomplish that.
- “Since we need to break compatibility anyway…”: the opportunity was seized to address several other issues.
- The transition wasn’t as swift as anyone (especially proponents of absorbing the big compatibility hit all at once) imagined.
There was something akin to a sense that the community was small enough to weather this kind of transition almost painlessly, and that there was probably a limited window where that was true. That sense was very likely misplaced when the transition was proposed, and was certainly misplaced by the time the transition had any momentum at all.
I’m sure someone can do a better job than I just did. But that’s my understanding, having been around for a while without a ton of personal skin in the production of the language/libraries, which may be the closest thing I can imagine to being objective while still having hands-on knowledge of the moments in question.
A hostname can point to multiple IP numbers, so it’s not that simple either :)
As alluded to in the last chapter, maybe IP addresses shouldn’t be strings?
TypeErrors all the way down.
They aren’t. How many years after an issue has been fixed do we need to keep writing blog posts about it?
Probably for as long as there are production deployments of things where this is True.
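For what it’s worth, Python 3.3+ ships the standard-library `ipaddress` module, which gives addresses their own types rather than treating them as strings. A minimal sketch, using the 192.0.2.0/24 documentation range as example data:

```python
import ipaddress

addr = ipaddress.ip_address("192.0.2.10")  # parses text into an IPv4Address
print(type(addr).__name__)                 # IPv4Address

# Typed addresses support network membership tests and have a fixed binary form.
assert addr in ipaddress.ip_network("192.0.2.0/24")
assert addr.packed == b"\xc0\x00\x02\x0a"
assert ipaddress.ip_address(addr.packed) == addr  # 4 packed bytes parse back

# They are not strings, so comparing against one is simply unequal.
assert addr != "192.0.2.10"
print(str(addr))  # 192.0.2.10
```

Note that this only types a single address; resolving a hostname can still yield several of them.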
This is a great deep dive, fun to read. Sadly recognizable in python 2, in the likes of “ordinal not in range”…
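The “ordinal not in range” message came from Python 2 silently decoding byte strings as ASCII whenever they were mixed with unicode strings, so the bug only appeared once non-ASCII data showed up. A small Python 3 sketch contrasting that with Python 3’s behavior; the sample value is made up:

```python
data = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Python 2 would implicitly run the equivalent of data.decode("ascii") when
# mixing byte strings and unicode; doing that decode explicitly reproduces
# the classic error.
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc.reason)  # ordinal not in range(128)

# Python 3 refuses to mix the two types at all, so the mistake surfaces
# immediately as a TypeError instead of depending on the data.
try:
    "name: " + data
except TypeError as exc:
    print("TypeError:", exc)

# The fix is to decode once, at the boundary, with the right codec.
label = "name: " + data.decode("utf-8")
```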
I don’t understand why folks are still using Python 2 even after the announcement to sunset it on January 1, 2020.
Does anyone have a recommendation for an objective overview on what seemingly went so wrong with the two to three transition?
As an outsider who writes at most 1KLOC of Python a year, I’m not sure, but I want to learn more.
Also, in my opinion, Guido van Rossum gave a (clearly) not objective but still pretty fair overview in this talk.
As far as I know, Red Hat committed to security updates for Python 2 for additional time.