1. 25

  2. 9

    With Python 3 being out for so long and the Python 2 deadline being delayed, I think it is somewhat embarrassing that Mercurial still doesn’t use it as default. Sure, 3rd party plugins are not up to date, but the longer they wait with the migration, the longer Python 2 will need to be kept alive.

    1. 13

      Python 3 has been especially difficult for Mercurial because Mercurial does not use encodings. It’s an application to store your bytes and your bytes are just bytes. You don’t want Mercurial (or git, for that matter) to start having opinions about how to interpret the bytes you give it. All the data that hg deals with – all of it – is just bytes.

      The Python 2 deprecation deadline was extended by 5 years mostly for Mercurial. A bunch of us (not just Mercurial, but probably we were the loudest ones) crowded around Guido at Pycon 2014 and told him about how difficult Python 3 was for Mercurial. In response, Guido announced later that day that we would have 5 more years. The bare minimum we needed was PEP 461, which wasn’t available back in 2014, and once that was done, the rest was just a lot of hard work.

      I am not at all embarrassed by the gargantuan effort that my fellow Mercurial maintainers have made to port Mercurial to Python 3. Quite the contrary, I’m proud of them and have gotten a little resentful that the Python 3 upgrade path has not been nearly as easy as its proponents claim.

      1. 6

        Oil is the same way – it doesn’t use Python’s unicode object. It uses bytes, since libc calls all take and return bytes (e.g. execve, chdir, listing directory entries, etc.)
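
A minimal sketch of the same idea in Python 3, for comparison: the `os` functions accept `bytes` paths and then return `bytes`, so no decoding takes place. This only illustrates the standard library behaviour and is not taken from Oil's code; the variable names are made up.

```python
import os

# Passing bytes in means getting bytes back: no encoding is applied.
cwd = os.getcwdb()            # current directory as raw bytes
entries = os.listdir(b".")    # directory entries as raw bytes

# Changing directory with a bytes path also works.
os.chdir(cwd)

for entry in entries:
    # Each entry is whatever byte sequence the filesystem stored.
    print(repr(entry))
```

File names that are not valid UTF-8 come through unchanged this way, which is the property a shell or a version control tool needs.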


        Right now the Python 2 interpreter is bundled with Oil. I removed at least half of the interpreter, which I’m not using [1] (e.g. the whole front end, unicodeobject.c, complexobject.c, etc.)

        Eventually I want to get rid of the dependence on the interpreter. But I think this was a decent way to make progress quickly and have a working program the whole time.

        Since hg is already a mature and working program, I imagine bundling the interpreter might also have been a worthwhile strategy.

        It would also largely fix the startup time issue (which I think got worse in Python 3) since you’re not dependent on the system PYTHONPATH and you can modify the startup sequence with some #ifdefs.

        People always talk about maintenance of Python 2 but I don’t really think it’s a problem. It’s a very mature and understandable codebase. I’ve made dozens of small, localized modifications to it.

        And if Python 2 has a security issue in say the tempfile module, well it doesn’t affect a program that doesn’t use that module.

        [1] http://www.oilshell.org/blog/2018/11/15.html

        1. 6

          We’re actually vaguely planning to bundle Python with Mercurial via PyOxidizer - it should be a one-file binary at that point.

          I’m sure Greg would be interested in Oil doing it too, but I think it can’t be done on Python 2 for a variety of reasons.

          1. 1

            People always talk about maintenance of Python 2 but I don’t really think it’s a problem. It’s a very mature and understandable codebase. I’ve made dozens of small, localized modifications to it.

            So you’ve made dozens of small modifications to it (python) and are now carrying those changes, and you don’t see a problem with many people doing the same thing? It seems like you’ll quickly end up in a situation where one implementation of python2 used by/included in one program is not the same as another used by/included in another, and that is a bigger maintenance problem.

            1. 1

              Why is that a maintenance problem? There can be N different forks of Python 2, e.g. one used for hg, one used for Oil, and I don’t see the problem. Each team maintains it separately.

              Python 2 is a stable, well-tested, and well-understood codebase.

              It’s somewhat analogous to there being N forks of the BSD kernel. In that case I’d say it’s also a feature and not a bug.

              You could ask why they don’t combine their efforts (though I don’t think that question gets us anywhere). Either way, that argument doesn’t make sense for hg or Oil or dozens of other programs that are written in Python and may want to make small, local, backward incompatible modifications.

              1. 1

                The problem is that you have a bunch of different teams maintaining essentially (but not quite) the same thing in their own gardens. So when a vulnerability is found, you have to rely on them all to 1) notice it then 2) fix/deploy. And users now have N number of “python 2” installs on their one system that are in all sorts of different states. How do you know if you have something using a broken “version” of python 2? When a CVE is posted, how can you possibly tell if you’re affected in all the different “versions” of python 2 on your system?

                1. 2

                  On balance, I think using Python 2.7 and statically linking it reduces the number of vulnerabilities. (Although it’s not ideal for other non-security reasons, which is why it’s not a permanent solution).

                  If you rewrite the functionality from scratch rather than reusing well-tested pieces of code, you’d have as many or more vulnerabilities in your own code. Unless of course there is something you’re doing that the Python team isn’t (possible but not that likely).

                  I’d say Python 2.7 is more solid than the average piece of code simply because it’s undergone a lot of testing and usage. There have been a lot of eyeballs on that code.

                  And as mentioned, the vulnerabilities in stdlib modules that Oil doesn’t use don’t matter because they’re not even in the binary (e.g. I see CVEs in XMLRPCServer, which has nothing to do with Oil). On top of that, I also remove around 150K lines of other interpreter code, leaving somewhere around 100K. That’s more than I would like, but it’s not a problem to maintain.

                  On a related note, I think there is some cargo culting of the “CVE treadmill”. Newer code isn’t always better. If the development team isn’t careful about security, you can patch one CVE and introduce another one. In fact I think that’s pretty common because software tends to get larger over time.

                  I’d rather move to a model where there is actual reasoning about security from first principles rather than “I heard about this CVE so I need to update all the places where it occurs, and now I did, so the problem is gone”. It’s kind of where the industry is now, but it doesn’t reflect the underlying reality.

          2. 2

            bare minimum we needed was PEP 461

            Can you expand on that? That request looks just like some convenience feature that could have just been a normal method.

            1. 1

              It’s because Mercurial used % formatting on Python 2 strings heavily to encode the Mercurial wire protocol. Before PEP 461, % formatting on bytestrings was not allowed in Python 3, so they would have needed to carefully rewrite the wire protocol code in a way that would have left the code less readable.
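
To make the difference concrete, here is a rough sketch assuming Python 3.5 or later, where PEP 461 is available. The `frame` helper and its message layout are invented for illustration and are not Mercurial's actual wire format.

```python
def frame(command: bytes, payload: bytes) -> bytes:
    """Build a length-prefixed message (hypothetical layout)."""
    # With PEP 461, % works directly on bytes.
    return b"%s %d\n%s" % (command, len(payload), payload)


def frame_without_percent(command: bytes, payload: bytes) -> bytes:
    """The same message built without % formatting on bytes."""
    # The length has to be converted and encoded by hand.
    length = str(len(payload)).encode("ascii")
    return b" ".join([command, length]) + b"\n" + payload


message = frame(b"unbundle", b"\x00\xff raw data")
print(message)
```

Both helpers produce the same bytes. The second form is the kind of rewrite that would have been needed throughout the code before `%` was supported on `bytes`.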

              1. 2

                Isn’t that pretty much a basic refactoring operation in any typed language?

                1. 7

                  If you take a stroll through the gargantuan effort of hundreds of commits I linked above, I think you’ll see that it wasn’t a basic refactoring operation.

                  1. 4

                    Not sure what you mean. Regardless, it doesn’t matter much for Mercurial because it’s not written in a typed language.

                    Also, I think rewriting the code for no gain while leaving it less readable is a big deal. Readability counts.

              2. 1

                It’s an application to store your bytes and your bytes are just bytes.

                Python 3 has a “just bytes” type. One that’s actually far better than Python 2’s, because Python 2’s was overloaded by also needing to be the default string object. For example, a while ago I went back and did the first couple sets of cryptopals challenges again using Python 3 instead of 2, and it was night-and-day how much nicer Python 3 was for working with “just bytes”.

                And the fact that Mercurial was apparently utterly blocked by not being able to use % on bytes objects indicates Mercurial was not treating these bytes as “just bytes” – the % operator is there to let you do string formatting operations with printf()-style syntax, which only makes sense if the bytes are in fact being treated as strings (which in turn means you need to “use encodings”, because formatting operations on byte sequences of unknown encoding are a ticking time bomb in the code).

                1. 6

                  Please read PEP 461. Its use cases are explained there and apply to Mercurial.

                  1. 4

                    I have read it. I’m not disagreeing that the Mercurial team says PEP 461 makes their lives easier. I’m disagreeing that use of % on str in Python 2 was a “just bytes” operation.

                    Even the rationale in the PEP, and the discussion on python-dev when the PEP was being drafted, acknowledged this – the whole purpose for this is to allow treating some bytes as actually being encoded text, and manipulate them in ways that only make sense for encoded text. Which in turn carries an assumption about what encoding was used to produce the bytes, etc.

                    Which means that all of your insistence on “just bytes”, “no encodings”, “no interpretation”, and so on in your original comment was, to be charitable, a red herring. Mercurial didn’t need % formatting for “just bytes” operations; it needed it for operations on bytes that Mercurial assumes are ASCII-encoded text.

                    I don’t doubt it’s perceived as easier for Mercurial to continue doing things that way. I do doubt whether it’s the right approach to wire protocols which consist of mixed encoded text and “just bytes”.
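
Both sides of this exchange can be seen in a short sketch, assuming Python 3.5 or later. The numeric conversion codes emit ASCII text, while `%s` copies bytes through untouched and refuses `str`. The variable names are made up for illustration.

```python
# Numeric codes produce ASCII digits, i.e. encoded text.
number = b"%d" % 255
hexed = b"%x" % 255

# %s on bytes copies the operand through with no interpretation.
blob = b"\xff\xfe\x00"
passthrough = b"<%s>" % blob

# A str operand is rejected rather than silently encoded.
try:
    b"%s" % "text"
    accepted_str = True
except TypeError:
    accepted_str = False

print(number, hexed, passthrough, accepted_str)
```

So the formatted parts of a message are ASCII by construction, and the interpolated payload stays opaque. Whether that mix counts as "just bytes" is the point in dispute here.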

              3. 4

                Actually with the latest release (Mercurial 5.2), Python3 is the default.

                1. 1

                  …except on Windows, unless I missed something, correct?

                  1. 3

                    Yep. Windows has a little ways to go yet. :(

              4. 1

                On Ubuntu, aptitude why python2.7 returns gimp.

                1. 1

                  aptitude why python2.7

                  On 18.04 here after uninstalling distcc-pump I only have

                  aptitude why python2.7
                  i   python Depends python2.7 (>= 2.7.15~rc1-1~)

                  oh well, and steam-launcher which needs python

                2. 1

                  llvm-9-dev package in particular. This is due to lit, LLVM Integrated Tester

                  Looking on my FreeBSD laptop: llvm-lit70 is 2.7, llvm-lit80 and 90 are 3.6. It’s odd that Debian would use 2.7 for something as new as llvm9.

                  Projects I care about, for example Firefox, uses Mercurial

                  git-cinnabar works pretty well btw, moz-phab explicitly supports it.

                  1. 2

                    Looking at git-cinnabar’s README, it seems to require Python 2. Or is the README outdated?

                    1. 1

                      Yeah, porting is in progress, and it calls into mercurial itself. This recommendation was not specifically about py2, just as a “not using hg” thing :)

                      1. 3

                        I actually prefer hg to git, as a matter of UI. In any case, thanks!

                        1. 1

                          Mercurial does have an experimental Python 3 build you can use:

                          $ HGPYTHON3=1 pip3 install mercurial

                          Though, expect some rough edges (especially around extensions). However, Firefox development itself requires Python 2 and will continue to do so past the EOL date.

                          Edit: According to ngoldbaum above, looks like it might be the default with the latest release.