I feel like this entire post reinforces just how difficult Python dependency management is. I’m a huge fan of Python, despite all of its flaws, but the dependency management is horrible compared to other languages. Nobody should have to learn the intricacies of their build tools in order to build a system, nor should we have to memorize a ton of flags just for the tools to work right. And this isn’t even going into the issue where building a Python package just doesn’t work, even if you follow the directions in a README, simply because of how much is going on. It is incredibly hard to debug, and that is for just getting started on a project (and who knows what subtle versioning mistakes exist once it does build).
I think Cargo/Rust really showed just how simple dependency management can be. There are no special flags, it just works, and there are two tools (Cargo and rustup) each with one or two commands you have to remember. I have yet to find a Rust project I can’t build first try with Cargo build. Until Python gets to that point, and poetry is definitely going down the right path, then Python’s reputation as having terrible dependency management is well deserved.
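To make that concrete, the whole ritual for building an unfamiliar Rust project is roughly this (assuming rustup is already installed; the commands below are just the usual happy path, not anything project-specific):
$ rustup toolchain install stable   # rustup manages the compiler toolchain
$ cargo build                       # resolves, downloads, and builds everything declared in Cargo.toml/Cargo.lock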
Completely agree. I’ve been writing Python for 15 years professionally, and it’s a form of psychological abuse to keep telling people that their problems are imaginary and solved by switching to yet another new dependency manager, which merely has a different set of hidden pitfalls (that one only uncovers after spending considerable time and energy exploring).
Every colleague I’ve worked with in the Python space is wary of anyone who promises some tool or technology can make life better, because they’ve been burned so many times by this kind of thing in Python (not just dependency management, but false promises about how “just rewrite the slow bits in C/Numpy/multiprocessing/etc” will improve performance and other such things)–they often really can’t believe that other languages (e.g., Go, Rust, etc) don’t have their own considerable pitfalls. Programmers who work exclusively in Python seem to have trust issues, and understandably so.
The problem is that no matter how good Poetry gets, it still has to deal with deficiencies that exist in the ecosystem. For example, lockfiles are great, but they don’t help you if the packages themselves specify poor/incorrect version bounds when you come to refresh your lockfiles (and this is something I’ve been bitten by personally).
That’s not a python-specific issue though. It’s not even a python-ish issue. You’ll have the same problem with autoconf / go.mod / cargo / any other system where people have to define version bounds.
if I create a go.mod in my repo and you clone that repo and run “go build” you will use the exact same dependencies I used and you cannot bypass that. I cannot forget to add dependencies, I cannot forget to lock them, you cannot accidentally pick up dependencies that are already present on your system
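As a sketch of what that looks like in practice (the repository URL is just a placeholder):
$ git clone https://example.com/some/module.git && cd module
$ go build ./...     # versions come from the committed go.mod/go.sum, not from anything on this machine
$ go mod verify      # checks the downloaded modules against the go.sum hashes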
Keep in mind that Go and Rust get to basically ignore the difficulty here by being static-linking-only. So they can download an isolated set of dependencies at compile time, and then never need them again. Python’s import statement is effectively dynamic linking, and thus requires the dependencies to exist and be resolvable at runtime. And because it’s a Unix-y language from the 90s, it historically defaulted to a single system-wide shared location for that, which opens the way for installation of one project’s dependencies to conflict with installation of another’s.
Python’s venv is an attempt to emulate the isolation that statically-linked languages get for free.
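Roughly, the per-project dance looks like this (directory and package names are only illustrative):
$ python -m venv .venv             # create an isolated environment for this project
$ . .venv/bin/activate
(.venv) $ pip install requests     # installs into .venv/, not the system-wide site-packages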
And my point is that a lot of the things people complain about are not build-time issues, and that Go gets to sidestep them by being statically linked and not having to continue to resolve dependencies at runtime.
Isolation at build time is extremely easy – it can be as simple as just downloading everything into a subdirectory of wherever a project’s build is running. And then you can throw all that stuff away as soon as the build is done, and never have to worry about it again.
Isolation at runtime is far from trivial. Do you give each project its own permanent isolated location to put copies of its runtime dependencies? Do you try to create a shared location which will be accessed by multiple projects (and thus may break if their dependencies conflict with each other)?
So with runtime dynamic linking you could, to take one of your original examples, “accidentally pick up” things that were already on the system, if the system uses a shared location for the runtime dynamically-linked dependencies. This is not somehow a unique-to-Python problem – it’s the exact same problem as “DLL hell”, “JAR hell”, etc.
Isolation at runtime is far from trivial. Do you give each project its own permanent isolated location to put copies of its runtime dependencies? Do you try to create a shared location which will be accessed by multiple projects (and thus may break if their dependencies conflict with each other)?
But the same issues exist with managing the source of dependencies during build time.
Yeah, I’m not seeing anything different here. The problem is hard, but foisting it on users is worse.
The tradeoff between project-specific sandboxes and disk space usage recurs in compiled languages, and is endemic to any dependency management system that does not make strong guarantees about versioning.
No, because at build time you are only dealing with one project’s dependencies. You can download them into an isolated directory, use them for the build, then delete them, and you’re good.
At runtime you may have dozens of different projects each wanting to dynamically load their own set of dependencies, and there may not be a single solvable set of dependencies that can satisfy all of them simultaneously.
You can put them into an isolated directory at runtime, that’s literally what virtualenv, Bundler’s deployment mode or NPM do.
And at build time you don’t have to keep them in an isolated directory, that’s what Bundler’s standard mode and Go modules do. There’s just some lookup logic that loads the right things from the shared directories.
The point is that any runtime dynamic linking system has to think about this stuff in ways that compile-time static linking can just ignore by downloading into a local subdirectory.
Isolated runtime directories like a Python venv or a node_modules also don’t come for free – they proliferate multiple copies of dependencies throughout different locations on the filesystem, and make things like upgrades (especially for security issues) more difficult, since now you have to go track down every single copy of the outdated library.
It might be possible to have this issue in other languages and ecosystems, but most of them avoid it because their communities have developed good conventions and best practices around both package versioning (and the contracts around versioning) and dependency version bound specification, whereas a lot of Python packages predate there being much community consensus in this area. In practice I see very little of it comparatively in, say, npm and Cargo. Though obviously this is just anecdotal.
Pretty sure it’s not possible to have this issue in either of your two examples; npm because all dependencies have their transitive dependencies isolated from other dependencies’ transitive dependencies, and it just creates a whole tree of dependencies in the filesystem (which comes with its own problems), and Cargo because, as @mxey pointed out (after your comment), dependencies are statically linked into their dependents, which are statically linked into their dependents, all the way up.
This has been a big problem in the Haskell ecosystem (known as Cabal hell), although it’s been heavily attacked with Stack (which uses curated package sets known to all work together) and the cabal v2-* commands (which build all the dependencies for a given project in an isolated directory), but I don’t think that solves it completely transitively.
@mxey pointed out (after your comment), dependencies are statically linked into their dependents, which are statically linked into their dependents, all the way up.
That’s not true for Go. Everything that is part of the same build has its requirements combined, across modules. See https://go.dev/ref/mod#minimal-version-selection for the process. In summary: if 2 modules are part of the same build and they require the same dependency, then the higher of the 2 specified versions will be used (different major versions are handled as different modules). My point was only that it’s completely reproducible regardless of the system state or the state of the world outside the go.mod files.
Ah, I misunderstood your comment and misinterpreted @ubernostrum’s response to your comment. Thanks for clarifying. Apologies for my lack of clarity and misleading wording.
To be clear, I’m not talking about transitive dependencies being shared inappropriately, but the much simpler and higher level problem of just having inappropriate dependency versioning, which causes the packages to pick up versions with breaking API changes.
For example, lockfiles are great, but they don’t help you if the packages themselves specify poor/incorrect version bounds when you come to refresh your lockfiles (and this is something I’ve been bitten by personally).
Are you talking about transitive dependencies being upgraded by a major version despite the parent dependency only being upgraded by a minor or patch version, because the parent dependency is too loose in its version constraints? Are you saying this is a much more endemic problem in the Python community?
Well, it fits into one of two problem areas:
As you say, incorrect version specification in dependencies, allowing major version upgrades when not appropriate - this is something I rarely if ever see outside Python.
A failure of common understanding of the contracts around versioning, either by a maintainer who doesn’t make semver-like guarantees while downstream consumers assume they do, or the accidental release of breaking changes when not intended. This happens everywhere, but I (anecdotally) encounter it more often with Python packages.
npm because all dependencies have their transitive dependencies isolated from other dependencies’ transitive dependencies
npm has had dedupe and yarn has had --flat for years now.
Go handles it by enforcing that you can have multiples of major versions but not minor or patch (so having both dep v1.2.3 and v2.3.4 is okay, but you can’t have both v1.2.3 and v1.4.5).
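A rough sketch of how that looks in practice (the module path is hypothetical):
$ go get example.com/dep@v1.2.3      # the v1 line
$ go get example.com/dep/v2@v2.3.4   # v2+ lives at a distinct /v2 module path, so both can coexist in one build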
npm has had dedupe and yarn has had --flat for years now.
I was unaware of that, but is it required or optional? If it’s optional, then by default, you wouldn’t have this problem of sharing possibly conflicting (for any reason) dependencies, right? What were the reasons for adding this?
I have mixed feelings about Poetry. I started using it when I didn’t know any better and it seemed like the way to go, but as time goes on it’s becoming evident that it’s probably not even necessary for my use case and I’m better served with a vanilla pip workflow. I’m especially bothered by the slow update and install times, how it doesn’t always do what I expect (e.g., just updating a single package), and how it seems to be so very over-engineered. Anthony Sottile of anthonywritescode (great channel, check it out) has made a video highlighting why he will never use Poetry that’s also worth a watch.
If you have an article that summarizes the Poetry flaws I’d appreciate it (I’m not a big video person). I’ll defer to your opinion here since I’m not as active in Python development as I was a few years ago, so I haven’t worked with a lot of the newer tooling extensively.
But I think that further complicates the whole Python dependency management story if Poetry is heavily flawed. I do remember using it a few years back and it was weirdly tricky to get working, but I had hoped those issues were fixed. Disappointing to hear Poetry is not holding up to expectations, though I will say proper dependency management is a gritty hard problem, especially retrofitting it into an ecosystem that has not had it before.
Sure, here’s what he laid out in the video from his point of view:
he ran into 3 bugs in the first 5 minutes when using it for the first time back in 2020, which didn’t bode well
it pulls in quite a few dependencies (45 at the time of writing this, which includes transitive dependencies)
python -m venv venv && . venv/bin/activate   # i.e. create and activate a fresh virtual environment
pip install poetry
pip freeze --all | wc -l
it by default adds dependencies to your project with caret constraints, which automatically allow updates up to (but excluding) the next major version, or the next minor version for 0.x packages, depending on the initial version
for example python = "^3.8", which is equivalent to >= 3.8, <4
this causes conflicts with dependencies of libraries that are often updated and with those that aren’t
he mentions requests specifically
pip already has a dependency resolver and a way to freeze requirements and their very specific versions
i.e. use == pins rather than caret or tilde version constraints
he also shouts out ‘pip-tools’ here, which I haven’t used myself for the sake of keeping things simple
the maintainers of Poetry have done something weird with how they wanted to deprecate an installer, which has eroded trust (for him)
they essentially introduced a 5% chance that any CI job that used get-poetry.py (their old way of installing Poetry) would fail, in order to push people away from that script; and if you weren’t in CI, the script would just fail outright
this is terrible because it introduces unnecessary flakiness into CI systems and doesn’t let people migrate away on their own schedule, but rather forces it upon them
I have used pip-tools and it is my favorite way of doing dependency management in Python, but it’s also part of the problem because I have a solution for me, so it doesn’t matter that the core tools are user hostile. The Python core team should really be taking ownership of this problem instead of letting it dissolve into a million different little solutions.
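For anyone who hasn’t seen it, the basic pip-tools loop is roughly this (file names are just the usual convention):
$ echo "requests" >> requirements.in   # loose, human-edited constraints
$ pip-compile requirements.in          # resolves and writes a fully pinned requirements.txt
$ pip-sync requirements.txt            # makes the active virtualenv match the pins exactly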
the maintainers of Poetry have done something weird with how they wanted to deprecate an installer, which has eroded trust (for him)
I don’t wish to ascribe malice to people, but it comes off as contemptuous of users.
Infrastructure should be as invisible as possible. Poetry deprecating something is Poetry’s problem. Pushing it on all users presumes that they care, can act on it, and have time/money/energy to deal with it. Ridiculous.
Absolutely, very unprofessional. Is the tool deprecated? Just drop the damn tool, don’t bring down my CI! You don’t want future versions? Don’t release any!
Poetry is here though and is ready to use. There are good reasons not to include and freeze such tools in the upstream distribution. For example, rubygems is separate from ruby. Cargo is separate from the rust compiler. The Python project itself doesn’t have to do anything here. It would be nice if they said “this is the blessed solution”, but the lack of that doesn’t stop anyone right now.
Another commenter posted about the issues with Poetry, which I take as it not being quite ready to use everywhere. I think not having a blessed solution is a big mistake, and one that the JS ecosystem is also making (it’s now npm, yarn, and some other thing) — it complicates things for no discernible reason to the end user.
While Cargo and rubygems may be separate from the compiler/interpreter, they are also closely linked and developed in sync (at least I know this is the case for Cargo). One of the best decisions the Rust team made was realizing that a language is its ecosystem, and investing heavily in best-in-class tooling. Without a blessed solution from the Python team I feel as though the dependency management situation will continue as-is.
There was a time, in the before-times, when we didn’t have bundler, and ruby dependency management was kind of brutal as well. I guess there is still hope for python if they decide to adopt something as a first-class citizen and take on these problems with an “official” answer.
I tried to add advice about dependency and packaging tooling to my code style guide for Python. My best attempt exploded the size of the style guide by 2x the word count, so I abandoned the effort. I recently wrote about this here:
I’d really like to understand Rust and Cargo a little better, but I’m not a Rust programmer at the moment. Any recommendations to read about the cargo and crate architecture?
This is how I feel about most programming language conversations, to be honest. Especially with Go, people freak out about boilerplate, but they’re totally willing to accept some combination of default dynamic linkage, complicated build/dependency management tools, weak performance, weak ecosystems, slow developer velocity, etc. But having to write for {...} instead of map().reduce() is a deal breaker, apparently. 🤷♂️
I think much of the meme comes from the difficulty of obtaining the understanding, not of employing the best practice at a given point in time.
IME this is greatly exacerbated by an information environment soaked in recommendations from multiple generations/branches of tooling. The people who most need to be able to find the current best practices will be least equipped to tell the difference between cutting-edge and out-of-date.
That said, a post outlining the ecosystem is probably a good cure if people can find it.
Agreed that repeating the meme is just being too lazy to call out which part specifically is broken.
That said, by far the biggest problem I have with Python dependencies is failures during installation of packages that compile C/C++ code that break on wrong headers, libraries, compiler versions etc. Most often with numpy, scipy and similar.
Now, is there hope the situation improves without migrating off of setup.py scripts? I doubt it.
Everything needs to migrate to something declarative/non-Turing-complete so that those edge cases are handled in the tool itself, once and for all packages. I’m not sure if setup.cfg/pyproject.toml cover all the use cases to become a complete setup.py replacement. Probably not (yet).
Yes, that’s the real issue. Django web devs can happily play with pure Python packages all day long and wonder why others are complaining until they try to build a package with C code from source.
Numpy, for example, goes full clown world – last time I checked you needed a Fortran compiler to build it.
God have mercy on your soul if you try to pip install a package with C dependencies on a more niche Linux system (maybe this only applies if you’re not using precompiled binaries?).
I don’t really get this. I can pip install numpy without a Fortran compiler. So can you! It has pre-built binaries, on the Python Package Index, for a bunch of popular operating systems and architectures. I can’t remember the last time I needed to actually build a package from source just to install and use it.
It’s not quite that simple, though. Try to install it, for example, on Alpine Linux. Or try to install a very new version.
Or use a less common feature not included in the default pre-built packages
pip install does rely on compilers for countless packages.
The packaging ecosystem supports (as of PEP 656) binary packages compiled for distributions using musl. So if you find a package that requires you to install a bunch of compilers on Alpine, it is explicitly not the fault of Python’s packaging tooling – it’s the fault of a package author who didn’t provide a built binary package for musl.
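One way to surface that explicitly instead of silently falling back to a source build (the package name is just an example):
$ pip install --only-binary :all: numpy   # fails loudly if no compatible wheel (e.g. a musllinux one) exists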
As for “very new version”, numpy already appears to have built binary packages available on PyPI for the as-yet-unreleased Python 3.11.
And if you’re going to customize the build, well, yeah, you need to be able to do the build. I’m not sure how Python is supposed to solve that for you.
Now, is there hope the situation improves without migrating off of setup.py scripts? I doubt it.
No. You already explained in your second paragraph what the problem is. All this talk about poetry, and all this praise for other languages’ package managers, ignores that their better experience relies heavily on the fact that those who use them do so in a very limited scope.
Remove all the packages that rely on binary C libraries to do the heavy lifting, et voilà, you are at the same level of smoothness as other package managers. I would even argue that python virtual environments are unmatched in other languages, if anything.
Which other language provides a high level interface to cuda with an easier setup? Which other language provides functionality equivalent to numpy with a smoother setup?
C libraries are installed in different ways on different systems. Pip will try to compile the C code, and this degrades horribly when some dependency doesn’t expose the exact ABI the package expects.
I am not really sure this has a solution. Support for different operating systems is important.
You’ve made a couple of points which I don’t necessarily agree with but let me just focus on the original one in this thread.
C libraries are installed in different ways on different systems. Pip will try to compile the C code, and this degrades horribly when some dependency doesn’t expose the exact ABI the package expects.
I am not really sure this has a solution. Support for different operating systems is important.
As @ubernostrum has pointed out in a different thread, it’s best if a package comes with a precompiled binary wheel (and statically linked at that). Installation is great then and it’s probably the solution. That said, I can’t install e.g. PyQt5 (i.e. not an unpopular package) on an M1 Mac at the time of writing this. curl https://pypi.org/pypi/PyQt5/json tells me there’s no apposite binary package for it.
The next best thing IMO is to vendor the C/C++ library source code (either through a git submodule, or maybe even by downloading the sources on the fly) and attempt to compile it so that it doesn’t depend on any system-installed headers and libraries. This presupposes the presence of a working C/C++ compiler on the system, which I think is acceptable.
What’s fraught with peril though, is to assume that the relevant library (along with its headers) is preinstalled on the system and attempt linking with it. This is just too complicated of a problem to be able to handle all the cases correctly. (1) It relies on the presence of some external means of installing libraries, so it conceptually takes you outside of the Python package manager into apt/brew/etc territory. (2) Your Python project can break when you upgrade the library if you dynamically linked with it, or it can break silently due to an incompatibility if you’re linked statically with it (equally bad). (3) The Python package has no control over the exact version of the system dependency so caveats abound. (4) Your Python project can break if you uninstall the system library, since there’s nothing preventing you from doing that.
If I could change one thing about the Python ecosystem, it would be to push for the first two solutions to be preferred by package authors.
Yeah, at some level, Python’s deepest packaging problem is that Python performance stinks, so you need to use C-wrappers for math and whatnot, but once you do you run into the fact that however bad Python packaging is, at least it exists, which is more than C/C++ can say.
OTOH, Zig is just one guy, and I haven’t used it personally, but it seems like it can handle a lot of C cross compilation stuff? So ideally, the Python core team should just say “C/C++ is part of our domain, so we’re going to include the equivalent of zig build with CPython to make packaging work.”
Just FYI, Zig has a language creator but it is not “just one guy” any longer. It is being developed with the help of the ZSF, https://ziglang.org/zsf/, which is now paying for core contributions beyond the language creator. There is also a small group of core committers contributing to the language and fixing bugs. It is being very actively developed and their next release will be the “self hosted compiler” (the Zig compiler will itself be written in Zig and built with Zig), which will likely also make it easier for new contributors to join the project. There are also some core contributors beyond the language creator who have started to specialize on Zig’s use cases as a x-platform C compiler, for example by focusing on areas like libc improvements (shorthand: zlibc) and on the various target platforms beyond Linux, like macOS, Windows, WebAsm, embedded, etc. It’s likely true today that Zig is the world’s best x-platform C build toolchain, and that will likely only get better and better over time due to the vitality, pace, and mission of the open source project (especially compared to the alternatives).
Your idea about how Zig could help the Python C source build process is a good one. There is a nice post here about maintaining C code and C cross-compilation issues with Zig:
Unfortunately some Python code relies on C++, Fortran, and other exotic languages to build sources, since the Python C extension module interface is so open and so many other languages can expose something that speaks the C ABI. But you’re right that C modules written in C are most often the culprit.
The creator of the language also talked about some of the technical details behind zig cc here:
OTOH, Zig is just one guy, and I haven’t used it personally, but it seems like it can handle a lot of C cross compilation stuff? So ideally, the Python core team should just say “C/C++ is part of our domain, so we’re going to include the equivalent of zig build with CPython to make packaging work.”
I think that’s a fantastic way to improve the Python packaging status quo! Zig is a game changer when it comes to portability.
I’ve had Sublime Text plugins broken in subtle weird ways because of outdated/incompatible Python dependencies in its custom environment that I don’t even know how to manage.
I keep running into problems with npm dependencies that use gyp, which uses Python. Why they would use Python in a JS tool, I don’t know, but they want a specific Python version which my OS doesn’t have.
I install Ansible via homebrew, which means it wants whatever Brew-installed Python things there are, and I need to be careful not to use any other Python software that would mess with them (which happens if I run a Python tool from elsewhere and forget to jail it in its own env).
Scripts still can’t decide if it’s python or python3.
Then there’s the dozen various envs and package managers, and I’ve lost track of which one is the current one and which ones are the old crappy ones that will make things worse.
It all is probably super obvious to people who are into python, but as someone who isn’t actively involved it’s just an endless source of problems testing my patience.
I think the majority of people who propagate this meme (which thankfully I see less and less) are largely dogpiling and not in touch with the Python ecosystem (it’s trendy to hate).
On the other hand I don’t think the camp advocating to “just learn the tools and remember to do X Y Z” help the situation because there is definitely a problem to be addressed. There’s some truth to the meme after all.
Looking at other languages for inspiration, like node + npm, I wonder whether the root cause is just the lack of sensible defaults. In python land, package installations (pip install) aren’t local to your project by default (unless you create + run inside a virtual env). I think it’s just this extra step tripping newer/unfamiliar/junior developers up.
I don’t see the same kind of hate for node/npm for throwing stuff into node_modules by default (except criticisms about the folder size). And the node ecosystem has similar approaches for managing node versions like nvm/n similar to pyenv (so I don’t think that’s a point of friction either).
I’ve been programming professionally in Python for 15 years, and I’ve genuinely tried to find a happy path in Python performance and package management and so on. It feels like being gaslit when people continuously promise some tool will fix my problem, and then I sink a bunch of time only to find a different set of unworkable pitfalls. I’m sure I haven’t tried every tool in the solution space, but writing me (and others) off as “out of touch” or “it’s trendy to hate” feels pretty cruel.
Notably, in Go and Rust, there is a single build tool with minimal configuration, and they work right out of the box just about every time on just about every system. Similarly, the performance of those language implementations is better than Python performance by several orders of magnitude out of the box and there’s usually headroom to easily optimize. I’m sure there are other languages that behave similarly–it really feels like Python is just stuck.
So, as someone who has been around for multiple generations of the Python packaging saga, what I think has happened is:
(1) There was a time when the standard default tooling was really bad.
(2) Then there was a time when the standard default tooling was kind of OK, and you could mostly do most common things.
(3) Then there was a time when the standard default tooling actually became just generally OK, if a bit low-level and lacking a unified single-entry-point interface.
During (1), people ironically mostly didn’t complain because that was an era when expectations were low and nobody really thought Python was going to develop an equivalent to CPAN and its tooling overnight. Also, many popular packages, for many years, avoided having external dependencies because the support for properly pulling those in at install time was not good.
During (2) and into the early part of (3), a lot of people developed their own ad-hoc tooling to suit their specific use cases, and wrote about it. And then for every ad-hoc tool, two other people would build their own things on top of that, and four more on top of those, and so on. So you got this huge proliferation of personal setups that worked for one person’s use case, and came up high in the Google search results for common questions.
Now we’re at (3), and the main issue is the inertia of all those “don’t use the default tooling, use this thing that only works for my codebase” posts. If you cobble together three or four of them, you might get something that actually kinda-sorta works, but it won’t be nice, and you’ll be using a bunch of different incompatible tools which each were designed with the assumption of being the one and only tool you’d use.
Meanwhile the standard default tooling – setuptools for writing package manifests and producing the artifacts, pip for installing packages, venv for project isolation – has just been quietly improving in the background the whole time. I highly recommend ignoring anyone and everyone who tells you to use something else, and instead going back to those tools. Earlier this year, I wrote a guide to doing that. And a couple of years ago I wrote an explanation of the default tooling and the use cases each tool fulfills, if you want that intro first.
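A minimal sketch of that default-tooling workflow, assuming the project’s metadata lives in pyproject.toml/setup.cfg:
$ python -m venv .venv && . .venv/bin/activate
(.venv) $ python -m pip install --upgrade pip build
(.venv) $ python -m pip install -e .    # editable install for development
(.venv) $ python -m build               # produces the sdist and wheel under dist/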
Yea, I think you have the community timeline pretty much right. And I, too, recently wrote a guide to “uncontroversial” Python tooling, looks like it aligns quite a bit with your recommendations. That is, pip, venvs, pip-tools, requirements.txt for dependencies and pyproject.toml + PyPA recommendations for package publishing.
“How Python programmers can uncontroversially approach build, dependency, and packaging tooling”:
but writing me (and others) off as “out of touch” or “it’s trendy to hate” feels pretty cruel.
I feel like I fall into the same camp as yourself. I’ve tried a bunch of tools and still think they fall short. So I didn’t intend to cause offence.
I just don’t think dogpiling on the ecosystem is productive. I also feel like much of the criticism (and this meme’s popularity) stems from casual onlookers outside the ecosystem.
I feel like I fall into the same camp as yourself. I’ve tried a bunch of tools and still think they fall short. So I didn’t intend to cause offence.
Thanks, I appreciate the clarification. No harm, no foul. 👍
I just don’t think dogpiling on the ecosystem is productive. I also feel like much of the criticism (and this meme’s popularity) stems from casual onlookers outside the ecosystem.
I think it’s productive insofar as it advertises pitfalls to people who are considering the Python ecosystem. It’s not constructive to have everyone sinking considerable time and energy into Python only to fall into the same traps over and over. It can also be productive if it increases awareness or pressure on the maintainers to address the issue (yes, that may mean breaking changes). That said, I’m sure the quality of the criticism varies, and some are surely (and understandably) just venting steam.
Granted, I’m not a JS or Python developer by trade, but node/npm is probably the only package manager I’ve had more trouble with than pip and friends. While I agree that default project-local installs are desirable, I’m not sure if node/npm is the ideal we want to strive for.
Interesting (me neither tbh) 🤔. Perhaps not the best example.
I know npm sometimes gets a bad rep, but it’s never usually the packaging situation I see people complaining about (maybe it’s just overlooked / lost in the noise).
I still believe having saner defaults could at least help resolve some of the tooling complexity issues people love to complain about.
The need for local installs has been recognised and there’s a PEP for it https://peps.python.org/pep-0582/ so soon(*) we’ll get something similar to node_modules.
Unfortunately, it’s unlikely. That PEP is from 2018 and iirc is currently stalled. There are some cool projects like https://pdm.fming.dev/ built off it though.
it goes in the “con” column for choosing python for a new project
when people encounter these problems, it lets them know they’re not alone
for the first one, my workplace usually starts C# projects by default, largely for environment/dependency reasons. we use python for throw-away work, existing python projects, and when there’s a python library that saves us a lot of effort.
the second one can be especially important, because python also has a reputation for being easy. many people get stuck on dependency hell and assume that they are the problem
(rust is also very good at this problem, but i basically never want to use rust for a problem i was considering python for. it’s like when a vegan tells you that a banana is a good substitute for an egg, but you don’t want bananas benedict)
It’s a real shame that Python is so much more often used for this than Ruby. In a Ruby script, you can just drop require 'bundler/inline' and define fully-isolated gem dependencies as a prelude, in a single file. No hassle whatsoever.
The author is right that this isn’t black magic nor is it specific to Python. But it is a source of continual annoyance to developers. You get confused about where you are and what python/ruby/whatever you are using is and stuff breaks in weird ways. You can absolutely figure it out but it doesn’t take away from the fact that daily use means you’ll stub your toe somewhat frequently because dependency management is a hard problem that has to be tailored to each codebase.
It’s worse than this, because it impacts many more people than Python developers. There are tons of Python scripts being used as utilities across multiple operating systems that advanced users will come into contact with on a regular basis. I’m frequently having to battle this problem, despite not ever developing Python.
I think that python dependency/project/environment management still has plenty of room for improvement, but it has also made steady progress in recent years. The slow evolution can be confusing for users, as someone else mentioned.
Wheels have made installing packages with non-python dependencies much easier. This has removed the need for something like conda/mamba for all of my projects.
Python adopted pyproject.toml as a centralized project configuration file. tomllib will be part of the standard library in the next version of python, which will remove the need for an external TOML parser.
Pip got a new dependency resolver.
After installing python on any system, I start with pip install pipx so that I can install any global python based tools in isolated environments.
I’ve recently started to use hatch (installed with pipx, of course) to manage projects. It pieces together some existing tools in the python packaging ecosystem and helps enforce best practices. Think of something like black for project management. Unlike poetry, it tries not to re-invent the wheel, and fits well with/uses existing tools. It is also part of the Python Packaging Authority (like pipx), which feels promising. One of my favorite parts about hatch compared to npm or typical venv usage is that it keeps the virtual environment out of the project folder.
$ hatch new my-project
$ cd my-project
$ hatch shell
(my-project) $
Again, this doesn’t solve everything and I’m still experimenting to find the best workflow for myself, but I do see things getting better.
Out of curiosity: why do you feel the need to use Hatch instead of rolling the pyproject.toml file by hand, using the plain virtual environments included in the standard library, and pip on top of that? Personally, I feel that would significantly lower the barrier to entry for new users of your project, since the setup guide (i.e. create a virtual environment and here’s the command, activate it and here’s the command, install the requirements and here’s the command) is so simple and is more a part of vanilla Python, as opposed to “install this tool that you may have never heard of because I, the project overlord (tongue in cheek), deemed it so.”
With each of these additions to the package and project management ecosystem (e.g. Hatch, PDM, and Poetry) I feel like I’m utterly clueless in not understanding why so many people jump on board. Surely, there have to be outsized benefits that I’m not privy to or am not exposed to in my workflows.
Hatch uses plain virtualenvs and pip already. Someone could definitely contribute without using hatch. Poetry feels much more heavy handed to me. https://hatch.pypa.io/latest/meta/faq/
Lazy developers that don’t solve the dependency management problem for their user(s) (i.e. they don’t make and build binaries, zipapps, or use other dependency tools)
Lazy developers that don’t keep their own build-time dependency management sane for other developers.
Both of these are easily solved by less lazy developers and/or forcing the ecosystem to ensure developers can’t be lazy. Go, Rust, etc all went the 2nd way, by ensuring developers can’t be too lazy, by making the default in the ecosystem the only sane way to do it. That generally means anything outside of the blessed way gets hard to very hard, depending on the language/ecosystem.
Python obviously doesn’t enjoy those same defaults. Could the Python ecosystem be “fixed”? Of course; it’s not a technical problem. So far nobody has managed to do the hard work and make it happen, so I’m not optimistic.
Some applications, like Calibre do the work for their user(s) and include binaries for the platforms they want to support. Most python developers seem to lean on other people to fix the distribution problem. It’s not even that hard to build binaries for the different platforms to solve the user side of this, many different tools exist to solve these problems.
The developer side of the problem has plenty of tools also(poetry, pip-tools, etc).
So I’m sure I’ll get a lot of flak, but my perspective is: it’s a lazy developer problem and a lack-of-consensus problem (as far as getting the defaults fixed).
The TL;DR of the talk is that I use make + pyenv + poetry to ship data science pipelines, libraries, and some CLI tools. Make is the UI that tells pyenv which Python to install and then poetry does the dependency management, building, and publishing. I make a point in the talk that one could use Conda-managed Python instead, if your use of it complies with their licensing (corps gotta pay, ours does).
All of this to get the ~same out-of-the-box experience I had in Scala, the stack in which I’d worked for nearly a decade before moving to a Python team.
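Roughly, the steps the Makefile drives look like this (the version number is only an example):
$ pyenv install 3.10.9                     # make sure the interpreter exists
$ pyenv local 3.10.9                       # pin it for this project directory
$ poetry env use "$(pyenv which python)"   # point Poetry's virtualenv at that interpreter
$ poetry install                           # resolve and install from pyproject.toml + poetry.lock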
The venv stuff is easy. Although mostly unique to python, it is not a very difficult problem to solve. When I dunk on python dependency management I refer to the issues that have been plaguing the ecosystem due to shortcomings in its design. I am sorry for the incoming rant.
Pip used to ignore version conflicts, which has resulted in the package ecosystem having bonkers version constraints. I’ve found it common for packages which require compilation to build to have an upper python version constraint, as each new python version is likely to break the build. The most common drive-by PR I do is bumping the upper python version bound.
Poetry, although a major improvement for reproducibility, is in my opinion a bit too slow (poetry --help takes 0.7 seconds) and unstable (Of course this poetry version wipes the hashes in my lockfile!), has poor support for 3rd party package repositories, and does not even support the local-version-identifier part of the version scheme correctly, which has resulted in people overriding some packages in the poetry venv using pip.
Every python package manager (other than conda, which is fully 3rd party) is super slow during dependency resolution, as it can’t know what the subdependencies of a package are without first downloading the full package archive and extracting the list of dependencies (pypi issue here from 2020), which is incredibly fun when dealing with large frameworks such as pytorch, tensorflow, scipy and numpy, whose wheels are enormous.
For source distributions, dependencies are usually defined by setup.py, which must be executed to allow us to inspect its dependencies. This of course cannot be cached on pypi, as it is possible for setup.py to select its dependencies depending on the machine it runs on.
Then there are the setup.py build scripts, which never seem to quite work on any of my machines. Some build scripts only ever work in docker environments I would never have been able to reproduce had it not been for dockerhub caching images built with distros whose repositories have now gone offline. This especially becomes a problem when the prebuilt binary packages made available by the package author typically don’t target ARM and/or musl based platforms.
My pretty strong belief has been that OS-level installs of Python are a major source of all of these issues. Apt and homebrew showing up with system installs means that you get into so many weird scenarios (especially if you happen to have one script somewhere with a badly-formatted shebang).
There is also an issue not covered in this, which is packaging for distribution to other systems. There are lots of packaging tools out there, but it’s still pretty tough to package things that aren’t really pure python. Even avoiding dependencies you get into issues! urllib2 installs on a clean Mac machine don’t work with HTTPS (you have to run some script first…), so things are not cut out for you.
My recent idea is to have a binary like cargo, where you run the binary on your project, it will set up an environment to run your script, but first will completely wipe all Python-related env vars, and then set up as hermetic a thing as possible. It’s a ball of mud solution, but it’s nicer to me than the current state of the art (Docker, basically).
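Something like this, as a crude approximation of the idea (the list of variables is surely incomplete, and app.py is just a stand-in):
$ env -u PYTHONPATH -u PYTHONHOME -u PYTHONSTARTUP -u VIRTUAL_ENV \
    .venv/bin/python app.py    # run the project's own interpreter with inherited Python env vars scrubbed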
But really I think that beyond pip + requirements.txt, a lot of the new tooling simply is broken way too often. Poetry is kinda buggy, pipenv would just randomly hang all the time…. I think everyone would be very happy to adopt something that works without having major changes in workflows.
And hey, motivated people would quickly send patches to upstream libraries! Everyone wants Python to work nicely, just so far people aren’t actually showing up with nice solutions (beyond a lot of the amazing work with wheels and the like, of course. Talking more about the top-level binaries).
The only thing I have never really understood is Python deployment, but I’ve never understood deploying anything that isn’t compiled, really. The best non-Docker mode of deployment is rolling your own service files, or pex or something?
It’s extremely context specific. The answer is different if you embed (language) as a scripting language in your app, different if you’re deploying server software, different if it’s an end-user app, etc. That applies to pretty much every scripting language with a separate runtime though.
Okay but let’s say I have a python module, __init__.py, maybe an app.py and that’s it in my project to deploy to a server - I wouldn’t know what to do other than write a systemd unit file, and I don’t know if there’s anything else one can or should do. I’ve never seen a good explanation for this. Now that I think about it, I actually ended up using dh-virtualenv most of the time. Ignore me entirely, I guess.
If you wrote the app and it’s simple, you can likely copy the files and use the system python to run them. Add a virtualenv if you need dependencies. Add pyenv if you need a specific version of python.
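In the simplest case that amounts to something like this (the paths are placeholders):
$ python3 -m venv /opt/myapp/venv
$ /opt/myapp/venv/bin/pip install -r /opt/myapp/requirements.txt
$ /opt/myapp/venv/bin/python /opt/myapp/app.py   # and point the systemd unit's ExecStart at this line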
PEX was indeed meant for this scenario, but aside from that you usually just end up either using these tools to build a virtualenv in the deployment environment, or building the virtualenv in a staging area which is identical to the production environment and then deploying that as an artefact. There’s a reason lots of people gave in and just started using containers.
We dabbled with pex, and it worked okay except the target still needed to have the right version of Python installed, and pex doesn’t know about any shared object dependencies, so you also need those as well. It’s a pretty leaky abstraction. We also tried using these in AWS Lambda functions a few years back and relatively simple closures would bust the 250MB lambda size limit (I think the limit has since been lifted)–pandas alone was something like 70mb. Meanwhile, an equivalent Go program weighed in at like 16mb (vs 250+mb in Python).
I don’t know if I’ve just been supremely lucky with what dependencies I’ve chosen, but I’ve always found Python dependency management pretty straightforward.
A requirements.txt plus pip install -r requirements.txt is just about everything you need. It just works. It’s not significantly different to any other language. I don’t need to learn the intricacies of how pip works, or to memorise a load of flags. There’s just one flag there!
If installing dependencies fails due to needing a C compiler or whatever, I’ll just bring one into the path and try again. To generate a lock file it’s just pip freeze > requirements-freeze.txt. If you install from the lock file, you get exactly the same versions of everything again. Simple.
Pretty nice article that came just in time, as I was swearing about building a conda environment for our jupyter instance. I just can’t find a set of packages that fit together, and the messages I get are not that helpful. I had to switch to pip-compile to get a good overview of which versions/packages have conflicting dependencies. And after reading this thread it occurred to me that there is another pitfall of python’s packaging tools: if I try to create an environment in one go (à la pip-compile followed by pip-sync) I will get failures. If I use pip, it will happily install packages with conflicting dependencies – only if I install them one by one will it warn me. That’s not what I would expect from packaging tools.
That’s all well and good until you run into a missing wheel, which leads to a compilation error due to a missing header file. Or until you need to upgrade a package to fix a bug, but there were unrelated breaking changes because Python package devs don’t follow anything like SemVer, and break backwards compatibility constantly. Or until you end up with CUDA version mismatches, or similar dynamic linking issues with other system libraries.
Those issues are not related to python. You’ll run into each thing you listed when compiling C, Rust, Ruby, or whatever else you use. You always have to deal with external dependencies and version issues regardless of environment.
I think you mean they’re not exclusive to python, which is certainly true. And yet they seem to be far more pervasive with Python than with Rust, at least.
Kind of. What I mean is that this affects development in general. They’re not toolkit issues, but toolkits can help work around them. This is what I mean about the meme: it’s not python vs rust, but rather an environment that has existed for decades vs an environment with explosive growth in recent years, applying patterns that were only described recently. You can apply semver to anything you want - it’s not relevant whether you’re using python. But most python projects were created before semver was described.
This talks about Poetry. And about using venv directly. But what about pipenv? I remember that being pushed quite heavily. Does it do the same as Poetry? Or the same as venv? Or both? Neither? Should I use Poetry in addition to pipenv? Why do we need all these different tools? Are they complementary or competing? If they’re competing, is there a community consensus about what’s “right”/“best”?
Pipenv lives in the same “dependency and environment manager” area as poetry. Choose either one you like if you’re choosing for yourself, or use the one used by upstream package otherwise.
For the differences, you could think of poetry as pipenv with extra nice features. For example, an automatic “pipenv clean” equivalent.
I feel like this entire post reinforces just how difficult Python dependency management is. I’m a huge fan of Python, despite all of its flaws, but the dependency management is horrible compared to other languages. Nobody should have to learn the intricacies of their build tools in order to build a system, nor should we have to memorize a ton of flags just for the tools to work right. And this isn’t even going into the issue where building a Python package just doesn’t work, even if you follow the directions in a README, simply because of how much is going on. It is incredibly hard to debug, and that is for just getting started on a project (and who knows what subtle versioning mistakes exist once it does build).
I think Cargo/Rust really showed just how simple dependency management can be. There are no special flags, it just works, and there are two tools (Cargo and rustup) each with one or two commands you have to remember. I have yet to find a Rust project I can’t build first try with
Cargo build
. Until Python gets to that point, andpoetry
is definitely going down the right path, then Python’s reputation as having terrible dependency management is well deserved.Completely agree. I’ve been writing Python for 15 years professionally, and it’s a form of psychological abuse to keep telling people that their problems are imaginary and solved by switching to yet another new dependency manager, which merely has a different set of hidden pitfalls (that one only uncovers after spending considerable time and energy exploring).
Every colleague I’ve worked with in the Python space kind of feels jaded by anyone who promises some tool or technology can make life better, because they’ve been so jaded by this kind of thing in Python (not just dependency management, but false promises about how “just rewrite the slow bits in C/Numpy/multiprocessing/etc” will improve performance and other such things)–they often really can’t believe that other languages (e.g., Go, Rust, etc) don’t have their own considerable pitfalls. Programmers who work exclusively in Python kind of seem to have trust issues, and understandably so.
The problem is that no matter how good Poetry gets, it still has to deal with deficiencies that exist in the ecosystem. For example, having lockfiles are great, but they don’t help you if the packages themselves specify poor/incorrect package version bounds when you come to refresh your lockfiles (and this is something I’ve been bitten by personally).
That’s not a python-specific issue though. It’s not even python-like issue. You’ll have the same problem with autoconf / go.mod / cargo / any other system where people have to define version bounds.
if I create a go.mod in my repo and you clone that repo and run “go build” you will use the exact same dependencies I used and you cannot bypass that. I cannot forget to add dependencies, I cannot forget to lock them, you cannot accidentally pick up dependencies that are already present on your system
Keep in mind that Go and Rust get to basically ignore the difficulty here by being static-linking-only. So they can download an isolated set of dependencies at compile time, and then never need them again. Python’s
import
statement is effectively dynamic linking, and thus requires the dependencies to exist and be resolvable at runtime. And because it’s a Unix-y language from the 90s, it historically defaulted to a single system-wide shared location for that, which opens the way for installation of one project’s dependencies to conflict with installation of another’s.Python’s
venv
is an attempt to emulate the isolation that statically-linked languages get for free.I described the situation for Go during build time, not during runtime.
And my point is that a lot of the things people complain about are not build-time issues, and that Go gets to sidestep them by being statically linked and not having to continue to resolve dependencies at runtime.
I don’t get the importance of distinguishing when linking happens. Are there things possible at build time that are not possible at runtime?
Isolation at build time is extremely easy – it can be as simple as just downloading everything into a subdirectory of wherever a project’s build is running. And then you can throw all that stuff away as soon as the build is done, and never have to worry about it again.
Isolation at runtime is far from trivial. Do you give each project its own permanent isolated location to put copies of its runtime dependencies? Do you try to create a shared location which will be accessed by multiple projects (and thus may break if their dependencies conflict with each other)?
So with runtime dynamic linking you could, to take one of your original examples, “accidentally pick up” things that were already on the system, if the system uses a shared location for the runtime dynamically-linked dependencies. This is not somehow a unique-to-Python problem – it’s the exact same problem as “DLL hell”, “JAR hell”, etc.
But the same issues exist with managing the source of dependencies during build time.
Yeah, I’m not seeing anything different here. The problem is hard, but foisting it on users is worse.
The project-specific sandbox vs disk space usage recurs in compiled langs, and is endemic to any dependency management system that does not make strong guarantees about versioning.
No, because at build time you only are dealing with one project’s dependencies. You can download them into an isolated directory, use them for the build, then delete them, and you’re good.
At runtime you may have dozens of different projects each wanting to dynamically load their own set of dependencies, and there may not be a single solvable set of dependencies that can satisfy all of them simultaneously.
You can put them into an isolated directory at runtime, that’s literally what virtualenv, Bundler’s deployment mode or NPM do.
And at build time you don’t have to keep them in an isolated directory, that’s what Bundler’s standard mode and Go modules do. There’s just some lookup logic that loads the right things from the shared directories.
The point is that any runtime dynamic linking system has to think about this stuff in ways that compile-time static linking can just ignore by downloading into a local subdirectory.
Isolated runtime directories like a Python venv or a
node_modules
also don’t come for free – they proliferate multiple copies of dependencies throughout different locations on the filesystem, and make things like upgrades (especially security issues) more difficult, since now you have go track down every single copy of the outdated library.It might be possible to have this issue in other languages and ecosystems, but most of them avoid them because their communities have developed good conventions and best practices around both package versioning (and the contracts around versioning) and dependency version bound specification, whereas a lot of the Python packages predate there being much community consensus in this area. In practice I see very little of it comparatively in say, npm and Cargo. Though obviously this is just anecdotal.
Pretty sure it’s not possible to have this issue in either of your two examples; npm because all dependencies have their transitive dependencies isolated from other dependencies’ transitive dependencies, and it just creates a whole tree of dependencies in the filesystem (which comes with its own problems), and Cargo because, as @mxey pointed out (after your comment), dependencies are statically linked into their dependents, which are statically linked into their dependents, all the way up.
This has been a big problem in the Haskell ecosystem (known as Cabal hell), although it’s been heavily attacked with Stack (a package set that are known to all work together), and cabal v2-* commands (which builds all the dependencies for a given project in an isolated directory), but I don’t think that solves it completely transitively.
That’s not true for Go. Everything that is part of the same build has their requirements combined, across modules. See https://go.dev/ref/mod#minimal-version-selection for the process. In summary: if 2 modules are part of the same build and they require the same dependency, then the higher version of the 2 specified will be used (different major versions are handled as different modules). My point was only that it’s completely reproducible irrelevant of the system state or the state of the world outside the go.mod files.
Ah, I misunderstood your comment and misinterpreted @ubernostrum’s response to your comment. Thanks for clarifying. Apologies for my lack of clarity and misleading wording.
To be clear, I’m not talking about transitive dependencies being shared inappropriately, but the much simpler and higher level problem of just having inappropriate dependency versioning, which causes the packages to pick up versions with breaking API changes.
Ah, I reread your original comment:
Are you talking about transitive dependencies being upgraded with a major version despite the parent dependency only being upgraded by a minor or patch version because of the parent dependency being too loose in their version constraints? Are you saying this is much more endemic problem in the Python community?
Well, it fits into one of two problem areas:
As you say, incorrect version specification in dependencies allowing major version upgrades when not appropriate - this is something I rarely if ever see outside Python.
A failure of common understanding of the contracts around versioning, either a maintainer who doesn’t make semver-like guarantees while downstream consumers assume they do, or the accidental release of breaking changes when not intended. This happens everywhere but I (anecdotally) encounter it more often with Python packages.
npm has had dedupe and yarn has had --flat for years now.
Go handles it by enforcing that you can have multiples of major versions but not minor or patch (so having both dep v1.2.3 and v2.3.4 is okay, but you can’t have both v1.2.3 and v1.4.5).
I was unaware of that, but is it required or optional? If it’s optional, then by default, you wouldn’t have this problem of sharing possibly conflicting (for any reason) dependencies, right? What were the reasons for adding this?
I have mixed feelings about Poetry. I started using it when I didn’t know any better and it seemed like the way to go, but as time goes on it’s becoming evident that it’s probably not even necessary for my use case and I’m better served with a vanilla pip workflow. I’m especially bothered by the slow update and install times, how it doesn’t always do what I expected (just update a single package), and how it seems to be so very over-engineered. Anthony Sottile of anthonywritescode (great channel, check it out) has made a video highlighting why he will never use Poetry that’s also worth a watch.
If you have an article that summarizes the Poetry flaws I’d appreciate it (I’m not a big video person). I’ll defer to your opinion here since I’m not as active in Python development as I was a few years ago, so I haven’t worked with a lot of the newer tooling extensively.
But I think that further complicates the whole Python dependency management story if Poetry is heavily flawed. I do remember using it a few years back and it was weirdly tricky to get working, but I had hoped those issues were fixed. Disappointing to hear Poetry is not holding up to expectations, though I will say proper dependency management is a gritty hard problem, especially retrofitting it into an ecosystem that has not had it before.
Sure, here’s what he laid out in the video from his point of view:
- Running pip install poetry and then pip freeze --all | wc -l shows just how many packages Poetry itself drags in.
- New projects default to python = "^3.8", which is equivalent to >= 3.8, <4 – an upper bound you probably didn’t ask for.
- The same caret-style constraints get applied when adding dependencies; he calls out requests specifically.
- pip already has a dependency resolver and a way to freeze requirements and their very specific versions; he’d rather pin with == and not use caret or tilde versioning.
- get-poetry.py (their old way of installing Poetry) would deliberately fail in CI to get people to move away from using that script, and if you weren’t in CI then the script would just fail.
I have used pip-tools and it is my favorite way of doing dependency management in Python, but it’s also part of the problem because I have a solution for me, so it doesn’t matter that the core tools are user hostile. The Python core team should really be taking ownership of this problem instead of letting it dissolve into a million different little solutions.
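For anyone unfamiliar, the basic pip-tools loop is roughly this (a sketch; the file names are just the usual convention, not required):
pip install pip-tools          # provides the pip-compile and pip-sync commands
pip-compile requirements.in    # resolve the top-level deps listed in requirements.in into a fully pinned requirements.txt
pip-sync requirements.txt      # make the active environment match that lockfile exactly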
I don’t wish to ascribe malice to people, but it comes off as contemptuous of users.
Infrastructure should be as invisible as possible. Poetry deprecating something is Poetry’s problem. Pushing it on all users presumes that they care, can act on it, and have time/money/energy to deal with it. Ridiculous.
Absolutely, very unprofessional. Is the tool deprecated? Just drop the damn tool, don’t bring down my CI! You don’t want future versions? Don’t release any!
I wanted to just settle on Poetry. I was willing to overlook so many flaws.
I have simply never gotten it to work on Windows. Oh well.
Poetry is here though and is ready to use. There are good reasons to not make things included and frozen in upstream distribution. For example rubygems is separate from ruby. Cargo is separate from the rust compiler. The Python project itself doesn’t have to do anything here. It would be nice if they said: this is the blessed solution, but it doesn’t stop anyone now.
Another commenter posted about the issues with Poetry, which I take as it not being quite ready to use everywhere. I think not having a blessed solution is a big mistake, and one that the JS ecosystem is also making (it’s now npm, yarn, and some other thing) — it complicates things for no discernible reason to the end user.
While Cargo and rubygems may be separate from the compiler/interpreter, they are also closely linked and developed in sync (at least I know this is the case for Cargo). One of the best decisions the Rust team made was realizing that a language was its ecosystem, and investing heavily in the tooling that was best in class. Without a blessed solution from the Python team I feel as though the dependency management situation will continue as-is.
There was a time, back before Bundler, when Ruby dependency management was kind of brutal as well. I guess there is still hope for Python if they decide to adopt something as a first-class citizen and take on these problems with an “official” answer.
I tried to add advice about dependency and packaging tooling to my code style guide for Python. My best attempt exploded the size of the style guide by 2x the word count, so I abandoned the effort. I recently wrote about this here:
https://amontalenti.com/2022/10/09/python-packaging-and-zig
I’d really like to understand Rust and Cargo a little better, but I’m not a Rust programmer at the moment. Any recommendations to read about the cargo and crate architecture?
Pythonistas looking at Java Hello World:
Pythonistas looking at pip.
This is how I feel about most programming language conversations, to be honest. Especially with Go, people freak out about boilerplate, but they’re totally willing to accept some combination of default dynamic linkage, complicated build/dependency management tools, weak performance, weak ecosystems, slow developer velocity, etc. But having to write for {...} instead of map().reduce() is a deal breaker, apparently. 🤷♂️
Since Go 1.18 gave us generics, it’s only a matter of time until map().reduce() is solved as well for Go.
I think much of the meme comes from the difficulty of obtaining the understanding, not of employing the best practice at a given point in time.
IME this is greatly exacerbated by an information environment soaked in recommendations from multiple generations/branches of tooling. The people who most need to be able to find the current best practices will be least equipped to tell the difference between cutting-edge and out-of-date.
That said, a post outlining the ecosystem is probably a good cure if people can find it.
Agreed that repeating the meme is just being too lazy to call out which part specifically is broken.
That said, by far the biggest problem I have with Python dependencies is failures during installation of packages that compile C/C++ code that break on wrong headers, libraries, compiler versions etc. Most often with numpy, scipy and similar.
Now, is there hope the situation improves without migrating off of setup.py scripts? I doubt it. Everything needs to migrate to something declarative/non-Turing-complete so that those edge cases are handled in the tool itself, once and for all packages. I’m not sure if setup.cfg/pyproject.toml cover all the use cases to become a complete setup.py replacement. Probably not (yet).
Yes, that’s the real issue. Django web devs can happily play with pure Python packages all day long and wonder why others are complaining until they try to build a package with C code from source.
Numpy, for example, goes full clown world – last time I checked you needed a Fortran compiler to build it.
The highest-performance mathematical libraries are written in Fortran, so numpy using it is pretty much a requirement for their use case.
God have mercy on your soul if you try to pip install a package with C dependencies on a more niche Linux system (maybe this only applies if you’re not using precompiled binaries?).
Niche Linux system I’ve had trouble compiling C dependencies on: Termux on Android.
But the picture on Windows is even worse. If a C dependency doesn’t have a binary wheel, you may as well give up.
I don’t really get this. I can pip install numpy without a Fortran compiler. So can you! It has pre-built binaries, on the Python Package Index, for a bunch of popular operating systems and architectures. I can’t remember the last time I needed to actually build a package from source just to install and use it.
This is not quite accurate. Try to install it, for example, in Alpine Linux. Or try to install a very new version. Or use a less common feature not included in the default pre-built packages. “pip install” does rely on compilers for countless packages.
The packaging ecosystem supports (as of PEP 656) binary packages compiled for distributions using musl. So if you find a package that requires you to install a bunch of compilers on Alpine, it is explicitly not the fault of Python’s packaging tooling – it’s the fault of a package author who didn’t provide a built binary package for musl.
As for “very new version”, numpy already appears to have built binary packages available on PyPI for the as-yet-unreleased Python 3.11.
And if you’re going to customize the build, well, yeah, you need to be able to do the build. I’m not sure how Python is supposed to solve that for you.
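Incidentally, if you want pip to fail fast rather than silently fall back to a source build, something like this works (these are real pip flags; the platform tag and version are just examples, and the download only succeeds if the project actually publishes a matching wheel):
pip install --only-binary=:all: numpy
pip download --only-binary=:all: --platform musllinux_1_1_x86_64 --python-version 3.10 -d ./wheels numpy   # fetch a PEP 656 musl wheel, if one exists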
No. You already explained in your second paragraph what the problem is. All this talk about Poetry, or praise for other languages’ package managers, glosses over the fact that the better experience in those ecosystems relies heavily on their users working in a much more limited scope.
Remove all the packages that rely on the binary C libraries that do the heavy lifting, et voilà, you are at the same level of smoothness as other package managers. I would even argue that Python virtual environments are unmatched in other languages, if anything.
Which other language provides a high-level interface to CUDA with an easier setup? Which other language provides functionality equivalent to numpy with a smoother setup?
C libraries are installed in different ways in different systems. Pip will try to compile C and this will horribly degrade in the absence of the exact ABI on some dependency the package expects.
I am not really sure this has a solution. Support for different operating systems is important.
You’ve made a couple of points which I don’t necessarily agree with but let me just focus on the original one in this thread.
As @ubernostrum has pointed out in a different thread, it’s best if a package comes with a precompiled binary wheel (and statically linked at that). Installation is great then and it’s probably the solution. That said, I can’t install e.g. PyQt5 (i.e. not an unpopular package) on an M1 Mac at the time of writing this. curl https://pypi.org/pypi/PyQt5/json tells me there’s no apposite binary package for it.
The next best thing IMO is to vendor C/C++ library source code (either through a git submodule or even try to download the sources on the fly maybe) and attempt compiling it so that it doesn’t depend on any system-installed headers and libraries. This presupposes the presence of a working C/C++ compiler on the system, which I think is acceptable.
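(For what it’s worth, this is roughly how I eyeball what binaries PyPI actually has, using that same public JSON endpoint – just a quick one-liner, not anything official:)
curl -s https://pypi.org/pypi/PyQt5/json | python -m json.tool | grep '"filename"'   # lists the sdist/wheel filenames for the latest release; no wheel matching your platform tag means pip will attempt a source build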
What’s fraught with peril, though, is to assume that the relevant library (along with its headers) is preinstalled on the system and attempt linking with it. This is just too complicated of a problem to be able to handle all the cases correctly. (1) It relies on the presence of some external means of installing libraries, so it conceptually takes you outside of the Python package manager into apt/brew/etc territory. (2) Your Python project can break when you upgrade the library if you dynamically linked with it, or it can break silently due to an incompatibility if you’re linked statically with it (equally bad). (3) The Python package has no control over the exact version of the system dependency, so caveats abound. (4) Your Python project can break if you uninstall the system library, since there’s nothing preventing you from doing that.
If I could change one thing about the Python ecosystem, it would be to push for the first two solutions to be preferred by package authors.
Yeah, at some level, Python’s deepest packaging problem is that Python performance stinks, so you need to use C-wrappers for math and whatnot, but once you do you run into the fact that however bad Python packaging is, at least it exists, which is more than C/C++ can say.
OTOH, Zig is just one guy, and I haven’t used it personally, but it seems like it can handle a lot of C cross-compilation stuff? So ideally, the Python core team should just say “C/C++ is part of our domain, so we’re going to include the equivalent of zig build with CPython to make packaging work.”
Just FYI, Zig has a language creator but it is not “just one guy” any longer. It is being developed with the help of the ZSF, https://ziglang.org/zsf/, which is now paying for core contributions beyond the language creator. There is also a small group of core committers contributing to the language and fixing bugs. It is being very actively developed and their next release will be the “self hosted compiler” (the Zig compiler will itself be written in Zig and built with Zig), which will likely also make it easier for new contributors to join the project. There are also some core contributors beyond the language creator who have started to specialize on Zig’s use cases as a x-platform C compiler, for example by focusing on areas like libc improvements (shorthand: zlibc) and on the various target platforms beyond Linux, like macOS, Windows, WebAsm, embedded, etc. It’s likely true today that Zig is the world’s best x-platform C build toolchain, and that will likely only get better and better over time due to the vitality, pace, and mission of the open source project (especially compared to the alternatives).
Your idea about how Zig could help the Python C source build process is a good one. There is a nice post here about maintaining C code and C cross-compilation issues with Zig:
https://kristoff.it/blog/maintain-it-with-zig/
Unfortunately some Python code relies on C++, Fortran, and other exotic languages to build sources, since the Python C extension module interface is so open and so many other languages can expose something that speaks the C ABI. But you’re right that C modules written in C are most often the culprit.
The creator of the language also talked about some of the technical details behind zig cc here: https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html
I think that’s a fantastic way to improve the Python packaging status quo! Zig is a game changer when it comes to portability.
I’ve had Sublime Text plugins broken in subtle weird ways because of outdated/incompatible Python dependencies in its custom environment that I don’t even know how to manage.
I keep running into problems with npm dependencies that use gyp, which uses Python. Why they would use Python in a JS tool, I don’t know, but they want a specific Python version which my OS doesn’t have.
I install Ansible via homebrew, which means it wants whatever Brew-installed Python things there are, and I need to be careful not to use any other Python software that would mess with them (which happens if I run a Python tool from elsewhere and forget to jail it in its own env).
Scripts still can’t decide if it’s python or python3.
Then there’s the dozen various envs and package managers, and I’ve lost track of which one of them is the current one and which ones are the old crappy ones that will make things worse.
It all is probably super obvious to people who are into python, but as someone who isn’t actively involved it’s just an endless source of problems testing my patience.
I liked this article
I think the majority of people who propagate this meme (which thankfully I see less and less) are largely dogpiling and not in touch with the Python ecosystem (it’s trendy to hate).
On the other hand I don’t think the camp advocating to “just learn the tools and remember to do X Y Z” help the situation because there is definitely a problem to be addressed. There’s some truth to the meme after all.
Looking at other languages for inspiration like node + npm I wonder whether the root cause is just providing sensible defaults. In python land, package installations (pip install) aren’t local to your project by default (unless you create + run inside a virtual env). I think it’s just this extra step tripping newer/unfamiliar/junior developers up
I don’t see the same kind of hate for node/npm for throwing stuff into node_modules by default (except criticisms about the folder size). And the node ecosystem has similar approaches for managing node versions like nvm/n similar to pyenv (so I don’t think that’s a point of friction either).
I’ve been programming professionally in Python for 15 years, and I’ve genuinely tried to find a happy path in Python performance and package management and so on. It feels like being gaslit when people continuously promise some tool will fix my problem, and then I sink a bunch of time only to find a different set of unworkable pitfalls. I’m sure I haven’t tried every tool in the solution space, but writing me (and others) off as “out of touch” or “it’s trendy to hate” feels pretty cruel.
Notably, in Go and Rust, there is a single build tool with minimal configuration, and they work right out of the box just about every time on just about every system. Similarly, the performance of those language implementations is better than Python performance by several orders of magnitude out of the box and there’s usually headroom to easily optimize. I’m sure there are other languages that behave similarly–it really feels like Python is just stuck.
So, as someone who has been around for multiple generations of the Python packaging saga, what I think has happened is roughly three eras: (1) the early years, when there was little real packaging tooling to speak of; (2) a long stretch when ad-hoc and third-party tooling proliferated; and (3) the present, when the standard default tooling has matured.
During (1), people ironically mostly didn’t complain because that was an era when expectations were low and nobody really thought Python was going to develop an equivalent to CPAN and its tooling overnight. Also, many popular packages, for many years, avoided having external dependencies because the support for properly pulling those in at install time was not good.
During (2) and into the early part of (3), a lot of people developed their own ad-hoc tooling to suit their specific use cases, and wrote about it. And then for every ad-hoc tool, two other people would build their own things on top of that, and four more on top of those, and so on. So you got this huge proliferation of personal setups that worked for one person’s use case, and came up high in the Google search results for common questions.
Now we’re at (3), and the main issue is the inertia of all those “don’t use the default tooling, use this thing that only works for my codebase” posts. If you cobble together three or four of them, you might get something that actually kinda-sorta works, but it won’t be nice, and you’ll be using a bunch of different incompatible tools which each were designed with the assumption of being the one and only tool you’d use.
Meanwhile the standard default tooling – setuptools for writing package manifests and producing the artifacts, pip for installing packages, venv for project isolation – has just been quietly improving in the background the whole time. I highly recommend ignoring anyone and everyone who tells you to use something else, and instead going back to those tools. Earlier this year, I wrote a guide to doing that. And a couple of years ago I wrote an explanation of the default tooling and the use cases each tool fulfills, if you want that intro first.
Yea, I think you have the community timeline pretty much right. And I, too, recently wrote a guide to “uncontroversial” Python tooling; it looks like it aligns quite a bit with your recommendations. That is, pip, venvs, pip-tools, requirements.txt for dependencies and pyproject.toml + PyPA recommendations for package publishing.
“How Python programmers can uncontroversially approach build, dependency, and packaging tooling”:
https://amontalenti.com/2022/10/09/python-packaging-and-zig
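To make that concrete, the default-tooling baseline being described boils down to a handful of commands (a rough sketch of my own, not an excerpt from either guide; the directory name is just a common convention):
python -m venv .venv               # standard-library project isolation
. .venv/bin/activate               # (.venv\Scripts\activate on Windows)
python -m pip install -e .         # install the project from its pyproject.toml / setup.cfg in editable mode
python -m pip install build        # PyPA's build frontend
python -m build                    # produce the sdist and wheel under dist/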
I feel like I fall into the same camp as yourself. I’ve tried a bunch of tools and still think they fall short. So I didn’t intend to cause offence.
I just don’t think dogpiling on the ecosystem is productive. I also feel like much of the criticism (and this meme’s popularity) stems from casual onlookers outside the ecosystem.
Thanks, I appreciate the clarification. No harm, no foul. 👍
I think it’s productive insofar as it advertises pitfalls to people who are considering the Python ecosystem. It’s not constructive to have everyone sinking considerable time and energy into Python only to fall into the same traps over and over. It can also be productive if it increases awareness or pressure on the maintainers to address the issue (yes, that may mean breaking changes). That said, I’m sure the quality of the criticism varies, and some are surely (and understandably) just venting steam.
Granted, I’m not a JS or Python developer by trade, but node/npm is probably the only package manager I’ve had more trouble with than pip and friends. While I agree that default project-local installs are desirable, I’m not sure if node/npm is the ideal we want to strive for.
Interesting (me neither tbh) 🤔. Perhaps not the best example.
I know npm sometimes gets a bad rep, but it’s usually not the packaging situation I see people complaining about (maybe it’s just overlooked / lost in the noise).
I still believe having saner defaults could at least help resolve some of the tooling complexity issues people love to complain about.
The need for local installs has been recognised and there’s a PEP for it, https://peps.python.org/pep-0582/, so soon(*) we’ll get something similar to node_modules.
Unfortunately, it’s unlikely. That PEP is from 2018 and iirc is currently stalled. There are some cool projects like https://pdm.fming.dev/ built off it though.
This is great news, I wasn’t aware of this specific PEP, thanks (a lot of the motivation section confirms my existing biases).
i think it’s a helpful meme for two reasons: it pushes teams choosing a language to weigh the dependency situation honestly, and it tells newcomers that the pain isn’t their fault.
for the first one, my workplace usually starts C# projects by default, largely for environment/dependency reasons. we use python for throw-away work, existing python projects, and when there’s a python library that saves us a lot of effort.
the second one can be especially important, because python also has a reputation for being easy. many people get stuck on dependency hell and assume that they are the problem
(rust is also very good at this problem, but i basically never want to use rust for a problem i was considering python for. it’s like when a vegan tells you that a banana is a good substitute for an egg, but you don’t want bananas benedict)
It’s a real shame that Python is so much more often used for this than Ruby. In a Ruby script, you can just drop require 'bundler/inline' and define fully-isolated gem dependencies as a prelude, in a single file. No hassle whatsoever.
The author is right that this isn’t black magic, nor is it specific to Python. But it is a source of continual annoyance to developers. You get confused about where you are and which python/ruby/whatever you are using, and stuff breaks in weird ways. You can absolutely figure it out, but that doesn’t take away from the fact that daily use means you’ll stub your toe somewhat frequently, because dependency management is a hard problem that has to be tailored to each codebase.
It’s worse than this, because it impacts many more people than Python developers. There are tons of Python scripts being used as utilities across multiple operating systems that advanced users will come into contact with on a regular basis. I’m frequently having to battle this problem, despite never developing in Python.
I think that python dependency/project/environment management still has plenty of room for improvement, but it has also made steady progress in recent years. The slow evolution can be confusing for users, as someone else mentioned.
Wheels have made installing packages with non-python dependencies much easier. This has removed the need for something like conda/mamba for all of my projects.
Python adopted pyproject.toml as a centralized project configuration file. tomllib will be part of the standard library in the next version of Python, which will remove any external dependencies.
Pip got a new dependency resolver.
After installing Python on any system, I start with pip install pipx so that I can install any global Python-based tools in isolated environments.
I’ve recently started to use hatch (installed with pipx, of course) to manage projects. It pieces together some existing tools in the Python packaging ecosystem and helps enforce best practices. Think of something like black for project management. It tries not to re-invent the wheel like poetry does, and fits well with / uses existing tools. It is also part of the Python Packaging Authority (like pipx), which feels promising. One of my favorite parts about hatch compared to npm or typical venv usage is that it keeps the virtual environment out of the project folder.
Again, this doesn’t solve everything and I’m still experimenting to find the best workflow for myself, but I do see things getting better.
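The pipx side of that setup is only a few commands (a sketch; hatch is just the example tool being installed here):
python -m pip install --user pipx   # the one tool installed into the user site
pipx ensurepath                     # make sure pipx-managed tools end up on PATH
pipx install hatch                  # each tool gets its own isolated environment
pipx list                           # see what pipx is managing
pipx upgrade-all                    # upgrade every managed tool in one go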
Out of curiosity: why do you feel the need to use Hatch instead of rolling the pyproject.toml file by hand, using the plain virtual environments included in the standard library, and pip on top of that? Personally, I feel that would significantly lower the barrier of entry for new users of your project, since the setup guide (create a virtual environment and here’s the command, activate it and here’s the command, install the requirements and here’s the command) is so simple and is more a part of vanilla Python, as opposed to “install this tool that you may have never heard of because I, the project overlord (tongue in cheek), deemed it so.”
With each of these additions to the package and project management ecosystem (e.g. Hatch, PDM, and Poetry) I feel utterly clueless as to why so many people jump on board. Surely there have to be outsized benefits that I’m not privy to or not exposed to in my workflows.
Hatch uses plain virtualenvs and pip already. Someone could definitely contribute without using hatch. Poetry feels much more heavy handed to me. https://hatch.pypa.io/latest/meta/faq/
Ah, okay, that was an oversight on my part then. Glad to know Hatch rolls more on the sane side than Poetry.
There are 2 big problems from my perspective: developers who don’t do the work of packaging their software up for users (e.g. build a zipapp or use other dependency tools), and a lack of consensus in the ecosystem about sane defaults.
Both of these are easily solved by less lazy developers and/or forcing the ecosystem to ensure developers can’t be lazy. Go, Rust, etc all went the 2nd way, by ensuring developers can’t be too lazy, by making the default in the ecosystem the only sane way to do it. That generally means anything outside of the blessed way gets hard to very hard, depending on the language/ecosystem.
Python obviously doesn’t enjoy those same defaults. Could the Python ecosystem be “fixed”, of course, it’s not a technical problem. So far nobody has managed to do the hard work and make it happen, so I’m not optimistic.
Some applications, like Calibre, do the work for their users and include binaries for the platforms they want to support. Most Python developers seem to lean on other people to fix the distribution problem. It’s not even that hard to build binaries for the different platforms to solve the user side of this; many different tools exist to solve these problems.
The developer side of the problem has plenty of tools also (poetry, pip-tools, etc).
So I’m sure I’ll get a lot of flak, but my perspective is, it’s a lazy developer problem and a lack of consensus problem (as far as getting the defaults fixed).
Looking at all the comments in this thread, I have trouble accepting any description of “lazy” being involved.
Fair enough, I didn’t mean it in a negative way, I describe myself as lazy all the time. I apologize if it was taken that way. Language is hard.
I expended a tremendous amount of effort within my company to get make to make Python dependency management one-command-and-done. I did a talk about it for PyOhio this year: make python devex: Towards Clone to Red-Green-Refactor in One Command w/ a ~45y/o Tool [PyOhio 2022].
The TL;DR of the talk is that I use make + pyenv + poetry to ship data science pipelines, libraries, and some CLI tools. Make is the UI that tells pyenv which Python to install, and then poetry does the dependency management, building, and publishing. I make a point in the talk that one could use Conda-managed Python instead, if your use of it complies with their licensing (corps gotta pay, ours does).
All of this to get the ~same out-of-the-box experience I had in Scala – the stack in which I’d worked for nearly a decade before moving to a Python team.
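This isn’t the talk’s actual Makefile, but the commands such a Makefile typically drives look roughly like this (the Python version is illustrative):
pyenv install 3.10.8                    # install the interpreter the project wants
pyenv local 3.10.8                      # pin it for this directory (writes .python-version)
poetry env use "$(pyenv which python)"  # point Poetry's virtualenv at that interpreter
poetry install                          # resolve and install dependencies from pyproject.toml + lockfile
poetry build                            # (and poetry publish, when shipping a library)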
The venv stuff is easy. Although mostly unique to Python, it is not a very difficult problem to solve. When I dunk on Python dependency management I refer to the issues that have been plaguing the ecosystem due to shortcomings in its design. I am sorry for the incoming rant.
Pip used to ignore version conflicts, which has resulted in the package ecosystem having bonkers version constraints. I’ve found it common for packages which require compilation to build to have an upper Python version constraint, as each new Python version is likely to break the build. The most common drive-by PR I do is bumping the upper Python version bound.
Poetry, although a major improvement for reproducibility, is in my opinion a bit too slow (poetry --help takes 0.7 seconds) and unstable (of course this Poetry version wipes the hashes in my lockfile!), has poor support for 3rd-party package repositories, and does not even support the local-version-identifier part of the version schema correctly, which has resulted in people overriding some packages in the Poetry venv using pip.
Every Python package manager (other than conda, which is fully 3rd party) is super slow during dependency resolution, as it can’t know what the subdependencies of a package are without first downloading the full package archive and extracting the list of dependencies (there’s a PyPI issue about this from 2020), which is incredibly fun when dealing with large frameworks such as pytorch, tensorflow, scipy and numpy, where the wheels run from tens of megabytes up to over a gigabyte.
For source distributions, dependencies are usually defined by setup.py, which must be executed to allow us to inspect its dependencies. This of course cannot be cached on pypi, as it is possible for setup.py to select its dependencies depending on the machine it runs on.
Then there are the setup.py build scripts, which never seem to quite work on any of my machines. Some build scripts only ever work in docker environments I would never have been able to reproduce had it not been for dockerhub caching images built with distros whose repositories have now gone offline. This especially becomes a problem when the prebuilt binary packages, made available by the package author, typically don’t target ARM and/or musl-based platforms.
My pretty strong belief has been that OS-level installs of Python are a major source of all of these issues. Apt and homebrew showing up with system installs means that you get into so many weird scenarios (especially if you happen to have one script somewhere with a badly-formatted shebang).
There is also an issue not covered in this, which is packaging for distribution to other systems. There are lots of packaging tools out there, but it’s still pretty tough to package things that aren’t really pure python. Even avoiding dependencies you get into issues! urllib2 installs on a clean Mac machine don’t work with HTTPS (you have to run some script first…), so things are not cut out for you.
My recent idea is to have a binary like cargo, where you run the binary on your project, it will set up an environment to run your script, but first will completely wipe all Python-related env vars, and then set up as hermetic a thing as possible. It’s a ball of mud solution, but it’s nicer to me than the current state of the art (Docker, basically).
But really I think that beyond pip + requirements.txt, a lot of the new tooling simply is broken way too often. Poetry is kinda buggy, pipenv would just randomly hang all the time… I think everyone would be very happy to adopt something that works without requiring major changes in workflows.
And hey, motivated people would quickly send patches to upstream libraries! Everyone wants Python to work nicely, just so far people aren’t actually showing up with nice solutions (beyond a lot of the amazing work with wheels and the like, of course. Talking more about the top-level binaries).
The only thing I have never really understood is Python deployment, but I’ve never understood deploying anything that isn’t compiled, really. The best non-Docker mode of deployment is rolling your own service files, or pex or something?
It’s extremely context specific. The answer is different if you embed (language) as a scripting language in your app, different if you’re deploying server software, different if it’s an end-user app, etc. That applies to pretty much every scripting language with a separate runtime though.
Okay, but let’s say I have a Python module, __init__.py, maybe an app.py, and that’s it in my project to deploy to a server – I wouldn’t know what to do other than write a systemd unit file, and I don’t know if there’s anything else one can or should do. I’ve never seen a good explanation for this.
Now that I think about it, actually I ended up using dh-virtualenv most of the time. Ignore me entirely, I guess.
If you wrote the app and it’s simple, you can likely copy the files and use the system Python to run them. Add a virtualenv if you need dependencies. Add pyenv if you need a specific version of Python.
What do you do in other languages? I don’t quite get the difference.
In compiled languages I usually build a statically linked executable I can deploy
PEX was indeed meant for this scenario, but aside from that you usually just end up either using these tools to build a virtualenv in the deployment environment, or you build the virtualenv in a staging area which is identical to the production environment and then deploy that as an artefact. There’s a reason lots of people gave in and just started using containers.
We dabbled with pex, and it worked okay except the target still needed to have the right version of Python installed, and pex doesn’t know about any shared object dependencies, so you also need those as well. It’s a pretty leaky abstraction. We also tried using these in AWS Lambda functions a few years back and relatively simple closures would bust the 250MB lambda size limit (I think the limit has since been lifted)–pandas alone was something like 70mb. Meanwhile, an equivalent Go program weighed in at like 16mb (vs 250+mb in Python).
I don’t know if I’ve just been supremely lucky with what dependencies I’ve chosen, but I’ve always found Python dependency management pretty straightforward.
pip install -r requirements.txt is just about everything you need. It just works. It’s not significantly different from any other language. I don’t need to learn the intricacies of how pip works, or memorise a load of flags. There’s just one flag there!
If installing dependencies fails due to needing a C compiler or whatever, I’ll just bring one into the path and try again. To generate a lock file it’s just pip freeze > requirements-freeze.txt. If you install from the lock file, you get exactly the same versions of everything again. Simple.
Pretty nice article that came just in time, when I was swearing about building a conda environment for our Jupyter instance. I just can’t find a set of packages that fit together, and the messages I get are not that helpful. I had to switch to pip-compile to get a good overview of what versions/packages have conflicting dependencies. And after reading this thread it occurred to me that there is another pitfall of Python’s packaging tools: if I try to create an environment in one go (à la pip-compile followed by pip-sync) I will get failures. If I use pip, it will happily install packages with conflicting dependencies – only if I install them one by one will it warn me. That’s not what I would expect from packaging tools.
That’s all well and good until you run into a missing wheel, which leads to a compilation error due to a missing header file. Or until you need to upgrade a package to fix a bug, but there were unrelated breaking changes because Python package devs don’t follow anything like SemVer, and break backwards compatibility constantly. Or until you end up with CUDA version mismatches, or similar dynamic linking issues with other system libraries.
Those issues are not related to python. You’ll run into each thing you listed when compiling C, Rust, Ruby, or whatever else you use. You always have to deal with external dependencies and version issues regardless of environment.
I think you mean they’re not exclusive to python, which is certainly true. And yet they seem to be far more pervasive with Python than with Rust, at least.
Kind of. What I mean is that this affects development in general. They’re not toolkit issues, but toolkits can help work around them. This is what I mean about the meme: it’s not Python vs Rust, but rather an ecosystem that has existed for decades vs an ecosystem with explosive growth in recent years, plus the application of patterns we only learned recently. You can apply semver to anything you want – it’s not relevant whether you’re using Python. But most Python projects were created before semver was described.
This talks about Poetry. And about using venv directly. But what about pipenv? I remember that being pushed quite heavily. Does it do the same as Poetry? Or the same as venv? Or both? Neither? Should I use Poetry in addition to pipenv? Why do we need all these different tools? Are they complementary or competing? If they’re competing, is there a community consensus about what’s “right”/“best”?
Pipenv lives in the same “dependency and environment manager” area as poetry. Choose either one you like if you’re choosing for yourself, or use the one used by upstream package otherwise.
For the differences, you could think of poetry as pipenv with extra nice features – for example, an automatic “pipenv clean” equivalent.
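For a rough feel of the overlap, the day-to-day commands map onto each other like this (a sketch; exact flag names vary a bit between versions of each tool):
pipenv install requests     # adds requests to Pipfile and updates Pipfile.lock
pipenv clean                # uninstalls anything not in the lockfile
poetry add requests         # adds requests to pyproject.toml and updates poetry.lock
poetry install --sync       # installs and removes untracked packages, i.e. the "automatic clean"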