1. 31
  1.  

    1. 29

      I’ll use this as an opportunity to thank @ubernostrum for his post on “boring” dependency management, which has helped me reduce the number of headaches in my life.

      1. 12

        I appreciate the endorsement. Also, it’s worth pointing out that the setup I recommend is:

        • Based on the standard/default Python packaging tooling. The only third-party tool I recommend is pip-compile, and only because it saves you the time of writing your own script to chain together the corresponding set of lower-level pip operations. You could always dump pip-compile and write the script yourself.
        • As close as you can get, with standard/default tooling, to fully reproducible environments. If you deploy each time to the same Python version on the same target platform/architecture, you actually will download the same, byte-for-byte identical, set of packaged artifacts each time[1].

        And one of these days I should go clean up that post a little bit and add a few explanations for why I’m not recommending some things that have become popular since I first wrote it.


        [1] The final on-disk result may vary because not all packages that include compiled extensions ship precompiled binaries; in that case, a package may compile its extension on the target machine, and then only the initial source bundle you downloaded is byte-for-byte identical each time, not the compiled output. I do show how to tell pip to only allow precompiled binaries, if you want to go that route.
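
        For anyone who wants the binary-only behaviour mentioned in the footnote, a minimal sketch (the flag is pip’s own; the file name is just an example):

        # refuse source distributions entirely, so nothing gets compiled on the target machine
        pip install --only-binary=:all: -r requirements.txt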

    2. 20

      I’ve learned to appreciate the virtue of manually juggling virtualenvs over “automatic” approaches.

      If you need to support multiple Python versions, or test against multiple versions of dependencies, “automatic” management of virtualenvs by making them 1:1 with project directories just gets in the way.

      Also interesting that the post dismisses pip-tools because it’s a third-party tool which doesn’t come with Python and which you have to learn how to use, and then goes on to suggest Poetry, which has the same problem. pip-tools has worked well enough for my needs.

      1. 5

        Same here. The “Automatic management of virtualenvs” is a big part of why I find pip replacements frustrating. One of the best features of virtualenvs is that you don’t have to activate them: if you simply run python/pip/etc. from their virtualenv path, then you are using the virtualenv. I have a handful of de facto project management scripts that assume the virtualenv is in a project-local venv dir (overridable with an env var). It’s a few more pieces, but they remain the same across projects, so they are easy to build habits around. Most importantly though, everything is local and explicit. No surprises.
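
        A rough sketch of that style, in case it helps (the venv location, the VENV_DIR variable, and main.py are my assumptions, not a standard):

        # everything addressed by path; nothing ever gets activated
        VENV="${VENV_DIR:-venv}"
        python3 -m venv "$VENV"
        "$VENV/bin/pip" install -r requirements.txt
        "$VENV/bin/python" main.py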

    3. 16

      This is just cargo cult with almost no real argument backing it other than “because it’s better”.

      There is nothing special about requirements.txt; it can be any text file. It can even be a shell script with a list of modules.

      The update of transitive dependencies is a feature. Freezing everything is an anti-pattern and the best way to get hacked. If you think it is unrealistic to keep hundreds of dependencies up to date… well, yes, it is! Keep dependencies to the bare minimum. You can pin whatever you want in a requirements file too, or by installing packages manually (shell script?).

      I also don’t get the point of wrapping virtualenvs in yet another higher-level abstraction. I would wager people doing so aren’t really familiar with how virtualenv works.

      1. 12

        Freezing everything is an anti-pattern and the best way to get hacked.

        Well that’s quite a bold claim. What evidence do you have that that’s the case?

        1. 4

          Left pad? The whole rubygems hack fiasco?

          But objectively, if you freeze everything, you actively prevent security updates to transitive dependencies. Top-level dependencies are supposed to be your responsibility.

          1. 14

            Do you even know what leftpad was about? Your comment indicates you don’t.

            1. 1

              I know exactly what it was and had a field day that day. What makes you think I don’t? The fact that it wasn’t a security hole, but rather a dependency that disappeared from the registry all of a sudden? The point still stands. I even gave another example: one of the biggest hacks in the history of the internet, whose shock waves are still not fully understood today.

              https://venturebeat.com/business/rubygems-org-hacked-interrupting-heroku-services-and-putting-millions-of-sites-using-rails-at-risk/

              I reckon I was too succinct. Security is the number one concern, but not the only one. But mind you, stuff randomly breaking is a security hazard per se.

              1. 3

                Any reasonably sized company will cache their used artifacts, so a dependency disappearing should not cause any immediate problem.

                Your “solution”, on the other hand, doesn’t handle the case of a new vulnerability being introduced into an untrusted dependency - how is that better? Just freeze a state that you have verified is correct, and keep it up to date (the latter is often neglected, making this ordeal much more difficult than regularly updating small things).

      2. 6

        Freezing transitive dependencies is how you get reproducible builds.

        1. 1

          I don’t get your point. What’s the point of reproducible builds at the expense of security?

          1. 7

            The point is a different kind of security. With a reproducible build, you gain availability by preventing breaking changes from being automatically deployed at the cost of losing out on automatically benefiting from security patches. The classic CIA triad includes availability for a reason. There is a second tradeoff that I think is mostly hypothetical - version pinning prevents certain types of supply chain attack, at the cost of availability from non-security bugfixes that solve problems that cause downtime. These are both scenarios that are extremely rare in realistic threat models for most organizations, in my experience.

            Regardless, the point is that this is a question having some nuance. There’s not one decision that’s objectively more secure than the other. Context is needed to decide for any given project.

            1. 2

              Calling it a tradeoff is putting it unreasonably mildly. Applying security patches in a timely manner is mission critical for anything other than a toy. A CVE surfaces, and if you have an online service, it is a matter of hours, or even minutes, before script kiddies start unleashing their vulnerability scanners. The tradeoff you talk about might include the product’s survival. If all your user accounts get compromised, I am sure your boss will not be calmed down by hearing “but we had reproducible builds”.

              Regardless, the point is that this is a question having some nuance. There’s not one decision that’s objectively more secure than the other. Context is needed to decide for any given project.

              I agree with this principle, although the specific case is too extreme. The article gives package-lock.json as an example to follow. Head over to GitHub and check the package-lock files of popular JavaScript projects. They are enormous, pretty much a blob. Realistically speaking, no one is going to check them for critical security updates. My point being that dependency trees larger than what can be humanly managed are an anti-pattern. If you achieve reproducible builds by resorting to what is essentially a blob, then you might as well just trust an arbitrary binary blob. At that point, why would you even care about reproducible builds? Reproducible builds assert the source integrity. If in practice you’re not building from source but rather from a blob, what’s the point? It becomes just a semantic hack with no direct benefit.

              Dependencies are not a free lunch. Following in the footsteps of nodejs on this will make things worse. You can pin a specific transitive dependency with pip; the fact that that doesn’t happen by default is not a limitation, it is the intended behaviour.
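
              To be concrete, a minimal sketch of pinning one transitive dependency with plain pip (the package and version are hypothetical examples):

              # constraints.txt holds the single override; everything else floats
              echo "urllib3==1.26.18" > constraints.txt
              pip install -r requirements.txt -c constraints.txt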

              1. 5

                Realistically speaking, there are tools that scan dependencies and notify us when there are CVEs affecting them, whether transitive or not.
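
                For example (pip-audit is just one such tool, installed separately; it is not the only option):

                # check every pinned dependency, transitive or not, against known CVEs
                pip-audit -r requirements.txt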

              2. 3

                Calling it a tradeoff is putting it exactly correctly. Using only library versions you know are compatible with your application is mission critical for anything other than a toy. A new version gets pushed out without good semantic versioning or with a poorly tested feature, and your app might only survive as long as your deployment cycle. The tradeoff I talk about might include the product’s survival. If you take hours or days of downtime to track down the problematic source code, I am sure your boss will not be calmed down by hearing “but we were using the latest versions”.

                I could go on, but I hope I’ve made my point. This isn’t a question like “Should we give our domain controller a public IP address and no firewall rules?” with an obvious, correct answer. Dogmatic fixation on your preferred option without regard to your particular organization’s concerns does no good.

              3. 2

                A CVE surfaces, and if you have an online service, it is a matter of hours, or even minutes, before script kiddies start unleashing their vulnerability scanners.

                We had at least a couple of recent events that confirmed my long-held suspicion that most CVEs are kinda bullshit, so I wouldn’t throw away reproducible builds, which is something that improves debuggability and stability in general on a daily basis, for maybe avoiding a once-in-a-blue-moon relevant CVE.

                I’ll concede that YMMV depending on the kind of service you’re running, though.

    4. 9

      As a once-in-a-while casual Python user, requirements.txt has one clear advantage: its usefulness is one command away, because pip comes pre-installed with every Python. With Poetry I’d have to figure out how to get it installed before I can install the dependencies of whatever I want to actually run.
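
      Concretely, the one command in question (using only what ships with Python):

      python3 -m pip install -r requirements.txt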

      1. 2
        python3 -m pip install --user pipx\
          && python3 -m pipx ensurepath\
          && python3 -m pipx install poetry
        
        1. 2

          Wait, what’s pipx? Do I need yet another package manager to install poetry? Why can’t plain pip do it?

          1. 1

            pipx is a way to install and manage Python applications. It handles your PATH, executables, virtual environments, etc.

          2. 1

            pipx is like pip, but it creates a virtual environment for the thing you’re installing, installs it there, and makes the binaries available somewhere sensible. It’s great for installing tools and applications and keeping them isolated from each other.
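
            A rough sketch of what that looks like in practice (the tool names are just examples):

            pipx install poetry    # gets its own private venv, but the poetry command lands on PATH
            pipx install httpie    # another isolated venv; no shared dependencies
            pipx list              # show everything pipx manages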

      2. 1

        pip install .

    5. 5

      Can’t you pip-install all of your pinned immediate dependencies and then do a pip-freeze to get the complete set of versions for all the transitive dependencies?

      1. 4

        My thought too.

        pip install -r requirements.txt
        pip freeze >lockfile.txt
        
      2. 2

        pip freeze has a lot of shortcomings and only captures each dependency at its exact pinned version. It doesn’t capture platform markers or hashes, which hurts reproducible builds.

        1. 4

          You can script the hash generation. It just makes the script complicated enough that you should probably just reach for the pip-tools project and its pip-compile, which already scripts fetching the full tree and pinning it with hashes.
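
          A minimal sketch of that pip-compile workflow (the file names are just examples):

          # requirements.in lists only your direct dependencies
          pip-compile --generate-hashes requirements.in -o requirements.txt
          # the install then refuses anything whose hash isn't in the compiled file
          pip install --require-hashes -r requirements.txt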

          1. 2

            Yes, I would recommend pip-tools at a minimum when it comes to Python development, and if you plan on making an application or library, reach for more powerful tools that automate more of this for you like Poetry or Hatch.

            Though lately at work, I’ve been building small container images (as in, someone can easily hold the entire context in their head by looking at the whole file at once) and just throwing in a pinned minor like ~=62.3.0, because sometimes our cache mirror will break and not have the exact version expected.

    6. 4

      Isn’t this terrible advice? Python has standardized on requirements.txt for 20 years. It is the standard.

      1. 4

        HTTP 1.0 was standard. It was a good standard. You can still use that standard. There are better standards now.

        1. 7

          There are better standards now.

          No, not for the specific use case requirements files are actually for.

          Specifying dependencies in a setup.py (old school) or pyproject.toml (new school) is for building a distributable artifact from the associated source tree. Perhaps for upload to PyPI or another index, perhaps just for transfer to some other directory or machine internally.

          Specifying dependencies in a requirements file is for reproducing an environment. Perhaps for deployment, perhaps for debugging, perhaps just for “here, use this setup to run my thing on your machine”.

          This has been the official distinction between the two for at least a decade.

          See also this thread in the packaging forum about using pyproject.toml for something that’s not intended to eventually produce a .whl package.
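
          To make the split concrete, a rough sketch (the dependency names are made up; the commands are the standard ones):

          # pyproject.toml carries the abstract requirement, e.g. requests>=2.28,
          # and is what you build a distributable artifact from:
          python -m build                  # needs `pip install build` once; emits dist/*.whl and dist/*.tar.gz
          # a requirements file carries the concrete, pinned environment, e.g.
          # requests==2.31.0 plus every transitive pin, and is what you reproduce from:
          pip install -r requirements.txt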

          1. 1

            Having used setup.py, requirements.txt, pyproject.toml, Poetry, and Pipenv with its Pipfile, I feel Pipenv is a better tool than setup.py or requirements.txt for my needs when building an environment for exploration-based tasks, like a notebook-based app. I want good version specs, locking with hash verification, and to avoid having to manually manage a virtualenv. I’ve got some reporting pipelines that use Pipenv in CI to build reports, and it works well. I’ve got ML model training pipelines that ship as a Poetry-built whl installed in a python:$version container. I prefer the latter, but the former opened my mind to notebooks as production code.

    7. 4

      I don’t find Poetry to be super appealing when writing one-off scripts or tools that get executed directly. When writing a library, Poetry makes a lot of sense to me, and that’s when I use it.

      Does it make more sense to use Poetry for the first case too? I mean, the obvious issue with Poetry is that, if I am simply sharing a script for someone to use, it’s more likely they’ll have pip on their system, and they can go on and use the tool right away.

      I assume the user would be able to set up their own virtualenv.

      1. 7

        When I write a one-off script using Python I launch it with nix-shell.

        1. 4

          Out of everything I’ve tried, this has been by far my favourite approach. Basically turns python scripts into portable executables. No need to worry about which dev headers to install. Nix’s learning curve pays dividends.

      2. 4

        writing one-off scripts or tools that get executed directly.

        Ruby made the transition to being lockfile based over a decade ago and I can’t imagine going back. Sure, some things are slightly harder, but I remember the days before Bundler where I would spend days trying to figure out what mystery version of some library broke my script or to figure out why my coworker couldn’t see the same bug on their machine.

        Scripts aren’t as quick and easy, but the flip side is they are much more robust when they come with a lockfile.

        1. 2

          An inline Gemfile is so handy for scripts.

          1. 1

            Both Rust and Python have open proposals to do the same.

      3. 1

        I think requirements.txt makes a lot of sense if the artifact you’re going to ship is a container image of some sort. In that case, you just need to know what to install inside the environment, and requirements.txt is dead simple. For one-off or dev-oriented scripts I also don’t necessarily see a problem with it, depending on the exact situation. On the other hand, I reach for Poetry if whatever I’m writing (library or app) is going to end up on PyPI.

        1. 6

          I don’t generally agree with this.

          One of the major pain points in requirements.txt-based approaches is that removing a package does not remove transitive dependencies, which makes active development a pain that increases at least linearly with time (assuming most packages have at least one transitive dependency that you don’t already have).

          Of course if you’re simply shipping a container image, then why bother even including the requirements.txt? It’s an artifact necessary to build said image, but not necessary to run the image, if done correctly.

          1. 1

            I think you misunderstood. I didn’t mean that you include requirements.txt in the container image, just that you can use it to manage dependencies for the project if you’re just building a container that contains your program since you don’t need to create a PyPI package or metadata. It still has drawbacks, but they aren’t as significant if you’re not shipping to PyPI.

    8. 3

      Yeah, nah.

      I admittedly haven’t experimented with poetry very extensively, but I had to help people figure it out a couple of months ago, and just … Nah.

      It’s slow, the config file is weird, it handles private indexes weirdly, and those are just the top three annoyances off the top of my head.

      I feel like, in the effort to prevent some occasional issues, these tools make the everyday stuff significantly worse.

      I am waiting to try Armin Ronacher’s new thing, it sounds promising, and he has a track record of writing APIs that don’t suck. In the meantime, pip is enough, maybe with pip-tools for flavor.

    9. 3

      Never used poetry, but it seems as un-unixish as pip to me. Do not modify your virtual environment, create a new one and enter it.

      #!/bin/sh
      # assumes a standard virtualenv in .env; POSIX sh uses `.` rather than `source`
      . .env/bin/activate
      exec bash    # drop into a shell that inherits the activated environment
      

      That script is roughly my workaround for pip. With poetry, I would do the same.

      1. 3

        It can feel unergonomic to run poetry run cmd, but it’s the explicit way to handle it. Most folks who set up Poetry will also put ~/.local/bin in PATH, which is where poetry lives. An intelligent shell like the one in IntelliJ will automatically put Poetry’s venv on the PATH in its Terminal session, so you can also run poetry-installed commands like normal. I like this magic when I’m using the IDE, but my Makefiles all call anything Poetry manages with poetry run $cmd.
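
        For what it’s worth, the explicit style ends up looking roughly like this (the test command is just an example):

        poetry install           # create/update the project venv from pyproject.toml + poetry.lock
        poetry run pytest        # run a venv-installed tool without activating anything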

    10. 3

      If we’re talking about alternative dependency managers, PDM has been very good to me lately. I especially appreciate that it is aware there are many dependency managers, and does not assume a pristine project.

      Before I get into the convoluted things I used it for, let me reassure you PDM has the basics down pat. The happy path of “pdm init/add/run/build/publish” is happy indeed; it will install your dependencies and update your lockfiles and build your artefacts, and all will be well.

      PDM has a wonderfully helpful attitude to interop. We have some repositories at work that are still managed with pip/requirements.txt, but on my own computer I prefer to use PDM — and PDM helps me, because it can do pdm import requirements.txt and pdm export --format requirements.txt (as well as other formats, like setup.py, Poetry, or Pipfile).

      When I ran pdm init on an existing project that used setup.py/requirements.txt, it asked me which Python interpreter to use, and it showed me a list of my system’s Python interpreters to choose from. Let’s just say I’m not used to Python tools being that helpful.

      When I was working in multiple branches, one of which was updating the dependencies, PDM helped me by supporting multiple venvs, and multiple lockfiles, so I could switch back and forth without constantly reinstalling.

      In short, PDM has helped me with my workflows, both on and off the happy path, and often made my work easier than how I did it before. Wholeheartedly recommend it.
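
      For anyone curious, the interop path looks roughly like this (the package name and script are just examples; exact flag spellings may differ between PDM versions):

      pdm import requirements.txt    # adopt an existing pip-style project
      pdm add requests               # manage dependencies through PDM from here on
      pdm run python app.py          # run inside the PDM-managed venv
      pdm build                      # produce a wheel/sdist when needed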

      1. 2

        I think one of the biggest pain points for me has always been grandfathering existing projects into pipenv, so a tool that reduces that effort would definitely be very nice.

    11. 3

      A recursively pinned requirements.txt using a tool like pip-compile is – if you understand its limitations – a robust and standard way to pin Python dependencies. That’s more than what can be said about Poetry. And automatic management of virtualenvs might be nice, but not a reason to switch dependency managers.

      Ironically, the article missed the one actual weakness of requirements.txt files: cross-platform lock files, which is why I switched to PDM after getting an ARM Mac (but even that is solvable using Docker if you really want to).

    12. 3

      People won’t be convinced because of Blub. From the OP:

      It would be great if Python came with a canonical project workflow tool.

      Yes. This debate is not happening in Rust. The argument of “but then I have to type cargo run” is non-existent. When C++ people come to Rust, they say “wow, the tools are very nice”, but Bundler and others have been doing this the whole time. It’s because the C++ person is actually trying, feeling, and experiencing cargo.

      Meanwhile, people are making their own Poetry setup. I can activate myself. I can lock myself. Yes, you can and I’m not going to use your invented system. I’m going to use whatever tool that acts like the other languages’ package managers. I don’t really care if it is named Poetry. You can add pip plugins and have a stack, or I can just use Poetry. We can’t resolve this. Someone has to be willing to try the other thing. The thing I’ve seen is just outright denial. No trying.

      For this project (django-unfold), which is a random library I ran into recently (I’m not even doing Django at the moment), their CI is very elegant. They build and publish with a single subcommand.
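
      I don’t know exactly what their CI runs, but Poetry does let you collapse it into one step, roughly:

      poetry publish --build    # build the wheel/sdist and upload in a single command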

      We can rewrite The Blub Paradox with Poetry and pip.

      As long as our hypothetical Python programmer is looking down the dependency management spectrum, he knows he’s looking down. Tools less powerful than requirements.txt are obviously less powerful, because they’re missing some feature he’s used to. But when our hypothetical Python programmer looks in the other direction, up the dependency management spectrum, he doesn’t realize he’s looking up. What he sees are merely weird tools. He probably considers them about equivalent in power to requirements.txt, but with all this other hairy stuff thrown in as well. requirements.txt is good enough for him, because he thinks in requirements.txt.

      It still doesn’t solve Blub. Blub happens until the person tries the thing, for a long time, which is hard.

    13. 2

      However, pip won’t pin the versions of the transitive dependencies

      It does when you do pip freeze > requirements.txt

      This is error-prone: forgetting to activate the virtualenv or activating a wrong virtualenv are common mistakes.

      Tools that automagically do stuff based on the current folder cause more problems than they solve.

    14. 1

      I thought the headline meant to stop storing my software requirements in text files, which you can pry from my cold, dead fingers.

    15. 1

      PDM is better than Poetry. Or just venv is fine.

    16. 1

      When I did a lot of Django development we were ride or die for pipenv and I haven’t really handled a requirements.txt file for years now.

      Also, as an application developer, I always wonder what the fuss around Python packaging is, because it’s very rare to run into issues.