I would recommend Nix over Conda for new projects. The NixOS community has a comparison between Nix and Conda. While there may be many ways in which Nix is slightly painful compared to Conda, the terms of service alone seem like a motivating reason to avoid Conda.
What problems (particularly ones stated in the article) does Nix solve here?
From the article’s perspective, Nix is like Conda but more reproducible. The running example from the article becomes a single-line expression with nixpkgs:
python39.withPackages (ps: [ ps.numpy ps.pandas ps.pillow ])
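For anyone who wants to try that expression, a minimal sketch of a shell.nix built around it might look like the following. It assumes a nixpkgs channel that still provides the python39 attribute (newer releases may only ship later Python versions), and wrapping it in pkgs.mkShell is my choice for illustration, not something the article prescribes.

```nix
# shell.nix: a sketch, not a definitive setup.
# Assumes <nixpkgs> resolves to a channel that still contains python39.
{ pkgs ? import <nixpkgs> {} }:

pkgs.mkShell {
  packages = [
    # The same one-line expression as above, qualified with pkgs.
    (pkgs.python39.withPackages (ps: [ ps.numpy ps.pandas ps.pillow ]))
  ];
}
```

Running nix-shell in the directory containing this file should drop you into a shell where python can import those three libraries.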
If we added a third column for Nix to the first table, alongside pip and Conda, it would show that Nix does everything that Conda can do. The third table is similarly simple; features like reproducibility and virtual environments are baked into Nix’s design, and nixpkgs has a security team. To make a third column in the second table, comparing nixpkgs to PyPI and conda-forge:
Finally, but non-trivially, the production of Docker-compatible container images is reproducible when using nixpkgs’ tools to assemble the image. This provides a reproducible alternative to Docker’s builder. In contrast, the article gives two non-reproducible Dockerfiles which depend on the state of global package repositories.
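As a rough sketch of what that looks like, nixpkgs’ dockerTools can assemble an image around the same Python environment. The image name and tag here are hypothetical, and the result is only as reproducible as the nixpkgs revision it is evaluated against, so in practice you would pin nixpkgs rather than rely on the <nixpkgs> channel as this example does.

```nix
# image.nix: a sketch under the assumptions stated above.
{ pkgs ? import <nixpkgs> {} }:

let
  # Same environment as the running example; python39 must exist in the channel.
  pythonEnv = pkgs.python39.withPackages (ps: [ ps.numpy ps.pandas ps.pillow ]);
in
pkgs.dockerTools.buildLayeredImage {
  name = "python-env";  # hypothetical image name
  tag = "latest";       # hypothetical tag
  contents = [ pythonEnv ];
  # Start the interpreter from the environment when the container runs.
  config.Cmd = [ "${pythonEnv}/bin/python3" ];
}
```

Building it with nix-build image.nix should produce a tarball at ./result that can be loaded with docker load < result.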
On serious examination, the only reason to recommend Conda might be for Windows users, but I usually recommend that they change their entire software stack at that point.
Yes, Nix can do everything Conda can. The problem I see is that you may want to install precise versions of your dependencies, but if you stick to a certain Nixpkgs release or commit, you have to use the versions in that distro snapshot. I don’t know if that affects Conda as well. That’s why, for most of my Python projects, I use few distro packages and instead rely on mach-nix, which does a very good job, together with a private Nix binary cache so that I don’t have to rebuild the packages every now and then.
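To illustrate what sticking to a Nixpkgs commit means, here is a minimal sketch of pinning. The revision below is a placeholder, not a real commit; substitute one you have chosen. Every Python package then comes at whatever version that snapshot carries, which is exactly the limitation described above.

```nix
# pinned-shell.nix: a sketch; the rev value is a placeholder, not a real commit.
let
  rev = "0000000000000000000000000000000000000000";  # placeholder: replace with a real nixpkgs commit hash
  pkgs = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/${rev}.tar.gz";
  }) {};
in
pkgs.mkShell {
  packages = [
    # Versions of numpy and pandas are fixed by the pinned snapshot, not chosen here.
    (pkgs.python39.withPackages (ps: [ ps.numpy ps.pandas ]))
  ];
}
```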
Can nix do non-root install now (but still use a binary cache)? That was the one thing keeping me from using nix for this last time I checked (which was a while ago).
It needs root just to create the /nix directory during installation (that is in single-user mode; in multi-user mode it also needs root to create some additional users to run the Nix build service as). And yes, it does support binary caches in such a setup.
Right, thanks. So that’s pretty much what I remember. Without a non-root install option (that also supports binary caches) it’s not a good fit for what I need to do (I can’t assume that I’ll have root access where I need to run my software, or that anyone would install it for me). Conda ticks those boxes, but I’d prefer the stronger guarantees that come with nix. Also the tooling.
If the problem is one of distribution alone, then nix-bundle is a great prototyping tool, and nixpkgs supports static linking for many languages. As long as you’re not trying to share development environments too, Nix would still work for that situation.
For the specific case of living in homedirs on somebody else’s hardware as a permanent tenant, I personally think that specialized toolchains should be developed which use “living off the land” techniques. These techniques are typically used by malware, but they could be used for good, too.
living off the land
TIL. Thanks for that, very interesting! I guess having scripts set up conda environments (or similar) in a semi-transparent way is not too far off that concept, which is what I’m doing at the moment. miniconda (or a custom installer created via constructor) takes quite a bit of pain out of that work, because they’ve already taken care of wrapping the install process into scripts, and they also take care of a lot of the inconsistencies between target platforms, and so on. As I said earlier, conceptually I’d prefer having something more rigorous like nix, but there is only so much time I can afford to spend on making deployment as simple as possible for my users…
Also, thanks for mentioning nix-bundle, I think I stumbled upon it when reading about the packaging of Nyxt (or some other tool), but forgot about it again. I could probably make it work somehow, but the advantages compared to Conda would probably not be worth it in my scenario (modular, extendable data-science tool for non- (or not-so)-technical users, basically). And then I’d still have to worry about Darwin and Windows.
If I ever find the time I would really love to look into it even if it’s just for Linux. Being able to create nix-based containers and AppImages and all that easily would be something very nice to have.
Be careful to stay mindful of how radically different the work you do might be from other people’s.
I use conda. I create a new environment for each data science project I work on, then delete those environments when that project is done. I sometimes use pip within those conda environments; conda supports that. I also use pip outside my conda environments, when the tool to be installed is a non-dependency to a specific project (and doesn’t already exist in my OS package manager, which is my strongly preferred way to install any system-wide application).
Suppose I start a new GIS-related analysis and I want to create an isolated “virtual environment” and install shapely.
Okay, so I’m now installing its dependency on GEOS, which is most definitely not a Python library. So, what’s the singular “right way” that every individual should install these tools in every circumstance? I’d suggest there is no such thing as a single “right way”.
I want to do two things here that are important to me: install a C++ dependency to a Python library, and have it totally isolated into something functionally akin to a reproducible virtual environment that also leaves the rest of my projects (and system) alone.
Now, I also want to share my Jupyter notebook analysis with my colleague so they can reproduce my results on their machine. I want to check one text file into my repo that fully specifies that environment, and I want my colleague to be able to create that environment with one command, on any operating system (including Windows and MacOS, since they’re not hip to the same development tools as me). And after they have reproduced my work, they won’t want any of my stuff on their system anymore. I want them to be able to delete it with one command, too. To really remove it from their system.
We can kind of do all of this with pip, but… well… good luck to each of us on our competition for whose system goes longer before requiring a fresh OS reinstall. Heck, maybe my colleague is reproducing my results on MacOS, but I created the work on Linux. Say Apple pushes an update to Maps that somehow breaks my GIS project, and only my GIS project, and only on my colleague’s machine.
I’m sure you’ll agree this will be a fun experience to resolve with my colleague. And this is the one on Mac, so at least I can start to theorize why it might break for them. That poor colleague using Windows is on their own.
Now, on the other hand:
conda is not how I would install any software in the context of needing some project-agnostic “user tool” on my local machine. I don’t want to have to activate an environment - even an automatic default one - to run shell commands I expect to just work. My tune might change if I switched to Windows, but I know nothing about that OS and would be reaching for the tool most familiar to me in order to become productive quickly, so I’m not a person with an informed opinion about that.
And while deploying isn’t my area of expertise (so I might be speaking out of turn), this also is not how I would deploy any of my work into production. For example, I imagine installing GEOS via conda to work with PostGIS as part of the backend for a user-facing GIS platform is almost certainly the wrong way to be approaching the objective. This is where people are going to be looking at Nix and all the other things related to scalable and reproducible deployments.
conda was created by people in the data science community that write a lot of Python that depends on a lot of non-Python. I don’t think Travis worked on it, but the people who did certainly work for him, and so you know they intimately understand this use case. It is very, very, very good for this use case. Not perfect, but awesomely useful. Poor Travis has probably been answering people’s “Python” questions about NumPy for 2 or 3 decades. Cheers to his group of folks for making my life easier at the same time as reducing their open source support burden.
And to be perfectly clear, I don’t blame the people responsible for the Python packaging ecosystem that pip is a suboptimal solution when I need to write some Python code (which depends on some Python library, which has some numerical C or Fortran library deep in its dependency chain, all from one project that conflicts with my other project’s separate deep dependency tree of software spanning multiple languages and decades).
In summary, I don’t think conda is for software engineers.