1. 2

    This seems fine and reasonable, but I’m curious: for whom does the limitation this works around ever come up?

    1. 1

      With multi-user machines (as at HPC sites) this comes up a lot, as people are likely to install software in their home directory and in shared project directories with deep paths.

      1. 2

        Thanks!

    1. 1

      Could

      #!/usr/bin/env -S /long/path/to/real/interpreter with many arguments
      

      be a solution if their env implementation accepts the -S switch?

      1. 1

        env -S allows many arguments, yes, but the OS will still truncate that line at the maximum shebang length. So if /long/path/to/real/interpreter is very long (the total line length limit on Linux is 127 characters), you will lose all the arguments and possibly part of the interpreter path. See https://www.in-ulm.de/~mascheck/various/shebang/ for lots of details and https://lwn.net/Articles/779997/ for an interesting story on nix and long shebangs.
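        A quick way to check whether a given script would hit that limit (the helper below is illustrative, not part of env or the kernel; the 127-character figure reflects the traditional 128-byte kernel buffer, which newer kernels have raised):

```python
# Hypothetical helper: check whether a script's first line is a shebang
# that fits within the classic Linux limit. 127 usable characters
# reflects the traditional 128-byte kernel buffer (an assumption for
# older kernels; newer ones allow more).
LINUX_SHEBANG_LIMIT = 127

def shebang_fits(script_text: str, limit: int = LINUX_SHEBANG_LIMIT) -> bool:
    """Return True if the first line is absent, not a shebang, or short enough."""
    first_line = script_text.splitlines()[0] if script_text else ""
    if not first_line.startswith("#!"):
        return True  # no shebang, nothing for the kernel to truncate
    return len(first_line.encode()) <= limit

print(shebang_fits("#!/usr/bin/env -S python3 -u\nprint('hi')"))  # True
print(shebang_fits("#!" + "/very" * 30 + "/python\n"))            # False
```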

        1. 1

          Wow, 127 characters is not a lot.

          Thanks for the interesting links!

      1. 5

        Hmm… Pretty cool! NixOS recently fixed an issue with 32MB shebangs. I wonder if this actually works on macOS? I thought macOS prohibited the program in a shebang from being itself a script with a shebang? Ah, that’s already covered in the README. It might be interesting to note that Perl already re-parses its shebang on startup, to handle very long shebangs.

        1. 1

          I saw that story – is there hope that they’ll actually improve the shebang mechanism with whatever the solution to this is?

          There is work being done to achieve the original goal (preventing the kernel from possibly running the wrong interpreter) while not breaking existing users; that is proving harder than one might expect and will almost certainly have to wait for 5.1.

          It’s too bad that the -I magic nix does with perl is perl-specific (and that it requires 32MB shebangs). Does nix similarly isolate other interpreted languages this way?

        1. 2

          Not sure the author read the relocation logic in Spack too deeply. That, or maybe we should put a less dissuasive comment at the top of the file (I think the comment mentioned is older than the code 😬).

          Spack does use patchelf and install_name_tool to relocate RPATHs in binaries. Those will lengthen paths in binaries if they are short, which is nice.

          Spack also:

          • Spiders the installation directory and replaces instances of the install prefix in text files (shebangs, config files, etc.)
          • Modifies strings in the strings section of binaries. Spack will warn you if the target path is too long to relocate to.

          That last one is the hard one. To solve it in the general case, where we could lengthen the path as we do with RPATHs, you’d have to figure out all the usages of the string in the binary. So you’d have to rewrite address computations and find all the places they’re used. It’s impossible to do in general, because the address could in theory be computed in any way. In practice, static string addresses are likely computed in only a few ways, so it might be possible to make a binary rewriting tool that does simpler pattern matches.

          So far we have gone with padding the path, which at least allows us to find the path and shorten it. You can set up a Spack build pipeline and put the following in your spack.yaml:

          spack:
            config:
              install_tree: /home/software/$padding:512
          

          That will pad the install path (like paths in /nix/store, but they can be wherever) out to 512 characters. You can also omit the :512, and Spack will pick a length based on the maximum path allowed by the OS (that can get you in trouble, as OS limits vary, so we picked 512). Packages will now be built and installed to this very long synthetic path.
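          The padding mechanic can be sketched roughly like this (illustrative function and filler names, not Spack’s actual implementation):

```python
def pad_install_prefix(prefix: str, target_len: int = 512,
                       filler: str = "spack_path_placeholder") -> str:
    """Pad `prefix` with filler path components to exactly `target_len`
    characters. A sketch of the idea behind install_tree's $padding:512;
    the filler name here is made up, not what Spack actually uses."""
    pad_needed = target_len - len(prefix)
    if pad_needed <= 0:
        return prefix  # already long enough; nothing to pad
    # Append repeated '/filler' components, then trim to the exact length.
    repeats = pad_needed // (len(filler) + 1) + 1
    padded = (prefix + ("/" + filler) * repeats)[:target_len]
    # Avoid ending on a path separator.
    if padded.endswith("/"):
        padded = padded[:-1] + filler[0]
    return padded

print(len(pad_install_prefix("/home/software", 512)))  # 512
```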

          Binary packages made from these installations will now be relocatable to shorter paths, including strings in binaries. There are likely other weird special cases that we’d need to handle to cover all packages generally, but this has worked well for packages tested so far. We’ll likely find out more once we start making binaries available in a public build cache.

          1. 1

            Hey @tgamblin, thanks for the lengthy response, and apologies: I clearly didn’t read the Spack implementation closely (I’ve also edited the post with a link to this comment).

            Is my understanding correct that “/home/software/$padding:512” pads the remaining parts of the path so that the total path is 512 in length? And from there can you freely patch all binaries and text files with simple replacements?

            Part of the reason I wanted to use path padding is because I want to more rigorously validate build output by hashing it. By padding out to a known long path length you could replace those links on the fly to get a hash of the output that is unrelated to the build location.

            Also nix allows the package name to exist in the build path, I’m slightly worried this might exacerbate the length issue, does Spack deal with that or avoid it somehow?

            Great to hear that padded paths and naive replacement have worked well so far.

            1. 1

              Is my understanding correct that “/home/software/$padding:512” pads the remaining parts of the path so that the total path is 512 in length?

              Yep, that’s the idea.

              And from there can you freely patch all binaries and text files with simple replacements?

              Yeah, as long as your replacement is a null-terminated string that’s shorter than 512 chars, you should be able to do this.
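              That same-length replacement can be sketched like this (a naive illustration of the technique, not Spack’s actual code; it shifts each string’s suffix left and NUL-pads so the binary’s layout is unchanged):

```python
def relocate_cstring(blob: bytes, old_prefix: bytes, new_prefix: bytes) -> bytes:
    """Replace a (padded) install prefix in binary data with a shorter
    path, keeping each null-terminated string the same total length by
    shifting its suffix left and padding with NULs. Naive sketch, not
    Spack's actual implementation."""
    if len(new_prefix) > len(old_prefix):
        raise ValueError("new prefix must not be longer than the padded one")
    out = bytearray(blob)
    i = 0
    while True:
        i = out.find(old_prefix, i)
        if i == -1:
            return bytes(out)
        end = out.find(b"\x00", i)
        if end == -1:
            end = len(out)
        suffix = bytes(out[i + len(old_prefix):end])
        new_str = new_prefix + suffix
        # Same number of bytes as before: new string + NUL padding.
        out[i:end] = new_str + b"\x00" * ((end - i) - len(new_str))
        i = end

blob = b"x\x00/opt/padded/app/bin/run\x00y"
patched = relocate_cstring(blob, b"/opt/padded/app", b"/usr")
print(b"/usr/bin/run\x00" in patched, len(patched) == len(blob))  # True True
```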

              Part of the reason I wanted to use path padding is because I want to more rigorously validate build output by hashing it. By padding out to a known long path length you could replace those links on the fly to get a hash of the output that is unrelated to the build location.

              Yep – this is definitely an advantage. It makes me wonder what nix does with RPATHs though, as I thought nix used patchelf to change them, and patchelf can lengthen paths. Maybe nix only hashes after patching, and the store path is authoritative? We just ignore this and assume the package can be built (more typical case right now) or relocated anywhere.
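              The path-independent hashing idea might look roughly like this (hypothetical helper and placeholder token; for hashing purposes the substitution doesn’t even need to preserve string lengths):

```python
import hashlib

# Hypothetical fixed token; any stable stand-in for the build prefix works.
PLACEHOLDER = b"/__prefix__"

def location_independent_hash(blob: bytes, install_prefix: bytes) -> str:
    """Hash build output after substituting the (padded) install prefix
    with a fixed token, so the digest does not depend on where the build
    happened. A sketch of the idea, not nix's or Spack's code."""
    return hashlib.sha256(blob.replace(install_prefix, PLACEHOLDER)).hexdigest()

a = location_independent_hash(b"run /tmp/buildA/bin/tool", b"/tmp/buildA")
b = location_independent_hash(b"run /tmp/buildB/bin/tool", b"/tmp/buildB")
print(a == b)  # True: same digest despite different build locations
```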

              Also nix allows the package name to exist in the build path, I’m slightly worried this might exacerbate the length issue, does Spack deal with that or avoid it somehow?

              Not sure exactly what this means. The build path in Spack is a temporary directory that we get rid of after the build is done. I’m actually not sure how this works in nix. What’s the issue with the package name being there?

          1. 1

            I like the notion of innovation in the realm of package managers, but: if I’m the author/maintainer of a given bit of software, I’d want to know how popular Spack is (i.e. installed and in use) before committing time to making a Spack package for my software.

            1. 3

              I’ve been using and supporting Spack in several HPC/bioinformatics environments for the past several years (and have a commit bit to the non-critical branches).

              I don’t wait, or ask, for the upstream developers to create Spack packages for the tools that I (or my users) need; for well-engineered upstream packages, they’re generally simple to create.

              The Spack community is responsive, helpful and patient.

              1. 2

                Check out the first few tutorial slides here for some stats about usage and contributor base:

                https://spack.readthedocs.io/en/latest/tutorial.html

                We have slightly more than 2,000 monthly active users on the documentation site (per google analytics, FWIW).

                The GitHub pulse page is also not bad for assessing activity on the project:

                https://github.com/spack/spack/pulse

                Also feel free to join slack and ask around. No invitation required:

                http://spackpm.herokuapp.com/

                There are usually 40-50 folks online. 18 right now because it is a weekend.

              1. 1

                I took a brief look at this. How’s it differ from, say, asdf?

                1. 3

                  I am not too familiar with asdf, but I think the main difference is that while asdf is a wrapper around language-specific env management systems, Spack is a build-from-source package manager (like, e.g., nix) that happens to have support for virtual environments of its own in whatever language you want to use. So, Spack will actually build you a version of numpy based on the intel compiler toolchain, whereas asdf will manage pip packages in its virtualenvs in conjunction with other languages. I would need to get deeper into asdf to really understand the difference.

                1. 1

                  I’m using easybuild [0] with lmod [1] on my cluster and am quite happy.

                  Spack seems to put everything in different environments that can be loaded separately. I’m not quite sure if it fits my use case or what advantages it has over easybuild, but I will take a look at it. (If it has a better concept for building packages in different environments, then it could have a good chance for my use case at least.)

                  [0] https://easybuild.readthedocs.io/en/latest/

                  [1] https://lmod.readthedocs.io/en/latest/

                  1. 3

                    In terms of isolating configurations and getting strong reproducibility, Nix and Guix are also interesting. There is a comparison of Spack, Easybuild, Nix/Guix in the slides from this 2018 FOSDEM talk by Kenneth Hoste.

                    I personally find the approach of Guix quite convincing (Nix and Guix use the same overall design, but with different programming language choices); it’s not as geared as Easybuild/Spack towards making it easy to tune the compilation flags for each package (which is less important in general devops scenarios than in HPC), but there is definitely an HPC userbase whose concerns are actively discussed – see the blog post Pre-built binaries vs performance for a discussion.

                    1. 1

                      The main difference between Spack and Easybuild is how dependencies work. Spack packages are templated; you can depend on other packages with version ranges instead of fixing them as you must in EB. You don’t need to have easyconfigs for every new build of a package, you can just say, e.g., spack install hdf5 ^mpich@3.3 pmi=pmi2, and Spack resolves (“concretizes”) dependencies and completes the build spec for you. Spack also supports virtual environments (with spack.yaml/spack.lock files) and binary packages (EB does neither of these). There are other features like chaining spack instances, GitLab CI integration (https://github.com/spack/spack/pull/11612), and a concise yaml format for describing large build matrices (see https://github.com/spack/spack/pull/11057), e.g., for a large HPC facility deployment. So in general it’s more flexible, and we target HPC center user support, individual developers, and individual users, whereas EB is more geared towards the first use case.

                    1. 1

                      It looks like it’s basically a clone of homebrew written in Python. I recognize the syntax of the example package on the website — very close to a homebrew formula.

                      1. 2

                        The package format was indeed borrowed from Homebrew, with some fancy metaclass magic to get function calls to work as mixins in Python.

                        Some key differences from Homebrew (mainly around building lots of things from source and configuration):

                        1. the dependency model supports building lots of different versions of packages – you can have hundreds of different versions of, say, hdf5 installed, each with different compiler flags and different transitive dependencies. Packages are essentially templated.
                        2. the parameter system in spack is more extensive
                        3. you can swap compilers in and out of the build
                        4. spack has virtual environments
                        5. spack can generate environment modules for your HPC machine
                        6. it lets you build with external packages if you want (e.g. you can say that openmpi lives in /opt/openmpi and swap it into a build)

                        Spack is also geared towards HPC clusters and scientific computing, so there are packages in spack that just aren’t in homebrew.

                        1. 2

                          In my environments (bioinformatics, Pharma, biotech), this has been the killer feature:

                          the dependency model supports building lots of different versions of packages – you can have hundreds of different versions of, say, hdf5 installed, each with different compiler flags and different transitive dependencies. Packages are essentially templated.

                          The other features are nice, but I haven’t found anything else that will straightforwardly organize the building and deployment of several different versions of a package (and its dependencies) in various configurations/option settings and help me swap them in and out of my environment at will.