1. 15

Had an idea to use the sometimes-provided md5sum to verify files installed with curl <url> | sh -, and was finally nerd-sniped into throwing together a simple script.
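
A minimal sketch of the idea, assuming GNU coreutils; the filter name inline-md5sum and its interface are placeholders here, not necessarily the actual script:

#!/bin/sh
# hypothetical inline-md5sum: buffer stdin, verify its md5 digest
# against the first argument, then pass the content along on a match
expected="$1"
tmp="$(mktemp)"
cat > "$tmp"
actual="$(md5sum < "$tmp" | cut -d' ' -f1)"
if [ "$actual" = "$expected" ]; then
    cat "$tmp"; rm -f "$tmp"
else
    rm -f "$tmp"; exit 1   # emit nothing, so a downstream sh runs nothing
fi

Used as: curl -fsSL <url> | inline-md5sum <expected-md5> | sh -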

  1. 5

    IMHO, this is one of the critical parts of an install script that I see missing constantly. Homebrew doesn’t by default do any signature checks on what it’s downloading. Docker’s install used to be a curl | sh as well.

    I first thought about this when seeing people’s NixOS files (especially for Golang projects) that included a statement like the following:

    src = fetchgit {
      url = "git://github.com/NixOS/nix.git";
      rev = "1f795f9f44607cc5bec70d1300150bfefcef2aae";
      sha256 = "1cw5fszffl5pkpa6s6wjnkiv6lm5k618s32sp60kvmvpy7a2v9kg";
    };
    

    I may be wrong here, and there may be signature checks under the hood, but I feel like building this mechanism into the curl | bash flow would be a huge bonus. I wish more programs (especially Homebrew) did this, but I would like to hear if there’s a solid reason why not (there probably is).

    IMHO: curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh?version=X.X.X | inline-md5sum MD5SUM | sh - would be a much more robust curl-to-bash, as it would pin the version (fully reproducible curl | bash-es) and include a quick signature check. Heck, you could even have the signatures posted elsewhere or tied to PGP/AGE.
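
    For the “posted elsewhere” variant, a hedged sketch using a detached PGP signature fetched from a second host (all URLs and filenames are placeholders, and the signer’s public key must already be in your keyring):

    # fetch the script and its detached signature from different hosts
    curl -fsSL https://example.com/install.sh > install.sh
    curl -fsSL https://sigs.example.net/install.sh.asc > install.sh.asc
    # only run the script if the signature verifies
    gpg --verify install.sh.asc install.sh && sh install.sh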

    1. 5

      Just to clarify, Homebrew formulae do have checksum verification (a sha256 per download), which is more parallel to your NixOS example.

      1. 1

        Yes, that’s correct; however, unlike NixOS, Homebrew does not provide a checksum for itself in a way that makes verification easy. With NixOS it’s either a) provided right next to the download button, or b) you can compute your own because you are building the ISO yourself.

        If you cannot verify the Homebrew installation itself, then it doesn’t really matter whether the posted dependencies have checksums, because a compromised Homebrew installation can do whatever it wants (re: Reflections on Trusting Trust).

      2. 2

        IMHO, this is one of the critical parts of an install script that I see missing constantly.

        md5 is not, and should not be used as, a meaningful “signature” for authenticity. it might be useful for verifying that the download was not corrupted, but it won’t give any indication of whether the file was tampered with in transit, etc.

      3. 5

        This is good, but IMO it should be SHA256 or BLAKE2 instead, which are considered cryptographically strong, unlike MD5.

        1. 2

          Since this is just a validation script, you could theoretically make it generic enough to process a handful of different hash types so that it’s more broadly compatible.

          1. 2

            I was just thinking about this, and had two thoughts:

            • Generalize it by adding a CLI flag to indicate which hashing function is being used (something like -n md5, -n sha256, etc.); see the sketch after this list
            • And/or also support the Multihash format
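
            A rough sketch of the flag idea, assuming the coreutils convention that the hash program is named <algo>sum (the -n flag itself is hypothetical):

            #!/bin/sh
            # pick the hash program from -n, defaulting to md5
            algo="md5"
            if [ "$1" = "-n" ]; then
                algo="$2"; shift 2
            fi
            expected="$1"
            tmp="$(mktemp)"
            cat > "$tmp"
            actual="$("${algo}sum" < "$tmp" | cut -d' ' -f1)"
            if [ "$actual" = "$expected" ]; then
                cat "$tmp"; rm -f "$tmp"
            else
                rm -f "$tmp"; exit 1
            fi
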
            1. 2

              Thought about adding other formats, but considering I was nerd-sniped, I had other things I intended to do today 😅

              Definitely gonna read up on Multihash, as this is the first time I’ve heard of it.

            2. 1

              Feature creep 😁

              But adding that to the script wouldn’t be too much of an exercise.

            3. 1

              You’re absolutely right, but most sites that I’ve come across that use the pattern only provide MD5.

              I thought about adding a flag to specify the type of sum, but feature creep 😁

              1. 1

                Yeah, but how would that help you run a script where the MD5 was provided :)

              2. 5

                Just remember… if the script and the known-correct value for the hash are being obtained from the same site, this is unlikely to be protective. If someone is maliciously replacing the script, surely they will also replace the checksum sitting next to it.

                This will probably flag plenty of errors where someone updates their script but not their checksum, though. Which is generally a good thing too, unless it teaches people to ignore the errors.

                1. 2

                  You’re absolutely right. This little script is insufficient if the threat model involves actors aware enough to update the sums along with the script, but that was definitely out of scope for this little exercise.

                  I don’t have any particular hopes for this little script, beyond hoping that others find it useful and become aware that running a script immediately after downloading it could be made more secure.

                2. 4

                  I wrote a similar tool to sign/check files in a pipeline: sick.

                  It requires a public key to verify the signature, but works in a similar way:

                  curl $URL | sick | sh
                  

                  The only caveat is that it reads the whole file into memory for now (though buffering it on disk would be easy to do). If the ed25519 signature shipped with the file doesn’t match, then nothing is printed to stdout. If it matches, the content (without the signature) is output to stdout.

                  1. 3

                    Maybe just curl from ipfs.

                    As long as the key is correct, you’ll always get the same data. About as good as a download link next to a hash.

                    1. 3

                      Even better: ipfs get from ipfs ;)

                      But yes, curl from a public gateway is a close second
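
                      For illustration, assuming the kubo CLI and a placeholder CID: the local client re-verifies the content hash itself, while a gateway fetch trusts the gateway to have done so:

                      # verified locally by the ipfs client; the CID is a placeholder
                      ipfs cat QmYourScriptCID | sh

                      # close second: trust a public gateway to return matching content
                      curl -fsSL https://ipfs.io/ipfs/QmYourScriptCID | sh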

                      1. 2

                        Ooh, interesting. Hadn’t thought about IPFS as a solution.

                        Use case is a bit different/nuanced, though. I wanted something where I could insert some sort of verification string prior to running, one that would be trivial for the author to also include as part of a release.

                        Since IPFS doesn’t quite fit that description, it doesn’t feel like the right solution, but you did remind me that I should give it another look.

                        1. 2

                          An improvement, but afaics that still means ultimately trusting an external entity (the ipfs infrastructure) versus a locally calculated checksum.

                          I haven’t looked very closely at ipfs yet; it’s on my list as part of my archiving endeavours.

                          1. 1

                            There’s no need to trust the “ipfs infrastructure”, just the client implementation. Content keys are derived by securely hashing the content itself.
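
                            One way to see this locally, assuming the kubo CLI (the filename is a placeholder): --only-hash computes the CID a file would get without publishing anything.

                            # prints the CID derived from the file's content alone
                            ipfs add --only-hash --quiet ./install.sh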

                            1. 2

                              If you use a client, sure; I presumed you meant curl https://ipfs.io/…

                          2. 2

                            But then the script needs an IPFS client, which is vulnerable to this too. Unless you mean hitting a specific server, which can be manipulated as well (one of my friends actually did that, for a prank).

                            1. 2

                              Unless the download somehow fails in the middle? Take the following script:

                              #!/bin/sh
                              curl -o archive.tbz https://random.stuff/archive.tbz
                              tar -C $HOME/.cache -xjf archive.tbz
                              cp $HOME/.cache/archive/blah /usr/bin
                              rm -rf $HOME/.cache/archive
                              

                              Pretty simple, and downloading it from ipfs would work, but if the server chokes and stops transmitting data at rm -rf $HOME, then the script will just clean up your home directory without warning. You got the script from the correct URL, though. So checking the hash (or better, a signature!) after the download is complete remains the better option.
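
                              A common mitigation for exactly this truncation hazard, sketched here on the same hypothetical script, is to wrap the body in a function that is only called on the last line; a partial download then defines an incomplete function but never runs it:

                              #!/bin/sh
                              main() {
                                  curl -o archive.tbz https://random.stuff/archive.tbz
                                  tar -C $HOME/.cache -xjf archive.tbz
                                  cp $HOME/.cache/archive/blah /usr/bin
                                  rm -rf $HOME/.cache/archive
                              }
                              # nothing runs unless the closing brace and this line arrived intact
                              main "$@"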

                            2. 2

                              Anything that gets the computer checking the checksum for you is a step up in my book! Relatedly, I did a roundup a couple of years ago of minimal techniques for verifying checksums without having to eyeball them manually. If anyone knows any better ones I’d love to hear them.

                              1. 2

                                My solution is very low-tech: copy the hash from the terminal, and in the browser, Ctrl+F Ctrl+V. If it highlights the hash on the website, you’re good.

                              2. 1

                                i believe a similar result can be achieved using plain shell scripting. this is a single bash pipeline, so it’s also usable in a single RUN stanza in a Dockerfile, without creating intermediate layers:

                                set -euo pipefail \
                                    && curl -fsSL https://... > file-contents \
                                    && expected_hash=... \
                                    && echo "$expected_hash  file-contents" | sha1sum --check --quiet - \
                                    && (cat file-contents; rm -f file-contents) | wc -l
                                

                                it works like this:

                                • the set line is for ‘safer bash’ (search the web for details) and is a good default in general.
                                • store curl output in a file, aptly named file-contents
                                • call sha1sum to check the file; pipeline aborts if there’s a mismatch
                                • pipe the file contents into a program of choice (wc -l stands in for it here), and clean up the data file afterwards.

                                some more remarks:

                                • short code to avoid 3rd party scripts, their downloads, version control for local copies, etc.,
                                • the subshell trick is the lazy person’s version of a cleanup using mktemp, trap, etc.
                                • the files are removed in the happy case only; leftover file may be helpful for debugging
                                • sha1sum prints an error message like sha1sum: WARNING: 1 computed checksum did NOT match (use --status to suppress, but it’s generally good to see why things fail)
                                1. 2

                                  So what’s really funny is that the idea for this script was to make it as simple as possible to use.

                                  Over the years, I’ve observed that people are a lot more willing to use trivial one-liners than multi-line inline blobs.

                                  This is what the original idea was like:

                                  # note: <( ) needs bash, and md5sum --check expects two spaces before the filename
                                  url="..."; sum="..."
                                  curl -Ls "${url}" > /tmp/file \
                                  && md5sum --quiet --check <(echo "${sum}  /tmp/file") \
                                  && sh "/tmp/file"
                                  
                                2. 1

                                  It looks like security theater. You trust the source if you’re downloading from it. Make sure it’s HTTPS. That’s it. This md5 hash checking creates a false sense of security.

                                  1. 1

                                    this approach actually does make sense for detecting tampering, or the more likely real-world scenarios with the same result:

                                    • files getting silently overwritten (even big vendors like zoom make this mistake!)
                                    • files disappearing, potentially with a redirect to another url or alternative content (without http error codes involved)

                                    (and indeed, md5 is not a secure hash function these days)
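
                                    for reference, the sha1sum check from the pipeline upthread works unchanged with a stronger hash; only the tool name and the expected digest change (placeholder shown):

                                    echo "<expected-sha256>  file-contents" | sha256sum --check --quiet -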