1. 21
    1. 4

      Careful with spaces in filenames! There’s a -print0 argument to find and a corresponding -0 argument to xargs that take care of this issue. Though it still makes problems if your filenames contain a literal 0 byte. But then you’re screwed anyway. Fortunately, Unix file names cannot contain null bytes.

      % find src -iname "*.ts" \
        | LC_ALL=C sort \
        | xargs md5sum \
        | md5sum \
        | cut -d" " -f 1
      md5sum: src/this: No such file or directory
      md5sum: is: No such file or directory
      md5sum: a: No such file or directory
      md5sum: test.ts: No such file or directory
      d41d8cd98f00b204e9800998ecf8427e
      % ls src
      'this is a test.ts'
      

      From the man page of find(1):

      -X Permit find to be safely used in conjunction with xargs(1). If a file name contains any of the delimiting characters used by xargs(1), a diagnostic message is displayed on standard error, and the file is skipped. The delimiting characters include single (“ ' ”) and double (“ " ”) quotes, backslash (“\”), space, tab and newline characters.

      However, you may wish to consider the -print0 primary in conjunction with “xargs -0” as an effective alternative.
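
      For reference, a minimal sketch of the same pipeline with NUL-delimited names, assuming GNU findutils and coreutils (the -z flag on sort keeps the records NUL-terminated between find and xargs). The hash_sources function name is hypothetical and only there to make the snippet reusable:

      ```shell
      # hash_sources DIR: print a single MD5 digest covering every *.ts file under DIR.
      # (hash_sources is a hypothetical helper name, not something from the article.)
      hash_sources() {
        # -print0 / sort -z / xargs -0 all use NUL as the separator,
        # so spaces, quotes and newlines in file names are passed through intact.
        find "$1" -iname "*.ts" -print0 \
          | LC_ALL=C sort -z \
          | xargs -0 md5sum \
          | md5sum \
          | cut -d" " -f 1
      }
      ```

      Called as `hash_sources src`, this should no longer split 'this is a test.ts' into four arguments. As in the original pipeline, the per-file md5sum lines include the paths, so renaming a file changes the digest too.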

      1. 4

        Though it still makes problems if your filenames contain a literal 0 byte. But then you’re screwed anyway.

        This is as “impossible” as a file name with a literal “/” in it.

        1. 2

          Ah, that’s a relief. Thanks!

      2. 3

        To be fair, if you have files with white spaces IN YOUR SOURCE CODE you deserve any pain you get =P

      3. 1

        Thanks, I’ll fix that when I’m back at a computer!

    2. 3

      This is fascinating and wonderful. Def. holding onto this trick for future use if I come upon a situation where it’s applicable.

    3. 1

      I struggle to understand the root cause the author is working around. If a Makefile says foo depends on bar.c, baz.c and honk.c, does foo not get built when it’s appropriate?

      1. 2

        Git and S3 both set each file’s last-modified date to the time they ran, so the build process either never ran (artifacts are newer than sources) or always ran (sources are newer than artifacts).

        The goal is to use the caching across multiple (ephemeral) build agents.

      2. 2

        “When appropriate” means “if bar.c, baz.c or honk.c have changed since foo was last built, then foo needs to be rebuilt.” But, how do you test “have changed since foo was last built” if modification dates aren’t reliable? Answer: content change detection.
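
      A rough sketch of what that can look like in shell, assuming md5sum from GNU coreutils. The function names and the stamp file are hypothetical, and this is not necessarily how the article does it:

      ```shell
      # content_hash FILE...: one digest over the per-file digests of FILE...
      content_hash() {
        md5sum -- "$@" | md5sum | cut -d" " -f 1
      }

      # needs_rebuild STAMP FILE...: exit 0 when STAMP is missing or the hash
      # recorded in it differs from the current hash of FILE...; exit 1 otherwise.
      needs_rebuild() {
        stamp=$1; shift
        [ ! -f "$stamp" ] || [ "$(cat "$stamp")" != "$(content_hash "$@")" ]
      }

      # record_build STAMP FILE...: store the current hash after a successful build.
      record_build() {
        stamp=$1; shift
        content_hash "$@" > "$stamp"
      }
      ```

      With the Makefile example above, the idea would be to run the build only when `needs_rebuild .foo.hash bar.c baz.c honk.c` succeeds, and to call `record_build` with the same arguments afterwards. Modification times never enter into it, so a fresh checkout with new timestamps but identical contents does not trigger a rebuild.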

    4. 1

      Tempted to do something similar with Shake. I wonder if it could be easier to test. Probably would be more verbose though.

      1. 1

        Shake already supports change detection by hashing the content, see ChangeDigest and ChangeModtimeAndDigest here https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#t:Change.

        It also supports the concept of a remote cache, at least in the form of a shared directory.

        Or am I misunderstanding what you mean…? :-)

        1. 1

          Thank you, good to hear that Shake seems to support the strategy of the article well. I feel like it could be more robust if written with Shake, which is why I wondered.