1. 15
  1.  

  2. 4

    Dynamic unindexed file tree walk is not as slow as fd/find suggest. This <40 line C program is >2x faster than either fd (slow from probably unbuffered output writing for (likely incomplete!) MT-safety) or find (slow from extra work to support arbitrary depths with fewer open files).

    YMMV, but I get only a < 2x slowdown relative to git ls-files on a hot cache min of 10 runs of that linux.git benchmark running out of /dev/shm (up to git hash ..07c2e39ef6bc43003ff6).

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <dirent.h>
    #include <limits.h> // PATH_MAX
    
    char dirbuf[PATH_MAX];
    struct dirent *dread(DIR *dp) { errno = 0; return readdir(dp); }
    
    int walk(int nDir) {
      DIR *dp = opendir(dirbuf);
      if (!dp) {
        if (errno != ENOTDIR) { perror(dirbuf); return 1; }
        return 2;
      }
      int r = 0;
      dirbuf[nDir] = '/';
      for (struct dirent *f = dread(dp); f; f = dread(dp)) {
        if (strcmp(f->d_name,".")==0 || strcmp(f->d_name,"..")==0)
          continue;
        int nName = strlen(f->d_name);
        int nNew = nDir + nName + 1;
        memcpy(&dirbuf[nDir + 1], f->d_name, nName);
        dirbuf[nNew] = '\n';
        fwrite(dirbuf, nNew + 1, 1, stdout); //fwrite_unlocked
        dirbuf[nNew] = '\0';
        if ((f->d_type==DT_DIR || f->d_type==DT_UNKNOWN) &&
            walk(nNew)==1)
          r = 1;
      }
      if (errno || closedir(dp)) { perror(dirbuf); r = 1; }
      return r;
    }
    
    int main(int argc, char *argv[]) {
      printf(".\n");
      dirbuf[0] = '.'; dirbuf[1] = '\0';
      return walk(1);
    }
    

    (EDIT: Of course, 2x faster may still make git ls-files “worth it” for your specific use case…but any “reputation” fd/find have for being fast at this is somewhat undeserved.)

    1. 1

      That’s an interesting perspective, thanks! When I wrote the article, I certainly thought that fd or find had performance closer to your C program.

    2. 4

      The same can be said about git grep vs grep.

      1. 2

        Hi, author here (proof), happy to receive any comments.

        1. 3

          Very nice analysis, thanks! I might have missed it in the text, but are the differences the same when one uses a different filesystem?

          1. 1

            Thanks! I found very similar differences (in terms of multiples) on my ext4 drive from what I recall, but I have not kept the numbers.

        2. 1

          Once you know there’s an index, I guess the results become unsurprising.

          I suppose there’s no reason why your filesystem couldn’t also do the same indexing, but it’d waste space.

          1. 1

            Once you know there’s an index, I guess the results become unsurprising.

            Yes and it’s also quite interesting to know that we have all these small indexes sitting around, almost always up to date, waiting to be used.