Dynamic unindexed file tree walk is not as slow as fd/find suggest. This <40 line C program is >2x faster than either fd (slow from probably unbuffered output writing for (likely incomplete!) MT-safety) or find (slow from extra work to support arbitrary depths with fewer open files).
YMMV, but I get only a < 2x slowdown relative to git ls-files on a hot cache min of 10 runs of that linux.git benchmark running out of /dev/shm (up to git hash ..07c2e39ef6bc43003ff6).
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <limits.h> // PATH_MAX
char dirbuf[PATH_MAX];
struct dirent *dread(DIR *dp) { errno = 0; return readdir(dp); }
int walk(int nDir) {
DIR *dp = opendir(dirbuf);
if (!dp) {
if (errno != ENOTDIR) { perror(dirbuf); return 1; }
return 2;
}
int r = 0;
dirbuf[nDir] = '/';
for (struct dirent *f = dread(dp); f; f = dread(dp)) {
if (strcmp(f->d_name,".")==0 || strcmp(f->d_name,"..")==0)
continue;
int nName = strlen(f->d_name);
int nNew = nDir + nName + 1;
memcpy(&dirbuf[nDir + 1], f->d_name, nName);
dirbuf[nNew] = '\n';
fwrite(dirbuf, nNew + 1, 1, stdout); //fwrite_unlocked
dirbuf[nNew] = '\0';
if ((f->d_type==DT_DIR || f->d_type==DT_UNKNOWN) &&
walk(nNew)==1)
r = 1;
}
if (errno || closedir(dp)) { perror(dirbuf); r = 1; }
return r;
}
int main(int argc, char *argv[]) {
printf(".\n");
dirbuf[0] = '.'; dirbuf[1] = '\0';
return walk(1);
}
(EDIT: Of course, 2x faster may still make git ls-files “worth it” for your specific use case…but any “reputation” fd/find have for being fast at this is somewhat undeserved.)
Dynamic unindexed file tree walk is not as slow as
fd
/find
suggest. This <40 line C program is >2x faster than eitherfd
(slow from probably unbuffered output writing for (likely incomplete!) MT-safety) orfind
(slow from extra work to support arbitrary depths with fewer open files).YMMV, but I get only a < 2x slowdown relative to
git ls-files
on a hot cache min of 10 runs of that linux.git benchmark running out of/dev/shm
(up to git hash..07c2e39ef6bc43003ff6
).(EDIT: Of course, 2x faster may still make
git ls-files
“worth it” for your specific use case…but any “reputation”fd
/find
have for being fast at this is somewhat undeserved.)That’s an interesting perspective, thanks! When I wrote the article, I certainly thought that fd or find had performance closer to your C program.
The same can be said about
git grep
vsgrep
.Hi, author here (proof), happy to receive any comments.
Very nice analysis, thanks! I might have missed it in the text, but are the differences the same when one uses a different filesystem?
Thanks! I found very similar differences (in terms of multiples) on my ext4 drive from what I recall, but I have not kept the numbers.
Once you know there’s an index, I guess the results become unsurprising.
I suppose there’s no reason why your filesystem couldn’t also do the same indexing, but it’d waste space.
Yes and it’s also quite interesting to know that we have all these small indexes sitting around, almost always up to date, waiting to be used.