1. 17

    1. 20

      Not bad? NTFS is really, really good for its age. It has proper (for 1990s/early 00s values of “proper”) ACL and auditing support, forks, per-directory and per-file compression, shadow copies and a bunch of other cool things. It got a bad rep for two major reasons. First, Windows didn’t always expose these features well enough, so they often went unused, and when they were used, they were frequently misused – early on it seemed like the only people who were able to use them well were either people at Microsoft or malware authors. And second, the whole system was poorly documented for the longest time, and many applications used it sub-optimally, both because developers didn’t understand it well enough to make better design decisions and because much of their code was haunted by the spectre of technical debt. Thus, for the longest time, NTFS was basically just FAT32 with bigger files. But it’s a really cool system.

      This is no longer the case today, but for a while back when Y2K was a thing there was nothing in Linux land – at least among the non-commercial, free software offerings – that could give you the same feature set without also giving you migraines and a bleak outlook on life.

      1. 5

        early on it seemed like the only people who were able to use them well were either people at Microsoft or malware authors

        Even today I see people doing things like iterating a directory and then needlessly stating each file individually. Rust’s standard library has a handy DirEntry::metadata function which avoids this and yet many people will still end up calling stat on each file path. I don’t know how to nudge people in the right direction other than by slowly raising awareness.
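        A minimal sketch of the pattern, assuming you just want names and sizes (the directory path and the tuple shape are made up for illustration). `DirEntry::metadata` uses information from the directory read itself on Windows, and a dirfd-relative stat on Unix, so it avoids re-resolving the whole path for every file the way `fs::metadata(entry.path())` would:

        ```rust
        use std::fs;
        use std::io;

        // Collect (name, size) for every entry in a directory.
        fn list_sizes(dir: &str) -> io::Result<Vec<(String, u64)>> {
            let mut out = Vec::new();
            for entry in fs::read_dir(dir)? {
                let entry = entry?;
                // Cheap: no full path lookup per file.
                // The slow pattern would be `fs::metadata(entry.path())?`,
                // which walks the whole path again for every entry.
                let meta = entry.metadata()?;
                out.push((entry.file_name().to_string_lossy().into_owned(), meta.len()));
            }
            Ok(out)
        }

        fn main() -> io::Result<()> {
            for (name, len) in list_sizes(".")? {
                println!("{name}: {len} bytes");
            }
            Ok(())
        }
        ```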

        1. 2

          You kind of have to on some platforms (AIX…), but yes, pretty much anything modern will expose most of the values you care about in the dirents.

      2. 4

        Using the file system on Windows is multiple orders of magnitude slower than e.g. ext4 on Linux when many small files are involved. I don’t know if that’s the fault of NTFS, but it is ridiculous how incredibly slow the file system on Windows has become over the past 10-15 years. Sometimes I suspect that there’s an “accidentally quadratic” bug somewhere deep inside that code.

        1. 7

          The answer is mostly that each file system access goes through an audit layer that’s used by anti-malware scanners such as Windows Defender.

        2. 3

          Microsoft have created “Dev Drive”, a special volume type meant to make developer workflows that involve lots of small files – workflows that are often way faster on other systems – perform better on Windows. I can’t comment on it beyond that; I don’t use Windows.

        3. 3

          IME, it’s often a case of API impedance mismatch – things like using path-based APIs from Unix when NT operates on handles. Emulating those semantics requires a ton of opening and closing of the same file.
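          A hedged sketch of the mismatch in Rust (file names are made up; both functions return the same bytes). The path-based style re-resolves the path on every call, which on NT means a full open/query/close cycle each time; the handle-based style opens once and does everything through the handle:

          ```rust
          use std::fs::File;
          use std::io::{self, Read};

          // Unix-idiom, path-based: each call re-resolves the path.
          fn size_then_read_by_path(path: &str) -> io::Result<Vec<u8>> {
              let len = std::fs::metadata(path)?.len(); // open/stat/close #1
              let mut buf = Vec::with_capacity(len as usize);
              File::open(path)?.read_to_end(&mut buf)?; // open #2
              Ok(buf)
          }

          // Closer to how NT wants to be used: one open, then work on the handle.
          fn size_then_read_by_handle(path: &str) -> io::Result<Vec<u8>> {
              let mut f = File::open(path)?; // single open
              let len = f.metadata()?.len(); // stat on the handle, no path walk
              let mut buf = Vec::with_capacity(len as usize);
              f.read_to_end(&mut buf)?;
              Ok(buf)
          }
          ```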

        4. 2

          There’s a (2018) comment by one of the NTFS engineers at Microsoft about why this is the case that may be of interest: https://github.com/Microsoft/WSL/issues/873

        5. 1

          Do you have some test programs that show the order of magnitude slow down? One of the points made in the talk is that it’s eminently possible to investigate why something is slower rather than speculating.

          1. 3

            I wrote very simple scripts that created and deleted files and then ran them on Windows and Linux. Quite a trivial matter. Windows Defender / antivirus was disabled, and yet Windows was incredibly slow. Unfortunately, I don’t have the scripts anymore – I should have saved them and written it all down. I also noticed massive speedups from moving a local disk from the Windows PC into my NAS and accessing it over the LAN via Samba. Ultimately, I just switched all my machines to Linux and that was the end of this problem. But the problem is really, really easy to notice whenever you work with files on Windows; the slowness is so obvious and grotesque that you don’t even have to be scientific about it.
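            For what it’s worth, a benchmark of the kind described is only a few lines; this is a hedged reconstruction, not the original scripts (the directory name and file count are invented):

            ```rust
            use std::fs;
            use std::io;
            use std::time::{Duration, Instant};

            // Create `n` tiny files in `dir`, then delete them,
            // returning the (create, delete) wall-clock times.
            fn bench_small_files(dir: &str, n: u32) -> io::Result<(Duration, Duration)> {
                fs::create_dir_all(dir)?;

                let t = Instant::now();
                for i in 0..n {
                    fs::write(format!("{dir}/f{i}.txt"), b"x")?;
                }
                let create = t.elapsed();

                let t = Instant::now();
                for i in 0..n {
                    fs::remove_file(format!("{dir}/f{i}.txt"))?;
                }
                let delete = t.elapsed();

                fs::remove_dir(dir)?;
                Ok((create, delete))
            }

            fn main() -> io::Result<()> {
                let (c, d) = bench_small_files("bench_tmp", 10_000)?;
                println!("create: {c:?}, delete: {d:?}");
                Ok(())
            }
            ```

            Running the same binary on both systems at least removes the scripting-language runtime as a variable.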

            1. 4

              I’m not a filesystems person so I’ve never benchmarked these things – other than noting that NTFS was always “fast enough” for my very modest (and very infrequent) Windows usage requirements, I never really cared about performance enough to run tests.

              But… I mean, have you profiled the scripts at all? How do you know the slowness was due to the filesystem layer and not due to poor implementation in your scripting language’s runtime, or different filesystem caching settings, or… ?

              All filesystems make trade-offs, and this may have been one of them in NTFS – I don’t think serving lots of short-lived files is something Windows commonly needs to do, so it wouldn’t be surprising at all if Windows sucked at it. But in my experience, pathological FS access performance on a “clean” Windows system is usually the result of poorly-adapted code, which is not at all infrequent. Lots of FOSS projects are Linux-first, and much of their lower-level file I/O code revolves around Linux’s I/O idioms – that’s why many of them are dog slow when operating on large batches of files on Windows. Also, lots of folks learn these concepts on *nix because that’s what lots of CS/CompEng departments use, and that knowledge carries over poorly to Windows. I’ve been there and wasted a lot of time writing slow code by insisting on doing things the Unix way on a platform that wasn’t Unix.

              I wouldn’t find it surprising that ext4 is faster than NTFS in some particular scenarios, but a local NTFS filesystem being slower than Samba serving from an NTFS disk sounds like something that would’ve been worth a more “scientific” investigation…

    2. 6

      Watched it some days ago, it’s a really good talk. It’s a deep dive into I/O optimization, and how performance problems that get brushed off as “NTFS bad” can be caused by the application doing I/O in stupid ways, using his work on optimizing rustup as a case study.

      Off topic, but I miss those IRL conferences. Ever since covid hit, all conferences have been these low energy online video conferences from people sitting alone in front of their laptops with bad audio quality. It’s not the same as talks given on a stage in front of an audience of people.

      1. 1

        *nod* Same problem as with acting. Not “being there” makes it much more difficult to get into the right mindset to be high-energy.

        If I had to give a talk, I’d at least ask a family member to be my camera operator and try to set up a room where I can stand and pace and gesture a little bit.