1. 23
    1. 7

      This is why it’s a really good idea to use FUSE for any filesystem on removable media. Very few filesystem drivers are written on the assumption that the on-disk data is malicious. They may assume that it might be corrupted, but an attacker can control any checksums that they put on as well as the corrupted data, so that doesn’t really help. If you use FUSE (and properly sandbox the FUSE process) then you can limit the scope of a compromise.

      1. 5

        To be honest, it’s not even about maliciousness, it’s just about correctness. The driver should check that the file entry appears at most X times and that the filename consumes at most Y bytes, not blindly copy Z bytes into a Y-sized stack buffer.

        FUSE doesn’t protect you much either, here, since in this case, the attacker gains control over the filesystem only instead of the entire kernel but can still present arbitrary contents over the FUSE interface if they wanted, as well as being able to do anything the FUSE process was able to do.

        The real solution is that filesystem drivers should be written with the assumption that data on disk may be corrupted in just the right way and that all metadata from disk should be verified to be sane (not correct, but at least sane).
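
A minimal sketch of the "sane, not correct" check described above, in Python for brevity. The on-disk layout (a 4-byte inode number, a 1-byte name length, then the name) and the `MAX_NAME` limit are assumptions made up for illustration, not any real filesystem's format. Python would not overflow a buffer here; the point is only where the checks sit relative to the copy.

```python
import struct

# Hypothetical on-disk layout, for illustration only:
#   u32 inode number, u8 name length, then that many name bytes.
HEADER = struct.Struct("<IB")
MAX_NAME = 30  # assumed format limit on name length


def parse_dirent(buf: bytes, offset: int):
    """Return (inode, name, next_offset), or raise ValueError if the entry is not sane."""
    # The fixed-size header itself must fit inside the block.
    if offset < 0 or offset + HEADER.size > len(buf):
        raise ValueError("truncated entry header")
    inode, name_len = HEADER.unpack_from(buf, offset)
    # The length field comes from disk, so it is untrusted input.
    if name_len == 0 or name_len > MAX_NAME:
        raise ValueError(f"implausible name length {name_len}")
    start = offset + HEADER.size
    end = start + name_len
    # The name must also lie entirely inside the block we actually read.
    if end > len(buf):
        raise ValueError("name runs past end of block")
    return inode, buf[start:end], end
```

A caller would treat `ValueError` as "refuse to mount or return an I/O error", never as something to recover from by guessing.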

        1. 9

          FUSE doesn’t protect you much either, here, since in this case, the attacker gains control over the filesystem only instead of the entire kernel but can still present arbitrary contents over the FUSE interface if they wanted, as well as being able to do anything the FUSE process was able to do.

          Someone who can tamper with a filesystem can already present arbitrary data from that filesystem. Running with FUSE from a sandboxed process means that they can’t (without a second vulnerability):

          • Crash your kernel.
          • Extract the encryption keys used for your main disk.
          • Overwrite unrelated bits of your filesystem, such as the kernel image on your boot partition or your bootloader.
          • Open network connections and transmit data from your filesystem elsewhere.

          If you mount a removable medium and then run binaries from it as root, you can still be trivially compromised, but that doesn’t rely on a vulnerability in the filesystem driver.

          1. 1

            I would still disagree. A FUSE process that has been taken over can still do some unpleasant things beyond merely deciding which files appear. All of that would be a non-problem if the filesystem driver assumed that the underlying disk may contain corrupted or non-sane data.

            For example, the process could spy on data and try to freeze your system by hanging read() and stat() requests, so that you reboot and it gets a chance to mess with the boot process.

            1. 2

              A compromised FUSE process is pretty bad but a compromised kernel is a lot worse. If I were the victim of a USB key attack, I would feel infinitely more relieved if I knew it was just a user-level compromise. Hopefully I wouldn’t have entered my root password into sudo.

              1. 1

                Yes, but hence the solution is to harden the FS driver, not to sandbox it and hope it won’t get worse. That’s my point from above. Those filenames should be length checked. Directory entry counts should be sanity checked. Everything that one reads from disk should be sanity checked.
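
One way to read "directory entry counts should be sanity checked", sketched in Python: a count read from disk is rejected if that many entries could not physically fit in the block. `BLOCK_SIZE` and `MIN_ENTRY_SIZE` are assumed values for a hypothetical format, not taken from any real driver.

```python
BLOCK_SIZE = 4096    # assumed block size in bytes
MIN_ENTRY_SIZE = 6   # assumed smallest possible entry (5-byte header + 1-byte name)


def check_entry_count(claimed: int, block_len: int = BLOCK_SIZE) -> int:
    """Return the claimed count if it could fit in the block, else raise ValueError."""
    max_entries = block_len // MIN_ENTRY_SIZE
    if claimed < 0 or claimed > max_entries:
        raise ValueError(f"claimed {claimed} entries, at most {max_entries} fit")
    return claimed
```

This does not prove the count is correct, only that it cannot drive a loop past the end of the block.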

                1. 3

                  Once the filesystem driver is running in userspace, you’re likely IO-bound, so you might as well validate all structures, of course. I don’t think validating on-disk structures and running in userspace are mutually exclusive. At minimum you should validate all user-provided bounds to prevent RCE. Since removable filesystems are more likely to be hostile, it’s just an extra level of protection to run them in userspace. There is always a chance for bugs.

      2. 4

        This is the direction that Apple is moving in:

        The implementations of the exfat and msdos file systems on macOS have changed; these file systems are now provided by services running in user-space instead of by kernel extensions. If the application has explicit checks or support for either the exfat or msdos file systems, validate the applications with those file systems and report any issues.

        (Sadly, no API for it…)

      3. 2

        I remember years ago there were discussions in the Linux kernel world about whether or not filesystem drivers should validate on-disk structures. The issue was that historically no validation was done for the sake of efficiency, making filesystems a large attack vector. Not sure what the policy is in the Linux kernel these days, but what you have suggested is a great solution that balances security and efficiency, and I’ll be doing that going forward. I’ll add that you may want to add the noexec and nosuid options to the FUSE mount to minimize the risk of potential damage even more.

        It’d be nice if bash or other command interpreters honored noexec when running scripts contained on filesystems with that option enabled.
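
As a rough illustration of that wish: `noexec` stops a file from being executed directly, but an interpreter invoked as `bash script.sh` only reads the script as data, so it never hits that check. A wrapper could look at the mount flags itself before running anything. The sketch below uses Python's `os.statvfs`; `mount_flags` and `may_run_script` are hypothetical helper names, and the fallback constants are the Linux values, used on the assumption that the platform's `os` module may not define them.

```python
import os

# os.ST_NOEXEC is only defined on some platforms; fall back to the Linux
# values from <sys/statvfs.h> (an assumption for other systems).
ST_NOEXEC = getattr(os, "ST_NOEXEC", 8)
ST_NOSUID = getattr(os, "ST_NOSUID", 2)


def mount_flags(f_flag: int) -> dict:
    """Decode the two mount options of interest from a statvfs f_flag value."""
    return {
        "noexec": bool(f_flag & ST_NOEXEC),
        "nosuid": bool(f_flag & ST_NOSUID),
    }


def may_run_script(path: str) -> bool:
    """Hypothetical policy: refuse scripts that live on a noexec mount."""
    return not mount_flags(os.statvfs(path).f_flag)["noexec"]
```

A check like this is advisory at best, since the file can change or be copied elsewhere between the check and the run.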

        1. 1

          I had this discussion with Kirk McKusick (author of UFS, and someone who has worked on filesystems for longer than I’ve been alive) a few years ago. His view was that most filesystems don’t have sufficient redundancy that you can catch errors efficiently. You can catch things like this buffer overflow, but not things like cross-linked structures (e.g. this block is used as both data in this file and an inode over here) efficiently, so validating the structures gives a false sense of security. Much better to sandbox the filesystem driver.
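
To make the cost argument concrete, here is a small Python sketch of what detecting cross-linked blocks involves. It is my own illustration of the point, not something from that conversation: unlike a local bounds check, it needs to see every structure's block list before it can say anything, which is closer to an fsck pass than to a check at read time. The owner labels and block numbers are made up.

```python
def find_cross_links(owners: dict) -> dict:
    """owners maps an owner label to the block numbers it claims.

    Returns {block: [owners...]} for every block claimed more than once.
    """
    claimed = {}
    # Every owner's full block list has to be visited before any block
    # can be declared safe, so the cost scales with the whole filesystem.
    for owner, blocks in owners.items():
        for block in blocks:
            claimed.setdefault(block, []).append(owner)
    return {block: who for block, who in claimed.items() if len(who) > 1}
```

For example, `find_cross_links({"file:a": [10, 11], "inode-table": [11, 12]})` reports block 11 as claimed by both.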

          1. 1

            In an ideal world, I’d want all kernel-level code to validate external data such that RCE would be impossible. I haven’t really thought through this problem, especially not as much as Dr. McKusick, but do you know of a way to get RCE via cross-linked structures? I can imagine creating infinite loops by spoofing inodes but if basic precautions are taken I can’t imagine how that could result in an RCE. Maybe if there is inode recursion on the stack but that should be frowned upon in the kernel in general.
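
The "basic precautions" against spoofed loops might look like the following Python sketch: an iterative walk (no recursion on the stack) that rejects out-of-range pointers and any block visited twice. The `next_block` mapping stands in for pointers read from disk and is a made-up representation.

```python
def walk_chain(next_block: dict, start: int, total_blocks: int) -> list:
    """Follow a chain of block pointers and return the blocks visited in order.

    next_block maps a block number to its successor; a missing key ends the chain.
    Raises ValueError on an out-of-range pointer or a cycle.
    """
    seen = set()
    chain = []
    block = start
    while block is not None:
        if not 0 <= block < total_blocks:
            raise ValueError(f"block {block} out of range")
        if block in seen:
            raise ValueError(f"cycle at block {block}")
        seen.add(block)
        chain.append(block)
        block = next_block.get(block)
    return chain
```

This bounds the walk to at most `total_blocks` steps, but it says nothing about two different chains sharing a block.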

            1. 1

              If the structures are cross linked, you can probably get unexpected aliasing in the buffer cache and use that to chain the next exploit.

    2. 2

      I wonder why Red Hat disclosed in this fashion? No snark, genuinely curious.

    3. 2

      It gets tiresome to have to say this every time, but Linux has millions of LoCs that compile into megabytes of object code, and all of it runs in supervisor mode.

      What could possibly go wrong?

      There are better designs out there. There are microkernel, multiserver designs with capabilities. Kernels like seL4. Systems like Genode. We should embrace them.