1. 49
    1. 4

      Kinda surprised that mkdir -p is not exposed via the mkdir system call.

      1. 2

        That is an interesting thought. User-side mkdir -p has atomicity problems which could be resolved in kernelspace by creating the directory entries bottom-up rather than top-down. I’m not convinced it’s worth the complexity, though.

        EDIT: never mind; you can do the same thing in userspace by creating the directory structure elsewhere and moving it. There are obvious problems with that too, of course…

        1. 1

          What problems do you think kernel space can avoid? It can’t be more atomic than userspace really.

          If you’re worried about someone deleting directories as you progress, you can always use mkdirat instead and ignore that.
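
          A rough sketch of what the mkdirat route could look like in Python, where the `dir_fd` argument of `os.mkdir` and `os.open` corresponds to `mkdirat(2)` and `openat(2)`. The function name `mkdir_p_at` is made up for illustration, and this assumes a platform that supports `dir_fd` (Linux does). It is not atomic; it only guarantees that each step works relative to the directory that was actually opened, rather than re-resolving the whole path every time.

          ```python
          import os


          def mkdir_p_at(path, mode=0o777):
              """Create every component of `path`, walking down via directory fds."""
              start = "/" if path.startswith("/") else "."
              dir_fd = os.open(start, os.O_RDONLY | os.O_DIRECTORY)
              try:
                  for part in path.split("/"):
                      if part in ("", "."):
                          continue
                      try:
                          os.mkdir(part, mode, dir_fd=dir_fd)  # mkdirat(2)
                      except FileExistsError:
                          pass  # could be a directory, a file, or a symlink; the open decides
                      # O_DIRECTORY rejects plain files, O_NOFOLLOW rejects a symlink
                      # in this component (including one that existed beforehand).
                      next_fd = os.open(
                          part,
                          os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW,
                          dir_fd=dir_fd,
                      )
                      os.close(dir_fd)
                      dir_fd = next_fd
              finally:
                  os.close(dir_fd)
          ```

          If somebody renames a directory we already hold open, the walk simply continues inside it; if somebody swaps a file or symlink in for the next component, the `os.open` raises `OSError` instead of silently following it.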

          1. 1

            Say I want to mkdir -p a/b/c. First, I create a (or fail to create a). Then, any number of the following things could happen:

            1. Somebody removes a

            2. Somebody creates file b

            3. Somebody creates directory b

            4. Somebody creates directory b and mounts another file system onto it

            5. Somebody creates b as a symbolic link to another directory

            Compare with the kernel, which can: allocate space for c; then allocate space for b and point it at c; then allocate space for a and point it at b; then create an entry in the CWD pointing at a. This mechanism of anonymous directories is likely to be supported by the underlying FS (though not in all cases, e.g. NFS), but is not exposed to userspace.

            1. 1

              It’s not something the kernel can do atomically either. It can’t “allocate space for c then b then a”, because only the filesystem driver can do that. And most filesystems are not transactional. (I don’t know of any that are transactional across multiple directory creation.) If your cwd is mounted on FUSE, creating “file c” may involve sending a letter to someone to chisel it on a wall :) Even with fs support, at that point we’d have a syscall which may do mkdir -p, or may fail for lots of random reasons, and in every place you do it you need the fallback implementation anyway.

              Also, “Somebody creates directory b and mounts another file system onto it” - this may be exactly how your system is configured to work. “When I create b, its contents are going to be on the same filesystem/device and it’s going to be empty” is not a valid assumption on a current OS.

              1. 2

                It’s not something the kernel can do atomically either. It can’t “allocate space for c then b then a”, because only the filesystem driver can do that

                I’m not sure what you’re getting at here. Is it that existing VFS layers do not support functionality necessary for this? That may be true, but it is irrelevant: it is an implementation detail.

                Or that some VFS implementations cannot support such a mechanism? That is true, as I mentioned, but it does not mean it is not a feature worth supporting. Consider CoW copies as another feature which cannot be supported everywhere, but which it is still worthwhile to support.

                most filesystems are not transactional. (I don’t know of any that are transactional across multiple directory creation)

                You do not need transactions for this. You are assured exclusive access to c, b, and a after you create them (under this scheme), because no one else has any way of constructing a pointer to them. The only way they can fail is if you run out of space. Otherwise, the only step which can fail is the last, of adding a pointer from the CWD to a, which is exactly what you want.

        2. 1

          There are obvious problems with that too, of course…

          Such as…?

          1. 1

            Where do you create your temporary directories? What if there is no other location on the FS where you have write access? What if somebody finds your temporaries and messes with them anyway?

            1. 1

              So create it in the same location as where the actual directory hierarchy is supposed to be created? E.g. pseudocode:

              def mkdir_p(name): # e.g. /a/b/c/d
                parent_dir, child_dir, child_dirs = split_dirs(name) # e.g. /a/b directory which already exists is the parent, c is the child dir, and [d] are the other child dirs
                dir_name = path_join(parent_dir, temp_file_name()) # E.g. /a/b/6a4b00fc-71b8-40e5-926e-07644b7530a3
              
                mkdir(dir_name)
                for child_dir in child_dirs:
                  dir_name = path_join(dir_name, child_dir)
                  mkdir(dir_name)
              
                mv(dir_name, name) # i.e. mv /a/b/6a4b00fc-71b8-40e5-926e-07644b7530a3/d /a/b/c/d
              

              And if someone messes with the temp dir hierarchy while it’s being created by my process, well, there’s not really much anyone can do about that? Any process can mess with files used by other processes if they have the permissions to do it 🤷

              1. 1

                This would not work for the one use case where I create multiple directories. I store entries for my blog in the filesystem as YYYY/MM/DD/E (where YYYY is the four digit year, MM is the two digit month, DD is the two digit day, and E is an entry number). The code effectively does mkdir(YYYY); mkdir(YYYY "/" MM); mkdir(YYYY "/" MM "/" DD);. Your method would not work.

                1. 1

                  Why not?

                  1. 1

                    Not for the reason I thought (I thought if the directories existed with files, the files would disappear).

                    My blog software will recreate the YYYY/MM/DD directory structure for each entry I make (it was easier that way). It’s not possible to do a mv a/b/c 2022/01/01 if 2022 doesn’t exist. And if 2022/01/01 did exist, I would end up with 2022/01/01/c.

                    1. 1

                      So I think you noticed that my suggested algorithm doesn’t have that issue, it creates only directories that don’t already exist.

                      EDIT: oh, I just realized the problem, good callout. But it should be pretty easy to fix up the algorithm. Anyway point is, it’s not that difficult to make a pretty atomic mkdir_p function.
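
                      One way the fixed-up version might look in Python, as a sketch rather than a finished implementation. The name `mkdir_p_via_rename` and the `.tmp-` prefix are made up for illustration. The idea: find the deepest ancestor that already exists, build the missing tail under a temporary name right next to where it belongs (so it is on the same filesystem), then publish the whole subtree with a single rename of the temporary root onto the first missing component.

                      ```python
                      import os
                      import shutil
                      import uuid


                      def mkdir_p_via_rename(path):
                          """Build the missing tail of `path` under a temp name, then rename it into place."""
                          path = os.path.abspath(path)

                          # Walk upwards until we hit a directory that already exists.
                          missing = []
                          parent = path
                          while not os.path.isdir(parent):
                              parent, name = os.path.split(parent)
                              missing.append(name)
                          if not missing:
                              return  # everything is already there
                          missing.reverse()  # e.g. ["2022", "01", "01"] when only the blog root exists

                          # The temp root stands in for the first missing component.
                          tmp_root = os.path.join(parent, ".tmp-" + uuid.uuid4().hex)
                          os.makedirs(os.path.join(tmp_root, *missing[1:]))

                          try:
                              # A single rename makes the whole new subtree visible at once.
                              os.rename(tmp_root, os.path.join(parent, missing[0]))
                          except OSError:
                              # Most likely lost a race with another creator; clean up and let
                              # the caller decide whether to retry.
                              shutil.rmtree(tmp_root, ignore_errors=True)
                              raise
                      ```

                      With the blog layout above, `mkdir_p_via_rename("2022/01/01")` renames the temp root to `2022` when nothing exists yet, and a later call for `2022/01/02` only builds and renames `02`. The window between the existence check and the rename is still racy, so "pretty atomic" is the right level of claim.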

          2. 1

            The big one I know of is that you can’t move files (or directories) across file systems (example: “/mnt/f” is a floppy drive, and “/mnt/usb” is a USB stick; you can’t just move files or directories across, you have to copy them).
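
            To make that concrete, here is a small Python sketch; the helper names `same_filesystem` and `rename_or_report` are invented for the example. A rename across filesystems fails with `EXDEV`, and comparing `st_dev` is a common (though not foolproof, e.g. with bind mounts or subvolumes) way to guess in advance.

            ```python
            import errno
            import os


            def same_filesystem(a, b):
                """True if both existing paths report the same device ID."""
                return os.stat(a).st_dev == os.stat(b).st_dev


            def rename_or_report(src, dst):
                """Try a rename; return False instead of raising when it would cross filesystems."""
                try:
                    os.rename(src, dst)
                    return True
                except OSError as exc:
                    if exc.errno == errno.EXDEV:
                        # The caller has to fall back to copy + delete, which is not atomic.
                        return False
                    raise
            ```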

            1. 1

              Yeah so my suggestion is to not move across filesystems but within the same filesystem. More details in my comment, https://lobste.rs/s/2nccou/fix_unit_test_open_giant_hole_everywhere#c_r2isbc

    2. 4

      I can sort of understand an inexperienced junior engineer making this change without knowing better. We’ve all been that person.

      But where was the more-experienced code reviewer in all this? It doesn’t take specialized security training to be wary of passing user input to a shell.
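
      For illustration, and assuming the change in question built something like a `mkdir -p` command line out of input, a shell-free version is short. This is a sketch in Python, not the code from the article; `make_dirs` and `make_dirs_with_tool` are hypothetical names.

      ```python
      import os
      import subprocess


      def make_dirs(user_path):
          """Create the directories in-process; the string is only ever treated as a path."""
          os.makedirs(user_path, exist_ok=True)


      def make_dirs_with_tool(user_path):
          """If an external command really is required, skip the shell entirely."""
          # An argument list (shell=False is the default) means no shell parsing,
          # and "--" stops a leading "-" from being read as an option.
          subprocess.run(["mkdir", "-p", "--", user_path], check=True)
      ```

      With either function, an input such as `x; touch pwned` just becomes an oddly named directory instead of a second command.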

      1. 2

        Yeah, this comes across as “the platform we use has bad ergonomics, and someone worked around that in a naïve way, and now we come down on that person like a ton of bricks for doing so, rather than fixing the underlying problems of the terrible ergonomics and the need to work around them”.

      2. 3

        “You can’t read from files during unit tests”

        Please explain. I agree that if you have a doSomething(String data) function you should write a unit test and feed it a String of data, not a path to a file. But how would you unit-test your “read a file” function in the step before, assuming it’s not just a standard library call?

        1. 1

          That could be a test but it should not be mixed in with your unit tests. Unit tests should be purely computational. If you do this uniformly, you end up with a layer between computation and IO that most systems lack. It’s ‘separation of responsibilities’ and good for design.

          Some guy wrote about this a long time ago. https://www.artima.com/weblogs/viewpost.jsp?thread=126923
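
          A tiny Python sketch of the split I mean; `count_words` and `count_words_in_file` are made-up names. The pure function takes the data, and a thin wrapper does the IO, so the unit test never touches the disk and the wrapper gets exercised in a separate, slower suite.

          ```python
          from pathlib import Path


          def count_words(text: str) -> int:
              """Pure computation: unit-testable with a plain string."""
              return len(text.split())


          def count_words_in_file(path: Path) -> int:
              """Thin IO layer: reads the file, then delegates to the pure function."""
              return count_words(path.read_text(encoding="utf-8"))
          ```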

          1. 1

            Well, I know that school of thinking and I disagree. I can’t tell you why the filesystem is different than the database or the network, but it feels like it is somewhere between unit and system test, whereas I agree with the other points.

            I’m also not alone with this opinion or maybe it has been shaped by the people and teams I have worked with. Maybe it is because databases and network are kind of horrible to mock away and can be in any state of disrepair… but the filesystem can usually be persuaded to give you a file. Maybe it’s also moot fighting over the definition. I’ve just seen a lot of code bases that don’t even have anything but “unit tests” which turn out to be just that, plus file system access, but nothing with databases or networks.

    3. 2

      As I recall, Apple’s libc shells out to perl for some regex replacement. This is much worse.

    4. 1

      Does anyone know a good way to prevent things like this from happening? I see it a lot, especially, as in the article, in ways that create gaps in software security. Yet it’s unreasonable to have a core group review all of the code, and it’s also unreasonable to expect everyone to know everything.

      With “training” what it is today, many junior and maybe even higher engineers are unaware of, or uninterested in, this kind of software reasoning. At my last company, the security team desperately tried to get everyone on board with the idea that security is a company-wide concern, but the teams I was on very rarely thought through security implications and would just say “oh, we’ll just have security review.” Which I think is an important step, but security was not nearly a big enough team to review everything and only had the time to look at the big picture, leaving all of the actual code unreviewed (but theoretically “signed off on”).

      It almost seems like these gaps are inevitable with current software practices. I wonder if maybe Go had some of the right idea in reducing the power the developer has[1], but perhaps what we need is something like RBAC but for languages: different roles have access to different software primitives. No idea how that could work in practice, and there are a lot of issues with the idea (it also couldn’t cover everything), but I wonder if some big, crazy idea is what is needed to get software security to where it should be.

      [1] Not trying to be inflammatory here, the original goal of Go was stated to be this, though the language has evolved and it may no longer be the case

      1. 3

        I think this has to come top down. As I’ve gotten farther up the leadership chain I’ve started requiring security analysis from the teams who roll up under me: list of CVEs, developer-driven pen testing, library patch status. I have a templated report whose real purpose is just to get the team to start looking at and considering this stuff.

        It only works because I, as a person in leadership, am requiring them to provide a report on it to me. And I then provide them with budget to tackle any issues the report identifies. I don’t accept a “There are no issues” report either. That just tells me they didn’t do the work.

        1. 2

          Would be cool if you published your templates, your process and such results as you can.

      2. 2

        I think the issue is that our tools are lagging behind our practices; unrestricted I/O access in business systems is an anti-pattern, but most programming languages lack the semantics necessary to deal with it. Even languages like Haskell, where side-effects must be explicitly dealt with, lack the ability to reason about what kind of side-effects a piece of code can perform.

        Relying on processes and training is also unsustainable IMO - if it’s possible to make a mistake someone will make it eventually, not to mention the cost and mental overhead required to maintain a strict security regimen.

        I believe the right way forward is something like capabilities - this was a very interesting post on the subject: https://justinpombrio.net/2021/12/26/preventing-log4j-with-capabilities.html (which is also trending right now, see https://lobste.rs/s/lumsvs/preventing_log4j_with_capabilities )

        Of course you could also restrict I/O access at the container or runtime level (see e.g. Deno https://deno.land/manual@v1.17.1/getting_started/permissions) but IMO this is a bit too coarse-grained. For example, what if you want to give file system access to your own application code but not to vendor modules? Containers only provide one isolation level per application, but with capabilities you could control exactly what parts of the code can access a certain resource.
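
        To give a feel for the capability style, here is a Python sketch with made-up names (`ReadOnlyDir`, `vendor_render`). Python itself does not enforce any of this, since vendor code could still `import os`; the point is only the shape of the API, where access is something you are handed rather than something ambient.

        ```python
        from pathlib import Path


        class ReadOnlyDir:
            """Hypothetical capability: grants read access to files under one directory."""

            def __init__(self, root):
                self._root = Path(root).resolve()

            def read_text(self, relative_name):
                target = (self._root / relative_name).resolve()
                # Reject anything that escapes the root, e.g. "../secret" or a symlink out.
                # Path.is_relative_to needs Python 3.9 or newer.
                if not target.is_relative_to(self._root):
                    raise PermissionError(f"{relative_name!r} is outside the granted directory")
                return target.read_text(encoding="utf-8")


        def vendor_render(template_name, templates):
            """Stand-in for third-party code: it only uses the capability it was given."""
            return templates.read_text(template_name).upper()
        ```

        In a language with real capability support, `vendor_render` would have no other way to reach the file system at all, which is the fine-grained control a container cannot give you.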

      3. 2

        #define system system_function_forbidden

        There are some projects with a more complete list of security restrictions like that.

        1. 1

          Those work to an extent, along with various linters. But I’ve found those are often the first to be disabled when someone runs into an issue, rather than understanding why they are there in the first place. Thinking back on my comment, I guess it really boils down to “I wish more software engineers cared about security, and treated it as something to preemptively think about rather than reactively think about.”