1. 9
  1.  

  2. 2

    Can anyone explain why NetBSD is investing effort in a kernel implementation of posix_spawn? The API is specifically designed to permit userspace implementations, which is a big part of the reason that it’s so awful and no one wants to use it. The cost of execve is so high that a few extra system calls in a userspace posix_spawn implementation are negligible. With vfork, the cost of the new process creation is negligible and most of the multithreading concerns don’t apply: in the temporary vfork context the userspace thread has a new kernel context (file descriptor table and so on) associated with it and so can modify it at will. An in-kernel posix_spawn requires providing functionality in the spawn implementation that duplicates every single system call that modifies the kernel state associated with a process. A userspace implementation can just use chdir here directly.
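
    A minimal sketch of the userspace approach described above, assuming a BSD- or Linux-style vfork where the child gets its own working directory and file descriptor table. The helper name spawn_in_dir and the exit codes 126/127 are illustrative choices, not part of any libc:

    ```c
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern char **environ;

    /* Hypothetical helper: run argv[0] with its working directory set to dir.
     * Returns the child's exit status, or -1 on failure. */
    static int spawn_in_dir(const char *dir, char *const argv[])
    {
        pid_t pid = vfork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child: borrows the parent's address space until execve or
             * _exit, so it only makes system calls and touches no parent
             * memory. chdir changes the child's kernel state only. */
            if (chdir(dir) != 0)
                _exit(126);
            execve(argv[0], argv, environ);
            _exit(127); /* execve returns only on failure */
        }
        /* Parent: resumes once the child has called execve or _exit. */
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
    ```

    POSIX itself leaves almost everything in a vfork child undefined, so this relies on the platform behaviour rather than on the standard.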

    1. 2

      Was wondering the same. Looks like the goal is to replace fork+exec in sh for performance. Creating an address space from scratch avoids some reference counting and inter-processor interrupts, according to this previous blog entry: https://blog.netbsd.org/tnf/entry/gsoc_reports_make_system_31

      1. 1

        Odd. On FreeBSD, at least, vfork is very fast: so much faster than execve that there’s really no point in optimising it. It’s not the easiest API to use, because you are still running in the parent’s address space and so you have to make sure that you undo any changes you make there (release any locks you acquire, free any memory that you allocate). If you use the libc posix_spawn, though, it does that for you, and if you want to do something that doesn’t have posix_spawn support (e.g. set a resource accounting limit, enter Capsicum mode, or enter a jail in the child), you can call vfork yourself and you’re still using the same kernel interfaces.
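
        For comparison, a rough sketch of the libc posix_spawn route, where the child-side setup is queued as file actions instead of being written by hand. The helper name spawn_to_file is hypothetical, and whether libc uses vfork underneath depends on the platform:

        ```c
        #include <fcntl.h>
        #include <spawn.h>
        #include <sys/types.h>
        #include <sys/wait.h>

        extern char **environ;

        /* Hypothetical helper: run argv[0] with stdout redirected to out_path.
         * Returns the child's exit status, or -1 on failure. */
        static int spawn_to_file(const char *out_path, char *const argv[])
        {
            posix_spawn_file_actions_t fa;
            pid_t pid;
            int status;

            if (posix_spawn_file_actions_init(&fa) != 0)
                return -1;
            /* Queue "open out_path as fd 1"; libc applies it in the child. */
            int err = posix_spawn_file_actions_addopen(&fa, 1, out_path,
                          O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (err == 0)
                err = posix_spawn(&pid, argv[0], &fa, NULL, argv, environ);
            posix_spawn_file_actions_destroy(&fa);
            if (err != 0)
                return -1;
            if (waitpid(pid, &status, 0) < 0)
                return -1;
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        }
        ```

        Anything without a matching file action or attribute (the Capsicum or jail cases mentioned above) still needs a hand-written vfork child.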

        If I were to try to improve process creation on *NIX, I’d add a variant of pdfork that created an empty address space and then add variants of all of the system calls that modified a process to take a file descriptor. Windows actually does this: you create an empty process and then all of the system calls that can modify your process take a HANDLE to the process that they’re modifying. I’d love to have, for example, something like pdmmap that would let me map memory in another process. This would be a much cleaner way of setting up shared memory segments than passing a file descriptor to an anonymous memory object and having the other process map it.

      2. 2

        We’re not investing any effort, NetBSD’s posix_spawn has always been an in-kernel implementation from day one. This project is about extending the implementation to bring it up to spec for upcoming POSIX changes. The actual in-kernel implementation isn’t much code, either.

        As for why it was originally implemented in the kernel - why is anything implemented in the kernel? Why didn’t you implement sendfile in userspace? It’s not right to assume everything has or should have the exact same performance characteristics as it does on FreeBSD.

        1. 5

          > We’re not investing any effort, NetBSD’s posix_spawn has always been an in-kernel implementation from day one

          Sorry, that’s my question: why did NetBSD decide to do a kernel implementation?

          > As for why it was originally implemented in the kernel - why is anything implemented in the kernel?

          Generally, for one of three reasons:

          • It needs to access some in-kernel data structures that are difficult to expose cleanly to userspace.
          • A userspace implementation would be significantly slower (in terms of latency or throughput).
          • It needs to perform some privileged operations that require it to be protected from userspace (sometimes this leads to a privileged userspace daemon as the correct choice, though reason 2 can impact this decision).

          I don’t see any of these applying for posix_spawn, hence my question. Reasons 1 and 3 don’t apply: the API was specifically designed to allow pure-userspace implementations. Reason 2 may apply, but in a vfork + execve sequence the time is dominated by the execve call and process initialisation, so I doubt this is the reason. Presumably NetBSD had a Reason 4 that I’m missing and I’d like to know what that is / was.

          > Why didn’t you implement sendfile in userspace?

          Sendfile is specifically designed to avoid a copy to or from userspace. An in-kernel implementation can DMA from disk to the kernel buffer cache and then DMA from the buffer cache to the NIC, with no userspace page-table updates or copies. A userspace implementation would either require at least one additional copy (reason 2) or require exposing the buffer cache to userspace, which is hard (reason 1) and would probably be difficult to do securely (reason 3).
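
          To make the extra copy concrete, here is a sketch of what a portable userspace stand-in for sendfile would have to do. The function name copy_via_userspace and the 64 KiB buffer size are illustrative assumptions:

          ```c
          #include <errno.h>
          #include <sys/types.h>
          #include <unistd.h>

          /* Hypothetical userspace stand-in for sendfile: copies everything
           * from in_fd to out_fd. Returns bytes copied, or -1 on error. */
          static ssize_t copy_via_userspace(int out_fd, int in_fd)
          {
              char buf[65536];
              ssize_t total = 0;

              for (;;) {
                  /* Copy 1: kernel buffer cache -> userspace buffer. */
                  ssize_t n = read(in_fd, buf, sizeof buf);
                  if (n == 0)
                      return total;
                  if (n < 0) {
                      if (errno == EINTR)
                          continue;
                      return -1;
                  }
                  for (ssize_t off = 0; off < n; ) {
                      /* Copy 2: userspace buffer -> kernel socket/file buffer. */
                      ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
                      if (w < 0) {
                          if (errno == EINTR)
                              continue;
                          return -1;
                      }
                      off += w;
                  }
                  total += n;
              }
          }
          ```

          Every byte crosses the user/kernel boundary twice here, which is the cost an in-kernel sendfile is designed to avoid.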

          > It’s not right to assume everything has or should have the exact same performance characteristics as it does on FreeBSD.

          Given that vfork was inherited by both from 4BSD and still has similar performance characteristics, I think it’s a fair assumption here.