This is not really fork. For a start, fork implies copy-on-write mappings. If a process has a MAP_SHARED mapping (of a file or [anonymous] shared memory object) then both the parent and the child will see the same thing and it will be explicitly synchronised. You could do this via RDMA, but it wouldn’t be cheap.
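To make the distinction concrete, here is a minimal sketch (Python, assuming a Linux or other POSIX system; the function name `fork_and_write` is made up for illustration). It creates one anonymous MAP_SHARED mapping and one MAP_PRIVATE mapping, forks, and has the child write to both. The parent sees the child's write only through the shared mapping, and that is the behaviour a cross-machine fork would have to reproduce.

```python
import mmap
import os


def fork_and_write():
    """Hypothetical helper: compare MAP_SHARED and MAP_PRIVATE across fork()."""
    # fileno=-1 requests an anonymous mapping (no backing file).
    shared = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
    private = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE)
    shared[:6] = b"parent"
    private[:6] = b"parent"

    pid = os.fork()
    if pid == 0:
        # Child: write to both mappings, then exit without running cleanup.
        shared[:6] = b"child!"
        private[:6] = b"child!"
        os._exit(0)

    os.waitpid(pid, 0)
    # Parent: the shared page reflects the child's write; the private
    # (copy-on-write) page still holds the parent's data.
    return bytes(shared[:6]), bytes(private[:6])


if __name__ == "__main__":
    print(fork_and_write())
```

On one machine the kernel gets this for free because both processes map the same physical page. Across machines, something has to carry every write over the network.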
Ignoring file descriptors also means ignoring the most difficult part of doing this right. VM migration is orders of magnitude easier than POSIX process migration because the amount of state in the hypervisor for each VM is vastly less than the state in a *NIX kernel for each process. A VM typically has a handful of virtual or emulated devices, often just a disk and a network. The only state of the disk device (other than the backing store itself) is the queue of pending requests, which is easy to transport. The only state of the network device (other than the external routing tables) is the set of pending requests and in-flight responses, which are easy to migrate. In contrast, each UNIX file descriptor has an underlying object and an unbounded amount of stream state associated with it. Migrating this properly is difficult for three reasons. First, there’s no introspection to automatically copy the state associated with the object. Second, state is shared. If I open a file and fork, then both processes will share the same file descriptor and reading with one will alter the state of the other. Third, the objects are often intrinsically local. For example, you can copy a file from the local filesystem, but the filesystem is a shared namespace and so you then alter the sharing behaviour between that process and any other process that has the file open.
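The second point (shared state) is easy to demonstrate. This is a rough sketch, assuming a POSIX system; `shared_offset_after_fork` is a name invented for the example. The child reads four bytes from an inherited descriptor, and the parent's file offset moves too, because both descriptors refer to the same open file description in the kernel.

```python
import os
import tempfile


def shared_offset_after_fork():
    """Hypothetical helper: show that fork() shares the file offset."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start of the file

        pid = os.fork()
        if pid == 0:
            # Child: consume four bytes through the inherited descriptor.
            os.read(fd, 4)
            os._exit(0)

        os.waitpid(pid, 0)
        # Parent: the offset has moved even though the parent never read.
        offset = os.lseek(fd, 0, os.SEEK_CUR)
        return offset, os.read(fd, 3)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(shared_offset_after_fork())  # (4, b'456')
```

A remote copy of the process that simply reopened the file would get its own independent offset, which silently changes the program's behaviour.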
I find it difficult to imagine this being generally useful because any nontrivial process is going to find itself in an undefined state after telefork. The UNIX process model is not the place to start if you want to end up with an abstraction like this. In fact, given the later use cases, an RPC server that runs some WebAssembly provided in the RPC message is closer.
I feel like I explicitly said that handling file descriptors correctly is super hard, although CRIU and DMTCP make attempts that work for the common cases. I also mentioned possible extensions to do both lazy copying and using a MESI-like protocol to do shared memory of pages across machines. What I have is just a fun demo to show what’s possible if you ignore the hard parts, and I say as much.
Just to have said it: That this was a limited tech demo was indeed abundantly clear in the post. Not sure why people are acting as if you’re claiming this to be production-grade, ready-to-ship software.
I really enjoyed reading the article, I can physically feel the excitement you must’ve felt when you first got this demo working. Thanks for writing it up :)
I’m sorry if I came across as overly critical. It is a neat demo. I’ve done something similar in the past and rapidly hit the limitations of the approach. I’ve also read a bunch of research papers trying to do something similar as a complete solution and they all hand-waved away a load of the hard bits, so I’m somewhat prejudiced against the approach.
Cool thing, but it seems to just have been an accidental reinvention of a feature Erlang has had for a very long time.
Sure, but then your software has to be written in Erlang; this works for any language. The best part of the multiple people who’ve written this comment in different places is that, as far as I can tell, Erlang doesn’t even support process migration out of the box: you can only spawn a process on a different node, which is more akin to copying the binary and running it like MPI does. There does seem to be a third-party solution for Erlang though: https://github.com/michalwski/proc_mobility
Maybe I’m wrong though, I haven’t really used Erlang. I really like the ideas and it’s a really cool system, but often I want to write really fast software, and then “use Erlang” stops being a viable solution.
Really, all of this misses the point though, which is that I just did this for fun because it’s silly, and as a mechanism for explaining some low-level ideas to people who may not have encountered them before. I mention in the post that this has been done before and that my implementation isn’t actually useful.
Should have named it spoon() or spork() >.<
I love it — this is just the kind of screwing around with internals I enjoy, but don’t really have the time or mental energy to do myself any more.
From what I see it is what Erlang is all about.
Have you considered MPI?
We could perhaps build some kind of abstraction layer on top of MPI that mimics telefork.