1. 9

  2. 8

    I’m surprised there’s no mention of the obvious solution: select/poll with a timeout. Get back EINTR, you got SIGCHLD. Get back 0, it timed out.

    1. 2

      Maybe I’m misunderstanding, but how does that help with a timeout on, say, mkdir(2)?

      1. 4

        It doesn’t, but I’m not sure that what people think would happen is what’s going to happen. mkdir isn’t usually interruptible. If your NFS server goes away, you’ll have an unkillable process until it comes back. You’re not going to work around that with threads or alarm signals or whatever. The issues go much deeper.

        1. 2

          Is this a place where expanding AIO might be useful? I have not used AIO other than poking around the man pages in FreeBSD a bit. I’m not really sure why making file reads/writes needs a whole new abstraction, though…

          1. 4

            So one other thing is that mkdir is going to call malloc in a half dozen places on the way down. Any one of those calls can block indefinitely for memory to become available, and they can’t be interrupted because the callers don’t check for null. Real time kernels are the next aisle over.

            1. 3

              Expanding AIO requires major kernel restructuring of what is currently straight line code so that either it’s explicitly asynchronous or the current synchronous code is always spawned in a new kernel thread with notification on completion or error. Neither is a small effort, which is a good part of why the (kernel) answer so far has often been some version of ‘if you want this, use user-level threads to do it yourself’.

            2. 2

              What can a user-friendly program do then? Run all file-system-touching syscalls in a separate process? If it hangs, give the user an option to kill it (and leave a zombie, I presume), or wait and notify them when it resumes?

              It’s an honest question. I personally think it is sensible to create a separate process.

              1. 3

                Yes. This is very sensible and it is the exact model encouraged by Erlang.

                It doesn’t have to be a separate process (isolated fork), though; a pthread is usually fine as long as your data flows go the right way.
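A minimal sketch of the "do it in a thread" idea, in Python for brevity. The helper name `mkdir_with_timeout` is made up for illustration. It only stops the *caller* from waiting forever: if the worker is stuck in the kernel on a dead NFS mount it stays stuck, and nothing here can cancel it.

```python
import os
import tempfile
import threading


def mkdir_with_timeout(path, timeout):
    """Try os.mkdir(path) in a worker thread (hypothetical helper).

    Returns "ok", "timeout", or the OSError instance the call raised.
    """
    result = {}

    def worker():
        try:
            os.mkdir(path)
            result["value"] = "ok"
        except OSError as exc:
            result["value"] = exc

    # daemon=True so Python does not wait for a stuck worker at shutdown.
    # The thread itself cannot be cancelled from here.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return "timeout"
    return result["value"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(mkdir_with_timeout(os.path.join(base, "a"), 2.0))
```

On a "timeout" result the caller still does not know whether the directory will eventually appear, which is the same ambiguity discussed elsewhere in this thread.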

                1. 3

                  Ignore it until it goes away? It’s funny, last week we had a rant that unix sucks because worse is better means all these silly errors have to be handled by userland. Now we have a syscall which handles retries and failure for you and people want the opposite.

                  1. 2

                    Document that you don’t support NFS. Advise your users to use CIFS instead (seriously, it behaves much more nicely, even when both ends are unix).

                  2. 1

                    If your NFS server goes away, you’ll have an unkillable process until it comes back.

                    mount_nfs -i ... shudders

                2. 2

                  Or another signal.

                  Or you’re running under Xen.

                  1. 1

                    It would be nice if all system calls had an asynchronous version which returned a completion fd. select()/poll()/kqueue()/epoll() on it to your heart’s content. This would also be great for async I/O, with the caveat that the read/write buffers can’t be touched until completion gets signalled.

                    It’s not likely to happen, though.
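For what it’s worth, the completion-fd interface can be faked in userland today. This is only a sketch (Python, Unix only, and `start_async` / `wait_done` are hypothetical names): it runs the blocking call in a thread and signals completion through a pipe, so it is threads underneath and not a real asynchronous syscall.

```python
import os
import select
import threading


def start_async(func, *args):
    """Run func(*args) in a thread; return (completion_fd, result_box).

    completion_fd becomes readable once result_box has been filled in.
    The caller owns completion_fd and must close it.
    """
    rfd, wfd = os.pipe()
    box = {}

    def worker():
        try:
            box["result"] = func(*args)
        except OSError as exc:
            box["error"] = exc
        finally:
            os.write(wfd, b"\0")  # signal completion
            os.close(wfd)

    threading.Thread(target=worker, daemon=True).start()
    return rfd, box


def wait_done(fd, timeout):
    """True if the completion fd became readable within timeout seconds."""
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)
```

Usage would look like `fd, box = start_async(os.mkdir, path)` followed by `wait_done(fd, 1.0)`; the fd can equally be handed to poll/epoll/kqueue alongside sockets.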

                    1. 1

                      Well, think about some other cases… like chdir(). What async semantics do you want there?

                      1. 1

                        chdir() is one of the easier cases. Keep a reference to the current directory until you successfully look up the new directory. Then change over atomically and signal a success. The chdir() will either take effect some time between the system call being invoked and the completion being signalled, or you will get a failure.

                        It gets harder with system calls that can have partial effects before failures. mkdir, for example, may successfully create the directory, but if it’s on an NFS mount, the connection may die before you can get a response. That means that the operations aren’t idempotent, and blind retries might break.

                        It’s not trivial to retrofit async actions in, but it would still be nice if it could happen.
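To make the non-idempotent retry problem concrete, here is a small sketch (Python; `mkdir_retry` and its return strings are invented for illustration). If an earlier attempt failed with an unknown outcome and a later one reports that the directory exists, the caller cannot tell whether its own first attempt created it or someone else did.

```python
import os


def mkdir_retry(path, attempts=3, mkdir=os.mkdir):
    """Retry mkdir, reporting how trustworthy the outcome is.

    Returns "created" if a call succeeded outright, or "exists-ambiguous"
    if an earlier attempt failed and a later one found the directory
    already there. The mkdir parameter exists only so a failing
    implementation can be substituted when experimenting.
    """
    failed_before = False
    for _ in range(attempts):
        try:
            mkdir(path)
            return "created"
        except FileExistsError:
            if failed_before:
                return "exists-ambiguous"
            raise  # it existed before we ever tried
        except OSError:
            failed_before = True  # e.g. a lost reply; effect unknown
    raise TimeoutError(f"mkdir({path!r}) failed {attempts} times")
```

A blind retry loop would treat the second case as a plain "already exists" error, which is exactly the kind of breakage described above.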

                  2. 6

                    I’ve been told it’s common practice for vendors of things like NFS hardware appliances to patch the kernel (most likely BSD in this case) to add new system calls to make it possible to implement operations like this natively.

                    I remember this sort of thing frustrating the heck out of me when I used NFS.

                    Except it’s one of those fun things about NFS. It promises to get your data onto the network drive, even if someone tripped over the cable.

                    It will patiently wait for you to repair the cable, and plug it back in, and then write it all out safely for you.

                    Good Boy NFS.

                    However, these days “man 5 nfs” reports this option is available….

                    soft / hard
                    Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an error to the calling application.

                    NB: A so-called “soft” timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option.

                    That NB is an important caveat.

                    A timeout is horrid. Consider a timeout on a write.

                    What does it mean? How much was written? If you write again, what does it do? How much was or was not written previously?

                    For his particular problem, /u/tedu is correct: select/poll and wait for SIGCHLD, except it has a small corner of horridness which is resolved by the self-pipe trick. http://cr.yp.to/docs/selfpipe.html
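A rough sketch of the self-pipe trick for "wait for a child, with a timeout", written in Python. The function name `wait_child` is made up, and the sketch assumes it runs in the main thread (Python only allows installing signal handlers there) and that this is the only child process, since any SIGCHLD will wake it.

```python
import os
import select
import signal
import subprocess
import sys


def wait_child(argv, timeout):
    """Run argv; return its exit status, or None if it outlives timeout."""
    rfd, wfd = os.pipe()
    os.set_blocking(rfd, False)
    os.set_blocking(wfd, False)  # the handler must never block

    def on_sigchld(signum, frame):
        try:
            os.write(wfd, b"\0")  # make the pipe readable for select()
        except BlockingIOError:
            pass  # pipe already full: a wakeup is pending anyway

    old = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        child = subprocess.Popen(argv)
        # If SIGCHLD arrives before select() starts, the byte is already
        # in the pipe, so the wakeup cannot be lost.
        readable, _, _ = select.select([rfd], [], [], timeout)
        if readable:
            return child.wait()
        child.kill()  # timed out: give up on the child
        child.wait()
        return None
    finally:
        signal.signal(signal.SIGCHLD, old)
        os.close(rfd)
        os.close(wfd)
```

Note that Python retries select() after EINTR by itself, so the "get back EINTR" variant does not translate directly; the pipe is what carries the notification here.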

                    Hopefully one day we can redo all of the I/O stuff in Unix. But I’m not holding my breath.

                    The trouble is unix I/O is the summary of decades of real world experience by tens of thousands of developers.

                    While I’m sure I/O could be done better…..

                    …I’m a realistic coward.

                    I have severe doubts on my ability to do better.

                    1. 6

                      I don’t think it’s fair to say that Unix IO is the summary of decades of experience. The truth is that a great deal of Unix IO was implemented more or less as the simplest thing that worked, and it was implemented on much simpler systems and in much simpler environments than Unix runs now. Many of our fundamentally synchronous IO operations like waitpid() and mkdir() come to us from V7 and 4BSD Unix, as does a great deal of the kernel design that would make versions with timeouts (or asynchronous ones) so difficult.

                      (Synchronous system calls without timeouts are much easier to write, at least in C, because you can maintain a huge amount of implicit state in the form of local variables, the contents of your stack, and even your program counter. Making things asynchronous requires materializing this state. Making it possible to have timeouts requires at least being able to safely unwind some amount of this state at any place where your code waits. This has historically been a not insignificant source of kernel bugs just in handling errors.)

                      1. 2

                        I don’t think it’s fair to say that Unix IO is the summary of decades of experience. The truth is that a great deal of Unix IO was implemented more or less as the simplest thing that worked, and it was implemented on much simpler systems and in much simpler environments than Unix runs now

                        That sounds pretty much like “summary of decades of experience” to me. It’s exactly what I would do: “implemented more or less as the simplest thing that worked”. And then tweak, extend where needed, deprecate and remove where broken. I.e. exactly what the Unixy world, especially Linux, has done.

                        I.e. I didn’t mean that somebody all-wise with decades of experience wrote it correctly the first time.

                        I meant pretty smart people wrote the simplest thing that would work, and then shepherded it for decades learning in practice what doesn’t work and coming up with fixes, and encoding that experience into the unix api.

                        If you look at the earliest versions of the unix API you will realise it has evolved pretty dramatically since then.

                        E.g. nowhere is that painful evolution more visible than in the time-handling APIs…


                        The C/Unix time- and date-handling API is a confusing jungle full of the corpses of failed experiments and various other traps for the unwary, many of them resulting from design decisions that may have been defensible when the originals were written but appear at best puzzling today.

                        Linux does do async I/O these days…

                        But look deep into the async I/O libraries… what are they? Threads. They are a smiling face on threads. After reflection you are left wondering whether you really wanted an async API.

                    2. 1

                      I don’t think we’ll ever redo all the Unix/C I/O APIs - more likely we will bypass them. Higher-level languages already offer nice interfaces to I/O - what I hope for going forward is unikernel systems that implement those directly.