1. 13
    1. 1

      I wish it were easier to do inter-process concurrency control. For example, it’s common to have tests or scripts that start up a server and resort to retrying a connection once per second. But the computer should know when the server is up! I tried doing this by passing a FIFO path as a command-line argument, blocking on reading from it, and having the server write 0 bytes to it when it’s ready. But it was tricky to get right and seems unusual.
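
      A minimal sketch of that FIFO handshake in Python (the helper name and the temp-dir layout are made up for illustration):

```python
import os
import subprocess
import sys
import tempfile

def wait_for_ready(server_cmd):
    """Start server_cmd with a FIFO path appended; block until it's ready."""
    fifo_dir = tempfile.mkdtemp()
    path = os.path.join(fifo_dir, "ready.fifo")
    os.mkfifo(path)
    proc = subprocess.Popen(server_cmd + [path])
    # open() for reading blocks until the server opens the FIFO for
    # writing; read() then returns b"" once the server closes its end,
    # so the server never has to write any bytes at all.
    with open(path, "rb") as f:
        f.read()
    return proc
```

      One wrinkle this shares with the FIFO approach above: if the server dies before opening the FIFO, the parent blocks forever, so a real version would want a timeout (e.g. open the read end with O_NONBLOCK and poll it).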

      1. 3

        This works pretty well, but it depends on a known output string: https://github.com/zombocom/wait_for_it

        1. 4

          Sadly https://zombo.com is currently broken. You can’t do anything at zombo.com. Nothing is possible at zombo.com.

        2. 3

          I really want to have an OS semaphore that has the following guarantees:

          • You can’t increment it by more than you’ve decremented.
          • When your process exits, it’s automatically incremented by whatever you decremented but hadn’t incremented back, i.e. any slots you still hold are released.

          When we build FreeBSD packages, we build each in a separate jail, which gives us some parallelism. We can also run make or ninja with a -j flag. What I really want is to give each jail such a semaphore, so that a make-like tool (one that is not actively malicious, but might be buggy and, especially, can crash) can use it for concurrency control. I want to be able to say ‘run up to 24 build processes’ without statically assigning them to different jails. The worst that can happen is one jail using all of the available concurrency (or lying and claiming slots it isn’t using), but when that jail exits (or when make crashes) everything else can continue.

          1. 1

            I wonder how well it would work to use closing a pipe file descriptor as the operation that releases a semaphore slot, since the fd is auto-closed when the owner exits. It would need a server process, though, to detect when the other end of the pipe has closed. And acquiring a slot would require a request-response IPC to the server with file descriptor passing. That’s more complicated, and has more portability troubles, than make’s protocol, but it should fix the lost-slot problem.
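
            The core mechanism is easy to demonstrate: a fork()ed holder inherits the write end of a pipe, and when it exits (even abruptly, with no cleanup) the kernel closes that fd, so a watcher sees EOF. A minimal sketch:

```python
import os
import select

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: the "slot holder". It does no cleanup at all; the kernel
    # closes its copy of w when it exits, which is the whole point.
    os.close(r)
    os._exit(0)

os.close(w)  # the watcher must not hold the write end open itself
select.select([r], [], [])  # wakes once the last writer is gone
eof = os.read(r, 1)  # b"" means EOF: the slot can be reissued
os.waitpid(pid, 0)
```

            The fd-passing part (handing a fresh write end to each client over a Unix socket with SCM_RIGHTS) is the complicated bit mentioned above; this only shows the release-on-exit half.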

            1. 1

              This kind of thing (resource recovery on failure) is what the “assertions” in the syndicated actor model (https://syndicate-lang.org) are good for. Though more directly related to your wish would be some of the tuplespace models extended with fault handling.

            2. 2

              Can the server cooperate?

              If so, how about doing pipe() or socketpair() in the test parent process, then starting the server with an env variable like TEST_READY_FD=3?

              Then the server can check for that env variable and write a byte to the FD.

              The parent can wait for that byte, and then make a connection.
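
              A sketch of that handshake in Python (TEST_READY_FD is the convention suggested above, not a standard; the inline child process stands in for the server):

```python
import os
import socket
import subprocess
import sys

parent_sock, child_sock = socket.socketpair()
child_fd = child_sock.fileno()
proc = subprocess.Popen(
    [sys.executable, "-c",
     "import os; os.write(int(os.environ['TEST_READY_FD']), b'R')"],
    env={**os.environ, "TEST_READY_FD": str(child_fd)},
    pass_fds=[child_fd],  # keep the inherited fd open across exec
)
child_sock.close()  # the parent only needs its own end
ready = parent_sock.recv(1)  # blocks until the server says it's up
proc.wait()
```

              Using a socketpair rather than a bare pipe also means recv() returns b"" promptly if the server exits without ever writing its ready byte.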

              1. 1

                Yes, that’s pretty similar to what I’m doing with a FIFO.

                Another idea I had was to just use logs. That wouldn’t require me to change the server. For example, say I want to run python3 -m http.server and block until it prints a line starting with “Serving HTTP” (and give up after 10s, or give up if the first line is different from that), while still printing the logs to the terminal (or redirecting them wherever I want). I wish shell made stuff like that easier.
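
                A sketch of that log-based wait as a Python wrapper (the function name is made up, and it simplifies the wish a bit: it scans every line rather than giving up on a mismatched first line, and it only checks the timeout between lines, since readline blocks):

```python
import subprocess
import sys
import time

def wait_for_line(cmd, prefix, timeout=10.0):
    """Spawn cmd, tee its output, and return once a line starts with prefix."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    deadline = time.monotonic() + timeout
    for line in proc.stdout:
        sys.stdout.write(line)  # still show the logs on our own stdout
        if line.startswith(prefix):
            return proc
        if time.monotonic() > deadline:
            break
    proc.terminate()
    raise TimeoutError("never saw a line starting with %r" % prefix)
```

                A production version would want select() with a deadline so a silent server can’t stall it, which is exactly the process-events-vs-file-events problem discussed below.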

                1. 2

                  Yeah I think this is a common need, and also someone made a solution - https://github.com/zombocom/wait_for_it

                  I had two thoughts for shell


                  1. First is that I realized shell is a waitpid(-1) loop, not a select() loop! And this explains a lot of limitations with shell programming.

                  That is, waitpid(-1) gets you the next process that ended, and select() gets you the next file event

                  But if you want to grep for “Serving HTTP” - that’s a file event!

                  So waiting non-deterministically for both is actually a general Unix problem. The self-pipe trick addresses this, but it goes “one way” from processes to files:

                  https://cr.yp.to/docs/selfpipe.html

                  https://www.sitepoint.com/the-self-pipe-trick-explained/
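
                  For reference, the trick in miniature, sketched in Python: the SIGCHLD handler writes a byte to a pipe, so the process event shows up as a file event that select() can wait on alongside ordinary fds:

```python
import os
import select
import signal

r, w = os.pipe()
os.set_blocking(w, False)

def on_sigchld(signum, frame):
    try:
        os.write(w, b"\x00")  # async-signal-safe: just one write()
    except BlockingIOError:
        pass  # pipe full; a wakeup is already pending

signal.signal(signal.SIGCHLD, on_sigchld)

pid = os.fork()
if pid == 0:
    os._exit(0)  # child exits immediately; parent gets SIGCHLD

# select() now wakes for the process event, delivered as a file event.
readable, _, _ = select.select([r], [], [])
os.read(r, 1)
os.waitpid(pid, 0)
```

                  A real event loop would drain the pipe and call waitpid(-1, WNOHANG) in a loop, since multiple SIGCHLDs can coalesce into one byte.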


                  2. Second thought is that turning shell into a select() loop is hard

                  Instead, we can probably do this with a shell builtin that starts a “sidecar” process

                  I call it “pipecar” … I have been thinking about the Zig progress bar / Ninja problem in shell:

                  https://lobste.rs/s/jxulih/zig_s_new_cli_progress_bar_explained

                  They both wait for process events and file events at the same time. But you can’t really do that in shell, since it’s a waitpid(-1) loop.

                  But I think the “pipecar” idea can solve that – a small process that wraps your process, and streams events into the main shell. It also prints to the terminal like tee.

                  And so it can also solve the “wait for it” problem. I think they are close to the same problem!


                  (Feel free to join https://oilshell.zulipchat.com/ if you’re interested in this – I put some more notes on our #shell-runtime channel. I’m not sure WHEN we can do this, but I think it’s a good idea, and solves a few problems with shell programming)

                  1. 1

                    Thanks, I’ll check out your pipecar idea on the Zulip.

            3. 1

              I just read over this GNU make description a couple days ago: https://make.mad-scientist.net/papers/jobserver-implementation/

              The problem it solves is related to recursive make: with recursion, you don’t have a single view of all tasks!

              On a different note: I was surprised that it seemed to say, in the last paragraph, that the self-pipe trick is not portable. But this is a page from 1999!

              “This was rejected mainly because select is surprisingly difficult to use in portable code: it has different prototypes, etc. It is also reduced portability …”

              1. 2

                Yeah the whole select/poll wars are still an ongoing disaster, sigh. If you are implementing networking software then you already have to deal with select() and so the self-pipe trick makes complete sense, but I can see why the GNU make authors preferred not to introduce yet another portability shim.

                The solution they ended up with is a lot more complicated than it needs to be because of a design error in the jobserver protocol: it does not count recursive make processes as jobs. This means that each process that spawns jobs needs its own job counting semaphore in addition to the global semaphore, so it needs something equivalent to a select across its two semaphores. (GNU make implements this using the dup/close trick.)

                If recursive make processes counted as jobs of their own, they would only need the one global semaphore; the SIGCHLD handler could unconditionally write a byte to the jobserver pipe for each process returned by wait(); then make’s spawning logic does not need to dup or change signal dispositions: reading the jobserver pipe is enough.

                (edit) Actually, there’s a starvation problem if all the jobserver slots are used up by make processes that cannot make forward progress because they cannot spawn processes that do useful work, oops. (edit2) A recursive make process could release its own slot while it is running and re-acquire it just before exiting. That could lead to some make processes hanging around a bit longer than necessary but I think it would be benign.
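
                For comparison, the jobserver’s pipe-as-counting-semaphore is easy to sketch in Python (the helper names are made up; GNU make’s actual token bytes are opaque single characters, so the ‘+’ here is arbitrary):

```python
import os

def make_jobserver(slots):
    """Create a jobserver-style pipe preloaded with one byte per slot."""
    r, w = os.pipe()
    os.write(w, b"+" * slots)  # one token byte per available job slot
    return r, w

def acquire(r):
    return os.read(r, 1)  # blocks when every slot is taken

def release(w, token):
    os.write(w, token)  # putting the byte back frees the slot

r, w = make_jobserver(2)
t1, t2 = acquire(r), acquire(r)  # both slots held; a third read would block
release(w, t1)
t3 = acquire(r)  # a slot is free again without waiting
```

                The lost-slot problem discussed upthread is exactly that nothing puts the byte back if the holder crashes between acquire and release.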

                1. 1

                  “it does not count recursive make processes as jobs”

                  Hm you mean the “submake” process itself doesn’t take a token?

                  Like if the top level is make -j 4, then a submake can spawn 4 concurrent processes including itself?

                  Having two semaphores seems a lot more complex … Is the extra submake a problem in practice? E.g. if you start 5 processes on a 4-core machine instead of 4, it seems like it should work OK and not cause any catastrophic bug.


                  On the other topic, I thought you could code portably against select() (and poll() for that matter) on modern OSes.

                  I thought the main incompatibilities were the newer high performance APIs.

                  I would have thought the differences from POSIX are ironed out by now? I have seen convergence in libc and shells and so forth.

                  https://pubs.opengroup.org/onlinepubs/007908799/xsh/select.html


                  In CPython 2.7 (very old), I see things like

                  #if defined(HAVE_POLL) && !defined(HAVE_BROKEN_POLL)
                  #ifdef __APPLE__
                      if (select_have_broken_poll()) {
                          if (PyObject_DelAttrString(m, "poll") == -1) {
                              PyErr_Clear();
                          }
                      ...

                  and

                  #ifdef SELECT_USES_HEAP
                      pylist *rfd2obj, *wfd2obj, *efd2obj;
                  #else  /* !SELECT_USES_HEAP */
                      /* XXX: All this should probably be implemented as follows:
                       * - find the highest descriptor we're interested in
                      ...


                  So I guess those are signs of the problems, but there are lots of #ifdef in Python that you don’t need on modern OSes.