1. 8

  2. 2

    Thank you for writing this, Andrew. It is a problem that has been on my mind on and off for a while. But you don’t solve it entirely: if you wait to start the eggs until both the chickens and the flour are done, you are potentially wasting time. If the chickens finish early, it would of course be faster to let the computer keep working on the flour while already starting on the eggs.

    I’m interested in what an optimal algorithm for this problem would look like. The problem feels common enough to have been solved many times, but all my googling ever turns up is topological sorting, with no mention of running anything in parallel.

    1. 3

      Great question! I think the strategy I described in the post is only useful for building static execution plans that can be run with plain loops and a WaitGroup (or similar). To come up with an optimal plan, you would want to build something like a dataflow graph. I am pretty sure you could reuse a similar Graph data structure, but instead of performing a topological sort, you would kick off a process that runs the leaves; whenever one completes, you remove it and check whether any new leaves are ready to run. There are better ways to create a dataflow graph, though. Maybe that should be my next post!
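
      A minimal sketch of that idea in Go (the `graph` type, `runDataflow`, and the task names are mine, purely for illustration): start every current leaf, and each time a task finishes, remove it from the graph and rescan for newly exposed leaves.

      ```go
      package main

      import (
      	"fmt"
      	"sync"
      )

      // graph maps each task to the set of tasks it still depends on.
      type graph map[string]map[string]bool

      var _ = sync.Mutex{} // sync is handy if work funcs share state

      // runDataflow starts every current leaf (task with no unmet deps).
      // Each time a task finishes, it is removed from everyone's
      // dependency set and the graph is rescanned for new leaves.
      func runDataflow(g graph, work func(string)) {
      	done := make(chan string)
      	running := 0

      	start := func(task string) {
      		running++
      		go func() {
      			work(task)
      			done <- task
      		}()
      	}

      	launchLeaves := func() {
      		for task, deps := range g {
      			if len(deps) == 0 {
      				delete(g, task) // claim it so it isn't started twice
      				start(task)
      			}
      		}
      	}

      	launchLeaves()
      	for running > 0 {
      		finished := <-done
      		running--
      		// Remove the finished task from remaining dependency sets...
      		for _, deps := range g {
      			delete(deps, finished)
      		}
      		// ...and start whatever just became a leaf.
      		launchLeaves()
      	}
      }

      func main() {
      	g := graph{
      		"chickens": {},
      		"flour":    {},
      		"eggs":     {"chickens": true},
      		"cake":     {"eggs": true, "flour": true},
      	}
      	runDataflow(g, func(t string) { fmt.Println("doing", t) })
      }
      ```

      Note the rescan in `launchLeaves` is O(remaining tasks) per completion, which is fine for small graphs but is exactly the cost a proper dataflow engine avoids.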

      1. 2

        In a system at $JOB we do exactly that. When one task finishes, we loop through all remaining tasks to see if any new one is ready. It works perfectly because there aren’t that many nodes. … But of course… what if, instead of 50 nodes, we had 50K?

        “Dataflow graph” seems to be the term I have been looking for. Thank you! :)
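
        For the 50K case, one standard refinement (sketched below in Go; the `run` function and task names are hypothetical, not from the post) is to precompute a reverse edge list and a count of unmet prerequisites per task. Then a completion only touches its direct dependents, so the total bookkeeping is proportional to the number of edges rather than completions × remaining tasks.

        ```go
        package main

        import (
        	"fmt"
        	"sync"
        )

        // run executes tasks as soon as their prerequisites finish.
        // deps maps task -> prerequisite tasks. It returns the order
        // in which tasks completed, for inspection.
        func run(deps map[string][]string, work func(string)) []string {
        	// Count unmet prerequisites, and invert edges so a finishing
        	// task knows exactly whom to notify.
        	remaining := make(map[string]int, len(deps))
        	dependents := map[string][]string{}
        	for task, reqs := range deps {
        		remaining[task] = len(reqs)
        		for _, r := range reqs {
        			dependents[r] = append(dependents[r], task)
        		}
        	}

        	var (
        		mu    sync.Mutex
        		wg    sync.WaitGroup
        		order []string
        	)
        	var start func(string)
        	start = func(task string) {
        		wg.Add(1)
        		go func() {
        			defer wg.Done()
        			work(task)
        			mu.Lock()
        			order = append(order, task)
        			var ready []string
        			for _, d := range dependents[task] {
        				remaining[d]--
        				if remaining[d] == 0 {
        					ready = append(ready, d)
        				}
        			}
        			mu.Unlock()
        			for _, t := range ready {
        				start(t)
        			}
        		}()
        	}

        	// Kick off the leaves: tasks with no prerequisites.
        	for task, n := range remaining {
        		if n == 0 {
        			start(task)
        		}
        	}
        	wg.Wait()
        	return order
        }

        func main() {
        	order := run(map[string][]string{
        		"chickens": nil,
        		"flour":    nil,
        		"eggs":     {"chickens"},
        		"cake":     {"eggs", "flour"},
        	}, func(t string) { fmt.Println("doing", t) })
        	fmt.Println(len(order), "tasks finished")
        }
        ```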

      2. 3

        IIRC these are called “scheduling problems”. You’d want to look at constraint solvers and operations research. I know MiniZinc has a builtin for scheduling, but it assumes no dependencies between tasks; you can add those on top manually.