1. 21
  1. 4

    I believe the modern equivalent that is obvious and easy for compilers to understand is

    int i;
    for (i=0;i<n-3;i+=4) {
     doIt(i+0);
     doIt(i+1);
     doIt(i+2);
     doIt(i+3);
    }
    for (;i<n;i++) {
     doIt(i);
    }
    

    where doIt() is a static inline function or a macro.
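
    For comparison, here is a rough sketch that puts the unrolled loop above next to a Duff's-device-style loop and checks that both visit the same indices. The accumulator `total`, the body of `doIt()`, and the names `run_unrolled` and `run_duff` are assumptions made up for illustration; they are not from the article.

    ```c
    /* Hypothetical accumulator so the effect of doIt() can be observed. */
    static long total;

    /* Hypothetical per-index work: add the index to the accumulator. */
    static inline void doIt(int i) { total += i; }

    /* The unrolled form from the comment above: blocks of four, then a tail. */
    long run_unrolled(int n) {
        int i;
        total = 0;
        for (i = 0; i < n - 3; i += 4) {
            doIt(i + 0);
            doIt(i + 1);
            doIt(i + 2);
            doIt(i + 3);
        }
        for (; i < n; i++) {
            doIt(i);
        }
        return total;
    }

    /* Duff's-device-style form: the switch jumps into the middle of the
       do-while body so the n % 4 leftover calls happen on the first pass. */
    long run_duff(int n) {
        int i = 0;
        int passes;
        total = 0;
        if (n <= 0) return total;      /* the device itself assumes n > 0 */
        passes = (n + 3) / 4;          /* number of trips through the loop */
        switch (n % 4) {
        case 0: do { doIt(i++);
        case 3:      doIt(i++);
        case 2:      doIt(i++);
        case 1:      doIt(i++);
                } while (--passes > 0);
        }
        return total;
    }
    ```

    Both functions call doIt() on 0 through n-1 in ascending order; the difference is only whether the leftover calls come first (Duff) or last (the plain unrolled loop).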

    1. 3

      Great write-up! It explained the basic structure of Duff’s device so well that I finally understand it. One item that has been on my to-do list for years: I still cannot wrap my head around how Duff’s device leads to such beautiful structures as C coroutines, protothreads, and C Async: https://github.com/naasking/async.h

      I think I gotta program it myself to properly understand it…

      1. 3

        There’s a really nice piece on exactly that by the inimitable Simon Tatham, author of PuTTY (and much else).

        1. 2

          Many thanks for the reference, will read through.

        2. 2

          Yeah, coroutines are where I’ve encountered the Device before. (And I actually implemented it once in a hairy experimental C++ implementation of async/await.)

          The way the Device works there is it lets you jump back into the middle of a function.

          • You wrap the whole function in a switch block.
          • When you need to “pause” it (i.e. call “await”/“yield”) you store a number representing what line you’re on, then return.
          • The following line is a switch case with that same number.
          • The function’s entry point fetches that saved line number and uses it as the argument for the switch … which takes it right to the line following your “await”.

          That’s the meat of it. The rest of the work is (a) wrapping that stuff up in some gnarly macros to make it look pretty, and (b) finding a place to save that line number, i.e. the coroutine’s context.

          The downside is that it’s difficult to use local variables in such a function because they get reset every time you yield. (IIRC I was able to get around this in my implementation by secretly wrapping the switch statement in a mutable lambda that captures the enclosing locals, turning them into saved state.)
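
          The steps above can be sketched in plain C roughly like this. It is a minimal illustration, not the implementation described in this comment: the `co_ctx` struct, the `CO_BEGIN`/`CO_YIELD`/`CO_END` macro names, and the `counter` function are all hypothetical. The loop counter lives in the context struct rather than in a local variable, which is one simple way around the reset-on-yield problem.

          ```c
          /* Hypothetical coroutine context: the saved resume point plus any
             "locals" that must survive across yields. */
          typedef struct {
              int line;   /* 0 = start from the top, -1 = finished */
              int i;      /* loop counter kept here instead of on the stack */
          } co_ctx;

          /* Wrap the function body in a switch on the saved line number. */
          #define CO_BEGIN(ctx)    switch ((ctx)->line) { case 0:

          /* Save the current line, return, and plant a case label with that
             same number so the next call resumes right here. */
          #define CO_YIELD(ctx, v) do { (ctx)->line = __LINE__; return (v); \
                                        case __LINE__:; } while (0)

          /* Close the switch and mark the coroutine as finished. */
          #define CO_END(ctx)      } (ctx)->line = -1

          /* Yields 0, 1, ..., limit-1 on successive calls, then -1. */
          int counter(co_ctx *ctx, int limit) {
              CO_BEGIN(ctx);
              for (ctx->i = 0; ctx->i < limit; ctx->i++) {
                  CO_YIELD(ctx, ctx->i);
              }
              CO_END(ctx);
              return -1;
          }
          ```

          A caller would start with a zeroed context (`co_ctx c = {0, 0};`) and call `counter(&c, 3)` repeatedly; each call jumps back into the middle of the `for` loop. Because the case label is `__LINE__`, each `CO_YIELD` has to sit on its own source line.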

          1. 1

            Thanks for taking the time to lay out the details. So this was the missing piece for me: the preprocessor is involved to incorporate line numbers. That’s super clever. Metaprogramming in C / C++ is on a whole other level of insanity and I love every drop of it.

        3. 1

          Compiler optimized everything away with -O2 and above.

          1. 1

            I wonder if anyone has applied this idea to creating SIMD-compatible vector values. I guess it wouldn’t be the same thing as Duff’s device, since instead of computation(data) being called multiple times, there would be a single call to computation_simd(vectorized_data).