1. 18
    1. 6

      Ah… my hobby horse, let me ride it….

      State machines are the dual of threads: every state machine can be refactored into a thread, and conversely every thread into a state machine.

      The question then is, which is the better representation when.

      If you write your state machine as a transition matrix (i.e. a matrix where the row labels are states, the column labels are events, and each cell is the next state to transition to if you receive that event whilst in that state)…

      Look at the transition matrix… is it sparse? I.e. if most of the cells are “can’t happen”, use threads; otherwise use a state machine.
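
      For concreteness, here is what such a matrix can look like as code: a minimal sketch with hypothetical connection-style states and events (nothing from any real code base). Most of the cells are “can’t happen”, which is the sparsity in question.

        // Rows are states, columns are events, each cell is the next state.
        enum class State { Idle, Connecting, Connected, Error, Count };
        enum class Event { Open, Opened, Data, Close, Count };

        constexpr State X = State::Error;   // stand-in for "can't happen"

        constexpr State transitions[static_cast<int>(State::Count)]
                                   [static_cast<int>(Event::Count)] = {
            /* Idle       */ { State::Connecting, X,                X,                X           },
            /* Connecting */ { X,                 State::Connected, X,                X           },
            /* Connected  */ { X,                 X,                State::Connected, State::Idle },
            /* Error      */ { X,                 X,                X,                X           },
        };

        constexpr State step(State s, Event e) {
            return transitions[static_cast<int>(s)][static_cast<int>(e)];
        }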

      Now the other argument that applies strongly in the embedded space is that threads have a stack per thread, which is typically one of your largest uses of RAM.

      So if you have hundreds of state machines… you run out of RAM on stacks alone in the threaded representation… unless… you use protothreads or, in C++20, stackless coroutines.

      Conclusion:

      Most state machines I have seen in embedded systems are sparse (the higher layers or the design make most transitions “can’t happen”).

      Thus implementing them as C++20 stackless coroutines or Contiki-style C protothreads is cleaner, has much less boilerplate, and is much more readable.
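
      To show what that looks like, here is a minimal hand-rolled C++20 sketch of the sparse machine above written as a stackless coroutine (hypothetical Open/Opened/Data/Close events and a toy Session type, not any particular library): the legal sequence of events becomes straight-line code, and the “can’t happen” transitions are simply never written.

        #include <coroutine>

        enum class Event { Open, Opened, Data, Close };

        struct Session {
            struct promise_type {
                Event last{};                      // the event that resumed the coroutine
                Session get_return_object() {
                    return Session{std::coroutine_handle<promise_type>::from_promise(*this)};
                }
                std::suspend_never  initial_suspend() noexcept { return {}; }  // run to first co_await
                std::suspend_always final_suspend() noexcept { return {}; }    // keep frame so done() works
                void return_void() {}
                void unhandled_exception() {}
            };

            std::coroutine_handle<promise_type> h;

            void send(Event e) {                   // deliver one event, run until the next co_await
                if (h && !h.done()) { h.promise().last = e; h.resume(); }
            }
            ~Session() { if (h) h.destroy(); }
        };

        // Awaitable: park the coroutine here until send() delivers the next event.
        struct NextEvent {
            std::coroutine_handle<Session::promise_type> self{};
            bool await_ready() const noexcept { return false; }
            bool await_suspend(std::coroutine_handle<Session::promise_type> caller) noexcept {
                self = caller;
                return true;
            }
            Event await_resume() const noexcept { return self.promise().last; }
        };

        // The "sparse" machine as straight-line code.
        Session session() {
            if (co_await NextEvent{} != Event::Open)   co_return;  // only Open is legal here
            if (co_await NextEvent{} != Event::Opened) co_return;  // only Opened is legal here
            for (;;) {                                             // connected: Data or Close
                Event e = co_await NextEvent{};
                if (e == Event::Close) co_return;
                /* handle Data */
            }
        }

        int main() {
            Session s = session();   // runs to the first co_await, then suspends
            s.send(Event::Open);
            s.send(Event::Opened);
            s.send(Event::Data);
            s.send(Event::Close);    // the coroutine finishes here
        }

      A Contiki-style protothread gets the same shape in C with the switch-on-__LINE__ trick instead of a compiler-managed coroutine frame.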

      1. 5

        Thus implementing them as C++20 stackless coroutines or Contiki-style C protothreads is cleaner,

        FWIW some related solutions

        • Foundation DB had a language on top of C++ that let them write in the straight-line / threaded / coroutine style, and then it was compiled to state machines in C++. For the low level portions of the database, threads were considered too expensive.
          • Unfortunately they got swallowed by Apple and I haven’t heard much from them since, but it’s a very interesting system.
          • “Testing Distributed Systems w/ Deterministic Simulation” by Will Wilson (2014) - https://www.youtube.com/watch?v=4fFDFbi3toc
        • Facebook also had some kind of state machine compiler in PHP to do concurrency, which may have led to Hack. I don’t remember the details, but I think there is a YouTube talk out there that explains it.

        I spent last week looking through some docs of the P language, which had a good Strange Loop 2022 talk. It was first used ~2014 in the Windows USB 3 stack, and is now being used by AWS (e.g. for S3 strong consistency). The author of the language worked at both places.

        https://github.com/p-org/P

        Formal Modeling of Distributed Systems (2022) - https://www.youtube.com/watch?v=5YjsSDDWFDY&t=1s

        The big question I have from that talk is why he says they “regressed” from generating C code in the Windows case to having a separate model/spec and implementation in the AWS case. He says that it’s because of performance, but I don’t understand why you can generate fast USB device drivers but not fast concurrent protocols in servers (running on server hardware you control, etc.)

        edit: Also I remember a core claim of P is that if you write in their state machine language, it’s easier for the model checker to find bugs (explore the state space) than if you write in a C- or Java-like language. So that could be a benefit to the state machine style.

        They also have a notion of “deferred events” which can collapse the state space explosion. There is some “bounded delay” model, I believe; I will have to understand that a bit more.


        I also mentioned at some point on lobste.rs that Node.js used to have this huge hand-written state machine in C for their HTTP parser, and then they switched to generated code.

        Look at the transition matrix… is it sparse? I.e. if most of the cells are “can’t happen”, use threads; otherwise use a state machine.

        Hm, I would like to see this unpacked more. Examples? I think there is probably some truth there, but I’m not sure it’s that simple.

        1. 2

          I found the node.js reference

          llhttp - new HTTP 1.1 parser for Node.js by Fedor Indutny | JSConf EU 2019

          and my comment from a year ago: https://lobste.rs/s/rzhxyk/plain_text_protocols#c_gnp4fm

          Quote about maintaining state machines:

          Let’s face it, http_parser is practically unmaintainable. Even introduction of a single new method results in a significant code churn.

          So they switched to “straight line” source code that generates state machines instead, similar in spirit to FoundationDB.

          https://llparse.org/

          (I guess this is pretty different than P, which is more about writing in a direct state machine style, but you have a model checker that runs in a big parallel cloud of CPUs to explore the state space.)

        2. 1

          I notice in practice a lot of “state machines” in the wild are fairly informal.

          A strict, model-checkable one obeys one rule: the only determinants of which state you transition to are which state you’re in and which event occurred.

          In practice, sadly, a lot of real world code decays from that ideal.
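
          As signatures, the strict rule and the usual decay look roughly like this (hypothetical State/Event types, just an illustration):

            enum class State { Idle, Running, Done };
            enum class Event { Start, Finish, Reset };

            // Strict: the next state is a pure function of (state, event), nothing else.
            State transition(State s, Event e);

            // Typical decay: extra determinants sneak in, and the machine is no longer
            // something a model checker can see whole.
            // State transition(State s, Event e, bool retry_flag, unsigned elapsed_ms, Config& cfg);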

      2. 2

        You might consider jotting that down as a blog post. I came to a similar conclusion recently, but it took me a long time to realize it! This doesn’t seem to be as well understood as it deserves.

      3. 2

        I would say “function calls”, not threads. Threads are about concurrency, which has nothing to do with state machines – though on embedded the line blurs somewhat. Then you have coroutines, which are more or less a way of making function calls operate like their own little state machines.

        1. 1

          The curious “thing” with coroutines (and threads) is that when a function blocks… the CPU can go off and do unspecified “other things”, and then, “on some event”, something happens to “unblock” it and continue the function from the exact point at which it blocked.

          With a state machine, each function does a little, typically never blocking, and returns, unwinding to the highest level… to do those unspecified “other things”, and it’s the business of the state machine implementation to work out where to carry on when that “on some event” occurs.

          What goes unspecified is “if I have more than one state machine interacting… what’s happening?” Is each state machine living in its own thread? On a thread from a thread pool? All on one thread?

          Hence when I make the statement “I can refactor any state machine into a thread and vice versa”, I say it in the general sense assuming the general picture.
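
          One concrete version of the “all on one thread” picture is a single run-to-completion dispatch loop (rough sketch, hypothetical Event/Machine types):

            #include <cstddef>
            #include <queue>
            #include <vector>

            struct Event { std::size_t target; int type; };

            struct Machine {
                int state = 0;
                void handle(const Event& e) { (void)e; /* update state, maybe post new events */ }
            };

            void run(std::vector<Machine>& machines, std::queue<Event>& q) {
                while (!q.empty()) {
                    Event e = q.front(); q.pop();
                    machines[e.target].handle(e);  // does a little, never blocks, unwinds back here
                }
            }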

      4. 1

        Could you expand on why threads and state machines are dual?

        1. 1

          Give me any program written using one, and I can refactor it to be written as the other.

          It’s not an easy task, and the code may become less clean in the transformation, hence the need to choose wisely.

          Alas, threads (and events for that matter) are not defined by the language (especially C), but involve hard to reason about thinks like i/o, timing, hardware level context switches and schedulers.

          The core idea is that with a state machine, some (often undefined) top level is waiting for events and then telling the state machine “this event happened”.

          With the threaded model, every chunk of state code is going to have a blocking “wait for event” point and then code to decide what to do with that event.

          If after every wait for event you have a massive switch statement… yup, the state machine representation is much cleaner.

          If, for example, only one event is possible at that point… well, then a state machine makes your code much worse.

          What we mean by “wait for event” is somewhat undefined, but I will usually be using either an I/O or a thread synchronization primitive at that point.

          With a stackless coroutine or protothread, it unwinds back to that high-level “wait for event” point, the same as with a state machine. So transforming a state machine into a protothread or stackless coroutine is easier than into an OS thread.

          Edit: Was going to fix the typo “hard to reason about thinks”… but it is an accurate Freudian Slip so leaving it as is.
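
          To make the dense/sparse contrast concrete, here is a rough sketch of the two shapes (the wait_* helpers are hypothetical stand-ins for whatever blocking I/O or synchronization primitive sits at the “wait for event” point, declared but deliberately left undefined):

            enum class Event { Open, Opened, Data, Close, A, B, C };

            Event wait_for_event();           // hypothetical: block until any event arrives
            void  wait_for(Event expected);   // hypothetical: block until this specific event
            Event wait_for_data_or_close();   // hypothetical: block until Data or Close

            // Dense: every wait is followed by a big dispatch, so the thread is just a
            // state machine written the long way round; the explicit machine is cleaner.
            void dense_thread() {
                for (;;) {
                    switch (wait_for_event()) {
                        case Event::A: /* ... */ break;
                        case Event::B: /* ... */ break;
                        case Event::C: /* ... */ break;
                        default:       /* many more cases per wait point */ break;
                    }
                }
            }

            // Sparse: only one thing is legal at each point, so straight-line code reads
            // naturally and a transition matrix would be mostly "can't happen".
            void sparse_thread() {
                wait_for(Event::Open);
                wait_for(Event::Opened);
                while (wait_for_data_or_close() == Event::Data) {
                    /* handle data */
                }
            }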

          1. 1

            Give me any program written using one, and I can refactor it to be written as the other.

            I don’t really know what this means in this context. A state machine is a model of computation; a thread is a model of parallel execution. Generally, each thread runs a Turing-complete execution environment (e.g. a C abstract machine), which means that a single thread is strictly a more powerful computational abstraction than a state machine. A set of threads can be implemented by a context switch after each small step in your sequential semantics and so is an equivalent model of computation (though one that may be a better match for efficient expression on multicore hardware).

            With the threaded model, every chunk of state code is going to have a blocking “wait for event” point and then code to decide what to do with that event.

            If your threads have stacks, then even if each thread runs a state machine and not a Turing-complete language, your program is now as expressive as a push-down automaton, which is more expressive than a state machine but not Turing complete.

            With a stackless coroutine or protothread, it unwinds back to that high level “wait for event” point same as with a state machine

            Okay, now we’re getting somewhere. This is the key part: stackless coroutines are an efficient and clean way of expressing state machines in a Turing-complete language. That’s true, but it’s very different from the computational theoretic argument that you started with.

            1. 1

              stackless coroutines are an efficient and clean way of expressing state machines in a Turing-complete language.

              …if the transition matrix is sparse. If the matrix is dense, it’s a less clean way. It’s a real world design trade off.

    2. 5

      The site is about embedded software but I think this is an interesting topic to discuss outside of embedded. For example, how you structure the internal state machine within distributed consensus libraries (i.e. leader -> follower -> candidate -> leader).

      It’s the part about race conditions halfway through the article that particularly caught my eye.

      If anyone has other resources for how you can design these state machines in a parallel or concurrent environment I’d love to see your links.
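
      For reference, the role machine itself can be tiny; a minimal sketch (Raft-style roles, hypothetical events, not any particular library’s API):

        enum class Role  { Follower, Candidate, Leader };
        enum class Event { ElectionTimeout, WonElection, SawHigherTerm };

        Role step(Role r, Event e) {
            switch (r) {
                case Role::Follower:  return e == Event::ElectionTimeout ? Role::Candidate : r;
                case Role::Candidate: return e == Event::WonElection     ? Role::Leader
                                           : e == Event::SawHigherTerm   ? Role::Follower  : r;
                case Role::Leader:    return e == Event::SawHigherTerm   ? Role::Follower  : r;
            }
            return r;
        }

      The interesting design question is exactly the concurrent part: whether step() runs on a single event loop or under one lock, or whether incoming RPCs and timers can race it.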

    3. 2

      (Explicit) state machines are a great way to communicate design. A powerful take on these is Statecharts (aka Harel statecharts).

      For C++, there are open-source library implementations; for the latter there’s a corresponding book.

      And here’s a Rust library.

    4. 2

      I’m using xstate to model the foundation of our “editor” system (video annotation tools). It’s got really great tooling: you can click “inspect” in the VS Code extension on the line above the state machine & it displays an interactive chart on the next page.

      Overall it’s great & we can model fairly complex logic, like streaming very large amounts of data, predictably; but the syntax is very hard to read for any non-trivial state machine. I tend to have an easier time than most with syntax, but even I have to do double takes many times. It feels like there’s lots of room for proper DSLs here to get intentions across better.

      When I showed my team xstate, the response was like “ok, so we should use this for everything, right?” & I quickly replied, “noo”. For most things there are much simpler + cleaner tools to use, like one of my personal overlooked favourites: URLs.

      URLs have a lot of the desirable traits of state machines: you can very easily isolate functionality to a specific URL (and nothing else), and links are “state transitions”. You can store data in URLs & have it globally accessible. You can do “time-travel debugging” by clicking the back button :). URLs have pretty big limitations on data storage, but whenever possible they are pretty amazing. A great common example is storing data-table state.