
      About a decade ago I did a lot of work inside Azure on the new region/datacenter buildout process, trying to capture and automate all the manual, undocumented steps the author mentions in this post. It really is phenomenally complicated, and it's made worse because any change you make might only get exercised once or twice a week as new racks or datacenters come online; that slow iteration time turns features that should take days into features that drag on for months. I have great respect for anyone who can stick with that process for 3-4 years, with its attendant perpetual on-call, because you're the only one who understands why the automation failed, and every hour the build is stalled costs something like $10k per rack. I'm glad to hear from the author that this process can apparently reach completion someday.

      The whole process has given me a great penchant for automating things. I now won't even think about writing usage instructions in a document without having those same instructions tested in a continuous integration workflow.
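      One lightweight way to put documented instructions under CI is sketched below. It is not the commenter's setup; it assumes a hypothetical convention that every fenced block tagged `sh`, `shell`, or `bash` in a Markdown file is a runnable instruction, and the function names are illustrative only.

```python
import re
import subprocess

# Hypothetical convention: only fences tagged sh, shell, or bash count as runnable instructions.
# `{3} matches a three-backtick fence line without writing one literally here.
FENCE_RE = re.compile(
    r"^`{3}(?:sh|shell|bash)[ \t]*\n(.*?)^`{3}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_shell_blocks(markdown_text: str) -> list[str]:
    """Return the body of every sh/shell/bash fenced block, in document order."""
    return [m.group(1) for m in FENCE_RE.finditer(markdown_text)]


def run_blocks(blocks: list[str]) -> None:
    """Run each block in a fresh POSIX shell; -e stops a block at its first failing command."""
    for i, block in enumerate(blocks, start=1):
        result = subprocess.run(["sh", "-e", "-c", block], capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(
                f"block {i} failed with exit code {result.returncode}:\n{result.stderr}"
            )
```

      A CI job could then call `run_blocks(extract_shell_blocks(...))` on the README's text and fail the build on the raised error. Running each block in its own shell keeps blocks independent, at the cost of not sharing state such as `cd` between them.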


        > The whole process has given me a great penchant for automating things. I now won't even think about writing usage instructions in a document without having those same instructions tested in a continuous integration workflow.

        I love that as an ideal, but what do you use to keep the document in sync with the CI job? Do you have a way of doing this with things that involve steps in a GUI?


          Unfortunately no, beyond keeping the documentation in the same repo as the CI so they get updated in step. That said, I'm considering adding a CI step to the tlaplus/examples repo that parses the README.md Markdown with goldmark or similar to check whether a table of specs corresponds to the contents of a directory, so maybe that experience will open doors in that direction. Otherwise, this actually sounds like a great application for large language models in CI: “Does this repo's CI meaningfully exercise all the build, test, install, and run instructions in the documentation files?”
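          A rough sketch of the table-versus-directory check described above follows. goldmark is a Go Markdown parser; this sketch swaps it out for naive line-based parsing of pipe tables in Python, and it assumes a hypothetical README layout in which the first column of each table row links to a spec directory (e.g. `[Paxos](specifications/Paxos)`). That layout and the function names are assumptions, not the actual format of the tlaplus/examples README.

```python
import re
from pathlib import Path

# Hypothetical layout: the first column of each table row holds a link like
# [Paxos](specifications/Paxos); the link target's last path component is the directory name.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)]+)\)")
# A pipe-table separator row such as |---|:---:|.
SEPARATOR_RE = re.compile(r"^\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?\s*$")


def table_entries(readme_text: str) -> set[str]:
    """Collect directory names linked from the first column of every pipe-table body row."""
    names = set()
    lines = readme_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith("|") or SEPARATOR_RE.match(stripped):
            continue
        # Skip header rows, i.e. rows immediately followed by a separator row.
        if i + 1 < len(lines) and SEPARATOR_RE.match(lines[i + 1].strip()):
            continue
        first_cell = stripped.strip("|").split("|")[0].strip()
        link = LINK_RE.search(first_cell)
        if link:  # rows without a link in the first column are ignored
            names.add(Path(link.group(1).rstrip("/")).name)
    return names


def check_table_matches_dir(readme_text: str, spec_dir: Path) -> tuple[set[str], set[str]]:
    """Return (listed in the table but missing on disk, present on disk but not listed)."""
    listed = table_entries(readme_text)
    on_disk = {p.name for p in spec_dir.iterdir() if p.is_dir()}
    return listed - on_disk, on_disk - listed
```

          A CI step would fail if either returned set is non-empty. A real Markdown parser such as goldmark would handle escaped pipes, inline code, and tables nested in other blocks, which this line-based version does not.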

          Regarding GUI steps, it is definitely possible to test and script GUIs with projects like Selenium. Thankfully, though, I only really develop CLI programs.

          Tangentially, I'm also hoping large language models can close the gap between a specification in TLA+ and the implementation of the specified system: “Does this program implement this specification?” It's a correspondence that developers could then check easily, yet meaningfully.