1. 77
  1. 18

    but an architecture diagram can’t give all details without becoming the code it’s meant to represent

    I struggle with this, too.

    Creating the diagram then becomes a three-step process of a) create a template for the diagram b) create tooling to update said template c) throw it all away in favor of a never-realized, fever-dream-induced vision of a series of networked agents crawling through my infrastructure and creating diagrams themselves.

    I, like the author, would love to find something that can be scheduled to update as needed, without needing so much definition that it becomes a separately maintained codebase.

    1. 7

      I’ve made hundreds of PlantUML diagrams for architectures; it’s been my go-to for almost a decade. And while I end up with the diagram I want in structure almost immediately these days, it didn’t happen easily. The idea of taking a system graph as a dump from a system, putting that into something like PlantUML and getting consistently good diagrams I can put in a document seems impossible to me.

      There is just too much information in the order of the adjacency definitions, not to mention the hidden ones we have to add to make it look sane. And then we have the issue of arc labels mostly being placed poorly, and as a result I avoid them when I shouldn’t.

      We probably need a system that has a more tunable layout system than PlantUML. Making “styles” for it is infuriating because of inconsistencies, and in the end you usually find out you can’t do what you want anyway.
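To make the point about adjacency order and hidden links concrete, here is a minimal sketch, assuming a PlantUML component diagram. The helper name `to_plantuml` and the sample component names are hypothetical. It emits edges in the order given (since that order influences layout) and appends `-[hidden]->` links, which draw nothing but still nudge placement.

```python
def to_plantuml(edges, hidden=()):
    """Render edges as a PlantUML component diagram, preserving edge order."""
    lines = ["@startuml"]
    for src, dst, label in edges:
        arrow = f"[{src}] --> [{dst}]"
        lines.append(f"{arrow} : {label}" if label else arrow)
    # Hidden links are invisible in the output but still affect layout.
    for src, dst in hidden:
        lines.append(f"[{src}] -[hidden]-> [{dst}]")
    lines.append("@enduml")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical components, for illustration only.
    print(to_plantuml([("web", "api", "HTTPS"), ("api", "db", "")],
                      hidden=[("web", "db")]))
```

A raw dump from a running system carries neither the ordering nor the hidden links, which is roughly why generated diagrams tend to need hand tuning.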

    2. 16

      I’ve been playing with Mermaid[1], D2Lang[2] and Diagrams(mingrammer)[3]

      • D2 seemed to have better layout by default.
      • Mermaid didn’t support cloud architecture graphs by default. So had to use the python Diagrams library for those. I believe there’s open tickets to add support to mermaid.
      • You can play with all of this here: https://text-to-diagram.com/

      [1] https://mermaid.js.org/

      [2] https://d2lang.com/tour/intro

      [3] https://diagrams.mingrammer.com/
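For anyone who wants to compare the default layouts themselves, one rough approach is to keep a single edge list and render it to more than one text format. This is only a sketch: the function names and the sample services are hypothetical, and it covers just the basic arrow syntax of Mermaid flowcharts and D2.

```python
def to_mermaid(edges, direction="LR"):
    """Render (src, dst, label) tuples as a Mermaid flowchart."""
    lines = [f"flowchart {direction}"]
    for src, dst, label in edges:
        if label:
            lines.append(f"    {src} -->|{label}| {dst}")
        else:
            lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)


def to_d2(edges):
    """Render the same tuples as D2 connections."""
    lines = []
    for src, dst, label in edges:
        lines.append(f"{src} -> {dst}: {label}" if label else f"{src} -> {dst}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical services, for illustration only.
    edges = [("web", "api", "HTTPS"), ("api", "db", "")]
    print(to_mermaid(edges))
    print(to_d2(edges))
```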

      1. 7

        The big advantage of Mermaid is that it’s integrated with GitHub so your diagrams will be rendered in the Markdown preview automatically.

        1. 3

          I spent an afternoon quickly throwing our AWS architecture into Diagrams and I am extremely happy with the result. I think for documentation alone it is simple enough that our lone devops engineer can quickly add and change some of the python code when we make system changes.

          1. 2

            D2 ended up being my choice.

          2. 9

            I sympathise with the author’s hurdles but I think they are symptoms of a process-related problem, not a shortcoming of the tools, or of how they work.

            The root of it is here:

            When my previous engineering manager joined the Atlassian Marketplace team, he asked everyone to draw an architecture diagram. Each came out extremely different.

            The author doesn’t say it but my guess based on this line is that no such architecture diagram existed at the time, for any engineering definition of the word “existed” in any case. Obviously a lot of thought had gone into the system’s architecture, but:

            1. There was no authoritative architecture specification
            2. That was kept under version control
            3. Against which the implementation of every module was reviewed
            4. Which was regularly re-evaluated
            5. And the design of which everyone understood not just in terms of “how” but also in terms of “why” – at least when it came to the overall design (like, why is there an arrow that goes from frontend to Marketplace?) and to whatever modules they were primarily working on.

            That’s why no one ever remembers to visit the Confluence page and add their box in the architecture: because “the architecture” is not part of the project, it’s filed under that “documentation” thing that the organisation (not the Agile product owner, let’s not start a witch hunt here) never actually bothers to budget time for, but instead assigns some meaningless duration like “5% of the overall time”.

            If a project were to include a specific “documentation” headline, with actual time (and money!) budgeted for it, and people had tasks called “document module X” with “architecture”, “implementation” and “testing” sub-tasks, the way they have “test module X” with “unit test”, “integration testing”, “performance/load testing” sub-tasks, it would likely have all these things – up-to-date versions, “zoomed-in” views (that’s normal for large enough projects) etc. “Likely” as in it would get done about as well as everything else in the project.

            Otherwise, all these things just get tossed away like things developers scribble for their own use because that’s what they are. They don’t get all the amenities of project deliverables because they’re not considered deliverables.

            This is not a technical problem. You can’t solve it with better technical tools. People were producing up-to-date architectural diagrams in excruciating detail decades before Visio was a thing, with paper and Rotring pens. There haven’t been any technical shortcomings to producing detailed, up-to-date, very formalized diagrams like these in more than a century – the problems are entirely organisational.

            FWIW I found that tools which go clickety-click are actually excellent for usage under such scenarios. By the time you get a code-based diagram editor to produce a diagram that’s actually fit for permanent and iterative refinement, including such absurd requirements as moving various modules to explore various options in a collective setting while projecting it on a screen, color-coding it on demand based on each module’s bus factor, playing with various views along with the people who write customer-facing documentation and so on, you wind up with a few thousand lines of weird markup that no one understands.

            Maintaining a formal, code-based version of your architecture in addition to all that is cool if you actually use it for formal purposes – e.g. if you use it for code generation, or formal verification. Otherwise, I’d honestly just go with Visio.

            1. 4

              Wow, lots of assumptions! No, there were architecture diagrams and they were maintained.

              The reason for the exercise was because the team had drastic changes and most people were new to the systems. It’s also just a good exercise, it’s very revealing. Try it with your team and see how different the diagrams will be. Simon Brown of the C4 Model loves to do this exercise too, for the same reason.

              I agree that specifically making documentation a task would ensure the architecture diagrams are up to date but I definitely think a part of the problem is that these tools are tricky, time consuming and somewhere else, other than in your code’s repository.

              1. 2

                Try it with your team and see how different the diagrams will be.

                Maybe I am misunderstanding something here? My experience with architecture diagrams is mostly on the systems, firmware and hardware side of things, so we routinely end up with documentation that needs to be accessible to people working on very different abstraction levels. So if you pick two people at random, it’s likely they have pretty different backgrounds, and can’t rely on each other’s common sense to fill in the blanks – not because they lack common sense, but because one might have their common sense trained on video algorithms and the other one on transistors.

                I would be extremely worried if the people on my team came up with radically different diagrams about how the system we work on was put together. It’s one of the things I always pay attention to (not that I can take credit for it, I learned it from someone, too) – people draw these diagrams all the time, in all sorts of informal settings, from onboarding new employees to discussing bugs or optimisations. If their diagrams differ radically, that’s a huge red flag. It means there are different ideas about how the system is put together floating around, so some of the people on the team are writing correct code only accidentally. (Edit: and, to clarify, it’s not their fault. Managing knowledge, and making sure everyone has not just access, but a working understanding of what they need to do their job, is a huge part of software dev management)

                I would obviously expect some details to be abstracted away in some diagrams, on account of both technical reasons (e.g. if the people writing the web UI need to understand intricate details of the kernel architecture, then that architecture is bad) and non-technical reasons (like Conway’s law). But even then, I would expect the right details to be abstracted away – that is, I would still expect people who work in one module to know some of the architectural details of other modules if they’re relevant for their work. For example, I’d be okay if the folks doing high-level graphics code just drew a blob that says “rendering pipeline” in an informal exercise, and not read too much into it. But if, upon being asked to elaborate, some of them would draw something reasonably close to the standard OpenGL rendering pipeline and others would just draw a big framebuffer, I would have a very bad moment of introspection.

                (Edit: most important of all: I’m sorry if I jumped to conclusions – I usually hear these things in… well, that kind of context. That’s no excuse for being mean.)

                1. 8

                  If their diagrams differ radically, that’s a huge red flag.

                  I don’t think so. Like I wrote about, some people have focuses, e.g. frontend vs backend. I’ve been on projects where I know that the frontend is big and complex and integrates with many systems, but I don’t know the specifics. I don’t even need to know the specifics to do my job effectively.

                  I can draw an architecture diagram that’s super useful to many people, but just has “frontend” as a big box. I don’t think that’s wrong. It’s just a view of the architecture.

                  1. 2

                    Well, yes, of course that’s okay – because it’s still the same architecture (of course some details are abstracted away). As long as everyone’s drawing the same architecture, it’s obviously okay if they’re drawing it in different ways. A drawing is just a way to express some knowledge – as long as everyone’s expressing the same knowledge, nothing is lost. It’s a red flag if people start drawing different systems.

                    What I was trying to get at is that picking particular technical means (D2 vs. Visio or whatever) won’t fix organisational problems. If architecture diagrams are ever out of date, that’s because, one way or another, your process allows for people to start writing code before there is a formalized consensus regarding the architecture, or if you really want to keep it super-agile, it allows for people to say they’re done with a task before any architectural changes have been documented. Changing how you draw diagrams isn’t going to fix that.

                    1. 2

                      If architecture diagrams are ever out of date, that’s because, one way or another, your process allows for people to start writing code before there is a formalized consensus regarding the architecture…

                      I don’t think so. We might all agree on the architecture change but just not update a diagram. It’s an easy step to forget, and it’s easy to think it’s not your responsibility to update the hand-crafted pixels that an architect/senior engineer has spent a bunch of time on.

                      Yes, more process can help but… That has the downsides of more process.

                      1. 3

                        We might all agree on the architecture change but just not update a diagram. Easy step to forget or easy to think it’s not your responsibility to update the hand crafted pixels that an architecture/senior engineer has spent a bunch of time on.

                        That’s very much an organisational problem. Does it ever happen that you all agree on, say, adding Webauthn support for release 2.4, but just not add the code, because the person who got the assignment forgot, or thought it wasn’t their responsibility? I hope not…

                        Same thing here. If it’s a mandatory step, it should be a separate task/be a part of some wider task description/review checklist/a separate sub-task/whatever, just like everything else that’s considered important enough that it should not be forgotten. No one remembers every single task they have to do, that’s why we go through the effort of making task lists and hold meetings to assign everything on the list. If it’s not filed along with all the things that have to be done before you can say a task is done, of course it goes out of date.

                        If the person who’s expected to update it doesn’t even know they should update it in the first place, that’s a textbook organisational/leadership failure. Making sure people who are expected to do something know they’re supposed to do it is just a fancy way of saying delegation, that’s literally what the team lead’s job is. If updating the diagram isn’t done because someone was “assigned” but doesn’t know about it, that’s not going to get fixed by changing how it gets done.

                        It doesn’t have to mean it’s because the team lead isn’t doing their job. More often than not updating the docs just isn’t treated as important enough on an organisational level. That’s why bugs get fixed, but documentation isn’t updated to reflect the fixes. Fixing the bugs is considered important enough that they get on Jira or whatever (to make sure they’re not forgotten) and people see “fix X” on their task list (to make sure there’s no doubt as to who has to do it). Documenting it just doesn’t get the “we have to do it” treatment. If it was treated just like bugs, it would get fixed just like bugs.

                        “More process” doesn’t have to be some fancy twenty-step procedure. It can very easily be an additional “documentation” sub-task that goes in every task, just like “unit testing”. Have it go through the same review process, treat it the same as every other part of the task, and it’ll get done just as well as everything else.
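One lightweight way to put documentation into the same pipeline as code is a review-time check over the list of changed files. The sketch below is only an assumption about how a team might do it: the suffix sets and the function name are hypothetical, and a real check would need exemptions for changes that have no architectural impact.

```python
from pathlib import PurePosixPath

# Assumed file types; adjust to whatever the repository actually uses.
DIAGRAM_SUFFIXES = {".puml", ".d2", ".mmd"}
CODE_SUFFIXES = {".py", ".go", ".ts"}


def missing_diagram_update(changed_paths):
    """Return True when a change set touches code but no diagram source."""
    suffixes = {PurePosixPath(path).suffix for path in changed_paths}
    touches_code = bool(suffixes & CODE_SUFFIXES)
    touches_diagram = bool(suffixes & DIAGRAM_SUFFIXES)
    return touches_code and not touches_diagram


if __name__ == "__main__":
    # Hypothetical change set, for illustration only.
    print(missing_diagram_update(["svc/api.py", "README.md"]))
```

A check like this only flags the omission. Someone still has to decide that the flag matters, which is the organisational part.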

                        1. 2

                          It can very easily be an additional “documentation” sub-task that goes in every task, just like “unit testing”.

                          We don’t do that. As part of writing code, you’re expected to test it. Maybe it should be the same with architecture diagrams, but we’re not there yet.

                          1. 2

                            Yep, that’s what I mean. IMO, if having an up-to-date architectural diagram is important, then it should have the same status. Just like code has to be tested before it’s considered “actually written”, it also needs to be documented before it’s considered “actually written”. Otherwise, you will always end up with code that’s written but somehow not reflected in the docs – because people will write it (for the organisation’s definition of write – it’ll be written, tested, reviewed etc.) and move on to other things they have to do. There’s always stuff that has to be done in the pipeline – unless documentation is in that pipeline, it will eventually get preempted.

              2. 2

                While you aren’t describing the author’s experience, you are describing something I’ve experienced. It’s a real problem when nobody – not even managers – understands the architecture which they oversee.

              3. 8

                So instead of a diagram you fail to update over time when changing the implementation, now you’ve got PlantUML (or Haskell) you fail to update over time when changing the implementation.

                If you kept your diagrams in your repo, then you could update them in a PR along with the changes just the same as you could update the PlantUML in a PR along with the changes. But in most cases this won’t happen for cultural/social and time reasons that have nothing whatsoever to do with diagram editors vs PlantUML vs whatever-silver-bullet, and only once you’ve solved those other issues will you make any progress.

                1. 7

                  My experience is that updating things via a GUI diagram editor is tricky and time consuming. My theory is that’s often a big part of why diagrams get outdated.

                  They’re also probably “out of the way” by not being in your repository.

                  1. 4

                    And if they’re kept up to date, these GUI diagram files are also a source of merge conflicts as they’re often a huge binary blob, or a jumbled one-line XML file…

                    1. 1

                      Oh yes, I completely forgot about that issue but it’s a huge one. Thank you for reminding me!

                  2. 2

                    I keep Alloy and some Elixir (in my case) models in the Repo itself, they do get outdated, but having Git control over a text format is for me a long way from using Google Slides.

                    Keeping them updated is a problem, but it comes after you reap some benefits.

                    Not arguing, just contributing with my personal experience because I love the subject

                    1. 1

                      So what’s your solution?

                      To me this is another form of “A bad developer will write bad code in any language.” If the devs don’t do their job, and the process doesn’t enforce it, then the results aren’t going to be good.

                      1. 1

                        Obviously the solution is “enforce a culture of updating the diagrams, don’t imagine a change in tool will magically make that culture appear”?

                      1. 3

                        The problem with code based diagrams is that the automatic layout is often really bad. I’ve gotten some success by carefully tweaking graphviz clusters but the best solution remains, eternally and regrettably, tikz.

                        1. 1

                          I find the Elk layout engine with d2 to be, if not beautiful, pretty good.

                          My big gripe is with fitting a diagram on a letter sized piece of paper.

                        2. 3

                          I agree architecture diagrams should be code. I even made my own tool https://www.sheeptext.com/ to make simple diagrams I typically churn out easy to make. The tool is rough and buggy but I feel like there is potential to make it more declarative. It’s written in Rust so the frontend is WASM and there is a CLI (that I need to publish). I am planning on open sourcing this.

                          I feel like architecture diagrams can’t be automated, and that they go beyond documenting all interfaces and requests/responses. It’s kind of like how when you get better at chess you don’t turn into a human calculator, you’re not just brute forcing, your brain encodes patterns. Architecture diagrams are a very human encoding of patterns. That’s why, as you noted, different people focus on different parts of the system.

                          I also feel like architecture diagrams are the most impactful type of documentation a team can maintain, because they’re an efficient way of sharing mental models.

                          1. 2

                            made my own tool https://www.sheeptext.com

                            planning on open sourcing this.

                            jsyk, if it was open source I’d have made some diagrams today, despite bugs and roughness.

                          2. 3

                            Reminded me of encore.dev, which has a feature called Flow that shows an architecture diagram that’s always up to date.

                            1. 3

                              I hadn’t seen that, very impressive. I’ll spend some time looking at that further. Thank you!

                            2. 2

                              has anyone done this from kubernetes annotations? would be pretty neat to find a language to annotate the architecture from a typical gitops repository that contains all k8s deployments. we started to do this to annotate data flow (mainly to show which microservices read/write to a given topic in a distributed log)
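As a rough sketch of the annotation idea, assuming manifests have already been parsed into dictionaries and that topics are listed under two made-up annotation keys (`arch.example.com/reads` and `arch.example.com/writes` are hypothetical, not an existing convention), the data-flow edges could be collected like this:

```python
READS = "arch.example.com/reads"    # hypothetical annotation key
WRITES = "arch.example.com/writes"  # hypothetical annotation key


def _topics(annotations, key):
    """Split a comma-separated annotation value into topic names."""
    return [t.strip() for t in annotations.get(key, "").split(",") if t.strip()]


def data_flow_edges(manifests):
    """Return sorted (source, target) edges between services and topics."""
    edges = []
    for manifest in manifests:
        meta = manifest.get("metadata", {})
        name = meta.get("name", "unknown")
        annotations = meta.get("annotations") or {}
        for topic in _topics(annotations, WRITES):
            edges.append((name, topic))   # service writes to topic
        for topic in _topics(annotations, READS):
            edges.append((topic, name))   # service reads from topic
    return sorted(edges)


if __name__ == "__main__":
    # Hypothetical deployments, for illustration only.
    manifests = [
        {"kind": "Deployment",
         "metadata": {"name": "orders", "annotations": {WRITES: "order-events"}}},
        {"kind": "Deployment",
         "metadata": {"name": "billing", "annotations": {READS: "order-events"}}},
    ]
    for src, dst in data_flow_edges(manifests):
        print(f"{src} -> {dst}")
```

The edge list can then be fed to whichever text-based diagram format the team prefers.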

                              1. 2

                                I tried a couple tools already: PlantUML/C4, D2, Mermaid and what I thought would be amazing but I found it too hard to use, CML plugin in Eclipse.

                                Today I tend to model high level aspects of architecture using Alloy. For example, I use a signature that represents components and they can have relations representing which systems they send requests to, and whether the requests are sync or async. It is also relatively easy to model job queues, caches.
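The comment above does not include its Alloy model, so the following is not Alloy and not the commenter's model. It is a small Python approximation of the same idea, assuming components are plain names and each request is tagged sync or async. The constraint it checks (no cycle of synchronous calls) is a hypothetical example of the kind of property such a model could state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    src: str
    dst: str
    mode: str  # "sync" or "async"


def sync_cycle_exists(requests):
    """Return True if the synchronous requests form a cycle."""
    graph = {}
    for r in requests:
        if r.mode == "sync":
            graph.setdefault(r.src, set()).add(r.dst)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


if __name__ == "__main__":
    # Hypothetical system, for illustration only.
    requests = [
        Request("web", "api", "sync"),
        Request("api", "worker", "async"),
        Request("worker", "api", "sync"),
    ]
    print(sync_cycle_exists(requests))
```

Alloy would explore such constraints declaratively over all small instances. This sketch only checks one concrete system.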

                                1. 3

                                  Do you have an example Alloy model you can share? I’ve used Alloy before https://asim.ihsan.io/formal-modeling-as-a-design-tool/ and am curious how you model systems, especially queues and caches.

                                2. 2

                                  If someone adds a relationship to a new system, will they even remember to visit the Confluence page to click and drag and draw over the architecture diagrams?

                                  Maybe we should write architecture diagrams using code instead. With code, we can update architecture diagrams within a pull request, version them and quickly modify many of them at once.

                                  What makes someone more likely to open a PR against the diagrams repo than to edit a wiki page? I feel like the article either misses or rushes over that point.

                                  Generating architecture diagrams from the actual code of those architectural components? That would solve the problem above, and it would be fantastic, but it’s a different advantage to the one you (might) gain by storing your diagram definitions in a Git repo.

                                  1. 1

                                    Sorry, I probably did rush over that point.

                                    Two reasons:

                                    1. Editing a diagram on a wiki page using a diagram editor is often tricky and time consuming
                                    2. Ideally the diagrams would be in the same repository as your system

                                    1. 2

                                      Ideally the diagrams would be in the same repository as your system

                                      Does that approach pretty much require you to use a monorepo? If services were spread across multiple repos, finding all the diagrams that mention the service you’re about to change seems like it’d get frustrating.

                                      1. 2

                                        I think op’s idea goes hand in hand with [[Software Registry]] à la Backstage. If you have an automated central registry where you automatically collect all of your repos, then it would be a good place to look for other repos and potentially display those diagrams.

                                        1. 1

                                          Yeah good question, I’ve been slowly moving towards putting multiple services into the same repository. I wouldn’t say “mono” because I still find it useful for teams to have their own repositories.

                                          Not sure, maybe there’s no good single place for most teams. Maybe it’s just my ideal that I’m trying to work towards.

                                        2. 1

                                          I’m drawing software diagrams for my home stuff, and I’m realizing that, while I can’t put all the stuff I deploy in a single repo, what I CAN do is use the d2 link ability to provide a link to the relevant documentation and then repos, and then a back-link from the repo to the documentation page. That turns the diagram/documentation repo into the central “What is where and why do I care?” repository.

                                          It doesn’t need to be in the same repo, but it needs to be useful in the course of normal work.
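A minimal sketch of the linking approach, assuming D2's `link` attribute on shapes: the helper name, service names and URLs below are hypothetical, and URLs containing characters that are special in D2 may need quoting.

```python
def to_d2_with_links(services, edges):
    """Render services as D2 shapes that link out to their documentation."""
    lines = []
    for name, url in services.items():
        lines.append(f"{name}: {{")
        lines.append(f"  link: {url}")
        lines.append("}")
    for src, dst in edges:
        lines.append(f"{src} -> {dst}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical services and documentation URLs, for illustration only.
    services = {
        "backup": "https://example.com/docs/backup",
        "nas": "https://example.com/docs/nas",
    }
    print(to_d2_with_links(services, [("backup", "nas")]))
```

The back-link from each repository to the documentation page still has to be added by hand in that repository's README.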

                                      2. 1

                                        niiiiiiice

                                        1. 1

                                          I’m impressed by anyone who can get automatic diagramming tools to produce anything that’s readable. I’ve tried, but all my graphviz/plantuml/diagram diagrams look like hot garbage unless I explicitly position components relative to each other.

                                          I hope this isn’t an indication that our architectures are flawed.

                                          1. 1

                                            KeenWrite, the FOSS Markdown text editor I’ve been working on, includes the ability to render plain text diagrams via Kroki†. See the screenshots for examples. Here’s a sample Markdown document that was typeset using ConTeXt (and an early version of the Solare theme).

                                            One reason I developed KeenWrite was to use variables inside of plain text diagrams. In the genealogy diagram, when any character name (that’s within the diagram) is updated, the diagram regenerates automatically. (The variables are defined in an external YAML file, allowing for integration with build pipelines.)

                                            Version 3.x containerizes the typesetting system, which greatly simplifies the installation instructions that allow typesetting Markdown into PDF files. It also opens the door to moving Kroki into the container so that diagram descriptions aren’t pushed over the Internet to be rendered.

                                            †Kroki, ergo KeenWrite, supports BlockDiag (BlockDiag, SeqDiag, ActDiag, NwDiag, PacketDiag, RackDiag), BPMN, Bytefield, C4 (with PlantUML), Ditaa, Erd, Excalidraw, GraphViz, Nomnoml, Pikchr, PlantUML, Structurizr, SvgBob, UMLet, Vega, Vega-Lite, and WaveDrom.

                                            Note that Mermaid diagrams generate non-conforming SVG, so they don’t render outside of web browsers. There is work being done to address this problem.

                                            1. 1

                                              I do feel like you can get some of the way here by generating import cycle diagrams. Your most atomic level is just the raw import graph, and then you can layer some groupings on there (for example, all of the serializers.py files belong at the API layer, even if they’re in different directories) and you have at least one way of setting this up.

                                              This is also where stuff like Python shines, because you get a bunch of reflection “for free”, and you can add some tweaks to generated results by being judicious. You can pull out docstrings as well!
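The import-graph idea can be sketched with the standard `ast` module. This is a simplified illustration, not a complete tool: it reads source strings rather than files, ignores relative imports, and the layer mapping (by the last component of the module name, e.g. every `serializers` module goes to an "api" layer) is a hypothetical convention.

```python
import ast


def imports_of(source):
    """Return the sorted module names imported by one Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return sorted(found)


def layer_of(module, layers):
    """Map a dotted module name to a layer via its last component."""
    return layers.get(module.rsplit(".", 1)[-1], "other")


def layer_edges(sources, layers):
    """Collapse the module import graph into edges between layers."""
    edges = set()
    for module, source in sources.items():
        for target in imports_of(source):
            if target in sources:  # skip stdlib and third-party imports
                a, b = layer_of(module, layers), layer_of(target, layers)
                if a != b:
                    edges.add((a, b))
    return sorted(edges)


if __name__ == "__main__":
    # Hypothetical project layout, for illustration only.
    sources = {
        "shop.orders.serializers": "import shop.orders.models\nimport json\n",
        "shop.orders.models": "import datetime\n",
        "shop.orders.views": "from shop.orders.serializers import OrderSerializer\n",
    }
    layers = {"serializers": "api", "models": "data", "views": "web"}
    print(layer_edges(sources, layers))
```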

                                              There are libraries like django-fsm, where you define state machines that can kind of make things clearer. You have to trade off expressiveness, but honestly for a lot of problems that works fine! This isn’t the same as an architecture diagram, but I think some similar ideas might be had.

                                              One idea I’ve had floating around for a while is a DSL that can describe most business logic, and spits out that logic in a “nicely formatted” way. Basically a wrapper around code, but which would let you offer “execution traces” to explain to stakeholders (for example) how the latest invoice to a customer was calculated, or why a notification went to A but not to B.
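A very small sketch of the execution-trace idea, assuming rules are just (description, function) pairs applied in order. The rule list and names are hypothetical, and a real DSL would need branching, inputs beyond a single amount, and so on.

```python
from decimal import Decimal


def run_rules(rules, amount):
    """Apply each rule in order and record a human-readable trace line."""
    trace = []
    for description, apply in rules:
        before = amount
        amount = apply(amount)
        trace.append(f"{description}: {before} -> {amount}")
    return amount, trace


# Hypothetical invoice rules, for illustration only.
INVOICE_RULES = [
    ("10% loyalty discount", lambda a: a * Decimal("0.90")),
    ("flat shipping fee of 5", lambda a: a + Decimal("5")),
]

if __name__ == "__main__":
    total, trace = run_rules(INVOICE_RULES, Decimal("100"))
    for line in trace:
        print(line)
    print("total:", total)
```

Because the description travels with the rule, the explanation shown to a stakeholder cannot drift from the logic that actually ran.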