1. 26

Something under a year ago I joined a growth-stage startup as Head of Architecture, to bring a bit of higher-level and longer-term thinking to our software engineering work. I think it’s going well, and the CTO and the team seem to think so too, but how can we measure that?

From the perspective of “is this going well for the company”, it seems like many of the things we do as “architecture” are only indirectly related to the outcomes - so I could ask something like “did we become more efficient as a software engineering team” and get an answer that doesn’t necessarily tell me anything about software architecture. Or they are too long-term: “did you do that thing you said we should do 12 months ago” can be answered ‘yes’ for bad reasons and ‘no’ for good ones.

There are also things you can measure but don’t want to measure and certainly don’t want to optimise for: “how many meetings between stakeholders and engineers were there” or “did you produce any diagrams”.

Then from the personal perspective, there’s the usual leadership measurement problem that I only do some of the delivery work resulting from my decisions or recommendations.

Have you seen a software engineering team where software architecture worked well, and if so how do you know that it worked well?

  1. 16

    I think it might be interesting to look at the blast radius for incidents, though; I have no idea what kind of software you work on…

    But imagine you have a single rails app that does everything for your customers and it goes down, due to a database failure, or something… what happens to your availability? It goes to 0 for the time the outage lasts. The blast radius of the database outage blows up your entire kingdom.

    You’re smarter than that, of course, so you move to a service oriented architecture. Then, a database goes down, and the whole kingdom blows up again.

    You correct that by giving each service its own database, and when the database for the “private messages” feature goes offline… you’ve affected part of your customer base, but only a small part of the entire kingdom is suffering— the blast radius was reduced, and your customers are happy because you’ve made decisions that keep partial availability under most problems that come up, so total outages are very rare.
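
    A minimal sketch of that containment idea (all names here, like MessagesStore and handle_request, are hypothetical, and the “outage” is simulated): each feature talks to its own store, so a failure in one store degrades that feature instead of the whole app.

    ```python
    # Sketch: per-feature stores so one outage degrades one feature,
    # not the whole kingdom. Everything here is a toy stand-in.

    class StoreDown(Exception):
        """Raised when a feature's backing database is unreachable."""

    class MessagesStore:
        def fetch(self, user_id):
            raise StoreDown("messages db offline")   # simulate the outage

    class ProfileStore:
        def fetch(self, user_id):
            return {"user": user_id}                 # healthy store

    def handle_request(feature, store, user_id):
        try:
            return 200, store.fetch(user_id)
        except StoreDown:
            # Contain the blast radius: this feature degrades, the rest stay up.
            return 503, {"error": f"{feature} temporarily unavailable"}

    print(handle_request("messages", MessagesStore(), 42))  # (503, ...)
    print(handle_request("profile", ProfileStore(), 42))    # (200, ...)
    ```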

    Also, similarly, look at the number of self-inflicted incidents caused by bad changesets/deploys. Your software engineering practices should explicitly target reducing these, and your architecture is very important for containment here.

    1. 1

      I like the “blast radius” analogy, thanks :). “Are we encountering fewer problems and having less difficulty as a result of those we do encounter” is a great thing to measure, along with “are we trying to find new problems in areas we couldn’t have thought about/would have run away from before”.

      1. 1

        The question is, less difficulty compared to what? To “us” doing almost exactly the same thing but with controlled bits different in a parallel universe? Because that’s the only way to compare apples to apples.

        1. 2

          Progress is measured by comparing parallel universe today to parallel universe yesterday.

          It’s important to remember that not all forward motion is progress; it’s possible for a codebase to grow simultaneously less stable, harder to work on, and less feature-rich over time.

          1. 1

            I agree. “Do we have this problem less often than we did before” could be a goal, but it takes a bit of asking why to find out whether that’s because we designed it out of the solution, or because we employ a bunch of avoid-that-problem people.

    2. 5

      I’ll respond with a question: What are you valuing in the solution?

      If you’re valuing software performance, what metrics are you watching? Queries per second? Mean response time?

      If you’re valuing security, what metrics are you watching? How many vulnerabilities are reported? Exploited?

      If you’re valuing maintainability, what metrics are you watching? Mean time to resolution for reported issues?

      If you’re valuing overhead reduction, are you tracking the participants and length of meetings so that management can get a sense of what meetings (and features) cost?

      If you’ve not already determined the things that you value in the solution, peruse the list of quality attributes and pick a few. Maybe five or six. Now prioritize them and choose one that is more important than all of the others. Start with that one when you think of all of the data that you can collect to indicate the behavior of the system with respect to that quality attribute. Note that I’ve very carefully avoided saying “measure” or “metrics” in that previous sentence: you’re observing and recording enumerable behavior. You want hard numbers, quantities and not qualities; quantities can be shown on a graph and weighed against other graphs. Quality assessment should surface in the retrospective meetings that you conduct regularly.
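
      For example, a toy sketch of recording quantities for one prioritized attribute (response time here; the samples are invented) so each day or week becomes a point on a graph:

      ```python
      # Sketch: summarize raw response-time samples into graphable quantities.
      import statistics

      latencies_ms = [12, 15, 11, 210, 14, 13, 16, 180, 12, 15]  # one day's samples

      def summarize(samples):
          ordered = sorted(samples)
          p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
          return {
              "mean_ms": statistics.mean(ordered),
              "p95_ms": ordered[p95_index],
              "count": len(ordered),
          }

      # One point per day/week on the graph; compare graphs, not gut feelings.
      print(summarize(latencies_ms))
      ```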

      This aside, I can think of a couple of things for a quick check:

      • How painful is it to change major parts of the system? If you wanted to turn some internal function into an internal service à la SOA, but not yet a remote service, how painful would it be to accomplish that? (See the sketch after this list.)
      • Given any process in the system, e.g. an application process, a database process, a web server process, a log collector process, what happens when that process goes down? How quickly can normalcy be restored, and how manual is the recovery?
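
      A toy sketch of that first bullet’s “internal service, not yet remote” seam (PricingService and friends are made-up names): if introducing an interface like this is painful, that pain is itself a measurement.

      ```python
      # Sketch: a function promoted to an in-process "service" behind an interface.
      from typing import Protocol

      class PricingService(Protocol):
          def quote(self, sku: str, qty: int) -> float: ...

      class InProcessPricing:
          """Today: a plain method call behind an interface."""
          def quote(self, sku: str, qty: int) -> float:
              return 9.99 * qty

      # Later, a RemotePricing with the same interface could make a network
      # call instead; callers like checkout() never change.
      def checkout(pricing: PricingService, sku: str, qty: int) -> float:
          return pricing.quote(sku, qty)

      print(checkout(InProcessPricing(), "sku-123", 3))
      ```
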
      1. 5

        If you’re scared of how a growth-stage startup CTO is judging your work:

        • Tribes - Is the company making a profit, or does it have piles of capital? If not, they will judge you on how much you show you’re committed to getting the company back to profit / capital, no matter your role. Another way of wording this: are you part of the groupthink or not? If you’re not part of the tribe, doing a good job is going to be pointless if you’re not high level enough to have a board back you up.

        • Commitments - Are your teams accomplishing their goals “on time”? If not, they will judge you on your failed commitments. Most likely your management has a severe lack of experience in executive roles, and being able to meet commitments is going to be magic to them. If you aren’t meeting your commitments - you are out of their control.

        • Bugs & PR - are your customers largely happy or disappointed in the quality of your product/solution? Could sales/bizdev blame you for lost accounts? This is where things start to get a little less political: this is about product quality control. How many bugs do you ship with, and how committed is the organization to handling any damage those bugs might cause?

        So, how do you judge your architecture in the face of this?

        • Tribal values - Your software should be as fungible in each direction as your company’s commitments are. If you are developing a backend for a mobile application - don’t worry about writing OS/kernel portable C++. But should you be able to give an estimate for adding Android support to your backend in a month? Maybe.

        • Commitments - Can people add things to your system without having to understand all of it? Are teams’ commitments independent, so failures don’t impact your entire system? Being able to have teams that can solve common problems, rewrite entire subsystems, etc. without killing the productivity of the rest of your team is usually the sign of a relatively decent set of abstractions.

        • Bugs & PR - if you’re not tracking defects in any statistical manner, then start now. You have no idea how bad the situation is until you have some visibility into your defect rate. One of the most useful stats, which may be much harder to get in “modern” software engineering processes, is the defect removal rate of each part of your engineering process. If you can’t get that, then you should at least have some visibility into how to associate parts of your product and parts of your process with the bugs that came out of them.
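
        A toy sketch of the per-stage defect removal idea (the stages and counts are invented; real data would come from your tracker):

        ```python
        # Sketch: how much each stage of the process removes, given where
        # defects were ultimately caught. "production" entries are escapes.
        stages = [            # (stage, defects caught there)
            ("code review", 40),
            ("CI tests", 25),
            ("QA", 15),
            ("production", 10),
        ]

        remaining = sum(count for _, count in stages)
        for stage, caught in stages:
            print(f"{stage:12s} caught {caught:3d} of {remaining:3d} "
                  f"({caught / remaining:.0%})")
            remaining -= caught
        ```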

        1. 3

          Those are some good ways to consider the architecture, thank you. I’m not scared of how my manager is judging me - we both seem happy with what’s going on - my motivations are to provide meaningful measurement of progress/benefit to the company, and from the selfish/career-centric side to have something more concrete than “well we both feel OK about this” in career reviews.

          1. 2

            You are thinking the right things as far as I can tell, and this was an awesome thread.

            My point, which I admit was belabored, was that the most successful technical architectures for companies are not the ones with the best technical output, but the best social output. I have many scars around that issue, because a lot of the time I thought I was solving a social issue that was actually technical, or vice versa. Being explicit about how they map to each other can help you navigate all kinds of issues, not just with management.

        2. 4

          I’d start with defining what the characteristics of a “good” architecture are. For me it’s things like:

          1. The system is reliable (new features are integrated quickly and without unwanted side effects).
          2. The architecture is easy to describe and understand (minimum complexity).
          3. The system is easily supported (it’s transparent what the system is doing from end to end, and quick to discover the cause of unexpected errors).
          4. The system is flexible and easy to extend (think of a dependency graph: is it easy to make small changes without affecting other unrelated parts of the system?).

          Once you’ve thought through what a “good” architecture means for you and your team, you could do retrospectives or other after-the-fact analysis on development work, e.g. for point 4 above: “what % of the dev effort on that feature was related to refactoring the system to support the feature, as opposed to actually implementing the logic directly related to that feature?” I know that seems a bit vague and subjective, but having the conversations with your developers will bring up more opinions about the quality of the architecture.
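
          One low-ceremony way to get at that point-4 question is to tag work items and compute the split afterwards; a toy sketch (the tags and hours are invented):

          ```python
          # Sketch: share of effort spent reshaping the system vs. building
          # the feature itself, from hypothetical tagged work items.
          work_items = [
              ("extract billing interface", "refactor", 16),
              ("add invoice PDF export",    "feature",  12),
              ("migrate invoice schema",    "refactor",  8),
              ("export button + wiring",    "feature",   4),
          ]

          refactor = sum(h for _, kind, h in work_items if kind == "refactor")
          total = sum(h for _, _, h in work_items)
          print(f"refactor share: {refactor / total:.0%} of {total}h")  # 60% of 40h
          ```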

          1. 4

            The success of your process might partly be measured as the inverse of how much technical debt and how many problems you’re keeping out as you go along. On that end, I have two articles showing ways to measure technical debt in systems in a way you can act on. Those indicators might go into your own presentations to justify your actions.

            1. 3

              I have a small comment on tangential things here that might be helping and is easy to set up: https://lobste.rs/s/mkltez/i_am_proposing_refactor_fridays#c_ry7asx

              The most important thing is to figure out actual pain points. Which component failed regularly? Which pattern led to hold-ups? What are the things I frequently find myself building around? Which are the things that may seem unwieldy, but actually never lead to failures? Which part of the system can people not wrap their heads around?

              1. 3

                Let me try and have a go at it :-)

                The sad truth of it is: no, we can’t, because we can’t compare what we do with what would have happened if we’d done something differently - there would be other factors at play.

                • We’re having fewer outages than our competitors or other teams doing kind of the same thing? Maybe it’s our architecture, but maybe we just hired better people, or what we do differs from what they do in some key aspect.
                • We used to have more outages, changed the architecture, and now there are fewer of them? Well, maybe we just got more experience over this time and our old architecture would’ve served just as well.
                • We switched from that (clunky static|incoherent dynamic) language to this (expressive dynamic|reliable static) language and are happy about it? Chances are, people on this particular team just prefer this mental model over that one.

                About that last point… This is what it all comes down to, for me: if this particular team doing this particular project feels happy about the architectural choices being made, then they’re good choices. We can’t measure it objectively (and it may even be impossible to). And yes, it means that another team might do better, but this is irrelevant because there’s no “other team” anyone could just put on this project at no cost. The quality of an architecture is a function of the team anyway.

                1. 1

                  Thanks! Capturing that subjective side seems important. The architecture is going to have as much to do with team quality-of-life as with objective capability measures, and the relationship runs both ways; a happier team will not want to subvert or “work around” the architecture, and a supportive architecture will make the team happier.

                2. 2

                  I doubt there is a measurable way to know if architecture is good or bad. I expect the best you can do is qualitative measures. Often you anticipate the need for flexibility in the future in a component or subsystem, so you elaborate it and create more abstractions. Sometimes you buy flexibility you don’t really need, but now you’re paying for it every time you need to make a small change. You can have too much flexibility, which makes it hard to determine which subsystem should be responsible for something because there are many associated subsystems that are all flexible enough to take on the burden.

                  There are a lot of ways to fail at architecture, but most of them aren’t measurable. Do you have too many systems? I like @apg’s “blast radius” analogy, but you have to weigh the cost of splitting your systems up and maintaining them in that fashion against the cost of the downtime. Overarchitected systems tend to make some classes of work harder or take longer. Are you paying for complexity you’re not benefiting from? Good architecture, I think, is really about aligning your costs and your complexities with the areas where you should be spending.

                  1. 2

                    On how to measure, a simple question I like to ask is:

                    When making a (typical) change or adding a (typical) feature to our software, is the amount of work required proportional to the scope of the change or feature?

                    I’m sure everyone can come up with a lot of exceptions to that rule, but if typical changes require a disproportionate amount of work relative to their scope, your architecture is bad. It may be good on certain metrics, but it fails at being viable.

                    In Clean Architecture, Uncle Bob alludes to the fact that if the “shape” of the architecture dictates the amount of work required rather than the scope of the features / changes, that is a sign of an inadequate architecture, which is pretty much the same thing.
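
                    A toy sketch of that proportionality check (the scope points and effort figures are invented): if the effort-to-scope ratio is wildly uneven across typical changes, the architecture’s shape is dictating the cost.

                    ```python
                    # Sketch: flag changes whose effort is disproportionate to scope.
                    changes = [  # (change, scope_points, actual_days)
                        ("add export format", 2, 1.5),
                        ("new user role",     3, 2.0),
                        ("rename a field",    1, 6.0),   # red flag
                        ("extra report page", 2, 1.0),
                    ]

                    for name, scope, days in changes:
                        ratio = days / scope
                        flag = "  <-- disproportionate" if ratio > 2 else ""
                        print(f"{name:18s} {days:4.1f}d / {scope}pt = {ratio:.1f}{flag}")
                    ```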

                    1. 2

                      Your objectives are the organizational objectives… Because your software architecture is coupled to the requirements of the organization!

                      -> Good things to measure IMHO.

                      • Number of bugs reported. Months/Days/Hours/Minutes of turnaround on a reported issue. (Make sure to count triage as turnaround!)

                      • Number of features/stories requested. Months/Days/Hours/Minutes of turnaround on requested features/stories. (Make sure to count triage as turnaround!)

                      • Downtime. Months/Days/Hours/Minutes that a service is down. (Make sure to count time spent in triage as downtime!)
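
                      A toy sketch of measuring turnaround so triage is counted by construction (the timestamps are invented): start the clock at the report, not at the start of work.

                      ```python
                      # Sketch: turnaround from report to resolution, triage included.
                      from datetime import datetime

                      issues = [  # (issue, reported, triaged, resolved)
                          ("BUG-101", "2023-03-01 09:00", "2023-03-02 10:00", "2023-03-03 17:00"),
                          ("BUG-102", "2023-03-01 11:00", "2023-03-06 09:00", "2023-03-06 15:00"),
                      ]

                      fmt = "%Y-%m-%d %H:%M"
                      for name, reported, triaged, resolved in issues:
                          t0, t1, t2 = (datetime.strptime(s, fmt) for s in (reported, triaged, resolved))
                          print(f"{name}: triage took {t1 - t0}, total turnaround {t2 - t0}")
                      ```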

                      -> Real stories

                      Sitting down with the team, we celebrated that we’d gotten our reported-exception remediation time down from 1 week to 2 days. Graphing it, we could see that our time to fix changed soon after we started releasing nightly instead of every 2 weeks. (Duh! But getting our tooling up to speed to support nightly rolling releases was a project.) The overall exception rate went down as well, because issues didn’t linger for as long. (Countering our, and the CTO’s, #1 fear going into the effort: that releasing more frequently would somehow make us careless.)

                      A couple of developers spent a couple of months refactoring our big-hairy-API-module, but when we charted our velocity it turned out that the refactor didn’t actually help. There was an upwards inflection after the effort was finished, but looking closely at the kinds of work being checked in, it was clear that we were doing more by supporting the next thing instead of waffling on the last thing that hurt us. The issues we thought would go faster because of the refactor didn’t, whoops. Maybe next time we should check our backlog and make sure the refactor is in support of something tangible.

                      But when we did a partial refactor of how we were handling application styling, we could turn around feature requests on one part of the product in a week instead of a month - better believe we saw that!

                      When we started, we didn’t have a PM or a backlog. Having a backlog let us measure some of this stuff and know when our efforts were wasted vs. when our efforts were excellent… your best architectural change might be an organizational one.

                      By vertically integrating modules into stacks and handing whole stacks to developers, we had fewer integration meetings and we were more productive. (Like… way more, it shows on the charts!) But without the charts, who would we be to say that changing our approach on June 7th had any effect on June 14th?

                      Lots of stuff we tried didn’t matter one bit, and in retrospect it’s not because the effects weren’t quantifiable, it’s because the effects weren’t significant.
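
                      A toy sketch of that significance point (the weekly counts are invented; a real analysis would want more weeks and a proper test): a before/after comparison in units of pooled standard deviations.

                      ```python
                      # Sketch: crude before/after check for a change made on some date.
                      import statistics

                      before = [11, 9, 12, 10, 11, 10]   # items shipped per week
                      after  = [14, 15, 13, 16, 14, 15]

                      mb, ma = statistics.mean(before), statistics.mean(after)
                      sb, sa = statistics.stdev(before), statistics.stdev(after)
                      pooled = ((sb ** 2 + sa ** 2) / 2) ** 0.5

                      # A diff of a couple of pooled SDs is worth a look; less is
                      # probably the "quantifiable but not significant" bucket.
                      print(f"before {mb:.1f}, after {ma:.1f}, diff {(ma - mb) / pooled:.1f} SDs")
                      ```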

                      1. 1

                        Consider that the effectiveness of an architecture for large-team, complex projects can only be determined in retrospect, in the long term. This is particularly the case for external domains, where the problem space is outside the domains known by the IT implementers and the knowledge is held by external domain experts.

                        One might say that architecture is a reflection of non-functional requirements. Both sides of that equation sadly have no standard practices or processes that can be readily categorized and compared. Do you know any?

                        We (as an industry) used to try to measure project success in terms of delivery on-time, on-budget and to specification. The agile movement has led to success just being some vague measure of “satisfaction”. The “satisfaction” that comes from meeting (likely not formally stated) non-functional requirements is simply subjective and vague until the application is used long term.