1. 61

Original paper here: https://www.microsoft.com/en-us/research/publication/the-influence-of-organizational-structure-on-software-quality-an-empirical-case-study/

  1. 15

    That definitely tracks with my experience. And it makes me think of SQLite, widely regarded as extremely high quality code, and developed by a team with minimal organizational complexity.

    Personal engagement: SQLite’s developers know each other by name and work together daily on the project.

    Fossil Versus Git § 2.5.1 Development Organization

    95% of the code in SQLite comes from just four programmers, and 64% of it is from the lead developer alone.

    Fossil Versus Git § 2.5.2 Scale

    1. 7

      I’ve spent some time trying to find the magic words or slogan to help companies with this problem. I think they see it, they just are unable to deal with it.

      Here’s the best I’ve got so far:

      Good companies approach a problem, they thrash, they organize, they codify, and then they automate. Bad companies approach a problem, thrash a bit, then assign it to a person.

      It’s a good way to end up with a lot of people, all personally responsible for “impossible” problems.

      1. 3

        Fear is a good form of persuasion. I wonder if there’s a way to frame it so that everyone who can cause “social” problems (many levels between them and decision makers, lots of ex-engineers on a project, lots of engineers in general, …) is somehow afraid to do so.

        1. 9

          Much unneeded organizational complexity is driven by fear/risk-mitigation motivations. Things are applied too broadly and no trade-offs get discussed. At the policy level, it’s easy to grok that if we make a list, process, or committee for some risk, the risk goes away. What you don’t grok is how that impacts everybody else. And to your point, the tendency is for nobody to tell you. A hundred people each adding 5 minutes to a dev’s week might all have great reasons to do so, and the risks might be dire. But these things add up. Who wants to go fight a battle to save 5 minutes? Even if you win, you still lose.
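
          The comment’s “these things add up” point can be made concrete with back-of-the-envelope arithmetic (the 100-policies and 5-minutes figures are the comment’s own hypothetical numbers, not measured data):

          ```python
          # Illustrative cost of many individually tiny policies, per dev per week.
          policies = 100        # independent rules/processes touching each dev
          minutes_each = 5      # weekly cost of one policy for one dev
          hours_per_dev_per_week = policies * minutes_each / 60
          # ~8.3 hours: more than a full workday per dev, per week,
          # lost in 5-minute slices nobody will fight over individually.
          ```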

      2. 3

        Thanks for linking the original! Thoughts after reading it (from phone, so terse):

        • for the org complexity model, the unit of observation is the module, but the unit of treatment is the sub-org. So N = #sub-orgs, not #modules, which makes the study’s sample size a lot smaller. Observing more modules by the same sub-org is repeated measures, not independent observations.
        • these combos of org complexity variables are highly correlated with the sub-orgs, so we cannot distinguish between ‘this combo of vars caused it’ and ‘other unobserved vars experienced by this sub-org caused it’
        • why dichotomize ‘# failures in module’ into ‘error-prone/not error-prone’? That’s throwing away information.
        • the logistic model predicts a probability, so why dichotomize that into a binary prediction? That’s throwing away info even worse: now you don’t know if and when your model was confident.
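
        Both dichotomizations can be seen in a toy sketch (module names, failure counts, and probabilities are made up for illustration; this is not the paper’s data):

        ```python
        # Hypothetical post-release failure counts per module.
        failures = {"net.dll": 1, "kernel.dll": 47, "ui.dll": 0}

        # Dichotomizing the outcome: a 1-failure module and a 47-failure
        # module collapse into the same "error-prone" label.
        error_prone = {m: n >= 1 for m, n in failures.items()}

        # Dichotomizing the prediction: a barely-confident probability and a
        # near-certain one collapse into the same binary call.
        predicted_prob = {"net.dll": 0.51, "kernel.dll": 0.99, "ui.dll": 0.04}
        predicted_label = {m: p >= 0.5 for m, p in predicted_prob.items()}
        ```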

        more later.

        1. 3

          Continued, tho still on phone:

          • what worries me about these measures of org complexity is that they are easily measured. What if the true causes of bugs share a common cause with org complexity but are not as easily measured, while org complexity itself has no effect? Then we’d be left adjusting something that is merely correlated with our bugs, not their cause. Just because the model predicts well does not mean the relationship is causal.

          • precision, as a metric, is correct only for given threshold and prevalence (and ‘model expertise’, most easily visualised as its ROC curve). It will automatically improve (go up) if you set the threshold higher, and also go up when bugs become more common, because in both cases the number (ratio) of false alarms goes down. Better to report the ROC curve (which is prevalence-independent) + the precision for the observed prevalence and chosen threshold. See Luke Oakden-Rayner’s blog on this. Now I can’t distinguish model quality from threshold choice from prevalence of error-proneness among modules.

          • they mention a ‘lower confidence bound’ below which they call the module ‘not error-prone’, but don’t mention how to compute it.
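
          The threshold/prevalence sensitivity of precision is easy to demonstrate with invented scores (this is a generic sketch, not the paper’s model):

          ```python
          def precision_at(threshold, pos_scores, neg_scores):
              """Precision = TP / (TP + FP) among cases scored at or above threshold."""
              tp = sum(s >= threshold for s in pos_scores)
              fp = sum(s >= threshold for s in neg_scores)
              return tp / (tp + fp)

          pos = [0.9, 0.8, 0.7, 0.6]             # truly error-prone modules
          neg = [0.75, 0.5, 0.4, 0.3, 0.2, 0.1]  # healthy modules

          # Same model, higher threshold: precision rises from 0.8 to 1.0.
          p_low = precision_at(0.55, pos, neg)   # 4 TP, 1 FP
          p_high = precision_at(0.78, pos, neg)  # 2 TP, 0 FP

          # Same model, same threshold, doubled prevalence: precision rises again.
          p_prev = precision_at(0.55, pos * 2, neg)  # 8 TP, 1 FP
          ```

          The model’s ranking ability never changes here; only the threshold and the class balance do, which is exactly why a lone precision number can’t separate model quality from those two choices.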

          I do like the study because it could be on to something, so we should deffo explore org measures when thinking about what kinds of experiments to run next and which vars to manipulate to test its findings. But it really needs replication across a lot of different orgs, to make sure its success is not due to some unknown vars that at Microsoft happen to be correlated with the org chart.

        2. 2

          Thinking about my company when reading this paper… Our “user platform (dashboard/api)” is a monolith that has become such a blocker for new products that they’ve started accepting code from developers on other teams. I think the organizational complexity / structure led to the creation of this monolith: a humongous team of onshore/offshore devs built it.

            1. 3

              Works fine now.

              1. 1

                Maybe it’s blocked by geolocation? Your profile says you’re from Russia :/

                1. 0

                  And so? Why should Microsoft block Russians?

                  1. 1

                    Not saying they should, just that they may have decided to.

                    1. 1

                      Still a weird first assumption. Also, the post prior to yours was from a Russian saying it works now.

                  1. 1

                    That’s weird. Works for me, both that website and the linked PDF.

                    1. 1

                      Works here too.