1. 17

Inspired by this post on Stack Exchange. I’m posting it as a text because I’m much more interested in how other lobsters answer the question than in holding up what the SE folks said.

In The Mythical Man-Month, Brooks shares the idea of a “surgical team”, where only a few people write the core code and everybody else does some form of support. The system seems appealing to me, but it hasn’t been adopted at all in the wild. I’m curious as to why, and what’s wrong with it in people’s experience.

SE had a few answers I don’t find that convincing:

  • It assumes only one person has a computer. When everybody has a computer, the benefits go away. It seems to me the core idea is not so much “we don’t have enough computers” as “specialization is useful”. We might not need a program clerk anymore, but a VCS expert would still be valuable.

  • In Agile, everybody should have code ownership over everything. In practice I don’t see that often; huge chunks of the system still have a bus factor of 1 or 2. Arguably, having a specialized documentation role might even increase the bus factor by reducing the amount of oral tradition.

  • How do we pick the surgeon? Everybody thinks they’re the top programmer. At least in the web dev world, there’s a shortage of senior programmers and a glut of juniors. I can see surgeon-style, if done right, working out with one senior as the surgeon and 2-3 juniors in lighter, specialist roles. This might also help with the catch-22 of “nobody wants juniors, but then juniors don’t become seniors”.

  • It’s more expensive. Probably :(

Those are my initial thoughts, and I’m interested in hearing yours.

  2. 16

    It’s expensive. And everybody wants to be the hero. And companies have internalized this attitude and even advertise it as a selling point. “Here at cube farm, everybody writes code and tests and documentation and full stacks.”

    Ultimately, we’ve made coder a status position, and nobody wants to do nonstatus work.

    The coding clerk is a good position to analyze. Only at the very largest companies (Facebook, Google) do you see this position return, as dedicated tools teams building Mercurial extensions, etc. The work only gets done when the scope of the role is so vast it becomes a coding position in its own right and status is restored.

    1. 1

      I wonder if the concept of a “surgical team” is most alive today in the BSD operating systems. Aren’t there usually smallish self-forming teams that generally handle certain parts of the system (drivers, kernel, networking, etc)? Maybe LibreSSL would also be a reasonable example of a “surgical team”.

      1. 1

        I think that’s specialization of dev, but that happens in every company, too. The Windows team isn’t 1000 engineers all working on everything. The surgical team really meant the programmer programmed, and everything else from reading email to answering the phone would be handled by somebody else.

    2. 5

      It is important to note when and in what context the book was written (1970s operating systems). Rolling out a fix to a bug for customers was probably glacial.

      Things have drastically changed in how we are expected to deliver software. The surgical team approach would still be valuable for shipping things like embedded systems today. However, when continuous delivery is in play and teams are constantly iterating over new requests and features, it would be an organizational mismatch.

      1. 3

        I can only imagine it’s because in most places, the number one reason to hire a new developer is that you want to increase your bandwidth in terms of LoC. The prospect of hiring a developer not to write as much of the code you need the most right now runs counter to that.

        1. 3

          Maybe I’m misunderstanding this model, but it sounds a lot like what Jessica Kerr discusses in “Hyperproductive development”.

          I did recently find myself in a role not entirely unlike the surgeon role. While the surgeon model was great for the initial bringup of a new project, it didn’t scale nearly as well once customers started to adopt and we needed to screen incoming questions and bug reports. There’s just too much triage required and issues started piling up until the rest of the team learned to debug things to a root cause. But being able to do that debugging requires a level of understanding and intimacy with the code that seems to run counter to this model.

          I also found this model to be highly conducive to causing burnout.

          1. 1

            This is probably the best reason I’ve heard so far.

            1. 1

              If your ‘assistants’ weren’t finding and explaining root causes or triaging incoming work, how were they assisting you?

              1. 1

                Handling small components or bits of leafy development that were less core to the system. Work that was well separable from the main development.

            2. 2

              The expanded access to PCs and commoditization of developers probably caused styles like that to go away. I think the team roles part might still exist in startups or FOSS, where founders or the core team make the biggest decisions on important code. Then, as users or customers come in, more developers are hired for extensions, bug fixes, and support.

              1. 1

                It’s (perceived to be) more risky by the people making the decisions.

                Firstly, because they feel that losing the ‘surgeon’ could sink the entire project (I’m not sure how that risk stacks up against the relative cost - imo it’s much cheaper to build software this way). Secondly, because they are granting the ‘surgeon’ tremendous organizational power. Nobody climbing the corporate ladder is going to promote a subordinate to hold more power than they do.

                Personally, I can think of two or three people offhand I’d be thrilled to support this way.

                1. 1

                  This model isn’t scalable. Even a moderately sized tech startup has hundreds of thousands of lines of code. The startup I work at has ~400 people and probably 2m LOC. This model would be impossible for almost any moderately sized technology shop.

                  It made sense when your average codebase was measured in thousands, or maybe even tens of thousands, of lines. But once it gets above that, it’s impossible for any given person to understand the whole system, much less maintain it along with just a small handful of other people.

                  1. 2

                    Plenty of hospitals have more than 400 employees.

                    Presumably within that group there are (many) teams of 5-20 people working on neatly separated parts of the problem, no?

                    1. 1

                      I think the idea was developed for teams who build OSes, though, so that suggests it’s supposed to be scalable in practice.