1. 35
  1. 23

    Has a few too many “us vs them” vibes for someone who has happily switched sides multiple times.

    Off the top of my head, I’d file most of these under “prod or staging behaves differently than the step before, and how would the developer know?” and “people are told to deploy to staging, but it has no documentation whatsoever on which user this will run as, which directories it may write to, sometimes not even which versions of software/dependencies are deployed”.

    Maybe most of the time both parties are to blame for not talking to each other earlier?

    1. 4

      Us vs. them being Dev vs. Ops? I think that’s a bit of an oversimplification of what the author is saying. They end the article with “it’s not their fault.”

      The argument is that because of capitalistic [my word] pressure to constantly accelerate feature lifecycles, and partly because we all bought into this myth of DevOps as an abstraction between teams, there’s no longer an incentive to talk to each other earlier. I don’t think they’re arguing that Devs don’t know what they’re doing anymore because they’re all noobs, they’re saying that Devs don’t know what to do because we’ve all bought into this idea (and I’m speaking for myself as someone who has also jumped around) that if it works in Docker on my machine it should just work in production. Whose fault is it when it doesn’t? Devs’ incentive is to blame ops, and vice versa.

      In other words, the new organizational structure of “DevOps” increases pressure and incentivizes adversarial relationships between teams/roles. That they start by describing an org structure that was boring, was not rushed, and actually worked, underscores that they’re critiquing an organizational problem, not a Devs vs Ops one, and I think they do a pretty good job of that.

    2. 14

      Painting the picture that everything was rosy back in the days of QA teams and playbooks is rich.

      Outage in production, oh, but no one documented “error code x” in any playbook. No one knows how to fix it, and devs are locked out of production, so it’s not possible to do any in-depth troubleshooting.

      Every patch release took weeks because of endless QA protocols that both failed to ensure quality and took forever to get through.

      Today certainly has a lot of other problems, but it absolutely wasn’t better before.

      1. 5

        Couldn’t agree more. This still feels like an “us vs them” problem of the traditional org chart (dev team + ops team).

      2. 13

        I think the key here is that separate ops and development teams are the anti-pattern. If you have organizational throw-it-over-the-wall, you have technical throw-it-over-the-wall, whether or not you pretend it’s devops.

        1. 7

          The problem is not that devops exists, it’s that organizations aren’t actually collocating the roles such that they have shared goals.

          1. 2

            Well yes

        2. 12

          I used to run the CI and build environments for a software product, and every time a dev came up to me and said “It builds in my environment” as a reason for breaking the build, I would just calmly say, “well, you should fix it so it builds on the build servers, otherwise I’ll just take your workstation and add it to the build environment”

          That would usually give them a little perspective on the problem… I never did have to follow through on that threat.

          1. 4

            Alternative: You may have some non-obvious discrepancy between the dev and CI environments. The problem could range from a silly dev issue (relying on hardcoded paths) to a silly ops issue (CI runs out of disk space and fails builds randomly), or anything in between. But there may be something to fix together, and the whole exchange sounds bad (fix it for me, no you fix it for me).

            There’s some learning to do on both sides…

            1. 5

              Alternative: The build environment is documented, monitored, and well-known. The developer’s workstation is … not. I’m not saying that I wouldn’t work through the issue with them, but “It works on my machine” is the developer saying “you fix it for me”.

              1. 1

                Yep. In my last job the tests ran perfectly fine on dev machines 90% of the time and only failed on CI. But they ran on every developer’s machine - it was the CI.

                1. 3

                  “In my last job the tests ran perfectly fine on dev machines 90% of the time”

                  Were those 10% because of you actually breaking the tests functionally or because you broke something else?

                  To me, CI should be similar to the production environment, and if it fails, it means there’s a high chance that it wouldn’t work in prod. We can take the bet, but we often agree with devs that if it fails, nobody is willing to check whether it would fail on prod or not (because you then have to take care of the outage).

                  1. 1

                    Those 10% were “developer made a mistake”. But CI is useless if 4 out of 5 errors are not genuine and would not occur on developers’ machines or in prod, JUST on the CI. I could’ve phrased that better, though.

            2. 3

              Long time since an article has made me nod all the way down :)

              1. 3

                devops !== removing the QA team. That’s just wrong.

                1. 1

                  I would have been politely but sternly told to do more troubleshooting.

                  “RTFM” :)

                  There are some things that I view differently, but lack of effort is really what bothers me the most. And I’m on the “other side”.