  1. 7

    I agree with some of this post, and some parts not so much. But this one always irritates me:

    Do not make production changes on Fridays

    If you can’t deploy on a Friday, you need to fix your deployment strategy. By removing Friday from when you can deploy, you’re wasting 1/5 of your available days.

    Note: deploy != release. Use flags, canaries, etc.
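
    The deploy-vs-release distinction can be sketched with a feature flag. This is a minimal illustration assuming a plain in-memory flag store; in practice you'd use a flag service (LaunchDarkly, Unleash, your own config system), and all the names below are made up:

```python
# Sketch of deploy != release: the new code path ships to production
# (deployed), but stays dark until the flag flips (released).
# The flag store and flag name here are illustrative, not a real API.

FLAGS = {"new_checkout_flow": False}  # deployed Friday, released whenever

def is_enabled(flag: str) -> bool:
    """Look up a feature flag; default to off if unknown."""
    return FLAGS.get(flag, False)

def old_checkout(cart):
    return {"flow": "old", "items": cart}

def new_checkout(cart):
    return {"flow": "new", "items": cart}

def checkout(cart):
    # The new code is live in production but unreachable until released.
    if is_enabled("new_checkout_flow"):
        return new_checkout(cart)
    return old_checkout(cart)
```

    Flipping the flag releases the feature without another deploy, and flipping it back is an instant rollback.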

    1. 10

      I strongly disagree with you. Incidents are fairly strongly correlated with changes. Relying on on-call engineers who have less context than whoever deployed, and extending the time taken to bring people in, extends the time it takes to resolve an incident.

      You can absolutely develop on Fridays, if you insist on working on Fridays. But maybe it’s a better time to chill out or have meetings, as everyone’s decision making will be worse for being tired from the week.

      1. 3

        Sure, but incidents can be reduced by deploying smaller changes (and releasing them separately, where possible).

        Our general guidance is “be thoughtful/don’t be rude” e.g. don’t click “merge” and then shut your laptop lid.

        1. 2

          Sounds a lot like “don’t make mistakes”. Which is harder on a Friday, not least due to tiredness, but also because you might have plans, or you might have managers who are desperate to report in meetings that things got deployed.

          But really, what is this extra day of deploying (or releasing) buying you?

          1. 2

            Less stress on Monday! If lots of developers are waiting for Monday to deploy, that makes it all the more risky. It also ends up encouraging people to batch their changes (“I have 3 things done, but as it’s Friday I’ll put them all out together on Monday”), and again that increases risk.

            If deployment on Friday is a risk, what makes you think Thursday is so much safer?

            I wish I could source this quote, because it really sums up deployment for me:

            Deployments should be boring, they are the heartbeat of your organisation

            1. 2

              It also ends up encouraging people to batch their changes (“I have 3 things done, but as it’s Friday I’ll put them all out together on Monday”), and again that increases risk.

              Really? I’d consider this less risky because the changes live together in test.

              If deployment on Friday is a risk, what makes you think Thursday is so much safer?

              1. Everyone is one day fresher;
              2. The following day will be a normal work day.

              As to saving Monday stress - I’d rather increase the risk of incidents on Monday and Tuesday to reduce the chances of incidents at the weekend.

              1. 2

                Really? I’d consider this less risky because the changes live together in test.

                With a few caveats, bundling changes together increases risk because of how those changes can intersect. If they are completely isolated from each other, then sure, the risk is about the same as deploying separately. But if they are more complex changes, or touch similar areas (or are related, e.g. a library update and a UI change that somewhere depends on that library, transitively), then the risk compounds: it's multiplicative rather than additive.
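
                A rough back-of-envelope sketch of that compounding, with invented per-change failure probabilities and an invented pairwise interaction penalty (none of these numbers come from real data):

```python
# Toy model of bundled-deploy risk. All probabilities are illustrative.

def deploy_risk(per_change_fail: float, n: int, interaction: float = 0.0) -> float:
    """P(at least one failure) for n changes shipped together.

    Isolated changes: 1 - (1 - p)^n, roughly additive for small p.
    Interacting changes: each pair of changes adds an extra failure mode,
    which is where the multiplicative compounding comes from.
    """
    p_ok = (1 - per_change_fail) ** n
    pairs = n * (n - 1) // 2
    p_ok *= (1 - interaction) ** pairs
    return 1 - p_ok

separate = deploy_risk(0.01, 1)               # one 1%-risky change: 0.01
bundled_isolated = deploy_risk(0.01, 3)       # three isolated: ~0.0297
bundled_coupled = deploy_risk(0.01, 3, 0.01)  # three coupled: ~0.0585
```

                With three 1%-risky changes and a 1% chance that any pair interacts badly, the bundled deploy is roughly twice as likely to fail as three isolated changes combined (~5.9% vs ~3.0%).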

                The other issue with deploying multiple changes together is that if something goes wrong, you now have a bigger change set to investigate, which can slow down your time to repair, or if you have to roll back you are now undoing unrelated changes too.


                I definitely do want to avoid weekend working; having been on this particular project, where we have ~50 developers in one repo deploying 10-30 times per day, the number of times we've had an outage that needed someone to work a weekend has been maybe 3? I think one of those was just switching off the flag and coming back to it on Monday, too. The site in question is no Facebook in traffic terms, but it is significant.

                That said, I don't buy your freshness argument in the slightest. If your people are not fresh enough to deploy on Friday (responsibly), then they are not fresh enough to debug something they deployed on Thursday, so you shouldn't deploy on Thursday either.

                1. 2

                  It really seems to me like your mindset is along the lines of “all change is risky, we should minimize that risk by minimizing the number of times we make changes”. I think this can be a reasonable strategy.

                  I would argue, similarly to @pondidum, that batching changes up like that doesn't minimize risk; at best it concentrates when the new problems may arise.

                  Speaking just from my own experience, having a big batch deployment process where you bring in a bunch of changes (frequently from different people) and deploy them all at once causes more problems than letting anybody deploy whenever they'd like. The biggest reason is that the person doing the deploy frequently doesn't have the context on the changes they're herding out. Speaking for myself, it may only be one weekend later, but remembering the nitty-gritty details of what I did on Friday, and how it may interact with production when it gets there, is really hard on Monday; now add in changes made by other people that you may not have dealt with at all!

                  To me, if the goal is minimizing off-hours work, then you want small, bite-sized change sets, deployed by the person who actually wrote that code, so they can debug it immediately if any problems show up. In that case, the minimizing risk rule shouldn’t be “don’t deploy on Friday” and instead should be “don’t deploy in the hour before you sign off for the day”.

                  And I know this is a bit of an appeal to authority, but I am saying this as someone who's been working as a devops/infrastructure person on a highly complex platform for six years where we do let developers hit the deploy button whenever, and the only after-hours issues we've had to deal with have been Amazon breaking, not our apps breaking.

                  1. 2

                    Seems like your assumption is that there is no environment where you can stage all your changes and integration test them. My assumption is that you should have that to the extent possible. That way all your changes can play together and you can see how they interact. Certainly if you’re releasing multiple times a day, you lose the ability to immediately know what broke production.

        2. 3

          I personally would change it from “Do not make production changes on Fridays” to “Do not make risky production changes late in the work day”.

          1. 2

            I’d agree to that! Fits well with my general guidance on deploying: “be thoughtful/don’t be rude” e.g. don’t click “merge” and then shut your laptop lid.

          2. 3

            While I broadly agree that you should be able to deploy at any time, I think it comes down to how well managed your risks are. For one, at my work it's fairly common to keep half an eye on a service for a bit (e.g. five minutes to half an hour) after you've deployed, so leaving work right after a deploy would be considered bad form. That said, the likelihood of provoking a failure will depend on throughput, amongst other things.
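
            That "half an eye" period can be partly automated. Here's a sketch of a post-deploy watch loop, assuming a hypothetical health endpoint and made-up thresholds (window length, poll interval, failure budget are all illustrative):

```python
# Post-deploy watch: poll a health endpoint for a fixed window and bail
# out loudly on repeated failures. Endpoint and thresholds are assumptions.

import time
import urllib.request

def watch_deploy(url: str, window_s: int = 300, interval_s: int = 10,
                 max_failures: int = 3, fetch=None) -> bool:
    """Return True if the service stayed healthy for the whole window."""
    # fetch is injectable for testing; the default does a real HTTP GET.
    fetch = fetch or (lambda u: urllib.request.urlopen(u, timeout=5).status)
    deadline = time.monotonic() + window_s
    failures = 0
    while time.monotonic() < deadline:
        try:
            if fetch(url) != 200:
                failures += 1
        except OSError:  # covers URLError/HTTPError and network errors
            failures += 1
        if failures >= max_failures:
            # Time to page whoever just deployed and consider rolling back.
            return False
        time.sleep(interval_s)
    return True
```

            The injectable `fetch` is just so the loop can be exercised without a live service; in real use you'd point it at whatever health check your platform exposes.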

            On top of that, deploying a new version of your application itself is (hopefully) a less invasive change (infrastructurally) than deploying new supporting infrastructure. So you might say you don't want to do anything with Terraform on Friday afternoons, as if that goes wrong, you want to give yourself a decent window for recovery.

            1. 2

              All the other comments mention that Friday is bad because engineers are tired and there is a higher chance of errors. Another way to look at why deploys on Fridays are bad is from the perspective of the people using that code. If deployed instances bring any change (and they do), that will disrupt someone's workflow, on that very same Friday or during an important weekend, risking wasting their time.

            2. 3

              If you need to build an architecture which involves microservices, I am sure that your cloud provider has a solution that fits better than Kubernetes. E.g. ECS for AWS. Kubernetes is a fantastic toolkit, but only shines when all that it has to offer gets used.

              I wish this were workable. Cloud engineering and devops jobs are being consumed by the Kubernetes monolith because it just works, and it works well enough that everyone deploys it so they can put it on their CV/resume and get the next job that lists k8s experience as a requirement. It's like a cancer, except it doesn't kill its host; it just reduces your effectiveness in suboptimal cases.

              1. 2

                In light of the author’s full advice, I think that a better takeaway is that AWS provides a poor Kubernetes experience, not that Kubernetes is inherently deficient. The author says that they are “sure” that public clouds have better options than Kubernetes, but this only tells me how few clouds the author has used.

                1. 2

                  I wonder if we have any old-timers who remember the introduction of POSIX (I don't). POSIXly correct behavior generally seems worse than extended (e.g. GNU) behavior, but it is also portable.

                  I suspect that the deal with Kubernetes is similar: it's obviously better than trying to roll your own container orchestration on top of pretty much anything that isn't a dedicated container scheduler, but Kubernetes isn't especially better than any of the proprietary hosted solutions, AND the arguably superior also-rans (Mesos, Nomad) never had the concerted marketing and community building thrown behind them.

                  Eventually something will come along that doesn't play well with Kube but is seen as a must-have, and a new hype cycle will begin around a replacement that works in that context.

                2. 3

                  Certify yourself with official courses.

                  Is this a common opinion? Curious to hear about certifications devs took and didn’t later think their time would’ve been better spent otherwise.

                  1. 2

                    The credentials line lost me. Getting certifications is so narrowing, when it's much more interesting to see what someone has actually done. Or is this mindset overly limiting?

                    1. 1

                      I haven’t been certified in anything but I currently work in a company that does encourage certain certifications. From an engineer’s perspective I don’t view these as an investment in great knowledge but rather an investment in opening up more job opportunities and learning certain jargon and vocabulary.

                      My hunch regarding certifications is that it impresses non-technical people.