1. 20

  2. 13

    Disclaimer: I interact with people who work on k8s/GKE at work but I’m not on a core k8s team.

    I work on an Anthos project at Google, and the majority of my project is poking around at k8s.

    K8s is a big thing. I don’t actually think there’s anyone in the world that knows everything about how it all works. It’s easier to think about k8s as some sort of cloud operating system upon which things that are useful to the end customer are layered.

    To decide if you need k8s and how much of it, I frame it as “you need it when you know you do”. I’ll use GCP as an example.

    • Do you know you need to do anything other than a web site? If you don’t, you probably want App Engine.
    • Do you know you need to do anything more complicated than individual containers? If you don’t, you probably want Cloud Run.
    • Do you know you need to do anything more than have those containers running? If you don’t, get a managed k8s cluster and just write a k8s deployment.
    • …and now you’re down to knowing if you need Istio, secret management etc. etc.
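
    The ladder above can be sketched as a tiny decision function. This is only an illustration: the `Needs` fields and `suggest_platform` are hypothetical names made up here, not any GCP API, and the returned strings just echo the bullets.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """What you already *know* you need (hypothetical fields, default: nothing)."""
    more_than_a_website: bool = False
    more_than_single_containers: bool = False
    more_than_running_containers: bool = False


def suggest_platform(needs: Needs) -> str:
    """Walk the ladder from simplest to most complex; stop at the first 'no'."""
    if not needs.more_than_a_website:
        return "App Engine"
    if not needs.more_than_single_containers:
        return "Cloud Run"
    if not needs.more_than_running_containers:
        return "managed k8s cluster + a plain Deployment"
    # Only here do Istio, secret management, etc. enter the picture.
    return "managed k8s cluster + the add-ons you know you need"


if __name__ == "__main__":
    print(suggest_platform(Needs(more_than_a_website=True)))
```

    The point of the ordering is that each rung is only reached once you can answer "yes, I know I need more" to the one before it.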

    You eat the k8s elephant one bite at a time. K8s is surprisingly good at letting you get something going with very little interaction, and then there is a pretty steady learning curve to getting more and more out of it.

    The cargo cult-y aspect of k8s which this article is railing against is the same sort of thing we saw when Docker caught fire. “Containerize everything!” “Why, boss?” “Because the magazine I read at the airport says so, so do it!” If you don’t know whether you need some feature of k8s, you almost certainly don’t, and there is a simpler solution. If you are feeling a pain point strongly and k8s says it will fix it, it probably will.

    The cost is that k8s is this operating system, and operating systems are complicated bits of software. We rely on others to provide us with window managers and GUIs to help us visualize our Linux processes, just like we rely on things like Istio and Grafana for k8s. You wouldn’t just download the Linux kernel, slap it on a machine and go, and you can’t do that with k8s either. Unless you have very strong reasons not to, you should use a managed k8s solution that sets the right defaults and gives you some useful tooling out of the box. If you can’t use a managed solution, you should pay some sort of consultant to set it up for you. It’s not something you can just learn by reading docs; it requires experience. There is a reason Google Cloud sells this very product. GCP doesn’t sell things that it doesn’t think people will buy :)

    To the article itself: I feel like the author is trying to prove something more to themselves than to others. “Here’s why we were right to not use k8s”. And perhaps this is so. But I think it starts from a place of seeing the k8s elephant and going “Wow, that’s one big elephant, forget it” (which is pretty reasonable TBH) without really going through the “you know when you do” game of incremental complexity gains.

    The only thing that I think is truly bad about k8s is actually setting it up correctly. Kubernetes The Hard Way still gives me nightmares. But the author specifically discounted using a managed solution, so perhaps, given that constraint, I would have done what they did too.

    1. 2

      I don’t actually think there’s anyone in the world that knows everything about how it all works.

      And this is exactly the reason I am hesitant to try it.

      1. 1

        Are you similarly hesitant to try using a smartphone? ;-p

        1. 2

          Can my phone cause a customer facing outage impacting 1000+ customers?

          1. 1

            Depends what’s installed on it & what your role is ;-p

            1. 1

              I don’t think so. I have three phones on my desk. If my phone goes bust I can be up and operational in less than 15 minutes on a different one. Moving a large-scale infrastructure off k8s to something else is a several-day process unless you specifically prepared for it in advance. It is not sufficient that you can quickly provision the same capacity elsewhere.

              In my experience customers think of k8s as a full-fledged, stable, thoroughly hashed-out product where everybody knows everything down to the last screw and nothing could go wrong. When it explodes in their face they run around screaming like a little child. I was part of teams where such customers reached out to us and begged us to bring their service back online, which ironically nobody could do because we were running into an issue with k8s that nobody could debug, let alone fix. We are talking about SREs with 10+ years of experience, before you pull out the “must be some junior” card.

              Since these incidents I talk every single customer out of using k8s and into just consuming cloud native services, virtualisation, packaging and auto-scaling groups, which are all very well understood, documented, battle-tested, monitored, debuggable, recoverable solutions to the problems k8s claims to solve. Since the k8s hype train is 100% driven by people who have absolutely no clue about k8s, most of these projects are going to hit a wall hard sooner or later. It is funny how 50% of the companies I am familiar with that use k8s have already experienced some outages purely because of it. And there are some more:


              At this stage I think it would be beneficial to create a resistk8s.net website where everybody who has some skin in the game could tell their stories to inform decision-makers about the reality of operating k8s: a fragile, anti-Unix, over-complicated, half-assed piece of useless engineering that aims to solve non-problems so that you are locked in to this k8s zoo indefinitely. I am really glad that I am not alone in these views and I am hoping for more articles like the Coinbase one.

              1. 2

                That wasn’t the direction I was going with my last comment but I hadn’t seen that list of failure stories before so thank you for the link!

    2. 7

      To avoid using k8s, they had to develop their own deployment UI and their own AWS orchestration tool.

      While I agree with the overall article, it seems like they are suffering from the “Not invented here” syndrome. It might be that their tool is easier to maintain than a k8s cluster, but I’m sure they still had to invest a significant amount of manpower to develop their system.

      1. 13

        Remember they have above-average security needs, and you will have a much better understanding of the security aspects of your own in-house system than of any standard solution.

      2. 2

        This is pretty similar to how we did things at Simple (no idea how that stack runs now). Here is an old blog post about it: https://www.simple.com/blog/infrastructure-as-code

        We ended up in the same place with Envoy, Route53 for service discovery, etc. When I left, things were possibly headed towards k8s.

        1. 1

          I am curious how deep k8s’ hype train goes. I have successfully avoided it so far. Cloud native services are much cleaner and less error-prone. Simpler tools like Serverless.com give a nicer abstraction if you do not want to use a cloud-vendor-specific tool.

          1. 13

            I find it ironic to describe k8s as “hype” and then promote serverless…

            1. 3

              Serverless wants to solve one problem and one problem only. How many problems does k8s try to solve?