1. 44

I’m looking for a factual, objective opinion of Kubernetes. On the interwebs the only articles I find are biased (both for and against). Since the audience here is primarily technical, I thought this might be a good place to ask this question.

Basically things like how long have you been involved with any Kubernetes efforts at your company? What has that experience been like? What was the rough scale of that installation (in terms of number of containers, traffic being served, managed or self-hosted, …)? What have been the good things about it and what were the pain points? Please be as objective as possible.

Super curious about the responses!


  2. 36

    On the interwebs the only articles I find are biased

    Everyone is biased. I don’t think it’s useful to seek an “objective opinion” - opinions are by their nature subjective.

    As someone who has only used Kubernetes professionally, my thoughts basically boil down to:

    • YAML is not a very good configuration format, and is the norm, leading to a lot of unnecessary friction
    • The system itself is very complex, even in its most simple incarnations, but it is also very powerful
    • If you need the kind of scale, reliability, and automation Kubernetes provides, you need Kubernetes or something else like it - at that point it’s down to preference and what integrates well with your stack
    1. 11

      YAML is not a very good configuration format, and is the norm, leading to a lot of unnecessary friction

      And aside from YAML as a configuration format, templated YAML (for example with Helm Charts, but many methods exist) is really hard to deal with, and IMO a strong sign that this kind of declarative configuration is not a good fit for a tool of k8s’ complexity.

      “Let’s do declarative configuration so you don’t need to program and it’s easier, but then let’s programmatically generate the declarative configuration” … hmm, okay…

      I’m not a huge fan of YAML as such, but I think the misapplication of a “one size fits all” declarative configuration format is probably the biggest issue.

      1. 2

        Ehh, I think templating is fine in moderation. I don’t particularly like Helm; you can get 90% of the functionality with plain POSIX sh and heredocs. Having spent just shy of a year designing a company-wide OpenShift (RH K8s) deployment setup, I can say that YAML is a terrible format for configuration, templates are great in moderation, and jesus christ build your own images nightly. K8s is insanely complex; it’s effectively a distributed operating system, and so it has all the complexities of one plus the complexities of being distributed.

        I also agree, YAML isn’t the worst in small doses, but when literally everything is a .yaml, it’s nothing but spaghetti garbage using strings/ids to tie together different disconnected things potentially across many different files.

        1. 1

          And aside from YAML as a configuration format, templated YAML (for example with Helm Charts, but many methods exist) is really hard to deal with, and IMO a strong sign that this kind of declarative configuration is not a good fit for a tool of k8s’ complexity.

          In my opinion templating yaml has been approached in the wrong way, or at least in an incomplete way.

          Referring to the Helm charts example: since YAML (a superset of JSON) is a textual representation of objects, oftentimes what would be better is not templating the text representation, but attaching a generated subtree to what is otherwise a leaf node.

          Example: when including a YAML file into a chart template you have to “manually” call the indent macro to make sure the YAML parser is happy. But if we were reasoning in terms of trees of objects, we could just say: this placeholder object here shall be replaced in its entirety with this other tree there (that you get by parsing that file), which will become a subtree of the initial tree.

          Indentation would then be “automatic” because the next logical step would be to re-serialize the object and feed it to the kubernetes apiserver.
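
To illustrate the tree-splicing idea, here is a minimal sketch. It is not how Helm works; the `splice` helper, the placeholder string, and the manifest contents are all hypothetical, and JSON stands in for the re-serialization step so that only the standard library is needed (a YAML library would do the same job at that step).

```python
import json


def splice(tree, placeholder, subtree):
    """Return a copy of `tree` with every node equal to `placeholder` replaced by `subtree`."""
    if tree == placeholder:
        return subtree
    if isinstance(tree, dict):
        return {key: splice(value, placeholder, subtree) for key, value in tree.items()}
    if isinstance(tree, list):
        return [splice(item, placeholder, subtree) for item in tree]
    return tree


# Hypothetical chart template, already parsed into plain objects.
template = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": "__CONTAINERS__"}}},
}

# Stand-in for the tree you would get by parsing the included file.
included = [{"name": "web", "image": "example/web:1.0"}]

manifest = splice(template, "__CONTAINERS__", included)

# Re-serializing the object takes care of nesting; no indent macro is involved.
print(json.dumps(manifest, indent=2))
```

Because the substitution happens on objects rather than on text, the included fragment can never be mis-indented; the serializer decides the layout.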

        2. 4

          I agree there is no escaping subjectivity, but would add: Not everyone is equally biased or honest with regard to any particular issue. Some opinions are well-informed, some aren’t. Some people’s advice is worth hearing on some particular topic, because they have overcome at least some of their biases through diligent, open-minded work, or because their biases are similar to one’s own (yes, that can be very useful - “I’d like to know if people who, like me, for some reason like X also would recommend Y”). Some people’s advice is irrelevant on a given topic. Getting advice from a range of people on here is likely to be more valuable than getting marketing disguised as technical blogging, for example.

        3. 19

          Not directly answering your question, but I think one thing that gets people confused is that the relevant “scale” doesn’t really refer to the number of container instances / processes.

          What’s more relevant is the number of different services that need to be isolated, and the number of people working on them.

          So basically if you have 50 different services by 5 or 10 different teams, then it may make sense to use Kubernetes.

          If the whole company is operating a single service but has 10,000 instances of it, then it may not, even though it’s a large “scale”.

          I say this because I saw a 10 person company using Kubernetes, saying it’s because they have a lot of traffic. The amount of traffic is mostly irrelevant – it’s more the amount of configuration.

          So another way to say it is that heterogeneous workloads can benefit from an abstraction like Kubernetes, but a homogeneous workload may not. (And it’s also possible to make your workload more homogeneous to reduce the ops burden.)

          1. 6

            Still, it offers a nice framework anyway. You have loadbalancing, Network policies, replication, volumes, secret management, …

            You have a definition of how your system is supposed to be running. Not just a shell script with diagrams showing that you should manually ask one hosting company to add firewall rules or give you an additional 50GB of disk on VM vmbackend02.

            1. 14

              You can have this kind of definition using Ansible/Terraform/etc. — tools that manage resources but don’t introduce a whole-ass distributed cluster mechanism.

              1. 4

                I’m not sure how much you’ve used those tools, but they definitely have their edge too.

                In addition, they absolutely don’t offer the same features. Ansible and terraform won’t restart my containers when they crash, they won’t re-schedule them if the VM stops, if you want secret management like in kubernetes you’d have to work with a distributed solution anyway, …

                I’m very surprised by how much of a bad rap Kubernetes is getting… I’m not sure many people go beyond “it’s a yaml thingy that does too much”.
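
The distinction drawn above is between a one-shot "apply" and a control loop that keeps comparing desired state with actual state. A toy sketch of one pass of such a loop follows; it is not Kubernetes code, and the function and service names are hypothetical.

```python
def reconcile(desired, running):
    """Compare desired replica counts with what is running and return the actions needed."""
    actions = []
    for name, want in desired.items():
        have = running.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    # Anything running that is no longer declared gets stopped.
    for name, have in running.items():
        if name not in desired:
            actions.append(("stop", name, have))
    return actions


desired = {"web": 3, "worker": 2}
# One worker has crashed, and "legacy" was removed from the declared state.
running = {"web": 3, "worker": 1, "legacy": 1}

actions = reconcile(desired, running)
print(actions)
```

An orchestrator runs a pass like this continuously, which is why a crashed container comes back without anyone re-running a playbook.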

                1. 2

                  Nomad does what you describe, has no YAML and a lot less complexity.

                  1. 1

                    That isn’t true. Nomad doesn’t manage storage/volumes, it doesn’t handle secrets (Vault does), I’m not sure it has security policies, and last I checked the service mesh was done using Consul.

                    So no, Nomad is doing 1 thing (admittedly well), but then you have to plug many other services from different places to get some features from kubernetes.

                    I agree there is a lot that you can dislike in Kubernetes, but there are also many things to dislike in Ansible, Terraform, and Nomad that you guys mentioned.

                    1. 2

                      However, Kubernetes also isn’t just one single piece of software, but multiple, like etcd, etc. And even when you set up Vault and Consul alongside Nomad (both of which often get run in Kubernetes as well), you will end up with a much easier to reason about and way more flexible system. Also, recent versions of Nomad allow something like what Kubernetes calls volumes.

                      I did not mention ansible nor terraform. I however set up both Kubernetes (managed and self hosted) and Nomad with Terraform.

                      Also I am not saying Nomad is perfect and I really don’t think it is, but acting like Kubernetes is the only or best option simply isn’t true. It’s certainly the most hyped though, with all that comes with that. I also think it might be better suited if you are setting up a cloud provider like AWS or GCP, but that’s something that might change when someone starts offering a managed Nomad (or Hashicorp Stack) cluster.

                      I also think it’s somewhat likely that there will be new generations of these types of software (I say these types, because I wonder how containers will evolve as well). I think they represent a first generation and, like with every other kind of software, this is in a way experimentation, so I wouldn’t be surprised if in the next couple of years we see way better options coming up, whether that’s Kubernetes 2 or something completely different.

            2. 1

              The scaling of individual contributors is something most don’t consider when evaluating k8s, and is a huge reason to consider the pros/cons of vendor lock-in with microservices.

              Coinbase recently said this was the main driver for moving away from their monolith to AWS-based microservices.

            3. 18

              We have been running our own self-hosted kubernetes cluster (using basic cloud building blocks - VMs, network, etc.), starting from 1.8 and have upgraded up to 1.10 (via multiple cluster updates).

              First off, don’t ever self-host kubernetes unless you have at least one full-time person for a small cluster, and preferably a team. There are so many moving parts, weird errors, missing configurations and that one error/warning/log message whose meaning you don’t exactly know (okay, multiple things).

              From a “user” (developer) perspective, I like kubernetes if you are willing to commit to the way it does things. Services, deployments, ingress and such work nicely together, and spinning up a new service for a new pod is straightforward and easy to work with. Secrets are so-so, and you likely want to do something else (like Hashicorp Vault) unless you only have very simple apps.

              RBAC-based access is great in theory, but the documentation is poor and you end up having to cobble things together until it works the way you want. A lot of defaults are basically unsafe if you run any container with code you don’t control 100%, but such is life when running privileged docker containers. There are ways around it, but auditing and tweaking all of this to be The Right Way™️ suddenly adds a lot of overhead to your non-app development time.

              To re-iterate, if you can ignore the whole “behind the scenes” part of running kubernetes, it’s not too bad and you get a lot of nice primitives to work with. They’ll get you 90% of the way and let you have a working setup without too much hassle, but as with everything else the last 10% takes the other 90% of the time to get it just to where you want - granular network control between pods, red/green or other “non-standard” deployment methods.

              1. 14

                “Kubernetes? You probably want Ansible.” – A wise man

                1. 3

                  Just my opinion, but I don’t agree as they implement different levels of abstraction. Kubernetes for defining a set of services that you want to run across a cluster, and then letting the scheduler make sure you have everything running with fault tolerance and auto scale. Ansible for configuring a machine with a common set of libraries and keeping it in sync - for example to install docker and run a Kubernetes node. It doesn’t handle scheduling and runtime management of services across a cluster like Kubernetes does.

                  Don’t get me wrong, I think Ansible can be fine for a lot of smaller teams and projects, especially if there is already existing expertise around managing Linux servers. Kubernetes gets a lot of hate because it’s complicated, and somewhat unlike previous solutions for managing clusters that run services. I think that’s mainly a result of people using it in situations where it’s probably overkill, and also because developers hate learning new shit when old shit works (which is okay).

                  1. 12

                    Yes, they’re different layers of abstraction, and that’s exactly the point.

                    The comment could easily be rephrased to: “Massive scale cluster orchestration? You’re probably just looking for basic automated deployments.”

                2. 14

                  I’m a consumer (hostage?) of a k8s setup at work. A few thoughts:

                  • Monitoring is a bit of a mess, and we still end up using 3rd-party error reporting anyways.
                  • Grafana/Prometheus sparks strictly less joy in me than Datadog.
                  • The tooling to go and “lay on hands” in production isn’t great, and the answer is to write shell scripts and a toolbelt (which I think is a good idea regardless) to paper over that.
                  • Whether or not it can support applications at roflscale is silly because most of that shit these days is handled by CDNs anyways (at least for our usage pattern at work).
                  • The massive tooling around Terraform, Helm, and all kinds of other stuff just…yuck. Yuck yuck yuck. When it works it works okay but like god help me if I fall into a place where I can’t get there.
                  • Then there’s the occasional leaky abstraction of cloud providers and other things, which I am happy I don’t have to handle.

                  Anyways, to me, I think that it probably scratched somebody’s itch really well, but it’s just this big gnarly thing that everybody reaches for and talks about. I think it also makes sense if you do the whole microservices thing, but I also think that that idea is really only for niche companies.

                  If I’m going to give up the ability to keep a mental model in my head of the moving parts and root causes of failure, I’d rather go all the way and just pay for boring scalable monoliths on Heroku.

                  1. 5

                    Been using it for about three years at work, mostly in the context of GKE.

                    It’s… okay. It’s super complex and it’s front-loaded with a whole ton of terminology you have to absorb before you can do anything with it, and then it’s still super complex and you have to configure five different parts of the layer cake just to get the simplest service online, not to mention probably learning at least one tool that generates config files from other config files.

                    But on the other hand, if you actually need a thing that does what it does, which is run containerized stuff on a cluster, with scaling and high availability and semi-intelligent placement and service discovery and whatever else, then you’re pretty much either using k8s or inventing your own — I don’t know of anything that sucks less. And if you invent your own, pretty much the best-case scenario is that you create something that works… but not as well as k8s, but in the process you gain enough understanding to throw your thing away and use k8s effectively, because they’ve already solved problems you didn’t know you had going in.

                    In short, the sheer meta-ness of it all breaks my brain and makes me cry and wish for the days when I could point at a gray box with blinkenlights on a rack and say “the application runs right there”, but kubernetes didn’t create that problem, the exigencies of modern software design did. Once you get used to it, it’s tolerable.

                    On a semi-related note, for a personal project of mine I’ve been making use of podman plus some systemd scripts to coordinate a bunch of interacting services. For the most part, podman is best thought of as “daemonless docker” (which means it interacts better with process managers because the container is actually a child of the command that started it, and you can actually tell when a container is up, and not just when the daemon has acknowledged a request to start it, and also means that it interacts better with security and accounting because running as an unprivileged user actually runs as an unprivileged user, instead of handing off work to a privileged daemon), but one place where it differs from docker is it makes it super easy to create shared network contexts like k8s pods… it’s even in the name :) Anyway, because it is a personal project, it runs on a single server, has no SLA, and can tolerate a certain amount of manual intervention if I ever have to move it somewhere else or recreate from scratch. So that’s why I can do without the complexity of k8s.

                    1. 5

                      And I’m super curious about viable alternatives, now or in the future.

                      For example Micro VMs seem promising https://firecracker-microvm.github.io/

                      1. 5

                        My experience with it has been extremely negative. I have been working with it for the past few years at three companies I consulted for. In one instance, they replaced two REST APIs running on EC2 (with hand-rolled redundancy and zero-downtime automated deployments, using only the AWS CLI and a couple of tiny shell scripts) with a kubernetes cluster. Server costs jumped 2000% (not a typo) and downtime became a usual thing for all sorts of unknown reasons that often took hours for the “expert” proponents of kubernetes to figure out.

                        There are probably not that many alternatives to it, but for the very narrow spectrum of problems it solves for a given company, you would probably build whatever you need more reliably and elegantly without bringing in a gigantic amount of complexity and computing power overhead.

                        Is running containers on servers such a big problem to solve? Is dynamically and programmatically routing hostnames to services so difficult? Is it so difficult to put a couple of replicas of a service behind a load balancer and redeploy them one at a time with live checks so the service never goes down? Is creation of resources such as VMs, container instances, storage volumes, etc. on cloud services at build time such a difficult thing?

                        Frankly, I find the answer to all these questions to be ‘No’. Maybe you just hired code monkeys capable of imitating whatever the current hype is, rather than skilled engineers / hackers. If you are thinking of the famous Virding’s first rule of programming then my suggestion to you is: Create an Erlang/OTP cluster instead. It is much simpler, saner, and more straightforward. And it has a better track record.

                        Furthermore, I find kubernetes badly engineered on the surface. Grep your service name in your yaml files and you will get 10-30 hits. That alone screams bad design. Also, why are the yaml manifests such intricate tree structures? When did we forget that complex config structures are a horrible idea that has caused endless suffering? Why are they not simple flat configs? Why isn’t the canonical case achieved with the defaults, requiring zero configuration?

                        I get that it is hip, and probably 50%+ of all developers in the world praise “from google” as the pinnacle of software, but it is just a piece of software with very questionable quality and certainly over-estimated utility.

                        It is so complicated to run that people think they sound smart by saying: “never host it yourself!”. Yet services like GCP have embarrassingly long downtimes, in several instances bringing people’s services down for hours.

                        I do like it when people use it; it means easier-to-beat competition.
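
For reference, the "redeploy them one at a time with live checks" approach mentioned above can be sketched in a few lines. This is a simulation only: the replica names are hypothetical, `deploy` just updates a dict, and `healthy` is a stand-in for a real probe.

```python
def rolling_deploy(replicas, new_version, deploy, healthy):
    """Update replicas one at a time; halt at the first one that fails its health check."""
    updated = []
    for replica in replicas:
        deploy(replica, new_version)
        if not healthy(replica):
            # Stop here so the replicas not yet touched keep serving the old version.
            return updated, replica
        updated.append(replica)
    return updated, None


# Simulated fleet: hypothetical replica names mapped to the version they run.
fleet = {"api-1": "v1", "api-2": "v1", "api-3": "v1"}


def deploy(replica, version):
    fleet[replica] = version


def healthy(replica):
    # Stand-in for a real live check (for example an HTTP probe); "api-2" fails on purpose.
    return replica != "api-2"


updated, failed = rolling_deploy(list(fleet), "v2", deploy, healthy)
print(updated, failed, fleet)
```

The sketch leaves out everything around the loop (draining the replica from the load balancer, rolling back the failed one), which is where most of the real work tends to sit.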

                        1. 4

                          My experience while working with Kubernetes at a big company as an external contractor/consultancy.

                          This company had pretty legacy software, but somehow they decided to run everything in Kubernetes. Probably because it was the new shiny thing, but it actually worked pretty well.

                          There were many teams working on many services that talked with each other. It was a mess, but at least everyone played by the same rules inside the cluster. Operations only needed to know how to run a Kubernetes cluster on whatever cloud provider and use some DBaaS for persistence; devs could just forget the cloud and focus on Kubernetes. Scaling pods was manual but it was good enough for their workload tbh.

                          We only had to train them on good practices for infrastructure reproducibility in case the whole cluster got destroyed and needed to be recreated.

                          It’s not for everyone. In this case it worked because there were a lot of people working there, so the initial investment in complexity paid off later.

                          With a competent infrastructure team, it’s basically providing a complete abstraction for the development teams. Productivity was good, and deployments were calm and numbered in the dozens per day.

                          1. 4

                            I’ve taken a bit of a career detour these last ~5-6 years, finding myself at the helm of teams building out internal platforms for a large (previous gig) then medium enterprise (current gig), so-called “platform engineering” or “technical infrastructure” teams.

                            It could be Australia being a bit behind the times, but both instances have been when the organisation in question is transitioning from a “Dev throwing over the wall to on-premises Ops” to a “You build, you run (in a cloud)” way of working. So my focus has been on generally introducing self-servicable cloud tech foundations, then tailoring a paved road “platform” atop.

                            The idea being that should a product engineering team choose to ride (and it is a choice) at the platform rather than the foundational level then a considerable number of cross-cutting concerns are handled on their behalf (security, reliability, observability etc.), allowing them to spend more time on the differentiating business logic.

                            Anyway, I mention this because in both instances a container orchestration platform has been at the crux of what I’ve built, early on with Rancher 0.x-1.x and then the Kubernetes choochoo train after that, but critically, always with the teams coming in at an abstraction above. That is, I’m of the opinion that Kubernetes should be considered a platform for building platforms—otherwise there are just too many concepts for a product engineering team to grok.

                            As a concrete example, at the moment we have a singular Kube CRD that hides multiple clusters per environment and distills the myriad of Kube/Istio/ArgoCD/Prom/Vault/OPA (blah blah…) resources down to a small easy-to-reason-about config (averaging about ~30 lines of YAML). Apart from the low cognitive load for teams, it also allows us to manage underlying tech upgrades or introduce new features at our pace.

                            I realise this might sound over-engineered, turtles all the way down, or science project like… so figure it is worth mentioning a few things:

                            • Both these orgs were/are operating (micro-)service ecosystems > 150 services, acutely, with private on-premises connectivity requirements as well. I feel that’s when this sort of platform atop a cloud starts to shine. I don’t discourage our teams from reaching for cloud FaaS or the likes, but the inter-connectivity with the rest of the ecosystem does start to break down a little. Especially when met with our security/risk/compliance posture (bank). A common pattern I see is a brazen team going foundational with gusto then jumping up to the paved road after learning some things the hard way. YMMV but I would go as far as saying Kube is not suited for a relatively homogeneous, single tenant/team/startup/static type environment.

                            • As someone else mentioned in this thread, definitely don’t operate Kube yourself if you can avoid it. Outsource as much of this stuff to SaaS as possible.

                            • It is imperative for these platform engineering teams to act like internal product teams that are both completely in tune with the user needs, and also comfortable with trying to automate/outsource their way out of a job. It doesn’t personally fill me with joy to be building a platform abstraction on top of a cloud but unless you can wire those constituent cloud services together in a way that suits your org, it is where we are.

                            (This reply is getting on a bit. Would be glad to talk more about this stuff in DM if you like.)

                            1. 4

                              I approached Kubernetes as a complete skeptic but having spent 12 months with it I am a convert. My experience is perhaps a little different since I’m writing Kubernetes operators which use many of the core APIs to control the behaviour of other apps running in the cluster.

                              What Kubernetes has done well is to provide a well thought out collection of abstractions for running modern multi-component applications: pods, endpoints, services, volumes, configmaps, secrets. The complexity of use comes from the myriad of configurations that are needed to interface existing systems into this abstraction.

                              But there are some beautiful parts to that complexity. If the application in your pod needs environment variables then you can set them directly or map them transparently from a secret or configmap without changing your application. Similarly you can map a configmap to a volume so that it looks like a directory to your app even though the configuration files it contains are managed externally.

                              There’s a lot to learn but you don’t need to learn it all at once: just the core concepts and how they interface with your application’s requirements.
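
To make the env-var and configmap-as-directory point concrete from the application's side, here is a minimal sketch. The variable name `DB_PASSWORD`, the file name, and the `load_settings` helper are hypothetical; the point is only that the app reads an environment variable and a directory of files and has no idea whether a secret or a configmap is behind them.

```python
import tempfile
from pathlib import Path


def load_settings(environ, config_dir):
    """Read settings the way an app would: from env vars and from files in a directory."""
    settings = {"db_password": environ.get("DB_PASSWORD", "")}
    for path in sorted(Path(config_dir).iterdir()):
        if path.is_file():
            settings[path.name] = path.read_text()
    return settings


# Simulate what the app would see inside a pod: a temp dir stands in for the
# mounted configmap volume, and a plain dict stands in for the process environment.
with tempfile.TemporaryDirectory() as config_dir:
    Path(config_dir, "app.properties").write_text("log.level=debug\n")
    settings = load_settings({"DB_PASSWORD": "s3cret"}, config_dir)

print(settings)
```

In a real deployment you would pass `os.environ` and the mount path instead; the application code stays the same either way.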

                              1. 3

                                I implemented and have been using Kubernetes at work since 1.8. First installed from scratch(!) with no auth/etc (all within private network). We’re now using Rancher to handle provisioning for us which comes with all the bells and whistles built-in. We run/manage ~6 clusters currently. Lessons learnt:

                                • Build a toy cluster from scratch using vagrant or similar as a learning experience. It has allowed me to really understand the “low level” components of K8s. At its core, it’s really not that complex (or, at least, it wasn’t in 1.8!).
                                • YAML is ok for simple configuration but completely useless for anything more complex. Expect to bring workarounds or other horrible hacks (helm chart syntax).
                                • The ability to bring up rapid staging environments really is amazing and has been a huge win for us. Moving this to prod takes seconds too as the docker images already exist.
                                1. 3

                                  I was doing some admin/DevSecOps for a client last year. It was a hoot, so I thought I would explore Kubernetes as something that would be even more fun and allow me to help other folks.

                                  For me, it looked as if the capabilities being sold vastly outweighed what the average client would need. If you’re operating at the kind of scale to need that, good for you. You’ll have a guy who is the expert on all things Kubernetes and now you’ve got architecture coordination issues (which is good).

                                  I was much more concerned about devs who loved the idea and were going to use it no matter what their problem. Professionally, it looked to me like yet another example of new tech coming out, far too many people clamoring on the bandwagon, and then a bunch of messes being made. At this point in my career I don’t have the patience to play “consultant cleanup”, although I understand there’s a lot of money to be made doing it. So I bailed and did not continue learning.

                                  If I get a client/situation that truly demands it, though? Sure. Kubernetes, Docker, AWS, Lambda, and CloudFormation would be my go-to. I’d modify from there depending on the situation.

                                  1. 2

                                    I’ve used it a lot, self-hosted on AWS and on AKS, and one of my colleagues was pretty literate about Kubernetes, so we asked him a lot of things :).

                                    I think it seems difficult because Kubernetes is, in some way, a different operating system, with different abstractions. Some old tricks can be applied but for most strange cases, it is way better to create an operator and speak native Kubernetes. My main pain points with Kubernetes (apart from YAML, which we templatized with Jinja) were not with Kubernetes itself but with how our microservices architecture was designed, making it more difficult to let Kubernetes freely decide lots of things. Your services should be as independent as possible. We also hosted our database in Kubernetes (BIG NO), but now we moved it to a cloud service because managing state in Kubernetes is also a pain point.

                                    Deployments were quite easy and fast. Also, operations were fine. There’s a lot of tooling, each version getting better for monitoring and logs.

                                    1. 2

                                      Mixed. If you want me to run it on Kubernetes, I can do it, be it EKS, bare metal, Rancher, you name it, I can happily do it. Is it a solution to the problems your architect thought of when they decided on it? Never.

                                      1. 2

                                        I have a few small projects that I host on a cheap k8s cluster with DigitalOcean, using kubernetes to deploy. It’s not easy to get to grips with but is incredibly powerful and, once set up, reduces the friction of deploying to near zero.

                                        1. 2

                                          It’s not perfect, but it is the best alternative I’ve used so far.

                                          We use a GitOps model, our YAML (which is actually dhall) configuration is stored in a git repository, any changes in that repository trigger a pipeline that deploys the new configuration to our clusters (dev, staging, prod). It’s a bit more complicated than this, because there are helm charts involved and there is the CI/CD pipelines for each service running, but that is a summary of what happens.

                                          This is not perfect at all, it has some problems, but sure beats running ansible on hundreds of EC2 machines and then managing monitoring, load balancing, and all that separately.

                                          So now I am mostly curious what other people use instead of kubernetes, because as I said, it’s not perfect and I am always ready to try something better.

                                          1. 2
                                            Is it just me or is this a contradiction in terms?

                                            o·pin·ion /əˈpinyən/ noun (plural: opinions)

                                            a view or judgment formed about something, not necessarily based on fact or knowledge.
                                            1. 1

                                              No experience is ever 100% factual and objective ;)

                                              My only problem with k8s is that it seems overly complex and small teams might fall into the same trap as with OpenShift and other systems. You kind of need a dedicated team of people to run it on your own, or use a hosted solution (and from what I hear they’re often also not yet 100% stable).

                                              I didn’t use it too extensively, but it’s ok. If it actually solves your problem…

                                              1. 1

                                                I see the enormous benefits, but I also feel the very big friction due to the complexity of the thing as a whole.

                                                Your kubernetes cluster is running on Linux but it really feels like a different operating environment. What you can and cannot do, and how, is very different from “regular” Linux and hard to replicate on developers’ computers for the sake of “agility” (minikube can only go so far).

                                                Still worth it, though. Imho the worthiness is not really about technical stuff and/or containers etc but is more about fixing in stone the terminology and procedures for both developers and sysadmins and being able to offload part of the operations load to the development team (you built it, you run it) in a safe manner via containers/roles/namespaces/cgroups/logging&auditing/LDAP integration (etc).