1. 8

    When training junior Linux users you’ll often have to dig into that when someone cannot “cd” into a directory because they get a permission denied error. They inevitably tend to try “sudo cd”. Of course it does not work, for the reason mentioned in the article, i.e. “cd is not an executable”. It’s always a fun moment in my experience since it leads to the very same explanation given in the article.
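
    For anyone who wants to show this to a trainee, here is a minimal sketch of the underlying reason: the working directory belongs to each process, so a directory change made in a child process (which is what “sudo” would start) never reaches the shell that launched it. The helper name `cwd_after_child_cd` is made up for the illustration, and `sh -c` merely stands in for whatever child process would run.

    ```python
    import os
    import subprocess
    import tempfile


    def cwd_after_child_cd(target: str) -> tuple[str, str]:
        """Run `cd target` in a child shell; return (child's cwd, parent's cwd)."""
        # The child shell changes its own working directory and prints it.
        child = subprocess.run(
            ["sh", "-c", 'cd "$1" && pwd -P', "sh", target],
            capture_output=True,
            text=True,
            check=True,
        )
        # The parent process (this script) is not affected by the child's cd.
        return child.stdout.strip(), os.getcwd()


    if __name__ == "__main__":
        with tempfile.TemporaryDirectory() as tmp:
            child_cwd, parent_cwd = cwd_after_child_cd(tmp)
            print("child ended up in:", child_cwd)
            print("parent is still in:", parent_cwd)
    ```

    The parent’s directory stays where it was, which is why “cd” has to be a shell builtin to be of any use.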

    1. 20

      it might be useful to tap into the wider Kubernetes ecosystem, e.g. operators - if you want to run PostgreSQL, Redis, Cassandra, ElasticSearch, Kafka with limited human resources, it might be easier to do so via Kubernetes Operators (whether or not such operational complexity, even abstracted, is worth it with a limited team, is an entirely different discussion)

      From personal experience, if you think this is why you want to use Kubernetes, think again. You have to deal with issues in the actual software you want to run, should they arise, and you have to deal with problems that Kubernetes might throw at you. And now you add a whole new thing that touches both and is its own beast. And the only ones able to really deal with it are people with deep knowledge of all three.

      Also the idea of operators very much feels like workarounds for workarounds. Building an abstraction for an abstraction that abstracts the management of that abstraction hardly feels like good design. Even if we say they are just better abstractions.

      What I’ve seen at multiple companies now is that eventually one ends up with sort of an operator-stack that is one big customized setup for that one specific company. Speaking about snowflakes and pets…

      In other words, you should be very sure about this being the right approach if you build your production services on top of this.

      Not to say Nomad is without flaws, but it’s easier to decide what you want or need.

      And with it being simpler, but having similar concepts, even if you end up switching over to Kubernetes the “lost” work time will in most situations be lower than the other way round.

      This is all just subjective and personal experience, and of course situations change. Both projects are developing rather quickly, so the things mentioned here might change as well.

      In short: Don’t just choose Kubernetes because there are operators.

      The Kubernetes ecosystem is massive. There are entire companies, tools and whole niches being built around it (ArgoCD, Rook, Istio, etc. etc. etc.). In some cases tools exist only because Kubernetes is itself so complex - Helm, Kustomize, there are a bunch of web UIs and IDEs (Octant, Kubevious, Lens, etc.), specialised tooling to get an overview into the state and security of your Kubernetes cluster (Sonobuoy, kube-hunter, kube-bench, armosec, pixie). Furthermore, there are literally hundreds of operators that allow abstracting the running of complex software within Kubernetes.

      While this is true, I really wonder whether I am the only one who thinks that a lot of these are simply not great pieces of software. I don’t mean to pick on them, and having used some I really appreciate the effort, but for the sake of honesty a lot of these are not nice to use productively and have very annoying rough edges. I don’t want to go into individual ones, but to give some examples: for operators you might have silent errors, which can be very creepy, esp. when the configuration has minor variations compared to the software, or automation that fights you. For UIs and IDEs there are the typical “smaller project” things, like logs that are hard to search through, interfaces that are hard to adapt, stuff that is shown out of date, something named badly or confusingly, etc. It’s the kind of issue one has when an IDE offers support for something new, like it was with Git or other things a decade or so ago. When things are not polished, they might at times be worse than not using them, and I switched back and forth a lot using them.

      Nomad and Consul have come a long way there over the last year as well. Their web interfaces used to be like that, but now they are starting to be quite nice to use. They are certainly not perfect either, but they actually made certain third party tools obsolete.

      In the end you still should know how to do stuff on the command line, no matter what you choose. It will come in handy.

      1. 2

        I heartily agree with you about abstractions-on-abstractions. Most Operators are just ways to combine several Kubernetes native components together into a single, proprietary package. It’s like a Helm chart, but different so that only the developers of the Operator really know what’s going on. In my experience, using those kinds of Operators is more-or-less a waste of time since you either have to learn the Operator and all of its constructs, or you could just learn the Kubernetes constructs and how they operate with one another. I am firmly in the latter camp; I am also in the camp that self-hosts and does not use Kubernetes at home because it’s not a good tool.

        That being said, though, there is one place I can point to and give a two-thumbs-up recommendation for an Operator. This is in places where the Operator actually provides new functionality in the Kubernetes API and not just an alternative abstraction for a Helm chart. The Operator in question is cert-manager. It provides functionality that Kubernetes does not provide natively and that cannot be reasonably shoehorned into whatever it already provides. The new constructs map readily to a usable pattern that is easy to grok.
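
        To give a rough idea of what such a new construct looks like, here is a small sketch that builds a cert-manager `Certificate` resource as a plain dictionary. The resource name, DNS name and issuer name are placeholders, the helper `certificate_manifest` is made up for the example, and it assumes a `ClusterIssuer` of that name already exists in the cluster.

        ```python
        import json


        def certificate_manifest(name: str, dns_names: list[str], issuer: str) -> dict:
            """Build a cert-manager Certificate resource as a plain dict."""
            return {
                "apiVersion": "cert-manager.io/v1",
                "kind": "Certificate",
                "metadata": {"name": name},
                "spec": {
                    # Secret in which the issued key pair gets stored
                    # (the "-tls" suffix is just a naming choice here).
                    "secretName": f"{name}-tls",
                    "dnsNames": dns_names,
                    # Assumption: a ClusterIssuer with this name already exists.
                    "issuerRef": {"name": issuer, "kind": "ClusterIssuer"},
                },
            }


        if __name__ == "__main__":
            manifest = certificate_manifest(
                "example-com", ["example.com"], "letsencrypt-prod"
            )
            print(json.dumps(manifest, indent=2))
        ```

        The printed JSON could be fed to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML. The point is that the resource describes one thing (a certificate) that has no native Kubernetes equivalent.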

        On the other hand, there is the RabbitMQ operator which just takes all of the functionality of a Helm chart and hides it in things that can’t be viewed without a lot of kubectl magic… There is a place for everything and everything in its place. Use cert-manager. Avoid all other Operators unless there is a firm understanding of the additional abstraction layer they bring with them.

        1. 2

          IIUC, the promise of operators is that with just a simple API call, I can have, say, a database cluster that then maintains itself, replicates itself, backs itself up, recovers itself on a new node if something happens to the old master, etc. I already have that with AWS managed services like RDS. If a disaster happens while I’m asleep or on a plane (though the latter doesn’t happen much these days), I can be confident that the service will recover itself. Yet I doubt there are sysadmins at Amazon babysitting my specific AWS instance. That’s why it seems plausible, at least to me with my lack of expertise in this area, that a Kubernetes operator should be able to do the same thing.

          1. 7

            Yes. The difference is that if something ends up not working (which I guess is the reason there are DevOps, SREs, etc.), with Amazon you call support, whereas with operators you hopefully have enough overview of the insides of the operator.

            You also might end up fighting some automation. So you should still really know what you are doing and not assume it will just do everything for you.

            Or, coming from a different angle: if everything worked as intended, all of that wouldn’t be required. So I always wonder what happens if stuff breaks, since the operator is another thing that can break, and some of the bigger ones are pretty complex and have their own bugs. And since the initial thing you start off with is a “disaster” you want to recover from, I think defaulting to the assumption that everything will go fine from then on might not be the best approach.

            Of course there are different operators out there. This is not to say you cannot have a simple operator for a piece of software that makes your life easier. However, there are also giant ones, and if you simply install one and rely on it, it might make your life a lot harder and outages a lot bigger when something stops working. So what I really mean is that you should know what it implies if you download some operator with all these nice features that is run by some big corporation that had a team build that operator for some integral piece of software. If they have a problem in the operator they will surely have someone capable of fixing the issue. The question is whether your team can do much more than file a bug report and hope it’s fixed soon. Don’t start trying to understand it when the disaster is already happening.

            1. 5

              Strongly second this. At work we use an in-house operator to maintain thousands of database clusters - but the operator is like a force multiplier or a bag of safe automations. It allows a small team to focus on the outlier cases, while the operator deals with the known hiccups.

              The whole thing relies on the team understanding k8s, the databases and the operator. The second feature we built was a way to tell the operator to leave a cluster alone so a human could un-wedge it.
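
              As a rough sketch of that “leave it alone” switch (not our actual code): a reconcile pass that checks a pause marker before touching anything. The annotation name `example.com/paused`, the `Cluster` type and the replica-count “repair” are all invented for the illustration.

              ```python
              from dataclasses import dataclass, field

              # Hypothetical annotation a human sets to take over a cluster.
              PAUSE_ANNOTATION = "example.com/paused"


              @dataclass
              class Cluster:
                  name: str
                  desired_replicas: int
                  actual_replicas: int
                  annotations: dict[str, str] = field(default_factory=dict)


              def reconcile(cluster: Cluster) -> str:
                  """One reconcile pass: fix replica drift unless the cluster is paused."""
                  if cluster.annotations.get(PAUSE_ANNOTATION) == "true":
                      # Hands off: a human is un-wedging this cluster.
                      return "skipped"
                  if cluster.actual_replicas != cluster.desired_replicas:
                      # Stand-in for the real repair work an operator would do.
                      cluster.actual_replicas = cluster.desired_replicas
                      return "repaired"
                  return "ok"


              if __name__ == "__main__":
                  wedged = Cluster("db-42", 3, 1, {PAUSE_ANNOTATION: "true"})
                  print(wedged.name, reconcile(wedged))
              ```

              The check comes first on purpose, so that no automated action can race with whatever the human is doing.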

              1. 4

                In my very limited experience with operators, they tend to be big old state machines that are hard to debug. It takes a very long time to get them working reliably and handle all the corner-cases.

              2. 4

                Beware of wishful thinking. It’s plausible, but in reality we’re not there yet. Operators might help, but they’ll also have crazy bugs where they amplify problems because some natural barrier has been erased and is now “simply an API call”. One classic example is such operators breaking pod co-location constraints, because it’s so easy to mess this up and miss the problem until an incident happens.
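
                A minimal sketch of the kind of check that catches this before an incident does, assuming the constraint is “no two pods of the same database cluster on one node”. The function name and the `(pod, cluster, node)` tuples are invented for the example; a real check would read placements from the Kubernetes API.

                ```python
                from collections import defaultdict


                def colocation_violations(
                    placements: list[tuple[str, str, str]],
                ) -> dict[tuple[str, str], list[str]]:
                    """Return {(cluster, node): [pods]} for nodes hosting 2+ pods of one cluster."""
                    by_node: dict[tuple[str, str], list[str]] = defaultdict(list)
                    for pod, cluster, node in placements:
                        by_node[(cluster, node)].append(pod)
                    # Keep only the groups that break the one-pod-per-node assumption.
                    return {key: pods for key, pods in by_node.items() if len(pods) > 1}


                if __name__ == "__main__":
                    placements = [
                        ("pg-0", "pg", "node-a"),
                        ("pg-1", "pg", "node-a"),
                        ("pg-2", "pg", "node-b"),
                    ]
                    print(colocation_violations(placements))
                ```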

                Sysadmins at Amazon are not babysitting your specific AWS instance, but how did they build something super reliable out of RDS? I can only imagine, but I think they started with the widest possible range of known failures so they could put failover mechanisms in place to fight them. Then they added monitoring on the health of each Postgres instance, on the service provided, and on the failover mechanisms themselves. Then they sat and waited for things to go red, fixed them, and refined this for years, at scale. The scale helps here because it shows problems faster. I’m 100% sure most Kubernetes operators are not built that way.