1. 10

    1. 9

      It’s great to not depend entirely on etcd, but I just wanted to point out that the work of managing etcd differs vastly from the work of managing Postgres.

      Etcd is a very simple K/V store, with Raft consensus out of the box and just a few knobs to configure. Take the officially distributed binaries, run them as a systemd service, and you’re mostly good to go.

      Postgres can do basically anything, which means you won’t use 99% of its features, and that breadth comes with the drawback of complexity. Replication and fault tolerance are possible but more complicated.

      If there’s a complete failure of your systems and you need to reinstall everything, I’m way more confident in my ability to remember, understand, and set up etcd in a hurry.

      If Kubernetes can run perfectly fine with a K/V database, why use anything more complicated than that? I think the focus should be on alternative K/V databases, not relational databases.

      1. 2

        Are there any good tutorials on running etcd from the ground up? It seems like every time I try to run my own k8s, I run into etcd networking/configuration problems and holding all of the k8s stuff in my head plus the etcd stuff is just too much. Maybe if I could understand etcd better in isolation I would feel more comfortable operating k8s.

    2. 3

      “If you’re running your own Kubernetes cluster, then you know the pains of managing etcd.”

      I’d actually like to hear about this, because I’ve found etcd to be almost entirely hands-off. I have run a kubeadm Kubernetes cluster at home since version 1.12 or so, now upgraded to a currently supported version.

      I run “externally managed” etcd, meaning I manage the etcd cluster myself, as opposed to hosting it within the same Kubernetes cluster it backs. I am running etcd in Docker, in a 3-node cluster.

      Setting it up required creating and signing a few certificates, one for each node and one for kube. Upgrading it is trivial: I haven’t had to do more than stop the old etcd containers and start new ones.

      The most trouble I’ve had with it was when one node seemed to experience some kind of corruption and wouldn’t join the cluster. I can’t remember whether I restored that node from its backup or let it sync from the other nodes, but in any case it was a non-issue.

      1. 4

        It depends. At my past job, we had services causing a lot of etcd churn, and etcd didn’t deal well with keys changing often. It would fall over and require manual intervention to recover.

        If you’re not writing to it often, it’s fairly painless.

        1. 2

          Fair point. I have ~350 pods, which is tiny.
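
For readers wondering what the three-node, externally managed etcd setup described in this thread might look like in practice, here is a minimal sketch that builds the command-line flags for one member of a statically bootstrapped, TLS-secured cluster. The flag names are etcd's own, but everything else is an assumption for illustration: the member names, the IP addresses, the cluster token, the data directory, and the certificate layout under `cert_dir` are hypothetical and not taken from anyone's actual configuration.

```python
from typing import Dict, List

CLIENT_PORT = 2379  # etcd's default client port
PEER_PORT = 2380    # etcd's default peer port


def etcd_flags(name: str, peers: Dict[str, str],
               cert_dir: str = "/etc/etcd/pki") -> List[str]:
    """Build etcd flags for one member of a static cluster.

    `peers` maps member name -> hostname or IP (hypothetical values).
    `cert_dir` and the file names under it are an assumed layout.
    """
    if name not in peers:
        raise KeyError(f"{name!r} is not listed in peers")
    host = peers[name]
    # Every member must be started with the same --initial-cluster value.
    initial_cluster = ",".join(
        f"{member}=https://{addr}:{PEER_PORT}"
        for member, addr in sorted(peers.items())
    )
    return [
        f"--name={name}",
        "--data-dir=/var/lib/etcd",  # assumed path
        f"--listen-peer-urls=https://{host}:{PEER_PORT}",
        f"--initial-advertise-peer-urls=https://{host}:{PEER_PORT}",
        f"--listen-client-urls=https://{host}:{CLIENT_PORT}",
        f"--advertise-client-urls=https://{host}:{CLIENT_PORT}",
        f"--initial-cluster={initial_cluster}",
        "--initial-cluster-state=new",
        "--initial-cluster-token=example-etcd",  # hypothetical token
        # One certificate per node, used here for both client and peer traffic.
        f"--cert-file={cert_dir}/{name}.crt",
        f"--key-file={cert_dir}/{name}.key",
        f"--trusted-ca-file={cert_dir}/ca.crt",
        "--client-cert-auth",
        f"--peer-cert-file={cert_dir}/{name}.crt",
        f"--peer-key-file={cert_dir}/{name}.key",
        f"--peer-trusted-ca-file={cert_dir}/ca.crt",
        "--peer-client-cert-auth",
    ]


if __name__ == "__main__":
    # Hypothetical three-node cluster.
    example_peers = {
        "etcd-1": "10.0.0.11",
        "etcd-2": "10.0.0.12",
        "etcd-3": "10.0.0.13",
    }
    print(" \\\n  ".join(["etcd"] + etcd_flags("etcd-1", example_peers)))
```

The same flag list could be passed to the binary from a systemd unit or as the command of a container. With three members, a Raft majority is two, so a cluster like this keeps working with one node down, which is consistent with the single-node recovery described above.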