1. 10

  2. 1

    Perhaps moving to Thanos would be a simpler solution? But I am not sure introducing k8s into the stack…

    1. 5

      Adding k8s to an environment that doesn’t already use it would be a significant increase in complexity, and as far as I know we’d still need to set up a storage backend for Thanos as well. Although I didn’t mention it in the entry, our metrics volume is low enough that 13 months of metrics currently takes up only 758 GB, which is easy enough to hold on regular storage without downsampling or other Thanos features.

      (I’m the author of the original article.)

    2. 1

      > These days things are almost entirely down to the routine tasks of changing our lists of Prometheus scrape targets as we add and remove machines, and keeping up with new versions of Prometheus, Grafana, and other components.

      You could use Consul, or another service discovery mechanism that Prometheus supports, to manage scrape targets dynamically.

      Of course, operating a whole new service like Consul isn’t free, but it might be useful regardless.
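If running Consul is more than is wanted, Prometheus’s file-based service discovery (`file_sd_configs`) is another of its supported mechanisms. Prometheus watches a set of JSON or YAML target files and picks up changes without a restart. The Python below is a minimal sketch that regenerates such a file from a host list. It swaps file_sd in for Consul and is not what the original article describes. The host names, port 9100 (node_exporter’s default), the `group` label, and the file name are all assumptions for illustration. On the `prometheus.yml` side you would also need a `file_sd_configs` entry whose `files` glob (for example `targets/*.json`) matches the output.

```python
import json
import os
import tempfile


def build_target_groups(hosts, port=9100, labels=None):
    """Build a file_sd target-group list: [{"targets": [...], "labels": {...}}].

    port=9100 is an assumption (node_exporter's default port).
    """
    group = {"targets": [f"{host}:{port}" for host in sorted(set(hosts))]}
    if labels:
        group["labels"] = dict(labels)
    return [group]


def write_atomically(path, groups):
    """Write the target groups as JSON, replacing `path` with a single rename.

    The temporary file is created in the same directory (so os.replace is an
    atomic rename on POSIX filesystems) and ends in ".tmp", so a "*.json" glob
    in file_sd_configs never matches a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(groups, f, indent=2)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    # Hypothetical host list; in practice it would come from whatever
    # inventory already tracks machines as they are added and removed.
    hosts = ["apps1.example.org", "apps0.example.org"]
    with tempfile.TemporaryDirectory() as out_dir:
        target_file = os.path.join(out_dir, "node.json")
        write_atomically(target_file, build_target_groups(hosts, labels={"group": "apps"}))
        with open(target_file) as f:
            print(f.read())
```

The write-then-rename approach means Prometheus only ever reads a complete file. The trade-off against Consul is that something still has to rerun the script whenever the host list changes. In exchange, there is no extra service to operate.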
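For a sense of scale, the 758 GB over 13 months quoted in the author’s reply works out to a modest ingest rate. The Python below is a back-of-the-envelope sketch. It assumes growth has been roughly linear over the period, which need not hold if machines were added along the way. The 26-month projection is a hypothetical illustration, not something the author proposed.

```python
# Back-of-the-envelope storage arithmetic for the figures quoted in the thread:
# 758 GB of metrics accumulated over 13 months.
# Assumption: ingest has been roughly constant, so growth is linear.

DAYS_PER_MONTH = 365.25 / 12  # average month length in days


def average_growth(total_gb: float, months: float) -> tuple[float, float]:
    """Return (GB per month, GB per day) under the linear-growth assumption."""
    per_month = total_gb / months
    per_day = per_month / DAYS_PER_MONTH
    return per_month, per_day


def projected_storage(per_month_gb: float, retention_months: float) -> float:
    """Storage needed to hold `retention_months` of data at the same rate."""
    return per_month_gb * retention_months


if __name__ == "__main__":
    per_month, per_day = average_growth(758, 13)
    print(f"about {per_month:.1f} GB/month, {per_day:.2f} GB/day")
    # Hypothetical: what doubling retention to 26 months would need.
    print(f"26 months: about {projected_storage(per_month, 26):.0f} GB")
```

Under that assumption this comes to roughly 58.3 GB per month, or about 1.92 GB per day.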