1. 17

  2. 33

    HashiCorp sponsored this post.

    1. 1

      I wonder if the post will be deleted by the powers that be…

    2. 10

      This is a great article, with very valid points and well researched decisions. That said:

      Cloud Agnostic

      This is cheating. Just because you switched from hosted Kubernetes (GKE) to self-managed Nomad doesn’t mean you can’t have self-managed K8s.

      Everything else is fine, I liked the article.

      1. 8

        The post smells like one big Nomad advertisement.

        The line

        Nomad’s batch scheduler is optimized to rank instances rapidly using the power of two choices described in Berkeley’s Sparrow scheduler.

        “scheduler is optimized” is an overstatement. The paper they link to says, and I quote:

        …a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design.

        Basically it’s a poor man’s load balancer. The Sparrow scheduler chunks each node into slots and schedules workloads into them. As a result, randomized scheduling “provides near-optimal performance” in only one situation – when the scheduled workloads are homogeneous (like batch jobs). Good luck reaching high saturation of nodes with this kind of scheduler if you have to deal with any of the following factors:

        • the workloads are anything more complex than batch workloads,
        • incremental scheduling is needed,
        • the nodes are heterogeneous (differ significantly from one another),
        • containers are used by your team in a composable manner (employing pods) to achieve high performance through data locality.

        It’s near impossible to allocate resources efficiently (with a randomized scheduler) for non-homogeneous workloads without high levels of churn.
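        The “power of two choices” heuristic itself is easy to see in a toy simulation (an illustration of the general technique, not Nomad’s or Sparrow’s actual code): sample two nodes at random and place the task on the less-loaded one. Compared to a single random choice, it dramatically tightens the load spread.

```python
import random

def place_two_choices(loads):
    """Sample two distinct nodes and place the task on the less-loaded
    one -- the 'power of two choices' heuristic."""
    a, b = random.sample(range(len(loads)), 2)
    winner = a if loads[a] <= loads[b] else b
    loads[winner] += 1

def place_one_choice(loads):
    """Baseline: place the task on a single uniformly random node."""
    loads[random.randrange(len(loads))] += 1

random.seed(42)
n_nodes, n_tasks = 100, 10_000
loads_two, loads_one = [0] * n_nodes, [0] * n_nodes
for _ in range(n_tasks):
    place_two_choices(loads_two)
    place_one_choice(loads_one)

# The classic balanced-allocations result: one extra random probe
# shrinks the worst-case imbalance exponentially.
print("max load, two choices:", max(loads_two))
print("max load, one choice: ", max(loads_one))
```

        Note, though, that this balances only counts of identical tasks across identical slots – which is exactly the homogeneity caveat above.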

        But from what I remember about Nomad, their scheduler isn’t the “randomized” one, but one using a system of two queues to schedule workloads. Nomad’s scheduling page confirms it. So the article’s point is moot.

        But let’s leave the advertising tone of the article aside and stick to the engineering side of things. It’s unclear to me why the option of a custom k8s scheduler has not been considered at all. The engineering effort needed to implement a custom scheduler is (roughly) an order of magnitude less than:

        • migration of their rendering stack from managed GKE to a new scheduler
        • subsequent support of Nomad with dedicated on-call personnel
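        For reference, the custom-scheduler option is conceptually small: pods opt in via spec.schedulerName (a real Kubernetes field, see the “Configure Multiple Schedulers” doc in the references), and the scheduler binds each pending pod to a node. A toy sketch over stubbed data – the names and the least-loaded placement policy are made up for illustration; a real implementation watches the API server and posts Binding objects:

```python
def schedule_pending(pods, nodes, scheduler_name="my-scheduler"):
    """Toy custom-scheduler pass: bind every pending pod that opted in
    via spec.schedulerName to the node with the fewest bound pods.
    (Hypothetical dict-based stand-ins for real API objects.)"""
    load = {n: 0 for n in nodes}
    bindings = {}
    for pod in pods:
        if pod.get("schedulerName") != scheduler_name:
            continue  # leave the pod to the default scheduler
        target = min(nodes, key=lambda n: load[n])
        load[target] += 1
        bindings[pod["name"]] = target
    return bindings

pods = [
    {"name": "render-1", "schedulerName": "my-scheduler"},
    {"name": "render-2", "schedulerName": "my-scheduler"},
    {"name": "web-1", "schedulerName": "default-scheduler"},
]
print(schedule_pending(pods, ["node-a", "node-b"]))
# {'render-1': 'node-a', 'render-2': 'node-b'}
```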

        References:

        1. Sparrow: Distributed, Low Latency Scheduling
        2. Configure Multiple Schedulers | Kubernetes
        1. 1

          It’s unclear to me why the option of custom k8s scheduler has not been considered at all.

          They did mention that they at least thought about developing their own, although they don’t go into detail on whether this was completed/implemented, or why it ended up not working (assuming it didn’t).

          All these issues eventually convinced us to develop our own in-house autoscaler.

          1. 2

            All these issues eventually convinced us to develop our own in-house autoscaler.

            I don’t think so. IIUIC, with the phrase quoted above they refer to Nomad, because:

            • they say it at the end of the “Reasons Behind the Switch” section, concluding their decision, and
            • an “autoscaler” isn’t a custom scheduler… unless it’s the author’s (sloppy) way of naming it so
        2. 6

          I find it odd when blog posts only go on about the pros of a new service but fail to mention the cons or downsides.

          1. 5

            GKE maintains a fairly large footprint on each node for running system-level jobs.

            I’ve found this to be true of Kubernetes in general, including Amazon EKS. Services that might reasonably be considered part of the control plane, like CoreDNS, are run inside our nodes, when IMO they should be run in the control plane that the cloud provider is managing. They’re charging for that control plane, after all. That’s one reason why, for my tiny company, I switched to Amazon ECS; whether using EC2 or Fargate, a larger proportion of the compute capacity that we’re paying for can be dedicated to our services.

            1. 4

              tl;dr: k8s isn’t suited to run thousands of small jobs

              Nomad seems kind of reasonable, but so does writing your own job orchestration + scaling + provisioning on multiple providers