I think the point is that Zookeeper’s guarantee is overkill. If you temporarily lose data about a server being in a serverset, you’re probably not going to lose any money, unlike if you lost someone’s order form, or processed an order twice by accident. Where Jepsen is generally trying to demonstrate the consistency guarantees that a service actually has, Eureka never claims to have linearizable or serializable consistency.
I think that historically, most people used Zookeeper not for its consistency guarantees, but because Zookeeper provided a recipe for service discovery, and ephemeral nodes + watches provided the API that people wanted, despite its weird CAP tradeoff for a service discovery system.
I believe @robdaemon’s point is not whether Zookeeper is any good, but rather why one should trust that Eureka is correct at all. What is the algorithm that guarantees a Eureka node converges on a correct state?
There’s a difference between service discovery and distributed consensus. While Eureka has always been good about converging on state, that’s made vastly easier by the fact that all data in Eureka is ephemeral and all sources heartbeat. Let’s say a Eureka node drops out, comes back, and a node disappears from discovery due to a bug during resolution. 15 seconds later, the node heartbeats and it’s back in discovery.
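A rough sketch of that continuous check-in loop, assuming a made-up `RegistryClient` and the 15-second cadence from the example above (not Eureka’s real client API): every heartbeat is effectively a re-registration, so a dropped entry reappears on the next check-in.

```python
import threading
import time

class RegistryClient:
    """Illustrative heartbeat loop; names and intervals are made up, not Eureka's client API."""

    def __init__(self, registry, instance, interval_seconds=15):
        self.registry = registry      # shared dict: instance id -> registration record
        self.instance = instance      # e.g. {"id": "svc-1", "host": "10.0.0.5"}
        self.interval = interval_seconds
        self._stop = threading.Event()

    def _beat(self):
        while not self._stop.is_set():
            # Every heartbeat is effectively a (re-)registration, so an entry that
            # was dropped for any reason simply reappears on the next check-in.
            self.registry[self.instance["id"]] = dict(self.instance, last_seen=time.time())
            self._stop.wait(self.interval)

    def start(self):
        threading.Thread(target=self._beat, daemon=True).start()

    def stop(self):
        self._stop.set()
```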
Discovery is a much different animal and entirely trades consistency for availability. It’s OK if things aren’t quite right, they will be eventually.
During partition events (too many nodes expiring at once), the registry is frozen for deletes so the entire registry doesn’t empty out. At that point, the intelligence in the HTTP clients takes over. Even though you might have 8 nodes to choose from, 3 might be bad. The clients check the liveness of those nodes and stop using them if they’re down.
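A hedged sketch of that client-side behavior: given a possibly stale list from the registry, probe each candidate and only route to the ones that answer. The TCP probe and the random pick are placeholders, not Eureka’s actual client logic.

```python
import random
import socket

def is_alive(host, port, timeout=1.0):
    """Cheap TCP probe standing in for whatever health check the client actually does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_server(candidates):
    """candidates: list of (host, port) tuples from a possibly stale registry snapshot."""
    live = [c for c in candidates if is_alive(*c)]
    if not live:
        raise RuntimeError("no live servers in the registry snapshot")
    return random.choice(live)
```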
In the end, it’s about resilience. Ephemeral state makes correctness a lot easier as well.
I work for FullContact; we use Eureka for discovery across our infrastructure and it’s been near bulletproof for us.
You’re missing the point again. You’re still framing it as a problem which would require a formal proof. It’s not. And you’re not “replacing Zookeeper”. You’re replacing Zookeeper for service discovery. There’s a huge difference.
Zookeeper is a fine system, but it’s fragile operationally compared to systems that don’t need to be perfectly consistent. If you’ve ever run ZK in AWS across multiple AZs, you’ll know that it can be problematic. And when ZK goes down, that means discovery is out and your services probably can’t even start up. Discovery has to be reliable.
Service discovery can play a little more fast and loose with correctness for much greater resilience. Eureka does not do any of the other functions of Zookeeper. It doesn’t do leader election or distributed consensus. It can’t; its algorithms would be terrible at distributed consensus. See https://github.com/Netflix/eureka/wiki/Understanding-Eureka-Peer-to-Peer-Communication for a somewhat high-level overview. But think about the data in a server registry: hostnames, datacenter info, and a timestamp. That’s really easy stuff. You can even blow away an entire Eureka node’s state and re-replicate from another node. If you’re missing entries, you’ll get them once the Eureka clients heartbeat again. It’s not a one-time registration, it’s continuous check-ins.
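For a sense of how small that data really is, a registry entry is roughly the following record (field names are illustrative, not Eureka’s exact schema):

```python
from dataclasses import dataclass

@dataclass
class RegistryEntry:
    # Roughly what a discovery registry holds per instance; field names are
    # illustrative, not Eureka's exact schema.
    service_name: str      # e.g. "orders-service"
    hostname: str
    ip_address: str
    port: int
    datacenter: str        # e.g. AWS region / availability zone
    last_heartbeat: float  # unix timestamp of the most recent check-in
```

Losing and re-replicating a whole node’s copy of this map is cheap precisely because every client will re-assert its own entry on the next heartbeat.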
The only issue we’ve had with Eureka was in the early phases of rolling it out. We do red/black (also known as blue/green or A/B) deployments. Our servers weren’t shutting down gracefully, which would cause dozens of registered servers to expire all at once, causing Eureka to think it was in a network partition and freeze the registry. Once all of our services were in, a few dozen nodes churning either way no longer pushed it under the heartbeat threshold for “emergency mode”, and we fixed our graceful shutdowns to cleanly deregister nodes.
Jepsen’s not applicable to Eureka. Jepsen, and most databases, are all about making sure that the sum of applied operations is valid once all the cluster partitions have divided and merged again. Eureka isn’t. Among other things, in a healthy cluster, a registration that comes in at time t is automatically invalidated and purged at time t+delta (user-configurable); clients have to phone home on a regular basis (at least more often than every delta/2; every delta/3 is best practice) if they want to be kept in the registry. Even a single Eureka server with no failures would therefore fail Jepsen if you just paused for a while after writing before doing the read verification. Clearly, this is not doing the same thing.
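To make that concrete, here’s a back-of-the-envelope sketch of the lease math; the 90-second delta is purely illustrative, since the value is configurable:

```python
import time

DELTA = 90.0  # lease duration in seconds (illustrative; user-configurable in practice)

def is_expired(last_heartbeat, now=None):
    """An entry written at time t is gone by t + DELTA unless it keeps phoning home."""
    now = time.time() if now is None else now
    return (now - last_heartbeat) > DELTA

# Register at time t, stop heartbeating, then "read back" 120 seconds later:
t = 1_000.0
print(is_expired(last_heartbeat=t, now=t + 120.0))  # True: the write has vanished by design
```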
In fact, Eureka’s barely a database at all. It’s really, at its core, just a way of passing around expiring caches of service name to IP address. That’s it.
Eureka’s total algorithm is really just a simple union (a rough sketch in code follows the list):
Whenever I get a registration, either add it to the service -> IP map or, if I’ve already got it, update the expiration deadline to be the minimum of the submitted deadline and now+delta. (Clients can ask to be expired sooner than now+delta, but can’t request to be purge-proof, basically.)
More often than delta/2, ping other servers and give them a dump of my registry, including my expiration times. They’ll merge with the above algorithm.
Every delta, delete anything that hasn’t phoned home in delta, either directly or through one of the other Eureka servers.
…except that, if I’d lose more than 20% of my servers by doing this purge, stop expiring entries until that condition is not true.
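A minimal sketch of that union-of-rolling-caches loop, under the assumptions above: delta is the lease duration, and a purge pass is skipped if it would drop more than roughly 20% of the registry. This is one reading of the steps, not Netflix’s actual implementation, and all the names are made up.

```python
import time

DELTA = 90.0               # lease duration (illustrative; configurable)
MAX_PURGE_FRACTION = 0.2   # skip the purge pass if it would drop more than ~20%

class EurekaLikeRegistry:
    """Sketch of the register / replicate / purge cycle described above."""

    def __init__(self):
        self.entries = {}  # instance id -> expiration deadline (unix time)

    def register(self, instance_id, requested_deadline=None, now=None):
        now = time.time() if now is None else now
        cap = now + DELTA
        # Clients may ask to expire sooner than now+delta, but never later:
        # there are no purge-proof entries.
        deadline = cap if requested_deadline is None else min(requested_deadline, cap)
        self.entries[instance_id] = deadline

    def merge_from_peer(self, peer_entries, now=None):
        # Peer replication reuses the registration rule, so a wiped node can
        # rebuild its whole map from any peer's dump.
        for instance_id, deadline in peer_entries.items():
            self.register(instance_id, requested_deadline=deadline, now=now)

    def purge(self, now=None):
        now = time.time() if now is None else now
        expired = [i for i, d in self.entries.items() if d < now]
        # Self-preservation: if too much of the registry would vanish at once,
        # assume a partition or mass failure and keep everything for now.
        if self.entries and len(expired) / len(self.entries) > MAX_PURGE_FRACTION:
            return
        for i in expired:
            del self.entries[i]
```

Because merging a peer’s dump goes through the same rule as a direct registration, losing a node’s state is cheap: it re-replicates from a peer and then converges as clients keep checking in.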
Note how insanely different this is from Jepsen. That’s because it’s not trying to do anything remotely similar. And that’s actually the point of this whole post: ZooKeeper is really complicated because it’s trying to be a CP database. Eureka is really simple because it’s just providing a clean way to union a bunch of rolling caches. If you’re doing lock coordination, Eureka would be insane, but this is exactly the kind of thing you want for service discovery.
What is your threat model here? We can only talk about consistency if there’s something to be consistent about. Jepsen is worried about scenarios like: a value is 2, you add 1 to it, you add 1 to it again, you read it, and suddenly it’s 5. If a service entry for a particular server in Eureka got duplicated, that would be maybe marginally annoying, but it would barely even qualify as a bug.
I don’t believe the data structures would allow for duplicates, but missing entries, sure, probably possible. During normal operation, however, check-ins are replicated to all servers, which fixes up missing entries pretty quickly, except during partition events. In that case, you probably have bigger problems than load distribution.
Maybe it’s more constructive to frame this as: what guarantees do you want from your service discovery mechanism in the presence of network partitions?
With Zookeeper the guarantee is: you will either get the network consensus answer, or no answer at all. With Eureka it sounds like the answer is: you will always be told about any services that the node you’re talking to has received recent updates from, and you will sometimes be told about services that are in fact down. We could formally model/prove these, I guess, but IMO the main use of such formal models is assuring you that particular invariants are maintained (e.g. that the results of a sequence of compare-and-swap operations on the same field will admit a serializable ordering), and I don’t think there are really any relevant invariants for the service discovery use case.