1. 28
  1.  

  2. 22

    Are there any Stripe folks on Lobsters who know why Stripe chose Consul as the service-discovery tool, instead of straight-up DNS or Zookeeper? b0rk phrases it as “effective and practical”, but HashiCorp’s first commit to Consul was almost exactly three years ago, so even if Stripe has been using it since it came out, that isn’t very long. In comparison, Kubernetes, which she contrasts as “a new technology coming out”, had its first commit two and a half years ago, but that commit was already 250 files, so it was probably in development for at least half a year before that. I wonder if the way Stripe talks about Consul has changed since they started using it–after a couple of years of use, they think of it as battle-hardened, even though in the larger world of distributed systems it is not particularly broadly used. That may be true for Stripe, since they have already worn down the rough edges for their use case, but if I were choosing a service discovery system for a company today, I don’t think I would consider Consul anti-flashy.

    One thing that worried me about Consul is exactly what Stripe ran into when running it: it’s built on top of a framework for guaranteeing consistency. In practice, strong consistency might not be what you want out of a service discovery framework. It might be appropriate if you really don’t want your servers to ever accidentally talk to the wrong server and you reuse names (or IP addresses, which is often what service discovery nodes point to), since after you remove a node from a cluster, you can be pretty certain that every client that can still reach the service discovery cluster will see the removal. In the long run, a better solution to this problem than requiring strong consistency in your service discovery tool is requiring that clients and servers authenticate each other, so they agree that they’re talking to who they think they’re talking to. If I were picking a flashy new service discovery framework, I would probably look at eventually consistent tools like Netflix’s Eureka. If I were trying to do something battle-hardened, I would probably pick Zookeeper.

    Looking at Zookeeper naively, you might ask, “Why is this strongly consistent hierarchical file system the go-to default for service discovery?” One reason is that it was designed for generic configuration changes: clients get notified of updates via the “watches” API, and “ephemeral znodes” disappear on their own when the session that created them dies. Together, those can be the fundamental building blocks of a service discovery tool. That’s the long and short of why people have used it for practically a decade.
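
    For the curious, registering a service instance with an ephemeral znode and watching the registry looks roughly like the sketch below, using the Python kazoo client; the /services/web path and host:port payload are placeholders I made up, not anything from the post:

    ```python
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
    zk.start()

    # Ephemeral + sequential: the znode gets a unique name and disappears on
    # its own if this session dies, so a crashed instance falls out of the
    # registry without anyone cleaning up after it.
    zk.create("/services/web/instance-", b"10.0.0.12:8080",
              ephemeral=True, sequence=True, makepath=True)

    # Consumers watch the parent and are notified whenever the set changes.
    children = zk.get_children("/services/web", watch=lambda event: None)
    ```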

    Other than that, Zookeeper doesn’t have a lot that particularly commends it. It does great throughput (for a strongly consistent store), and many people have used it for service discovery for a long time, so you can be pretty confident in it. On the other hand, when it’s not confident it can make progress safely, it just doesn’t make progress–this can mean that new nodes can’t start up (because you can’t add them to the cluster) and that old nodes can’t be removed (because you can’t remove them from the cluster). Leader elections can also be pretty painful. Unfortunately, these are also problems that Consul faces, because it made the same choices about consistency that Zookeeper did.

    Now that they’re using DNS on top of Consul, they have two single points of failure. Although we treat DNS like air and assume it’s an unlimited resource, DNS is still a name server, and it can still go down. With that said, DNS is really battle-hardened, so usually the problem comes when somebody poisons your DNS server somehow. This problem is mitigated by being in an environment where experts run your DNS servers, but it can still be bad.

    The other thing is that network partitions are real, and you don’t necessarily want to take down your own website because your service discovery cluster can’t talk to every remote host. Just because your service discovery cluster is partitioned from them doesn’t mean that they’re partitioned from each other! The nastiest problem, then, isn’t when Consul is down, but when Consul is up and sending you garbage that makes you think everyone is down. The solution ends up being to trust Consul only as a source of new information–your load balancer accepts the new nodes Consul tells it about, but ignores information about dead nodes until it can validate for itself that they’re dead.
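
    In code, that “only trust Consul for additions” rule is something like the sketch below; the function and the health check are hypothetical, just to make the shape of the idea concrete:

    ```python
    # Accept additions from service discovery immediately; only drop a
    # backend once a local health check agrees that it is actually dead.
    def update_backends(current, discovered, check_health):
        merged = current | discovered        # new nodes: trust discovery
        suspected = current - discovered     # "dead" nodes: verify first
        confirmed_dead = {n for n in suspected if not check_health(n)}
        return merged - confirmed_dead
    ```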

    As b0rk mentioned, DNS can be pretty slow to update, which is usually the reason people don’t want to use just DNS. If you’re happy with DNS’s propagation speed, it might make sense to cut out the middleman and skip running your own service discovery tool. With that said, it can be a hassle to wait on the order of minutes for a service discovery update–in particular, it makes rolling restarts especially slow. If you roll 5% of your cluster at a time, that’s 20 batches, each waiting a minute or more for propagation, so you’ve added at least twenty minutes to your deploy. You can use blue/green deploys to make it easier to roll back, but as your cluster size grows, that becomes increasingly expensive.

    With all that said, I think this is a really cool story of taking a technology, getting it to work (upstreaming bugfixes! be still my heart), improving reliability by relaxing consistency, and simplifying healthchecking. Despite my skepticism of the newfangled hashicorp stuff, service discovery is a well-known danger area, so having zero incidents in a year is pretty dang good. I hope companies continue to put out blog posts like this one–the history and explanations of why decisions were made are great. Stripe does have an advantage since they’re employing b0rk though ;)

    1. 3

      I’ve never operated a system that size. I have been involved in designing and operating systems for many thousands of concurrent users. What I don’t understand yet about the scale of systems like the one described here is why not use a pull-based worker model?

      In my experience it’s far easier to scale workers (clients) of a lightweight, federated message bus like Tibco rv, RabbitMQ, or a tuple space like JavaSpaces. Are these just out of fashion? Are there technical reasons to run workers as push-based servers with more complicated intermediaries such as are described here?

      1. 4

        Hey, I’m responding up closer to the top so anyone reading can find this more easily. I’m going to try to summarize your proposed system design and mine and compare and contrast them; you can then offer corrections, but please limit them to corrections. I have also added my own opinions, so please respond with your own opinions. I have tried, though being human I have probably failed, to make it clear when something is my belief and not necessarily going to apply equally to everyone, so please try to do the same if you respond.

        Proposal 1: Message Queue Based System:
        • Clients and servers are connected through a message queue.

        • A message queue is a collection of federated servers running something like RabbitMQ.

        • The queues themselves are transient, memory only.

        • A service is a single queue and has multiple consumers subscribed to it.

        • When a client wants to perform an RPC (roughly sketched in code after this proposal), it:

          1. Has to find a message queue to talk to (service discovery, could be as simple as DNS or as complicated as Zookeeper)
          2. Has to find the specific queue to communicate on (could be that queues just have a convention in their name or it could be that there is another layer of service discovery).
          3. Publishes a message on the service’s queue with a return queue name (Is this another transient queue that only the client is subscribed to? Is it deterministic or new on each connection? Or is it a new queue for each specific RPC call?)
          4. It waits for the response on its queue.
        • Claimed Advantages

          • Adding a new worker for a server is very simple, it just subscribes to the appropriate queue.
          • Various metrics and relations between services are centralized.
          • Except for finding the MQs, service discovery + communication is all part of the same system which reduces operational complexity.
        • Claimed Disadvantages

          • Even federated, a single MQ failing can affect many requests.
          • Even federated, all the MQs could experience an outage which becomes an outage for every service (for example a bad configuration, or a bug in the MQs). The uptime of a service is dependent on the uptime of the queue system.
        • apy’s Opinions

          • I’m not convinced that the operational complexity is actually less, managing hundreds, if not thousands, of queues can be quite a burden.
          • I believe that queues being a choke-point for all communication is a significant risk.
          • I believe MQs such as RabbitMQ are very complicated pieces of software; even if one does not use all of the functionality, the code is still there, which is an operational risk.
          • At scale I don’t feel confident the MQs are adding value. Consider a service that does 30 million RPS. With the MQ there is a publish, then a consume that delivers to a subscriber; assuming messages are auto-acknowledged, the worker then publishes the response on the reply queue, which becomes another consume. So we get 30 million * 4 messages through a centralized service vs 30 million * 2 (request, response) for a direct connection. On top of that, if auto-acknowledge is used, then what happens if a worker dies while processing? With a direct connection, in many cases the OS on the other end can close the TCP connection for you; otherwise one has to time out. In the case of an MQ, I think the only tool we can use is a timeout, which can increase latency. If auto-ack is not used, then that is 30 million * 5 operations per second. (As I am not sure if I got the initial proposal correct, I could be quite wrong here.)
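
        To make the four-step flow above concrete, here is a rough sketch of the client side using RabbitMQ’s Python client (pika). The host, queue names, and payload are made up, and it follows the usual reply-queue/correlation-id pattern rather than anything the parent comment specified:

        ```python
        import uuid
        import pika

        # 1. Find an MQ to talk to (a hard-coded host stands in for discovery).
        conn = pika.BlockingConnection(pika.ConnectionParameters("mq.internal"))
        channel = conn.channel()

        # 3. A private, transient reply queue; the broker picks its name.
        reply_queue = channel.queue_declare(queue="", exclusive=True).method.queue

        response, corr_id = None, str(uuid.uuid4())

        def on_reply(ch, method, props, body):
            global response
            if props.correlation_id == corr_id:   # match reply to this request
                response = body

        channel.basic_consume(queue=reply_queue, on_message_callback=on_reply,
                              auto_ack=True)

        # 2 + 3. Publish on the service's queue (named by convention here),
        #        telling the worker where and how to reply.
        channel.basic_publish(
            exchange="",
            routing_key="rpc.inventory",          # hypothetical service queue
            properties=pika.BasicProperties(reply_to=reply_queue,
                                            correlation_id=corr_id),
            body=b"list components for family-x")

        # 4. Wait for the response on the reply queue.
        while response is None:
            conn.process_data_events(time_limit=1.0)
        ```
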
        Proposal 2: Direct Connection Based System:
        • Clients and servers communicate over a direct connection (for example TCP) but possibly with a proxy running on the local machine.

        • Finding each other is a distinct problem from communication. Service discovery can be implemented as DNS with a short TTL.

        • Load balancing can be implemented with a proxy such as HAProxy or nginx. This middleware will run on every client machine and will be populated through service discovery. Alternatively, if one does not want to run this middleware, the client libraries can implement load balancing; this would be the same code that an MQ client that supports federated MQs would implement.

        • RPCs are a standard socket request/response.

        • Adding a new worker to a service requires updating service discovery which then will propagate to the middleware running on each client. In the case of DNS, a cronjob could run every minute and check and update the HAProxy or nginx config or inform the client library somehow (roughly sketched in code after this proposal).

        • Claimed Advantages

          • This has no choke point. If service discovery goes down, old configs can be used. If a worker goes down, only the requests hitting it fail. Operational risk is spread out through many services rather than consolidated into a few. The uptime of a service is moved almost entirely to the service itself.
          • The layers involved are simpler, albeit there are more instances of the layers.
          • The communication is the same interface (a socket) at each layer simplifying the mental model.
        • Claimed Disadvantages

          • Adding a new worker is costly no matter how you slice it.
          • If we consider multiple instances of duplicate parts, there are many moving parts as each client has the full stack.
        • apy’s Opinion

          • I think that if we need federated MQs, that requires the same client intelligence for load balancing and failing over between MQs that we could use for talking to services directly. In that case we do not need the HAProxy/nginx layer if we do not want it and, I believe, we have a much simpler system than the MQ one.
          • In my experience, spreading operational risk around is what one wants at scale. Federated MQs, even being multiple machines, consolidate a lot of risk in one place, and it’s likely difficult to keep that small group of machines at 5 or 6 nines of uptime as scale increases.
          • When small, the extra packets for the pub/subs are probably not a problem, but at scale they add up and could be.
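
        To make Proposal 2’s cronjob concrete as well, here is a rough sketch of the DNS-to-HAProxy refresh. The SRV record name, file path, and reload command are placeholders, and it assumes the dnspython library:

        ```python
        import subprocess
        import dns.resolver  # dnspython

        # Resolve the service's SRV record into a list of (host, port) backends.
        answers = dns.resolver.resolve("_inventory._tcp.service.internal", "SRV")
        servers = sorted((str(r.target).rstrip("."), r.port) for r in answers)

        # Rewrite the backend stanza that the main haproxy.cfg includes.
        lines = ["backend inventory", "    mode tcp"]
        lines += [f"    server srv{i} {host}:{port} check"
                  for i, (host, port) in enumerate(servers)]
        with open("/etc/haproxy/conf.d/inventory.cfg", "w") as f:
            f.write("\n".join(lines) + "\n")

        # Gracefully reload HAProxy so it picks up the new backend list.
        subprocess.run(["systemctl", "reload", "haproxy"], check=True)
        ```
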
        1. 1

          Your description of a message based solution is much more complicated than needed. I’ll respond in more detail when I have time.

          1. 1

            not necessarily going to apply equally to everyone, so please try to do the same if you respond.

            Yes, if you look back at my first comment on this post, my position was that a message-based architecture for RPC has worked well for me, but that I have not worked on “Internet-scale” systems. I have applied it to enough scenarios, from manufacturing to internal business systems to the back end of web applications, that I do believe it is widely applicable.

            BTW, what I am describing was considered the hot ticket about 15+ years ago. People would have looked at you funny had you suggested using HTTP throughout your data center. That the reverse is true today seems to me based more in culture than in technology.

            Proposal 1: Message Queue Based System:

            Note this is not my proposal. This is your interpretation of what I might propose. In the interest of time I will try to provide some brief “diffs” and comments to your interpretation.

            Clients and servers are connected through a message queue.

            Because they are not “servers” per se, I usually refer to them generally as “workers”. Or for RPC in particular, “Repliers”, in the sense of RPC being “Request/Reply”.

            A message queue is a collection of federated servers running something like RabbitMQ.

            Yes. As mentioned in this thread already, RabbitMQ offers many more bells and whistles than needed for simple messaging, but it does provide simple messaging that’s pretty easy to configure and operate.

            Note that federation is just an organizational convenience. It’s not needed for operating a lightweight RPC per se.

            Let’s say you have Departments A and B. They each have a set of RPCs and other lightweight messaging needs. But you’d only like to expose some of those from A to the clients in B, and vice versa. Federation is essentially limiting the exposure to meet those organizational needs.

            In Tibco RV this is done using a “routing daemon” between subnets. The daemon only allows messages matching a certain pattern through to the other subnet.

            In JavaSpaces this is done using one or more spaces specifically for inter-departmental collaboration. Within a department there would be one or more spaces devoted solely to that department.

            Also because these are lightweight messages, there is no reliability expected of the federation. No clusters, no persistent queues, etc. If something fails or does not respond in time, it is the edge clients that have the intelligence to take appropriate action.

            The queues themselves are transient, memory only.

            Yes.

            A service is a single queue and has multiple consumers subscribed to it.

            In its simplest form, yes. Because these are lightweight messages with intelligence in the clients, there’s nothing to prevent multiple instances of the messaging system, each with the same “route” in RabbitMQ terminology. Clients can round-robin requests, or they could always use a favored instance unless it’s unavailable, etc.

            Has to find a message queue to talk to (service discovery, could be as simple as DNS or as complicated as Zookeeper)

            Yes, usually simply a DNS lookup of a single instance or as mentioned above, a list of them to use according to some policy. But it can be a very simple mechanism generally.

            Has to find the specific queue to communicate on (could be that queues just have a convention in their name or it could be that there is another layer of service discovery).

            Generally the requestor just knows the names of the RPC calls or one-way lightweight messages/routes it wants to send. Things can get more elaborate where, for example, a client may receive in one message information about subsequent messages it will send.

            In the case of RabbitMQ, such lightweight routes only come into existence when there is a client that subscribes, and they go away when the last such client has disconnected.

            In the case of Tibco RV there is nothing configured in the middle; it’s up to some client to be looking for a particular pattern in the message. The same is true of JavaSpaces: some client has to be looking in the space for a particular pattern. These pattern-based systems are preferable, but RabbitMQ’s in-memory, transient queues are very manageable.

            Publishes a message on the service’s queue with a return queue name (Is this another transient queue that only the client is subscribed to? Is it deterministic or new on each connection? Or is it a new queue for each specific RPC call?)

            The message can contain any route to reply to. The client can be provided its own automatically; routes do not have to be explicitly named. This is somewhat like an Erlang process providing its own PID, or the PID of some other process, to reply to.

            It waits for the response on its queue.

            Generally it’s just a response that can go anywhere, and whatever client is interested can subscribe. But typically, yes, it’s just fine for a client to get a callback or just sit in its thread and wait for the response.

            Adding a new worker for a server is very simple, it just subscribes to the appropriate queue.

            Yes.

            Various metrics and relations between services are centralized.

            I’m not sure what this means.

            Except for finding the MQs, service discovery + communication is all part of the same system which reduces operational complexity.

            Yes.

            Even federated, a single MQ failing can affect many requests.

            Federation is not needed for technical failover. As mentioned above, that’s often as simple as having a second instance to connect to. Because instances do not need to be clustered, cold or warm standbys, etc. are not complicated either.

            all the MQs could experience an outage which becomes an outage for every service (for example a bad configuration, or a bug in the MQs). The uptime of a service is dependent on the uptime of the queue system.

            Because there is no need for the individual instances to be aware of each other, the risk of all instances being misconfigured or going down is very low. The intelligence for a worker to subscribe to another instance and/or to have a standby is simple. Depending on the value of the individual RPCs, more elaborate measures can be taken to increase the probability of availability. But there is little risk of a catastrophic failure due to this architecture in particular.

            I’m not convinced that the operational complexity is actually less, managing hundreds, if not thousands, of queues can be quite a burden.

            In this scenario using RabbitMQ, one line of code in a subscriber client creates a queue, and the queue is removed when the last such client exits. There is no management overhead. RabbitMQ with small messages over in-memory, transient queues has not been any kind of burden that I have noticed.
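
            Concretely, that “one line of code” with RabbitMQ’s Python client looks something like this (the queue name and host are invented for illustration):

            ```python
            import pika

            conn = pika.BlockingConnection(pika.ConnectionParameters("mq.internal"))
            ch = conn.channel()

            # Non-durable, auto-delete: the broker creates the queue on first
            # declare and removes it once the last consumer goes away.
            ch.queue_declare(queue="rpc.inventory", durable=False, auto_delete=True)
            ch.basic_consume(queue="rpc.inventory", auto_ack=True,
                             on_message_callback=lambda c, m, p, body: None)
            ```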

            I believe that queues being a choke-point for all communication is a significant risk.

            I’m not sure what you mean by choke point, in light of how I describe things above.

            Most of the effort is in the client application code. Consider that an individual client only requests a message when it is ready.

            On the other hand a “service” does not have that advantage. It may not be in good shape to accept a request. Something outside the service has to decide when to choose a particular service, has to hope that service is ready to respond, and has to handle failure in-flight. These are the choke points of traditional server-based RPC.

            I believe MQs such as RabbitMQ are very complicated pieces of software; even if one does not use all of the functionality, the code is still there, which is an operational risk.

            To some degree, yes. RabbitMQ would not be my first choice, but it would not be my last (it’s fairly easy to administer for these setups, compared to some other big-honking message systems). There are other messaging systems that do not bring along all the capabilities of RabbitMQ. In this setup there is precious little for the message system to do, so not much is needed out of the box.

            Consider a service that does 30 million RPS.

            Although I’ve never scaled out to that degree, I can say that 30 million RPS would not have to go through one instance. In Tibco RV, they would not go over one subnet. In JavaSpaces, they would not go through one space.

            what happens if a worker dies while processing?

            The intelligence is pushed to the edges, where the applications have “expectations” and know how to respond when those expectations are not met. Note that for RabbitMQ, acks can, but do not need to, go all the way back. Yes, the requestor has to use a timeout and/or poll. In any case, handling a failure at the application level is not much different.

            If very low latency is truly an issue for certain messages, I would not use RabbitMQ but neither would I use HTTP. I would use something like Tibco RV over UDP on subnets dedicated to specific kinds of messages. That is about as lightweight and low latency as you can get. The administration, operations, and application code overhead is still remarkably low, but the engineering put into messages/subnets is right where you’d have to put in the effort no matter what else you do.

            Proposal 2: Direct Connection Based System:

            Clients and servers communicate over a direct connection (for example TCP) but possibly with a proxy running on the local machine.

            “Direct communication” is very much an illusion in your scenario of hundreds or thousands of services with 30 million RPS. Add the monitoring of how all the actual servers are holding up behind the middleware vs. simple clients making requests instead of accepting them. You’ve got to be kidding me if you believe the server approach is easier in any way, shape, or form.

            Finding each other is a distinct problem from communication. Service discovery can be implemented as DNS with a short TTL.

            Indeed. Let’s go back to your hundreds or thousands of services use case! No, thanks.

            This middleware will run on every client machine and will be populated through service discovery. Alternatively, if one does not want to run this middleware, the client libraries can implement load balancing; this would be the same code that an MQ client that supports federated MQs would implement.

            If you enjoy that kind of server management, then more power to you. As compared to running simple client applications? Lay all that out on a spreadsheet and show me how the numbers add up. They don’t.

            Adding a new worker to a service requires updating service discovery which then will propagate to the middleware running on each client. In the case of DNS, a cronjob could run every minute and check and update the HAProxy or nginx config or inform the client library somehow.

            That you can write this out and weigh it remotely positive relative to a message client approach leaves me more than scratching my head.

            This has no choke point.

            I’d sure like your definition of choke point. I think you are ignoring the reality of running large numbers of HTTP servers with various URLs and performance characteristics, across data centers, etc. Maybe you’ve been hitting yourself in the head with the HTTP hammer so long it’s going numb to the reality.

            The uptime of a service is moved almost entirely to the service itself.

            I’m having trouble participating in this debate with a straight face. A complicated middleware + service architecture absolutely does not have simpler uptime.

            The layers involved are simpler, albeit there are more instances of the layers.

            Simpler as measured by what? Your experience with cumbersome, heavyweight message systems, I can only guess. But that is not at all the recommendation.

            Adding a new worker is costly no matter how you slice it.

            False. With messaging, it is starting a new client application. Period.

            If we consider multiple instances of duplicate parts, there are many moving parts as each client has the full stack.

            The full stack of a client? It’s an application binary with a networking library. It’s as small as can be.

            I’m ready to agree to disagree with you and call it a day.

            1. 1

              Could you please rewrite this in the form of my message where you have clearly demarcated the proposal vs your opinions and claimed advantages and disadvantages as I requested?

              And quick correction: I have not specified HTTP in my proposal, the actual RPC mechanism is unspecified. Pure ProtoBufs requires no HTTP. gRPC does have HTTP/2.

              Also, I don’t think you are being as respectful in your response to my proposal as I have been to yours. I have in no way claimed your head is numb or made an equally flippant response. I’m trying to offer a constructive discussion of our ideas and I would appreciate it if you did as well.

              Finally:

              Adding a new worker is costly no matter how you slice it.

              False. With messaging, it is starting a new client application. Period.

              You seem to be misunderstanding the structure of my post. That claimed disadvantage is listed specifically under Proposal 2, not under both proposals.

              1. 1

                Let’s just agree to disagree. I’m confident in what my experience tells me. I don’t need to convince you of anything. This has gone on long enough.

                1. 1

                  Ok. But, again, I think you are misinterpreting my post. I am not trying to convince you, I am trying to get the exact proposals fleshed out so someone else can actually understand them. I would like to feel confident I understand what you are saying, even if I disagree.

                  1. 1

                    OK, I will write something up over the weekend. I will use Tibco Rendezvous instead of RabbitMQ, as it is the most lightweight message system I am aware of or have used. It does not have the trappings of what I call “heavyweight messaging” which most people have used or are familiar with. There are no centralized mechanisms at all. Everything is at “the edge of the bus”.

                    With RabbitMQ you really do have to avoid all the bells and whistles to use just the lightest weight messaging. It “runs in the middle” but you just run as little of it as possible. Not ideal, especially for making this kind of an argument.

                    JavaSpaces adds just a bit more than Tibco RV with some great benefits for the effort. But it too pushes just about everything to the edge.

                    1. 1

                      OK. Forget RabbitMQ. RabbitMQ has a range of capabilities and we only want the barest minimum of them. All the bells and whistles people normally associate with “messaging” only cloud the issue.

                      Here’s the idea cast using Tibco Rendezvous (rv). This is going to be a fairly simple explanation in the interest of (my) time.

                      rv is a completely decentralized message bus using UDP. The protocol is implemented by “Rendezvous Daemons” (rvd). Typically there is an rvd running on every node that is running at least one client application. A client application is simply any executable that talks to an rvd. Multiple client applications can talk to a single rvd. The rvd is responsible for negotiating between clients using the publish/subscribe and/or request/reply API and traffic on the network. An rvd communicates back to a client application using an inbox dedicated to that client. (Similar to an Erlang process message queue.)

                      Messages can be multicast or name a specific inbox. Messages are subject-based using a hierarchical subject name with pattern matching. An rvd ensures sending and receiving of whole messages, a certain level of reliability, and certified messages if desired.

                      For discovery => RPC a typical sequence would be:

                      Multicast a request for the desired procedures. For example a client application looking for a common set of procedures related to a particular activity and product could ask for all interested parties to provide a list of procedures corresponding to “the inventory of components that are a part of some product family”. The replies would be supported procedures and the inboxes of clients that can respond to requests for that particular procedure.

                      For a given available procedure, select one of the inboxes and send a request/reply directly to that inbox. When more than one is available, certain policies can be implemented in the selection process. For example clients that make their inbox available can include some indicator of their recent load. Or an inbox can be selected from the list at random, or walking down the list with each request. The rvd of the sender and rvd of the receiver will be the only components that exist between the two client applications.

                      As with any RPC mechanism there should be some agreed upon “contract” and design appropriate to that contract: preconditions, postconditions, agreements on response times, appropriate behaviors for missed deadlines, etc. Idempotent messages vs. non-idempotent, etc. Above the level of messages one would consider designing things like “leases” for resources that have some critical nature, and so on. The underlying message system itself is very lightweight indeed. You build what you need above that, just for the needs of the specific distributed system.
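
                      Here is a toy sketch of that discovery-then-direct-RPC sequence, written over plain UDP sockets with JSON payloads. To be clear, this is not the Rendezvous API; the multicast group, port, and message shapes are all made up for illustration.

                      ```python
                      import json
                      import socket

                      GROUP, PORT = "239.1.1.1", 5000   # hypothetical multicast "bus"

                      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
                      sock.settimeout(1.0)
                      sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)

                      # 1. Multicast: "who serves inventory.list for a product family?"
                      sock.sendto(json.dumps({"discover": "inventory.list"}).encode(),
                                  (GROUP, PORT))

                      # 2. Collect replies; each names the worker's own "inbox".
                      inboxes = []
                      try:
                          while True:
                              data, _ = sock.recvfrom(65535)
                              inboxes.append(tuple(json.loads(data)["inbox"]))
                      except socket.timeout:
                          pass

                      # 3. Pick an inbox and send the request directly to it; the
                      #    worker replies to this datagram's source address.
                      request = {"proc": "inventory.list", "args": ["family-x"]}
                      sock.sendto(json.dumps(request).encode(), inboxes[0])

                      # 4. Wait for the response, under whatever deadline we agreed on.
                      reply, _ = sock.recvfrom(65535)
                      print(json.loads(reply))
                      ```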

                      In the case of applications that span subnets (e.g. organizational boundaries, or just separate subnets based on security or performance, etc.) there are other daemons in the middle. These are Rendezvous Routing Daemons (rrd). Clients do not talk directly to them. These daemons look for certain patterns of messages that are allowed to be routed across subnets.

                      1. 1

                        Thanks for the reply.

                        What do you believe is the difference between running a Rendezvous Daemon and a Rendezvous Routing Daemon versus running HAProxy on the clients (note, HAProxy can route TCP connections as well as HTTP, so using HAProxy does not require using HTTP)?

                        1. 1

                          Yes, this is the crux of the matter. In the case of Rendezvous these are not connection-oriented. They are UDP, with rv implementing its own messages, sequencing, etc. So the bottom line is there are not a lot of established connections sitting open on some TCP middleware component. One pretty lightweight component implementing an inbox and a UDP-based protocol for the application(s) running on the same node. No load balancing, no routing, no DNS, etc. Discovery is just another lightweight message on the multicast UDP “bus”.

                          As for the routing daemon, that is also a connectionless component sitting between two subnets implementing (dis)assembly of rv messages to UDP packets and routing based on pattern matching of the subject name hierarchies of those messages.

                          Adding new subject names requires zero administration unless you want to add a new pattern to the routing daemon. All you need is at least one client to subscribe and one client to publish. Those are just applications. About as close to zero administration as a distributed application can get.

                          Adding new workers to increase throughput or availability of certain messages is just adding a new client application on an existing node, or adding a new node, which is just another rv daemon.

                          1. 1

                            Thanks again. Two follow up questions:

                            • This architecture is not possible in RabbitMQ, correct? RabbitMQ has brokers that all data flows through. When you said RabbitMQ in your first message, did you have an architecture with a broker in your head?

                            • Tibco RV is proprietary software as well as a proprietary protocol, is that true?

                            1. 1
                              1. RabbitMQ does not use UDP. It is connection-oriented. The lightest-weight usage of RabbitMQ that I’ve described in these threads approximates this architecture.

                              2. Yes. And you said your objective was 30 million requests per second using hundreds if not thousands of individual remote procedures. Is that true?

                              1. 1
                                1. The Tibco RV architecture you’ve described sounds like there is no broker, but RabbitMQ is an MQ broker, so it sounds quite different from what you’ve described. Literally, all traffic has to go through some RabbitMQ instance. Am I understanding you incorrectly?

                                2. Also, is the main thing you are trying to avoid TCP connections? Is that because you believe TCP connections to be very heavy? (I’m not looking for evidence that they are or are not; I’m simply trying to understand your motivation for pointing out connections.)

                                3. The 30 million number was not necessarily a goal; it was just in my mind because I was working with a service that day that has that amount of load. To answer your question, though: I’m trying to consider ranges from tens of RPS to millions. Part of my claim was my belief that, at scale, the MQ (with a broker, in this case; I cannot speak to Tibco RV) becomes much more expensive than direct connections.
                                1. 1

                                  Sorry. I’ve run out of time.

            2. 2

              In my experience it’s far easier to scale workers (clients) of a lightweight, federated message bus like Tibco rv, RabbitMQ, or a tuple space like JavaSpaces. Are these just out of fashion? Are there technical reasons to run workers as push-based servers with more complicated intermediaries such as are described here?

              I’m not sure I’m following what you’re saying. I skimmed the article, but the basic idea is providing a tool for services to discover each other. They will likely then do some kind of RPC between them. This is different from a message queue, which is fairly awkward for RPC and is a centralized component. And few MQs handle clustering very well.

              1. 4

                RPC is very convenient in the message systems I mentioned above. Much easier to set up and scale this kind of RPC worker than an HTTP-based or any “server socket accept” based RPC server.

                The RPC names are dynamically established. The workers are round-robin clients. Responses are matched up using correlation IDs and lightweight reply mechanisms for the caller. It’s much like an RPC in Erlang where the caller includes the PID to respond to.

                Managing and scaling this kind of RPC I’ve found to be far easier than traditional RPCs (and I’ve used a bunch of them, from pre-CORBA, CORBA, XML-RPC, and SOAP up to today’s JSON-over-HTTP RPC).

                Nor do these mechanisms need to be centralized. Each of the ones I mentioned above can be easily federated in a non-clustered architecture. (Or clustered if needed. Which is really another important topic - I avoid clustering if at all possible.)

                1. 5

                  IME, RPC-over-MQ has not been very convenient relative to just opening a socket because:

                  • The MQ is another piece of infrastructure that needs to be managed and operated. It adds a bunch of failure modes that need to be handled. Queues do go down and that needs to be handled and that adds complexity.

                  • RPC in an MQ is, IMO, just awkward. The OS already provides a tool for this in the form of sockets. The MQ does not simplify this, it only makes it more complicated. The MQ does offer some load balancing there but there exist tools such as nginx and HAProxy that are pretty easy to deploy, so I wouldn’t say the complexity the MQ adds is worth it relative to those tools.

                  The one strength I think something like an MQ has is centralizing metrics and the relationship between producers and consumers. That is often a challenge especially in the age of microservices. But that can generally be solved in less intrusive ways that do not create a bunch of new failure modes.

                  1. 4

                    “Just opening a socket” is not the same as putting multiple RPCs into production. That’s apples and oranges.

                    “MQ is another piece of infrastructure” – right. and so is a web server and all the HTTP middleware needed to make a scalable, reliable infrastructure appear to be something resembling “just opening a socket”.

                    In my experience it’s easiest to put the middleware in place for messaging because once that is in place everything to follow are just clients which are far easier than continuing to add servers.

                    “Queues do go down” – the beauty of RPC over these kinds of lightweight messaging systems is that the routing can take place over transient mechanisms. No need for clusters, no need to survive failures of any kind. Push the intelligence to the edges.

                    “I wouldn’t say the complexity the MQ adds is worth it relative to those tools” – the incremental cost of adding workers (clients) is far, far simpler. I’ve been doing it this way using several different messaging systems for 15 years. I’ve also been doing it from time to time using the traditional HTTP middleware. That way is far more cumbersome for developing and for operations and is more resource intensive.

                    I believe you are speculating about what you believe to be true. But I can tell you this is a common misperception and the truth on the ground is very much the opposite.

                    “MQ has is centralizing metrics and the relationship between producers and consumers” – something like that, yes. Even when the messaging system is federated it has a lot of expressiveness because the same messaging mechanism can be used as the “control plane” as well as for the “data plane”. It is a simplifying mechanism, and not at all an additional burden.

                    “a challenge especially in the age of microservices” – if you like microservices you’ll love RPC over lightweight messaging systems. These are simple clients. Adding one more is a piece of cake. Adding ten more is not significantly more difficult than that.

                    “do not create a bunch of new failure modes” – messaging infrastructure reduces failure modes, regularizes the infrastructure, reduces the kinds and amounts of middleware. If you want to reduce failure modes, move as much as possible of your business functionality into non-clustered, lightweight client applications.

                    1. 1

                      “Just opening a socket” is not the same as putting multiple RPCs into production. That’s apples and oranges.

                      In what way? If one uses ProtoBufs, this is the solution. One opens a socket and serializes and deserializes PB frames. With gRPC, it’s the same, just add HTTP/2. Adding load balancing is putting a proxy between the sockets, which doesn’t change the interface for the clients: they don’t have to change at all.
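
                      For concreteness, here is roughly what I mean by opening a socket and serializing/deserializing PB frames: length-prefixed protobuf messages over TCP. The inventory_pb2 module and message names are hypothetical (whatever protoc generates for your schema), and the 4-byte length prefix is just one common framing convention.

                      ```python
                      import socket
                      import struct

                      import inventory_pb2  # hypothetical protoc-generated module

                      def send_msg(sock, msg):
                          payload = msg.SerializeToString()
                          # 4-byte big-endian length prefix, then the protobuf bytes.
                          sock.sendall(struct.pack(">I", len(payload)) + payload)

                      def recv_msg(sock, msg_type):
                          (length,) = struct.unpack(">I", sock.recv(4, socket.MSG_WAITALL))
                          msg = msg_type()
                          msg.ParseFromString(sock.recv(length, socket.MSG_WAITALL))
                          return msg

                      sock = socket.create_connection(("inventory.internal", 9000))
                      send_msg(sock, inventory_pb2.ListRequest(product_family="family-x"))
                      reply = recv_msg(sock, inventory_pb2.ListResponse)
                      ```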

                      “MQ is another piece of infrastructure” – right. and so is a web server and all the HTTP middleware needed to make a scalable, reliable infrastructure appear to be something resembling “just opening a socket”.

                      Do you mean something like nginx or HAProxy by middleware? My statement was poor in the previous response, but what I should have said is that something like nginx or HAProxy has fewer new failure modes than something like an MQ. Their interface (a socket) is the same as the protocol services would use to communicate with each other directly. nginx and HAProxy are also very mature and robust solutions and much simpler than an MQ (at least one implementing AMQP).

                      Push the intelligence to the edges.

                      Yes, that is what I believe one gets by not using an MQ. An MQ has a lot of centralized intelligence in the MQ. Even if you federate, more logic is centralized.

                      the incremental cost of adding workers (clients) is far, far simpler.

                      That is about the only benefit, though, and it comes, I claim, at a much higher cost for everything else.

                      I believe you are speculating about what you believe to be true. But I can tell you this is a common misperception and the truth on the ground is very much the opposite.

                      As I said in my opening message: “In my experience”. So no, I am not speculating. We can come to different conclusions based on our experiences.

                      “do not create a bunch of new failure modes” – messaging infrastructure reduces failure modes, regularizes the infrastructure, reduces the kinds and amounts of middleware. If you want to reduce failure modes, move as much as possible of your business functionality into non-clustered, lightweight client applications.

                      My experience has not been one of simplification, especially in terms of failure modes. Most MQ implementations are very complicated pieces of software, even if you do not use the complicated parts. The middleware in something like gRPC is pretty small and has almost the same semantics as a socket. Testing locally is easier (no need to use the middleware in a local test). The protocols are generally simpler (STOMP is pretty simple but not really production friendly, AMQP is huge, and ZMQ isn’t much more than a socket library).

                      It looks like MQs have worked well for you. I cannot say the same for me. All I can do is offer a counter point for another person reading this who is not sure which to choose.

                      1. 1

                        “Adding load balancing is putting a proxy between the sockets”

                        Exactly so. Each client added to a messaging system stands alone. Each service added to an HTTP system involves configuring sufficient middleware to make that new service known, redundant, monitor-ready, etc. This has organizational and technical costs well beyond the case of adding a new standalone client.

                        “nginx and HAProxy are also very mature and robust solutions and much simpler than an MQ (at least one implementing AMQP).”

                        That’s good, because you’re going to need more of them. Somewhat kidding, but note that an MQ should not be installed and configured for all the bells and whistles. If you need clustering, high availability, etc., then distinct MQs should be configured for that. The lightweight RPC MQs do not require much at all and should not be used for anything but dynamic, transient messaging, RPC or otherwise. Even better, do not install any of the clustered, high-availability MQ systems at all.

                        “An MQ has a lot of centralized intelligence in the MQ.”

                        Clearly your experience is with heavyweight messaging. That is not at all uncommon but terribly unfortunate. I do not recommend them whatsoever under any circumstances.

                        OK, so I’m pretty sure we’ve come down to the situation where your experience is with heavyweight MQs. But I would recommend you look into just how lightweight some message systems can be. Most people are completely unaware of this approach these days. And my impression is they’re not aware of the cost of running stacks of HTTP all over the data center.

                        1. 1

                          But I would recommend you look into just how lightweight some message systems can be.

                          Can you state what you consider a “lightweight” message system to be? You mentioned RabbitMQ in your first post, which implements AMQP, in all its complexity, which I have mentioned multiple times but you state is “heavyweight”.

                          1. 1

                            “Lightweight” in the case of RabbitMQ means essentially in-memory, transient queues with routes that go away when the last client that subscribes to it goes away. Note this roughly corresponds to the essence of Tibco Rendezvous. I place pretty much everything else under the heading of “heavyweight”.

                    2. 3

                      nginx and HAProxy that are pretty easy to deploy

                      So, yes, they are easy to deploy, but to your point against MQ because it’s an added piece of infrastructure with failure modes, this applies as well. Nginx and HAProxy can fail.

                      In addition, nginx and HAProxy don’t just support adding and removing backends all willy-nilly (the paid nginx might, but I’m not sure! And you can build this into your setup with something like OpenResty, too). For that, you might need some sort of central registry, dare I say, a service discovery mechanism that can update the config and HUP it into memory. This adds yet another component with potentially many failure paths to handle and deal with. So, it’s not all gravy with just sockets, either, but I do tend to agree with your general sentiment.

                      1. 1

                        So, yes, they are easy to deploy, but to your point against MQ because it’s an added piece of infrastructure with failure modes, this applies as well. Nginx and HAProxy can fail.

                        Yes, however they are much less complex pieces of software than something like RabbitMQ. The setup in this article is also fairly common: the clients all run HAProxy, which localizes the failures (unless it’s something systemic).

                        Additionally, an MQ is not removing the need for service discovery. You still need to find your MQ instances, especially if they are federated. And you need to find your queues in the MQ (either by convention or more software). So service discovery has to happen somehow. It could be as simple as something that polls SRV records every minute and updates HAProxy configs or it could be a full blown Consul setup like Stripe.

                        1. 2

                          the clients all run HAProxy, which localizes the failures (unless it’s something systemic).

                          Hmm… I didn’t get, from the reading, that the clients all run HAproxy. But, maybe I’m misunderstanding what your notion of client is here?

                          My understanding is that all of the API servers sit behind a pool of HAProxies–they are configured using consul-template, which will essentially HUP HAProxy when a change is made. I’ve used an isomorphic setup in anger before, with nginx, ZooKeeper, and a process that uses ZK watchers to rewrite an nginx config and HUP it into existence. At the very least, to do this, you have an additional component to monitor in the watcher.
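
                          For anyone curious, that watcher process looks roughly like the sketch below, using the Python kazoo client; the znode path, config path, and reload command are placeholders, not the exact setup I used:

                          ```python
                          import subprocess
                          from kazoo.client import KazooClient

                          zk = KazooClient(hosts="zk1:2181")
                          zk.start()

                          @zk.ChildrenWatch("/services/web")
                          def rewrite_upstreams(children):
                              # Each child znode holds a "host:port" payload (illustrative).
                              backends = [zk.get(f"/services/web/{c}")[0].decode()
                                          for c in children]
                              with open("/etc/nginx/conf.d/web_upstream.conf", "w") as f:
                                  f.write("upstream web {\n")
                                  f.writelines(f"    server {b};\n" for b in backends)
                                  f.write("}\n")
                              # Signal nginx to re-read the config.
                              subprocess.run(["nginx", "-s", "reload"], check=True)
                          ```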

                          Additionally, an MQ is not removing the need for service discovery.

                          Possibly. I’ve deployed RabbitMQ in the past with a well known hostname, a VIP, with a node in hot-standby, which worked surprisingly well. RabbitMQ being what it was, if we had been using the “clustering” capabilities, we’d probably have been fine doing the same thing for each additional cluster member, e.g. 2 machines behind a VIP.

                          But, yeah, to your point, depending on the setup, things could get very complicated with a MQ, too. I only wanted to point out that there isn’t really a silver bullet here.

                          1. 2

                            Hmm… I didn’t get, from the reading, that the clients all run HAproxy. But, maybe I’m misunderstanding what your notion of client is here?

                            No, you might be correct. What I was describing is how SmartStack was set up at an employer, and the pictures look pretty similar. I’m partial to this solution because it decouples publishing the topology of a cluster from using it. I’m also suspicious of coupling the transport with the service discovery.

                            I’ve deployed RabbitMQ in the past with a well known hostname, a VIP, with a node in hot-standby, which worked surprisingly well.

                            In the cloud era that can be harder. And it has issues: if you do failover, all your requests will increase in latency, rather than just those to the particular host that has failed.

                            I only wanted to point out that there isn’t really a silver bullet here.

                            Absolutely. To clarify in case there is any misunderstanding, I am not claiming something other than an MQ is the silver bullet, rather that I believe an MQ increases complexity with no tangible gain when it comes to RPCs.

                            1. 1

                              In the cloud era that can be harder.

                              Yes, for sure! ELBs are definitely not equivalent to a VIP for this use case. :)

                              1. 1

                                However one must also consider administering and operating these systems, monitoring, etc. Simple client applications reading and writing to messaging systems are far easier to add, configure, monitor, scale out, fail/restart, etc. than ELBs and servers. Big picture, the reduction in time, effort, money, etc. is significant.

                                Client applications are just standalone “console” applications… typically no clustering, etc. necessary. If one fails, use your favorite supervisor to restart it. If it fails too often, use that supervisor to escalate the problem.