1. 42
  1. 28

    What most of these discussions miss is the existence of services. The alternatives are not a binary “microservices capped at 50 lines of Go” and “one big Rails app for your whole product” – there is a lot of room in between to have multiple services of varying sizes and number.

    1. 17

      Yeah, the part of the article talking about having 80 little services sounded painful. Khan Academy has been rewriting our system from a Python 2 monolith to Go services, but our whole system is ~26 services broken down by function and natural boundaries and not “make this thing as tiny as possible.”

      A lot of people use the term “microservices” when they really just mean “services”, and I’ve somewhat given up on trying to change people’s minds on the terminology. But this article is actually talking about _micro_services, and that really does seem problematic.

      1. 2

        The phrase “Service-Oriented Architecture” was thrown out in favor of Microservices, but I like SOA better because it leaves open the possibility of right-sized services of whatever complexity they need to be.

    2. 7

      This is an article I’ve been trying to write for a while and failing. The reason is, I can’t come up with a clear-cut “use x not y” scenario here. Monoliths are easier from an infrastructure perspective, but in my experience, working on large monoliths at a certain scale becomes extremely difficult. You need a lot of practical experience with the monolith before you can start contributing meaningful improvements and overall feature velocity drops.

      I suspect that’s why microservices took off, in part due to CTOs losing patience with the amount of time it took to ship a new feature. Suddenly you could leave the cruft behind and embrace the specific language and framework best suited to the problem at hand, not “the tool you were forced to use because that’s how the monolith works”. However, it does add a ton of operational cost in terms of management. Tracing, a nice-to-have in a monolith, becomes a real requirement, because you need to see where a request is failing and why. Your metrics and logging platform really becomes a core production system, yet at any time one of your services can crush it.

      I think if I were starting a company today I would likely start with a monolith, only because it would keep my application complexity down. Then as I grew I would break off services into microservices, but I don’t know if this is “best practice” or simply the pattern I’ve seen work ok.

      1. 10

        You need a lot of practical experience with the monolith before you can start contributing meaningful improvements and overall feature velocity drops.

        I keep seeing people make that argument, but I never really understand it. I can’t imagine what architectural benefit is gained by having the network boundary between the components of your system. How is an HTTP request better than a function call? In what world do you get better IDE, debugging etc. support for making an HTTP request compared to making a simple function call? How is it helpful that whenever you make an HTTP request instead of a function call there’s the possibility that there might be network delays or version differences between the two components talking?

        And before anyone replies with “but with HTTP requests you get better logging, you have tracing frameworks etc. etc.”, what stops you from logging and tracing the same things through function calls? And before anyone replies with “but with microservices, the requests need to be self-contained, so they result in better decoupling”, what stops you from designing the internal structure of a monolith the same way? (I.e. passing around immutable data, instead of turning your codebase into OOP spaghetti?) I think microservices force teams to write more pure and functional code and that’s why people perceive architectural benefits in them, but the only thing stopping you from writing your entire application in a functional style is your habits IMO…
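
        On the logging/tracing point, here’s a minimal sketch of what I mean (Python, with made-up function names): the same span-ish data an HTTP tracer would give you, captured at the function-call boundary instead.

        ```python
        import functools
        import logging
        import time
        import uuid

        log = logging.getLogger("trace")

        def traced(fn):
            """Log arguments, duration and a span id for every call; no network hop needed."""
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                span_id = uuid.uuid4().hex[:8]  # stands in for a real trace/span id
                start = time.perf_counter()
                try:
                    return fn(*args, **kwargs)
                finally:
                    elapsed_ms = (time.perf_counter() - start) * 1000
                    log.info("span=%s call=%s args=%r kwargs=%r took=%.2fms",
                             span_id, fn.__qualname__, args, kwargs, elapsed_ms)
            return wrapper

        @traced
        def charge_customer(customer_id, amount_cents):
            ...  # ordinary in-process business logic (made-up example)
        ```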

        So, I think microservices are only about performance and scaling (the performance kind, not the complexity kind).

        1. 7

          I keep seeing people make that argument, but I never really understand it. I can’t imagine what architectural benefit is gained by having the network boundary between the components of your system. How is an HTTP request better than a function call? In what world do you get better IDE, debugging etc. support for making an HTTP request compared to making a simple function call? How is it helpful that whenever you make an HTTP request instead of a function call there’s the possibility that there might be network delays or version differences between the two components talking?

          You’re looking at microservices through the technical lens, and through that lens you are correct: they’ll almost always fail. But your analysis doesn’t consider people.

          In my experiences with large development teams, microservices shine when you need to scale people. When you have an application that becomes so large that the test suite is irreducibly slow, QA takes a week to complete regression, release engineering means coordinating hundreds of simultaneous changes in diverging areas of the system, that one developer can’t keep the features of the product in their brain at one time, and you’re still working on dozens & dozens of new features and the requests aren’t slowing down…

          @maduggan mentioned “feature velocity” – they’re spot on. When the 10+ year old monolith I am working on decomposing was a baby, you could launch new functionality in minutes to hours. Now, two weeks is the lower bound… and it’s not just a procedural lower bound anymore.

          Microservices let you get back to better velocity and better fault tolerance – if you build for it! – but you pay for it with more difficult debugging, slower performance, etc. IME, those are the worst tradeoffs to take when just starting a product, so always start with a monolith. Worry about the problems you’ll face when you’re a big success if/when you become that big success!

          1. 4

            In my experiences with large development teams, microservices shine when you need to scale people. When you have an application that becomes so large that the test suite is irreducibly slow, QA takes a week to complete regression, release engineering means coordinating hundreds of simultaneous changes in diverging areas of the system, that one developer can’t keep the features of the product in their brain at one time, and you’re still working on dozens & dozens of new features and the requests aren’t slowing down…

            Couldn’t all of these issues be solved more efficiently by splitting the monolith up into packages? (Ie instead of moving something to a separate service, distribute it as a package and consume it just like you would any other 3rd party software.)

            The only overhead I can think of is that you may need to host your own repo mirror (in the worst case). You may also need to handle sunsetting of older, incompatible package versions, but this release synchronization problem already exists in microservices, so it’s not really a strong argument against packaging.

            1. 4

              In some cases I’ve found that pattern to work really well. One of my current monsters is a Ruby application, and partitioning functionality into individually versioned Rubygems, hosted in the same repo that holds our vendored dependencies, has let us introduce a hub & spoke model. You can isolate development in those gems, they can have their own test/deployment/release cadences, and the core application can pull them in as they are ready.

              But it’s not a silver bullet. As an example, we’ve had to introduce new software and infrastructure to change how our core search process works. We decided to cleave that as a microservice: the technology stack is necessarily different from the core application, and hanging it as a sidecar let us make use of an existing OLTP -> reporting stream. No new infrastructure to the core application, no changes to tap into the data, and a nicely defined failure domain we can take advantage of: the search microservice lives behind a circuit breaker and we always have the ability to fall back to the existing (horrifically slow and expensive) search.
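
              For the curious, the failure domain looks roughly like this (a sketch in Python rather than our actual Ruby; `fast_search_service` and `legacy_search` are stand-ins):

              ```python
              import time

              class CircuitBreaker:
                  """Tiny sketch: open after N consecutive failures, close again after a cooldown."""
                  def __init__(self, max_failures=5, reset_after=30.0):
                      self.max_failures, self.reset_after = max_failures, reset_after
                      self.failures, self.opened_at = 0, None

                  def is_open(self):
                      if self.opened_at and time.monotonic() - self.opened_at > self.reset_after:
                          self.failures, self.opened_at = 0, None  # cooldown elapsed: try again
                      return self.opened_at is not None

                  def record(self, ok):
                      if ok:
                          self.failures, self.opened_at = 0, None
                      else:
                          self.failures += 1
                          if self.failures >= self.max_failures:
                              self.opened_at = time.monotonic()

              breaker = CircuitBreaker()

              def search(query):
                  if not breaker.is_open():
                      try:
                          results = fast_search_service(query)  # call into the search microservice
                          breaker.record(ok=True)
                          return results
                      except Exception:
                          breaker.record(ok=False)
                  return legacy_search(query)  # fall back to the slow-but-reliable in-app search
              ```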

              Another place where I’ve felt pain is with acquisitions. Your nice homogeneous company can become a sprawling polyglot with very different infrastructure and deployments. Exposing an API / RPC endpoint is often the least common denominator you have.

              1. 1

                We decided to cleave that as a microservice: the technology stack is necessarily different from the core application.

                I think this is a solid argument for extracting a service, at least under certain circumstances, but I wouldn’t call it a microservice architecture just because you support your monolith with a few tiny supporting services with well-defined purposes.

                BTW, having to use a different stack doesn’t always have to mean extracting a separate service. Whenever I need to cross the language/ecosystem barrier, I first evaluate whether FFI is an option; if not, I consider writing that “service” as a one-shot CLI application and spawning it as a child process from my service every time I’d otherwise make a network request (making sure that the executable is reproducibly built and injected into your path is very easy with nix, for instance). I know these are painful options compared to just writing a new function or a new package, but I think they’re still less painful than writing a new service.
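
                As a rough sketch of the child-process option (Python; `resize-image` is a made-up executable standing in for whatever lives on the other side of the language barrier):

                ```python
                import json
                import subprocess

                def resize(path, width):
                    # Spawn the "service" as a one-shot CLI instead of making a network request.
                    # Assumes a reproducibly built `resize-image` binary is on PATH (e.g. via nix).
                    proc = subprocess.run(
                        ["resize-image", "--width", str(width), path],
                        capture_output=True, text=True, timeout=10, check=True,
                    )
                    return json.loads(proc.stdout)  # assume the tool reports its result as JSON
                ```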

                Another place where I’ve felt pain is with acquisitions.

                Yeah, that does sound hairy…

              2. 4

                If we split a monolith into distinct packages and those packages can only interact with each other through strict interfaces, then I’d argue you’ve implemented micro-services that run inside of a single execution environment. The organisational benefits are the same in both models.

                1. 3

                  I strongly favor monoliths where possible, but this isn’t true. Teams releasing bugs in their packages can stall a release of the monolith if they cause a rollback. The packages approach definitely scales better than an entangled monolith if you’re doing it right, but at some level of scale those packages will need to be broken out into services with their own ops teams. As a hyperbolic example, suppose all of Google was a monolith—barring obvious technical limitations. It wouldn’t work.

                  In my eyes, the real problem is people dramatically underestimate the development scalability of a well designed monolith. Or have problems with a poorly designed monolith and jump to microservices right away rather than refactoring their code.

                  1. 1

                    I’m not sure I totally agree. If you: a) allow your software to be released continuously, as opposed to large staged releases, then an individual change is a small rollback, not an entire feature; or b) use feature flags to enable new features (and expose bugs), then a rollback of a release might be as simple as unflagging instead of doing a new deploy; or c) release each package independently, as opposed to all at once, then you can roll back a single package instead of the whole application (consider, say, hot reloading a single module with Erlang/Elixir).
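
                    For (b), the flag check can be as simple as this (Python sketch; the flag store and checkout functions are made up): a “rollback” becomes a config flip rather than a deploy.

                    ```python
                    # Hypothetical flag store: any config source you can change without redeploying.
                    FLAGS = {"new_checkout_flow": False}

                    def checkout(cart):
                        if FLAGS["new_checkout_flow"]:
                            return new_checkout(cart)  # feature being rolled out
                        return old_checkout(cart)      # rolling back = flip the flag off
                    ```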

                    That’s not to say you should do these things; I’m more calling out that proper, scalable package-based development is much more similar to microservices than we normally acknowledge. It’s easy to conflate monorepo problems with monolith problems with deployment-practice problems with release-strategy problems and so on, but they’re actually different problems, and certain combinations aren’t necessarily incompatible.

                    1. 1

                      I think I was unclear. I meant they aren’t completely equivalent. Yes the benefits are similar, but at some level of scale you will need to split up into services. To express it mathematically: suppose each team causes a rollback every 1000 years, but you have infinity teams. Your monolith rollback rate is now 100%.

                      1. 2

                        But why does a package rollback have to mean a system rollback? Imagine you have organized your system into n packages, each of which (or subsets of which) is versioned independently. Then you have a topmost layer where you bind these together, specifying which versions come from where, and build the monolith. If one of the packages turns out to have a bug, you just revert that package by configuring the topmost layer to pull its previous version, then rebuild and deploy.
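
                        Concretely, that topmost layer can be nothing more than a manifest the build reads (sketch; package names and versions are invented):

                        ```python
                        # build_manifest.py: the only place that knows which version of each
                        # internal package goes into this build of the monolith.
                        PINS = {
                            "billing":   "4.12.0",
                            "search":    "2.3.1",   # 2.4.0 had the bug; reverting it means editing this line
                            "reporting": "1.9.7",
                        }

                        def requirements():
                            # Feed this to pip / your build tool when assembling the monolith.
                            return [f"{name}=={version}" for name, version in PINS.items()]
                        ```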

                        1. 1

                          That’s a rollback with more steps. You’re waiting on a full rebuild, and by reverting one package and not the others you’d be pushing an untested configuration into production. People roll back to the last known-good build for a reason; whack-a-mole in prod is not a fun game.

                          If you have perfectly clean interfaces exactly as a discrete service would have, it could work, but you’d still be waiting on a rebuild, instead of rolling back the 1-10% of canary instances immediately. And if you have enough packages getting individually rolled back you’d be constantly rebuilding and redeploying one binary to a huge fleet.

                          @rslabbert’s point about hot code reloading could solve these issues, but in practice few people use languages with that capability. The Erlang VM is enough of an “operating system” that you could argue code reloading modules isn’t that different than redeploying containers on k8s, and message passing isn’t that different than connecting services with RPCs. In other words, the hot code reloading solution also demonstrates the desirable properties of having discrete services.

                          1. 2

                            by reverting one package and not the others you’d be pushing an untested configuration into production … whack-a-mole in prod is not a fun game.

                            How is that different than rolling back one microservice and not the others?

                            I think the rest of your argument is primarily about performance, and I readily admit that microservices will eventually win in any performance-related dimension as you push the scale toward infinity.

                            1. 1

                              Like I said, it’s the same if you assume that your packages are just as isolated as the equivalent microservices. Except the ergonomics of releases and rollbacks become more complicated for ops teams. It’s not really about performance, it’s about the ability to respond to incidents quickly. There are myriad performance issues—mostly related to locality—that I’ve chosen to ignore for this discussion. One of the biggest advantages of monoliths is operational simplicity; take that away and they’re a lot less compelling.

                              1. 2

                                I don’t think there are fundamental reasons why the ergonomics have to be worse. If you’re fine with having a network interface between the two sides of a function call, then you can introduce enough dynamic dispatch that, at the layer where you aggregate your “micropackages”, the rebuild only involves the packages you’re rolling back.

                                That said, I acknowledge the status quo when it comes to the availability of tools for the ops teams. Microservices are at one end of an architectural spectrum, and there’s a natural tendency for people to meet at the ends of spectrums and then the network effects kick in.

                            2. 1

                              untested configuration

                              Yeah, but that’s the point! :) Opting in to a service architecture necessarily means losing the concept of a single testable configuration, in this sense. This is a good thing! It acts as a forcing function toward improved observability and automation, which ultimately yield more robust and reliable systems than QA-style testing regimens.

                  2. 2

                    Packages are worse than monoliths and worse than services IME. Teams pin to package versions, requiring massive coordination when rolling out changes — and you don’t understand what client code calls you, and so understanding impact on perf, IO, etc is much harder. And if pinning is banned, rolling out changes is even harder: the engineer making the change has to understand every callsite from every team and is inevitably responsible for anything that goes wrong.

                    Services make understanding the full lifecycle of your code basically trivial: you know latency of your API endpoints, you know how much IO it’s doing, you know every bottleneck in your own code. Making a small improvement is easy, and easy to roll out. Not so with packages.

                    Services are one approach to scaling teams. Monoliths are another, although people vastly underestimate how hard they are to scale to large numbers of engineers. I worked at Airbnb, which went the service route, and FB, which went the monolith route: both worked, and IMO FB worked better — but FB invested far, far, far more resources into scaling the monolith, including staffing teams to create new programming languages (Hack), VMs (HHVM), and IDEs (Nuclide). Airbnb largely used off-the-shelf components. It’s def a tradeoff.

                    For small teams it’s no comparison though: monoliths are easy, services are a pain.

                    1. 1

                      Teams pin to package versions, requiring massive coordination when rolling out changes

                      True - I guess this is both a blessing and a curse, since teams can keep on using older versions if they need to - the flip side is you also risk having an unknown number of old versions floating around at any given time. I can definitely see scenarios where this could become a problem (e.g. you quickly need to address a CVE).

                      the engineer making the change has to understand every callsite from every team and is inevitably responsible for anything that goes wrong.

                      Don’t microservices also suffer from this problem? (Except now you don’t have the benefit of static analysis)

                      1. 1

                        Don’t microservices also suffer from this problem? (Except now you don’t have the benefit of static analysis)

                        Not really. With a service, your team knows roughly every codepath that might hit disk, open a DB connection (or attempt to reserve a thread in the DB connection threadpool), etc — because only your own team’s code can do that, since the machines are partitioned away from clients. Basic monitoring + a staged rollout is pretty safe, and if something goes wrong, the expertise for understanding how to fix it lies on your direct team. Every caller is logged via however you’re logging your API requests, so as long as those graphs look good, you’re good.

                        With a “modular monolith,” IME a lot more human complexity happens, because you’re more likely to need to work cross-team when you make changes: any client could be hitting disk, opening connections, etc, because your code for doing that is shipped directly to them. And working cross-team is slow.

                        1. 1

                          Sorry, maybe I was unclear - in a microservices setup you still have to be aware of all the calls made into your system so as not to accidentally introduce breaking changes to the protocol.

                          If you ship your code as a library, the other teams probably have a test suite and/or static analysis tools that exercise the integration between your code and theirs, which they can run as part of an upgrade.

                          With microservices this isn’t possible, so you usually have to resort to having a CI environment that’s a poor man’s version of the prod environment. In my experience the QoS of the CI environment is way worse than its prod equivalent, which usually causes slow and brittle tests.

                          Also, intentionally breaking changes to your microservice have to be carefully coordinated between all teams that use it, and are usually preceded by a period in which a backwards-compatible version has to be maintained in parallel.

                          Contrast this to the library approach where teams can opt in to new versions on demand, so there’s no need for a “compatibility layer”. Of course this has problems as well (eg what if your team’s library uses log4j?) but I would say it’s a viable alternative to microservices if your primary concern is code splitting.

                          1. 2

                            If by “opt in to new versions on demand” you mean that teams can pin to old versions of your library… That’s true! But, it’s not the unpinned “modular monolith” others were describing. Pinning carries its own issues with it:

                            • Fixing important bugs? Time to patch the universe, because no one is on the latest version, and you’ll need to upgrade all of them. Or, you’ll need to update many old versions of your own code.
                            • Fixing non-critical, but noticeable bugs? Whatever support channel you have open with clients will get pinged forever about the bug in the old version.

                            Services have many technical downsides, but they’re nice at removing human communication bottlenecks. At a small company those don’t exist, at a big one they dominate almost all other work. There are other ways to skin the cat, but they usually require at least as much effort as services — I haven’t seen a perfect solution yet.

                      2. 1

                        That depends on how the packages are structured. I’ve done the modular monolith approach using a monorepo that did not use package versioning, and this structure works much better than a regular monolith.

                        The true purpose of decomposing into modules, IMO, is not versioning but isolation. You can ensure that there is no implicit coupling between modules by running the test suite without any undeclared dependencies. Shopify has taken this even further with their static analysis tooling.
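
                        A crude version of that check is easy to hand-roll (Python sketch; the module layout and declared dependencies are invented, and real boundary-checking tools do a lot more):

                        ```python
                        import ast
                        import pathlib

                        # Each top-level module declares which sibling modules it may import from.
                        DECLARED = {"billing": {"core"}, "search": {"core"}, "core": set()}

                        def undeclared_imports(module_name):
                            """Return (file, import) pairs that cross a boundary without being declared."""
                            violations = []
                            for path in pathlib.Path(module_name).rglob("*.py"):
                                for node in ast.walk(ast.parse(path.read_text())):
                                    if isinstance(node, ast.Import):
                                        names = [alias.name for alias in node.names]
                                    elif isinstance(node, ast.ImportFrom) and node.module:
                                        names = [node.module]
                                    else:
                                        continue
                                    for name in names:
                                        top = name.split(".")[0]
                                        if top in DECLARED and top != module_name and top not in DECLARED[module_name]:
                                            violations.append((str(path), name))
                            return violations
                        ```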

                        1. 1

                          Shopify has taken this even further with their static analysis tooling.

                          The Shopify tooling was mostly a play to keep the engineers working on it from leaving, AFAICT. The monolith there has been so stalled for years that teams actively build things as services that really would have to go into the monolith to work right, just to get anything done at all.

                          1. 1

                            source for this?

                            1. 1

                              My 5 years of employment at Shopify :)

                    2. 1

                      In our case, while we started out with a microservice for political reasons, it actually turned out to be the right call. We started out receiving requests to our service via SS7 (the telephony network), which required linking to a commercial SS7 network stack that only supports specific versions of Solaris. Over time, it became a requirement to support SIP (over the Internet). Since we started with a microservice (an SS7 interface and the business logic), it was easy to add the SIP interface without having to support two separate versions of the code, as we would have had it been a monolith.

                      Velocity is not a concern here, since our customers are the Oligarchic Cell Phone Companies (it took us five years to get them to add some additional information in the requests they send us). Performance is a concern, since our requests are in real time (they’re part of the call flow on the telephone network).

                    3. 2

                      I keep seeing people make that argument, but I never really understand it. I can’t imagine what architectural benefit is gained by having the network boundary between the components of your system. How is an HTTP request better than a function call? In what world do you get better IDE, debugging etc. support for making an HTTP request compared to making a simple function call? How is it helpful that whenever you make an HTTP request instead of a function call there’s the possibility that there might be network delays or version differences between the two components talking?

                      The main benefit of a service-oriented architecture (or microservices, I guess, but that’s a semantic distinction I’m not really interested in exploring right now) is that services can be deployed and updated independently of one another. Folks who make this choice are often choosing to trade social coordination problems for distributed systems problems, and at a certain point, it’s the right call. People who choose this architecture often don’t make raw HTTP calls to other services—the service contract is defined through things like gRPC, Thrift, or Smithy, which allows for the automatic generation of client/server stubs. I’ve found it to be a very pleasant development experience.
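
                      For example, with gRPC the calling code ends up looking something like this (Python; `orders_pb2` and `orders_pb2_grpc` stand in for the modules protoc would generate from the service’s .proto contract):

                      ```python
                      import grpc

                      # Hypothetical generated modules: the contract lives in orders.proto, and the
                      # client/server stubs are generated from it rather than hand-written.
                      import orders_pb2
                      import orders_pb2_grpc

                      channel = grpc.insecure_channel("orders.internal:50051")
                      stub = orders_pb2_grpc.OrderServiceStub(channel)

                      # A typed, versioned call instead of a hand-rolled HTTP request.
                      reply = stub.GetOrder(orders_pb2.GetOrderRequest(order_id="o-123"), timeout=0.5)
                      print(reply.status)
                      ```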

                      The other benefit of a service-oriented architecture is the blast radius reduction of an outage, but that’s a bit more rare.

                      1. 1

                        … services can be deployed and updated independently of one another. Folks who make this choice are often choosing to trade social coordination problems for distributed systems problems …

                        In {my-language-of-choice}, I use a lot of packages written by people that I have never met and never talk to. They don’t ask me when they want to release new versions of their packages. Though I guess there’s an important point to make here: not many languages support having multiple versions of the same package in the same process, so if different teams want to depend on different versions of the same package, that forces them into separate processes. Depending on the technology choices, that could indeed inevitably shatter the monolith, but a shattered monolith still doesn’t have to mean “every team develops a microservice”. In my experience, people reach for creating new services too fast, when they could just as well be maintaining a package that a few “macroservices” could pull in and update at their leisure.

                      2. 2

                        Operationally we have a lot of tools that let us look at HTTP requests, and run things on different machines.

                        You have a web app that has a 3D rendering feature. Having two processes instead of function calls lets you properly provision resources instead of doing “our web app requires GPUs because 1% of requests spin up blender”

                        Similarly while you can have instrumentation for function calls, having stuff broken up at the process level means your operations teams will have more visibility into what’s going on. By making that layer exist they could independently do things like move processes to different machines or add debugging only to certain parts.

                        It seems like it’s futzing around but if you are bought into kubernetes and the like this stuff isn’t actually as hard (security is tricky tho)

                        1. 1

                          Operationally we have a lot of tools that let us look at HTTP requests

                          How are they better than wrapping your function with a decorator (or whatever your language supports) that logs the arguments and the stack trace at that point along with the time it took to execute etc.?

                          You have a web app that has a 3D rendering feature.

                          That’s certainly a use case that justifies spinning up a new service! But I find it unlikely that those people that boast having 2500 distinct microservices in production actually have 2500 genuinely unique hardware requirements, like I can only simulate this flux capacitor on that quantum FPGA we’ve bought for $100M. If you have a few services lying around supporting narrowly defined needs of a central monolith, I wouldn’t call that a microservice architecture.

                          having stuff broken up at the process level means your operations teams will have more visibility into what’s going on… move processes to different machines or add debugging only to certain parts.

                          This is a little fuzzy, because: a) It assumes that your operations teams can understand the workings of a web of microservices, but can’t understand the web of function calls in your monolith, or how a bunch of packages work together. To make an analogy, imagine you composed your application out of “micropackages”, along with a topmost layer that pulls those micropackages together: what stops your operations teams (capable of understanding the microservice mesh) from isolating, modifying and deploying your micropackages? b) I can’t tell how much of the benefit is actually performance-related (like “it seems like this functionality is consuming more CPU than is available to the rest of the application”), in which case, it goes back to my original argument that microservices are about performance.

                        2. 1

                          For one, you can mock HTTP “calls” in basically any language. The difficulty (or even possibility) of mocking functions varies greatly between languages.

                          1. 1

                            How is it possible that you can turn a piece of functionality into an HTTP endpoint, register it somewhere so that its consumers can find it, and have that registration point be flexible enough that you can redirect the consumer or the producer to a mock version of the other side, but you can’t do the same without leaving your process? In the worst case (not a real suggestion, but a lower bound) why can’t you just register your function in a global “myServices” dictionary, a key-value store that binds names to functions (classes, objects, function pointers, etc. etc.), then whenever you want to call it, just grab your function from the dictionary and call it? I know this is less practical than just importing a module and calling its functions, but certainly not more so than turning the whole thing into an HTTP workflow.
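
                            To spell out that lower bound (Python sketch; the service name and functions are made up):

                            ```python
                            # Global "service registry": names bound to plain callables.
                            my_services = {}

                            def register(name, fn):
                                my_services[name] = fn

                            def call(name, *args, **kwargs):
                                return my_services[name](*args, **kwargs)

                            # Production wiring.
                            register("tax.calculate", lambda order: order["total"] * 0.2)

                            # In a test, point the same name at a mock, exactly like redirecting an
                            # HTTP client to a stub server, but with no process boundary involved.
                            register("tax.calculate", lambda order: 0)

                            assert call("tax.calculate", {"total": 100}) == 0
                            ```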

                            1. 2

                              Everything you’re asking about in this thread is entirely possible. Here is an example of an architecture that can be deployed as independent services or as a single monolith. Same code, different wire-ups.
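
                              One generic way to get that shape, as a Python sketch (the transport details and URL are invented):

                              ```python
                              import json
                              import urllib.request

                              def recommend(user_id, catalog):
                                  """The business logic; it knows nothing about transports."""
                                  return sorted(catalog)[:3]

                              # Wire-up A (monolith): consumers simply call the function in-process.
                              recommend_client = recommend

                              # Wire-up B: the same logic exposed behind a thin HTTP client; a matching
                              # server-side handler would just call recommend() and serialize the result.
                              def remote_recommend(user_id, catalog, base_url="http://recs.internal"):
                                  body = json.dumps({"user_id": user_id, "catalog": catalog}).encode()
                                  req = urllib.request.Request(base_url + "/recommend", data=body,
                                                               headers={"Content-Type": "application/json"})
                                  with urllib.request.urlopen(req, timeout=1.0) as resp:
                                      return json.loads(resp.read())
                              ```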

                              The reasons to prefer multiple services are almost all organizational rather than technical; service-oriented architectures are almost always technically worse than alternatives. But most organizations of nontrivial size are not constrained by technical bottlenecks, they’re constrained by logistical ones.

                        3. 3

                          at a certain scale becomes extremely difficult.

                          Yes, but most places aren’t at a certain scale.

                          1. 2

                            Then as I grew I would break off services into microservices…

                            Agreed. I’d add that splitting off services that are non-essential is a good way to start. That way you can figure out how you want to “do” microservices (and your team can get up to speed) without being overly concerned about downtime. You also get the resiliency benefits since the microservice can, in fact, go down without taking the entire app with it. My personal view is that a core monolith (macroservice?) that contains everything that is absolutely essential to the product with a bunch of non-essential microservices supporting it is often the sweet spot.

                            1. 1

                              Microservices are a tool for scaling organizations. When done well, the contract between services is clearer — it’s just the surface-level API + SLAs — whereas multi-team monoliths need a TPM to coordinate work.

                              On a purely technical level, microservices are worse than monoliths in almost every way. But they’re incredible for scaling out organizations.

                            2. 5

                              I do not like the false dichotomy „microservices × monolith“. Microservices are not the only way to do modular software…

                              1. 4

                                how they can commit something to Git and within the matter of hours it’ll be in production. Is it too good to be true?

                                Are people building monoliths that take more than a few hours to deploy? What is the story behind that? Is it mainly a megacorp enterprise thing about having tens of millions of lines of code and many different instances running in several global locations? In the smaller companies I have worked at monolith deployment was usually under an hour once development and testing were concluded.

                                1. 3

                                  In my experience, the issue is usually distributed state and the absence of constraints. In the synchronous SoA world, the data flow often has no clear direction and the result is an explosion of complexity that requires more and more complex tools to manage it. In my current job we have a single sophisticated service for state management (ES/CQRS due to regulatory requirements) that does just that. And every other (of about 20) services is either preparing data to be sent to it or reacting to state changes happening in it. For every pair of services that interact with each other, it’s always clear which direction the data is going and there are no loops. For me it’s the first time I actually have a good feeling about a distributed architecture and the reason is simplicity through tough constraints.
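
                                  Schematically it looks like this (a Python-ish sketch with invented names): every other service sits on exactly one side, and data only flows one way.

                                  ```python
                                  # Upstream services only *send* facts toward the state-management service...
                                  def submit_trade(state_service, trade):
                                      state_service.handle_command({"type": "TradeSubmitted", "trade": trade})

                                  # ...and downstream services only *react* to the events it publishes.
                                  def on_event(event):
                                      if event["type"] == "TradeBooked":
                                          update_reporting_view(event)  # hypothetical read-model projection
                                  ```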

                                  1. 2

                                    Lord I love articles talking about how microservices actually fare. When you take function calls and make them network calls, you suddenly have to add latency requirements, alerting, monitoring, idempotency requirements…

                                    It’s often not worth it, and you slow down delivery on whatever your company is actually supposed to be delivering to customers.

                                    1. 1

                                      At heart, true microservices are about decoupling, specifically the decoupling of external business benefits from your code/architecture. A lot of shops can’t do this. They have a monolith too confused to split up, a series of business groups that are a mess, the lack of an appropriate tech stack, and so on.

                                      The saddest thing about the current debacle with the buzzword “microservices” is how many folks can’t or won’t be able to use this strategy, and yet they try anyway, consuming lots of expensive tools and dev hours in a chase for nothing. It just makes a bad situation worse.

                                      The true alternative to microservices is a robust and rock-solid component system. People talk about these things but never seem to be able to grow one. Instead, we’ve created a lot of dev environments that trick us into thinking we’re creating component systems when in reality we’re just making messes. The reason so many shops are trying to go microservices, i.e. trying to deploy chunks of useful business stuff into the OS, is that we’ve all done so poorly over the years at actually keeping those boundaries between components in place.

                                      In this context, where “microservices” means the total decoupling of solution from problem along business features, the word “monolith” doesn’t make much sense. Whether you compile and distribute in one big compilable unit or a hundred small executables should be a deployment/system question, not a coding one. If you’re seeing a difference between monoliths and microservices, I’d argue you’re not understanding either one.