  2. 49

    As the saying goes:

    The first rule of distributing systems is don’t, if you can avoid it.

    What’s been amazing to me is the amount of time and effort spent on this technique outside of the small fraction of places where it makes sense, all while it is trivially obvious to anybody who stops and thinks for two minutes what the problems are gonna be.

    The conference-industrial complex sold snake oil and everybody just lapped it up. Such is the quality of “engineering” in software.

    1. 29

      And worse, the snake oil often claims to cure a disease most people don’t have in the first place: the need to scale up dramatically on extremely short notice.

      1. 33

        I read a joke somewhere (I cannot remember where): “Kubernetes. An ancient Greek word for more containers than customers.”

        1. 4

          I believe @srbaker coined that phrase.

          1. 3

            Corey from Screaming in the Cloud (https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/) has a variant of this (if I’m remembering it correctly):

            Kubernetes, the Greek god of spending money in the cloud

            1. 2

              Boss wants to market horizontal scaling when vertical would be 10x cheaper :)

        2. 31

          Disaster #2: development environments

          God, yes. This is such a bugbear for me. It flatly shouldn’t be necessary to run Docker, or Kubes, or a VM, or whatever else, to orchestrate a flotilla of companion services just so I can develop software. It should be possible for me to deliver value by working with only a single repo, a single codebase, a single built artifact, in isolation from all other runtime requirements. Every service should have a ‘dev mode’ in which it can be run, and exercise all of its business logic, without needing to speak with anything else. Integration testing is still critically important! But it shouldn’t be part of my edit/compile/test loop.

          There are exceptions — if your application is tightly coupled to a DB, for example. But if shipping 1 feature means touching/operating 10 things locally, it’s a sure sign you’ve over-microed your services.
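
          To make the ‘dev mode’ point concrete, here’s a minimal sketch in Python; the service name, the DEV_MODE flag, and everything else here are invented for illustration, not taken from the article:

          ```python
          # Hypothetical sketch: the business logic depends only on a tiny interface,
          # and DEV_MODE swaps the real companion service for an in-memory fake, so
          # this one repo can be developed and exercised entirely on its own.
          import os


          class InventoryClient:
              """Talks to the real inventory service over the network."""

              def reserve(self, sku: str, qty: int) -> bool:
                  raise NotImplementedError("real HTTP call elided in this sketch")


          class FakeInventoryClient:
              """In-memory stand-in with zero external runtime requirements."""

              def __init__(self):
                  self.stock = {"sku-1": 100}

              def reserve(self, sku: str, qty: int) -> bool:
                  if self.stock.get(sku, 0) < qty:
                      return False
                  self.stock[sku] -= qty
                  return True


          def make_inventory_client():
              # DEV_MODE=1 means "run this repo alone"; integration tests still use the real client.
              return FakeInventoryClient() if os.environ.get("DEV_MODE") == "1" else InventoryClient()


          def place_order(inventory, sku: str, qty: int) -> str:
              # The business logic runs identically in both modes.
              return "confirmed" if inventory.reserve(sku, qty) else "backordered"


          if __name__ == "__main__":
              os.environ.setdefault("DEV_MODE", "1")  # default to dev mode when run directly
              print(place_order(make_inventory_client(), "sku-1", 3))
          ```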

          1. 6

            This is why I’m really in favor of modeling services, if you’ve already committed to them, in terms of the verbs your system affords and not its nouns. If you design services around the resource type they manage, you’ve ceded a huge amount of the control you might otherwise have over their dependency relationships.

            1. 11

              This is why I’m really in favor of modeling services, if you’ve already committed to them, in terms of the verbs your system affords and not its nouns.

              I’m interested in this idea, but I’m not sure I follow - would you mind giving an example?

              1. 2

                Sure, and in retrospect I’m realizing it sounds like I was saying not to model entities (the “nouns” in this case, like Users, or Products, or what have you) at all, which definitely wasn’t my intent.

                I mean it more in the sense of the methodology by which you design a system with services: beginning a design in terms of the request patterns (the “verbs,” like a checkout flow, for example) makes clearer which entities are necessary for each request pattern, and then you can model those as needed, but armed in advance with knowledge of their access patterns. So in the checkout flow example, you’re obviously going to find in the process of mapping out the logic that users and products are entities you need to operate with.

                This sounds really trivial written out, and I suppose it is, but if so it’s surprising how often I’ve seen people design systems by beginning with the entities they think they’ll need, and then designing the request flow afterwards in terms of services that represent those entities. In my experience, this makes it hard to maintain low coupling and high cohesion, since it frequently leads to pretty arbitrary dependency graphs between all the services you already have. I suppose this doesn’t automatically get you to the “single codebase” ideal, but having a judiciously-managed dependency graph certainly gets you a lot closer to it.
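
                To sketch the shape of what I mean (every name here is invented for the example, and it’s only a sketch, not a prescription): the checkout flow is the unit of design, and the entities and their access patterns fall out of it.

                ```python
                # Hypothetical "verb-first" sketch: start from the checkout request pattern
                # and let it dictate which entities it needs and how it accesses them.
                from dataclasses import dataclass


                @dataclass
                class User:            # a noun, modeled because checkout needs it
                    id: str
                    payment_token: str


                @dataclass
                class Product:         # a noun, modeled because checkout needs it
                    sku: str
                    price_cents: int


                class CheckoutService:
                    """Owns the checkout flow (the verb). Its dependencies are exactly the
                    reads and writes the flow requires, so the dependency graph stays explicit."""

                    def __init__(self, load_user, load_product, charge):
                        self._load_user = load_user        # read access pattern on User
                        self._load_product = load_product  # read access pattern on Product
                        self._charge = charge              # side effect on the payment provider

                    def checkout(self, user_id: str, sku: str) -> str:
                        user = self._load_user(user_id)
                        product = self._load_product(sku)
                        paid = self._charge(user.payment_token, product.price_cents)
                        return "paid" if paid else "declined"


                if __name__ == "__main__":
                    checkout = CheckoutService(
                        load_user=lambda uid: User(uid, "tok_123"),
                        load_product=lambda sku: Product(sku, 2099),
                        charge=lambda token, cents: True,  # stubbed payment provider
                    )
                    print(checkout.checkout("u1", "sku-1"))  # -> "paid"
                ```

                Starting instead with a UserService and a ProductService tends to fix the dependency graph before the request flow is even understood, which is exactly the arbitrary coupling I mean.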

              2. 2

                Like vosper I’m also curious what you mean, precisely… I have an idea, but I’d like to hear your take :)

                1. 1

                  Yep! I replied to vosper in the sibling thread.

            2. 11

              Agree with most of these, with the exception of the BFF pattern (or API gateway). In my experience, they’ve proven to be quite useful, and greatly reduce your exposed “attack surface”. You can even mock out new endpoints in them with fake payloads so your FE team doesn’t need to wait on the backend team (as well as exercise weird edge cases on the client side).

              With that said, you do have to build them so they gracefully handle service failures, and operate statelessly beyond some caching if necessary.
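
              A bare-bones sketch of both points, assuming Flask and requests (the routes, payload shapes, and the internal hostname are all made up):

              ```python
              # Hypothetical minimal BFF: one route is still mocked so the frontend can
              # build against the agreed payload shape today; the other proxies a real
              # service and degrades gracefully instead of surfacing a 500.
              import requests
              from flask import Flask, jsonify

              app = Flask(__name__)


              @app.route("/bff/checkout/summary")
              def checkout_summary():
                  # Backend endpoint not implemented yet: serve a fake payload with the agreed shape.
                  return jsonify({"items": 2, "total_cents": 4198, "mocked": True})


              @app.route("/bff/profile")
              def profile():
                  try:
                      resp = requests.get("http://user-service.internal/profile", timeout=0.5)
                      resp.raise_for_status()
                      return jsonify(resp.json())
                  except requests.RequestException:
                      # The upstream service is down or slow: degrade instead of cascading the failure.
                      return jsonify({"profile": None, "degraded": True})


              if __name__ == "__main__":
                  app.run(port=8080)
              ```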

              A service-oriented architecture (be they micro/mini/macro services) really doesn’t make sense for your technical infrastructure unless you have multiple different product teams that need to share the same backend functionality. Any org with fewer than 10-15 engineers should probably work on a monolith, then start to break it up if/when you grow. Peeling stuff out is painful and annoying, but it’s still less annoying than having one developer for every 5-10 microservices.

              1. 10

                Microservices solve an organizational problem, not a technical problem. After all if your service is scalable, you can just have many copies of a monolith (and if it isn’t, splitting it into microservices won’t magically help). The problem they’re intended to solve is that in large projects in large organizations, one team tends to hold another team back, maybe by months or years, waiting for everyone to be ready for a synchronous product release.

                If you were a dictator with perfect information, you could reduce it to an optimization problem: what’s the cost of introducing a new service (including ongoing future cost and the overhead of maintaining a new communication interface), compared to the cost of adding new functionality to an existing service. Unfortunately dictators with perfect information don’t exist and a lot of organizations make poor decisions.

                1. 6

                  Microservices solve an organizational problem, not a technical problem. After all if your service is scalable, you can just have many copies of a monolith

                  I largely agree with this, but one thing I’ll note about Khan Academy’s transition from a monolith to separate services is that we have certain services that have greatly benefitted from being separate services. Our monolith was built on Google App Engine and scaled out okay thanks to App Engine’s autoscaling. A couple of things that are technically better now that we have services:

                  1. The service responsible for providing information about our content is able to have a much larger cache in each instance, and there are only a small number of these instances.
                  2. We largely use Google Cloud Datastore as our database, and that matches the App Engine model perfectly well because there’s no notion of “too many connections” to Datastore. But we have some parts of our application that benefit from a relational DB, and PostgreSQL does care about the number of connections. At peak times, our monolith could have way too many instances for PostgreSQL. Now, that DB is accessed through a separate service which only has a few instances at peak.
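
                  To put rough, made-up numbers on the PostgreSQL point:

                  ```python
                  # Back-of-envelope sketch with invented numbers: an autoscaled monolith
                  # multiplies its per-instance connection pool by however many instances
                  # traffic demands, while a small dedicated DB-facing service keeps that
                  # product bounded.
                  POSTGRES_MAX_CONNECTIONS = 500   # illustrative server-side limit
                  POOL_PER_INSTANCE = 10           # connections each instance keeps open

                  monolith_instances_at_peak = 300     # autoscaler tracks overall site traffic
                  dedicated_service_instances = 8      # only this service talks to PostgreSQL

                  print(monolith_instances_at_peak * POOL_PER_INSTANCE > POSTGRES_MAX_CONNECTIONS)   # True: 3000 connections, far past the limit
                  print(dedicated_service_instances * POOL_PER_INSTANCE > POSTGRES_MAX_CONNECTIONS)  # False: 80 connections, comfortably under it
                  ```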

                  I do agree that microservices have a lot of intertwining with how the org is set up and works, but there are legitimate technical reasons to break a system apart.

                2. 7

                  Queuing theory. Retry with exponential back-off and jitter. Is this stuff that people pick up from blog posts, or is it actually taught in a typical undergraduate CS program?

                  1. 4

                    Pick up from asking friends while struggling with an issue, or from reading papers and blog posts. Or at university. I think the statistics professor gave me a nudge. I’d read the TCP papers at the university library and asked about something after, I think, a lecture about the Poisson distribution, and got a really helpful monologue about skewed distribution of input. (Translated to programmerese: use backoff and make sure you don’t accidentally synchronise.)

                    There are many good papers. It’s a bit difficult to explain how to recognise the good ones. Reading the Morning Paper archives and the papers it links to might help.
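
                    In code, the “backoff plus don’t accidentally synchronise” advice usually ends up looking something like this sketch (roughly the “full jitter” recipe; the function is invented for illustration):

                    ```python
                    # Capped exponential backoff with full jitter: sleeping for a *random*
                    # amount up to the current backoff cap is what keeps a crowd of clients
                    # from accidentally synchronising their retries.
                    import random
                    import time


                    def call_with_retries(do_call, max_attempts=5, base=0.1, cap=10.0):
                        for attempt in range(max_attempts):
                            try:
                                return do_call()
                            except Exception:
                                if attempt == max_attempts - 1:
                                    raise
                                backoff = min(cap, base * (2 ** attempt))
                                time.sleep(random.uniform(0, backoff))  # full jitter
                    ```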

                    1. 8

                      use backoff and make sure you don’t accidentally synchronise

                      I once took down pretty much the whole Google App Engine platform, in the middle of a Republican convention that Google/YouTube was sponsoring/covering (yes, that happened), because I didn’t think about that. Whoops.

                      (Also just terrible project management, ridiculous demands from Google/YouTube to me, the solo developer at the agency they’d outsourced to, and other reasons. But definitely the thing that technically happened is I DDOSed App Engine, and Google paid me to do it)

                    2. 1

                      Yes, I was specifically taught these concepts at my undergraduate CS program, in my network programming course. I graduated in 2019 from Edinboro University of Pennsylvania, which is not an especially well-known program. But of high quality!

                    3. 7

                      The good thing is that many of the disasters I’ve talked about have good answers, and the industry has created better tools to make them solvable by organizations other than FAANG.

                      I like the article, but it’d be 10x better if he enumerated what these actually were. Especially the one about dev environments: I still haven’t found a really good solution there.

                      1. 7

                        I’d love to see a post about microservices that doesn’t have several comments criticising, wholesale, any approach based on or using them. As with everything that’s adopted widely, there are reasons other than those involved being blinded by hype or having poor engineering skills. In plenty of cases, architectures based on this kind of foundation are working well and on balance have been a good choice.

                        The linked post discusses how to avoid some issues that are common where, yes, there may be some blinding by hype and some engineering … missteps. Please don’t take the prevalence of such posts as evidence that the whole approach is somehow always wrong. There are plenty of articles that discuss how to avoid pitfalls with OO design in software, or functional programming, but we don’t - or shouldn’t, at least - write either off as bad in general.

                        I’m not a microservices or distributed systems advocate, and I have plenty of grey hairs that are no doubt caused by working with them, but I believe that good engineering requires being open-minded, working with real evidence and thinking deeply, not dismissing entire approaches with zero reasoning or explanation provided.

                        1. 2

                          It’s a great list, and I’ve tripped over most of them myself or watched a team near mine do so. I’ve written another blog post on how our team deals with each of them.

                          We have a core set of principles the teams should adhere to, and they’re direct counters to the pitfalls described: https://medium.com/productboard-engineering/countering-microservice-disasters-5a8f957803cb