1. 8

  2. 12

    Once you see engineers starting to twiddle their thumbs waiting for tests to run, you know the time is right to split things up.

    Before that, you should make sure your tests are actually slow. At my last job we were pushing for microservices because tests took 20 minutes to run and deployments were taking half an hour. But when we investigated, we found that:

    • The CI server was running seeds.rb and creating 10 million database records that were never used. Removing that saved five minutes of testing.
    • We had a homegrown caching system that persisted between tests, and walking ObjectSpace to manually delete its entries between every test took another 5 minutes. Switching to Rails.cache removed that.
    • Our deployments spent something like 25(!) minutes building assets that should have been cached, but our Chef configuration wasn’t reusing the cache between deployments.
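    The first fix above could look something like this. This is a hedged sketch, not the team's actual code: the `CI` environment variable and the `seed_heavy_data?` guard are assumptions for illustration.

```ruby
# db/seeds.rb -- hypothetical sketch. Assumption: the 10M records were only
# needed for manual QA environments and were never read by the test suite,
# so we gate them behind a check for a CI environment variable.
def seed_heavy_data?(env = ENV)
  env["CI"] != "true"
end

if seed_heavy_data?
  # create the large demo dataset here (skipped on CI)
end
```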

    In the end we got our test suite down to 7 minutes and our deployments down to under 5. That pretty much killed the need to split up our monolith. We did have some services, but they were for wildly different things, like a device management service that was distinct from our core.

    1. 3

      I agree with a lot of the points raised in the article: microservices/SOA is ultimately more about scaling your engineering team than scaling your backend. But I’d add one more useful reason for SOA that’s applicable even when you’re small: simple, robust fault tolerance. If one service is failing, you can circuit-break and isolate the failure to that particular set of functionality; when everything’s deployed as a monolith, if one piece of code starts acting up it can be very difficult to prevent the damage from spreading. For example, if someone ships a bug that chews through all the IOPS available on your EC2 machines, or exhausts all available file descriptors, or writes logs faster than you can rotate them, or the like, it’ll hose even the working code if it’s deployed as one big bundle that services any kind of request. With small, independent services running on separate machines, you can isolate these kinds of issues much more cleanly and keep most of the backend up even when something’s gone wrong with a single service. It’s obviously not perfect — what if what’s broken is something centralized, like your deployment tools? — but it minimizes a lot of otherwise-scary bugs in practice.
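      The circuit-break-and-isolate idea can be sketched in a few lines. This is a minimal illustration, not any particular library: the class name, threshold, and cooldown parameters are all invented for the example. After a run of consecutive failures the breaker opens and rejects calls immediately, so a misbehaving dependency stops consuming resources.

```ruby
# Minimal circuit-breaker sketch (names are illustrative, not from the post).
# After `threshold` consecutive failures the breaker opens; until `cooldown`
# seconds pass, calls fail fast instead of hitting the broken service.
class CircuitBreaker
  class OpenError < StandardError; end

  def initialize(threshold: 3, cooldown: 30, clock: -> { Time.now.to_f })
    @threshold = threshold
    @cooldown  = cooldown
    @clock     = clock   # injectable clock, handy for testing
    @failures  = 0
    @opened_at = nil
  end

  def call
    raise OpenError, "circuit open; failing fast" if open?
    result = yield
    @failures  = 0       # a success resets the failure count
    @opened_at = nil
    result
  rescue OpenError
    raise
  rescue StandardError
    @failures += 1
    @opened_at = @clock.call if @failures >= @threshold
    raise
  end

  def open?
    @opened_at && (@clock.call - @opened_at) < @cooldown
  end
end
```

      In a monolith the "blast radius" of the failing code is the whole process; with a breaker in front of a separate service, only calls into that one service fail fast while everything else keeps working.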

      1. 6

        it’ll hose even the working code if it’s deployed as one big bundle that services any kind of request.

        You can still have separate machines that only handle certain types of requests, or are optimised for different workloads, with a monolithic codebase.

        I quite like building things that way: I basically build each “microservice” as a separate library, to enforce modularity, then link them all together. These can also be tested separately. I admit I have not tried this at massive scale, however.
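        A toy sketch of that structure, with invented domain names: each "library" exposes a small public interface, and a thin app layer does the linking. In a real project each module would live in its own gem or directory with its own test suite.

```ruby
# Each domain is a self-contained module with a narrow public interface.
# Module and method names here are hypothetical illustrations.
module Billing
  module_function

  def charge(user_id, cents)
    { user: user_id, charged: cents }
  end
end

module Devices
  module_function

  def register(user_id, serial)
    { user: user_id, device: serial }
  end
end

# The "linking" step: a thin application layer composes the libraries.
# Deployment can still run this on role-specific machines, all from
# one codebase.
module App
  module_function

  def onboard(user_id, serial, plan_cents)
    [Devices.register(user_id, serial), Billing.charge(user_id, plan_cents)]
  end
end
```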

        1. 2

          On top of alva’s comment, some setups have features to both restrict resource use and detect the kind of craziness that indicates a bug. They’ll take action ranging from notifying an admin to halting the application. Instances not using the buggy functionality will be unaffected. From there, the admins might put in a temporary filter that makes packets calling that function fail fast before they even reach the instance’s buggy code. The filter is removed after the application is patched.
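          The temporary-filter idea could look roughly like this. It is a hypothetical sketch: the class, the operation names, and the request shape are all invented for illustration.

```ruby
require "set"

# Hypothetical fail-fast filter: while an operation is known to be buggy,
# requests naming it are rejected before any application code runs.
class FailFastFilter
  def initialize
    @blocked = Set.new
  end

  def block(op)
    @blocked.add(op)
  end

  # Called once the application is patched.
  def unblock(op)
    @blocked.delete(op)
  end

  # Returns the handler's result, or :rejected without touching buggy code.
  def handle(op)
    return :rejected if @blocked.include?(op)
    yield
  end
end
```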

        2. 2

          Great write-up. The author even preempted my counter from the first half of the article: that most of this advice can benefit monoliths, as Dijkstra was showing in the 1960s. Meyer, with Eiffel, is also big on this, but via Design by Contract. The author noted later that the Linux kernel is a good example of modular design in a monolith. Yep.

          One thing that’s missing here, maybe just outside the author’s experience, is that microkernel-based systems have long provided these benefits. As a matter of fact, the ones focusing on real-time or isolation address reissbaker’s concern about a failure in one component hosing your system. Add in the self-healing functions of QNX or MINIX 3 and you get a nice bargain. If deployed right, you can also do live updates on these systems for features that traditionally required taking down the running OS. If the system is real-time, those updates don’t have to degrade performance, since they can run in a low-priority process. The swap happens after everything is verified and ready for hand-off.

          The thing that’s totally wrong is “you can’t really patent anything.” Patent suits are a multi-billion-dollar industry. Patents on key tech can also be valuable in acquisition negotiations, or even when selling off a failed company. So the best advice for smaller firms is to iterate toward success, focusing on growth and profit, but also patent some key advantages. Larger firms are incentivized to patent everything they can if they’re in the U.S. That’s what they’re doing, too.