1. 20

I got myself taken seriously at work the other day, so I’m writing a draft of… something (in normal times it would have been a handful of lectures or workshop sessions; in these pandemic times we’ll have to see) that I want to use to introduce developers at work to the practice of designing and running distributed systems.

In short, work has decided that they want to take the monolith they’ve run for years and cut it into services. My opinion is that we’ve never asked the majority of developers working here to do this, and they’ll have a bad time if they have to figure out how to design and run distributed systems from first principles. I’m putting together some kind of primer they can use to (a) find out what they don’t know, (b) find out where to learn more, and (c) pick up some of the lessons we’ve learned the hard way. I want to focus more on the practice of doing these things than the theory; understanding Paxos is very nice, but usually doesn’t help us not get paged at night.

I’d like to ask what resources people here know of and can recommend for this? I have a pile of bookmarks, but it’s probable I don’t know of some helpful texts.

I’d also like to know if anyone has tried a hands-on approach to these kinds of lessons, and how it went? I had an idea of having people build a small test system that I’d break in various ways (overload it with many queries, or heavy queries, or introduce network splits, or …) but I’m not sure if they’d just get lost in the system setup and we’d never get to more interesting lessons.
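
To keep setup from swallowing the lesson, the "break it in various ways" part could start as an in-process wrapper rather than real infrastructure. A minimal sketch, assuming a plain Python callable stands in for the service; `FlakyService` and its parameters are hypothetical names made up for this illustration, not part of any library:

```python
import random
import time


class FlakyService:
    """Wraps a handler and injects faults so trainees can watch a client cope."""

    def __init__(self, handler, error_rate=0.0, extra_latency_s=0.0, seed=None):
        self.handler = handler
        self.error_rate = error_rate            # fraction of calls that fail
        self.extra_latency_s = extra_latency_s  # artificial delay added to every call
        self.partitioned = False                # flip to True to simulate a network split
        self._rng = random.Random(seed)         # private RNG so runs can be repeated

    def call(self, request):
        if self.partitioned:
            raise ConnectionError("simulated network partition")
        if self.extra_latency_s:
            time.sleep(self.extra_latency_s)
        if self._rng.random() < self.error_rate:
            raise RuntimeError("simulated server error")
        return self.handler(request)


# Hypothetical exercise: roughly 30% of calls fail; the client records None for those.
service = FlakyService(lambda req: req.upper(), error_rate=0.3, seed=42)
outcomes = []
for i in range(10):
    try:
        outcomes.append(service.call(f"query-{i}"))
    except RuntimeError:
        outcomes.append(None)
```

Raising `error_rate`, adding latency, or setting `partitioned = True` mid-exercise gives three different failure modes without anyone having to configure a network.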

  1. 8

    I collect some links here.

    This 2020 MIT distributed systems course seems nice.

    1. 8

      Awesome book about distributed systems by Martin Kleppmann - http://dataintensive.net/

      Awesome Distributed Systems

      Testing of distributed systems - collection of links to interesting resources about distributed systems testing.

      An introduction to distributed systems is actually an outline of a paid course by Aphyr, but it can be useful too.

      1. 3

        +1 for Kleppmann’s Designing Data-Intensive Applications book. It’s really the best thing I’ve read on distributed systems: it has good breadth, and enough depth that you can start searching out more detailed information on the topics you want to learn more about.

        1. 2

          another big +1 for Kleppmann’s. One of the best books in my library. I keep getting back to it all the time.

      2. 4

        Beyond what everyone else said (I’m a big fan of the “Designing Data Intensive Applications”!), I think that my employer’s “Builder’s Library” is neat. One of my favorite talks is also given by a colleague (for some definition of colleague…) on designing reliable control planes—the talk is summarized by the speaker on a thread on Twitter, with my favorite patterns being:

        • asynchronous coupling between systems,
        • avoiding (cold) caches,
        • and a strong bias to constant work/reducing modalities as much as possible.
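
        A toy way to picture the constant-work bias (my own sketch, not taken from the talk; the function names and the host table are hypothetical): compare a sync that sends only what changed with one that always sends the full table.

```python
def delta_sync(previous, current):
    """Work scales with how much changed: cheap when quiet, expensive in a storm.

    Only added or changed keys are sent; deletions are ignored in this toy version.
    """
    return {k: v for k, v in current.items() if previous.get(k) != v}


def full_sync(previous, current):
    """Constant work: always ship the whole table, whatever changed."""
    return dict(current)


hosts = {f"host-{i}": "healthy" for i in range(1000)}
quiet = dict(hosts, **{"host-1": "unhealthy"})        # one host flips state
storm = {name: "unhealthy" for name in hosts}         # every host flips at once

for label, after in [("quiet", quiet), ("storm", storm)]:
    print(label, len(delta_sync(hosts, after)), len(full_sync(hosts, after)))
# prints:
# quiet 1 1000
# storm 1000 1000
```

        The delta version does a thousand times more work exactly when everything is failing; the full version costs the same on a good day and a bad day, so the bad day is a code path you already exercise constantly.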

        I’ll also recommend using some sort of RPC system, not for performance per se, but to automate code generation in whatever language a system might be using. There are a bunch of other nice properties that a fully-featured RPC system gives you (intelligent load balancing, service discovery, retries, throttling…), but I strongly feel that a good RPC system is one of those things where spending a few days now saves you months or years of work down the line. I’d also avoid RESTful APIs as much as possible: I’ve found that they tend to slow you down, and there’s little benefit to writing them by hand or via something like OpenAPI.
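
        To give a sense of what the retry part of such a layer does, here is a hand-rolled sketch of retries with capped exponential backoff and jitter. It is not any particular RPC framework’s API; `call_with_retries`, `flaky_rpc`, and the default values are assumptions made up for the example.

```python
import random
import time


def call_with_retries(rpc, attempts=4, base_delay_s=0.1, max_delay_s=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call rpc(); on ConnectionError, retry with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return rpc()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(rng() * cap)  # full jitter: wait a random time in [0, cap)


calls = {"n": 0}


def flaky_rpc():
    # Hypothetical RPC that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Sleeping is stubbed out here so the example finishes instantly.
result = call_with_retries(flaky_rpc, sleep=lambda s: None)
```

        The jitter matters as much as the backoff: without it, every client that failed at the same moment retries at the same moment too. Getting this written once, in one place, is a large part of what a shared RPC layer buys you.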

        1. 1

          Thanks for the Builders’ Library and the YT vids. And I’m very interested in hearing more about your thoughts around RPC over REST. OTOMH I’d draw the line by the number of API consumers:

          • 1 consumer? RPC
          • few consumers of similar types? REST
          • tons of diff consumers? GQL

          but that seems a bit too simple. How do you sell it to the others?

        2. 3

          I was going to recommend the course @nikivi already did, but since you mentioned wanting to be more practical than theoretical, I think my answer could still be helpful.

          To give people a bit of distributed systems vocabulary, a good idea is to start with the following books:

          • Martin Kleppmann Designing Data-Intensive Applications.

          • Burns, B. (2018). Designing Distributed Systems. O’Reilly.

          You can find both on Safari Online. Before jumping into a book itself, take a look at its outline to understand how it structures the path ahead.

          Also, Sam Newman’s latest book, Monolith to Microservices, can be a good source of ideas, as you’ll find a good overlap between DS and the micro-services hype.

          I’ve been thinking about how to ramp up our own internal teams at work, so this could be a good opportunity to collaborate.

          1. 1

            I am not sure where the first one is, but the latter book’s actual link is here, https://www.oreilly.com/library/view/designing-distributed-systems/9781491983638/

            1. 1

              My bad, it auto completed with my references and it was wrong.

            2. 1

              I loved all of those. Burns’s Designing Distributed Systems is full of fantastic tricks with containers. And from the perspective of an architect who deals with real-world enterprises, Newman’s is absolutely invaluable. OP, I think you will benefit greatly from this last one. It neatly delineates migration strategies and the political/organizational complexities that we as developers too often miss.

              1. 2

                One that nobody has mentioned yet, and that I also found very interesting, is this MEAP on software telemetry. It gives a different perspective and helps put some order into the chaos of monitoring and observing your apps. Lastly, Release It! by Nygard is another great one to have around.

                1. 2

                  I think that https://cloud.google.com/blog/products/devops-sre/join-sre-classroom-nalsd-workshops could be of some use to you. It is not focused on a single programming language or approach (and it does focus on the design of the distributed system), but IMHO it does a good job of evaluating different options while using a practical~ish system. I had the chance to go through the workshop (or a very similar one) at the Velocity Conference last year and I really enjoyed it.

                  They’ve made available all the materials, slides, guides, notes, etc. (https://landing.google.com/sre/resources/practicesandprocesses/sre-classroom/) which should also help you to reuse whatever (if any) part you find useful.

                  1. 2

                    I attended one of these workshops last year at Velocity and it was a fun and refreshing experience. If you have Safari Books, the presentation is here: https://learning.oreilly.com/videos/oreilly-velocity-conference/9781492050742/9781492050742-video328447