1. 7

  2. 4

    This is okay, but there are a couple of interesting quirks to it.

    For one, it’s missing some important operations-side stuff. Off the top of my head, I’d add “High Availability/Redundancy”, “Auditable Access Control”, “Disaster Recovery”, “Geographic Distribution”.

    A pet peeve of mine: on one hand it assumes that application state should be eliminated in all cases for scalability, on the other hand it has an entire section devoted to database management. Every nontrivial application has state at some level, and scalability has to take that into account – databases scale differently, but they still need to scale.

    Finally, it seems to concentrate on containers as the ne plus ultra of certain tech paths. If the goal of “Setting up development environment” is fast provisioning for new developers, then wouldn’t spinning up a VM meet that just as well as spinning up a Docker container, assuming both can be done in minutes? For that matter, wouldn’t having a library of unallocated bare-metal servers with fast automated provisioning provide the same benefit?

    Compare “Database dumps for development environment”, which is largely technology-agnostic. As long as the system is automated and sanitizing, then it passes. Considering that containers should generally not be used to deploy databases, I’m glad for this. My recommendation is to take that results-oriented definition style and use it as a template to go back and rework the other sections.

    1. 4

      I think it works best if you take it as a list of areas that a company can be competent-or-not in, rather than a list of areas that every company should try to be competent in. Even when a concern such as database management doesn’t apply, the very least a company should try to do is be aware that they are avoiding this concern and that they might have a need for it in the future.

      1. 2

        So, you point out something here that is exactly the sort of thing that makes submissions like this worth discussing. You’d suggested things like:

        I’d add “High Availability/Redundancy”, “Auditable Access Control”, “Disaster Recovery”, “Geographic Distribution”.

        Now, I agree with those–if you’re a business at scale, and it solves problems you have.

        For example, if I’m a small ecommerce company doing custom manufacturing for, say, local clients, then Disaster Recovery (yay tapes and standby hardware) is still useful to me. Geographic distribution, though, is not.

        A lot of times I feel like these matrices are targeting companies that have followed a particular (GOOG wannabe) trajectory, and they tend to condemn perfectly healthy shops that don’t happen to fit that mold.

        1. 1

          Sure, it was a top-of-head list. Upon reflection, geographic distribution should probably be part of high availability/redundancy. Indeed, following my own advice of being results-oriented, it shouldn’t be called out as a separate thing at all. And “High Availability” is a consideration for anyone with a web presence, even if it’s just putting up a static page when the site’s unavailable – even if your customers aren’t awake, search engine crawlers are.

          Likewise, “Access Control” should still apply. Once you have more than a couple employees, you don’t want all of them to have root access to the production systems. And if you do only have a couple employees, it’s expected that you’ll be on the lower levels of most of these categories.

          1. 1

            if you’re a business at scale

            You have to be a large business to depend on or benefit from a reliable service that’s up when users need it? I doubt it. There’s a lot of potential to differentiate on quality, as quite a few small to mid-sized players do. Plus, you never know when the big moment of use for your customers will be. A cheap HA solution is always a good default for mission-critical apps.

            I’ll add this might be true to a degree for “problems you have.” I’ve seen quite a few startup post-mortems about how much business was lost when a surge of customers hit systems that couldn’t handle the load or recover after crashing. This may be one of those problems you want to make sure you never have. It does depend on the company, though, as there’s a ton of stuff where customers will tolerate downtime.

            EDIT to add: geographic distribution might help you if your local HA solution failed due to an electric company or local networking issue. Let’s say users couldn’t reach your local site over local lines but could reach your remote site over a mobile connection, which takes a totally different path to a site that itself uses a different connection. The remote site may still be up, and local users on a different network may still be able to reach it. This is why putting the other box on totally different networks and a different utility company is a good default vs. sharing them. It can sometimes even cost less than HA in a different local building, if the area just has cheap access to the backbone with lots of competition among providers. For instance, my part of the Mid-South is going to have terrible prices for Internet but Chattanooga TN has 10Gbit for $300/mo.

        2. 2

          I disagree with a lot of these sorts of things (mostly because they make too many assumptions about best-practices and business motivations), but this is a decent one. The linked Google Spreadsheet is a really good strawman to think on.