  1. 8

    This is a very cool presentation (@kir0ul has thoughtfully provided a link to the original blog post and slides for those of us who aren’t bothered with video unless there’s really no alternative :P) and a lot of the literature it quotes is top-notch.

    There is – and this is in no way diminishing the usefulness or quality of this talk! – a caveat with many of these papers, though, and it’s worth keeping in mind: they are academic papers, in narrow fields, which therefore (rightfully!) focus on narrow aspects, from which extrapolation is risky. The line between heeding the lessons of a well-researched paper and cargo culting is extraordinarily narrow. And, unfortunately, it doesn’t help that in the name of PowerPoint slide brevity, the slides occasionally draw slightly more general conclusions than a paper would warrant.

    Take, for example, this one. Its findings are as follows:

    - Only 6.1%-16.7% of parameters are set by the majority of users
    - 54.1% of parameters are rarely set by any user
    - Only 1.8%-7.8% of parameters are configured by more than 90% of users
    - Up to 48.5% of config issues are about difficulty finding or setting parameters
    - Up to 53.3% of config errors are due to users incorrectly staying with default values
    - Searching user manuals by keywords is not efficient at helping users identify parameters

    from which the author concludes that “developers create a lot more configuration options than people use”.

    Which is technically true. There are a bunch of caveats about these figures related to how the data was obtained (which the paper acknowledges), but there’s also an extra factor to consider, one that’s inherent to the sample and methodology, and is sort of acknowledged but not discussed in much detail (because academic papers are that way; I’m not implying the author is somehow dishonest).
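    To make concrete what these per-parameter statistics measure (and where the data-collection caveats creep in), here’s a minimal sketch of how a figure like “X% of parameters are set by a majority of users” might be computed. The parameter names and user configs below are made up for illustration; this is not the paper’s corpus or its actual methodology:

    ```python
    from collections import Counter

    # Hypothetical data: the set of parameters each user explicitly set,
    # out of a known universe of available parameters.
    ALL_PARAMETERS = {"listen_port", "max_connections", "log_level",
                      "filename_encoding", "buffer_size", "tls_ciphers"}

    user_configs = [
        {"listen_port", "log_level"},
        {"listen_port", "max_connections"},
        {"listen_port"},
        {"listen_port", "log_level", "buffer_size"},
    ]

    # For each parameter, count how many users explicitly set it.
    set_counts = Counter(p for config in user_configs for p in config)

    majority = len(user_configs) / 2
    set_by_majority = {p for p in ALL_PARAMETERS if set_counts[p] > majority}
    rarely_set = {p for p in ALL_PARAMETERS if set_counts[p] <= 1}

    print(f"{len(set_by_majority) / len(ALL_PARAMETERS):.1%} of parameters "
          f"are set by a majority of users")   # 16.7% in this toy sample
    print(f"{len(rarely_set) / len(ALL_PARAMETERS):.1%} are rarely set")  # 66.7%
    ```

    Note that everything interesting hides in how “set” is defined: a parameter merely present in a config file (say, copied from a distro template) and a parameter deliberately changed from its default look identical to this kind of analysis.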

    The paper uses a sample of four tools: Apache, MySQL, an unnamed Storage-A, and Hadoop. With the exception of Hadoop, all of these had been in development for more than twenty years at the time the paper was written, and they’re all infrastructure tools. Any twenty-year-old infrastructure tool is bound to have hundreds of configuration options that are no longer relevant for an infrastructure deployed today, but are critical for infrastructures that were deployed 15 years ago and have been in continuous operation ever since.

    Lots of default settings that nobody bothers to touch anymore encode things that are now de-facto standards but haven’t always been, and that matter when interacting with legacy systems (e.g. file name encodings), or standard practices that are common but nonetheless not universal.
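    As an illustration of why such a knob can’t simply be deleted, here’s a hypothetical sketch (the parameter name and defaults are invented, not taken from any of the tools in the paper): the default preserves behaviour that old deployments silently depend on, while new deployments explicitly override it to the modern standard.

    ```python
    # Hypothetical example: a filename-encoding knob kept for backwards
    # compatibility. Almost nobody sets it today, which makes it look
    # redundant in usage statistics -- but removing it (or changing its
    # default) would break volumes created when latin-1 was common.
    LEGACY_DEFAULTS = {
        "filename_encoding": "latin-1",
    }

    def effective_setting(user_config: dict, key: str) -> str:
        """Return the user's value if set, else the backwards-compatible default."""
        return user_config.get(key, LEGACY_DEFAULTS[key])

    # A new deployment opts in to the de-facto standard explicitly...
    assert effective_setting({"filename_encoding": "utf-8"}, "filename_encoding") == "utf-8"
    # ...while a 15-year-old config that never set the knob keeps working unchanged.
    assert effective_setting({}, "filename_encoding") == "latin-1"
    ```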

    Extrapolating these findings to non-infrastructure, user-facing software is a lot more complicated, which the authors explicitly acknowledge. But even extrapolating the right lessons for infrastructure software is tough, and the authors of the paper are, again, very careful to point that out exhaustively. While this data seems to suggest that you could just yank out half of the config knobs and everyone would still be happy, it also seems to suggest that successful software, especially infrastructure software, values backwards compatibility and is willing to trade development complexity for not breaking existing, successful deployments.

    There’s a trove of modern, simple, fast, opinionated web servers/database systems/distributed file systems out there that almost nobody uses, precisely because opinion about what’s relevant is a very poor substitute for real-life deployments when it comes to driving a development agenda, and an agenda that sacrifices the interests of existing users for a hypothetical new user (or, worse, for technical purity) tends to drive people away quickly.

    1. 7

      Original blog post and slides.

      1. 1

        Thanks, highly recommend the slides.