1. 19

(This started out as a comment on https://lobste.rs/s/vwhaig/configuration_files_suck .)

I’ve been thinking for the past week about @tedu’s “Features are Faults, Redux”, and it occurs to me that perhaps not all features are faults. Features related to runtime configuration are particularly prone to being faults, because they are a sign of a seam between zones of ownership. When I bring up Vim today it runs code straddling at least three distinct zones of ownership: the code in Vim itself, the system-wide vimrc bundled with my OS, and my own personal .vimrc. Vim’s configuration language mediates these transitions.
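The zones described above can be modelled, very loosely, as layers of settings in which each later zone overrides the earlier ones. This is a toy Python sketch, not how Vim actually evaluates vimrc files. `tabstop`, `expandtab` and `hlsearch` are real Vim options, but the values and the `resolve` helper are hypothetical.

```python
# Toy model: each zone of ownership contributes settings; later zones win.
# The values are hypothetical, chosen only to show the override order.
vim_defaults = {"tabstop": 8, "expandtab": False, "hlsearch": False}  # Vim's own code
system_vimrc = {"hlsearch": True}                                     # bundled with the OS
user_vimrc = {"tabstop": 4, "expandtab": True}                        # my personal .vimrc


def resolve(*zones: dict) -> dict:
    """Merge settings from each zone in order, so later zones override earlier ones."""
    effective: dict = {}
    for zone in zones:
        effective.update(zone)
    return effective


effective = resolve(vim_defaults, system_vimrc, user_vimrc)
print(effective)  # {'tabstop': 4, 'expandtab': True, 'hlsearch': True}
```

Each `update` call is a seam between owners. The final value of `hlsearch`, for example, comes from a file that neither the Vim authors nor the user wrote.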

Runtime configuration is a symptom that the person doing the configuration is different from the person building the software. All of @tedu’s examples arose because the connection between the two gradually grew more baroque (“could you just give me this one teensy little feature so I don’t have to look at your yucky code?”), eventually metastasizing into security issues like Shellshock.

This line of reasoning makes me want a system shipped as a single zone of ownership. Ship it with just two languages, one compiled and one interpreted. Users who want to change how a program behaves would have to edit its sources. The sources would be close at hand, though, shipped with each system, and a single command would recompile the entire system. Programs would be organized to put their configuration data in one place that’s easy to find.

OpenBSD is already pretty close to this ideal. You can tell from @tedu’s posts that OpenBSD maintainers often treat it as a single zone of ownership and make pervasive changes. But what if we went further, just as a thought experiment? Rip out /etc. Stop parsing configuration entirely in the base system. Assume that it’s intended for a desktop or server used by a single user who has some programming knowledge. Get rid of make. Entirely.

I’ve been poking at httpd a bit lately, and I notice that you can turn on TLS on a per-server basis. But the most common use case is a single user on a system who wants to run a handful of sites, all with TLS. The extra flexibility of per-server TLS configuration is obsolete, a holdover from the days when people shared common servers.
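Combined with the idea of keeping configuration data in one easy-to-find place in the source, the single-user httpd case might look something like the following hypothetical Python sketch. Every tunable lives in one frozen dataclass, TLS is a single switch for all sites rather than a per-server flag, and changing anything means editing the source and rebuilding. `Config`, `server_blocks` and the site names are made up for illustration; this is not how OpenBSD's httpd is structured.

```python
from dataclasses import dataclass


# Hypothetical "configuration as source": every tunable lives in this one
# frozen dataclass near the top of the program. Changing behavior means
# editing these defaults and rebuilding, not parsing a file at runtime.
@dataclass(frozen=True)
class Config:
    sites: tuple = ("example.org", "example.net")  # placeholder site names
    tls: bool = True  # one system-wide switch instead of a per-server flag


CONFIG = Config()


def server_blocks(cfg: Config) -> list:
    """Derive (hostname, port, uses_tls) listeners from the single config."""
    port = 443 if cfg.tls else 80
    return [(site, port, cfg.tls) for site in cfg.sites]


print(server_blocks(CONFIG))
```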

We still need some runtime configuration, say for text editors: I want to use tabs in my Go projects and spaces for everything else. And people may still end up building runtime configuration on top of the base system, either for themselves or for non-programmers. But as a community we need to question every case of runtime configuration much more antagonistically. “System-wide runtime configuration” should be an oxymoron.
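Even this editor case could be expressed as ordinary code that the user edits, rather than as a configuration language. Here is a minimal sketch. It assumes that a "Go project" can be approximated by the file extension and that four spaces is the preferred width elsewhere. `indent_for` is a hypothetical hook, not a real editor API.

```python
from pathlib import PurePath


def indent_for(path: str) -> str:
    """Return the indentation unit to use when editing the file at `path`."""
    # Approximation: treat any .go file as part of a Go project.
    # gofmt formats Go code with tabs, so follow that convention there.
    if PurePath(path).suffix == ".go":
        return "\t"
    # Assumed preference for everything else: four spaces.
    return " " * 4


print(repr(indent_for("cmd/server/main.go")))  # '\t'
print(repr(indent_for("notes/todo.md")))       # '    '
```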

(Earlier version of this comment from last week.)

    1. 8

      On the “just two languages” front, you may want to investigate the Inferno operating system: ANSI C implements the low-level modules (the source ships with an install and can be regenerated at any time with a modified version of the Plan 9 compiler suite), and everything else is written in the Limbo language and runs on the memory-safe Dis VM.

      This paper is a good high-level historical reference, while this post shows the module-based shell (without Limbo) being used for both program development and configuration at the base user level.

    2. 7

      You forgot vim zone four: modelines within the file being edited. :)
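The fourth zone is arguably the most surprising one, because its configuration is written by whoever wrote the file being edited. Below is a toy Python sketch of reading it. It only understands the `vim: set ...:` form and only scans the first and last five lines (Vim's default for the `modelines` option). It ignores the escaping rules and the restrictions on which options real Vim allows in a modeline. `modeline_settings` is a hypothetical name.

```python
import re

# Toy parser for the fourth zone: modelines inside the edited file.
# Handles only the "vim: set opt=val opt ... :" form.
MODELINE = re.compile(r"(?:^|\s)vim:\s*set?\s+([^:]*):")


def modeline_settings(text: str, scan: int = 5) -> dict:
    """Collect options from modelines in the first and last `scan` lines."""
    lines = text.splitlines()
    # First `scan` lines, plus up to `scan` trailing lines not already included.
    candidates = lines[:scan] + lines[scan:][-scan:]
    settings = {}
    for line in candidates:
        match = MODELINE.search(line)
        if not match:
            continue
        for opt in match.group(1).split():
            if "=" in opt:
                name, value = opt.split("=", 1)
                settings[name] = value
            elif opt.startswith("no"):
                settings[opt[2:]] = False  # e.g. "noexpandtab"
            else:
                settings[opt] = True
    return settings


print(modeline_settings("package main\n\n// vim: set ts=4 noexpandtab:\n"))
```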

    3. 6

      But the most common use case is a single user on a system who wants to run a handful of sites, all with TLS

      Sure, you want every site to have TLS, until you don’t. You still need non-TLS server blocks to redirect people to the https site, and to answer Let’s Encrypt challenges for acme-client.

      Less commonly, I have a non-TLS server block in my httpd.conf for a .onion site. Granted, that’s a niche use case, but it’s still a valid one, and there are possibly other equally valid reasons to mix TLS and non-TLS servers, which is why having TLS as a per-server configuration option makes sense.
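The port-80 duties described above can be sketched as a single request handler. It serves ACME HTTP-01 challenge responses from `/.well-known/acme-challenge/` and redirects everything else to HTTPS. This is a hypothetical Python sketch, not OpenBSD httpd's implementation. The `challenges` dict stands in for the files acme-client writes, and the 301 status is an assumed choice.

```python
ACME_PREFIX = "/.well-known/acme-challenge/"


def handle_plain_http(host: str, path: str, challenges: dict) -> tuple:
    """Return (status, headers, body) for a request arriving on port 80.

    `challenges` maps ACME tokens to the key-authorization text to serve.
    """
    if path.startswith(ACME_PREFIX):
        token = path[len(ACME_PREFIX):]
        if token in challenges:
            # The CA's HTTP-01 validation request arrives over plain HTTP,
            # which is why a TLS-only setup still needs this listener.
            return 200, {"Content-Type": "text/plain"}, challenges[token]
        return 404, {}, ""
    # Everything else is bounced to the TLS server for the same host.
    return 301, {"Location": f"https://{host}{path}"}, ""


print(handle_plain_http("example.org", "/blog", {}))
```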

      1. 2

        ACME also includes DNS and TLS-SNI challenges, so HTTP isn’t required now.

        1. 1

          OpenBSD’s acme-client(1) only implements the http challenge.

    4. [Comment removed by author]

    5. 4

      I had an instinctual negative reaction to this, and I’ve had to think about why. I think part of it comes from $WORK, where we have a service (for one of the Monopolistic Phone Companies) for which, due to service level agreements, we run redundant servers in redundant geographical locations. So you are talking about at least four different IP addresses (more, actually) required to run the program.

      Second, having a development system on a production server would probably not pass a security review; the goal there is anything that makes a break-in harder to exploit further. Even including unnecessary code (such as a shared library that includes all functions, whether used or not) makes me uneasy.

      Third, having to muck with the source code for configuration complicates version control. I view config files as holding the parts that vary between deployments, like an IP address, while the program itself stays invariant.

      1. 1

        Editing the source code of the program really shouldn’t be harder than editing a config file. In my workflow, either change requires building a self-contained image, testing it on canaries, and then deploying that image.

        1. 2

          At $WORK, we have one repository for source code and another for configuration. Configuration can change (IP address, hostname, location of other services) without updating the already tested code, and the code can change (bug fixes, new features, etc.) without the configuration changing (assuming the code changes don’t require a config update, which, at $WORK, is actually most often the case).

          With your method, does that mean your canary boxes are identical to production? How is that possible?
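The code/config split described above can be sketched as follows. The program is tested once and does not change between sites, while each deployment's values (IP address, hostname, location of other services) come from a file kept in the separate configuration repository. The field names, the JSON format and `load_config` are assumptions made for illustration. The addresses use the 192.0.2.0/24 documentation range.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentConfig:
    """Per-deployment values; the tested program never changes between sites."""
    hostname: str
    listen_ip: str
    peer_service_url: str  # hypothetical: location of another service we call


def load_config(text: str) -> DeploymentConfig:
    """Parse one deployment's JSON config from the separate config repository."""
    raw = json.loads(text)
    # Missing or unknown keys make the constructor raise TypeError,
    # so a broken config fails at startup instead of at first use.
    return DeploymentConfig(**raw)


east = load_config(
    '{"hostname": "svc-east", "listen_ip": "192.0.2.10",'
    ' "peer_service_url": "https://db-east.example.net"}'
)
print(east.listen_ip)  # 192.0.2.10
```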

          1. 1

            Canary boxes are practically identical to production because the same requests going to production boxes are mirrored to canary. Responses from the canary can be thrown away in most cases.

            That said, they don’t have to be identical for my argument to work: Config and code changes can both cause failures so they should be tested the same way.
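The canary setup described above can be sketched as a wrapper that sends each request to both production and the canary, returns only the production response, and records any disagreement. This is a simplified, synchronous Python sketch. The handler type, the `mirrored` helper and the mismatch log are hypothetical. It treats a config change and a code change identically, because either one simply produces a different canary handler.

```python
from typing import Callable, List

Handler = Callable[[str], str]


def mirrored(prod: Handler, canary: Handler, mismatches: List[str]) -> Handler:
    """Serve every request from prod, shadow it to canary, and log disagreements."""

    def handle(request: str) -> str:
        response = prod(request)
        try:
            shadow = canary(request)
        except Exception as exc:  # a failing canary must never affect users
            mismatches.append(f"{request}: canary raised {exc!r}")
        else:
            if shadow != response:
                mismatches.append(f"{request}: canary differed")
        # The canary's response is discarded; users only ever see prod's.
        return response

    return handle


log: List[str] = []
serve = mirrored(str.upper, lambda r: "oops" if r == "b" else r.upper(), log)
print(serve("a"), serve("b"), log)  # A B ['b: canary differed']
```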

    6. [Comment removed by author]

      1. 5

        The zones of ownership here are: reverse proxy server (Apache/Nginx/…) -> Bash -> user CGI scripts written in Bash. Bash was not designed to be run from inside a webserver, and reverse proxies implementing CGI didn’t really concern themselves with what language CGI scripts were written in. That’s two disconnects.

        It’s a valid criticism of my proposal to point out that Shellshock happened precisely because we were using an interpreted language rather than a simple config file :)
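The second disconnect is visible in how CGI hands a request to a script. The web server copies client-controlled header values verbatim into environment variables, and a Bash script's interpreter then parses that environment at startup. Here is a simplified Python sketch of the server side. It follows the RFC 3875 rule for turning headers into HTTP_* variables but ignores the headers servers treat specially. `cgi_environment` is a hypothetical name.

```python
def cgi_environment(method: str, headers: dict) -> dict:
    """Build (part of) the environment a web server gives a CGI script.

    Simplified: per RFC 3875, each request header becomes an HTTP_*
    meta-variable, upper-cased with '-' replaced by '_'. Special cases such
    as Content-Type and Content-Length are ignored here.
    """
    env = {"GATEWAY_INTERFACE": "CGI/1.1", "REQUEST_METHOD": method}
    for name, value in headers.items():
        # The value is copied verbatim: the client decides what ends up here.
        env["HTTP_" + name.upper().replace("-", "_")] = value
    return env


env = cgi_environment("GET", {"User-Agent": "() { :;}; echo pwned"})
print(env["HTTP_USER_AGENT"])  # () { :;}; echo pwned
```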

        1. 5

          I’m not sure I really see the connection either. I agree about the zones of ownership in Vim: Vim’s code, the system vimrc, and the user vimrc. There are three parties involved: author/upstream, distro/packager, and end user.

          But can you give an example of a security vulnerability that results from that? I’m sure there are some, but they don’t seem like ShellShock.

          ShellShock seems different to me. It’s a bug in bash where it interprets data as code. Mixing up code and data is always bad from a security point of view. That’s why people want to find buffer overflows – to convince you to treat their data as code.

          ShellShock is almost an “intentional” buffer-overflow-like confusion in bash. There was a format for exporting functions with “export -f”, but no corresponding “import -f”. Instead, bash would treat EVERY SINGLE environment variable as a potential source of code! That’s insane.

          Yes, bash wasn’t intended to operate with Nginx and data from the network. But you can exploit this bug locally too: if alice and bob are on the same machine, alice just has to convince bob to put data she provides into an environment variable of a bash process he runs.

          It seems more related to bash than anything about config files or zones of ownership. I agree that there are security concerns with config files, but they don’t seem related to ShellShock.
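The mechanism described above can be illustrated with a toy Python model; it is not bash's parser. In the vulnerable model, any environment variable whose value starts with `() {` is imported as a function, and anything after the function body is executed (here it is merely recorded in a list). The fixed model reflects the general shape of the later bash patches, which restricted function import to specially named variables and stopped executing anything beyond a single function definition. The `EXPORTED_FUNC_` prefix is a hypothetical stand-in for bash's real naming scheme.

```python
FUNC_PREFIX = "() {"
EXPORT_MARKER = "EXPORTED_FUNC_"  # hypothetical naming scheme for the fixed model


def import_env_vulnerable(env: dict, executed: list) -> dict:
    """Toy pre-Shellshock model: any variable whose value looks like a function
    is imported, and whatever follows the function body is run as a command."""
    functions = {}
    for name, value in env.items():
        close = value.find("}")  # toy parser: no nested braces
        if not value.startswith(FUNC_PREFIX) or close == -1:
            continue
        functions[name] = value[:close + 1]
        trailing = value[close + 1:].lstrip("; ").strip()
        if trailing:
            executed.append(trailing)  # the bug: data after the body becomes code
    return functions


def import_env_fixed(env: dict) -> dict:
    """Toy post-fix model: only specially named variables are considered, and a
    definition with anything after the function body is rejected, not run."""
    functions = {}
    for name, value in env.items():
        close = value.find("}")
        if not (name.startswith(EXPORT_MARKER) and value.startswith(FUNC_PREFIX)):
            continue
        if close == -1 or value[close + 1:].strip():
            continue
        functions[name[len(EXPORT_MARKER):]] = value
    return functions


attack = {"HTTP_USER_AGENT": "() { :;}; echo vulnerable", "HOME": "/home/bob"}
ran: list = []
import_env_vulnerable(attack, ran)
print(ran)                       # ['echo vulnerable']
print(import_env_fixed(attack))  # {}
```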

          EDIT: The more common security issue is evaluating an untrusted config file. For example, people often use (abuse) Python as a config language, because it has nice data literals. But you can’t do that in a secure way, because the user can always find a way to call os.system("rm -rf /") from the config file.

          But that’s different from what’s going on with ShellShock, because bash is NOT being used as a config language in that setting. There, it’s being used as a programming language to generate a web page, etc.
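If the appeal of Python as a config language is its data literals, one mitigation is to accept only the literals. The sketch below uses the standard library's `ast.literal_eval`, which evaluates literal structures but rejects arbitrary function calls with a ValueError. `load_literal_config` is a hypothetical wrapper name.

```python
import ast


def load_literal_config(text: str) -> dict:
    """Parse a config written as a Python dict literal without executing it."""
    # literal_eval only accepts literal structures such as strings, numbers,
    # lists and dicts; a call like __import__(...) raises ValueError instead.
    config = ast.literal_eval(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a dict literal")
    return config


ok = load_literal_config("{'port': 443, 'hosts': ['a.example', 'b.example']}")
print(ok["port"])  # 443

try:
    load_literal_config("__import__('os').system('echo pwned')")
except ValueError:
    print("rejected: not a literal")
```

This limits what a config file can do, but it is not a full sandbox. Python's documentation notes that very large or deeply nested input can still fail with errors such as MemoryError or RecursionError.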

          1. 2

            Yes, I wasn’t thinking clearly about the distinction. Shellshock isn’t related to config files.

            I still think it’s related to zones of ownership, though I don’t expect to persuade anybody of a counterfactual. Shellshock happened because three different tools were used pervasively together without their designs being aware of each other at all. Shellshock wouldn’t have happened if the three had been co-designed by a single person. If that seems like cheating, I’ll fix some of the moving parts: it also wouldn’t have happened if Chet Ramey had designed the mechanism by which Bash was used to serve webserver requests. The problem would have been spotted long before it became such a big issue. Nobody would have heard of it.

            1. 3

              I think you’re trying to express an idea similar to this:

              http://static.usenix.org/events/woot08/tech/full_papers/drewry/drewry_html/

              Regular expression engine use is a strong example of the lack of survivability engineering practices in modern Internet-connected software. Legacy software libraries are used without regard for their original context or the risks that are introduced through the high level of connectivity. The disregard for context is only magnified by the lack of any risk mitigation techniques. This behavior undermines the survivability of the entire software system, and in many cases, the entire computing platform.

              With the addition of new security contexts, new and dangerous attack vectors will continue to be introduced.

              Basically, regex engines used to assume that they were all part of a single “zone of ownership”, but then they started being exposed to the Internet and untrusted data. It became kind of unclear whether regexes were code or data. In a compiled C program, they’re code. To grep/sed/awk, they’re data from “argv”.
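The code/data ambiguity of regular expressions fits in a few lines of Python. The same string acts as a program when it is passed to `re.search` as the pattern, and becomes inert data once it has been escaped with `re.escape`. The `user_input` value is a made-up example.

```python
import re

user_input = "a.c"  # hypothetical untrusted search term, e.g. from argv or a web form

# As code: "." is a metacharacter, so the pattern matches more than the user typed.
as_pattern = re.search(user_input, "abc")
# As data: re.escape neutralizes metacharacters, so only the literal text matches.
as_literal = re.search(re.escape(user_input), "abc")

print(bool(as_pattern), bool(as_literal))  # True False
```

Escaping only helps when the input was meant to be literal text. When users must supply real patterns, the pattern is code running inside your process, which is the situation the quoted paper is concerned with.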

              I would call this “context switch” idea an example of “confused software architecture”. People are just picking up huge pieces of code and putting them in new places without that much thought.

              But even so, I STILL think that ShellShock is an especially egregious bug. I don’t think it even has this “excuse”.

              Unix has always been a multi-user system. ShellShock is the moral equivalent of intentionally inserting a buffer overflow into your code.

              1. 3

                Yes, I agree with both these points.

                I think the root of the disagreement here might be a failure of articulation on my part: when I say “Shellshock wouldn’t have happened”, I don’t mean that the vulnerability couldn’t possibly have happened. I’m claiming something weaker: that it would have led to a CVE (or a dozen) long ago, much sooner after it was introduced, when the net was much smaller. By now it would be just scar tissue. It wouldn’t have merited a marketing name and a special domain. Is there a name for super-bugs like Heartbleed and Shellshock? I’m claiming it would be ‘just’ a bug, not a super-bug. There’s still room for other complementary practices to prevent such bugs altogether.

        2. [Comment removed by author]

          1. 2

            What alternative would you recommend?

            1. [Comment removed by author]