1. 70
  1.  

  2. 14

    Another random one that I once doubted, but now do not doubt, is the link between the standard 80-column line length/terminal/etc. and the size of circa-late-1880s US paper currency.

    (the 1890 US census was the first to use punched-card tabulating machines, developed by Herman Hollerith; the cards were stored in surplus currency boxes supplied by the Treasury, which capped the size of the cards; Hollerith’s company later merged with others to form IBM; IBM improved the density of the punched cards – still the same size – to 80 columns; similar to the changes in the linked-list question, today whether to keep or abandon 80-character line lengths in code is usually debated using reasons which have nothing to do with the size of IBM punch cards or the historical cause of their size)

    1. 7

      At first blush this seems like evidence of unthinking herd behavior on the part of 80-column purists, but I don’t think it is. Rather, I think the original punch card’s 80 columns, the 80-column standard in code, and the number of characters per line in most books (45-75) all reflect the range our visual system can comfortably work with.

      Of course there’s nothing magical about the exact number 80 – anywhere from 65 to 90 would probably work fine – but standardization has benefits, this one is already there, and so there’s some logic to keeping it.

    2. 8

      Great article. Another related point is just how long the shadow of haphazardly made decisions can be. I’ve seen this myself on shorter (but still surprisingly long) timescales. That little kludge or “mess we’ll clean up later” hanging around for 5 years, infecting other parts of the system.

      1. 5

        I don’t think these examples are haphazard though; it all made perfect sense at the time, and it’s hard to predict what will happen in the future (and if your prediction is wrong you’re likely to end up with a much bigger mess).

        And you often don’t really know what the longevity of a system will be; Bill Joy was just a grad student programming for the fun of it when he wrote vi and csh (and, in his own words when explaining some of csh’s oddness, “I didn’t really know what I was doing at the time”). I don’t think he would have expected we’d still be looking back to his ADM-3A all these decades later.

        1. 3

          Completely agree. Haphazard didn’t necessarily imply a value judgement. It might, but it might also mean “unthinking because at the time we didn’t think it would be important” (and that judgment was sound at the time), or it might mean “we knew it wasn’t great but decided we’d clean it up later”, or “we thought it was good and only later realized our mistake”, or any number of variations on that theme. The point is early decisions in successful software have a way of echoing through the years in unpredictable ways, whether you want them to or not.

          Even knowing this won’t save you from it – at least not always.

          1. 11

            Whenever somebody at work tries to introduce technical debt at the initial design stage (with “it doesn’t really matter” or “we’ll change it later”), I tell them that while implementation bugs are often fixed, protocol bugs typically get stratified as soon as anything at all uses the protocol. And then I try to remind them of the thirty or so actual examples, from our company over the past ten years, that come to mind off the top of my head: somebody writes an internal service in a short-sighted way to save fifteen minutes of design time because “it’s only a demo”, and it ends up costing us millions of dollars in engineer hours to replace or circumvent years later.

            The major problem with thinking through designs is not that well-thought-out designs are harder to implement (they tend to be simpler, rather than more complex) but that anybody who hasn’t thought much about design concerns will have a hard time understanding the importance of particular elements, so explaining to junior developers or non-technical management (or “technical management” who haven’t written a line of code in 10 years) why some vaguely-similar third-party library or some shallow hack is a bad idea takes a lot of time and effort.

            Thinking through designs is a specialized skill, and people who have learned this skill can (when properly motivated) think through designs pretty thoroughly pretty quickly & come up with high quality solutions that are straightforward to implement. The thing is, low quality solutions are traps: they hide complexity behind a veneer of simplicity, distract you with tangential concerns, and if you get caught by their hypnotic song then by the time you recognize their real cost, you have already begun to pay it.

            1. 2

              but that anybody who hasn’t thought much about design concerns will have a hard time understanding the importance of particular elements, so explaining to junior developers or non-technical management (or “technical management” who haven’t written a line of code in 10 years) why some vaguely-similar third-party library or some shallow hack is a bad idea takes a lot of time and effort.

              This is so spot on. It’s like trying to explain the importance of washing your hands to someone unfamiliar with germ theory.

              1. 5

                I should also note that some folks are really good about carefully designing their own projects and then will confidently give terrible design advice (or confidently impose the outlines of a terrible design by fiat) on other people’s projects because they don’t understand the circumstances well but don’t realize it – so this isn’t just a matter of “good designers” and “bad designers” but people who have versus have not applied the skill of design. (One of the common ways not to apply the skill of design is not to have it, but there are other failure modes too.)

              2. 1

                Are there any good books to read, on the development of the skill of thinking through designs, especially protocol design?

                1. 3

                  Not that I know of, and I’m not sure that a book can do much to teach it either. Really, what’s necessary is experience – spend five or ten years making sure you really try to consider all the possible upsides, downsides, misuses, potential performance issues, and security and scalability concerns of everything you write before you write it (maybe setting aside an hour a day to write it all out exhaustively and a few more hours to argue about it with more experienced colleagues) and then revisiting those predictions once you’ve begun implementation and again after testing and launch.

                  Specifically with protocol design, you can learn some theory from books. Amateurs will often design protocols that are inconsistent, too rich, or not rich enough, or use the wrong existing tool for the job, because they don’t know about things like weird machines and they don’t know how lexers and parsers work. But that’s not really what I mean by protocol design per se. Most of the time what’s actually happening is that somebody is exposing functionality through some existing tech (like HTTP + JSON) and has factored it like a first draft of an internal function, when in reality network latency will make it totally unusable and the poor factoring will make it a pain in the ass to develop on top of. And once something is built on top of it, the design is locked in and nobody is willing to approve project time to make both layers work properly.
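
                  To make that last failure mode concrete, here’s a toy sketch in Python (the endpoint paths, the 50 ms latency figure, and the 20-item order are all invented for illustration, not taken from any real service): an API factored like a first-draft internal function pays one round trip per piece of data, while a protocol-shaped API answers in one request.

                      import time

                      LATENCY = 0.05  # pretend each HTTP+JSON round trip costs 50 ms

                      def fetch(path):
                          # Stand-in for an HTTP call; it only simulates the round-trip cost.
                          time.sleep(LATENCY)
                          return {"path": path}

                      # Factored like a first draft of an internal function: one call per piece of data.
                      def get_order_chatty(order_id, item_ids):
                          order = fetch(f"/orders/{order_id}")
                          items = [fetch(f"/items/{i}") for i in item_ids]    # N extra round trips
                          prices = [fetch(f"/prices/{i}") for i in item_ids]  # N more
                          return order, items, prices

                      # Factored as a protocol: one coarse request returning everything the caller needs.
                      def get_order_coarse(order_id, item_ids):
                          return fetch(f"/orders/{order_id}?expand=items,prices")

                      for fn in (get_order_chatty, get_order_coarse):
                          start = time.perf_counter()
                          fn("42", list(range(20)))
                          print(fn.__name__, f"{time.perf_counter() - start:.2f}s")

                  The chatty version makes 41 simulated round trips to the coarse version’s one, and the gap only widens once real latency, retries, and error handling enter the picture.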

                  1. 6

                    A four-step ‘intuition building’ tactic I have found very useful in the past:

                    1. Reverse engineer projects in the same space, from low level (execution traces) to high (source), then compare against their own spec (open projects only, and all that, for legal reasons). Make note of complex states, transitions, and inaccuracies.

                    2. Set up measurement and anomaly collection (i.e. monitoring); make sure it can accurately plot steady state, warmup, and heavy load.

                    3. Write fuzzing harnesses that target the previously noted areas. With a protocol, multiple simultaneous connections and delay spikes after warmup states (login, …) are the cases most often glossed over (a rough sketch of such a harness follows this list).

                    4. For each finding from step 3: check it against the source implementation, study it until you understand why it happened, compare to the initial spec and implementation, and try to explain why someone let it happen.
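
                    To make step 3 concrete, here is a rough sketch (in Python) of what such a harness could look like for a line-based TCP protocol. Everything in it is invented for illustration (the “LOGIN” warmup message, the message sizes, the dummy echo server standing in for the system under test); a real harness would target the real implementation and record anomalies as findings for step 4.

                        import random
                        import socket
                        import socketserver
                        import threading
                        import time

                        # Dummy line-based echo server so the sketch runs standalone; in practice the
                        # harness would point at the real implementation under test instead.
                        class Echo(socketserver.StreamRequestHandler):
                            def handle(self):
                                for line in self.rfile:
                                    self.wfile.write(b"OK " + line)

                        server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), Echo)
                        threading.Thread(target=server.serve_forever, daemon=True).start()
                        host, port = server.server_address

                        def session(seed):
                            # One fuzzing session: log in, then send mutated lines with random delay spikes.
                            rng = random.Random(seed)
                            with socket.create_connection((host, port)) as s:
                                s.sendall(b"LOGIN user\n")       # the warmup state mentioned above
                                time.sleep(rng.uniform(0, 0.2))  # delay spike right after warmup
                                for _ in range(20):
                                    msg = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 64)))
                                    s.sendall(msg + b"\n")
                                    if rng.random() < 0.3:
                                        time.sleep(rng.uniform(0, 0.1))
                                s.recv(4096)  # drain a little; a real harness would log responses as findings

                        # Several simultaneous connections, the case that tends to get glossed over.
                        threads = [threading.Thread(target=session, args=(i,)) for i in range(8)]
                        for t in threads:
                            t.start()
                        for t in threads:
                            t.join()
                        server.shutdown()
                        print("sessions finished; findings would now be checked against the spec (step 4)")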

                    1. 2

                      This is a great list!

                      I have never tried to intentionally follow it, but working with large inherited codebases at work has meant that I’ve basically had to do this as a core part of debugging. (“Under circumstance X, why does 6-million-lines-of-java-code-written-10-years-ago-by-contractors Y lock up/run out of ram/rapidly fill the disk with empty temp files/slowly leak open sockets/go from O(log(n)) to n^5 time complexity/whatever”)

                      I’m sure if I put my mind to analyzing code in this way, I’d learn a lot more a lot quicker :). Instead, I’m always disappointed when I find that reading the code won’t get me an answer in time and I have to reverse-engineer behavior to meet deadlines.

                    2. 1

                      Thanks.

            2. 5

              That little kludge or “mess we’ll clean up later” hanging around for 5 years, infecting other parts of the system.

              Try more like 20 years. The first real product that I designed from the ground up is almost 19 years old now, and some aspects of the database schema are still intact from the beginning.

              1. 4

                Totally. I’ve seen a casual decision about splitting responsibilities across a couple of managers, made early in a project, grow into split orgs with hundreds of people in each. What was a quick decision based on two specific individuals becomes so embedded that it takes a formal reorg to change later.

              2. 6

                My theory around Java’s original Date class is that they got to a month before the launch date and forgot they still needed to write it. All the experienced devs were busy working on bytecode verification or garbage collection or some actually-difficult stuff so they had an intern write the Date class, thinking “how hard can it be?” Then by the time they realized their error it was too late, and not only can it never be removed, but its idiocy has infected other platforms.

                1. 3

                  This is a well-written essay! I have seen people poke fun at months starting from zero in JavaScript a few times and have always responded with “it’s a little more complicated than you might think.” Every decision has a history; usually the answer is “because it was a good idea at the time.”

                  1. 3

                    Every decision has a history; usually the answer is “because it was a good idea at the time.”

                    I disagree with this; I think it’s more “because the people who made the decision did an insufficient job of understanding the breadth of the impact of their decision”.

                    1. 2

                      Things change; technology advances and makes better solutions possible. It’s easy to point at a decision with hindsight and say it should have been made differently.

                      1. 4

                        Sure; in some cases they never could have known how long we would be dealing with the fallout of a short-sighted decision. In every case they were wrong, but in some cases they couldn’t have been expected to be right.

                        But in the case of Java specifically, they had decades of experience shouldering the burden of decisions that lingered long after they made sense. They implemented the JVM largely in C++, for pete’s sake. Every line of code in the JVM was typed on a keyboard whose design reflected hacky workarounds for constraints that had not been relevant for nearly half a century. Every person on that team had a tangible reminder of this kind of problem sitting on their desk that they spent hours every day touching.

                        Learn from history.

                        1. 3

                          I think a better example might be the relationship between the 1990s fad for “interactive TV” setups where you’d have a box that you plugged in to the television set, and Python’s GIL.

                          Python predates Java (Python in 1991, Java in 1995). And it came out of the UNIX scripting-language world where if you needed to write a program that could multitask, you did it by forking new processes. Then Java came along, and one of its original targeted use cases was constrained systems like… those “interactive TV” boxes, which generally could not do true process-based multitasking. So Java went with threading, and threading became a hugely popular model.

                          At the time, several things were true:

                          • A huge part of Python’s popularity was the relative ease with which it could wrap and interface to C. There wasn’t an ecosystem of Python libraries, so much as there was an ecosystem of C libraries for which people had written wrappers to allow access from Python.
                          • Neither the Python interpreter, nor most of that wrapper/C code, was thread-safe.
                          • Most people who would want threading would be using it for I/O-bound tasks like writing network daemons.
                          • Most people, even if they were writing multitasking code, still had only single-processor/single-core hardware on which to run it.

                          So a decision was made to open up the ability to write threaded code in Python: there would be a lock gating access to the interpreter and its C API, and only the thread holding that lock – one thread at a time – could execute Python bytecode or use the interpreter’s C API. Thus, the Global Interpreter Lock.

                          This was a perfectly reasonable decision then, and for quite some time into the future, and allowed people to keep using Python and all those handy wrapped C libraries as before, but with the addition of threading. And since your hardware was unlikely to be able to actually execute multiple threads simultaneously, and you’d almost certainly have lots of threads idling, doing things like waiting for data from a socket, it didn’t really restrict you that much or impose too much performance penalty compared to the normal overhead of just using Python itself.

                          But fast forward a few decades. Now we all have multiprocessor/multicore hardware, even in the phones we carry around in our pockets. And Python grew from a UNIX-y scripting language to something that’s used in scientific computing with heavy CPU-bound number-crunching loads.

                          And now we have endless wailing and gnashing of teeth about the GIL, and endless projects that try (and, so far, fail) to remove it, because making the whole thing thread-safe without the GIL today would still require a huge rewrite of all the legacy non-thread-safe code people wrote and wrapped in the past.

                          (incidentally, the implementation of the GIL has evolved to try to reduce the overhead for some now-popular types of work, and number-crunching libraries like NumPy are careful to delineate – there are macros for this – which parts of their C code require access to the interpreter and thus the GIL, and which parts don’t and can release the GIL while continuing to do work that doesn’t involve the Python interpreter)
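
                          For what it’s worth, the tradeoff described above is easy to see on a stock, GIL-enabled CPython with a toy timing sketch like this one (a rough illustration, not a benchmark; the loop size and sleep time are arbitrary): CPU-bound threads take turns holding the GIL, while sleeping threads release it and overlap.

                              import threading
                              import time

                              def cpu_work():
                                  # CPU-bound: holds the GIL while executing Python bytecode.
                                  total = 0
                                  for i in range(5_000_000):
                                      total += i

                              def io_work():
                                  # Stand-in for I/O-bound work: sleep releases the GIL while "waiting".
                                  time.sleep(0.5)

                              def timed(label, target, n):
                                  threads = [threading.Thread(target=target) for _ in range(n)]
                                  start = time.perf_counter()
                                  for t in threads:
                                      t.start()
                                  for t in threads:
                                      t.join()
                                  print(f"{label}: {time.perf_counter() - start:.2f}s")

                              timed("cpu x1", cpu_work, 1)  # baseline
                              timed("cpu x4", cpu_work, 4)  # roughly 4x the baseline: threads serialize on the GIL
                              timed("io  x4", io_work, 4)   # about 0.5s total: sleeping threads overlap freely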

                          1. 3

                            Well, they’re kind of … two examples of opposite things, I guess? java.util.Date is a classic “come on, you could have seen that one coming” situation because examples of precedent were abundant, while the Python GIL is a good example of a “difficult to predict the impact” situation because multi-core CPUs sounded like science fiction at the time.

                            1. 3

                              A lot of people react to hearing the historical design tradeoffs behind the GIL the same way you’re reacting to the date thing, though.

                              (ask me how I know, or, on second thought, you probably don’t want to)

                              1. 2

                                Yeah, the difference between “come on, you could have seen that one coming” and “difficult to predict the impact” is how good you are at prediction (which itself is built on domain knowledge & familiarity with history).

                                Also, generally speaking, when we see discussion about these decisions from the time they were made, it turns out that often somebody DID see that one coming & was shouted down on the grounds of “being practical” and “meeting deadlines”. It’s fascinating to read TBL’s mailing list posts from the early 90s about how important it is to create a URI/URL distinction and a reliable distributed URI to URL resolution system in order to avoid domain squatting and broken links.

                          2. 2

                            Absolutely agree; it would seem that in some cases people have just kept the status quo without questioning why it exists and whether the original reasons are now obsolete. I would argue that this is more often than not because, in order to do so, the decision maker would have to think outside their narrow field of vision, which is easier said than done.

                            1. 1

                              Case in point: http://lists.busybox.net/pipermail/busybox/2010-December/074114.html

                              The /bin vs /usr/bin split (and all the others) is an artifact of this, a 1970s implementation detail that got carried forward for decades by bureaucrats who never question why they’re doing things.

                          3. 1

                            Yeah, a better phrasing might be “it sounded like a good idea at the time” (where “sounding like a good idea” includes both reasonably-predictable and unpredictable future problems). A lot of people think things sound good because they haven’t actually considered them, and instead are doing shallow pattern-matching on words. Some of these people are in charge of major technical decisions that directly impact millions of users.