1. 48

  2. 15

    Haha yesssss the historian club grows

    1. 7

      You would think this would be something really obvious you could just look up in The Big Book of Computer History, Chapter 3: Mid-Twentieth Century Hardware Choices. There have been lots of books about aspects of computing history, but why is there no definitive account?

      1. 9

        My guess is that a lot of it happened in conversations within companies like IBM, like the Brooks work mentioned, which isn’t easy for authors to access. A lot of it is oral history.

        As an analogy, the Computer History Museum did a 5 hour interview with Guido van Rossum, and I learned a bunch of stuff about Python’s choices (on top of the fact that I worked with Guido for a few years!).


        Also learned a bunch about the history of NumPy from a Lex Fridman interview with Travis Oliphant. That work was done 20+ years ago and is reverberating in machine learning today. (e.g. he purposely unified the APIs of 2 different libraries, one of which was “Numeric”)

        I’d say there’s the same issue with (for example) Google today. All its work on the browser (HTML5, video, etc.), Internet protocols, mobile operating systems, and Cloud/Kubernetes seems to be very influential outside of Google, and I think they will have inertia for decades. But a lot of the design choices were just random historical accidents inside Google.

        Here’s my contribution :) https://lobste.rs/s/yovh5e/they_re_rebuilding_death_star_complexity#c_mx7tff

        1. 5

          Just a note, HTML5 largely happened before Google hired Ian Hickson, so I wouldn’t call it Google’s work.

          1. 3

            Yes definitely, I meant more generally that a lot of it is locked up in private conversations that you need oral history to access (edited the comment slightly).

            e.g. WHATWG produces the HTML5 specifications, which in theory contain “all you need to know” to implement a browser. But if you want to know WHY a decision was made, the real explanation probably lies in some e-mails or meeting minutes between Mozilla, Apple, Google, Opera, etc. engineers / representatives. Back in the HTML5 days, for any given issue, I’d guess there were probably a dozen or fewer people who had input.

        2. 5

          Sadly, there’s hardly any account of how virtually any unit like the byte came to be. The byte is a unit akin to the foot or the pound. Most of these come to be as follows:

          1. People needed to divide a handful of things into fixed sizes that corresponded to some capability or physical characteristic (how many bits a unit processes, how long someone’s foot is), because it’s just the natural thing to do – you need to measure things somehow, if only to make sure all the computer parts fit together, or that the fur you bought fits your bed at home.
          2. Then at some point someone stepped in and chose a particular value for whatever reason made sense (this many bits needed to store a character; people keep bickering in the agora, so let’s just say a foot is whatever Alcibiades says the length of Socrates’ foot was), and that choice turned out to be particularly good or, more often than not, particularly enforceable (penalty of death or whatever – I’m not sure what mechanism was used for the byte?).
          3. And when a legal and/or international convention finally became necessary – one set by a third-party body and backed by law that someone else enforces, rather than by decree under penalty of death or whatever – it just codified whatever value was in common use at the time, because otherwise everyone would’ve ignored it.

          I bet that’s what happened with the byte, too. “How many bits an I/O unit can process” is a pretty natural hardware design convention, so a natural thing to break a handful of bits down into. Everyone went nuts for a while, but eight bits just happened to be a value that was big enough to be useful, small enough for hardware engineers not to lose their minds over timing and, let’s not forget, what IBM was using for the 360 – so the most influential value, and thus the one most people were familiar with, and the one that was easiest to steal – er, borrow – designs from.

          1. 6

            well said and I agree in general, but it is worth noting that there were plenty of computers, early on, which used non-eight-bit word sizes. there’s a reason all the IETF RFCs say “octet” - they needed to be explicit about the size. there is certainly no exact chain of causality to trace, but there is real history here that does seem like it ought to be better documented before too many primary sources die.

            (edit: momentarily forgot to use singular pronouns - see pluralpride.com for an explanation)

            1. 3

              there is real history here that does seem like it ought to be better documented before too many primary sources die

              Right, absolutely. Even something as trivial as “IBM is doing eight-bit bytes so we’re doing eight-bit bytes now, too” is still history, and still relevant in many ways. But I bet there were real technical reasons at IBM, and anyone else who independently decided to settle on eight-bit bytes.

              FWIW, the reasons why I suspect this was essentially a matter of convention are that a) there is really nothing special about eight (or even powers of two, in some contexts) as far as storage-related hardware is concerned, and b) weird bit counts were present all over early hardware, in all sorts of places, just because relays, lamps, and transistors were big, unreliable (by modern standards), expensive, and ran hot. Especially relays and lamps.

              You can see this today, too, in all sorts of places. Half the switching/routing ASICs I’ve ever used have CAM counts that aren’t powers of two – 1,000, 5,000, 10,000 entries, probably the kind of numbers that make sense in marketing/sales fliers, and there’s no problem there. There’s a Xilinx Artix-7 (random Google) that has 134,600 6-input LUTs, beautifully organised in 16,825 CLBs, which isn’t even an even number. Similarly, eight bits is nothing special. It could’ve easily been 9. Actually, the PDP-10, my favourite Turing tarpit, allowed for multiple byte sizes at runtime, and the C implementation specifically insisted that a byte was 9 bits. It was nuts. See for yourself, section 2.11.

              I suspect the real historical process was actually even more complex than that, and involved two separate changes, which may have been witnessed in parallel, which is another reason why we don’t have a good enough account:

              1. On the one hand, hardware designers gradually abandoned runtime-variable bytes, and I’d speculate that the lessons IBM learned (it got badly burned by the Stretch) played a big role here. Runtime-variable bytes went the way of the dodo like many other things at the time – one-plus-one addressing, tagged memory and so on. This further cemented the ability to use the byte as a meaningful unit of measurement, and allowed “byte” to start meaning “a fixed number of bits” that was useful to end-users, rather than “an atomic unit of bits” that was only useful to hardware designers. I.e. we stopped using “byte” to mean both “whatever number of bits the ALU takes in decimal mode” and “whatever number of bits the index field in a memory-indexed instruction means” without the two numbers necessarily being equal.

              2. On the other hand, enough big players settled for a specific number of bits (eight), for whatever reason. I suspect software pressure may have played a role here (ironically, via IBM again).

              #1 is kind of a prerequisite for #2, but the two processes were inevitably seen as occurring in parallel. The DECSYSTEM-20, the last of DEC’s 36-bit machines with byte pointer operations, was basically contemporary with the first VAX, which was very much an 8-bit-byte machine with barely any mainframe-era ballast. The DECSYSTEM-20 was the last of a 13-year-old product line, though. Machines from the two “eras” of usage of the term “byte” clearly coexisted (along with machines that used fixed-width bytes but disagreed on the size). That’s also how people first wound up in that odd boat, where “byte” meant a fixed amount of bits and was a useful unit of measurement for a computer’s memory, but nobody could agree on how many bits there were in a byte.

              That also makes “why are there eight bits in a byte” a somewhat ahistorical question – not wrong or anything, just failing to capture the entire story behind it. It’s kind of like asking “why were there seven Anglo-Saxon kingdoms in England”. There are doubtlessly very real reasons why there were seven of them – like, hell knows, the Mercians were notoriously fussy about their king, and Mercia could’ve easily ended up split into East Mercia and West Mercia and we could’ve had eight, but someone invited the guy who could’ve made it happen to a wedding and they killed everyone, and it never happened, and we got stuck with seven. But the real story is more about geography and the dynamics of power, and about how come there was more than one in the first place, than about the specific number.

              1. 1

                wow! I somehow managed to miss this very thorough reply until now. I really appreciate it <3


                  Oh, thanks! There’s a button in my brain labeled “Start History Rave” but the label is really tiny and sometimes people press it by mistake – I’m glad it went well this time :-D.

          2. 2

            It took me about an hour of googling to find the definitive account. See my reply elsewhere about “project Stretch”.

          3. 6

            Wikipedia says that the 8 bit byte was invented by IBM’s “project Stretch”. The project started in 1954, the architecture was designed during 1956-1958, and the IBM 7030 computer was delivered to the first customer in 1961. https://en.wikipedia.org/wiki/IBM_7030_Stretch

            The 1962 book “Planning a Computer System: Project Stretch” contains the earliest use of the word “byte” I can find on google ngram. This book is a goldmine for people interested in the origins of computer architecture. https://archive.org/details/planningcomputer0000inte/

            “Planning a Computer System” defines the word “byte” more loosely than we do today. Generally a byte is a sequence of bits large enough to hold a character. Quote:

            The natural length of bytes varies. Decimal digits are most economically represented in a 4-bit code. The commonly used 6-bit alphanumeric codes are sufficient when decimal digits, a single-case alphabet, and a few special characters are to be represented. If this list is extended to a two-case alphabet and many more special characters, a 7- or 8-bit code becomes desirable. A 3-bit octal code or a 5-bit alphabetic code is occasionally useful. There would be little use for bytes larger than 8 bits. Even with the common 12-bit code for punch cards, the first processing step is translation to a more compact code by table-lookup.

            Page 65:

            The 8-bit byte was chosen for the following reasons:

            1. Its full capacity of 256 characters was considered sufficient for the great majority of applications.
            2. Within the limits of this capacity, a single character is represented by a single byte, so that the length of any particular record is not dependent on the coincidence of characters in that record.
            3. 8-bit bytes are reasonably economical of storage space.
            4. For purely numerical work, a decimal digit can be represented by only 4 bits, and two such 4-bit bytes can be packed in an 8-bit byte. …
            5. Byte sizes of 4 and 8 bits, being powers of 2, permit the computer designer to take advantage of powerful features of binary addressing and indexing to the bit level.
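
            Point 4 above – packing two 4-bit decimal digits into one 8-bit byte – is easy to sketch. A minimal Python illustration (the function names are mine, not from the book):

```python
def pack_bcd(hi, lo):
    # Pack two decimal digits (0-9) into one 8-bit byte, 4 bits each,
    # per the Stretch rationale quoted above.
    assert 0 <= hi <= 9 and 0 <= lo <= 9
    return (hi << 4) | lo

def unpack_bcd(byte):
    # Recover the two packed decimal digits.
    return (byte >> 4) & 0xF, byte & 0xF

print(hex(pack_bcd(4, 2)))  # 0x42: the digits "4" and "2" in one byte
```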
            1. 3

              Beware partial quotes :-). The IBM Stretch is one of those peculiar machines that had variable-length bytes. That’s why they used the “looser” definition of byte. The quote on page 65 refers specifically to the byte length used for the character set and character-based I/O.

              The mind-boggling complexity of every single aspect of the Stretch is one of the reasons why it kind of flopped. See this gem further on, on page 78 (I have the old, 1962 edition – page numbers may vary, this is section 7.4):

              The natural length of bytes varies. Decimal digits are most economically represented in a 4-bit code. The commonly used 6-bit alphanumeric codes are sufficient when decimal digits, a single-case alphabet, and a few special characters are to be represented. If this list is extended to a two-case alphabet and many more special characters, a 7- or 8-bit code becomes desirable (see Chap. 6). A 3-bit octal code or a 5-bit alphabetic code is occasionally useful. There would be little use for bytes larger than 8 bits. Even with the common 12-bit code for punched cards, the first processing step is translation to a more compact code by table look-up, and during this process each column is treated as a 12-bit binary field. There would be no direct processing of longer fields in the 12-bit code […] The 7030 is unique in that the byte size is completely variable from 1 to 8 bits, as specified with each VFL instruction. Bytes may also overlap word boundaries.

              The Stretch did not use a consistent byte size for its modules and was specifically designed to accommodate multiple byte sizes. It predates “fixed” byte-length designs, although it did happen to use 8-bit bytes for some of its units (character and tape I/O, confusingly enough the sign byte – not a typo – in binary operation mode etc.).

              1. 1

                The Stretch was a flop, but it was a big influence on the design of the IBM 360, which was a huge success and hugely influential. The “8 bit byte” design decision that I cited was also the rationale for why the IBM 360 had 8 bit bytes.

                1. 1

                  The “8 bit byte” design decision that I cited was also the rationale for why the IBM 360 had 8 bit bytes.

                  Definitely, I just wanted to point out that the Stretch itself did not use 8-bit bytes, and that at the time “byte” did not (yet!) refer to a particular fixed amount of bits on a given platform.

            2. 5

              Some random additions:

              • People used to represent binary numbers as octal instead of hex; I guess because it didn’t involve any funny extra digits, and because an 18- or 36-bit word size divided better into groups of 3. But octal was still the default on the PDP-11, which was 16-bit. You can still see remnants of octal in C’s 0-prefixed integer syntax and in POSIX file permissions.

              • The very-influential PDP-11 was byte-addressable (“The PDP-11’s 16-bit addresses can address 64KB” – Wikipedia) but apparently still conceptually 16-bit-word based (“the amount of memory attached to a PDP-11 is always stated as a number of words”).

              • My best guess about “on x86 the word size is 16 bits, even though the registers are 64 bits” is that “word” here refers to something defined in the original 8086 architecture that can’t change, like something particular instructions operate on. But in general, “word” means the CPU’s register size. (I wanted to say “data bus size”, but IIRC some CPUs had narrower data buses to save on pins.)
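
              As a small illustration of the octal remnants in the first bullet: each octal digit encodes exactly three bits, which is why POSIX rwx permission triplets and 18-/36-bit words carve up so neatly (Python’s 0o prefix standing in for C’s bare leading 0):

```python
# Each octal digit is exactly 3 bits, so POSIX rwx triplets map
# one-to-one onto octal digits (0o755 = rwxr-xr-x).
perms = 0o755
owner, group, other = (perms >> 6) & 0o7, (perms >> 3) & 0o7, perms & 0o7
print(owner, group, other)  # 7 5 5

# 18-bit words divide evenly into octal digits but not hex ones
# (36 happens to divide into both):
print(36 % 3, 36 % 4)  # 0 0
print(18 % 3, 18 % 4)  # 0 2
```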

              I came of age just early enough to think of bytes as natural and obvious — the first computers I wrote assembly or C on were 6502s and Z-80s. The first computers I used were PDP-11s, but that was only in BASIC, which didn’t generally expose bytes (I don’t think RSTS BASIC had PEEK or POKE: not advisable on a timeshared machine without an MMU!)

              1. 3

                Except for the PDP-11, all of the DEC minicomputer word sizes were a multiple of 6. They were 12, 18, 24 and 36 bit machines.

                I looked at a PDP-8 manual, and it used 6 bit character codes. The card reader, line printer and teletype used different character sets, but all were 6 bits. This meant you could pack 2, 3, 4 or 6 characters into a machine word, depending on the model.

                Octal was the natural encoding for reading memory dumps. 2 octal digits for 1 character.

                Packing multiple characters into a single machine word meant that string manipulation was a PITA. 6 bit characters were also not compatible with the increasingly popular ASCII. Hence the PDP-11, which was byte addressable, with 8-bit bytes. Simplified text processing was a selling point.

                When the PDP-11 was designed at DEC, octal must have been entrenched in the culture, because the instruction set was designed to be human readable when looking at an octal memory dump. There are 8 GP registers and 8 addressing modes: both encoded as 1 octal digit in a memory dump, etc.
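
                The octal-friendly field layout can be seen by slicing up an instruction word. A Python sketch – the example word is MOV #n, R0, whose encoding I’m quoting from memory, so treat the constant as illustrative:

```python
instr = 0o012700  # MOV #n, R0: opcode 01, source 27 (immediate), dest 00
opcode   = (instr >> 12) & 0o17  # 01 = MOV
src_mode = (instr >> 9) & 0o7    # 2 = autoincrement
src_reg  = (instr >> 6) & 0o7    # 7 = PC; mode 2 + PC = immediate operand
dst_mode = (instr >> 3) & 0o7    # 0 = register direct
dst_reg  = instr & 0o7           # 0 = R0
# Each 3-bit field is exactly one octal digit of the dump.
print(src_mode, src_reg, dst_mode, dst_reg)  # 2 7 0 0
```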

                1. 1

                  The same was true of one of the first “computers” I used, an early programmable calculator by Litton that was obsolete by the time I encountered it in middle school. It was the size of a desktop PC, had a Nixie tube display, and read programs from punch cards (with perforated holes like some voting cards, so you didn’t need a card punch.) It was slightly more powerful than my dad’s HP-25 pocket calculator, but not much.

                  Anyway, its instructions were all 9 bits [holes], described as 3 octal digits, and in many instructions one of the digits was the number of a register.

                  God help me, I still remember that the RTS (return from subroutine) instruction was 740.

                2. 2

                  Yeah, the 8086 had 16-bit registers, a 16-bit bus, but was uniformly byte-addressable (addresses are in bytes, and unaligned word access isn’t an error, it just wastes time while the CPU does two different word accesses for you and puts things together appropriately). There were two different kinds of loads and stores: “byte” and “word”. When the 386 came along and added 32-bit we got “double word” (dword) and then later “quad word” (qword) rather than redefining “word” in an incompatible way.
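
                  That “two accesses glued together” behaviour can be modelled in a few lines (Python; little-endian byte order, function name mine):

```python
def load_word_le(mem, addr):
    # 16-bit little-endian "word" load from byte-addressable memory.
    # Works at any address, aligned or not -- the 8086 just paid extra
    # bus cycles for the unaligned case.
    return mem[addr] | (mem[addr + 1] << 8)

mem = bytes([0x34, 0x12, 0xCD, 0xAB])
print(hex(load_word_le(mem, 0)))  # 0x1234 (aligned)
print(hex(load_word_le(mem, 1)))  # 0xcd12 (unaligned, still works)
```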

                  1. 1

                    But in general, “word” means the CPU’s register size. (I wanted to say “data bus size”, but IIRC some CPUs had narrower data buses to save on pins.)

                    True, and a famous example of a machine with a data bus narrower than its word size is the Intel 8088, used on the original IBM PC: It had sixteen-bit words and an eight-bit data bus.

                  2. 3

                    Computerphile has a great account by Professor Brailsford of why there are 8-bit bytes:


                    1. 2

                      Also, I wonder if BCD is where the term “nibble” for 4 bits comes from – in the context of BCD, you end up referring to half bytes a lot (because every digit is 4 bits). So it makes sense to have a word for “4 bits”, and people called 4 bits a nibble. Today “nibble” feels to me like an archaic term, though – I’ve definitely never used it except as a fun fact (it’s such a fun word!).

                      While the term might not be as widely used as it once was, the half-byte is still a commonly used unit of information! Any time we print a number or data as hex, each character represents a nibble.
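
                      The correspondence is exact: every hex digit is one nibble, so splitting a byte into its nibbles and formatting it in hex produce the same digits:

```python
value = 0xAB
hi, lo = (value >> 4) & 0xF, value & 0xF  # the two nibbles
# Each nibble formats as exactly one hex digit.
print(f"{hi:x}", f"{lo:x}", f"{value:02x}")  # a b ab
```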

                        1. 1

                          I think I saw it spelled with an i instead of a y for the first time less than a decade ago. My assumption was that it was written by someone who had heard nybble but not seen it written down. I’d be curious if anyone has a source for the etymology, the only references that I found said that both spellings were in use, but were written comparatively recently.

                        2. 1

                          yea I think that’s the most important connection to make, 0x0 through 0xf inclusive is a nibble, which is cool. 2^4 = 16, all a bit easier to grok in your head (if you’re not a dyed in the wool CS person) than 2^8 = 256.

                        3. 2

                          My priors remain unchanged that it is because 8 is a power of 2 so that bit ops built into CPU microcode will be efficient. At least, that is probably why everyone converged on 8 bits.

                          1. 1

                            the byte size is the smallest unit you can address.

                            This is only true on some hardware, although compatibility with existing code written for byte-addressable hardware will cause C compilers, for example, to emit enough shift-and-mask code to make word-addressable systems look like they’re byte-addressable to software.
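
                            A rough sketch of the shift-and-mask sequence such a compiler has to emit (Python model of a 32-bit word-addressable memory; the little-endian byte order within the word is an assumption, not a claim about any particular machine):

```python
WORD_BITS = 32
BYTES_PER_WORD = WORD_BITS // 8

def load_byte(words, byte_addr):
    # Fetch the containing word, then shift and mask out the requested
    # byte -- roughly what a C compiler must emit to fake byte
    # addressing on a word-addressable target.
    word = words[byte_addr // BYTES_PER_WORD]
    shift = (byte_addr % BYTES_PER_WORD) * 8
    return (word >> shift) & 0xFF

words = [0x44332211]
print([hex(load_byte(words, a)) for a in range(4)])
```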

                            Apparently on x86 the word size is 16 bits, even though the registers are 64 bits.

                            Thou hast defeated me. I am well and truly confused by what she could possibly mean here.

                            Word size is register size is pointer size on modern general-purpose hardware.

                            1. 11

                              I also thought that word size = register size = pointer size until today, but I looked at the Intel® 64 and IA-32 Architectures Software Developer Manuals (section 4.1, “FUNDAMENTAL DATA TYPES”) and it says pretty clearly that a word is 16 bits. So now I’m not sure what to think.

                              1. 6

                                I think it is just a historical accident. They used the word “word” in various places in the 16-bit days, then the 32-bit (and later, 64-bit) extensions had to maintain compatibility, and that included those old names and documentation.

                                1. 3

                                  Yeah I think that’s likely … that’s why we’re all confused about what a “word” is – because it doesn’t really matter for modern computing :-)

                                  I’m pretty sure that “word” means nothing in C / C++ / Rust as well. The language specs don’t need that concept. They do need to know about bytes and pointer alignment, etc. but not “words”.

                                  1. 1

                                    Also, remember that for some time after the first 32-bit x86 chip, the OSes most widely used (i.e. MSDOS and Windows) continued to run in 16-bit mode. It wasn’t until 1995 that a non-16-bit environment became commonplace. By that time, people had been using “word” to mean 16-bit values for so long that keeping it as a fixed 16-bit value was actually the path of least resistance.

                              2. 8

                                “Word” is casually redefined by every single CPU architecture out there to be “whatever size we feel like for the context of this CPU”. For x86_64 this means “registers were 16 bits on the 8086 so a “word” is 16 bits forevermore.” For ARM this means “registers were 32 bits on the first ARM chip so a “word” is 32 bits forevermore”. For Itanium, if I’m reading the right source, this means “registers are always 64 bits so a word is 64 bits, except registers have an extra flag for invalid values so they’re really more like 65 bits.” For RISC-V this means “a word is 32 bits because we say so”. On the PDP-10, of course, a “word” was 36 bits, and registers were 36 bits wide, but a pointer was 18 bits.

                                It’s a pretty dumb bit of legacy terminology, and I wish hardware people would stop using it.

                                1. 4

                                  the byte size is the smallest unit you can address.

                                  This is something that’s kind of correctly quoted, only without context and really outdated, hence the trouble you’ve mentioned :-D.

                                  First, for the modern use: a byte is just a group of bits treated as a unit. That’s the IEC definition, which doesn’t hinge on any practical implementation, because the IEC learned that’s a really bad idea a long, long time ago, and we’re still trying to disentangle some of the mess they made before they learned that.

                                  As for the addressing part…

                                  The first instance of use of the word “byte” as we know it today (i.e. which evolved into what we now call a byte) comes from IBM. It referred specifically to the smallest unit that the I/O unit would process. The shift matrix of the IBM 7030 (described in section 7 here) operated in units smaller than a word; the smallest such unit had 8 bits. Anything less than 8 bits would be padded. A memo (Memo 45) from the same project kind of clarifies that:

                                  The maximum input-output byte size for serial operation will now be 8 bits, not counting any error detection and correction bits. Thus, the Exchange will operate on an 8-bit byte basis, and any input-output units with less than 8 bits per byte will leave the remaining bits blank. The resultant gaps can be edited out later by programming.

                                  (Emphasis mine).

                                  This is how the “smallest addressable unit” came to be. A few years later, when Brooks (that’s the same Brooks as in The Mythical Man-Month), Blaauw and Buchholz published “Processing Data in Bits and Pieces” – the first public use of the term, which evolved into the one we use today – that was the “internal” meaning it had at IBM.

                                  Reading that paper is extremely confusing to a modern audience (I can’t link it here since it’s not publicly available anywhere but that site which carries scientific papers may or may not have it, wink wink):

                                  A data-handling unit is described which permits binary or decimal arithmetic to be performed on data fields of any length from one to sixty-four bits. Within the field, character structure can be further specified: these processing entities, called bytes, may be from one to eight bits long. Fields may be stored with or without algebraic sign.

                                  (Emphasis mine). Some of the sub-modules of the data-handling unit work with eight-bit bytes. Others, like the arithmetic unit, use eight-bit bytes for binary arithmetic and four-bit bytes for decimal-coded arithmetic. There’s actually a figure in that paper that literally shows a word composed of 10 bits that aren’t addressed as a unit (so they aren’t part of a byte), seven one-bit bytes, four 4-bit bytes, and five 6-bit bytes.

                                  A few years later, the same authors published a book that was really well-received (Planning a Computer System - Project Stretch), which talked about the IBM Stretch at length. This was the definition of byte for the next thirty years or so. It also spelled out the definition of word:

                                  Byte denotes a group of bits used to encode a character, or the number of bits transmitted in parallel to and from input-output units. A term other than character is used here because a given character may be represented in different applications by more than one code, and different codes may use different numbers of bits (i.e., different byte sizes). In input-output transmission the grouping of bits may be completely arbitrary and have no relation to actual characters.

                                  For practical reasons, the number of bits transmitted in parallel to and from I/O units was also the smallest addressable size on many computers of that era. Addressable units smaller than what the I/O units could handle were kind of pointless (you’d just waste extra hardware to zero out unused bytes, otherwise things like the zero flag would be ambiguous), and addressable units larger than what the I/O units could handle were efficient for some operations but not all.

                                  So in practice, lots of hardware designers in that age just stuck to byte-addressable memory. Even systems with word addressing, like the PDP-6, had special instructions that tl;dr allowed you to address variable-length bytes, although not as efficiently as a word.
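
                                  A rough model of what those byte instructions did (PDP-10-style LDB, where – if memory serves – the byte pointer carries a position P, counted in bits from the right, and a size S; simplified, so treat it as a sketch):

```python
def ldb(word, p, s):
    # Load a variable-width byte: s bits wide, p bits from the right
    # of the word -- the byte size is chosen per access, not fixed.
    return (word >> p) & ((1 << s) - 1)

word = 0o123456  # a (truncated) word holding three 6-bit bytes
print([oct(ldb(word, p, 6)) for p in (12, 6, 0)])  # 0o12, 0o34, 0o56
```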

                                  Consequently, lots of reference manuals of that era referred to a byte as either “whatever it takes to encode a character” or “the smallest unit we can address”. Both were effectively referencing Brooks’ definition, which was presumably sufficiently well-known that it didn’t need a separate reminder.

                                  For extra shits and giggles, that is also how we ended up with a word. The Project Stretch book also said…

                                  A word consists of the number of data bits transmitted in parallel from or to memory in one memory cycle. Word size is thus defined as a structural property of the memory.

                                  That’s why we can have registers and words of different sizes. In the beautiful golden age of CISC there was no problem having 32-bit registers for 16-bit words – but it took (at least) two cycles to load one.

                                  At some point, though, that… just kind of lost significance, just as it happened with the byte. A byte, or a word, was whatever the reference manual said it was.

                                  1. 2

                                    I prefer the definition of a byte as the largest unit for which the in-memory representation is not observable. You can’t tell if bytes are stored most or least significant bit first. You can tell the order of bytes within a word.
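
                                    That observability claim is easy to check with Python’s struct: the byte order within a word shows up in memory, while software has no analogous way to observe bit order within a byte:

```python
import struct

word = 0x01020304
le = struct.pack("<I", word)  # little-endian in-memory layout
be = struct.pack(">I", word)  # big-endian in-memory layout
# The byte order within the word is directly observable:
print(le, be)  # b'\x04\x03\x02\x01' b'\x01\x02\x03\x04'
```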

                                  2. 1

                                    The confusion in the intel word != register size is a by-product of x86 using the word “word” or “byte” in the instruction names. Given the requirement for compatibility, the instruction names had to stay the same when the logical word size increased, and hence double word and quad word were invented.

                                    Learning from such mistakes other ISAs use instructions including the bit width of the operation.

                                    Before we all sh*t on intel again, C has the same problem with a bunch of its types – and similarly, newer languages use bit widths in the type names.