1. 32

  2. 16

    There are so many other key things for the VAX that it’s a bit sad that this article skipped straight to VMS and NT. A few off the top of my head:

    • I don’t think VAX was the first machine to support paging but it was the machine that defined the structure of modern virtual memory systems.
    • The 512-byte page size on VAX is the reason that we ended up with 512-byte block sizes on disk for so long (it was a convenient size to move to and from memory as a single chunk on a VAX; 4096 bytes is more convenient for any post-1990 system, but the transition to 4096-byte block sizes for disks is still not finished).
    • The userspace-at-the-bottom-kernel-at-the-top-because-signed-numbers-are-like-hard-and-stuff convention that all modern operating systems use came from the VAX.
    • The VAX was the first system to segregate kernel and userspace page tables. It actually supported a form of the nested paging that modern hypervisor extensions provide: the userspace page-table walk looked for pages in the kernel’s virtual address space and so had to refer to the kernel page table (or TLB) to find the address of userspace page table pages.
    • The original BSD virtual memory subsystem was written for the VAX. The VAX versions of the kernel binary with virtual memory were called vmunix (the kernel without virtual memory support was called unix). This lives on today in Linux’s vmlinux / vmlinuz (compressed version) binary names.
    • 4.4BSD replaced the VM subsystem with the one from Mach and modern BSDs (including XNU) use a direct descendant of this code. The VM subsystem for Mach was written for the VAX port of Mach.
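
    To make the nested lookup from the page-table bullet concrete, here is a rough Python sketch of a VAX-style translation for a P0 (userspace) address. The field widths follow the architecture as I remember it (2-bit region, 21-bit page number, 9-bit offset, 4-byte PTEs). The `ToyVax` class, the `demo` function, and every address in it are made up for illustration, and length-register checks, protection bits, P1 space, and the TLB are all left out.

```python
PAGE_SHIFT = 9                      # 512-byte pages
PAGE_SIZE = 1 << PAGE_SHIFT
VPN_MASK = (1 << 21) - 1            # 21-bit virtual page number
PFN_MASK = (1 << 21) - 1            # 21-bit page frame number in a PTE
PTE_VALID = 1 << 31                 # valid bit in a PTE


def split(va):
    """Split a 32-bit virtual address into (region, vpn, offset)."""
    return (va >> 30) & 0x3, (va >> PAGE_SHIFT) & VPN_MASK, va & (PAGE_SIZE - 1)


class ToyVax:
    """Hypothetical toy model: only P0 (region 0) and S0 (region 2)."""

    def __init__(self, sbr, p0br):
        self.sbr = sbr      # physical base of the system page table
        self.p0br = p0br    # system *virtual* base of the P0 page table
        self.phys = {}      # sparse physical memory: address -> 32-bit word
        self.walk = []      # physical PTE addresses read, for illustration

    def read_pte(self, pa):
        self.walk.append(pa)
        pte = self.phys.get(pa, 0)
        if not pte & PTE_VALID:
            raise LookupError(f"page fault: invalid PTE at {pa:#x}")
        return pte

    def translate_system(self, va):
        # System space: the page table is found via a physical base address.
        _, vpn, off = split(va)
        pte = self.read_pte(self.sbr + 4 * vpn)
        return ((pte & PFN_MASK) << PAGE_SHIFT) | off

    def translate(self, va):
        region, vpn, off = split(va)
        if region == 2:
            return self.translate_system(va)
        if region != 0:
            raise NotImplementedError("only P0 and S0 are modelled")
        # The process PTE lives at a system virtual address, so finding it
        # needs a system-space translation first (the nested step).
        pte_pa = self.translate_system(self.p0br + 4 * vpn)
        pte = self.read_pte(pte_pa)
        return ((pte & PFN_MASK) << PAGE_SHIFT) | off


def demo():
    m = ToyVax(sbr=0x1000, p0br=0x8000_0000)
    m.phys[0x1000] = PTE_VALID | 0x20                     # system page 0 -> frame 0x20
    m.phys[(0x20 << PAGE_SHIFT) + 4 * 3] = PTE_VALID | 0x55  # P0 page 3 -> frame 0x55
    va = (3 << PAGE_SHIFT) | 0x1A                         # P0 page 3, offset 0x1A
    return m.translate(va), m.walk
```

    `demo()` returns the physical address plus the two PTE locations read on the way: one in the system page table to locate the process page-table page, then the process PTE itself.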

    The VAX is one of the most influential architectures of all time.

    1. 4

      I don’t think VAX was the first machine to support paging but it was the machine that defined the structure of modern virtual memory systems.

      I’d say it would either be PDP-6 with the pager module or 360/67 that’d be the influential early option, though I don’t know if they’d be the first.

      Also, x86 took its ring model almost directly from VAX. VMS actually uses all 4 rings, but most x86 stuff didn’t.

      1. 6

        Also, x86 took its ring model almost directly from VAX. VMS actually uses all 4 rings, but most x86 stuff didn’t.

        Apparently, Intel picked 4 rings for the i386 because DEC said they absolutely needed four rings for VMS and wouldn’t port it to x86 without four. DEC then ported it to Alpha (with two rings). VMS was eventually ported to x86-64, where it uses only rings 0 and 3.

      2. 2

        This lives on today in Linux’s vmlinux

        It’s somewhat amusing that FreeBSD kernels are called kernel while the younger Linux project has adopted that funky naming. What is the story behind that?

        1. 2

          I don’t know for sure, but I suspect this came out of the AT&T lawsuit. Everything in *BSD that used the name ‘unix’ had to be renamed. Having a kernel binary called vmunix would probably have been problematic and if you’re going to rename it then kernel is a pretty obvious name for the kernel binary. FreeBSD would have picked up this name. Linux began just before the lawsuit was settled (in large part because BSD wasn’t available on i386 hardware as a result of the lawsuit) and so presumably copied whatever *NIX variant Linus was most familiar with.

        2. 2

          Oh my word! That’s fascinating – thanks for that.

          I did not know most of that, I must confess. I was commenting to someone over on HN who admitted that they did not get the importance or significance of the hardware. I was trying to give something between an “ELI5” and “Explain it like I’m a CS undergrad” high-level view, not so much of the architectural details of the particular machines, but what later hardware and software designs they influenced.

        3. 13

          Nuclear take: I think it’s interesting so many “computer engineering/enthusiast” types (for lack of a better term) tended to gravitate towards DEC systems when their design is full of bonkers mistakes no EE should repeat: PDP-10’s recursive indirect addressing, PDP-11’s segmentation and PC in memory (ok, DSPs do this, but that’s an acceptable optimization for a DSP, not a general-purpose CPU), the absurd CISCiness of VAX, etc. (Alpha was pretty reasonable.) I say this as someone who likes VMS.

          I think 360/370 is much better designed, and the influence on modern CPU design is more obvious (lots of GPRs, clean instruction formats, pipelining, virtualization, etc.). Plus they had the also-influential ACS/Stretch to draw from. I can’t say the same for many DEC designs. It’s amusing Unix types are so obsessed with VAX when Unix would feel far more at home on 370.

          1. 5

            I suspect a variety of factors are to blame:

            IBM in the ’70s and ’80s had the reputation that Microsoft had in the ’90s and 2000s and Google, Amazon, and Facebook are competing for now: the evil empire monopolist that the rest of the industry stands against. There’s a story around the founding of Sun that they got a visit a few months in from the IBM legal department inviting them to sign a patent cross-licensing agreement and showing six patents that Sun might be infringing. Scott McNealy sat them down and demonstrated prior art for some and that Sun wasn’t infringing any of the ones that might be valid. The IBM lawyers weren’t fazed by this and said ‘you might not be infringing these, would you like us to find some that you are?’ Sun signed the cross-licensing agreement. This kind of thing is why IBM’s legal department was referred to as the Nazgul. To add to this, IBM was famously business-facing. They required programmers to wear suits and ties. The hacker ‘uniform’ of jeans and t-shirts was a push-back against companies like IBM in general and IBM in particular, and hacker culture was part of a counter-culture rebellion where IBM was the archetype of the mainstream against which they were rebelling.

            The DEC machines were closely linked to the development of UNIX. IBM’s biggest contribution with the 360 was the idea that software written for one computer could run on another. This meant that their customers were able to build up a large amount of legacy software by the ’80s so IBM had no incentive to encourage people to write new systems software for their machines: quite the reverse, they wanted you locked in. DEC encouraged this kind of experimentation. Universities may have had an IBM mainframe for the admin department to use but the computer science departments and research groups bought DEC (and other small-vendor) machines to tinker with.

            Multics was developed for the GE-645, which had all manner of interesting features (including a segmentation model that allowed a single-level store and no distinction between shared libraries and processes), whereas Unics was written for the tiny PDP in the corner and it grew with that line.

            A lot of other big-iron systems suffered from the rise of UNIX. I’m particularly sad about the Burroughs Large Systems architecture. The B5000 was released at almost the same time as the 360 and had an OS written in a high-level language (Algol-60), with hardware-assisted garbage collection, and provided a fully memory-safe (and mostly type-safe) environment with hardware enforcement. Most modern language VMs (JVM, CLR, and so on) are attempts to emulate something close to the B5000 on a computer that exposes an abstract machine that is basically a virtualised PDP-11. I wish CPU vendors would get the hint: if the first thing people do when they get your CPU is use it to emulate one with a completely different abstract machine, you’ve done something wrong.

            Oh, and before you criticise the VAX for being too CISCy (and, yes, evaluate polynomial probably doesn’t need to be a single instruction), remember that the descendants of the 360 have instructions for converting strings between EBCDIC and Unicode.

            1. 2

              I think you exaggerate about IBM. There is a general 1:1 table-based translate which can do EBCDIC to ASCII or Unicode, and there are different instructions for converting between the different Unicode flavours. It can’t do it in one instruction, that I know of.

              But anyway, those and VAX POLY aren’t the problem. You can happily use microcode or just trap and emulate and no one will care.

              The problem with the VAX is that the extremely common ADDL3 instruction (to name just one) can vary in length from 4 to 19 bytes and cause half a dozen memory references / cache misses / page faults.
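
              The 4-to-19 range falls out of simple arithmetic on operand specifier sizes. Here is a back-of-the-envelope sketch in Python. The table is a simplified subset of the longword addressing modes, written from memory, and the names `MODES`, `NOT_INDEXABLE`, and `addl3_cost` are mine. It also ignores the rule that a destination can't be a literal or an immediate, which doesn't change either extreme.

```python
# mode name -> (specifier bytes, data memory references for the operand)
MODES = {
    "short literal": (1, 0),
    "register": (1, 0),
    "register deferred": (1, 1),
    "autoincrement": (1, 1),
    "autodecrement": (1, 1),
    "autoincrement deferred": (1, 2),
    "immediate": (5, 0),                   # specifier byte + 4-byte constant
    "absolute": (5, 1),                    # specifier byte + 4-byte address
    "byte displacement": (2, 1),
    "word displacement": (3, 1),
    "long displacement": (5, 1),
    "long displacement deferred": (5, 2),  # fetch the pointer, then the operand
}

# Modes that cannot take an index-register prefix byte.
NOT_INDEXABLE = {"short literal", "register", "immediate"}


def addl3_cost(operands):
    """operands: three (mode, indexed) pairs. Returns (bytes, data_refs)."""
    if len(operands) != 3:
        raise ValueError("ADDL3 takes exactly three operands")
    length, refs = 1, 0                    # one opcode byte
    for mode, indexed in operands:
        if indexed and mode in NOT_INDEXABLE:
            raise ValueError(f"{mode} cannot be indexed")
        size, data_refs = MODES[mode]
        length += size + (1 if indexed else 0)
        refs += data_refs
    return length, refs


shortest = addl3_cost([("register", False)] * 3)
longest = addl3_cost([("long displacement deferred", True)] * 3)
```

              With all three operands in registers the instruction is 4 bytes with no data references. With three indexed long-displacement-deferred operands it is 19 bytes and touches memory six times, before counting the instruction fetch itself.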

              x86, for all its ugliness, never uses more than one memory address per instruction for common instructions e.g. code generated from C. Same for S/360. Both have string instructions, but those are not a big deal, and relatively uncommon.

            2. 3

              That’s an interesting observation.

              I think there would be a lot to learn from comparing the two engineering cultures. I would specifically include the management style and the kind of money each company was dealing with. When IBM was developing ground-breaking products like the Stretch and the Selectric typewriters, half of the company’s income came from incredibly lucrative military contracts.

              The kinds of pressures on an engineering team and the corner/cost-cutting they may take is dramatically different when they are awash with money.

              1. 3

                To elaborate, I feel that DEC had more influence on product segments than on engineering. The PDP-8 and then PDP-11 redefined minicomputers, but the PDP-8’s influence was short-lived and the PDP-11’s influence… would have rather not been felt (i.e. x86).

            3. 4

              In addition to those mentioned, the TI MSP430 and Hitachi SuperH are clearly heavily PDP-11 influenced, with hacks (primarily reducing the number of addressing modes, especially on the dst) to extend them from 8 to 16 registers while sticking to a 2-address format in a 16-bit instruction. The SuperH also extends the PDP-11 to 32 bits.

              1. 3

                The SuperH felt more like the 68000 to me. But the 68000 was heavily influenced by the PDP-11. So here we are ;-)

                1. 4

                  I can’t see that.

                  The defining characteristic of the 68k/ColdFire compared to the PDP-11 is that it keeps the 3 bits per operand for register and 3 bits for addressing mode, but some instructions/operands imply a reference to a data register and some to an address register. This is how they double the register set from 8 to 16.

                  In the SuperH, every instruction that has an explicit register operand can use all 16 registers for that operand. They get the encoding space for this by confining arithmetic to simple register-to-register (like a RISC), and putting all the indirect, autoincrement, etc. addressing modes only on the MOV instruction.

                  68k and SuperH depart from the PDP-11 starting point in quite different directions.
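
                  To show where the bits go, here is a small Python sketch that pulls the fields out of a PDP-11 double-operand word and a SuperH register-to-register word. The layouts are as I remember them (PDP-11: 4-bit opcode plus two 3+3-bit mode/register operands; SuperH: two 4-bit register fields and no mode bits). The function names and dictionary keys are mine, and only these two formats are covered.

```python
def decode_pdp11_double(word):
    """PDP-11 double-operand format: oooo sss SSS ddd DDD (16 bits)."""
    return {
        "opcode": (word >> 12) & 0xF,
        "src_mode": (word >> 9) & 0x7,   # 3 bits of addressing mode
        "src_reg": (word >> 6) & 0x7,    # 3 bits -> only 8 registers
        "dst_mode": (word >> 3) & 0x7,
        "dst_reg": word & 0x7,
    }


def decode_sh_reg_reg(word):
    """SuperH register-to-register format: MMMM nnnn mmmm ffff (16 bits)."""
    return {
        "major": (word >> 12) & 0xF,
        "rn": (word >> 8) & 0xF,         # 4 bits -> all 16 registers
        "rm": (word >> 4) & 0xF,
        "minor": word & 0xF,             # sub-opcode, no addressing-mode bits
    }


pdp11_mov_r1_r2 = decode_pdp11_double(0o010102)   # MOV R1, R2
sh_add_r1_r2 = decode_sh_reg_reg(0x321C)          # ADD R1, R2
```

                  The six mode bits the PDP-11 spends per instruction are what the SuperH trades for two extra register-number bits and a sub-opcode.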