1. 15
    1. 7

      Original author here.

      Back in September, I saw a recommendation for “Showstopper!: The Breakneck Race to Create Windows NT and the Next Generation at Microsoft” in some comment thread on HN. Learning about the history of Windows NT sounded interesting, in part because I lived through it but was too young to understand what was going on, so I bought the book right away and have been slowly reading through it over the last three months.

      The story in the book seemed so exciting (and messed up) that I ended up writing a “book review”, although this article is mostly my commentary on various topics that caught my attention. Beware that I didn’t take detailed notes as I was reading the book so the specifics might be a little wrong in places…

      Anyhow. Not too long after that, I saw the interview with Dave Cutler show up on the front page here (https://lobste.rs/s/jerg92/windows_longhorn_cairo_more_dave_cutler) so I thought that this article might be of interest to you as well. But… actually: some of you usually provide some super-insightful comments about “the good old days” and about systems internals, so I’m curious to see what other cool details you might add!

      1. 6

        I am about the same age, but was excited by NT 4 (which I had a copy of! My father got a free one at a launch event and I ran it as my OS on the first new PC I ever owned!).

        Yes, DOS and Windows were the most popular choices for personal computers, but there were other contenders such as OS/2, BeOS, QNX… as well as the myriad different commercial Unix derivatives

        BeOS, in particular, highlights that you could get investment if your pitch was ‘we are going to sell an OS’. They also sold hardware, and ‘we are going to sell a new kind of computer with an incompatible software stack’ is an even harder sell. I mean, that’s what my current employer’s pitch is as well, but we’re targeting the embedded space and aiming for a high level of source compatibility. Expecting an entire ecosystem of desktop application software to materialise around your new platform was totally plausible in the ’80s and mid ’90s. Less so by the late ’90s, though MS was absolutely terrified that web apps on Netscape Navigator would make this realistic again.

        Contrast this to the open-source BSDs and Linux, which were already a reality by NT’s launch in 1993. Linux and the original 386BSD were “hobby projects” for their creators.

        386BSD, perhaps, but the BSD family in general was big business by then. The AT&T lawsuit wouldn’t have happened if people weren’t making big money from selling BSD systems and not paying AT&T anything.

        SunOS, DYNIX, Ultrix, Tru64, and others were all serious workstation / server operating systems built on a BSD core. The 386 port was largely regarded as a toy because PCs were regarded as toys by serious men with serious grey beards.

        Linux is the basis for almost all mobile devices and servers, and macOS is derived from those original BSDs.

        iOS now has >50% market share in the US (and UK) and is not based on Linux (it’s the same XNU kernel as macOS). The lineage of XNU is a bit complicated. The original NeXTSTEP kernel was a single-server 4BSD running on a Mach microkernel. It did not inherit from 386BSD at all. OS X 10.0 updated the BSD bits from FreeBSD 5 (widely known as the worst ever FreeBSD release, though I have fonder memories of it). It’s since diverged again.

        as a regular FreeBSD and NetBSD user, I churned out multiple new builds of these whole OS per day

        I bet you weren’t doing multiple builds of everything multiple times per day. Even in the early 2000s, building GCC was a job for several hours on a 500MHz system, and building the Mozilla Suite or Firefox was an overnight job. Building FreeBSD, gcc, XFree86, KDE, and Mozilla was an overnight (and most of the next day) job.

        These days we take x86-based personal computers as the one and only possibility,

        In this room, I have four computers: a phone, a tablet, a laptop, and a desktop. They all have AArch64 processors. I think the printer under the desk may be a MIPS R4K derivative. Counting phones (as things that run a multitasking OS and multiple applications from different vendors as the baseline for a personal computer), there are more Arm-based personal computers than x86 ones.

        NTFS on its own is fine,

        There are interesting parallels between the development of NTFS and BFS. Both were filesystem projects that were given a short timescale and an unstable set of requirements. Both condensed their requirements down to the same key things: a filesystem is a dictionary from names to data, and it must be efficient for data ranging from a few bytes to a few gigabytes. Everything else can be layered on top. They solved this problem in very different ways, but the core abstractions are the same, and so both can store arbitrary metadata with files because they are efficient at storing small objects as well as large ones. NTFS grew a lot of features on top of that small-data path.
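
        To make that abstraction concrete, here’s a minimal toy sketch (hypothetical names and layout, mine rather than anything from NTFS or BFS) of a name-to-data dictionary that keeps small values inline in the record and pushes large ones out to “extents”:

        ```c
        /* Toy, in-memory sketch of the "dictionary from names to data" idea.
         * The layout and names are hypothetical, not NTFS or BFS structures:
         * small values are stored inline in the record (NTFS would call that
         * "resident"), large values via a separately allocated buffer standing
         * in for extents of disk blocks. */
        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        #define INLINE_MAX 64                 /* tiny, to make both paths visible */

        struct record {
            char     name[32];
            uint64_t size;
            int      is_inline;
            union {
                uint8_t  inline_data[INLINE_MAX];  /* small data lives in the record */
                uint8_t *external;                 /* big data lives "on disk" */
            } u;
        };

        static struct record table[16];
        static int nrecords;

        static void fs_put(const char *name, const void *data, uint64_t size) {
            struct record *r = &table[nrecords++];
            snprintf(r->name, sizeof r->name, "%s", name);
            r->size = size;
            r->is_inline = size <= INLINE_MAX;
            if (r->is_inline) {
                memcpy(r->u.inline_data, data, size);
            } else {
                r->u.external = malloc(size);
                memcpy(r->u.external, data, size);
            }
        }

        static const void *fs_get(const char *name, uint64_t *size_out) {
            for (int i = 0; i < nrecords; i++)
                if (strcmp(table[i].name, name) == 0) {
                    *size_out = table[i].size;
                    return table[i].is_inline ? table[i].u.inline_data
                                              : table[i].u.external;
                }
            return NULL;
        }

        int main(void) {
            char big[4096];
            memset(big, 'x', sizeof big);

            fs_put("small.txt", "hello", 5);       /* stays inline */
            fs_put("big.bin", big, sizeof big);    /* goes to the external path */

            uint64_t n;
            const uint8_t *p = fs_get("big.bin", &n);
            printf("big.bin: %llu bytes, first byte '%c'\n",
                   (unsigned long long)n, p[0]);
            return 0;
        }
        ```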

        Bill Gates was obsessed with performance and routinely asked about it, because the prospective user base of NT did not have the means to buy high-end computers. Which is great… but then, I just don’t know where performance has gone wrong at Microsoft because that doesn’t seem to be the focus anymore.

        The focus on performance rarely led to good performance even then. See ‘Andy and Bill’s Law’, which dates back to this period and was true for most of the ’90s. NT4 was fast on a 133 MHz processor on a clean install. By the time you had Service Pack 3 and IE4 it was painfully slow.

        So, while NT was designed from the ground up to support multiple personalities (or… subsystems, like WSL 1),

        WSL 1 doesn’t use most of the subsystem infrastructure (in part because subsystems were too isolated). It uses the picoprocess mechanism that was added for Drawbridge (a spin-out from the next time Microsoft decided to do a clean-slate desktop / server OS).

        Most developers would go against the latter, claiming that those are bugs in the apps and they deserve to break. Apple has taken this path. Open source tries to take this path. Microsoft didn’t.

        It’s an interesting problem with success here. Apple mostly broke backwards compatibility while rapidly growing market share. Most x86 Macs were sold to new Mac users, not to existing ones. If they lost 20% of existing users, that was bad, but the things they could do by leaving those users behind let them pick up a lot more new ones. In contrast, Windows 3.11 and 95 were ubiquitous. If switching to NT had been as hard as switching to a Mac and they’d lost 20% of their install base, that loss alone would have been bigger than their competitors’ total market share, and it would have been a disaster.

        But the other huge migration was the jump from Windows 9x to Windows NT.

        And it was a very rocky migration. NT 4 shipped after 95. As I recall, it didn’t run Win16 applications at all and didn’t run a lot of Win32 apps that made 9x-specific assumptions (and, because it was expensive, most developers didn’t test on NT).

        Later in life, I had a hard time seeing how the Windows NT 4 that ran in our high school computer lab differed from the Windows 95 I had at home: to me, it just seemed slower and heavier.

        There were some silly segmentation things that happened here. NT4 shipped with OpenGL support and a driver architecture for 3D acceleration. Direct3D shipped for Windows 95 with a different and incompatible driver model for acceleration. I don’t think ’95 had a software path for OpenGL or a driver model, but it was later added (possibly by third parties?). Microsoft wanted Direct3D for games and OpenGL for workstation apps, and kept them in their respective boxes. This meant that things using Direct3D initially didn’t run on NT 4 at all, and then ran with only software rendering support (from Service Pack 2).

        The weirdest thing in NT4 was the decision to move the graphics server into the kernel. This is one of those decisions that made sense only for a narrow time window. The legacy of that decision, all of the GUI stuff living in Win32k, has been a security disaster, and it isn’t even faster anymore. Modern graphics stacks do all of their rendering either in the client applications or with direct access to a GPU command queue from the applications, because adding kernel transitions costs performance. Win32 makes a bunch of system calls for GUI rendering. The client-server design in NT 3.x would have been a better fit for modern hardware, it was just a worse fit for mid ’90s hardware.

        1. 3

          By the time you had … IE4 it was painfully slow.

          Agreed. At the same time, IE 4 was being required by lots of Microsoft software. That kept me out of the MS ecosystem for years.

          It’s an interesting problem with success here. Apple mostly broke backwards compatibility while rapidly growing market share.

          Also agreed. One detail people often overlook is that Windows compatibility between 1.0 and 3.1 wasn’t that good. Software written for 2.x would get a giant warning in 3.x protected mode (then frequently crash), 3.1 dropped real mode, and 95 would just refuse to run it without even trying. NT dropped support for 2.0 resources in an NT 3.1 service pack, and programs need resources to function. Only after a critical mass of software was written did compatibility become a big deal.

          As I recall, it didn’t run Win16 applications

          NT had NTVDM, which ran DOS programs and cooperatively multitasked 16-bit Windows programs. By default these all shared an address space. That was very different from 95, which ran them “natively.” Different people would have had very different experiences with this, depending on what software they ran - for me, it worked shockingly well considering how radically different it was. In some cases, like Visual C++ 1.5, the same binaries ran, but they detected the environment and behaved differently (no VxD support on NT).

          I don’t think ’95 had a software path for OpenGL

          It arrived in OSR2. I don’t know how the driver side of it worked.

          1. 2

            Counting phones (as things that run a multitasking OS and multiple applications from different vendors as the baseline for a personal computer), there are more Arm-based personal computers than x86 ones.

            Indeed. By at least an order of magnitude, maybe two.

            I wrote two reviews of the ThinkPad X13S, an Arm laptop. Still, readers don’t get it and ask if they can run DOS on it, or why it won’t boot Ventoy, etc. They can’t grasp that it’s not an x86 machine.

            And in a way that’s a good thing. Apple has determinedly scrubbed over the lines with both feet. M1 is quick enough to run an entire ¼-of-the-way-into-C21 bloated-as-fsck x86-64 OS in a VM under emulation and make it usable. This sort of stuff shouldn’t be visible to users any more. Compare how OpenFirmware used to be able to run x86 device firmware in emulation (AIUI).

            NT 4 shipped after 95.

            True.

            As I recall, it didn’t run Win16 applications at all and didn’t run a lot of Win32 apps that made 9x-specific assumptions

            Not true. NT4, like NT 3.x, was 32-bit only, and had a WOW Win16 subsystem. They could happily run DOS and Win16 apps… even on RISC CPUs, where NT included x86-16 emulation. DEC added x86-32 emulation to its editions for Alpha, because Alpha was that damned fast.

            All it could not run was x86-32 device drivers.

            The weirdest thing in NT4 was the decision to move the graphics server into the kernel. This is one of those decisions that made sense only for a narrow time window.

            Oh yes. To those with a bigger picture view it was disastrous even at the time. I said so in print in PC Pro Magazine and got some very very hostile feedback as a result.

            The client-server design in NT 3.x would have been a better fit for modern hardware, it was just a worse fit for mid ’90s hardware.

            Strongly agreed.

            1. 1

              All it could not run was x86-32 device drivers.

              I remember some problems with Win16 apps on NT, but my memory is somewhat hazy. I had Windows 3.11 installed and dual-booted DOS to run a couple of things that didn’t work in NT 4. Perhaps it was something to do with thunks to real-mode components that expected to talk directly to the hardware? It’s been a very long time since I ran it though (and even longer since I cared about any Win16 apps, though I still have fond memories of ClarisWorks 1.0 for Windows 3.1).

              1. 2

                That’s fair enough.

                My “it couldn’t run device drivers” was too curt: it couldn’t run VxDs and DOS device drivers, but it wouldn’t run DOS and Win16 code that hit the metal. The difference being that the latter was theoretically possible but would have compromised system stability too much. Remember that NT predates microprocessors directly supporting VM instances of themselves; it even predates VMware achieving this with clever trickery (running software emulation of x86 on x86 to safely execute ring 0 code).

                OS/2 ≥ 2 allowed DOS VMs, essentially x86-16 instances running in the 386’s virtual 8086 mode. These could even boot from floppy. But then again, OS/2 ≥ 2 was a lot less stable than NT, even NT 3.1, and that’s without considering that while NT didn’t have DOS VMs or a Win3.x VM, it did have networking, which is rather more important and useful.

                But, yes, on all x86-32 and RISC versions of NT from 3.1 to Win10, you can run clean, legal, compliant DOS and Win16 apps that don’t hit the metal.

                This was only dropped in the first x86-64 edition, which I think was the x86-64 edition of XP (for a few people) or Vista (for the mainstream).

                1. 1

                  Remember that NT predates microprocessors directly supporting VM instances of themselves

                  Kind of. The 386 introduced VM86 mode, which ran real-mode VMs. I think these could trap on access to MMIO regions (they sat on top of paging, so you could map some pages as no-access and trap accesses) and so you could emulate hardware access. I don’t know if anyone actually did this (possibly iRMX?). Windows 3.1 (and, apparently, the 386 version of Windows 2, which I never used) used this to run DOS programs safely on a 386. This was one of the weird things in Windows 3.x: memory-safety bugs in DOS programs were isolated into the VM86 instance; memory-safety bugs in Win16 programs were not and could cause blue screens. Apparently the NTVDM subsystem, which ran DOS things, also used VM86.
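
                  As a rough illustration of that paging trick (only the general trap-and-emulate idea, not VM86 itself or how any particular monitor implemented it): map the region with no permissions, catch the fault, and “emulate” the access in the handler before letting the instruction retry. A minimal Linux/C sketch:

                  ```c
                  /* Sketch only: trap-and-emulate via page protection, the same basic
                   * idea a V86 monitor could use for MMIO. The "emulation" here just
                   * opens the page up so the faulting instruction can be retried. */
                  #include <signal.h>
                  #include <stdio.h>
                  #include <sys/mman.h>
                  #include <unistd.h>

                  static char *trapped_page;
                  static long pagesize;

                  static void on_fault(int sig, siginfo_t *si, void *ctx) {
                      (void)sig; (void)ctx;
                      char *addr = (char *)si->si_addr;
                      if (addr >= trapped_page && addr < trapped_page + pagesize) {
                          /* A real monitor would decode the instruction and emulate the
                           * device access; we just open the page and return to retry. */
                          mprotect(trapped_page, pagesize, PROT_READ | PROT_WRITE);
                      } else {
                          _exit(1);   /* a genuine crash, not our trapped region */
                      }
                  }

                  int main(void) {
                      pagesize = sysconf(_SC_PAGESIZE);

                      struct sigaction sa = {0};
                      sa.sa_flags = SA_SIGINFO;
                      sa.sa_sigaction = on_fault;
                      sigaction(SIGSEGV, &sa, NULL);

                      trapped_page = mmap(NULL, pagesize, PROT_NONE,
                                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

                      volatile char *mmio = trapped_page;  /* pretend: a device region */
                      mmio[0] = 0x42;              /* faults, handler steps in, retries */
                      printf("read back: 0x%02x\n", mmio[0]);
                      return 0;
                  }
                  ```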

                  VM86 only emulated an 8086 though, so you couldn’t run Win16 software that expected 386 features. You also couldn’t run DOS things that switched to protected mode, unless they used some of the pass-through functionality that let them delegate to another memory manager (I never saw that work, but apparently it was meant to).

                  This was only dropped in the first x86-64 edition

                  As I recall, x86-64 uses the same call gates for transitions between VM86 mode and protected mode as it does between protected mode and long mode, so you can’t (without some extreme hackery) have both long mode and VM86 mode in the same environment. You can now use VT-x to create a 32-bit VM and then enter VM86 mode there to get a virtualised 8086, but that was too late for Windows XP and Vista (both predated VT-x becoming mainstream).

                  1. 1

                    The 386 introduced VM86 mode, which ran real-mode VMs

                    I know. I used it (a lot.) That is why I said “of themselves”.

                    The 80386 could start what were effectively software controlled 8086 VMs, but it could not start 80286 or 80386 VMs.

                    So, what started as a doctoral thesis and became VMware ran software-emulated x86 instances on x86. This developed into detecting ring transitions and running ring 1/2/3 – well, only ring 3 really – code on the metal, but trapping transitions into ring 0 and running them in the software x86-32 ISA emulator.

                    Result: although x86-32 did not meet the Popek and Goldberg virtualisation requirements, it could run VMs, as I wrote about here: https://www.theregister.com/2011/07/11/a_brief_history_of_virtualisation_part_one/

                    That did so well that Intel added what is effectively ring -1 underneath ring 0, enabling hardware x86-32 VMs on x86-32.

                    Saying that… I didn’t know the stuff about call gates and why x86-64 dropped real mode. Thank you!

                    1. 3

                      Result: although x86-32 did not meet the Popek and Goldberg virtualisation requirements, it could run VMs, as I wrote about here

                      I, too, have written a little bit on this topic.

                      This developed into detecting ring transitions and running ring 1/2/3 – well, only ring 3 really – code on the metal, but trapping transitions into ring 0 and running them in the software x86-32 ISA emulator.

                      Xen on 32-bit x86 worked by placing the kernel in ring 1 and itself in ring 0. This, apparently, caused a lot of fun to the Novell folks who ported NetWare to run on Xen, because NetWare used all four rings. I don’t think any other mainstream things did. I’m told that x86 has four rings because VMS used four rings on VAX and DEC asserted that it was absolutely essential. Then they ported it to Alpha, with only two.

                      That did so well that Intel added what is effectively ring -1 underneath ring 0, enabling hardware x86-32 VMs on x86-32.

                      The history there is actually much more depressing. The main reason that VT-x exists in its final form is that the Linux kernel maintainers viewed virtualisation as a niche use case and didn’t want to maintain two in-tree virtual memory layers for virtualised and non-virtualised cases. Adding a small amount of hardware support for shadow page table mode would have been much easier in hardware and given better performance (shadow page tables make updating page tables more expensive; nested page tables make TLB misses more expensive, and the latter are a few orders of magnitude more common than the former), but it would have required guests to be enlightened. Now, bare metal is practically a nice use case for server Linux (and even mobile: most Android devices now run with a hypervisor underneath).
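
                      To put rough numbers on that trade-off (back-of-the-envelope, assuming 4-level paging in both guest and host): with nested page tables, each of the guest’s page-table accesses is itself a guest-physical address that has to be walked through the host’s tables, so a worst-case TLB miss can take (4+1)×(4+1)−1 = 24 memory accesses instead of 4 for a native or shadow-page-table walk; shadow paging instead pays with a trap into the hypervisor whenever the guest updates its page tables.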

                      The amount of power that’s burned as a result of the short-sightedness of the Linux kernel maintainers is more than a percentage point of total datacenter power consumption. The carbon emissions resulting from that decision are astounding.

                      1. 2

                        I, too, have written a little bit on this topic.

                        Oh nice! :-)

                        The Reg piece – an early, freelance one – did turn into chapter 1 of a short Kindle-only book, which looked great on my CV but is sadly now gone.

                        https://liam-on-linux.livejournal.com/45467.html

                        Xen on 32-bit x86 worked by placing the kernel in ring 1 and itself in ring 0.

                        I think I recall hearing about that…

                        This, apparently, caused a lot of fun to the Novell folks who ported NetWare to run on Xen

                        I didn’t know there was a Xen Netware…

                        because NetWare used all four rings.

                        Netware 4.x normally ran NLMs in ring 1, but you could tell it to load them in ring 0 for performance, at the price of reduced stability.

                        ISTR reading similar but different about VirtualBox and OS/2.

                        OS/2 3.x used all 4 rings, and I thought it was the only thing that did. This broke hypervisor assumptions, like being able to force the guest kernel from ring 0 into ring 1.

                        E.g. see in the comments here: https://www.os2museum.com/wp/os-2-2-0-spring-91-edition/

                        I don’t think any other mainstream things did.

                        I didn’t know about Netware using >2; OS/2 was all I had heard that did.

                        I’m told that x86 has four rings because VMS used four rings on VAX and DEC asserted that it was absolutely essential. Then they ported it to Alpha, with only two.

                        Sounds plausible considering some of the history… but what did DEC care about x86? Cutler only joined MS and gained influence over Wintel in the ’90s, surely…?

                        Linux kernel maintainers viewed virtualisation as a niche use case and didn’t want to maintain two in-tree virtual memory layers

                        Do tell?

                        practically a nice use case

                        “niche” I presume?

                        1. 1

                          I’m told that x86 has four rings because VMS used four rings on VAX and DEC asserted that it was absolutely essential. Then they ported it to Alpha, with only two.

                          Are multiple rings something that can be implemented in PALcode?

                          1. 2

                            If you have some registers that are accessible only from PAL mode (I think there are, but I haven’t read the Alpha manual for about 20 years), yes. You’d have a PALcode instruction to drop to a lower ring and one to trap to a higher ring (which would jump to a specific point), and you’d track the current ring in a dedicated register (actually, you could probably track it in memory). Ring transitions would need a TLB flush, and TLB fills (which come from a software handler) would then need to check the current ring before populating an entry. That could be delegated to ring 0, with the PAL code providing the current ring to the OS TLB miss handler, which could then walk page tables (or equivalent) and see if the current ring is allowed to access the page.

              2. 2

                Thanks for the very detailed answer, as usual. I was kinda looking for it when I sent the article ;) I’ll have to run my future posts by you before publishing them!

                Just a couple of clarifications/thoughts, which don’t change your reply at all anyway.

                I bet you weren’t doing multiple builds of everything multiple times per day.

                Yeah, that’s right. I was referring to the base system specifically, which for NetBSD + X11 I think took 2-3 hours on my machine. Building KDE and the like took forever though, but those were not part of base so they “didn’t count” hehe.

                In this room, I have four computers: a phone, a tablet, a laptop, and a desktop.

                In the sentence this refers to, I explicitly chose to say “personal computer” to refer to PCs, not arbitrary devices because yeah, if you count everything that’s not a PC, x86 doesn’t dominate anymore.

                And it was a very rocky migration.

                Also here, when I wrote Windows 9x to Windows NT, I was specifically referring to Windows XP, which is the first NT descendant that could really unify both. But I chose the word “NT” because I was referring to the underlying technology.

                There were some silly segmentation things that happened here.

                I never used Windows NT per se (other than briefly at school, but that was just for Word and Excel). My first exposure at home and a real chance for tinkering was Windows 2000. I think by then DirectX did work, right? But I do recall issues with games and the like.

                1. 2

                  In the sentence this refers to, I explicitly chose to say “personal computer” to refer to PCs, not arbitrary devices because yeah, if you count everything that’s not a PC, x86 doesn’t dominate anymore.

                  Do you at least count the laptop and desktop as personal computers? They’re both Arm as well.

                  I never used Windows NT per se (other than briefly at school, but that was just for Word and Excel). My first exposure at home and a real chance for tinkering was Windows 2000. I think by then DirectX did work, right? But I do recall issues with games and the like.

                  Windows 2000 was supposed to do that. Windows ME (I think; maybe 98SE) introduced the new driver model, so ME and 2000 could share drivers, and the hardware certification required that drivers were tested on both. (This had been the rockiest thing for a lot of users: 9x and NT had totally different driver models, so most hardware didn’t work with NT even if the software did.)

                  XP ended up introducing a lot of compatibility hacks. 2000 was the last version I used regularly before I started at MS. The tellytubby UI in XP put me off upgrading and I switched to a mixture of FreeBSD and OS X.

                  I never used Windows NT per se (other than briefly at school, but that was just for Word and Excel). My first exposure at home and a real chance for tinkering was Windows 2000. I think by then DirectX did work, right? But I do recall issues with games and the like.

                  DirectX in 2k had parity with 9x, which was great. Most compatibility headaches there came from apps assuming that things like HKLM in the registry and Program Files were writable by normal users. XP introduced a load of compat hacks to work around this bad behaviour. I can’t remember whether it was XP or Vista that introduced the registry shadowing model, which let apps write keys to HKLM but actually stored them in HKCU, and did some similar overlay-filesystem things for system directories.

                  1. 1

                    Do you at least count the laptop and desktop as personal computers? They’re both Arm as well.

                    Yeah, I’d count those too, but Apple is the only significant player with non-x86 computers. So, whether you want to call their computers “PCs” is one question, and whether Macs are a significant proportion of desktops and laptops in the world is another. I know they are big in the US; I wasn’t aware of their market share in the UK; but I don’t think they are significant elsewhere… yet?

                    In any case, I did think about this… but my choice of words was waaaaay too subtle.

                    The tellytubby UI in XP put me off upgrading

                    Heh. Fortunately you could still enable the classic UI. It didn’t look as great as 2K though; 2K was peak UI for me.

                    1. 1

                      Apple is the only significant player with non-x86 computers.

                      I believe the most popular current computing platforms (not counting phones or consoles) are:

                      1. Intel / AMD / Microsoft
                      2. Apple Mac
                      3. Google Chromebook
                      4. Raspberry Pi

                      2 and 4 are ARM and 3 is mixed ARM / Intel. 2 is Unix, 3 and 4 are Linux, 1 has optional Linux.

                      Apple is selling about 2 million Macs a month; Raspberry Pi is expecting to sell 1 million a month. Slightly different price points!

                2. 1

                  as a regular FreeBSD and NetBSD user, I churned out multiple new builds of these whole OS per day

                  I bet you weren’t doing multiple builds of everything multiple times per day.

                  I seem to remember in 1999 or 2000 getting a buildworld down to about a kilosecond on a quad Xeon monster, but I am not sure if my memory deceives me. (Maybe it was a buildkernel?)

                  1. 3

                    Sounds about right if buildworld was building gcc only once. I didn’t know anyone with access to a quad Xeon back then, and only knew one person with a dual processor system, so I suspect you had some very jealous friends.

                3. 1

                  some of you usually provide some super-insightful comments about “the good old days”

                  Oh dear hypothetical deities. :-(

                  I deployed NT 3.1 in production in my mid-20s. It was about the 20th or more OS I learned; nudging the 10th server OS in my professional life alone.

                  Y’know some of this is actually motivation for why I write. To recount how it was from someone still actually in the business, before it’s myth and legend.