1. 15
    1. 5

      Great post. I expected this to just be a DJGPP intro, running “hello, world”-tier code on DOS. But it’s actually a really interesting dive into how it provides a GNU-ish interface despite all of DOS’s limitations.

      1. 3

        Exceeded expectations :) Glad you found it interesting.

        1. 2

          Another thing that I recall having to do in DJGPP: if I needed to redirect, say, stderr to a file, IIRC you had to run a redirection binary that then executed your binary.

          DJGPP is where I cut my teeth on C and it will always have a soft spot in my heart. Coming from Turbo C, not having to worry about dealing with 16-bit memory access was very liberating.

        2. 4

          In fact, from reading about the historical goals of the project, I gather that a secondary goal was for DJ to evangelize free software to as many people as possible, meeting them where they already were

          Worked on me too. I also came to DJGPP from Turbo. I never really got any mileage out of it because it was too different from what I was used to, and a bit awkward to make work (even with RHIDE). But it was one of the things that led to me finding out that this free software thing existed (the other was UCBLogo), which led to me installing Linux on a 486 ThinkPad, which led to… pretty much my whole career and a fair chunk of my hobbies since then.

          1. 3

            Great article, I learned a lot that I’m probably better off not knowing!

            The answer is the transfer buffer

            Is this what DJGPP called it? The normal term for this is a bounce buffer. They’ve cropped up in a bunch of places. It’s an interesting historical curiosity that the first IOMMU was built to solve this problem: Sun wanted to ship cheap 32-bit network cards with workstations with 8 GiB of RAM, but didn’t want to incur the performance penalty of bounce buffering. Security uses of IOMMUs were something of an afterthought. Interestingly (to me, at least) adoption in the PC world happened the other way around and the forerunner of modern IOMMUs in PCs was AMD’s device exclusion vector, which did protection but not translation, and so could be used for security but could not avoid bounce buffering.
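
            The mechanism itself is simple enough to sketch. This is a toy model of bounce buffering (no real driver looks like this; all names are made up): the "device" can only read from a small reachable region, so data living anywhere else gets copied there first.

            ```c
            #include <string.h>
            #include <stddef.h>

            /* Toy model: pretend the device can only DMA out of low_region,
             * so data living elsewhere must bounce through it first. */
            static unsigned char low_region[256];   /* DMA-reachable memory */
            static unsigned char device_sink[256];  /* what the device saw */

            /* The "device": it can only read from low_region. */
            static void device_dma(size_t len) {
                memcpy(device_sink, low_region, len);
            }

            /* Bounce-buffered transmit: copy into reachable memory first,
             * then point the device at the copy.  This extra memcpy is
             * exactly the cost an IOMMU's translation avoids. */
            static void dma_send(const unsigned char *data, size_t len) {
                memcpy(low_region, data, len);      /* the bounce copy */
                device_dma(len);
            }
            ```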

            That’s right: file read and write operations are restricted to 64 KB at a time because the number of bytes to process is specified in the 16-bit CX register. Which means that, in order to perform large file operations, we need to go through the dance above multiple times in a loop

            It’s a nice affordance that read does this, but it’s not actually required. POSIX allows read to return less than the requested size, so it would be entirely conformant to just truncate the request size and zero-extend the response.
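
            For concreteness, the looping dance is tiny. Here `dos_read` is a mock standing in for the INT 21h read (the names and the fake backend are mine, not DJGPP's); the point is that the 16-bit count forces a loop for anything over 64 KB:

            ```c
            #include <string.h>
            #include <stddef.h>

            #define DOS_CHUNK 0xFFFFu  /* CX is 16 bits: at most 65535 bytes per call */

            /* Stand-in for the INT 21h/AH=3Fh read: fills the buffer with a
             * pattern and, like the real thing, never moves more than a
             * 16-bit count per call. */
            static size_t dos_read(unsigned char *buf, unsigned short count) {
                memset(buf, 0xAB, count);
                return count;
            }

            /* The looping dance: keep issuing <=64 KB reads until done. */
            static size_t full_read(unsigned char *buf, size_t total) {
                size_t done = 0;
                while (done < total) {
                    size_t want = total - done;
                    unsigned short chunk =
                        want > DOS_CHUNK ? DOS_CHUNK : (unsigned short)want;
                    size_t got = dos_read(buf + done, chunk);
                    if (got == 0)
                        break;          /* EOF or error: stop early */
                    done += got;
                }
                return done;
            }
            ```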

            Leaving aside the fact that the DOS API is… ehem… bad

            I’d say the same thing about the UNIX one. Glob expansion is done in the shell because the original UNIX didn’t have shared libraries and statically linking the glob function into every tool made things too big. The downside of this is that the called binary can’t see the pre-expansion version. This is why in DOS, you can write RENAME *.txt *.old but in UNIX this gets expanded to a list of files and can’t associate patterns in the source and destination globs. It also causes problems when you want wildcard expansion over something that isn’t the filesystem, because you have to escape it.

            I wish this had gone away in *NIX as soon as shared libraries arrived.
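
            For what it’s worth, libc does ship glob(3) these days, so a tool that wanted to see the raw pattern DOS-style could expand it in-process. A minimal sketch (my names), using GLOB_NOCHECK so an unmatched pattern comes back verbatim instead of as an error:

            ```c
            #include <glob.h>
            #include <stddef.h>

            /* Expand a pattern inside the program with libc's glob(3).
             * GLOB_NOCHECK gives roughly DOS-ish behavior: if nothing
             * matches, the program still sees the raw pattern. */
            static size_t expand(const char *pattern, glob_t *g) {
                glob(pattern, GLOB_NOCHECK, NULL, g);
                return g->gl_pathc;
            }
            ```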

            This is because longcmd1.exe now knows that longcmd2.exe understands the transfer buffer arrangement and can send the command line to it this way.

            But how does it know?!? Does it read some magic header or is there a different entry point or similar?

            1. 4

              I’d say the same thing about the UNIX one. Glob expansion is done in the shell because the original UNIX didn’t have shared libraries and statically linking the glob function into every tool made things too big.

              Except it wasn’t originally done in the shell, the shell called the ‘glob’ program to do it. It then exec’ed the program with the expanded arguments.

              See the v6 ‘glob’ and ‘sh’ source.

              Now in theory, a different convention could have been used: say, one whereby the called programs scanned their arguments for one of the ‘glob’ characters (i.e. ‘*’, ‘?’, ‘[’), then themselves did the fork/exec dance to feed those args via a pair of pipes to a different form of glob program, reading back the expanded args.

              However, I guess the approach taken was as much for ease (‘by programmers, for programmers’), and convenience in writing the various programs executed, as for any other reason.
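
              That hypothetical convention is easy enough to sketch in modern C. Everything here is made up for illustration: the child just calls glob(3) in place of a separate glob binary, and only the first result comes back, to keep it short:

              ```c
              #include <glob.h>
              #include <stdio.h>
              #include <string.h>
              #include <sys/wait.h>
              #include <unistd.h>

              /* The program spots a glob character in an argument, forks a
               * helper, sends the pattern down one pipe, and reads the
               * expansion back from another. */
              static void expand_via_helper(const char *arg, char *out, size_t outlen) {
                  if (strpbrk(arg, "*?[") == NULL) {   /* no magic: pass through */
                      snprintf(out, outlen, "%s", arg);
                      return;
                  }
                  int to_child[2], from_child[2];
                  pipe(to_child);
                  pipe(from_child);
                  if (fork() == 0) {                   /* child: the glob helper */
                      char pat[256];
                      ssize_t n = read(to_child[0], pat, sizeof pat - 1);
                      pat[n > 0 ? n : 0] = '\0';
                      glob_t g;
                      glob(pat, GLOB_NOCHECK, NULL, &g);
                      /* one result is enough for the sketch */
                      write(from_child[1], g.gl_pathv[0], strlen(g.gl_pathv[0]));
                      _exit(0);
                  }
                  close(to_child[0]);
                  close(from_child[1]);
                  write(to_child[1], arg, strlen(arg));
                  close(to_child[1]);
                  ssize_t n = read(from_child[0], out, outlen - 1);
                  out[n > 0 ? n : 0] = '\0';
                  wait(NULL);
              }
              ```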

              1. 2

                Is this what DJGPP called it?

                Yep, that’s the name used in the codebase. You’ll find it abbreviated as TB.

                It’s a nice affordance that read does this, but it’s not actually required. POSIX allows read to return less than the requested size, so it would be entirely conformant to just truncate the request size and zero-extend the response.

                That’s true. The reason I included this in the text is because a previous version of the draft described a different idea: pre-DPMI versions of the code would switch to unreal mode to issue the DOS reads and then execute them all in one go, copying the individual chunks into extended memory without incurring a full protected/real mode transition for every short read. I didn’t actually look at the code for this, but when DJ reviewed the draft, he didn’t say it was wrong hehe, just that it wasn’t done this way anymore.

                This is why in DOS, you can write RENAME *.txt *.old

                Oh that’s an interesting point. But at least in Unix you have the option of escaping those wildcards so that they reach the program, and mv could be made to understand those and do the rename…

                I still don’t think I like the idea of a flat command line simply because then you can have inconsistencies in parsing behavior. Even if there had been a shared library to implement this consistently for all programs, I’m sure someone wouldn’t have liked it for their CLI and implemented their own version.

                But how does it know?!? Does it read some magic header or is there a different entry point or similar?

                Take a look at dosexec.c and look for the interpreters table. Based on file extensions, it determines which exec function to use. For exe files, it calls go32_exec, which does in fact inspect the file type to know whether it’s a COFF or not. And if it is a COFF, it tries to use “the proxy thing” I briefly mentioned in the article. If I understood it right (but I was too tired to research this further when writing the text), it sets a magic environment variable (!proxy, with a leading space!) and then passes the address of the transfer buffer in argv as separate 16-bit segment and offset quantities.
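
                For anyone following along, the two 16-bit halves recombine the usual real-mode way, linear = segment * 16 + offset. A hypothetical helper (my names, not DJGPP’s code), taking the values as hex strings the way argv would carry them:

                ```c
                #include <stdlib.h>

                /* Recombine a real-mode segment:offset pair into a linear
                 * address: linear = segment * 16 + offset. */
                static unsigned long linear_addr(const char *seg_hex,
                                                 const char *off_hex) {
                    unsigned long seg = strtoul(seg_hex, NULL, 16);
                    unsigned long off = strtoul(off_hex, NULL, 16);
                    return (seg << 4) + off;
                }
                ```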

                Thanks for reading!

                1. 3

                  I still don’t think I like the idea of a flat command line simply because then you can have inconsistencies in parsing behavior

                  You have that in UNIX as well. Some of it is even specified in POSIX. Take a look at how find defines patterns, for example. You also have some tools that want globs and some that want (different flavours of) regular expressions, and some of the latter use these regular expressions for filename matches, and require you to escape them. The Linux rename utility is completely inconsistent with pretty much anything.

                  In the VMS world, this expansion is done via a shared library function and so everything that’s doing the same kind of matching uses the same library routine and is consistent. The same is true of most mainframe operating systems.