This flow (compiler, assembler, linker) predates UNIX by quite a long time. It was created by Mary Allen Wilkes in the early ‘60s. She’d switched careers to law before UNIX came along.
It’s worth noting that the Plan 9 toolchain (which Go uses) has slightly different layering, and even within that flow there’s a lot of variation. For example, the GCC MIPS back end emits a large number of pseudo-instructions that the assembler then expands based on different target settings, whereas most compilers generate assembly that maps directly to machine code. The big difference in the Plan 9 toolchain relates to how relocations are handled. Most instruction sets don’t have immediate offsets for load and store instructions that can span the entire address range. The compiler or assembler can either emit an instruction with a short offset and hope that it’s in range (getting link failures if it isn’t), or emit a sequence of instructions that can materialise a long offset but wastes some work when the top half of the offset is 0 (which it almost always is). For jumps, the linker can usually create trampolines to fix up the cases where the compiler emitted a short jump (2^18 bytes of displacement is common for jump instructions) but the target was too far away. In the Plan 9 assembler, relocatable instructions remain as pseudos right into the linker, which can then choose between expanding them into a multiple-instruction sequence or moving the target closer.
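The linker’s choice for jumps can be sketched in a few lines. This is a toy model, not any real toolchain’s data structures: the 2^18-byte reach comes from the comment above, and the function and constant names are made up for illustration.

```python
# Toy model of link-time branch fixup. SHORT_REACH matches the 2^18-byte
# displacement mentioned above; everything else is illustrative.
SHORT_REACH = 2 ** 18  # max displacement a short jump can encode

def place_trampoline(site):
    """Pretend-allocate a trampoline slot close to the jump site."""
    return site + 0x1000  # always within SHORT_REACH of `site`

def fix_branch(site, target):
    """Resolve a jump from `site` to `target` at link time.

    If the displacement fits the short encoding, keep the single
    instruction; otherwise route it through a nearby trampoline that
    holds a long-form jump to the real target.
    """
    if abs(target - site) < SHORT_REACH:
        return ("short", target)
    return ("via_trampoline", place_trampoline(site))
```

So `fix_branch(0x1000, 0x2000)` keeps the short jump, while `fix_branch(0x1000, 0x500000)` is out of reach and goes via a trampoline at an address the short jump can still encode.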
As with so many other things, the early designs were driven by memory constraints. The FORTRAN compiler for the IBM 1401 ran on a system where 30,000 characters of memory was the expensive high-end option. Each step had to maintain as little state as possible and produce streaming output, reading a paper tape or deck of punch cards (if you had bought the really expensive options) and producing output on another tape or deck of cards, which could be consumed by the next step. Splitting the toolchain pipeline like this made it possible to build compilers at all on machines that couldn’t hold a whole program’s state in memory. This was still a constraint on the PDP-7 and so ended up being baked into the C model: allowing the C preprocessor to be a separate program; a symbol-resolution model that only searches backwards, so you can stream the output of cpp through cc, and cc needs only a symbol table for the declarations it has seen plus state for the current function; and streaming the assembly for each statement out one at a time when it isn’t optimising.
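The backwards-only lookup is what makes the single pass possible, and it can be sketched as a one-pass reader. This is a toy model, not cc’s actual implementation: the two-token `decl`/`use` input format is invented for the example.

```python
# Toy single-pass "compiler" front end: symbols resolve only against
# declarations already seen, so input can be streamed with no lookahead
# and the only persistent state is the symbol table itself.
def compile_stream(lines):
    symbols = set()  # the only state carried between lines
    for line in lines:
        op, name = line.split()
        if op == "decl":
            symbols.add(name)
        elif op == "use" and name not in symbols:
            # A forward reference would need a second pass, which a
            # streaming compiler can't afford; so it's simply an error.
            raise NameError(f"{name} used before declaration")
        yield line  # emit immediately, exactly like streaming cpp into cc
```

`list(compile_stream(["decl x", "use x"]))` succeeds, while reversing the two lines raises `NameError`, mirroring C’s use-before-declaration errors.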
Modern toolchains have quite different layering. The assembler step is entirely pointless: the compiler must have an internal data structure representing the instruction set anyway, and generating a sequence of encoded instructions in binary form is no harder than generating a text serialisation. As a result, LLVM skips the intermediate step entirely. With LTO, the late stages of the compiler all live in the linker.
If a new language doesn’t have the same file-is-compilation-unit abstraction, it’s possible to combine a lot of these steps and gain some optimisation advantages without losing the ability to do parallel compilation.
The assembler step is entirely pointless.
However, having the option to emit assembler is very useful every so often. Being able to reliably disassemble binaries is useful too, but disassembly doesn’t preserve all of the information, especially if you’re tracking down a bug that only appears in release builds rather than debug builds.
That’s true, though if you were to design the output specifically for that purpose, it would likely be more verbose than the assembler that compilers generate. The LLVM MCInst dumps are often more interesting for understanding the behaviour of the generated code. They include things like which values are undefined (cases where the compiler reuses a register or stack slot because any value will do, so the generated assembly contains register reads that don’t otherwise make sense) and the expected data flow. The pre-regalloc MCInst dumps are sometimes the most informative. If assembly output is intended for debugging, rather than as part of the compilation flow, there’s a lot that you can do.
GCC does have an -fverbose-asm flag that provides a fair amount of extra information interspersed with the generated assembly (source lines, temporary-value numbers…), for what it’s worth. (And if you really want to get into its guts there are of course lots of -fdump-* options to inspect its IR at whatever point you choose.)