1. 39
  1. 4

    It would probably be easier to start with classic Pascal, and a hand-built p-code interpreter.

    Although it’d be funnier to start with a c-subset compiler written in BASIC :)

    1. 1

      I think this is misunderstanding the problem - the goal is to get from the simplest possible thing that can be written in assembly and verified up to something substantial. There’s no way to solve this by starting with Pascal or BASIC - those can be intermediate steps, but only by first having a tiny handwritten implementation of those. It is true that BASIC used to be handwritten assembly, so getting a tiny C compiler in 6502 BASIC might be a valid route.

      The article touched on the idea that assembly is just an interface edge to hardware and the hardware cannot be intrinsically trusted. I think the next step which it missed is that bootstrapping can start on any platform, and if it’s possible to get a Z80 (or other simple CPU) to cross compile a minimal toolchain, that can be a valid path. As soon as that toolchain runs on a “modern” CPU its output cannot be trusted, although if the goal is to emit code that runs on a “modern” CPU the final result can’t really be trusted either. Finding the bottom turtle still doesn’t solve the ME problem.

      1. 2

        “I think this is misunderstanding the problem - the goal is to get from the simplest possible thing that can be written in assembly and verified up to something substantial.”

        That’s close to what they did with P-code. The Pascal/P system used a combo of a Pascal compiler, a standard library, and P-code they both targeted. Each port just had to port the P-code interpreter. It was easy enough for non-compiler experts that Pascal was ported to 70+ architectures in two years per Wirth.

        They could’ve hand-coded the compiler and standard library in P-code for traceability if they wanted. Pascal itself was designed partly to make the compiler easy to build. Pascal-S was a variant focusing on compiler simplicity. There’s signifigant overlap here with goals of bootstrapping. Let’s test that by looking at what I proposed for Pascal-inspired bootstrapping.

        Like Lisp taught, we work top down with HLL code while bringing bottom layer upward toward expressing HLL concepts. Start with a C interpreter with minimal features of it: scalar data, expressions, structs, functions, and basic I/O. Implement it in minimalist C followed by assembly hand-matched against it. Use this to express first C compiler (eg TCC). You can either write a compiler from C to this language or hand-convert code to it.

        Like Pascal/S, the compiler can be ultra-simple in design but easier since target is HLL. You don’t even write a tool for generating assembly or binary: you just use existing, simified designs made for bootstrapping targeted to the C-like VM and/or get it from the compiler after porting the whole thing. Had I stayed in it, my plan was to cheat by making an untrusted, automated converter from TCC’s C to that VM language that was visually inspectable by non-experts. If it passed inspection, trust was reduced to an interpreter that could be someone’s weekend project. All much simpler, more democratic, and more consistent with C skills than Scheme or hex.

        Far as hardware, use a verified CPU like VAMP (DLX-style) synthesized with a method like Baranov’s using FSM’s. The algorithms could be implemented on same VM approach. My method also allows diverse hardware with outputs having to match on each. Standardized computation on diverse hardware, esp pre-1999, negates most concerns about hardware backdoors. Even more if computations run on embedded MCU’s that had no transistors to spare for anything in supplier’s commercial interest, much less compiler subverters. :)

        Note: My VM concept itself can be implemented in hardware. Burroughs B5000 implemented a high-level ISA in hardware partly for security back in the 1960’s.

        I’ll tag @cadey, @sjamaan, @Sophistifunk to include them in this in case they find any of it useful.

        1. 2

          Burroughs B5000 implemented a high-level ISA in hardware partly for security back in the 1960’s.

          For years I’ve felt that the loss of this branch of hw is a crying shame.

          1. 3

            It’s not lost, it just slept for a while. CHERI is heavily inspired by the B5500 architecture and my goal with it was to produce a RISC version of the Burroughs architecture: you don’t need the hardware to differentiate between floating point and integer types, because it doesn’t affect security and you can do it in software, but you do need it to enforce pointer provenance and memory safety.

            Arm is building us a test chip that just taped out, so we’ll get to experiment with this in a high-end pipeline soon.

            1. 1

              This is excellent news.

            2. 1

              Indeed. Until maybe return-oriented programming, it would’ve automatically prevented most code injections that hackers used against sloppy code. Even pro code that occasionally overlooks something would have same benefit.

              1. 2

                ROP is also blocked in many forms. In a CHERI system, the return address that you spill on the stack is a tagged value. If an attacker injects data, they can’t inject pointers, they need to find other tagged pointers that they can copy over a return address (spolier). If the tagged values also provide permissions (as CHERI does but the B5500 doesn’t) then it’s even harder: they need to find an existing executable pointer (and, with Morello, one that is permitted to be used as a return address) and copy it over a spilled return address.

      2. 4

        Much as I love Scheme, I’m a bit surprised they mess around with mutually self-hosting C and Scheme compilers. It would make more sense to build something minimal up from scratch directly in Assembly, like a Forth. A Forth core probably is small enough to assemble manually, and you can build it out from there, so you have code that’s verifiable without messing around too much. Then you could implement a basic C compiler in Forth and go up to TCC and GCC from there.

        1. 1

          According to the author’s comment on hackernews, they found forth harder: https://news.ycombinator.com/item?id=27577123