1. 36
  1. 10

    It’s a progression from ASM to LISP to a full language design. A prior submission didn’t get much discussion. Figured some new Lobsters might like the concept, especially for incremental building of compilers/languages. The Hacker News thread below had some interesting comments, too.

    https://news.ycombinator.com/item?id=9699065

    1. 2

      Very much so, thanks! I’ve been ruminating on Ken Thompson’s trust problem… I suppose starting in asm is one way around that issue.

      1. 5

        There are two angles here: trusting your compiler or tools not to be subverted, and trusting your hardware not to be. Both require the same concepts to solve.

        Paul Karger invented the so-called Thompson attack in the MULTICS security evaluation he sent to Thompson and the others. The solutions that he, Roger Schell, & other inventors of INFOSEC proposed were called high-assurance security. I just noticed via your link that they actually deployed the first version of that (in SCOMP) before Thompson even wrote about that one problem. Karger et al didn’t play. The solutions collectively became the certification criteria for security called the TCSEC, with its “Rainbow Books.”

        https://www.acsac.org/2002/papers/classic-multics.pdf

        https://en.wikipedia.org/wiki/Rainbow_Series

        So, here’s a summary of it restricted to the relevant parts. The system, including everything privileged, has to be formally specified in terms of requirements, design, & a precise definition of safety/security for it. Some methods, like VDM, are close to programming languages if the more mathematical ones are too hard. The basic points are that no ambiguity is allowed, all states (including failure) are accounted for, and specs/models map to the source code so their analyses are meaningful. The source itself is modular (info hiding a la Parnas), has interface checks for its assumptions (Dijkstra & Hoare), keeps the trusted part to a minimum (Hansen), is layered with no looping of controls (predictability), uses a safe language/CPU if possible (Barton), has individual artifacts traceable to specific requirements/design (no dead code or backdoors), and has everything tested against the spec & security policy. This needed configuration management, with physical and technical protection, and all changes by any developer checked for failure or malice. The system was protected cryptographically or physically in transit to customers, with the option of on-site generation from source using standard, local tools w/ a rerun of tests, proofs, etc.
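
        To make the “interface checks for its assumptions” point concrete, here’s a minimal Python sketch of that style of checking. The function and its contracts are hypothetical illustrations, not taken from any of the systems above:

        ```python
        # Hypothetical module routine: it validates its preconditions and
        # postconditions explicitly, so a violated assumption fails loudly
        # instead of silently corrupting state downstream.
        def copy_into_buffer(buf: bytearray, data: bytes, offset: int) -> None:
            # Preconditions: the caller's obligations, checked rather than assumed
            assert 0 <= offset <= len(buf), "offset out of range"
            assert offset + len(data) <= len(buf), "write would overflow the buffer"

            buf[offset:offset + len(data)] = data

            # Postcondition: the routine's obligation back to the caller
            assert buf[offset:offset + len(data)] == data, "copy failed its own check"

        b = bytearray(8)
        copy_into_buffer(b, b"hi", 3)   # fine
        copy_into_buffer(b, b"hi", 7)   # AssertionError: would overflow
        ```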

        Those systems had low odds of subversion, on top of NSA pentesters accomplishing nothing over 2-5 years of hitting a few of them. Quite the contrast to how Karger et al shredded MULTICS on every level. The methods proved out. Today, we have multiple paths to go from those lessons learned: (a) high-assurance compilers like CompCert or CakeML that go from source to object code with mathematical proof of correctness as above; (b) apply formal or rigorous informal methods to a simplified interpreter (like Wirth’s P-code, Mini-ML, or Scheme) that the rest is built in; (c) a pencil-and-paper method of the same, where it’s done in simple functions or FSM’s checked by computers, then produced from source by hand; (d) any of the above on diverse hardware from mutually-suspicious parties checking that all results are the same, for probabilistic security. After the initial compiler is bootstrapped, use it to compile an extended version of itself with optimizations added in a modular, optional way. That it’s laborious work using an unpopular development style & tools is why most “solutions” in FOSS for anti-subversion use hugely-complicated compilers written in C/C++ w/ maybe reproducible builds. Many ways are available, especially in optimization passes, to sabotage the source of apps, especially by disabling security checks. Accidental and intentional examples were published as proof.
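
        Path (d) can be sketched in a few lines of Python. Everything here is made up for illustration (file paths, party names): each mutually-suspicious party runs the same deterministic build on their own toolchain and hardware, then all parties compare hashes of the output:

        ```python
        import hashlib

        def sha256_of(path: str) -> str:
            # Hash the artifact a party's build produced
            with open(path, "rb") as f:
                return hashlib.sha256(f.read()).hexdigest()

        # Each party reports the hash of the binary their environment built
        reports = {
            "party_a_gcc_x86":   sha256_of("builds/a/compiler.bin"),
            "party_b_clang_arm": sha256_of("builds/b/compiler.bin"),
            "party_c_tcc_riscv": sha256_of("builds/c/compiler.bin"),
        }

        if len(set(reports.values())) == 1:
            print("identical outputs; subversion would need every party colluding")
        else:
            print("mismatch, investigate:", reports)
        ```

        Note this only helps if the build is deterministic, which is exactly the reproducible-builds requirement mentioned above.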

        For hardware, you have a similar problem, but it’s maybe easier to handle on the formal-verification side due to booleans & FSM’s. It’s been done several times, from CLI’s stack (FM9001) to Rockwell’s AAMP7G to the open-ish VAMP (DLX-like). My solution is a JOP- or Forth-like processor built on a 350-500nm node that’s inspectable with a microscope against the supposed layout. It’s specified as an abstract state machine, refined into FSM’s, and eventually into gates. Equivalence-check each step formally and with tests. Design it formally or with careful, manual methods. The software doing the trusted steps of this process is verified, so that only the initial input and final results need to be verified by eye for the hardware part. Run it in diverse toolchains on diverse hardware as in (d). Order 30-50 chips with a random sample tested with a logic analyzer & torn down for visual inspection. The rest are probably OK. Physically protect them in many places so they can be trusted to make the next set of hardware and software, with each person’s pile of vetted chips adding to a distributed assurance case when they output identical analyses or binaries. Do all the above in fast, state-of-the-art hardware & software for fast iterations, with the safe stuff basically acting as a checker & root of trust for final production.
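
        As a toy illustration of the “equivalence check each step” idea, here’s a Python sketch with a made-up 2-bit counter standing in for the machine. Real refinements involve far larger state spaces and proof rather than enumeration, but the shape is the same:

        ```python
        def spec(state: int, inp: int) -> int:
            # Abstract model: a counter modulo 4
            return (state + inp) % 4

        def impl(state: int, inp: int) -> int:
            # "Refined" version: same behavior via bit masking, closer to
            # how a gate-level design would realize it
            return (state + inp) & 0b11

        # Exhaustive equivalence check over every state/input pair,
        # feasible here because the spaces are finite and tiny
        for s in range(4):
            for i in range(4):
                assert spec(s, i) == impl(s, i), f"diverge at state={s}, input={i}"
        print("refinement matches the abstract spec on all state/input pairs")
        ```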

        So, that’s how you do high-assurance systems that counter subversion at the hardware and software level. It’s how it’s been done since the 70’s in software, with hardware assurance at CLI starting in the 1980’s. Commercial and FOSS tools are available to help with the task, along with even free books online. Commercial & FOSS products/projects have used them successfully. Supplier diversity, obfuscation, & equivalent results are most important on the hardware side. Strong design-for-review with storage integrity & transmission security, built on simple, safe tooling, matters most on the software end. Prior work includes CompCert, Verisoft (VAMP + C + OS), Rockwell (models + SPARK + AAMP7G), Myreen et al’s stacks, the OP link in terms of comprehensibility, and so on. People just have to apply what’s worked & been improved since the 1960’s, if you count the basics of design & implementation. It sounds tough, but you’re countering human malice in hardware and software plus regular failures.

        Meanwhile, I’m working on high-level schemes to do (b) to (d) above, since my broken brain can’t handle learning formal verification just yet. I have three potential ways to do it so far. Still revising them. I’ll pass them on to specialists once they look doable.

        1. 2

          So if I got it right: even if you can corrupt the compiler, there are a lot more barriers to pass before you can do any harm.

          Thanks for the context and the explanations.

          1. 2

            Ideally. Right now it’s the opposite: they have tons of things they’ll break before trying to corrupt a compiler. The compilers will also break your stuff on occasion as a side effect of how they’re developed. Even Karger focused on all that stuff, since it happens a lot. The compiler attack was a brief “that too” comment.

            You can message me if there’s anything you’re unclear on. I try to respond to all inquiries if they don’t take too much work.

        2. 2

          Welllll…consider if the processor is designed to recognize memory pages or instruction sequences and to do the needful.

          There’s no escape from this existential horror. Sorry. :( :(

          1. 1

            I’m hopeful that free and open implementations of RISC-V will help solve that problem. :)

            1. 1

              What do you mean by that?

              1. 5

                I understood it to be the suggestion that there could be a CPU backdoor designed not to be visible to tools at higher layers.

                Ken Thompson’s Reflections on Trusting Trust described a plausible mechanism for creating a backdoor in a compiler which would notice when the compiler was compiling itself and make sure the backdoor was inserted. By design, once this was present in the binary, it could be removed from the source while continuing to exist in all future binaries.
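
                Reduced to its bare logic, the mechanism looks something like this Python sketch; the trigger strings, helpers, and payload are hypothetical stand-ins, not the actual attack:

                ```python
                def honest_compile(source: str) -> str:
                    # Stand-in for real compilation; here it just passes text through
                    return source

                def inject_self_reproducing_payload(source: str) -> str:
                    # Stand-in for the quine-like step that re-embeds the attack logic
                    return "# [attack logic re-inserted here]\n" + source

                def subverted_compile(source: str) -> str:
                    # Trigger 1: compiling the login program -> slip in a backdoor
                    if "check_password" in source:
                        source = source.replace("check_password", "backdoored_check")
                    # Trigger 2: compiling a compiler -> re-insert this very logic,
                    # so the attack survives in binaries after the source is cleaned
                    if "def compile(" in source:
                        source = inject_self_reproducing_payload(source)
                    return honest_compile(source)
                ```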

                For a CPU, the source code is VHDL, and it doesn’t compile itself, but it’s certainly present as a substrate… and at any rate, most CPUs are closed-source. Due to the difficulty of recognizing what high-level program is running from such low-level inspection, a CPU backdoor would probably not try for the self-perpetuating property, but rather focus on implementing whatever actual payload the creator wanted.

                In the short term, I’d be more worried about the front doors of the Intel Management Engine and the AMD equivalent. As long as there’s openly a mechanism for running signed, unauditable code on a separate core where the OS can’t see or interfere with it, I would expect any adversary with the resources to create a backdoor won’t bother.

                1. 4

                  I thought as much but wanted to make sure before writing a big reply. The two possibilities you brought up are common enough concerns that they’re worth addressing immediately. I posted my main reply upthread so it and any replies don’t get buried in nesting.

                  https://lobste.rs/s/nognrl/creating_language_using_only_assembly/comments/0wqbrv#c_0wqbrv

        3. 6

          Wow, very cool and impressive. I feel like such a hack now, implementing my language in C.

          1. 4

            Your comment on the thread for personal projects is why I submitted it. I was going to message you but figured you’d see it anyway. Maybe it’ll help you.

            1. 3

              Thank you! Yes, this has given me new ideas. :)

          2. 3

            This is a great example of bootstrapping from assembly.
