1. 9

  2. 3

    I disagree with the premise of “Undefined Behavior != Unsafe Programming”. Regehr argues:

    Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust.

    I agree with the problem presented: some operations have safety pre-conditions, and there needs to be a mechanism to separate checking those pre-conditions (whether dynamically or statically) from executing the operation itself. I disagree that C-style UB is the solution here.

    What would I propose? Extending LLVM’s simple type system to dependent types, and having compilers actually produce proof terms (which can be erased for no performance penalty) to justify use of unsafe operations. If the check needs to be done dynamically, then pipe the proof produced by the check to where the operation is performed. If the check is done statically, then at some point the compiler proved that the operation was safe, so modify the compiler to actually produce that proof! Don’t just throw it away!
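    Concretely, here is a minimal Haskell sketch of the dynamic case (BVector, BIndex, checkIndex, unsafeRead', and withBrand are all made-up names for illustration). The phantom brand s plays the role of the proof term: the only way to obtain a BIndex s is to pass the dynamic bounds check, the unchecked read demands one in its type, and the whole thing erases to a plain Int at runtime:

    ```haskell
    {-# LANGUAGE RankNTypes #-}

    import qualified Data.Vector as V

    -- A vector branded with a phantom type s, plus an index carrying
    -- evidence (the brand) that it is in bounds for *that* vector.
    newtype BVector s a = BVector (V.Vector a)
    newtype BIndex s    = BIndex Int  -- constructor kept private: the "proof"

    -- The dynamic check is the only way to mint a BIndex s.
    checkIndex :: BVector s a -> Int -> Maybe (BIndex s)
    checkIndex (BVector v) i
      | i >= 0 && i < V.length v = Just (BIndex i)
      | otherwise                = Nothing

    -- The unchecked operation states its precondition in its type.
    unsafeRead' :: BVector s a -> BIndex s -> a
    unsafeRead' (BVector v) (BIndex i) = V.unsafeIndex v i

    -- Rank-2 brand introduction (ST-style) keeps evidence for
    -- different vectors from being mixed up.
    withBrand :: V.Vector a -> (forall s. BVector s a -> r) -> r
    withBrand v k = k (BVector v)
    ```

    This is just the “lightweight static capabilities” trick rather than full dependent types, but it shows the shape: check once, carry the evidence, and execute the unsafe operation with no re-check and no UB.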

    The FP community went through a similar phase in the ’90s and mid ’00s. After an extremely complicated type-inference step, frontends would throw away all the evidence constructed that the program was type safe, and would operate on an untyped intermediate form (I think OCaml still works like this, but I’m not sure). It turns out this is a bad idea: you lose nothing by keeping the proof terms around, and you gain a lot of confidence in your compiler if it can preserve type safety through N stages of optimization. A similar argument applies to LLVM: by keeping proof terms around, you can detect (more) internal compiler errors (ICEs), which would be quite beneficial for LLVM.
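    As a toy illustration of what a typed IR buys you (a made-up miniature IR, not GHC’s or LLVM’s): the GADT index below is the “proof term” that survives every pass, so a transformation that would break typing is rejected by the compiler’s own type checker instead of silently miscompiling:

    ```haskell
    {-# LANGUAGE GADTs #-}

    -- A miniature typed IR: the index t records each term's type.
    data Expr t where
      IntLit  :: Int  -> Expr Int
      BoolLit :: Bool -> Expr Bool
      Add     :: Expr Int -> Expr Int -> Expr Int
      If      :: Expr Bool -> Expr t -> Expr t -> Expr t

    -- A constant-folding pass. Its type, Expr t -> Expr t, is a
    -- machine-checked statement that the pass preserves typing.
    constFold :: Expr t -> Expr t
    constFold (Add a b) =
      case (constFold a, constFold b) of
        (IntLit x, IntLit y) -> IntLit (x + y)
        (a', b')             -> Add a' b'
    constFold (If c th el) =
      case constFold c of
        BoolLit True  -> constFold th
        BoolLit False -> constFold el
        c'            -> If c' (constFold th) (constFold el)
    constFold e = e
    ```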

    1. 2

      and having compilers actually produce proof terms (which can be erased for no performance penalty) to justify use of unsafe operations.

      Not quite sure how that would work, because users of the language will often write deliberately unsafe code for doing low-level things, and then the burden of providing a proof is on them, which requires language changes.

      Besides, by adding dependent types here there’s a good chance it becomes even harder to transform the code. LLVM IR is meant to be shuffled around, and the limitations of its type system limit how much it can be shuffled. This is both a good and a bad thing: it may prevent invalid optimizations, but it may also prevent valid ones. I’m not sure how easy such a dependently typed system would be to transform; people may end up with a second-level IR which is easier to transform, which reintroduces the same problem.

      But yeah, it’s an interesting solution, which could work (it’s just not immediately obvious to me that it will).

      1. 1

        Not quite sure how that would work, because users of the language will often write deliberately unsafe code for doing low-level things, and then the burden of providing a proof is on them, which requires language changes.

        Generally, dependently typed systems have an escape hatch: Coq has the “admit” tactic (and “Admitted.”), Idris has the believe_me primitive. Unsafe languages can use these to bypass proof obligations, and safe languages can use them to move over incrementally to a safer intermediate representation.
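        In the branded-index sketch upthread, the escape hatch would be a one-liner living in the same trusted module as BIndex (assertInBounds is a made-up name), the moral equivalent of believe_me; every call site becomes a trusted assertion you can grep for:

        ```haskell
        -- Escape hatch: asserts in-bounds-ness without checking.
        -- Unsound if the caller is wrong, but explicit, local, and
        -- easy to audit.
        assertInBounds :: Int -> BIndex s
        assertInBounds = BIndex
        ```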

        Besides, by adding dependent types here there’s a good chance it becomes even harder to transform the code. LLVM IR is meant to be shuffled around, and the limitations of its type system limit how much it can be shuffled. This is both a good and a bad thing: it may prevent invalid optimizations, but it may also prevent valid ones. I’m not sure how easy such a dependently typed system would be to transform; people may end up with a second-level IR which is easier to transform, which reintroduces the same problem.

        It does indeed make transformations more difficult, because your manipulations essentially have to encode their own proof of type safety. You’ll need some sort of logic/proof subsystem for manipulating open terms, and yeah, the limitations of that logic will limit valid optimizations. You can again fall back to “admit”, though, and fill in the gaps of your logic/transformation later.
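        To make “manipulations encode their own proof” concrete with the toy Expr IR from my earlier comment: the rewrite below is only accepted because matching on Add supplies the evidence that both branches are Ints, so the compiler checks type preservation of the transformation for us:

        ```haskell
        -- Hoisting an If out of an Add (toy Expr GADT from upthread).
        -- If this rewrite could change a term's type, it would not
        -- compile; a case we couldn't justify is where we'd reach for
        -- an admit/believe_me-style coercion instead.
        pushIf :: Expr t -> Expr t
        pushIf (Add (If c a b) d) = If c (Add a d) (Add b d)
        pushIf e = e
        ```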

        1. 1

          Generally, dependently typed systems have an escape hatch: Coq has the “admit” tactic (and “Admitted.”), Idris has the believe_me primitive. Unsafe languages can use these to bypass proof obligations, and safe languages can use them to move over incrementally to a safer intermediate representation.

          This loops back into the original problem. Now you have a statement marked believe_me: what transformations are allowed on it? You would still need some kind of undefined behavior for this, though perhaps at a higher level.

          Then again, “some kind of UB at a higher level” is something we already have, so it’s still a strict improvement. Hmm.