Nice idea! The thing I love about C is that all sorts of programming languages can talk to its ABI and that it fits well with low-level programming. However, it is a cumbersome and unsafe language. This seems like a very nice solution: you can write in a much safer language while producing C code that does not look too alien relative to what you wrote.
I have been interested in Rust but think it looks a tad too complicated. This may be a happy in-between.
As an aside, what’s this called? Just like JavaScript-transpiled languages are good for the frontend web people who don’t want to use JavaScript but can’t move from JavaScript, and VMs are good for the operating-systems people who don’t want to use Unix but can’t move from Unix, ZZ is good for the closer-to-hardware people who don’t want to use C but can’t move from C. Never mind the imperfect analogy: is there a name for this class of ‘solutions’?
The only thing I wish is that the syntax were more C-like and less Rust-like. I also wish it didn’t depend on the Rust runtime. Other than that it’s nice, and I’ve been thinking about doing something similar for a while, although syntax-wise more in the C-with-ML direction than the Rust direction.
I was actually thinking about what a language like this would look like some days ago; kudos for your work.
The standard library is fully stack-based and heap allocation is strongly discouraged. ZZ has convenience tools to deal with the lack of flexibility that comes with that, such as checked tail pointers.
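For readers who haven’t seen the feature: below is a rough sketch in plain C of the shape a checked tail might take. It is my illustration of the idea, not ZZ’s actual emitted code, and all the names are made up.

    /* Illustration only, not ZZ's emitted C. A type with a "tail" is
       roughly a header plus trailing storage, with the tail capacity
       carried alongside so bounds conditions can be checked. */
    #include <stdio.h>
    #include <stddef.h>

    typedef struct {
        size_t len;
        char   mem[];              /* the tail */
    } Str;

    typedef struct {               /* concrete, stack-allocatable
                                      instance with a 100-byte tail */
        size_t len;
        char   mem[100];
    } Str100;

    /* The tail capacity travels with the pointer, so a checker can
       require the precondition len + 1 < tail at every call site. */
    int str_push(Str *s, size_t tail, char c) {
        if (s->len + 1 >= tail) return 0;   /* runtime fallback check */
        s->mem[s->len++] = c;
        s->mem[s->len] = '\0';
        return 1;
    }

    int main(void) {
        Str100 buf = {0};
        /* common C idiom; a real compiler would emit layout-compatible code */
        str_push((Str *)&buf, sizeof buf.mem, 'z');
        printf("%s\n", buf.mem);
        return 0;
    }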
I wonder how straightforward it would be to add optional borrow checking and emit destructors in C.
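Emitting destructors at least seems mechanically simple: insert the cleanup call on every exit path (GCC and Clang already expose a similar mechanism with __attribute__((cleanup))). A hand-written sketch of what generated C might look like, with hypothetical names:

    /* Hypothetical example of compiler-inserted cleanup; the names
       are made up and this is not ZZ output. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { char *buf; } Res;

    static void res_drop(Res *r) {     /* the "destructor" */
        free(r->buf);
        r->buf = NULL;
    }

    int work(int fail_early) {
        Res r = { malloc(64) };
        if (r.buf == NULL) return -1;  /* nothing live yet, no drop */
        if (fail_early) {
            res_drop(&r);              /* inserted before this exit */
            return -1;
        }
        puts("working");
        res_drop(&r);                  /* inserted before normal exit */
        return 0;
    }

    int main(void) { return work(0); }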
I see it uses SMT solvers to check the annotations. I know that Z3 is quite impressive, but I am interested in how scalable this would be. Does the language design ensure these checks are kept ‘local’ (i.e. adding N definitions/calls adds O(N) time to the build), or can the constraints interact over a longer distance (potentially causing quadratic or exponential time increases)? I’d also like to know if there are common failure modes, e.g. where some common code pattern can’t be solved without extra annotations.
For example, the classic “hello world” for dependent types is to define Vector n t (lists containing n elements of type t), and the function append : Vector n t -> Vector m t -> Vector (plus n m) t whose output length is the sum of the input lengths. Woe betide anyone who writes plus m n instead, as they’d have to prove that plus commutes!
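To make the trap concrete, here is a minimal sketch in Lean 4 (my illustration, not anything from ZZ). Lean’s Nat.add recurses on its second argument, so the orientation that typechecks by computation is m + n; pick n + m and you are immediately proving lemmas about plus:

    inductive Vec (α : Type) : Nat → Type where
      | nil  : Vec α 0
      | cons : α → Vec α n → Vec α (n + 1)

    -- This orientation typechecks by computation alone: `m + 0` and
    -- `m + (k + 1)` reduce, because Nat.add recurses on its second
    -- argument.
    def append : Vec α n → Vec α m → Vec α (m + n)
      | .nil,       ys => ys
      | .cons x xs, ys => .cons x (append xs ys)

    -- Index the result by `n + m` instead and neither case reduces:
    -- you must rewrite with Nat.zero_add / Nat.succ_add, i.e. prove
    -- facts about addition rather than just compute.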
With SMT-based verification, each property is proved separately and assumed true when proving the others, so there’s no problem with locality. The problems are very different from dependent types: SMT in full generality is undecidable, so unless the tool is very careful to stay in a decidable logic (like Liquid Haskell is), you’ll very often see the solver just time out with no solution, which is pretty much undebuggable. It’s also difficult to prove anything through loops (you have to write loop invariants), etc.
In this case, SMT isn’t undecidable: a friend pointed out the write!(solver,"(set-logic QF_UFBV)\n").unwrap(); line to me, which means “quantifier free with uninterpreted functions and bitvectors”. It’s decidable, just super hard.
Yeah, if you can get away with quantifier-free, everything is much better. But you can’t do much with C code without quantifiers. zz probably doesn’t support verifying loops.
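To make the loop point concrete, a small C illustration (mine, not from ZZ): the straight-line function can be checked with a single quantifier-free bitvector query, while the loop tells a checker nothing until someone supplies an invariant:

    #include <assert.h>
    #include <stdint.h>

    /* Straight-line code: one quantifier-free bitvector query (QF_BV)
       can decide the asserts by following the single path. */
    uint32_t mid(uint32_t lo, uint32_t hi) {
        assert(lo <= hi);
        uint32_t m = lo + (hi - lo) / 2;   /* avoids the (lo+hi)/2 overflow */
        assert(lo <= m && m <= hi);
        return m;
    }

    /* A loop: the final assert is true, but a per-query checker can't
       see it without being told the invariant that holds on every
       iteration, here sum == 3u * i. That annotation burden is the
       "loop invariants" problem mentioned above. */
    uint32_t triple(uint32_t n) {
        uint32_t sum = 0;
        for (uint32_t i = 0; i < n; i++)
            sum += 3;                      /* invariant: sum == 3u * i */
        assert(sum == 3u * n);             /* holds even with wraparound */
        return sum;
    }

    int main(void) { mid(2, 8); triple(5); return 0; }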
> Does the language design ensure these checks are kept ‘local’ (i.e. adding N definitions/calls adds O(N) time to the build), or can the constraints interact over a longer distance (potentially causing quadratic or exponential time increase)?
I’m also interested in this in terms of API stability. If interactions happen at a longer distance it can be difficult to know when you are making a breaking change.
Z3 “knows” addition commutes, so that’s no problem. The usual trouble with dependent types is that you define addition yourself, so it is hard for Z3 to see that your addition is in fact addition.
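Spelled out in my notation: the built-in addition comes with its theory, a user-defined one comes only with its defining equations, and bridging the gap takes induction:

    % Part of the solver's arithmetic theory, free of charge:
    \forall n\, m.\; n + m = m + n
    % A user-defined plus only hands the solver its defining equations:
    \mathrm{plus}(0, m) = m \qquad \mathrm{plus}(n + 1, m) = \mathrm{plus}(n, m) + 1
    % Bridging the two, plus(n, m) = plus(m, n), needs induction on n,
    % which an SMT solver will not perform unprompted.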
Yes, I gave that example for dependent types (the type requiring a different sort of branching than the computation) since I’m more familiar with them.
The general question was about the existence of pathological or frustrating cases in this SMT approach (either requiring much more time, or extra annotations) arising from very common assertion patterns, or where the “obvious” approach turns out to be an anti-pattern.
This looks like a slightly more palatable alternative to ATS. (I suspect ATS enables more powerful proofs.) I will definitely be looking into it. I love the idea of ATS, but have found actually writing code in it rather daunting.
(While we’re on the subject of practicality of safe C alternatives: I love Rust, but in many cases where I’ve tried to use it, I’ve found that there’s a few days’ worth of yak shaving up front trying to wrap existing C libraries/APIs that I need to use to solve whatever problem it is; then wrapping those unsafe interfaces with safe, rustic libraries, etc. - in the end I’ve often given up and banged out some C++ in a day or two and made up for it with fuzzing, ASan, UBSan, and unit tests. Of course, this is probably to a large extent a function of trying to use libraries that other rustaceans haven’t used [much] and having 20 years of C++ experience…)
Because “formally provable” is too vague:
ZZ guarantees that compiled programs: […]
The module system, tail types and the syntax are quite nice, imo :)
Does it depend on the Rust runtime? The readme states explicitly that it just depends on having a C compiler for the target platform.
We need a “top” implementation in it.
Arvid saw this joke on the Reddit thread, and went ahead and implemented it:
zztop
Awesome!
I agree (that loops are the hard part), unless ZZ has its own induction system for loops, using SMT only to prove the inductive hypothesis. It’s a lot of work though.
Could something like this be accomplished in Zig via comptime?
In theory, you could do it with any static language if you write a verification condition generator to integrate with Why3 or Boogie; they do the proving. A language with metaprogramming might need an initial pass that does compile-time evaluation. Metaprogramming and dynamic languages are difficult in general. Worst case, you can use subsets and/or annotations to aid the analyses.
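For a flavor of what a verification condition generator does (a textbook weakest-precondition step, my example, not tied to any particular tool):

    % Weakest precondition of an assignment:
    wp(x := e,\; Q) \;=\; Q[e/x]
    % For the program  x := x + 1; assert x <= 10  the generator emits:
    wp(x := x + 1,\; x \le 10) \;=\; (x + 1 \le 10)
    % and that formula, not the program, is what Why3/Boogie pass on
    % to the SMT solver.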
That reminds me of the different approaches to handling declarative dependencies in Nix (in my case that’s mostly Haskell libraries with version bounds):
One approach is to have our Nix function (e.g. buildHaskellPackage) implement a constraint solver, which reads in version bounds from each version of each dependency, and picks a mutually compatible set of dependencies to build.
A more practical approach is to just shell out to an existing solver (cabal in Haskell’s case) and parse its output.
Whether such build-time analysis is performed “within” the language via macros, or externally as a separate step of the build process, the same solvers can be called and the end result is the same (for checkers like this, there’s also nothing to parse: if a solution is found then we throw it away and carry on to the next step, if not we exit with an error message).
I used to dislike the thought of performing I/O at compile time, but I’m seeing more and more compelling use cases: shelling out to competent solvers is one; “type providers” like F#’s are another (where types can be generated from some external source of truth, like a database schema, ensuring out-of-date code fails to build). One I’ve used recently was baking data into a binary, where a macro read it from a file (aborting on any error), parsed it, built a data structure with efficient lookups, and wrote that into the generated AST to be compiled. This reduced the overhead at runtime (this command was called many times) and removed the need for handling parse errors, permission denied, etc.
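A C-flavored sketch of the same trick, with hypothetical file and table names: a build step generates the already-parsed structure as source, so the runtime never touches the original data file.

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical sketch. Imagine this table lives in lookup.h,
       emitted at build time by a generator that reads data.txt,
       parses and validates it, and fails the build on any error: */
    static const struct { const char *key; int value; } TABLE[] = {
        { "alpha", 1 },
        { "beta",  2 },
        { "gamma", 3 },
    };

    /* At runtime there is no file I/O, no parsing, and no error
       handling: bad data already failed the build. */
    static int lookup(const char *key) {
        for (size_t i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++)
            if (strcmp(TABLE[i].key, key) == 0)
                return TABLE[i].value;
        return -1;
    }

    int main(void) { printf("%d\n", lookup("beta")); return 0; }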
Yeah, integration with external tools can help in all kinds of ways. The simplest is static analysis to find code-level issues the compiler can’t find. I really like your idea of baking the data into a binary. It’s like the old idea of precomputing what you can, mixed with synthesis of efficient data structures. That’s pretty awesome.
Actually, I’ve been collecting, and occasionally posting, stuff like that for formal methods. Two examples were letting someone specify a data structure in a functional way, or modify/improve loops; then an external tool does a pass to make an equivalent, high-performance, imperative implementation of either and dumps it out as code. Loop and data structure examples. Imperative/HOL’s technique, if generalizable, could probably be applied to languages such as Haskell and Rust.