Good write-up. Early on, I’m not buying the second part. Things like this:

“For example, if you have a typed functional language with integers and function types, then building an abstract interpretation to do a sign analysis is simplified by the fact that you don’t have to worry about function values flowing into the integer domain.”

My reaction is that the things the author describes with type systems look like heuristics/annotations/constraints, static analysis can have these, and so it can do the same things. Then, we’re back to them being instances of the same concept like in the first set of claims.

I’m pretty sure I’ve seen tools use heuristics to look into or ignore certain kinds of things. Still prefer a specialist that builds these tools double check my claim about constraints.

It’s confusing why you would compare a type system which can fully describe a thing, to a heuristic which is partial coverage. The thing I would expect you to not buy is that one could actually do the labor to represent the complexity of a domain like rust with types. However if you did it, you would know because the type system could check for soundness. In the same way that if a function can take an A | B, you only have to handle the A, and B case and you have a fully well described function.

Now perhaps you might still want a specialist to come in afterwards to essentially check the soundness of the compiler, however their attack surface is vastly reduced. It’s like trying to find buffer overflow exploits in C# that doesn’t use the unsafe keyword. Its possible yes, but its a great deal harder than finding it in C code and happens when there are bugs in the compiler and not your specific program.

The thing I would expect you to not buy is that one could actually do the labor to represent the complexity of a domain like rust with types.

I don’t know math like proofs use or type theory. I work with those papers abstractly on a meta-research level across all the papers I read. If I was a hands-on practitioner, I would be calling out different things most likely. Just throwing that in there in case you haven’t seen me say it.

I’m honestly surprised that you would find any value in writings like this, then. The entire premise revolves around a deep connection between types and abstract interpretation, and the blog post starts out by saying that it is collecting context and folklore around that connection besides that in Types as Abstract Interpretations.

Again, let’s start by imagining a simply-typed lambda calculus whose types are integers and functions. First, the easy idea: we can generalize integers by any property of integers, and we can mechanically derive the corresponding abstract interpretation. Some properties:

Integers are negative, zero, or positive. This yields the domain of signs, mentioned before and in the post. This domain is a classic example, and sometimes is even useful.

Integers are even or odd, yielding a Boolean property.

Integers are prime or non-prime. This domain can be useful occasionally.

In fact, every integer has a unique prime factorization, which can again be abstracted over.

But there are also, as the blog post discusses, ways to view types as grammatical, as limiting not just the possible operations and information, but also the possible valid ways to use the values of types. Again on the integers:

Integers can be fixed to any finite domain by taking some natural number as a modulus. In particular, when we fix powers of (powers of) two, we get words, as in the typical CPU register.

Integers can be highly predictable or not. The typical integer obeys a form of Zipf’s Law for predicting its bits, but some integers are high-entropy and defy this law. This leads to type-driven analysis for whether a value has been hashed or is otherwise cryptographic material.

Combining earlier bullets, the Booleans can be defined on the integers by fixing 2 as a modulus; this is the even/odd property.

Does this make sense? Instead of storing integers, we store only their signs, and then run the interpreter as normal, getting an abstract result.

The main insight of types as abstract interpretations is that we can start from an untyped lambda calculus, rather than a simply-typed lambda calculus, and then imagine abstract interpretations as mappings from untyped values to typed values. We can send every untyped value to either an integer, a function, or some other thing. Thus we can send the entire interpreter too! Instead of storing untyped boxed values, we store only integers, functions, or opaque tags indicating some mysterious value.

Perhaps this sounds familiar, if you know the internals of any Smalltalk relative. They tend to have class-based designs, where there is a class of integers, a class of Unicode strings, a class of sequences, a class of maps, a class for each builtin behavior, and finally a class for user-defined values and behaviors. This seemingly-ad-hoc hierarchy is actually precisely an abstract interpretation of an object-oriented un(i)typed domain. When Python or Ruby print user-defined objects as opaque identifiers, they are indicating that the abstract values that they represent are not rich enough to have a printing behavior, because the type system is not rich enough to force them to have such a behavior, because the abstraction isn’t rich enough. (I did not mistype here; this is an if-and-only-if, two-way correspondence.)

Ah thank you yes that makes a great deal more sense. I’m a lay person who dabbles in the type theoretic space, but the foundations are relatively sound. Propositions as types are well evidenced, and code as propositions is also well evidenced (curry howard correspondence). If I understand the article, and I may not… given that we have a sound foundation for Code -> Propositions, and Propositions -> Types, we can therefore represent Code -> Types. This allows us to do the kind of static checking we do with types but on code, and would add additional layers of static analysis that were previously intractable.

Suppose that we have a typed functional language where every type is either the type of integers, or a function from one type to another. Say, a simply-typed lambda calculus. Then what the example paragraph is saying is that we can also have a typed functional language where every type is either the type of integer signs (positive, negative, zero), or a function from one type to another, and we can make a straightforward structured mapping which sends the first language to the second, by sending every integer to its sign and every function to a recursively-remapped version of itself. Correctness is not a concern; the mapping is correct up to abstract interpretation, and we’re trying to do abstract interpretation, so we’re done.

I think that the reason that types-as-grammar isn’t popular is because popular programming languages don’t have much room for additional abstract properties. Proving arbitrary facts usually requires runtime involvement, and so not everything can be done by the compiler and linter. Taken in combination, these two ideas suggest that the typical popular programming language (say, C++) might have limited ability to prove custom facts to the compiler (constexpr) but lacks a general-purpose subsystem for creating entirely new type systems.

If 3(i)is undefined, then we should assume f(x){for(i=0;i<69;++i)x(i)} to be undefined for f(3) but what then if we define 3(i)[1]? Do we just have this problem someplace else?

[1]: q does this; applying to an integer means write (to file descriptor/handle)

If we have a 3(i) then we have a 3(i) and it’s not undefined. Perhaps 3(i) applies i 3 times, giving (i + i + i). Depends on how you defined what ℤ(x) is.

I only really understood about 80% of this by volume, but I’m still glad I read it.

Good write-up. Early on, I’m not buying the second part. Things like this:

“For example, if you have a typed functional language with integers and function types, then building an abstract interpretation to do a sign analysis is simplified by the fact that you don’t have to worry about function values flowing into the integer domain.”

My reaction is that the things the author describes with type systems look like heuristics/annotations/constraints, static analysis can have these, and so it can do the same things. Then, we’re back to them being instances of the same concept like in the first set of claims.

I’m pretty sure I’ve seen tools use heuristics to look into or ignore certain kinds of things. Still prefer a specialist that builds these tools double check my claim about constraints.

It’s confusing why you would compare a type system which can fully describe a thing, to a heuristic which is partial coverage. The thing I would expect you to not buy is that one could actually do the labor to represent the complexity of a domain like rust with types. However if you did it, you would know because the type system could check for soundness. In the same way that if a function can take an A | B, you only have to handle the A, and B case and you have a fully well described function.

Now perhaps you might still want a specialist to come in afterwards to essentially check the soundness of the compiler, however their attack surface is vastly reduced. It’s like trying to find buffer overflow exploits in C# that doesn’t use the unsafe keyword. Its possible yes, but its a great deal harder than finding it in C code and happens when there are bugs in the compiler and not your specific program.

I don’t know math like proofs use or type theory. I work with those papers abstractly on a meta-research level across all the papers I read. If I was a hands-on practitioner, I would be calling out different things most likely. Just throwing that in there in case you haven’t seen me say it.

I’m honestly surprised that you would find any value in writings like this, then. The entire premise revolves around a deep connection between types and abstract interpretation, and the blog post starts out by saying that it is collecting context and folklore around that connection besides that in Types as Abstract Interpretations.

Again, let’s start by imagining a simply-typed lambda calculus whose types are integers and functions. First, the easy idea: we can generalize integers by any property of integers, and we can mechanically derive the corresponding abstract interpretation. Some properties:

But there are also, as the blog post discusses, ways to view types as

grammatical, as limiting not just the possible operations and information, but also the possible valid ways to use the values of types. Again on the integers:Does this make sense? Instead of storing integers, we store only their signs, and then run the interpreter as normal, getting an abstract result.

The main insight of types as abstract interpretations is that we can start from an untyped lambda calculus, rather than a simply-typed lambda calculus, and then imagine abstract interpretations as mappings from untyped values to typed values. We can send every untyped value to either an integer, a function, or some other thing. Thus we can send the entire interpreter too! Instead of storing untyped boxed values, we store only integers, functions, or opaque tags indicating some mysterious value.

Perhaps this sounds familiar, if you know the internals of any Smalltalk relative. They tend to have class-based designs, where there is a class of integers, a class of Unicode strings, a class of sequences, a class of maps, a class for each builtin behavior, and finally a class for user-defined values and behaviors. This seemingly-ad-hoc hierarchy is actually precisely an abstract interpretation of an object-oriented un(i)typed domain. When Python or Ruby print user-defined objects as opaque identifiers, they are indicating that the abstract values that they represent are not rich enough to have a printing behavior, because the type system is not rich enough to force them to have such a behavior, because the abstraction isn’t rich enough. (I did not mistype here; this is an if-and-only-if, two-way correspondence.)

Ah thank you yes that makes a great deal more sense. I’m a lay person who dabbles in the type theoretic space, but the foundations are relatively sound. Propositions as types are well evidenced, and code as propositions is also well evidenced (curry howard correspondence). If I understand the article, and I may not… given that we have a sound foundation for Code -> Propositions, and Propositions -> Types, we can therefore represent Code -> Types. This allows us to do the kind of static checking we do with types but on code, and would add additional layers of static analysis that were previously intractable.

Suppose that we have a typed functional language where every type is either the type of integers, or a function from one type to another. Say, a simply-typed lambda calculus. Then what the example paragraph is saying is that we can also have a typed functional language where every type is either the type of integer signs (positive, negative, zero), or a function from one type to another, and we can make a straightforward structured mapping which sends the first language to the second, by sending every integer to its sign and every function to a recursively-remapped version of itself. Correctness is not a concern; the mapping is correct up to abstract interpretation, and we’re trying to do abstract interpretation, so we’re done.

I think that the reason that types-as-grammar isn’t popular is because popular programming languages don’t have much room for additional abstract properties. Proving arbitrary facts usually requires runtime involvement, and so not everything can be done by the compiler and linter. Taken in combination, these two ideas suggest that the typical popular programming language (say, C++) might have limited ability to prove custom facts to the compiler (

`constexpr`

) but lacks a general-purpose subsystem for creating entirely new type systems.If

`3(i)`

isundefined, then we should assume`f(x){for(i=0;i<69;++i)x(i)}`

to be undefined for`f(3)`

but what then if we define`3(i)`

[1]? Do we just have this problem someplace else?[1]: q does this; applying to an integer means write (to file descriptor/handle)

If we have a 3(i) then we have a 3(i) and it’s not undefined. Perhaps 3(i) applies i 3 times, giving (i + i + i). Depends on how you defined what ℤ(x) is.