1. 4

This seems like a super interesting article and I’m currently digging into it. At the outset, it isn’t clear to me why there is a space between the name of the function and its arguments.

The article says “Functions are called with a space between the name and parenthesis. This is done to simulate calling functions by juxtaposition, and it’s helpful for calling curried functions.” but that doesn’t actually help me understand anything. What does “calling functions by juxtaposition” mean, and why is the space helpful for calling curried functions?

1. 3

Hey, glad you find it interesting. The space isn’t required, and you could call it without a space there, but in languages like Haskell, where currying is more common than in JavaScript, functions are called by “juxtaposition”, where arguments are separated by a space. In JavaScript, calling a function requires parentheses, and calling curried functions would look like `f(x)(y)(z)`. I added that note there just to clarify the notation, because it’s not common to call functions this way, but I think `f (x) (y) (z)` looks nicer.

I’ll edit it to make it clear that it’s not required and purely there for aesthetic purposes, thank you!

1. 2

It’s not much. In some math texts it is common to write “f x” instead of “f(x)” - that is the juxtaposition, which means just “put things next to each other”. That notation is nice, less typing and easy to read, except when there are multiple arguments, function-valued expressions, or other forms of ambiguity. Does f x g y mean f(x,g,y) or f(x)(g(y)) or some other variant? “Currying” just means packing multiple arguments into one as a form of notational simplicity. e.g. f(x,y,z) is f(q) where we’ve done something to stuff x, y, z into q or into f itself, so add(a,b,c) could be add’(c) where add’(x) = add(a,b,x) or something. Seems pointless in this case.

If you can’t do `int x; x = f(0); x = g(x);` because your programming language does not have variable assignment, you need to either write `g(f(0))`, which can become too awkward with complex formulas, or have a fancy `;`, so `f(0) ;; g(_)` means “I will avert my eyes while the compiler does an assignment to a temp variable and pretend it didn’t happen”

1. 4

“Currying” just means packing multiple arguments into one as a form of notational simplicity

I don’t think that is accurate. Currying is about transforming functions that get multiple parameters into a sequence of functions, each of which takes one argument.

That is still a somewhat imprecise description, but I think an example would be more clarifying than going deeper into the theoretical details: If we take sum as non-curried function, it takes two integers to produce another one, the type is something like `(Int x Int) -> Int`, and in practice you call it like `sum(1, 2)` which evaluates to 3.

The curried version would be one function that given an integer `a` returned a function that given an integer `b` returned `a+b`. That is, sum is transformed into the type `Int -> (Int -> Int)`. Now to calculate 1 + 2 we should first apply sum to get a function that sums one, like `sum1 = sum(1)`, where `sum1` is a function with type `Int -> Int`; and then apply this new `sum1` function to 2, as in `sum1(2)` which returns 3. Or in short, dropping the temporary variable `sum1`, we can apply `sum(1)` directly as in `(sum(1))(2)`, and get our shiny 3 as result.

If your language uses space to denote function application, then you can write that last application as `(sum 1) 2`. Finally, if application is left associative, you can drop the parentheses and get to `sum 1 2`, which is, arguably, pleasant syntax.
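In JavaScript terms, here is a minimal sketch of the curried `sum` described above (a sketch for illustration, not code from the article):

```javascript
// Curried sum: apply one argument at a time.
const sum = (a) => (b) => a + b;

// Partial application: sum1 is, so to speak, of type Int -> Int.
const sum1 = sum(1);

console.log(sum1(2));   // 3
console.log(sum(1)(2)); // 3
```

In the space-application syntax, that last call is just `sum 1 2`.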

1. 3

because your programming language does not have variable assignment

Just use a different variable as you should in the first case as well and it works just fine.

``````
let x = f 0
    y = g x
in ?somethingWithY
``````

Also that’s a bad explanation of currying.

I will avert my eyes while the compiler does an assignment to a temp variable and pretend it didn’t happen

You have a negative reaction to FP for some reason which leads you to write these cheap jabs that are misinformative.

1. 2

I have a negative attitude to mystification. I don’t like it when reasonably simple programming techniques are claimed to be deep, mysterious mathematical operations. Your example is, of course, an example of why these “fancy semicolons” are needed when it is scaled up. Imagine how hard it would be to track a state variable, say an IO file descriptor, around a program if we had to do fd_1, fd_2, fd_n every time there was an i/o operation - keeping these in order would be painful. The “combinator” automates the bookkeeping.

The explanation of Currying is perfectly correct, I think, but I’d like to hear what you think I got wrong. There’s not much to it.

In much of mathematics, all of this is just a notational convention, not an endofunctor on a category of sets. The author, who is at least trying to make things clear, could have simply written:

The notation is simpler if we write “f x” to indicate function application instead of “f(x)”, otherwise it gets cluttered with parentheses. To avoid ambiguity with multiple arguments, only single-argument functions are allowed, and we take care of multiple arguments by using functions that produce functions as values: instead of f(x,y,z), the value of f1 x is a map f2, and f2 y produces a value f3, and f3 z then is equal to f(x,y,z). Equivalently, f(x,y,z) = ((f1 x) y) z.

I think FP is an interesting approach. I wish it could be treated as a programming style instead of as Category and Lambda Calculus Notational Tricks To Impress your Colleagues.

1. 2

Functions take only one argument. So it’s really taking an (x * y * z), or an ‘x’ and a ‘y’ and a ‘z’. Currying is taking a function that takes an (x * y * z) -> w and transforming it to (x -> (y -> (z -> w))), aka (x -> y -> z -> w). This is useful because it allows us to create little function generators by simply failing to provide all the arguments. This isn’t simply a notation difference, though, because they are completely different types, and this can matter quite a bit in practice. While, yes, there is a one-to-one correspondence between the two, that’s not the same as a “notational difference”. Tuples are fundamentally different objects than functions that return functions, and this difference matters on a practical level when you go to implement and use them. You can say that it’s simply a notational difference, but it’s untrue, and implicit conversions from one to the other do not mean they are “the same thing”.
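A small JavaScript sketch of that difference (the names here are illustrative, not from the thread):

```javascript
// Uncurried: takes an (x * y * z)-style argument list in one call.
const add3 = (x, y, z) => x + y + z;

// Curried: (x -> (y -> (z -> w))).
const add3c = (x) => (y) => (z) => x + y + z;

// "Failing to provide all the arguments" yields a little function generator:
const addTen = add3c(4)(6); // a function still waiting for z

console.log(add3(1, 2, 3));  // 6
console.log(add3c(1)(2)(3)); // 6
console.log(addTen(5));      // 15
```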

1. 1

In Haskell functions may take only one argument. In mathematics and in programming languages, it depends.

1. 1

https://math.stackexchange.com/questions/2394559/do-functions-really-always-take-only-one-argument

In essence, it doesn’t depend. The notation depends, but the notation represents one thing: f(x,y,z,w) is an implicit tuple of x,y,z,w. Without this there’s no concept of a domain of a function. This is all philosophical waxing but once you talk about currying it starts to matter because it affects what things are possible because not all arguments are provided at once. You could argue that objects and mutability sidestep this but I’d also argue that mutation within a function begins to deviate from a well defined “mathematical” function. That may be fine for you, and that’s okay. However, since this conversation is primarily about definitions, and we talked specifically about the mathematical way of modeling programming with functions, y’know, it matters.

For example, in JavaScript the difference between f(x)(y)(z) and f(x,y,z) is literally different computations, and while there are times you can convert between the two freely, there are things that f(x)(y)(z) can do that f(x,y,z) cannot. For example, I can use f(x)(y) to create a callback to defer computation, because f(x)(y) returns a function which takes a “z”. That’s genuinely useful. Now, you can meaningfully argue that with objects you can do the same thing, and that’s great, but this is about functions and function passing. So it is actually meaningful to describe what you can and cannot do with these things.
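For instance (a hypothetical sketch; `log` and its arguments are made up for illustration), the partially applied `f(x)(y)` can be handed to other code as a deferred callback:

```javascript
// A curried function: each call returns a function awaiting the next argument.
const log = (level) => (tag) => (msg) => `[${level}] ${tag}: ${msg}`;

// f(x)(y) returns a function that still takes a "z" -- usable as a callback.
const warnNet = log("WARN")("net");

console.log(["timeout", "retrying"].map(warnNet));
// [ '[WARN] net: timeout', '[WARN] net: retrying' ]
```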

1. 2

This is all philosophical waxing but once you talk about currying it starts to matter because it affects what things are possible because not all arguments are provided at once. You could argue that objects and mutability sidestep this but I’d also argue that mutation within a function begins to deviate from a well defined “mathematical” function.

Very little in programming is a mathematical function. Even something like `(==) : a -> a -> bool` isn’t a function, as its domain would be the universal set.

Also, I don’t think all mathematical functions are curryable? Like consider the function `f[x: R^n] = n`, which returns the number of arguments passed in. I don’t think there’s a way to curry that.

1. 2

A year or two ago I wrote an Idris function that took an arbitrary number of (curried) arguments and put them in a list (and gave you the argument count). From what I remember it used a type class with one instance for `(->)` (the accumulator) and one for the result type, and a lot of type inference. I’ll dig it out later.

Edit: That was an already-curried function. You could potentially automatically curry `f[x: R^n]` using the same technique since, in Idris, tuples are constructed like lists: `(1, 2, 3)` is the same as `(1, (2, (3, ())))`, so you could deconstruct a cartesian product while simultaneously producing a chain of `(->)` function constructors.

1. 1

Both of these points, while interesting and, I think, important questions to ask, don’t affect the position that currying is not simply a notational difference.

2. 1

In essence, of course it depends, but it’s trivia. The course notes for the multivariate calc course at MIT begin “Goal of multivariable calculus: tools to handle problems with several parameters – functions of several variables.” I’m amazed that some Haskellian or Curry Fan has not intervened to correct the poor fool teaching that course all wrong - “Excuse me, but those are not multiple variables, they are vectors! Please write it down.” If you did say that, and the Professor was in a good mood, she’d probably say: “Yes, that’s another way to look at the same thing.” And if you then insisted that “a well defined mathematical function” has a single variable, well … As usual, the problem here is that some trivial convention of the Haskell approach is being marketed as the one true “mathematics”.

I have no idea what a “mutation within a function” would mean in mathematics. Programs don’t have mathematical functions; they have subroutines that may implement mathematical functions. There is no such thing as a callback in the usual mathematical definition of a function. Obviously, within a programming language, there will be differences between partial computation or specialization of a function and functions on lists. You seem to be mixing up what programming operations are possible on subroutine definitions with the mathematical definition of a function - unless I’ve totally missed your point. Obviously, for programs there is a big difference between a vector and a specialized function. But so what?

1. 2

You have confused operads with categories, I think.

1. 1

You can be upset about it, but currying isn’t simply a notational difference from multivariate functions.

1. 9

I may be simplifying it too much, please let me know if I am making a straw man out of this, but my interpretation of the argument is:

P1. IO is the interface towards users

P2. In order to be considered useful for user facing programs a language has to make IO easy

P3. Haskell makes IO difficult

Conclusion: Haskell is not useful for writing user facing programs

I would not discuss P1; we can take it for granted.

P2 seems fairly arbitrary to me; without deeper reasoning, it seems to me that it is a weak premise. I can imagine parallel ones that are quite obviously unsound: “Your feet are the most important part of your body since they are the ones interfacing with the ground”, “Sales is the most important department of any company as they are the ones bringing in the money”, etc.

P3 is subjective; whether it is difficult or easy depends on how well you know the tool (Haskell), and what you are trying to use it for (which problem). However, let’s just focus on a supporting premise for that one, picked from the tweet that is linked:

In other words, “open file a, then open file b” is one of the harder programs to write in Haskell.

I am interpreting “harder” as most difficult, which again depends on who is doing it. It wasn’t really hard at all for me at least:

``````
import System.IO

main :: IO ()
main = openFile "/tmp/foo" ReadMode >> openFile "/tmp/bar" ReadMode >> return ()
``````

A complete beginner that hits the right tutorials would likely do something like this even before knowing that there is something called a monad:

``````
import System.IO

main = do
  _ <- openFile "/tmp/foo" ReadMode
  _ <- openFile "/tmp/bar" ReadMode
  return ()
``````

And someone that really hates monads and would like them not to exist could write

``````
import System.IO

main =
  let ops = openFile <$> ["/tmp/foo", "/tmp/bar"] <*> [ReadMode]
  in sequence_ ops
``````

The last one is just to make the point that the relationship between monads and IO is weaker than people superficially claim, once you put some time into it.

Now, some people would have a more difficult time writing any of these, but now that you know, it is not that difficult, is it?

So, if you don’t want to use a particular language, that is fine. If you want to use a particular language, that is fine too. If you want to claim that a particular language is useless, you are starting from a position that is very difficult to sustain. People have written usable and useful software in virtually any language, including Brainfuck and Malbolge, both of them explicitly designed not to be of practical use.

Aside

I did a PhD in the type of math that necessitates a lot of category theory, and I have looked at your use of category theory, and judged it to be unnecessary and pretentious and mainly focused on making you look smart while being entirely trivial. But this is not that kind of blog post.

Well, if it is not that kind of blog post, then don’t write this. Some people have worked a lot on these “pretentious and unnecessary” uses of category theory. It is quite unfair to claim that they just focused on looking smart while being entirely trivial unless you have pretty strong evidence (not the kind of “I worked on this before, so I know”) to support it.

1. 6

This article is yet another indication that the Clang/LLVM developer culture is seriously broken. The level of reasoning and technical accuracy would be noticeably bad in an HN rant. The basic premise is nutty: nobody invested billions of dollars and huge amounts of effort and ingenuity to cater to the delusions of C programmers. Processor development is extensively data driven by benchmarks and simulations that include code written in multiple languages. Just to mention one claim that caught my eye, caches are not a recent invention.

Consider another core part of the C abstract machine’s memory model: flat memory. This hasn’t been true for more than two decades. A modern processor often has three levels of cache in between registers and main memory, which attempt to hide latency.

Wow! “More than two decades” is right. In fact, caches were even present on the PDP-11s, and they were not a new idea back in 1980. Poor Dennis and Ken, developing a programming language in ignorance of the effects of cache memory. The rest of it is scarcely better.

The root cause of the Spectre and Meltdown vulnerabilities was that processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11. This is essential because it allows C programmers to continue in the belief that their language is close to the underlying hardware.

WTF? Where is the editorial function on ACM Queue? Or consider this explanation of ILP in processor design.

so processors wishing to keep their execution units busy running C code rely on ILP (instruction-level parallelism). They inspect adjacent operations and issue independent ones in parallel. This adds a significant amount of complexity (and power consumption) to allow programmers to write mostly sequential code. In contrast, GPUs achieve very high performance without any of this logic, at the expense of requiring explicitly parallel programs.

Who knew that pipelining was introduced to spare the feelings of C coders who lack the insight to do GPU coding?

1. 4

nobody invested billions of dollars and huge amounts of effort and ingenuity to cater to the delusions of C programmers

People that worked in hardware often said the opposite was true, mostly due to historical circumstances combined with demand. We had Windows and the UNIXes in C. Mission-critical code went into legacy mode more often than it was highly optimized. Then, optimization-oriented workloads like HPC and especially gaming demanded improvements for their code. In games, that was largely in C/C++, with HPC a mix of it and Fortran. Processor vendors responded by making their CPUs really good at running those things, with speed doubling every 18 months without work by software folks. Compiler vendors were doing the same thing for the same reasons.

So yeah, just because people were using C for whatever reasons, they optimized for those workloads and C’s style. Weren’t the benchmark apps in C/C++, too, in most cases? That would just further encourage improving C/C++ style along with what patterns were in those workloads.

“Who knew that pipelining was introduced to spare the feelings of C coders who lack the insight to do GPU coding?”

There were a lot of models tried. The big bets by CPU vendors on alternatives were disasters because nobody wanted to rewrite the code or learn new approaches. Intel lost a fortune on stuff like BiiN. Backward compatibility with existing languages and libraries came before everything else. Those are written in C/C++ that people are mostly not optimizing: just adding new features. So, they introduced other ways to speed up those kinds of applications without their developers using alternative methods. This didn’t stop companies from trying all kinds of things that did boost numbers. They just remained fighting bankruptcy despite technical successes (Ambric), scraping by in a niche (Moore’s chips), or were priced too high to recover NRE (e.g. FPGAs w/ HLS or Venray CPUs). Just reinforced why people keep boosting legacy and high-demand systems written in stuff like C.

1. 5

You are not going to find anyone who does processor design who says that.

1. Processors are highly optimized for existing commercial work loads - which include significant C and Java - true
2. “processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11. This is essential because it allows C programmers to continue in the belief that their language is close to the underlying hardware.” - not even close.

First is a true statement (obvious, too). Second is a mix of false (PDP-11??) and absurd - I’m 100% sure that nobody designing processors cares about the feelings of C programmers, and it’s also clear that the author doesn’t know the slightest thing about the PDP-11 architecture (which utilized caches, a not-very-flat memory model, ILP, etc. etc.)

Caches, ILP, pipelining, out-of-order execution - all those pre-date C. Spec benchmarks have included measurements of Java workloads for decades, Fortran workloads forever. The claim “so processors wishing to keep their execution units busy running C code rely on ILP (instruction-level parallelism)” is comprehensively ignorant. ILP works with the processor instruction set fundamentals, not at the language level. To keep a 3GHz conventional processor busy on pure Erlang, Rust, Swift, Java, or Javascript loads, you’d need ILP, branch prediction, etc. as well. It’s also clear that processor designers have been happy to mutate the instruction set to expose parallelism whenever they could.

“The key idea behind these designs is that with enough high-level parallelism, you can suspend the threads that are waiting for data from memory and fill your execution units with instructions from others. The problem with such designs is that C programs tend to have few busy threads.”

Erlang’s not going to make your thread switching processor magically work. The problem is at the algorithmic level, not the programming language level. Which is why, on workloads suited for GPUs, people have no problem writing C code or compiling C code.

Caches are large, but their size isn’t the only reason for their complexity. The cache coherency protocol is one of the hardest parts of a modern CPU to make both fast and correct. Most of the complexity involved comes from supporting a language in which data is expected to be both shared and mutable as a matter of course.

Again, the author is blaming the poor C language for a difficult algorithm design issue. C doesn’t say much at all about concurrency. Only in the C11 standard is there an introduction of atomic variables and threading (it’s not very good either), but this has had zero effect on processor design. It’s correct that large coherent caches are a design bottleneck, but that has nothing to do with C. In fact, shared-memory multithreaded Java applications are super common.

etc. etc. He doesn’t understand algorithms or processor design, but has a kind of trendy psycho-babble hostility to C.

1. 3

Good counterpoints. :)

1. 2

The article argues that modern CPU architecture spends vast amounts of die space supporting a model of sequential execution and flat memory. Near as I can tell, he’s absolutely correct. You, otoh, seem to have not understood that, and moved straight to ad hominem. “He doesn’t understand algorithms or processor design”; please.

1. 5
1. The claim that “that modern CPU architecture spends vast amounts of die space supporting a model of sequential execution and flat memory” is totally uncontroversial - if you have sensible definitions of both sequential execution and flat memory.

2. The claim that “The features that led to these vulnerabilities [Spectre and Meltdown], along with several others, were added to let C programmers continue to believe they were programming in a low-level language,” is absurd and indicates a lack of knowledge about processor design and offers a nutty theory about the motivations of processor architects.

3. The claim “The root cause of the Spectre and Meltdown vulnerabilities was that processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11.” is similarly absurd, and further comments indicate that the author believes the PDP-11 architecture predated the use of features such as caches and instruction-level parallelism, which is an elementary and egregious error.

4. The claim “Creating a new thread [in C] is a library operation known to be expensive, so processors wishing to keep their execution units busy running C code rely on ILP (instruction-level parallelism)” involves both a basic error about C programming and a basic misunderstanding of the motivations for ILP in computer architecture. It precedes a claim that shows that the author doesn’t understand that C-based code is widely used in GPU programming, which is another elementary and egregious error.

5. Those are not the only errors in the essay.

6. “Ad hominem” involves attempting to dismiss an argument based on claims about the character of the person making the argument. If I had argued that Dave Chisnall’s personal attributes invalidate his arguments, that would be ad hominem. I did the opposite: I argued that the gross technical errors in Chisnall’s argument indicate that he does not understand computer architecture. That’s not ad hominem, but ad argumentum - directed at the argument, not the person.

Thanks.

vy

1. 2

The claim that “The features that led to these vulnerabilities [Spectre and Meltdown] , along with several others, were added to let C programmers continue to believe they were programming in a low-level language,”

Of course, only the author could say that for sure, but my interpretation of this, and other similar sentences in the article, is not that it makes a point about modern processors somehow trying to satisfy a bunch of mentally unstable programmers, but rather that they are implementing an API that hides a vast amount of complexity and heuristics. As far as I understand, this claim is not in question in this thread.

The second point, which in my opinion is a bit too hidden in somewhat confusing prose, is that a lower level of programming could be possible, by allowing programs to control things like branch prediction and cache placement for example. That could also simplify processors by freeing them from implementing heuristics without having the full context of what the application may be trying to achieve, and grant full control to the software layer so that better optimisation levels could be reached. I think that is a valid point to make.

I don’t really like the connection to Spectre, which I think is purely anecdotal, and I think that the exposition as a discussion about whether C is or is not a low-level language and what C programmers believe muddies what I think is the underlying idea of this article. Most of the article would be equally valid if it talked about assembly code.

1. 1

I think it would be a really good idea for processor designers to allow lower level programming, but if that’s what the author is attempting to argue, I missed it. In fact, his lauding of Erlang kind of points the other way. What I got out of it is the sort of generic hostility to C that seems common in the LLVM community.

1. 1

I think (one of) his point(s) is that if CPU designers abandoned the C abstract machine/x86 semantics straightjacket, they could build devices that utilize the silicon much more effectively. The reason they don’t do that is ofc not that they are afraid of C programmers with pitchforks, but that such a device would not be marketable (because existing software would not run on it). I don’t understand your Erlang remark, though. I believe a CPU that did not do speculative execution, branch prediction, etc., but instead, say, exposed thousands of sequential processing units would be ideal for Erlang.

1. 3

This was a very interesting machine: https://en.wikipedia.org/wiki/Cray_MTA. It did a thread switch on every memory load.

The success of GPUs shows that: 1) if there are compelling applications that fit a non-orthodox processor design, software will be developed to take advantage of it and 2) for many parallel computer designs, C can easily be adapted to the environment - in fact, C is very widely used in GPUs. Also, both AMD and Intel are rushing to extend vector processing in their processors. It is curious that Erlang has had so little uptake in GPUs. I find it super awkward, but maybe that’s just me.

Obviously, there are annoying limitations to both current processor designs (see https://lobste.rs/s/cnw9ta/synchronous_processors) and to C. My objection to this essay was that it confirmed my feeling that there are many people working on LLVM who just dislike C for not very coherent reasons. This is a problem because LLVM/Clang keep making “optimizations” that make C even more difficult to use.

1. 4

I am not sure what the point of this post is. Is it supposed to be informative? (If it is, I expect some harsh feedback.) A learning exercise? (If it is, it’d be nice to know what the goal was, to provide constructive feedback.) Something else?

1. 4

I took it as a post-modern commentary on the state of web development, and the harmonious march towards complication. It serves to juxtapose simplistically defined “utility” vs. a “production grade” rat’s nest of an application, complete with 100s of dependencies, and a multi-node database solution (be it RDBMS, or a NoSQL style).

In other words, the author wanted to share a little hack that they thought of, and the submitter thought it was fun enough to share.

1. 8

Erlang uses it to allow runtime update of code with zero downtime. OTP has hooks for transforming the old data formats to new data formats on live systems.

1. 7

Why is this enabled by dynamic typing? In my view it might be possible to achieve the same with static types, for example by restricting code updates to the same type signature as the code they are replacing (for Erlang that would be the types of the exported functions) and allowing those hooks to transform one incoming type to a different outgoing type if the types needed to be changed.

1. 2

I think the type signatures might go a bit crazy when you have so many generic functions. If the VM type-checked upgrade code against old and new types, it could be quite neat.

1. 6

There are a number of approaches to deal with strong static types with hot-swap upgrades. Cloud Haskell (basically an effort to copy Erlang’s actor model features to Haskell) uses Haskell’s Typeable library. It’s a bit messy, but it does allow for type-safe hot replacement of distributed system components without having to e.g. transmit values as JSON or something.

1. 3

Erlang works such that you could make a reference to an anonymous function in a module at version 1, send it over to a process that uses version 2 of the module to wrap it in another one, and pass it to a process that runs version 1 again. For that period of time, until version 1 is unloaded, this is a valid operation that should be possible to execute and unwrap.

2. 6

Haskell, quintessentially strongly typed, has the “plugins” library which allows runtime hot-loading. I think this is sort of orthogonal to how the language is typed.

1. 4

Erlang is strongly typed too. I think you meant static :)

2. 3

Java can do it. C can do it (and thus everything using dll/so/dylib libraries). C# can do it. So all mainstream statically typed languages are capable of a runtime update of code with zero downtime. They are not designed to do it, so it does not work as nicely as in Erlang, but static typing is not part of the problem.

1. 1

That depends whether you define “problem” as “a property making it impossible to do something” or “a property making it a pain in the ass to do something”. IMO static type systems don’t make it impossible to hot-reload code, but they sure as heck make it hard, which is why statically-typed languages tend to only get that feature implemented once they’re really mature.

1. 3

iiuc, you always have the possibility of duplication (double-processing) or loss (zero processing), because fundamentally your client code will do either:

`process(msg); ack(msg);`

or:

`ack(msg); process(msg);`

and you can always just die (power off) in between the two. This can be made a short interval, but it’s always there. (If your `process()` records the ids of the msgs you’ve processed and ignores ones it has seen before, you have the same situation but just lower down - the `ack(msg)` becomes `record_id(msg)`.)

I think these improvements are reducing the failure modes where the client has called ack(msg) and the rest of the system loses the ack and can cause the msg to be replayed.

[I guess it might be possible if your ‘process’ can do its work and record the id in an atomic step (a txn)?]
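A minimal Python sketch of that lower-down variant (all names here are hypothetical): recording ids makes redelivery harmless, but a crash between `process` and `record_id` still gives at-least-once behavior.

```python
# Sketch: at-least-once delivery made safe to replay by recording message ids.
# A crash between process() and record_id() causes a redelivery, but the
# duplicate is detected and skipped on the second attempt.

seen_ids = set()   # stands in for durable storage
results = []

def process(msg):
    results.append(msg["body"])

def record_id(msg):
    seen_ids.add(msg["id"])

def handle(msg):
    if msg["id"] in seen_ids:  # already fully processed: redelivery, skip
        return
    process(msg)
    record_id(msg)             # the "ack", moved down into our own storage

handle({"id": 1, "body": "hello"})
handle({"id": 1, "body": "hello"})  # simulated redelivery: ignored
print(results)  # ['hello']
```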

1. 2

I think the way it’s supposed to work is, if you commit “achieved work” alongside wherever your processor is at, AND have idempotent operations, then “processing twice doesn’t matter”. I’d argue that in effect this doesn’t really mean “exactly once” delivery. It’s more “exactly-once processing”, like the response linked to by @pushcx: you’d only have completely (as in, successfully) processed any given message exactly once.

That’s how I understand it at least. It’s still pretty good. It doesn’t get around the mathematical impossibility, it gives you the tools to do what you want with less chance of footgunmanship.

1. 1

Yes, reading the ‘redux’ article - basically if your ‘process’ step can be part of the same transaction as recording the ack, you can achieve exactly once.

I don’t think you also need the process step to be idempotent - being part of a transaction (which can also contain the id for de-dupe) is sufficient. That does mean though that the process step must be reversible - so the txn can be aborted.

I think if your process step is “fire the missiles”, you’ll never have exactly once.
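A sketch of “process and record the id in one transaction”, using SQLite as a stand-in for the real datastore (table and column names are made up for the demo):

```python
import sqlite3

# Processing output and the dedupe id commit atomically, or not at all.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE processed_ids (msg_id INTEGER PRIMARY KEY);
    CREATE TABLE results (msg_id INTEGER, value TEXT);
""")

def handle(msg_id, payload):
    try:
        with db:  # one transaction: both inserts commit, or neither does
            db.execute("INSERT INTO processed_ids VALUES (?)", (msg_id,))
            db.execute("INSERT INTO results VALUES (?, ?)", (msg_id, payload))
    except sqlite3.IntegrityError:
        pass  # duplicate delivery: the PRIMARY KEY aborts the whole txn

handle(1, "a")
handle(1, "a")  # replayed message, rolled back by the id constraint
print(db.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # 1
```

This only works because the “processing” is a reversible database write; a side effect like firing missiles can’t be rolled back, as noted above.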

1. 2

I don’t think you also need the process step to be idempotent - being part of a transaction (which can also contain the id for de-dupe) is sufficient. That does mean though that the process step must be reversible - so the txn can be aborted.

Transactions work only if the changes they are doing can be rolled back. That is not exactly the same as requiring idempotence, but your process may need to be restarted after failing at an arbitrary point.

That is the whole disagreement of what “exactly once means”, as far as I understand this. One cannot guarantee “exactly once” on arbitrary conditions, what this (and other recent works that claim to achieve exactly once) do is to provide a set of restrictions under which you have those guarantees.

1. 23

This reads a lot like “Kotlin is better than Java, hence better than any other language”

1. 13

I think it’s because the article has the unstated premise that you’re stuck with Java for one reason or another—because of Android, as in the author’s case, or because the codebase you’re paid to work on is already written in Java. By “whatever dumb language”, I get the feeling that he really does mean precisely Java here.

1. 1

This is a sort of well-known folk trick for property testing. You need to design your parameter distributions to hit “small” values often. In this case, as soon as zero is a reasonable point in parameter space you should explicitly note that by providing positive probability weight to it:

```
h ~ p delta(0) + (1-p) Exp(l)
```
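That mixture is a one-liner to sample in code. A sketch in Python, where `p` and `l` are parameters you would tune per test suite:

```python
import random

# With probability p emit exactly 0 (the explicit point mass),
# otherwise draw from the Exp(l) tail.
def gen_h(p=0.1, l=1.0):
    if random.random() < p:
        return 0.0
    return random.expovariate(l)

random.seed(42)
samples = [gen_h(p=0.5) for _ in range(1000)]
print(sum(1 for s in samples if s == 0.0))  # roughly half are exactly zero
```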
1. 1

Uh, can you link to a simple example of that applied to real code so the less mathematically inclined can see how it works?

1. 3

For example, the quickcheck implementation for Erlang will start generating “small” values first and move up to larger ones. The definition of “size” depends on the values generated, but for integers and floats it is proportional to their absolute value. The initial tests are generated with a size of zero, which ensures the usually special case of inputting zeroes in different places in the test is covered. Same as for empty lists, empty strings, etc.

It’s definitely a bit ad hoc, but effective in practice.
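The size-driven idea can be sketched in a few lines of Python (the sizing rule here is my own simplification of the scheme described above):

```python
import random

# Start at size 0 (which can only yield zero) and grow, so small and
# special values are tried first. "size" bounds the absolute value.
def gen_int(size):
    return random.randint(-size, size)

random.seed(0)
trials = [gen_int(size) for size in range(5)]
print(trials[0])  # size 0 can only produce 0
```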

1. 1

That makes more sense. Thanks. I used to do something along those lines with numerical tests. I tried a positive number, a zero, a negative, one at minimum/maximum, and one past that. Even if it should fail, I still wanted to see how it failed. I don’t know if there is a name for that or if it has its own sub-category of research. I just hacked it together to avoid state explosion hoping something would come out of it.

1. 2

I had been involved in a project regarding property based testing in Erlang (you can get some references from this report). There we explored several approaches to describe tests at a higher level than the common test case approach. The ones that I was more interested in were quickcheck and model checking. Both of them try to express a property of the software (e.g. for all integers A and B, plus(A, B) must be equal to plus(B, A)).

Model checking (implemented for Erlang in that project as McErlang) is more powerful but quite hard to implement and needs a lot of trickery to cope with the problem space explosion. Quickcheck tries to take the pragmatic approach and just randomly samples the search space. Then the difficulty turns into sampling efficiently. In practice, quickcheck discovers a lot of discrepancies between the specs and the implementation with little tweaking.
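A quickcheck-style check of the commutativity property mentioned above can be sketched by plain random sampling (this is a toy, not a real quickcheck implementation, which would also shrink counterexamples):

```python
import random

def plus(a, b):
    return a + b

# Property: for all integers a and b, plus(a, b) == plus(b, a).
random.seed(1)
failures = []
for _ in range(1000):
    a = random.randint(-10**6, 10**6)
    b = random.randint(-10**6, 10**6)
    if plus(a, b) != plus(b, a):
        failures.append((a, b))
print(len(failures))  # 0: no counterexample found in 1000 samples
```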

For selected references:

1. 9

Second, no; NoSQL does not scale better than SQL. At least not in the manner you think. MySQL is being used from top-traffic websites like Facebook and Twitter, to high traffic websites like GitHub and Basecamp and almost every PHP installation out there. If this is not scaling, then what is?

Facebook’s MySQL setup (and Google’s pre-Spanner), as far as I am aware, look NOTHING like your standard rails/PHP crud app setup. They look a lot more like NoSQL schemas.

1. 12

Which suggests that it is flexible enough to obviate the need for many use-cases supposedly geared to NoSQL.

1. 8

Precisely. You only need to build a NoSQL database if you need a different storage backend. For example, Google built BigTable on top of GFS because they needed to control the size of SSTables, and merge and split based on size. Building a NoSQL API is faster and easier, so that’s where they started. Notice they then added back a full sql query interface with Dremel.

If they hadn’t cared about storing SSTables in GFS for MapReduce, they could have easily implemented the whole thing on top of MySQL. You can trivially implement the SSTable interface like so:

```
create table sstable (
k blob,
v blob,
primary key (k)
);
```

Or you could use `varbinary` instead of `blob`, whatever fits your needs. Or `bytea` in PostgreSQL.
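A minimal sketch of what the SSTable-style interface over that table looks like, using SQLite in place of MySQL so the example is self-contained (function names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sstable (k BLOB PRIMARY KEY, v BLOB)")

def put(k, v):
    db.execute("INSERT OR REPLACE INTO sstable VALUES (?, ?)", (k, v))

def get(k):
    row = db.execute("SELECT v FROM sstable WHERE k = ?", (k,)).fetchone()
    return row[0] if row else None

def scan(lo, hi):
    # ordered range scans over the primary key are the core SSTable operation
    return db.execute(
        "SELECT k, v FROM sstable WHERE k >= ? AND k < ? ORDER BY k",
        (lo, hi)).fetchall()

put(b"apple", b"1")
put(b"banana", b"2")
put(b"cherry", b"3")
print(scan(b"a", b"c"))  # [(b'apple', b'1'), (b'banana', b'2')]
```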

1. 2

Notice they then added back a full sql query interface with Dremel.

But this is just the query interface. It’s not like they wrote Dremel on top of MySQL because MySQL == Scale.

If they hadn’t cared about storing SSTables in GFS for MapReduce, they could have easily implemented the whole thing on top of MySQL. You can trivially implement the SSTable interface like so:

What? Just because you’ve recognized that a key-value store has this schema doesn’t mean it’s easy to implement something that fills the same niche as BigTable on MySQL. BigTable has a lot more semantics around it than just being a key-value store.

1. 4

Bigtable DOES have a lot more semantics around it than just being a key-value store. Let’s examine.

Routing. BigTable key ranges are managed in a central database that maps key ranges to storage nodes. The client then queries the storage nodes directly for data, avoiding going through this central sharding database for every query. The sharding database is updated when SSTables are split or merged, or shuffled around in general. Storage nodes also keep a history of their recent sharding operations, so they can redirect clients who have out-of-date information. Clients very out of date are redirected back to the central sharding database to get new info.

• the sharding and storage nodes could easily be implemented on top of MySQL
• they aren’t purely because Google also wanted the SSTables in GFS for MapReduce
• originally MapReduce jobs created the BigTable SSTables for the search index
• until Google developed Percolator to incrementally update search indexes in BigTable

Consistency groups. BigTable tables can have certain columns grouped together so they can be updated atomically as a group.

• consistent updates to multiple columns could easily be implemented with transactions

Compression. BigTable SSTables are compressed on disk.

• MySQL supports compressed tables
• even if it didn’t, it would not be hard to run snappy on a blob before you inserted it into MySQL

Timestamp dimension and immutable storage. BigTable is append only, every “update” is actually an insert at the current timestamp. This makes it way easier to manage SSTables, because they can then be made immutable, and then you don’t have to write and test a mutable storage engine. But of course Google wrote LevelDB so they don’t exactly have a problem doing that. Really, the biggest reason SSTables are immutable is because, again, Google wanted to store them in GFS.

• in MySQL just add a `timestamp` column to the table, at the tail of the index
• it wouldn’t be necessary to manage immutable SSTables with mysql
• seriously, they are only like that for GFS
• it was silly that Cassandra did that, since they don’t store in GFS
• it’s been a pain point for them since the beginning
• it’s because they copied Google blindly
• then they moved to leveled storage when they realized their mistake
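A sketch of that timestamp-at-the-tail-of-the-index idea, with SQLite standing in for MySQL and a toy logical clock (the schema is illustrative, not BigTable’s actual layout):

```python
import sqlite3

# Append-only versioned storage: every "update" is an insert at a new
# timestamp; reads pick the newest version of a key.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE kv (
    k BLOB, ts INTEGER, v BLOB,
    PRIMARY KEY (k, ts)  -- timestamp at the tail of the index
)""")

clock = 0
def put(k, v):
    global clock
    clock += 1
    db.execute("INSERT INTO kv VALUES (?, ?, ?)", (k, clock, v))

def get(k):
    row = db.execute(
        "SELECT v FROM kv WHERE k = ? ORDER BY ts DESC LIMIT 1", (k,)
    ).fetchone()
    return row[0] if row else None

put(b"row1", b"old")
put(b"row1", b"new")   # an "update" is just another insert
print(get(b"row1"))    # b'new'
```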

So you’re right, it doesn’t make it easy to implement BigTable on MySQL. But it’s definitely easier because a lot of the storage layer functionality is there for you. All of the extra features of BigTable can be implemented on MySQL, except immutable storage compatible with GFS and MapReduce. If you have other semantics you want me to address, I will happily explain how they could be implemented on MySQL and why they aren’t because of the GFS requirement.

1. 1

If you have other semantics you want me to address, I will happily explain how they could be implemented on MySQL and why they aren’t because of the GFS requirement.

But I’m not arguing about whether something could be done. Writing that something is possible in a comment on the internet is quite different from the reality. For example, at the read and write load of a BigTable cluster I am not convinced that getting the time dimension is just adding a timestamp column and an index. BigTable doesn’t have to deal with the generic case and can optimize just this case for better performance.

One interesting thing you left out was resharding and splitting, which is where some of the dynamic scalability benefits come from. MySQL offers nothing for that, AFAIK.

(A small note on your compression point: BigTable offers block-level compression, which means multiple rows that are similar will compress together, and compress well, which is not the same as compressing each row individually, at petabyte tables that can make a big difference).

Google could have modified the underlying MySQL storage format to do all of the SSTable things as well, but they didn’t. I don’t know why they didn’t. Maybe it says something about MySQL, maybe not. I think you’re being far too optimistic in your belief that because you’ve found a mapping of concepts, the performance and scalability come with it.

1. 3

But writing that something is possible in a comment on the internet is quite different than the reality.

True. Luckily for me, Google, Facebook, Uber, and others have all done this.

For example, at the read and write load of a BigTable cluster I am not convinced that getting the time dimension is just adding a timestamp column and an index.

It is. You overestimate the throughput per node of BigTable. The whole point is massive scalability of storage volume, not massive throughput on individual keys.

You’re also not adding an index, you’re adding the timestamp column to your existing index. Joining the primary index and timestamp index would be ridiculous.

One interesting thing you left out was resharding and splitting, which is where some of the dynamic scalability benefits come from. MySQL offers nothing for that, AFAIK.

No, you’d have to implement it. My point is, it’s easier to implement BigTable-like routing on top of MySQL than implementing all of BigTable. You claimed it wasn’t as easy to build BigTable on MySQL, that’s what I’m addressing here.

(A small note on your compression point: BigTable offers block-level compression, which means multiple rows that are similar will compress together, and compress well, which is not the same as compressing each row individually, at petabyte tables that can make a big difference).

Yeah, but the uncompressed tablet is present on disk on the node that hosts the tablet, and any new keys are stored in memory and a log file until they’re large enough to be stored as a tablet.

Google could have modified the underlying MySQL storage format to do all of the SSTable things as well, but they didn’t. I don’t know why they didn’t. Maybe it says something about MySQL, maybe not.

MySQL’s execution engine requires a mutable storage engine. If you’re rewriting the storage engine and the execution engine, then all that’s really left is the query engine, which you don’t need in BigTable. That’s why they didn’t.

I think you’re being far too optimistic in your belief that because you’ve found a mapping of concepts, the performance and scalability come with it.

Why didn’t ad tech use BigTable? Why did they stick to their MySQL based solution? Performance. BigTable has pretty terrible latency. I’m not being “far too optimistic” about anything, I’m talking from experience working at Google with BigTable, and with other distributed systems elsewhere. MySQL’s InnoDB is highly performant, and scalability comes purely with the distributed layer. There are a lot of other things to optimize before InnoDB is your bottleneck, and BigTable SSTables are not an optimization on that.

2. 4

By this logic, since NoSQL and SQL DBs are implemented on flat files, we should be using flat files instead because they are even more flexible.

This argument is nonsense, if you look at the amount of engineering that went into getting MySQL to work at Facebook scale (given all of the public information), they end up with something that, unsurprisingly, looks a lot like a NoSQL DB. So if that’s what you need, just go with a NoSQL DB that has been battle tested more than an ad-hoc MySQL setup. If you read the Spanner paper, Google was using Spanner to replace the MySQL cluster for their ad system. One of the reasons they were replacing it with Spanner is because resharding the MySQL cluster took years. The companies using MySQL like this pre-date many NoSQL DBs so the fact that they use them might have more to do with the fact that they did not exist rather than they chose not to.

To be clear: I think many people should use a SQL database. But precisely because they probably do not have scale and uptime issues that necessitate something else.

1. 4

This argument is nonsense, if you look at the amount of engineering that went into getting MySQL to work at Facebook scale (given all of the public information), they end up with something that, unsurprisingly, looks a lot like a NoSQL DB. So if that’s what you need, just go with a NoSQL DB that has been battle tested more than an ad-hoc MySQL setup.

This presumes that you wouldn’t end up fighting a lot with the NoSQL solution to get it to work at scale. This has not been my experience with NoSQL solutions – you will end up fighting them just as much as you fought MySQL, to get them up to something like Facebook scale. Hell, I’ve fought them just to get them to behave reliably at 1/1000th a Google or Facebook scale. Facebook scale is just plain hard, and no silver bullet will get you there without that fight.

It’s not the query interface that’s the problem at Facebook scale, but the lack of one is a hell of a problem at any smaller scale. In my experience, the only real difference between building on an RDBMS vs building on a NoSQL is the flexibility to work well across a range of use-cases. Start with an RDBMS because there’s a 99.99XXXX% chance you’ll never hit Facebook scale. When you do hit that scale, yeah, you’ll fight it. You’ll fight anything at that scale.

1. 1

I agree with you that getting to Facebook scale is a technological marvel no matter how you cut it. Which I think is an even stronger argument for why using it as an example of what the average company should be doing is ridiculous.

There is a lot of space between your first CRUD app and Facebook though, and I don’t think it’s quite so easy to just claim that SQL fills that entire space. I don’t know what NoSQL DBs you’ve used, however I have used Riak and Cassandra in anger. I have found both of them easier to manage than a SQL database, when their strengths are desirable. That is availability and scalability. Making MySQL operate like Cassandra is simply not worth it and Cassandra has been battle tested enough that scaling it up is well understood. Custom MySQL setup, not so much.

2. 4

By this logic, since NoSQL and SQL DBs are implemented on flat files, we should be using flat files instead because they are even more flexible.

What? Flat files are less flexible, because they don’t have the huge number of features present in SQL databases.

This argument is nonsense, if you look at the amount of engineering that went into getting MySQL to work at Facebook scale (given all of the public information), they end up with something that, unsurprisingly, looks a lot like a NoSQL DB.

Yeah, so it’s perfectly viable to build a large distributed database on top of MySQL. All the extra code to do that—mostly sharding and replication logic—is necessary in a NoSQL database too. So why write an entire storage and network layer? All the distributed sharding logic can be implemented in the client (see BigTable), so just do that and keep MySQL. This is exactly what Google and Facebook did.

It only looks like a NoSQL database because the operations you can do at scale are fairly limited, so when NoSQL later came around they didn’t bother with a SQL front end to implement select from where. Well, except CQL in Cassandra. And even then, SQL came back as the technology to do more complex operations developed, leaving NoSQL databases with a crippled query interface.

If you read the Spanner paper, Google was using Spanner to replace the MySQL cluster for their ad system. One of the reasons they were replacing it with Spanner is because resharding the MySQL cluster took years.

This is not a shot against MySQL. This is a testament to the flexibility of SQL databases: Google used this MySQL based system for many years before the engineering effort of building a new system was worth it.

The companies using MySQL like this pre-date many NoSQL DBs so the fact that they use them might have more to do with the fact that they did not exist rather than they chose not to.

Exactly, SQL databases were flexible enough to use to implement a distributed database. You’ll notice that Google didn’t move ad tech to BigTable once that was available. Instead they continued to use their MySQL based system. Did you ever stop to think why?

To be clear: I think many people should use a SQL database. But precisely because they probably do not have scale and uptime issues that necessitate something else.

Yeah, but your bar is probably way off. You need a really huge database before SQL becomes unviable. Operating a NoSQL cluster is only worth it cost-wise if you truly have a tremendous amount of data. Seriously, an X1.32xlarge instance on AWS is \$35k a year (128 cores, 2TB RAM, 4TB SSD). That’s cents on the dollar compared to the cost of even a single additional operations engineer to manage your NoSQL cluster. And that doesn’t include the time saved by engineers writing SQL queries instead of manually joining and grouping data in their app.

As for uptime, you can only get perfect uptime on a NoSQL database with a full operations team dedicated to managing that cluster. This is worth it if you’re Apple, and you run 10k Cassandra nodes. But it takes a lot to really get to a scale where it’s actually cost-effective.

1. 3

I have some responses to specific comments you made below, but they are more because I can’t help myself and not because they are super important. But I read your post several times and, for the core of your comment, I believe that rather than making a complete narrative, you’ve responded to my specific claims by pointing out that it’s possible to do what I say on MySQL, rather than addressing whether one should and what it would cost. I don’t mean to imply you’re doing this on purpose.

If a company came to me asking for recommendations, here is what I would tell them:

Start with a SQL database with replication unless it’s immediately clear it does not fit your problem. If you get to the point where either write-availability is extremely important, or you’ve outgrown the amount of disk that’s reasonable on a single machine, or you’re hitting write throughput limits on a single machine, migrate the necessary data over to a distributed database such as Riak or Cassandra, or maybe Cloud BigTable. If global consistency is really important, maybe consider Spanner. I would not try to turn my SQL database into something with those semantics because I believe that it is expensive in terms of engineering time and operations, as there are a lot of edge cases that are not immediately obvious, and those distributed databases have resolved them.

The above is based on my experiences, so for a reader of this it depends on who they trust more. I would urge any reader to deeply investigate what it takes to make a SQL database look and operate like a Riak or a Cassandra. Very likely one can get a 10% solution up and going in a week or two. But the remaining 90%, my experience tells me, will be a lot of work.

Here are some specific comments I had but I don’t think they are as important as the complete narrative I gave above.

What? Flat files are less flexible, because they don’t have the huge number of features present in SQL databases.

Oh, but all it takes is a little bit of engineering to turn a file into a SQL database. Just like it only takes a little bit of engineering to turn MySQL into a highly-available, sharded, multi-master (? depending on your NoSQL flavor) database.

That’s cents on the dollar compared to the cost of even a single additional operations engineer to manage your NoSQL cluster.

I’m not sure exactly what you mean here. It’s not as if a SQL database, especially some sharded MySQL setup, doesn’t need an operations team too. Your comparison seems to be SQL = Free, NoSQL = Cost, but they both cost something, and a bespoke SQL setup can cost quite a bit.

And that doesn’t include the time saved by engineers writing SQL queries instead of manually joining and grouping data in their app.

But if you have a sharded, multi-master, highly scaled MySQL setup, you will have to write those joins by hand anyways, unless you have chosen a very particular sharding schema to make them easier but if you ever change it you’re in a long project to reshard. You also have to write and manage a sharded, multi-master, bespoke MySQL setup.

As for uptime, you can only get perfect uptime on a NoSQL database with a full operations team dedicated to managing that cluster.

How is that any different than a SQL database?

This is worth it if you’re Apple, and you run 10k Cassandra nodes. But it takes a lot to really get to a scale where it’s actually cost-effective.

Apple has 100k+ Cassandra nodes, actually. I have been paid to manage Cassandra nodes, and the problem we actually ran into was that it was so stable, teams managing their own clusters forgot how to operate them between incidents. In general, operations costs were quite low. We ended up consolidating the operations in a single team in order to keep the knowledge there and fresh. I’m not saying it’s easy, just that you are not painting a picture that matches my own experiences. And the scale I worked at was around 1k clusters and around 7k nodes.

1. 2

As for uptime, you can only get perfect uptime on a NoSQL database with a full operations team dedicated to managing that cluster. This is worth it if you’re Apple, and you run 10k Cassandra nodes. But it takes a lot to really get to a scale where it’s actually cost-effective.

If you are thinking of availability, my experience is exactly the opposite. Making a consistent data store highly available requires a lot of engineering, and is ultimately limited by how much you can spread replicas, which is typically not that much given the challenges posed by reaching distributed consensus. A good example of that is Spanner, which is by no means an easy thing to implement.

A valid question is how many use cases are there for small or mid size companies to consider dropping consistency to achieve high levels of availability, which I presume are not that many.

1. 3

challenges posed by reaching distributed consensus

Sigh. Yes, I agree with you, it is tremendously difficult to do, and only a few teams have done it successfully in a meaningful way that scales. But I think that’s missing the point.

When do you actually need fully automated distributed consensus and failover? Same as anything else you’d consider automating: whenever it’s more efficient to automate than perform manually. For 25k node clusters, definitely necessary. For a 12 node cluster? Give an ops engineer a pager and spend a week doing practice scenarios. It’ll be easier than testing a distributed system.

As for architecture you can get a lot of mileage out of simple sharding and 3 node replica sets. See AWS DynamoDB. That’s literally how it works. Most people don’t need distributed databases, and out of the people who do, most people don’t need Spanner. With 4 shards / 3 replicas each, you can pretty easily make it to 40TB of capacity these days. And don’t forget, 40TB goes a lot farther in a SQL database with packed rows than a NoSQL database with string keys embedded in every row.
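The “simple sharding + 3-node replica sets” architecture can be sketched as a few lines of routing logic (host names and shard count here are made up for illustration):

```python
import hashlib

# 4 shards, each a 3-node replica set. Clients hash the key to pick the
# shard, then talk to any of its replicas.
SHARDS = [
    ["db0-a", "db0-b", "db0-c"],
    ["db1-a", "db1-b", "db1-c"],
    ["db2-a", "db2-b", "db2-c"],
    ["db3-a", "db3-b", "db3-c"],
]

def replicas_for(key: bytes):
    # stable hash so every client routes the same key the same way
    shard = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(SHARDS)
    return SHARDS[shard]

print(replicas_for(b"user:42"))  # the 3 replicas holding this key
```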

A valid question is how many use cases are there for small or mid size companies to consider dropping consistency to achieve high levels of availability, which I presume are not that many.

I definitely agree there. I always see the pattern of engineers thinking they need to scale massively, so they set up a Cassandra cluster or something. On 5 machines roughly as powerful as my laptop. And then spend ages trying to build an application that actually works on top of an inconsistent system. It’s not even frustrating anymore, I just accept the whole sad state of it. Maybe I spend too much time with tech hipsters.

1. 6

This post is no more than a rant with little (if any) effort in providing reasonable support for any claim:

To “support” the claim that SQL is easier than NoSQL (whatever that is, I didn’t know it was a language) it provides one example of a SQL query vs the alleged equivalent in … MongoDB.

It claims that NoSQL (all of them) doesn’t scale better than MySQL (maybe just this one?) because … Facebook, Twitter and some others use MySQL??

It claims that ACID is universally what you want. Without any support for this. ACID is very complex to implement and inevitably, the abstraction leaks in certain situations. A common example is a careless administrative query that creates lots of locks and affects production queries increasing their latency; it turns out that using ACID effectively at scale is not easy either. And this:

with SQL you can be 100% certain that either your exact intent gets reflected in the database or nothing happens at all.

Is simply not true

The truth is that which database you choose depends on your use case. If you choose blindly, you have higher chances of being wrong. This post encourages you to blindly choose MySQL. Posts like this don’t help you choose wisely.

1. 8

I just want to point out that using UUIDs takes almost no thought in any other language.

1. 6

This is the thing that kills Haskell for me. I enjoyed learning the language, and on some level want to like it, but every little thing is an ordeal.

1. 3

as is everything in haskell. Yesod’s still awesome though. I’m so happy to see a Yesod article for once!

1. 1

I’m curious, why is Yesod awesome? In particular, what does it do better than all other web frameworks?

1. 1

fine, you got me there. Probably, just because it’s written in haskell.

I suppose if there were something I could complain about, its use of Shakespearean Templates made it hard to get moving quickly.

But how about this, you have more assurances that you are writing a web application in pure functional code than you would be in any other framework, because of the language it’s written in.

2. 1

Yesod is notoriously opinionated (and, particularly their template haskelly ORM). If the author wanted to use them with other Haskell web frameworks (and less magic/opinionated database libraries), I doubt the author would have encountered any trouble at all.

1. 1

“Any other language” is a large set of languages, I am pretty sure there are some more where generating UUIDs requires some coding.

That aside, are you talking about the libraries available, or about the language itself? Generating UUIDs in Haskell is not hard if you don’t need to bend them to the Yesod use case. On the other hand, generating random UUIDs would take some time if you don’t have libraries to deal with pseudorandom sequences and entropy generation, for example.
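For comparison, this is what “almost no thought” looks like in a language whose standard library ships pseudorandom UUID support out of the box (Python here, purely as an illustration):

```python
import uuid

# A random (version 4) UUID is a single call; the entropy handling the
# comment above mentions is done by the standard library.
u = uuid.uuid4()
print(u)          # e.g. 1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed
print(u.version)  # 4
```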

1. 18

You see, when a Java programmer gets used to TDD, they start asking themselves a very important question: “Why am I wasting time satisfying the type constraints of Java when my unit tests are already checking everything?”

That was not my experience. Rather, I often thought: “why am I wasting my time writing tests for this, which could be avoided if the language had better type constraints?”

And that’s exactly what happened in second half of the first decade of the current millennium. Tons of Java programmers jumped ship and became dedicated dynamic typers, and TDDers.

This seems unsubstantiated, IMO. Bob implies, IMO, that all these Java developers “left” Java for dynamic languages, in order to become TDD developers which seems to me just a teensy bit absurd.

How will this all end?

My own prediction is that TDD is the deciding factor.

IMNSHO that is Uncle Bob’s prediction because Uncle Bob’s income relies on harping on about TDD like a broken record. Often nonsensically—like in this post.

You don’t need static type checking if you have 100% unit test coverage.

This is just plain false. Making sure your function has 100% unit test coverage does nothing to help you if someone calls your function with unexpectedly typed data. Say your function expects two integers, but is called with a dictionary and a list. Or another function. Sure, you can add type checks to your function, and more unit tests to verify that those type checks catch the unexpected types, but that’s a lot of boilerplate to add everywhere.

“Oh, but my function is not useful enough to be included in a library used by others, and I do have 100% coverage of my app, so I assert that it is not called with unexpected types.” Right. Does your app get data from anywhere? Input from a form? Environment variables that people can set before running your app? Are you sure that the input to the function does not come from inadequately sanitised data? If you were using Perl you could activate taint mode, but it looks like the equivalent in Python was rejected in 2009. I don’t know if Ruby has anything equivalent. And even if you are using Perl, there’s a lot of ways your function can fail that taint mode won’t help you catch.

1. 1

Ruby does have taint mode; I’ve never seen anyone ever use it.

1. 6

The trouble is that the OS - when that OS is unix - is very much oriented towards passing around streams of bytes and reinterpreting them willy-nilly. Using the OS is exactly how Shellshock happened, and to continue to use it this way is to invite similar vulnerabilities in the future.

I could have solved it in Python, figuring out a library for doing SSH, hoping it’s async or using a bunch of threads and using in-memory data structures to store the data. But parallelizing the whole thing was a matter of adding -P10 to the xargs command, almost too easy. I had some edge cases in what the data looked like, so having the data from each phase on disk made debugging and doing the next phase easy. By embracing the OS, I actually got a smoother experience than I likely would have otherwise.

The composability of unix utilities is to be applauded. This experience is what every language should be aiming for. But there’s no reason you can’t accomplish this in a safer language. In Scala I’d get close - parallelizing would just be a matter of dropping in a `.par`, I could handle edge cases in the REPL. No doubt Haskell or others could get closer. It’s possible to do this without sacrificing safety.
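A sketch of the `-P10` idea in a general-purpose language: map a task over inputs with a bounded worker pool. The “task” here is a pure function standing in for an ssh invocation, so the example is self-contained; host names are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_host(host):
    # in reality: subprocess.run(["ssh", host, "some-command"], ...)
    return f"{host}: ok"

hosts = [f"web{i}" for i in range(20)]
with ThreadPoolExecutor(max_workers=10) as pool:  # the -P10 equivalent
    results = list(pool.map(run_on_host, hosts))  # order is preserved

print(results[0])  # web0: ok
```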

1. 2

The trouble is that the OS - when that OS is unix - is very much oriented towards passing around streams of bytes and reinterpreting them willy-nilly. Using the OS is exactly how Shellshock happened, and to continue to use it this way is to invite similar vulnerabilities in the future.

What do you mean by that? In my view bash != OS, bash is just a crappy shell that uses a crappy scripting language and that was used in a crappy way to run CGI programs. Most of the shellshock impact was derived from poor practices, not from using simple programs that take advantage of the OS functionality.

You will run any program in the OS anyway, I think the merit of the proposed solution is that it composes from simple pieces rather than building a unique, but larger and more complicated one.

1. 2

But how else does the composition work? The article talks about composing things like `xargs`, `sort`, `uniq`, with `ssh` for distributing the work. That puts you solidly in shell-scripting territory. At the very least, in territory where you have to be careful about the finer points of POSIX rules about quoting, record terminators, etc.

1. 2

Composition works by:

• connecting input and output streams of components

• using files for persistence

That is often done with shell scripts, but it is not mandatory. I used that approach a lot with Erlang ports, where Erlang acts as the orchestrator. You could still launch the processes you compose using a shell (one per process), which is typically done just as an easy way to control file descriptor redirections, but quoting would only be a problem for the shell arguments. You can even avoid that by execing instead of using a parent shell.

I don’t think that is specific to Erlang; you can use the same idea with popen, for example, though I recall some popen implementations being a bit quirky. Erlang’s concurrency approach makes it more natural to compose several small tools into higher-level applications.
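As a sketch of this popen-style composition (assuming the usual coreutils `sort` and `uniq` are available), here is the idea from Python’s subprocess module: the orchestrating language wires the file descriptors itself, so no shell, and hence no quoting, is involved:

```python
import subprocess

def sort_uniq_counts(lines):
    """Count duplicate lines by wiring `sort | uniq -c` together
    directly; no shell is spawned, so arguments need no quoting."""
    sort = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq", "-c"], stdin=sort.stdout,
                            stdout=subprocess.PIPE, text=True)
    # Close our copy of the read end so `uniq` holds the only one.
    sort.stdout.close()
    sort.stdin.write(("\n".join(lines) + "\n").encode())
    sort.stdin.close()
    out, _ = uniq.communicate()
    sort.wait()
    return out
```

Each stage remains an independent, isolated process that can also be run and debugged by hand in a shell.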

All in all, I discourage any kind of complicated shell script, but I believe composing small, focused programs is a safe and nice approach for many common problems.

1. 2

If you take it really generally as a programming model, I can buy that. I read the original article as arguing something different, more about using the classic (and built in, unlike erlang) Unix approach to processing based on pipes, I/O redirection, and a suite of built-in utilities. Especially the example of `xargs` is really hard to make sense of outside of that context. If you were writing erlang, I’m having trouble imagining why you would ever call `xargs`; it’s really a tool for use in shell pipelines.

1. 1

Yes, xargs is probably specific to shell scripting. One thing I like about this approach is that it is possible to debug the individual components (and even test them) in the shell. In that case tools like xargs, tee, etc. are handy.

2. 2

You will run any program in the OS anyway, I think the merit of the proposed solution is that it composes from simple pieces rather than building a unique, but larger and more complicated one.

The proposal is to do composition via pipes and the filesystem (presumably writing a script to orchestrate them). I don’t think this is fundamentally more “composing from simple pieces” than a program written using something like fs2 style would be.

In my view bash != OS, bash is just a crappy shell that uses a crappy scripting language and that was used in a crappy way to run CGI programs. Most of the shellshock impact was derived from poor practices, not from using simple programs that take advantage of the OS functionality.

If you’re doing composition via pipes and/or the filesystem then what you transfer between successive stages necessarily has to be streams of bytes. Any control instructions have to be mixed with data via “magic” data values - exactly what led to shellshock. I agree that it’s poor practice, but using the OS functionality necessitates that practice.

1. 3

The proposal is to do composition via pipes and the filesystem (presumably writing a script to orchestrate them). I don’t think this is fundamentally more “composing from simple pieces” than a program written using something like fs2 style would be.

I don’t think shell scripting is necessarily a part of the proposal. My understanding is that the relevant bit is composing small programs that work over bytestreams, and bypassing any complicated storage as long as the file system is good enough.

I am not familiar with fs2, but my understanding after a quick perusal is that it implements the same concept, but instead of using independent processes it provides functions native to the orchestrating language (Scala in this case). The difference there is that you don’t get process isolation, which I think is valuable if you can get away with it (sometimes it is not possible for performance reasons).

If you’re doing composition via pipes and/or the filesystem then what you transfer between successive stages necessarily has to be streams of bytes. Any control instructions have to be mixed with data via “magic” data values - exactly what led to shellshock. I agree that it’s poor practice, but using the OS functionality necessitates that practice.

What led to shellshock, in my view, was an old dubious feature of a given, and not particularly safe, shell. That feature basically involves executing arbitrary streams as code. The main shellshock exploit, that is Apache’s CGI, further involves exporting unsanitised user input (HTTP headers) as variables to be used by a shell with potentially high privileges. I cannot find a clear link from that to implying that independent command composition over data streams is an unsafe practice; all the steps that led to shellshock are bad practices, yes, but they are not the practices you need to compose programs in “unix style”.

1. 2

I am not familiar with fs2, but my understanding after a quick perusal is that it implements the same concept, but instead of using independent processes it provides functions native to the orchestrating language (Scala in this case). The difference there is that you don’t get process isolation, which I think is valuable if you can get away with it (sometimes it is not possible for performance reasons).

OS-level process isolation only really provides memory isolation IME, which is a lot less important in a memory-safe language. (File handles and I/O bandwidth can still be an issue; the OS does share CPU up to a point but it’s very easy to get priority inversions and the like where suddenly you do need to be concerned about what the other processes are doing. You can use processes for resource management up to a point (i.e. rely on exiting to close file handles) but I don’t think that’s any easier than doing it in a programming language - you have to worry about the worst-case for how long your processes live and the like).

A zero-shared-memory model is a big advantage for consistent performance, but as you say it means you incur the overhead of serializing everything; even more than the runtime performance, until recently I found that serialization of sum types (tagged unions) was unpleasant at the code level in the main options (json/thrift/protobuf). Maybe it’s got better. If you allow your processes to share memory then that works but you’re not getting much advantage from the process isolation. I understand the Erlang runtime manages green processes that have an explicit distinction between shared and non-shared heaps; I don’t have the experience to know if that’s useful in practice, but it seems like it shouldn’t be in theory - compared to Scala et al you’re moving some things out of the shared heap into private heaps, but as long as you’re using a shared heap you will have the problems of shared heaps.

What led to shellshock, in my view, was an old dubious feature of a given, and not particularly safe, shell. That feature basically involves executing arbitrary streams as code. The main shellshock exploit, that is Apache’s CGI, further involves exporting unsanitised user input (HTTP headers) as variables to be used by a shell with potentially high privileges. I cannot find a clear link from that to implying that independent command composition over data streams is an unsafe practice; all the steps that led to shellshock are bad practices, yes, but they are not the practices you need to compose programs in “unix style”.

Arbitrary streams are the only thing that unix has, so everything is passed around as them, necessarily including both user input and things that will be executed as code. I took the original post to be arguing for using specifically unix pipes and unix utilities - because if it’s just advocating composing together small reusable components then you can do that equally well in Python. In any case I think you and I agree on what the best practice is.

1. 2

I do think that memory isolation is very important, as is process isolation. That keeps failure modes way more contained. Type safety across components would be a nice addition, and you can get type safety within the components just by choosing the right language.

Type safety in the protocols the components use to communicate is more or less difficult depending on how much control you have over them, but you can always engineer the protocols so that they are checked against contracts, at the very least.

What having these small, isolated components as building blocks buys you is simplicity. It is easier to find out which part is misbehaving if something goes wrong: you can take just that part and poke at it, catting to it, grepping from it; you can watch how many resources it consumes with top, inspect the descriptors it has open in /proc, kill it if needed, etc. Plus they tend to compose very well with just reasonable designs. Or at least that is my experience.

A minor clarification about Erlang. The Erlang programming model presents you with memory isolation; processes share nothing. The VM does share large binaries, but that is not visible from the code (except in some very specific situations where performance can be affected if certain programming patterns are not followed). Still, an Erlang application is a single OS process, so from the perspective of this post, if you programmed the logic in Erlang (instead of ugly shell scripting), you’d have the Erlang process, and all the other components running as separate processes. I’ve built a couple of systems that way and I think it worked great.

1. 17

I thought this was pretty well-written, and I could imagine people that previously said “I have nothing to hide” thinking twice after reading this. Obviously if tracking was done in the physical world like he described, nobody would tolerate it but because it happens so secretively in the digital world, many people don’t know it’s going on so they aren’t really bothered by it.

Perhaps the key to getting many more people to care about privacy is to make this tracking more obvious and visible to them, so they can actually see what is being collected, rather than technical solutions that try to block tracking.

1. 9

I thought this was pretty well-written, and I could imagine people that previously said “I have nothing to hide” thinking twice after reading this.

I suspect it requires a certain level of imagination and knowledge about human nature, rather than technical skills, to care about privacy.

I had this debate recently with family members who have nothing to hide. One particular family member loses her mind over pictures of her posted to social media if she doesn’t like how she looks in them. But she has nothing to hide, nor will her young kids ever have anything to hide.

I had to go the John Oliver-vs-Edward Snowden-route of asking for a list of the medicine they take and a list of the previous 10 orgasms they had anything to do with. And even then, they weren’t sure privacy was a general problem: They only had issues with providing that specific information about them, not understanding what privacy really is.

Finally, I just had another beer.

1. 3

Ghostery does a good job at this, and his swipe against them is relatively unfounded. From my reading of their settings page, it seems they sell aggregate information about trackers and provide it to website owners (i.e. these are the trackers on your site, and this one is blocked by 100% of your Ghostery visitors).

1. 1

I was having more subversive thoughts… a whistleblower at or a compromise of a tracking company to disclose this information. Or maybe just regulation forcing these companies to allow the public to request their profile data.

1. 2

There are quite a few challenges to allowing the public to request profile data. Usually the ad tech company doesn’t actually have your name. You’re just a cookie, browser user #32198. Should anyone be able to request the profile for user #32198? The cookie itself could be the identifier (unless forged, of course, and again there are privacy issues with shared browsers).

Consider for a moment that nearly every web site keeps visitor logs for some amount of time, which usually include the requesting IP. These would presumably fall under any profile access law. Should I be able to query every website, “when did my IP last visit your site?”?

1. 2

Surely if they have some long-lasting profile on you based on cookies, you should be able to visit some site served by that company that recognizes those cookies and shows you what info they have, even if it’s not tied to your name.

1. 3

Again, I think there are unforeseen privacy implications. Clearing local history is one thing, enabling users to access a remotely backed up history is another. Isn’t the assumption that these companies can reconstitute and combine profiles after the fact? So even if you clear your cookies, anyone you let use your browser would have full access to your history. Anyone who can spoof a similar enough profile to cause merging would also have full access to your profile.

2. 1

Perhaps the key to getting many more people to care about privacy is to make this tracking more obvious and visible to them, so they can actually see what is being collected, rather than technical solutions that try to block tracking.

I think this would be a good step. Technically, tracking could improve the services we as users get from applications. The information it provides to parties we don’t control (the companies) is, however, very tempting to misuse and sadly very unimaginatively used mostly for ads, which are almost always annoying since very few (if any) have found a way of making them actually useful (otherwise people wouldn’t want to block them).

1. 2

Posted this there, waiting for moderation:

There are a few points in your analysis that are incorrect. An interviewer familiar with this test will beat on those:

• A random string is not a hash. The fundamental property of hashes is that they are functions from a domain (say strings) to a smaller domain (say strings of 5 characters). The same input always produces the same output.

• Neither hashes nor random strings of 7 characters would work in practice. You calculated how many URLs you can store, but that assumes you can efficiently use all that space. Once you have a few URLs stored in your system, you will have clashes. Avoiding those clashes is non-trivial in a globally distributed system. Even worse, random strings will hit that limit much earlier if the system is popular, since every request generates a new key. Conversely, with actual hashes, the likelihood of clashes depends on the number of URLs you store. Since you have chosen a fairly long identifier, you’d be “more or less” safe until you store about 100,000,000 URLs, but that goes against the URL being “tiny”. You could store the 644,000,000 URLs using 5-character identifiers, so your identifiers are currently too big.

• When you suggest using a “better function”, you point to actual hashes, but both functions that you suggest are bad examples. CRC is designed for cheap error detection; its distribution is less than ideal and it is easy to attack (an attacker could very easily generate many different URLs that hash to the same CRC). SHA-1 is better, but also has known attacks.

• I doubt there is a cost penalty for the database depending on which type of ID you use. The cost of random IDs is that you need to check for clashes, which requires a transaction, which is not really possible in a globally distributed system.

• I got confused when you introduced the idea of using GUIDs. GUIDs won’t work; they are simply too large (to guarantee uniqueness). But then you move to incremental IDs; I assume you refer to a counter that increments each time you write an entry. While that could work, there is more to it than that: generating a unique sequence of IDs in a distributed system is not trivial either, so you’ll need to find your way around that. However, once you do, why are you adding an extra hash to the ID? It is already unique.

• You don’t need distribution only due to scalability issues (which I doubt would be much of a problem here) but due to both availability and latency. If the system is global, you don’t want people in Australia connecting to some datacenter in Ireland, plus you need to cope with network and server failures. Distributing the solution is the meat of this problem; if you could solve it with only one computer, it would be pretty trivial.
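For what it’s worth, the keyspace arithmetic in the points above is easy to sanity-check with the standard library alone (assuming a 62-character alphabet: a-z, A-Z, 0-9). The birthday bound also shows why uniformly random 7-character keys start clashing long before the space fills:

```python
import math

ALPHABET = 62  # a-z, A-Z, 0-9 (an assumption; the article may differ)

def keyspace(length):
    """Number of distinct identifiers of the given length."""
    return ALPHABET ** length

def birthday_n_for_half(space):
    """Approximate number of uniformly random draws after which a
    collision is ~50% likely (the birthday bound)."""
    return math.sqrt(2 * space * math.log(2))

# keyspace(5) is already over 900 million, while random 7-character
# keys reach even odds of a clash after only a couple of million URLs.
```

So a deterministic hash lets you reason about clashes in terms of stored URLs, whereas random keys pay the birthday price on every request.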

1. 4

My optimistic suggestion would be that the intersection of people who care about lower defect rates than you achieve with a small pile of manual unit tests and people using Python is close to empty. It’s pretty easy to use a good type system these days, and once you have that, for any formalizable correctness property why would you ever not lift that property into the type system where it’s not just tested but proven correct?

My pessimistic suggestion would be some combination of no-one caring about software quality and no-one actually doing anything nontrivial with software.

1. 4

It’s pretty easy to use a good type system these days, and once you have that, for any formalizable correctness property why would you ever not lift that property into the type system where it’s not just tested but proven correct?

This seems like a gross oversimplification. Isn’t the answer obvious? Because not all type systems (even “good” ones) are capable of lifting all properties into the type system.

Alternatively, there exist properties that aren’t easily formalizable, but are still worth testing. (I wish I could give an example, but I’m not really sure what you have in mind.)

Alternatively still: just because one can lift a property into the type system doesn’t mean one should do it. Depending on your type system and the nature of your invariants, it might make the code a lot more complex or harder to use, which should absolutely be weighed against the benefits of correctness.

1. 2

Because not all type systems (even “good” ones) are capable of lifting all properties into the type system.

In theory maybe. In practice I never seem to hit anything important and formalizable that I can’t express via types.

Alternatively, there exist properties that aren’t easily formalizable, but are still worth testing. (I wish I could give an example, but I’m not really sure what you have in mind.)

I’m thinking that in any case where I could formalize an invariant well enough to test it in Hypothesis style I could likely just put it in the type system. Rather than me giving an example, do you have an example of something that’s so much easier to test that way?

Alternatively still: just because one can lift a property into the type system doesn’t mean one should do it. Depending on your type system and the nature of your invariants, it might make the code a lot more complex or harder to use, which should absolutely be weighed against the benefits of correctness.

Hmm. On the assumption that it’s going to be a similar amount of code either way, I’d rather have it in the types than in the tests - the feedback cycle is shorter, and new code is automatically covered rather than needing to write new tests, which to my mind makes that new code simpler to write, at least for the same level of correctness. Sometimes correctness enforcement isn’t worth it, but for cases where it is, I don’t see tests being simpler.

1. 5

In theory maybe. In practice I never seem to hit anything important and formalizable that I can’t express via types.

I find this to be an incredibly suspicious claim. There is an entire field of study whose main purpose is to find new ways to use types to make stronger guarantees. Has all progress in this field been reduced to something purely academic?

I’m thinking that in any case where I could formalize an invariant well enough to test it in Hypothesis style I could likely just put it in the type system. Rather than me giving an example, do you have an example of something that’s so much easier to test that way?

Well, the classic example is abstract syntax. How can you use the type system to guarantee that there are no expressible inhabitants that your evaluator can’t handle? This comes up a lot in practice for me. Depending on how complicated your invariants are, it’s usually not that hard to write quickcheck style `Arbitrary` impls to generate and shrink inhabitants.

Of course, there are ways to move these guarantees to the type system. GADTs is one way. Kiselyov has a phenomenal paper on it (“Typed Tagless Final Interpreters”). I bet there are other ways that you might educate me on, but I’m at the boundary of my knowledge here.

Hmm. On the assumption that it’s going to be a similar amount of code either way, I’d rather have it in the types than in the tests - the feedback cycle is shorter, and new code is automatically covered rather than needing to write new tests, which to my mind makes that new code simpler to write, at least for the same level of correctness. Sometimes correctness enforcement isn’t worth it, but for cases where it is, I don’t see tests being simpler.

I guess if your assumption holds, then maybe I’ll grant you that. I’m not sure how good that assumption is though. More types (possibly a lot more, depending on how expressive your type system is) can be required, for example.

1. 1

I find this to be an incredibly suspicious claim. There is an entire field of study whose main purpose is to find new ways to use types to make stronger guarantees. Has all progress in this field been reduced to something purely academic?

I think most of the research is about doing this without needing any turing-complete pieces (i.e. doing this kind of analysis in a way that’s complete as well as consistent). That’s a laudable goal and enables many worthwhile things, but not directly necessary for correctness.

I mean, given Curry-Howard, research can only really be about:

• proofs that are known but can’t be expressed in current type systems - very rare (probably impossible, since it’s a mechanical translation?)

• conjectures for which no proof is known - not something I’ve ever had to use directly when programming, and not cases that I can see Quickcheck-style checks helping with (maybe it’s more useful for exploratory programming?)

• things that aren’t directly about correctness - e.g. I see a fair bit of research about efficient representations, but that’s not going to be a concern for anyone for whom Python is an option.

Well, the classic example is abstract syntax. How can you use the type system to guarantee that there are no expressible inhabitants that your evaluator can’t handle? This comes up a lot in practice for me. Depending on how complicated your invariants are, it’s usually not that hard to write quickcheck style Arbitrary impls to generate and shrink inhabitants.

Can’t you just define a fold/visit method? (and then rely on the parametricity theorem, i.e. if it implements a method that takes a `Visitor<V>` and returns a `V` then it must be one of the cases of the visitor, barring bottoms). Or am I missing something?
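The fold/visitor encoding can be sketched even in Python (where nothing enforces the parametricity being relied on; this only shows the shape of the idea, and all names are hypothetical): terms expose nothing but `accept`, so an evaluator that implements every visitor method cannot meet an unhandled case:

```python
class Lit:
    """Integer literal."""
    def __init__(self, n):
        self.n = n
    def accept(self, visitor):
        return visitor.lit(self.n)

class Add:
    """Sum of two sub-expressions."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def accept(self, visitor):
        return visitor.add(self.left, self.right)

class Eval:
    """A visitor handles every constructor, so there is no
    expressible term this evaluator can't evaluate."""
    def lit(self, n):
        return n
    def add(self, left, right):
        return left.accept(self) + right.accept(self)
```

In a typed language the visitor interface is what makes "nonsense terms" unrepresentable; here it is only a convention.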

1. 3

I think most of the research is about doing this without needing any turing-complete pieces (i.e. doing this kind of analysis in a way that’s complete as well as consistent). That’s a laudable goal and enables many worthwhile things, but not directly necessary for correctness.

I mean, given Curry-Howard, research can only really be about:

• proofs that are known but can’t be expressed in current type systems - very rare (probably impossible, since it’s a mechanical translation?)

• conjectures for which no proof is known - not something I’ve ever had to use directly when programming, and not cases that I can see Quickcheck-style checks helping with (maybe it’s more useful for exploratory programming?)

• things that aren’t directly about correctness - e.g. I see a fair bit of research about efficient representations, but that’s not going to be a concern for anyone for whom Python is an option.

Where did this come from? All you said was “… In practice I never seem to hit anything important and formalizable that I can’t express via types.” To me, this suggests that you think there are no practical benefits to come from further study of type theory. The whole point of the field is to make it not only possible, but convenient to make stronger guarantees using types.

I also find your rigidity to be rather distracting. Just because a mechanical translation is possible (which might require an order of magnitude more code) doesn’t detract from my point at all.

Can’t you just define a fold/visit method?

But how do you prevent construction of nonsense terms? Jones, Washburn and Weirich explain it better: http://www.msr-waypoint.net/pubs/65143/old-wobbly.pdf — You don’t need to read the whole thing (although it is a good read), but I think just the background section is enough to describe what I have in mind.

And Kiselyov’s paper is also great for an alternative solution to this problem (type classes instead of GADTs).

It’s pretty easy to use a good type system these days, and once you have that, for any formalizable correctness property why would you ever not lift that property into the type system where it’s not just tested but proven correct?

Frankly, I find your claim that quickcheck is essentially useless to be rather extraordinary and very controversial. From where I’m standing, you’re completely dismissing a rather large body of evidence that suggests otherwise. (e.g. QuickCheck is used quite a bit among Haskell programmers.)

Other things I find quickcheck useful for:

• Testing algebraic laws. For example, does your implementation of `Monad` satisfy the monad laws? Does my implementation of a min-heap satisfy the heap invariant after any sequence of insertions or deletions?
• Testing the equivalence of two different implementations of the same idea. For example, does my implementation of regular expressions match the output of a different implementation of regular expressions given the same inputs?
• Testing the invariants defined in the contract of functions. For example, if my function’s documentation says, “the output will satisfy property X on any input,” then that seems like a natural use for quickcheck if that property is hard or impossible to move into the type system. (For example, a “sequence of sorted non-overlapping, non-adjacent Unicode codepoints.”)
• If I have an AST for a regular expression, print it, and then parse it, do I get back the original AST? That’s pretty straight-forward to test with quickcheck, but I have no earthly clue how you’d move that into the type system.
• I wrote a recursive directory iterator, and I wanted to make sure that I could describe a directory hierarchy in memory, create it, walk it and get back the same directory hierarchy that I started with. Quickcheck shined here.
• If you’re writing a linear algebra library, then there is probably no limit to the number of properties one might want to test.
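The print-then-parse bullet above is easy to sketch with the standard library alone; in place of a full regex AST, here is the same roundtrip property for nested Python literals (`repr` as the printer, `ast.literal_eval` as the parser), with a hand-rolled generator playing the role of quickcheck’s `Arbitrary`:

```python
import ast
import random

def arbitrary(depth=3):
    """Generate a random nested structure (a stand-in for an AST)."""
    if depth == 0 or random.random() < 0.3:
        return random.randint(-100, 100)
    return [arbitrary(depth - 1) for _ in range(random.randint(0, 3))]

def roundtrips(value):
    """The property under test: printing then parsing is the identity."""
    return ast.literal_eval(repr(value)) == value

random.seed(0)
assert all(roundtrips(arbitrary()) for _ in range(200))
```

A real regex AST would swap in its own printer, parser, and generator; shrinking failing cases to minimal counterexamples is what the real tools add on top.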
1. 1

All you said was “… In practice I never seem to hit anything important and formalizable that I can’t express via types.” To me, this suggests that you think there are no practical benefits to come from further study of type theory. The whole point of the field is to make it not only possible, but convenient to make stronger guarantees using types.

Ok, maybe I am saying that. I don’t have any particular insight into why people are researching or funding research into type theory. I mean I find e.g. alternatives to the boilerplate of monad transformers to be a fascinating idea (I’m working on an implementation of the “freer monads, more extensible effects” paper in my spare time), but I can’t say I’ve actually found the boilerplate prohibitive in practice. So yeah, ok, I’ve never encountered an open problem in type theory in practice, and from my point of view type research probably is of purely academic interest at this stage.

I also find your rigidity to be rather distracting. Just because a mechanical translation is possible (which might require an order of magnitude more code) doesn’t detract from my point at all.

Shrug. I’ve not seen an order of magnitude blowup (at least, not in a way that couldn’t be encapsulated) in practice. I’ve occasionally had to copy-paste code because a language didn’t have kind polymorphism, but not often enough that I feel it’s particularly important.

But how do you prevent construction of nonsense terms? Jones, Washburn and Weirich explain it better: http://www.msr-waypoint.net/pubs/65143/old-wobbly.pdf — You don’t need to read the whole thing (although it is a good read), but I think just the background section is enough to describe what I have in mind.

AFAICS the example in the background section is straightforward to implement in any good type system, or even an old and creaky language like Java? All it requires is polymorphism and basic generics, and any serious typed language in this day and age will have those.

Frankly, I find your claim that quickcheck is essentially useless to be rather extraordinary and very controversial. From where I’m standing, you’re completely dismissing a rather large body of evidence that suggests otherwise. (e.g. QuickCheck is used quite a bit among Haskell programmers.)

You’re right yeah. Thinking about it my claim is pretty much that by using this kind of technique one can implement a spec correctly (i.e. such that the only bugs are ambiguities in the spec, or behaving-as-specified) without using QuickCheck-like tools, which is an extraordinary claim and you should be skeptical. It is my experience, but I don’t know how I could even begin to convince you.

For example, a “sequence of sorted non-overlapping, non-adjacent Unicode codepoints.”

Sounds simple enough to, if not prove in the type system per se, just not allow the relevant type to be constructed any other way.
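That “not constructible any other way” idea is the smart-constructor pattern, and it can be sketched even in Python (class and names here are hypothetical; a typed language would additionally hide the raw constructor behind a module boundary):

```python
class CodepointRanges:
    """A sequence of sorted, non-overlapping, non-adjacent
    codepoint ranges.

    The constructor is the only way to build one, so every instance
    in the program satisfies the invariant by construction.
    """
    def __init__(self, ranges):
        for lo, hi in ranges:
            if lo > hi:
                raise ValueError(f"empty range ({lo}, {hi})")
        for (_, prev_hi), (next_lo, _) in zip(ranges, ranges[1:]):
            if prev_hi + 1 >= next_lo:
                raise ValueError(
                    "ranges must be sorted, disjoint and non-adjacent")
        self.ranges = tuple(ranges)
```

Downstream code then takes a `CodepointRanges` rather than a bare list, and the invariant never needs re-checking or property-testing at each use site.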

If I have an AST for a regular expression, print it, and then parse it, do I get back the original AST? That’s pretty straight-forward to test with quickcheck, but I have no earthly clue how you’d move that into the type system.

I’d use the same structure to define the parser and the printer - parser combinators are quite close to that already. Looking around there seem to be some Haskell libraries and papers taking that approach.

Testing algebraic laws. For example, does your implementation of Monad satisfy the monad laws? Does my implementation of a min-heap satisfy the heap invariant after any sequence of insertions or deletions?

Testing the equivalence of two different implementations of the same idea. For example, does my implementation of regular expressions match the output of a different implementation of regular expressions given the same inputs?

If you’re writing a linear algebra library, then there is probably no limit to the number of properties one might want to test.

These sound like cases where this kind of testing would be useful. I think there’s a common thread of equivalences that one doesn’t necessarily know how to quotient out. I’m used to working in a style where the data has a defined canonical form (or else, defining one). I see some commonality with the stuff I’ve been saying about (Turing-complete) functions and preferring to represent operations as data that can be compared for equality. If you have something that you can only represent as a function then this kind of testing is perhaps the best (only?) way to compare it. Maybe that’s a better (or at least more practical) approach than trying to eliminate functions entirely.

Food for thought. Thanks.

1. 1

Ok, maybe I am saying that. I don’t have any particular insight into why people are researching or funding research into type theory. I mean I find e.g. alternatives to the boilerplate of monad transformers to be a fascinating idea (I’m working on an implementation of the “freer monads, more extensible effects” paper in my spare time), but I can’t say I’ve actually found the boilerplate prohibitive in practice. So yeah, ok, I’ve never encountered an open problem in type theory in practice, and from my point of view type research probably is of purely academic interest at this stage.

I think we’re still talking past each other. Here’s a snippet from the SPJ paper I linked that is exactly what I have in mind here:

This use of explicit constraints has the advantage that it can be generalised to more elaborate constraints, such as subtype constraints. But it has the disadvantage that it exposes programmers to a much richer and more complicated world. In keeping with our practical focus, our idea is to see how far we can get without mentioning constraints to the programmer at all – indeed, they barely show up in the presentation of the type system. Our approach is less general, but it has an excellent power-to-weight ratio.

It’s an explicit acknowledgment of the trade offs involved here, which is critical to developing type theory for practical use. There are all sorts of things that we can prove using types, but if it’s really hard to do it, then you’ve lost something that you might not be willing to give up.

That you’re talking about “open problems in type theory” is an enormous distraction from what I’m saying. GADTs didn’t solve any open problems (that I’m aware of?), but they undoubtedly made it easier (for some definition of “easier”) to prove certain properties about one’s code using types.

AFAICS the example in the background section is straightforward to implement in any good type system, or even an old and creaky language like Java? All it requires is polymorphism and basic generics, and any serious typed language in this day and age will have those.

No. The evaluator is proven correct in the type system. Without that proof, you need to fall back to unit or property based tests to make sure your evaluator doesn’t do something horrible when it’s given an invalid input. GADTs specifically rule out the possibility of an invalid input in the type system.

Something similar might be accomplished by using privacy and combinators, where any combination of combinators is guaranteed to produce inhabitants of your AST that your evaluator can handle. But how do you know that’s true? If you got it right, then certainly, for callers it’s true. But internally? Maybe a certain permutation of your combinators can produce an inhabitant that your evaluator chokes on. Then again, I could definitely see how the combinators might be so simple as to be obviously correct, but I don’t think it’s something that is proven by the type system itself.
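For what it’s worth, that “privacy and combinators” idea can be sketched in plain Java with a phantom type parameter (hypothetical names throughout). It guarantees well-typedness for callers, which is exactly the weaker claim being made here - it doesn’t prove each combinator’s body correct:

```java
class Typed {
    // Expr<T> is an expression that evaluates to a T; the phantom parameter
    // is the only thing connecting object-language types to Java types.
    interface Expr<T> { T eval(); }

    static Expr<Integer> intLit(int n)      { return () -> n; }
    static Expr<Boolean> boolLit(boolean b) { return () -> b; }

    static Expr<Integer> add(Expr<Integer> x, Expr<Integer> y) {
        return () -> x.eval() + y.eval();
    }

    static <T> Expr<T> iff(Expr<Boolean> c, Expr<T> t, Expr<T> e) {
        return () -> c.eval() ? t.eval() : e.eval();
    }

    public static void main(String[] args) {
        Expr<Integer> e = iff(boolLit(true), add(intLit(1), intLit(2)), intLit(0));
        System.out.println(e.eval());
        // add(boolLit(true), intLit(1)) is rejected by the compiler.
    }
}
```

Only the factories can produce an `Expr`, so from the outside no ill-typed term is constructible - but as noted, nothing in the type system checks the factories themselves.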

Sounds simple enough to, if not prove in the type system per se, just not allow the relevant type to be constructed any other way.

But how do you test that your construction is correct in the first place?

I’d use the same structure to define the parser and the printer - parser combinators are quite close to that already. Looking around there seem to be some Haskell libraries and papers taking that approach.

But your approach doesn’t replace a property-based test, because you still can’t guarantee that printing is correct. For example, perhaps my printer doesn’t write the syntax correctly, or misquotes a character, or… And it’s not necessarily straightforward to couple parsing and printing, since there can be multiple parses that produce the same AST, while printing the AST will always produce exactly one variant. There are algebraic laws at play here! Look:

``````
parse(print(expr)) == expr
``````

But this does not necessarily hold:

``````
print(parse("...")) == "..."
``````

I very much want to test the first property!
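A hand-rolled sketch of that first property for a toy grammar (hypothetical types; a real QuickCheck-style library would add shrinking and smarter generators - the point is only that the property checks something the types don’t):

```java
import java.util.Random;

class RoundTrip {
    // Toy AST: a literal character, or the alternation of two subtrees.
    interface Expr {}
    record Lit(char c) implements Expr {}
    record Alt(Expr left, Expr right) implements Expr {}

    // Printer: fully parenthesized, so the parser below stays tiny.
    static String print(Expr e) {
        if (e instanceof Lit l) return String.valueOf(l.c());
        Alt a = (Alt) e;
        return "(" + print(a.left()) + "|" + print(a.right()) + ")";
    }

    // Recursive-descent parser over the printed form.
    static int pos;
    static Expr parse(String s) { pos = 0; return expr(s); }
    static Expr expr(String s) {
        if (s.charAt(pos) == '(') {
            pos++;                 // consume '('
            Expr left = expr(s);
            pos++;                 // consume '|'
            Expr right = expr(s);
            pos++;                 // consume ')'
            return new Alt(left, right);
        }
        return new Lit(s.charAt(pos++));
    }

    // Depth-bounded random AST generator; records give structural equals.
    static Expr gen(Random rnd, int depth) {
        if (depth == 0 || rnd.nextBoolean())
            return new Lit((char) ('a' + rnd.nextInt(26)));
        return new Alt(gen(rnd, depth - 1), gen(rnd, depth - 1));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 1000; i++) {
            Expr e = gen(rnd, 5);
            if (!parse(print(e)).equals(e))
                throw new AssertionError("roundtrip failed on: " + print(e));
        }
        System.out.println("ok");
    }
}
```

If the printer wrote the wrong delimiter for `Alt`, nothing would fail to compile, but this property would catch it on the first few random trees.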

These sound like cases where this kind of testing would be useful. I think there’s a common thread of equivalences that one doesn’t necessarily know how to quotient out. I’m used to working in a style where the data has a defined canonical form (or else, defining one).

That is quite convenient. It might help to clarify this earlier!

I see some commonality with the stuff I’ve been saying about (Turing-complete) functions and preferring to represent operations as data that can be compared for equality. If you have something that you can only represent as a function then this kind of testing is perhaps the best (only?) way to compare it. Maybe that’s a better (or at least more practical) approach than trying to eliminate functions entirely.

That’s a good insight I think! I might buy it.

1. 1

It’s an explicit acknowledgment of the trade offs involved here, which is critical to developing type theory for practical use. There are all sorts of things that we can prove using types, but if it’s really hard to do it, then you’ve lost something that you might not be willing to give up.

I guess. I stand by “if it’s formalizable at all, I’ve (personally) not (yet) found it hard to express in types”.

Something similar might be accomplished by using privacy and combinators, where any combination of combinators is guaranteed to produce inhabitants of your AST that your evaluator can handle. But how do you know that’s true? If you got it right, then certainly, for callers it’s true. But internally? Maybe a certain permutation of your combinators can produce an inhabitant that your evaluator chokes on. Then again, I could definitely see how the combinators might be so simple as to be obviously correct, but I don’t think it’s something that is proven by the type system itself.

If you define a visit/cata method on the interface, where the visitor has to handle all the valid cases, then you have parametricity theorem style assurance that the valid cases are the only cases, don’t you? And then you’d define the evaluator in terms of that method.

But how do you test that your construction is correct in the first place?

The same way - either its invariants are obviously true, or break them down (by representing them as types) until they are.

And it’s not necessarily straight forward to couple parsing and printing since there can be multiple parses the produce the same AST, but printing the AST is always going to produce exactly one variant.

To the extent that this is a problem, isn’t it exactly the same problem if you use property-based testing to test that all the AST -> string -> AST roundtrips are correct?

1. 1

I guess. I stand by “if it’s formalizable at all, I’ve (personally) not (yet) found it hard to express in types”.

Do you consider mathematical properties formalizable? How do you encode the various algebraic laws listed here into types? https://en.wikipedia.org/wiki/Matrix_(mathematics)#Main_types

If you define a visit/cata method on the interface, where the visitor has to handle all the valid cases, then you have parametricity theorem style assurance that the valid cases are the only cases, don’t you? And then you’d define the evaluator in terms of that method.

This sounds isomorphic to Kiselyov’s approach with type classes.

To the extent that this is a problem, isn’t it exactly the same problem if you use property-based testing to test that all the AST -> string -> AST roundtrips are correct?

I don’t see how. Writing that test requires production rules for the AST, a shrinker, a printer and a parser. There needn’t be any explicit coupling between the printer and the parser (other than one being a consumer of the AST and the other being a producer).

I mean, if your printer looks like this:

``````
print Dot               = "."
print (Literal s)       = s
print (Kleene e)        = print e ++ "*"
print (Concat e1 e2)    = print e1 ++ print e2
print (Alternate e1 e2) = print e1 ++ "|" ++ print e2
``````

Then what exactly in the type system dictates that `"."` is actually the correct syntax for `Dot`? Nothing does, but a quickcheck property that tests `parse(print(expr)) = expr` can totally check that.

A real world regex engine has syntax more complex than that. For example, in addition to `Dot` you might also have `DotAny`, where the latter might need a special flag enabled to actually be used.

1. 1

Do you consider mathematical properties formalizable? How do you encode the various algebraic laws listed here into types? https://en.wikipedia.org/wiki/Matrix_(mathematics)#Main_types

The same way - you give the particular types their own, well, type, require suitable evidence when constructing instances, and supply relevant evidence derived from that evidence. The one tricky part is that simpler type systems don’t always let you generalize (similar to working in first-order logic, where you can’t make “for all f”-type statements) - but you never actually need a for-all-like statement in programming, because you’re always ultimately going to operate on values. So you can work in a kind of continuation-passing-esque style where you represent a “for all a” statement as a function from a to the specific instance of the statement.
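A loose illustration of that CPS-esque move in Java (hypothetical example): the universal statement becomes a generic method, and each call site obtains the instance at whatever type it actually holds. Here the instance is merely checked at runtime rather than carried as evidence, which keeps the sketch short:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ForAll {
    // "For all A, reverse (reverse xs) == xs" as a generic method: callers
    // can't name "for all A", but they can always get the instance they need.
    static <A> boolean reverseReverseIsIdentity(List<A> xs) {
        List<A> copy = new ArrayList<>(xs);
        Collections.reverse(copy);
        Collections.reverse(copy);
        return copy.equals(xs);
    }

    public static void main(String[] args) {
        // Two different instantiations of the "theorem", at Integer and String.
        System.out.println(reverseReverseIsIdentity(List.of(1, 2, 3))
                && reverseReverseIsIdentity(List.of("a", "b")));
    }
}
```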

This sounds isomorphic to Kiselyov’s approach with type classes.

It may well be, but you don’t need typeclasses to do it; you can implement it in any language that has polymorphism and basic generics i.e. any serious typed language.

There needn’t be any explicit coupling between the printer and the parser (other than one being a consumer of the AST and the other being a producer).

There needn’t be, but why wouldn’t there be? Writing the parser means expressing enough information to print the object, so there’s no sense repeating that definition separately to write the printer.

1. 1

The same way - you give the particular types their own, well, type, require suitable evidence when constructing instances, and supply relevant evidence derived from that evidence. The one tricky part is that simpler type systems don’t always let you generalize (similar to working in first-order logic, where you can’t make “for all f”-type statements) - but you never actually need a for-all-like statement in programming, because you’re always ultimately going to operate on values. So you can work in a kind of continuation-passing-esque style where you represent a “for all a” statement as a function from a to the specific instance of the statement.

Sounds like you’re making the code quite a bit more complex. I’m not interested in what’s theoretically possible here. I’m interested in what can actually be used. Frankly, I won’t buy this until I see it, so I think we’ll just have to disagree. I also wonder whether your approach can be done such that it is as fast as implementations that don’t need any testing. I find the lengths to which you are arguing against quickcheck as an effective tool to be extremely strange.

There needn’t be, but why wouldn’t there be? Writing the parser means expressing enough information to print the object, so there’s no sense repeating that definition separately to write the printer.

You can’t. We have `parse(print(expr)) = expr` but not `print(parse("...")) = "..."`. You could probably gain the second law with a sufficiently complex abstract syntax to capture all of the particulars of the concrete syntax, but that abstract syntax is now a lot more complicated and much harder to work with. Hence, putting invariants into the type system just isn’t always worth it. Proof of correctness isn’t everything, but if we can approximate it with property based testing while avoiding the complexities of moving as many invariants into the type system as possible, then maybe that’s just fine.

1. 1

Frankly, I won’t buy this until I see it, so I think we’ll just have to disagree.

Fair. I literally can’t remember the last time I saw a bug in my code that compiled that wasn’t also a bug/ambiguity in the spec (and I don’t think this is any particular skill on my part - when I tried to write Python I put an embarrassingly large number of bugs live). But yeah, implausible claim, I know.

I also wonder whether your approach can be done such that it is as fast as implementations that don’t need any testing.

No, not remotely. But again, presumably not a concern for cases when Python is an option.

I find the lengths to which you are arguing against quickcheck as an effective tool to be extremely strange.

Shrug. We all have our preferred tools/techniques. If you think you have a better way of doing something, isn’t it natural to argue strongly for that approach?

We have parse(print(expr)) = expr but not print(parse(“…”)) = “…”

Sure, but how does that change anything?

1. 1

No, not remotely. But again, presumably not a concern for cases when Python is an option.

Well, I don’t really care about Python here. I’m trying to poke at your argument that quickcheck is useless in languages with good type systems.

Shrug. We all have our preferred tools/techniques. If you think you have a better way of doing something, isn’t it natural to argue strongly for that approach?

Sure, but there’s a lot of data you seem to be shrugging off…

Sure, but how does that change anything?

It makes the “one unified definition” impossible? I sure would love to see a real regex parser that could prove roundtripping via the type system!

1. 1

Sure, but there’s a lot of data you seem to be shrugging off…

One of the eternal Hard Questions is what to do when the data is contrary to your own experience.

It makes the “one unified definition” impossible? I sure would love to see a real regex parser that could prove roundtripping via the type system!

I’m talking about something like https://github.com/pawel-n/syntax-example-json/blob/master/Main.hs , where you write a single definition of your syntax tree and then that single definition is used for parsing and printing, so roundtripping is necessarily correct.

1. 1

where you write a single definition of your syntax tree and then that single definition is used for parsing and printing, so roundtripping is necessarily correct.

I don’t see how that works when there are many different ways to write the same AST. For example, `(?s)foo.bar` has an AST equivalent to `foo(?s:.)bar`. Similarly, `[a-c]` might have an AST equivalent to `a|b|c`.

As I mentioned previously, if you defined your AST such that the above wasn’t true (e.g., `[a-c]` always has a different AST than `a|b|c`), then perhaps one could take your approach. But now your AST is much harder to work with. To make it easier to work with, you probably need more types and a lot more code.

1. 1

For example, (?s)foo.bar has an AST equivalent to foo(?s:.)bar. Similarly, [a-c] might have an AST equivalent to a|b|c.

They don’t have the same AST; their syntax is different. They’re semantically equivalent. You might define a “canonicalize”-type function that would transform ASTs such that equivalent ASTs output the same form (in which case roundtripping would be “free” but it would be desirable to prove that canonicalize was idempotent), or you might define a semantic representation (e.g. the compiled DFA) in which all equivalent values are identical (and then it would be a case of proving that the compiled representation roundtripped via the AST, i.e. that the conversion functions are partial inverses).
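As a toy stand-in for that idempotence obligation (strings rather than regex ASTs; hypothetical function), the check looks like:

```java
import java.util.Locale;

class Canon {
    // Toy stand-in for an AST canonicalizer: trim then lowercase.
    static String canonicalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String once = canonicalize("  Foo ");
        // Idempotence: canonicalizing a canonical form is a no-op, which is
        // what makes "roundtrip via the canonical form" well-defined.
        System.out.println(once.equals(canonicalize(once)));
    }
}
```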

1. 1

They don’t have the same AST; their syntax is different.

No. Their concrete syntax is different. That doesn’t mean the AST must be different.

This is exactly my point. I could define an AST that is more faithful to the precise concrete syntax, but now I’ve created a lot more work for myself, including one type that expresses a very precise but unwieldy AST and another type that expresses a “canonical” and more convenient AST that can be used for analysis and/or compilation to a finite state machine.

Or I could have one AST and write property based tests.

1. 1

“AST” literally means abstract syntax tree - a tree whose structure corresponds to the structure of the syntax. What you’re describing is not having an AST.

To be able to make statements about the concrete syntax you need a formal representation of it, sure. But how could you ever reason about it without that?

1. 1

You can argue about definitions all you want, but the fact of the matter is that I have a tree that describes the syntax of a regex and multiple distinct concrete instances may map to a single instance of that tree. I pointed out long ago that this representation makes it difficult to prove round tripping via the type system. Conversely, it simplifies many other portions of the code such as analysis and compilation.

You’re diving down exactly the rabbit hole I’m trying to point out. To move more invariants into the type system requires more types, more code and more complexity in the API surface. I mentioned this several comments ago as a key trade off.

I don’t know how many times I have to say this, but the point isn’t “what can I prove using types,” but rather, “what can I prove using types conveniently.” You’ve retorted that the two categories are “the same in your experience.” I’ve submitted several examples where they aren’t the same in my experience.

1. 1

I think the complexity is there already, and I’d want the code there already. Even if I were working in Python I’d want to have an AST as well as a more semantic representation - if nothing else, if you are doing testing it makes it much easier to test the logic if both input and output are first-class objects rather than strings. And I don’t see how you could possibly do property-based testing of the parser without having a structured representation of its input (I mean, presumably you would actually want to test some input strings that aren’t canonical representations, so you can’t just do all the testing by roundtripping - the case in which you can do that is precisely the case where you have an actual AST where the structure corresponds 1:1 to the textual structure).

2. 5

In theory maybe. In practice I never seem to hit anything important and formalizable that I can’t express via types.

Unless you program in Coq, how would you lift some evidence that a sort algorithm works into the type system? Note, OpenJDK’s sort algorithm was shown to be broken for some inputs relatively recently IIRC.

Another example is something like a Paxos or Raft implementation. This goes through several transitions, and one needs to prove that the transitions do not invalidate any invariants. In Quviq’s QuickCheck this is done by building a state-machine model and running over it. Again, unless one is programming in Coq, how would you do that in the type system?

1. 1

Unless you program in Coq, how would you lift some evidence that a sort algorithm works into the type system?

Even in say Java you can do something like:

``````
public interface CompareResult {
    <V> V visit(CompareResultVisitor<V> visitor);
}

public interface CompareResultVisitor<V> {
    V visitLessThan(LessThan lt);
    V visitEqual(Equal eq);
    V visitGreaterThan(GreaterThan gt);
}

public class LessThan implements CompareResult {
    public final int a;
    public final int b;

    LessThan(final int a, final int b) {
        assert(a < b);
        this.a = a;
        this.b = b;
    }

    @Override
    public <V> V visit(CompareResultVisitor<V> visitor) {
        return visitor.visitLessThan(this);
    }
}

// Equal and GreaterThan omitted lest I die of old age, I'd forgotten quite what writing Java is like

public class Compare {
    private Compare() {}

    public static CompareResult compare(int a, int b) {
        if (a < b) return new LessThan(a, b);
        else if (a == b) return new Equal(a, b);
        else return new GreaterThan(a, b);
    }
}
``````

And then you build on that e.g. you might have another method that only accepts a `LessThan`. If there are intermediate states as you’re sorting the list then you define a class to represent each one, and write the transitions such that they can only ever happen correctly (using visitors). You can take it as far as you like, e.g. in Coq you might define `a < b` as `∃ c > 0: a + c = b`, but you can do exactly that in Java, encoding the naturals with Church encoding or similar. In practice you generally choose a point at which things are “obviously correct” and assume they’re, well, correct.
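That “method that only accepts a `LessThan`” idea, as a self-contained sketch (hypothetical names; the private constructor and guarded factory are doing all the work):

```java
class Evidence {
    // Evidence that a < b held at construction time. The private
    // constructor plus the guarded factory are what make it trustworthy.
    static final class LessThan {
        final int a, b;
        private LessThan(int a, int b) { this.a = a; this.b = b; }

        static LessThan prove(int a, int b) {
            if (a < b) return new LessThan(a, b);
            throw new IllegalArgumentException(a + " is not < " + b);
        }
    }

    // Demands evidence in its signature: there is no branch for a >= b,
    // because no caller can ever hold a LessThan where a >= b.
    static int strictlySmaller(LessThan lt) {
        return lt.a;
    }

    public static void main(String[] args) {
        System.out.println(strictlySmaller(LessThan.prove(3, 7)));
    }
}
```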

This isn’t proof in the type system proper as you would have in Coq; you don’t have the same level of tool support, and you don’t have provable termination (e.g. you could “cheat” by defining `compare` as `return compare(a, b);` - or more subtly, do that only for one of the cases). You can avoid that by disallowing recursive calls and recursive types, but in practice that’s impractical (e.g. you’d need to define a concrete type for each list size - even HList as usually defined can be “tricked” into creating a circular structure). Still, I don’t find these to be limiting in practice - even many dependently-typed languages allow recursive types, which have the same limitation, and no one actually writes the malicious infinitely-recursive counterexamples.

You can reduce the problem of writing a correct implementation of your algorithm to only instantiating your “primitives” when their invariants hold (i.e. only creating a `LessThan` for `a < b`, which is easy enough to verify by inspection - there is no public constructor, so you only have to check code in the same package; if you really need to protect against malicious classes being created in other packages you can split your project into OSGi modules). And if the invariant for any given “primitive” is too complex, you can reduce it to simpler ones in exactly the same way you’d do when writing a proof in Coq.

I can see that in high-performance code (like the main sort algorithm) you’d want to reuse the same class for multiple states and in that case separating the proof from the implementation code could be valuable - but I don’t think that’s a concern that applies to any problem where Python is an option.

1. 6

How does this prove a sort function sorts? The visitors are run at runtime. With asserts removed in a release build I can construct a `LessThan` that is not less than. So, if I want an invalid sort function to not compile, how does this accomplish that? I’m not much of a type theorist so perhaps I’m missing something obvious.

1. 1

The `visit` method is generic, so we have parametricity-theorem-style assurance that it calls one of the cases. Which in turn means it must provide one of the cases, which it can only do by either being one itself or calling `compare`. So although you can write a custom implementation of `CompareResult`, the only possible values are the ones defined here or a “lazy” value that’s isomorphic to them and preserves the correctness properties (assuming we disallow e.g. global mutable state).

That relies on `visit` not being implemented as bottom, e.g. `return null;` or `throw new RuntimeException();` or `return visit(visitor);`. The first two we can rule out by just disallowing those constructs everywhere (as with global mutable state); the last we either accept the risk for simple cases, or for complex mutual recursion cases we can require passing evidence that the recursion is getting smaller.

You can’t construct a `LessThan` that is not less than because the constructor is package-private. We can enumerate all the constructor calls within the package and verify by inspection that it’s only called when `a` really is less than `b` (i.e. treating the correctness of `compare()` as an axiom in our proof), or if we decide less-than-ness is too complex a property to be happy that the code is obviously correct then we break it down into simpler properties using the same techniques (which in this case probably means representing the numbers at the type level as `Zero` and `Successor`, like you do in Idris). Even in Coq you can only ever prove things under axiom assumptions.

1. 1

I still do not see how this actually proves a particular sort implementation is correct. Could you please elaborate on that? On top of that, your solution seems to get farther and farther away from something usable, if I am understanding it correctly (which I don’t think I am).

1. 1

I still do not see how this actually proves a particular sort implementation is correct. Could you please elaborate on that?

Given a proof that an invariant holds, you translate each step of the proof into a type; constructing that type requires the assumptions of that step, and that type provides evidence that the outcome holds (implemented using the inputs).

On top of that, your solution seems to get farther and farther away from something usable, if I am understanding it correctly (which I don’t think I am).

The further you go in formalizing it from very basic assumptions, the less practical it gets - I probably wouldn’t want to work with naturals implemented directly in real code. But whatever reasoning it is that convinces you that the invariant you believe in is true, you can encode that same reasoning (in the same terms) into the type system - the parts that you see as obviously true, you just leave as obviously true.

I’ve never had to implement something that I wasn’t convinced was correct (like I said, maybe these things make more sense in an exploratory programming setting where you might genuinely not know whether a given property holds - does that happen?), and pushing that reasoning into the type system enforces that the reasoning itself is correct (in as much as the conclusion follows from the premises). It’s still possible to make errors at the premise level, but I think when making that kind of mistake and using quickcheck-like tools I’d just recapitulate it in the generators.

1. 3

Given a proof that an invariant holds, you translate each step of the proof into a type; constructing that type requires the assumptions of that step, and that type provides evidence that the outcome holds (implemented using the inputs).

For sorting, I have a function like `int[] sort(int[])`

What are the types that would prove that `sort` actually sorts? In OCaml, I can certainly construct a type that encodes the sorting between 2 elements, or 3 elements, but not for an arbitrary N elements. This is where dependent types come into play, as far as I am aware. If I understand the example you gave, you have encoded a relationship between two elements, but you haven’t shown how that proves an entire sequence of values is in a particular order, or how to take that and correctly construct an array with that invariant.

A similar example I’ve seen is handling committing something in Raft only after a majority of nodes have acked an update. Doing it for a fixed number of nodes is easy; you can encode all the states in a phantom type or similar. But doing it for an arbitrary number of nodes is not encodable directly in the type system (at least in OCaml), unless I missed something when trying to do it.

1. 1

What are the types that would prove that sort actually sorts? In OCaml, I can certainly construct a type that encodes the sorting between 2 elements, or 3 elements, but not for an arbitrary N elements. This is where dependent types come into play, as far as I am aware. If I understand the example you gave, you have encoded a relationship between two elements, but you haven’t shown how that proves an entire sequence of values is in a particular order, or how to take that and correctly construct an array with that invariant.

For arbitrarily many elements you do the CPS-like thing - the return type is something like:

``````
interface SortedListGreaterThanOrEqualTo<N extends Nat> {
    public <V> V cata(Visitor<N, V> visitor);
}

interface Visitor<N extends Nat, V> {
    V empty();
    <M extends Nat> V cons(M head, SortedListGreaterThanOrEqualTo<M> tail,
                           GreaterThanOrEqualTo<M, N> evidence);
}
``````

In terms of showing a specific sorting algorithm works I’d start from a proof for that sorting algorithm - in practice if the problem was implementing sorting I’d probably write a mergesort, which might well be simple enough to be obviously correct.
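A runtime-checked simplification of that structure (hypothetical names): a guarded private constructor stands in for the type-level `GreaterThanOrEqualTo` evidence, so every reachable list is sorted by construction:

```java
class Sorted {
    // Invariant: head <= every element of tail. The private constructor
    // means the only way in is cons(), which checks the ordering evidence.
    static final class NonEmpty {
        final int head;
        final NonEmpty tail; // null = end of list

        private NonEmpty(int head, NonEmpty tail) {
            this.head = head;
            this.tail = tail;
        }

        static NonEmpty singleton(int x) { return new NonEmpty(x, null); }

        // Prepend x, demanding x <= head so sortedness is preserved.
        NonEmpty cons(int x) {
            if (x > head) throw new IllegalArgumentException(x + " > " + head);
            return new NonEmpty(x, this);
        }
    }

    public static void main(String[] args) {
        NonEmpty xs = NonEmpty.singleton(9).cons(4).cons(1);
        StringBuilder sb = new StringBuilder();
        for (NonEmpty n = xs; n != null; n = n.tail) sb.append(n.head).append(' ');
        System.out.println(sb.toString().trim());
    }
}
```

A sort built on this type can’t return an unsorted list; what it doesn’t prove is that the output is a permutation of the input, which would need its own evidence.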

1. 2

You still haven’t shown how you can prove a sorting algorithm sorts via the type system, at least in Java (I’m assuming we agree that something like Coq or Agda can prove such a thing in the type system). Your claim was:

It’s pretty easy to use a good type system these days, and once you have that, for any formalizable correctness property why would you ever not lift that property into the type system where it’s not just tested but proven correct?

And the counter argument is that no, you cannot, for all things. Expressing something like a sorting algorithm is correct in the type system requires a powerful type system. So far you have insisted that you can lift this into the type system but have not proven it. So please, lay down the law and show me that I am incorrect, or admit there is some portion of a program that you cannot express in the type system, and this is where something like property based testing adds value.

1. 1

If the task is “write a sorting algorithm in Java that you’re confident is correct, without using property-based testing” then I’m very happy to do that, but it probably doesn’t involve much that you’d call proof in the type system. But that’s the thing that would have to be difficult in order for property-based testing to be useful.

You want an actual proof, as a demonstration of the techniques? How basic do you want the axioms to be? (E.g. can I assume that the `<` operator does what it does?) And what, formally, do you want proof of - just that the result is ordered?

1. 1

Let’s not get too crazy - I’m just really curious how you would actually solve this problem without resorting to some kind of testing (and a sorting algorithm is a place where prop testing shines). In the code you’ve been showing, correctness information always passes through some black hole. Your claim earlier was that you haven’t met an invariant that you haven’t been able to lift into the type system.

I assume that means you can define types such that an invalid sorting algorithm does not compile, as that is what almost everyone I’ve met means by that claim. Is that assumption correct? If so, how would you do this? Otherwise what do you mean by lifting to the type system?

1. 2

I assume that means you can define types such that an invalid sorting algorithm does not compile, as that is what almost everyone I’ve met means by that claim. Is that assumption correct?

If you mean by this “any edit to the source that preserved all type signatures would necessarily result in a valid sorting algorithm or something that doesn’t compile”, I think that’s theoretically possible in Java but not practical. I tend to think of representing some of the information by classes with sealed constructors as still lifting to the type system (I see it as similar to having an unchecked theorem in a more fancy typed system), but maybe that’s an important distinction I’m glossing over there. So I guess I’m making a couple of different claims:

1. The extra tools provided by a system like Coq or Agda are never strictly necessary for proving correctness of a concrete program. They make proofs a lot more practical and a lot more human-readable, but anything you can prove (about a single concrete program - I think they might have ways to talk about all possible programs that you can’t do in simpler languages) in them you could (at least in theory) mechanically-ish expand out to prove in any type system that has polymorphism and basic generics.

2. A programmer of moderate skill who is deliberately pushing invariants into the type system (and explicitly not writing for maximum performance) can, in practice, write correct-to-the-specification code without any kind of testing (beyond checking it compiles) in any reasonable typed language (again, polymorphism and basic generics).

I think an example is probably a good idea - I’ve already tried to describe what I’m talking about doing as clearly as I can - looking around for examples of mathematical proof they all seem too simple or too general though. I certainly wouldn’t want to try to express the TimSort verification paper in Java (I had a go at it in Scala once and that was hard enough), but that’s an array-twiddly performance-oriented piece of code where the invariants have dozens of conditional clauses.

Honestly I’m not sure how I ended up arguing for Java here, because I’m all in favour of using systems like Coq to make proving correctness easier. But even if you are tied to Java I think it’s still possible to use types to enforce correctness, and I think that’s more likely to be an effective approach than quickcheck-like techniques.

In particular the bug in JDK sorting was, as far as I know, found by applying formal methods to it, not by a quickcheck-like testing system. And in the scalaz-stream example that was posted recently the tester seemed to spend more time encoding their assumptions into the generative tester than actually finding any bugs with it. So one thing that would be interesting to see would be more examples of bugs found with that kind of tool, and whether the invariant that was violated could have been expressed directly in the code.

1. 4

Thanks for the thoughtful response.

anything you can prove (about a single concrete program - I think they might have ways to talk about all possible programs that you can’t do in simpler languages) in them you could (at least in theory) mechanically-ish expand out to prove in any type system that has polymorphism and basic generics.

I think this is likely untrue, if you want to take a dynamic amount of input at any point. Consider how you would mechanically expand out into the type system that `List.add` results in a list whose length is `current_length + 1`. I think this is impossible in a language like Java without really impractical restrictions.

A programmer of moderate skill who is deliberately pushing invariants into the type system (and explicitly not writing for maximum performance) can, in practice, write correct-to-the-specification code without any kind of testing (beyond checking it compiles) in any reasonable typed language (again, polymorphism and basic generics).

I think this is likely untrue as well. Again, I think a `sort` function is a valid counterexample here. It’s easy to construct a relationship between a fixed number of elements, like your `LessThan` class, but establishing a relationship between a dynamic number of elements is not possible without a powerful type system. As far as I am aware Scala, Java, and even Haskell do not have sufficiently expressive type systems (although maybe that’s changing with type families in Haskell). Even a simple mergesort, I think you’d be unable to prove correct using the type system of Scala or Java (although I know less about what magic type features Scala might offer).

But even if you are tied to Java I think it’s still possible to use types to enforce correctness, and I think that’s more likely to be an effective approach than quickcheck-like techniques.

I don’t know where this belief comes from and there doesn’t seem to be any reason to believe it’s true. Not because prop testing itself is special but because there seems to be a clear limit on what can be expressed in Java’s type system.

In particular the bug in JDK sorting was, as far as I know, found by applying formal methods to it, not by a quickcheck-like testing system.

This is absolutely correct. However this thread has not been about proving property based testing is useful but rather exploring your claim that you can move any invariant into the type system.

And in the scalaz-stream example that was posted recently the tester seemed to spend more time encoding their assumptions into the generative tester than actually finding any bugs with it

What’s the problem with this? If there are no bugs to be found then there are no bugs to be found. The post did prove that (for the properties the author cared about) the scalaz-stream code had the properties it claimed.

So one thing that would be interesting to see would be more examples of bugs found with that kind of tool, and whether the invariant that was violated could have been expressed directly in the code.

My best experiences with property based testing have been validating state machine transitions. Specifically, QuviQ’s tooling lets one express operations that can be done on a state machine and invariants that must hold after each operation; it then performs a large number of them and tries to find the minimal number of transitions when a fault is found. I’ve used this quite a bit in code that has to serialize data to the disk and be crash proof. Some of the state machine transitions were simulating various system crashes and then validating that the data was readable (but perhaps missing entries). This found errors that were difficult to reason about, or poorly documented, or just unclear.

A lot of my experience with prop testing has also been finding inputs that I didn’t consider. In many cases it’s around user input that, as the developer, I’m unlikely to think of because it’s clearly wrong. One recent example of this is a program that takes two sets of values, one set being replaced by the other. The original code did not catch the case where the user put the same value in both sets, because it makes no sense given what the program does. Pretty obvious in retrospect, but the prop testing found it pretty easily when we were not even testing for it directly.

Thanks again for the response. I still feel unsatisfied with your claim that you can lift any invariant to the type system. So far you’ve just seemed to insist that one can but haven’t demonstrated it. But it would probably require a lot of work to demonstrate you are correct, more work than is probably worth it for a lobste.rs post, so I don’t expect it.

1. 1

I think this is likely untrue, if you want to take a dynamic amount of input at any point. Consider how you would mechanically expand out into the type system that List.add results in a list whose length is the current_length + 1. I think this is impossible in a language like Java without really impractical restrictions.

Maybe my JVM bias is showing. I think it’s possible to do this in Java by representing integers as `Succ<Succ<...<Zero>...>>`. Of course you’d never want to write out 100 in code in that form - but you’d never want to write a 100-item list literal either. If you’re just using these as type parameters they’ll be erased at runtime, so the overhead you get is compile-time-only and proportional to the size of the literals you use.
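A minimal sketch of that encoding in plain Java generics (all names here - `Nat`, `Zero`, `Succ`, `Vec` - are made up for illustration): the list’s length lives in a type parameter, so “`add` returns a list one element longer” is checked by the compiler rather than by a test.

```java
// Type-level Peano naturals using only ordinary generics.
interface Nat {}
final class Zero implements Nat {}
final class Succ<N extends Nat> implements Nat {}

// A linked list whose length appears in its type parameter.
abstract class Vec<N extends Nat, A> {}
final class Nil<A> extends Vec<Zero, A> {}
final class Cons<N extends Nat, A> extends Vec<Succ<N>, A> {
    final A head;
    final Vec<N, A> tail;
    Cons(A head, Vec<N, A> tail) { this.head = head; this.tail = tail; }
}

class Demo {
    // The extra Succ in the return type is the compile-time proof
    // that the result is exactly one element longer.
    static <N extends Nat, A> Vec<Succ<N>, A> add(A x, Vec<N, A> xs) {
        return new Cons<>(x, xs);
    }

    public static void main(String[] args) {
        // The type records the length: Succ<Succ<Zero>> is "2".
        Vec<Succ<Succ<Zero>>, Integer> two = add(1, add(2, new Nil<Integer>()));
        System.out.println(two.getClass().getSimpleName()); // prints "Cons"
    }
}
```

Since the parameters are erased, a `Vec` costs the same at runtime as a plain linked list; the `Succ` tower exists only in the compiler.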

I would like to give an example of sorting when I get time - you’re right not to take my word for it - I’ll probably use Scala but java-izable Scala, if that makes sense.

1. 1

I just don’t see how that’s going to be possible. What would the type of an `append` look like for a List, where we take two Lists of length m and n and want to ensure the output is a List of length m + n? Each of those is a unique type, and the third type is derived from the previous two. AFAIK, you require dependent types to express that.

1. 2

In a dependent typed language we’d write something like:

```
interface FixedLengthIntList<N extends Nat> {
    <M extends Nat> FixedLengthIntList<#(N + M)> append(FixedLengthIntList<M> other);
}
```

With a simpler type system we can’t write the dependent type, but we can do what a dependently typed language would do by hand: we return a pair of `FixedLengthList<O>` and a witness/evidence `IsSum<M, N, O>`, for some (unknown) type `O extends Nat`. In Java we can write an existential type like that directly; otherwise we can do a CPS-like transform where we write the type “X<T> for some T” as “(X<T> => U for all T) => U for all U”.

It becomes our responsibility rather than the compiler’s to cart around the value of type `IsSum<M, N, O>` (representing the fact that M+N=O) to the point where we actually use the fact that M+N=O. We have to implement any facts we use about `+` (e.g. associativity). We can either treat our `IsSum` as a marker that’s axiomatically correct, or we can implement everything by recursion: allow `IsSum` to be instantiated only for `M=0` or `N=0`, or by adding 1 to each side of a previous `IsSum`. (In the implementation of `append` we’d build up the `IsSum` recursively at the same time we were doing the append recursively - in the base case one of the lists is empty and M + 0 == M; in the recursive case we can append a list that’s 1 longer to get a result that’s 1 longer.) And in the opposite direction, `IsSum` implements a cata-like method that says that an `IsSum<M, N, O>` is either an `M=0` with `N=O`, or an `IsSum<M-1, N, O-1>`. Then we don’t have to take associativity as an axiom; we can implement an associate method with a signature something like

```
static <A, B, C, AB, BC, ABC> IsSum<A, BC, ABC> associate(
    IsSum<A, B, AB> ab, IsSum<AB, C, ABC> abc, IsSum<B, C, BC> bc);
```

in terms of this cata-like method, and the implementation will be provably correct.
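As a hedged sketch of what that evidence-passing might look like in plain Java (all names are illustrative, and the two static factories are the trusted axioms described above - the private constructor means those are the only ways to build a witness):

```java
// Peano naturals again, abbreviated Z / S here.
interface Nat {}
final class Z implements Nat {}                  // 0
final class S<N extends Nat> implements Nat {}   // N + 1

// A value of type IsSum<M, N, O> is runtime evidence that M + N = O.
final class IsSum<M extends Nat, N extends Nat, O extends Nat> {
    private IsSum() {} // only the two axiomatic forms below can construct one

    // Base case: 0 + N = N.
    static <N extends Nat> IsSum<Z, N, N> zero() {
        return new IsSum<>();
    }

    // Step case: if M + N = O, then (M + 1) + N = (O + 1).
    static <M extends Nat, N extends Nat, O extends Nat>
    IsSum<S<M>, N, S<O>> succ(IsSum<M, N, O> prev) {
        return new IsSum<>();
    }
}

class Witnesses {
    public static void main(String[] args) {
        // Evidence that 2 + 1 = 3; wrong arithmetic would be a type error.
        IsSum<S<S<Z>>, S<Z>, S<S<S<Z>>>> ev = IsSum.succ(IsSum.succ(IsSum.zero()));
        System.out.println(ev != null); // prints "true"
    }
}
```

In this style, an `append` on length-indexed lists would return the new list paired with such a witness, and facts like associativity become methods that consume and produce `IsSum` values, as in the signature above.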

2. 4

So one thing that would be interesting to see would be more examples of bugs found with that kind of tool, and whether the invariant that was violated could have been expressed directly in the code.

If you want examples, the authors of the Erlang version have found a decent number of non trivial bugs using it. Some references:

I think all of these (except maybe the last one) are fairly difficult to avoid just by relying on type safety. Regardless of the actual quality of the code (I haven’t looked at it), the first two are caused by concurrency, and especially the second depends on some semantics that are not trivial to describe formally, yet are practical to describe as quickcheck properties.

The third family of bugs is arguably just quality problems of the compiler, but formally specifying a compiler is not a trivial task either; again, quickcheck provides a cost effective approach to at least test it more intensively.

The fourth is the one I believe could possibly be avoided by using better coding techniques, yet property based testing again helped cover more corner cases than the existing tests had.

It would be interesting to know what the cost would be of avoiding those kinds of errors using other techniques.

1. 1

Do you have transcripts or similar for either of the first two? I’m particularly intrigued by this idea of semantics that are hard to describe formally but easy to describe as quickcheck properties, because as far as I can see that’s the same thing?

If I’ve understood that compiler example correctly I’d be very surprised if it wasn’t caught by any type system. Proving that a compiler outputs correct code is hard, but enforcing that the tree is some valid tree at every stage is pretty easy.

1. 1

No, I don’t have much more than those links, and the vague memories from when we talked about the actual bugs. Those are all open source projects though, so it should be possible to find the patches on GitHub.

1. 17

I’m seriously impressed by the way they abstracted away the significant incidental complexities of data storage, concurrency, ordering, and composition. I hope to see a lot more of this quality in programming.

1. 3

I hope to see a lot more of this quality in programming.

It seems like this is only possible in a few languages (maybe only Haskell, even). Do you think there is a reason to believe that is not true?

1. 5

Yes, I think these particular technical feats need some kind of non-strict evaluation (Haskell) or very powerful macros (Lisp). Some way for the program to introspect at a very fundamental level.

I’m not suggesting these languages should be used for device drivers or anything, though. When I say “quality”, I mean that they took a very thorough look at their problem and solved it thoroughly by making very smart tradeoffs. I want to see more business goals achieved with very thoughtful technical decisions.

1. 6

It seems like this is only possible in a few languages (maybe only Haskell, even). Do you think there is a reason to believe that is not true?

To get the same level of safety and composability, you’d be fighting uphill with anything that isn’t non-strict by default or doesn’t have explicit effects. Another bit is HKTs, which ML modules can replace, but it can be harder to see the relationship between the types in the latter case. It seems like it would be pretty noisy to mix and discharge all the constraints in a module-based system as well.

You could fake it, maybe, in another language. But with the same niceness, composability, and safety? I’m not sure how close you could get.

1. 6

I don’t see how strictness comes into it. This seems clearly applicable to OCaml, with no compromises I can see. I’ve talked a bit with Simon about this, and nothing Haskell specific came up.

2. 6

I don’t think it’s Haskell-specific. Purity is not a binary so much as a question of what effects you consider equivalent. Haskell is famous for explicitly sequencing I/O effects, but how often do you actually care about explicitly ordering all I/O? If you don’t care about that (which is not to say you don’t care about sequencing e.g. your HTTP API calls - just that for random log lines or local file reads you don’t care), Scala can do everything Haskell can and more (and I find the way Haskell hides the distinction between a value and a thunk for a value to be a much more important inequivalence). And OCaml or F# can do the same if you’re willing to use modules to replace HKTs.

I am slowly trying to put together a blog post with the thesis that functions are the biggest problem with modern functional programming, because you can’t compare functions for equality. We see an aspect of this in the presentation with their “applicative where possible” hackery. I think we may see a move towards free objects that make equational reasoning possible. If you can form an actual value - not a monadic value that might be hiding Turing-complete computation - to express your interpreter command, then it’s much easier to implement a reduction to canonical form (eliminating e.g. redundant fetches of the same data) before interpreting.

1. 4

I look forward to that blog post.

1. 3

Thanks - don’t hold your breath, mind, it could be months or years yet. At this stage it’s more an idea I’m turning around and trying to gather examples of. I think that free objects idea is one of the most important pieces though.

2. 4

I am slowly trying to put together a blog post with the thesis that functions are the biggest problem with modern functional programming, because you can’t compare functions for equality…

I’m having trouble understanding what you’re saying here; maybe you can unpack it a little bit? I’m not sure why being able to test equivalence is important here.

1. 2

I’m not sure what intensionality or extensionality would have to do with your question either.

1. 2

I see the modern functional culture as being about values, declarative code, denotational rather than operational semantics. And a commonly cited advantage of functional programming is being able to use equational reasoning in place of testing.

Functions obstruct these goals. A function is inherently operational (the only thing you can do with it is call it), and while you can still reason equationally in the presence of functions, you can’t actually test by comparing directly for equality. Concretely I’m thinking about monads, and thinking about examples like this one. The idea of something like this is that you have some piece of complex business logic that involves redis calls, and you want to test the logic separately from testing the actual access to redis, so the business logic evaluates to some value and there’s a separate interpreter that runs it. But the testing style there is still very operational - the way you test that your value is correct is by running it in a test interpreter, which is really no different from swapping out a different implementation of an interface. Whereas what we really want to do is to canonicalize the value by some kind of (pure) “normalization by evaluation” and then compare the value for equality against what we expect the complex business logic to evaluate to in this case.

No doubt there are things I’m missing - at the very least I want to try putting this into practice before drawing sweeping conclusions. But I think the idea has promise.

1. 5

I still don’t understand your claim about functions obstructing reasoning or testing, nor the relationship with monads. A monad is a data type, and something that has a monadic type is a value. You can compare those values for equality. There are functions that operate on monads, but I don’t see how they play a role in that.

The IO monad is not special in that either. You can compare two values that have an IO type. The only time when side effects, and the ordering of them, become relevant is when you run the program that that monad represents.

In which point are functions interfering and what is the alternative that doesn’t?

And what does it have to do with the question of whether haskell (or similar languages) support the style described in the post better than more widely used languages?

1. 1

Apologies, I missed a word: I was thinking specifically of free monads, which can’t be compared for equality without executing, because they contain functions. A lot of other monads have the same issue, but of course there are some monads like `Writer` that you can just compare directly for equality. I’m thinking the “free monad + interpreter pattern” for Haxl-like operations would be better replaced with some other (free?) structure that could be compared for equality.

It affects language choice because Haskell has a strong tradition of anonymous functions where other languages might prefer callback objects, and of using function composition fairly freely. In a traditional-OO “command pattern” approach to the same problem you would be unlikely to allow arbitrary functions in the “command”. One of the things I want to try in my native Scala (particularly with the forthcoming SAM support) is using only identified `object`s or classes that implement function interfaces.

1. 3

free monads, which can’t be compared for equality without executing, because they contain functions.

Free monads do not (necessarily) contain functions. And you can define equality. See here

As a simple example, too, consider `Free Identity`. It’s morally equivalent to `Identity`, which has clear equality, and even structurally equivalent to just `(Int, _)`, which is also clearly equatable.

1. 1

Free monads do not (necessarily) contain functions. And you can define equality. See here

Hmm. In the implementation I’m used to, `2.point map {_ + 2}` is `Suspend(2, {x => Immediate(x + 2)})` which wouldn’t necessarily even compare equal to itself, never mind to `4.point` i.e. `Immediate(4)`. I guess Haskell being lazy obviates the need for trampolines and so allows you to “see through” suspend steps, but as soon as you have a “command” like `Get` (in `Reader` even) you can’t compare for equality “past” that (unless the topology of what you’re reading allows you to compare functions from that).

1. 6

Sure, trampolines break you as do functors with functions. Neither is really critical to a free monad, though.

I think you’re fighting a losing battle avoiding functions everywhere. Having computationally reasonable equality is nice (functions have computationally unreasonable equality, of course) but avoiding it means (a) representing binding syntactically (b) having to quotient off your syntactic/structural equalities to the semantics of whatever language you’re building. Those steps have their own significant expense.

2. 3

Note that equality in OO Commands is not equivalent to equality of the functions they represent; it is just equality of the objects, which in turn means comparing pointers or whatever the `equals` method does. It is possible to have two commands doing exactly the same thing but comparing as different (how depends on the concrete implementation of `equals`). Whether this improves reasoning about programs is not so clear to me, as we are deviating from mathematical concepts into implementation-specific details, which in my view is a more difficult arena to move in.

If comparing functions were something you really needed, the same trick could be applied in any other language. For Haskell you could just have some indexed structure as a function registry and pass functions indirectly as positions in the registry, thus emulating OO style. What that solves is still out of my reach, sorry :)

Note also that, mathematically speaking, equality of functions is undecidable, so any way of comparing functions that a language implements is necessarily an approximation to it. The point being, again, that commands are a “bastardised” version of functions with some apparent properties that can be misleading.

1. 1

Note that equality in OO Commands is not equivalent to equality of the functions they represent; it is just equality of the objects, which in turn means comparing pointers or whatever the equals method does. It is possible to have two commands doing exactly the same thing but comparing as different (how depends on the concrete implementation of equals)

With a correct implementation of equals that’s not true. If you can normalize the command then you can compare it for true semantic equality. E.g. if your command is an `Int => Int` expressed as a tree of `+` and `*` operators and constant values, then you can put that into canonical form as a polynomial and then compare for actual equality by comparing the coefficients.

I’m not advocating using a table of pointers. I’m advocating eschewing functions in favour of data structures that can be evaluated as functions but which have additional structure on top of that.
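A small illustrative implementation of that `+`/`*` example, assuming single-variable expressions (the class and method names are my own, not from the thread): each tree is expanded into its polynomial coefficients, and comparing coefficients decides true semantic equality of the functions the trees denote, regardless of tree shape.

```java
import java.util.Arrays;

// A "command" of type Int => Int, represented as data rather than a function.
abstract class Expr {
    abstract int[] coeffs(); // coeffs()[i] is the coefficient of x^i

    static final class Const extends Expr {
        final int c; Const(int c) { this.c = c; }
        int[] coeffs() { return new int[] { c }; }
    }
    static final class Var extends Expr {
        int[] coeffs() { return new int[] { 0, 1 }; } // the polynomial x
    }
    static final class Plus extends Expr {
        final Expr l, r; Plus(Expr l, Expr r) { this.l = l; this.r = r; }
        int[] coeffs() {
            int[] a = l.coeffs(), b = r.coeffs();
            int[] out = new int[Math.max(a.length, b.length)];
            for (int i = 0; i < a.length; i++) out[i] += a[i];
            for (int i = 0; i < b.length; i++) out[i] += b[i];
            return out;
        }
    }
    static final class Times extends Expr {
        final Expr l, r; Times(Expr l, Expr r) { this.l = l; this.r = r; }
        int[] coeffs() {
            int[] a = l.coeffs(), b = r.coeffs();
            int[] out = new int[a.length + b.length - 1];
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < b.length; j++) out[i + j] += a[i] * b[j];
            return out;
        }
    }

    // Semantic equality: same polynomial in canonical (coefficient) form.
    boolean sameFunction(Expr other) {
        return Arrays.equals(trim(coeffs()), trim(other.coeffs()));
    }
    private static int[] trim(int[] cs) {
        int n = cs.length;
        while (n > 1 && cs[n - 1] == 0) n--;
        return Arrays.copyOf(cs, n);
    }

    public static void main(String[] args) {
        // (x + 1) * (x + 1) versus x*x + 2*x + 1: different trees, equal as functions.
        Expr lhs = new Times(new Plus(new Var(), new Const(1)),
                             new Plus(new Var(), new Const(1)));
        Expr rhs = new Plus(new Times(new Var(), new Var()),
                            new Plus(new Times(new Const(2), new Var()), new Const(1)));
        System.out.println(lhs.sameFunction(rhs)); // prints "true"
    }
}
```

This is exactly the “normalization before comparison” move: it works because the command language was deliberately kept weaker than arbitrary functions.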

1. 2

With a correct implementation of equals that’s not true. If you can normalize the command then you can compare it for true semantic equality. E.g. if your command is an Int => Int expressed as a tree of + and * operators and constant values, then you can put that into canonical form as a polynomial and then compare for actual equality by comparing the coefficients.

That is only true for a constrained set of possible operations. And it is also true in any language; there is nothing in, e.g., Haskell that prevents you from representing expressions with + and * as a tree. If you want to have a Turing-complete representation of commands (or functions), my claim still holds (in all languages).

Sorry if I sound stubborn; I am trying to understand the essence of the problem. I still don’t see what property objects have that solves the problem of function equality.

1. 2

And it is also true in any language; there is nothing in, e.g., Haskell that prevents you from representing expressions with + and * as a tree

Sure - all languages are equivalent. But at the same time, syntax and culture guide you down a particular path. I think there’s a place for a language that has Haskell-style typing but guides you towards representing more as data structures rather than the liberal use of anonymous functions that Haskell encourages.

If you want to have a Turing-complete representation of commands (or functions), my claim still holds (in all languages).

Sure - I’m thinking about cases that don’t need to be turing complete (which I think is most programs really).

1. 1

Ok. I misunderstood the scope of your claim then.

I am still not convinced about your idea of favouring constrained, comparable operations, but at this point I think it is a matter of opinion.

1. 1

I’m not convinced either! It’s an idea I want to investigate, not something I’ve got working yet.

2. 1

If I have data structures `f1` and `f2` where `f1` represents looping over an array and calling `x` on each element then looping over the array again and calling `y` on each element, and `f2` represents the same thing except `y` is applied first, then `x`, and `x` and `y` are commutative, what is the canonical form that lets `f1` and `f2` compare as equal?

1. 1

A data structure containing a set `{x, y}`. (Ideally in a language that won’t let you iterate over a set unless you can prove that the operation you’re doing is commutative).
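As a toy sketch of that answer (names invented for illustration): the canonical form stores the passes as a set, so the two orderings of commuting operations are literally the same value and compare equal with ordinary `equals`.

```java
import java.util.Set;

// "Map x over the array, then map y" for commuting x and y, represented
// as an unordered set of passes rather than an ordered pair of them.
final class CommutingPasses {
    final Set<String> passes; // pass identifiers stand in for the operations
    CommutingPasses(Set<String> passes) { this.passes = passes; }

    @Override public boolean equals(Object o) {
        return o instanceof CommutingPasses
            && passes.equals(((CommutingPasses) o).passes);
    }
    @Override public int hashCode() { return passes.hashCode(); }

    public static void main(String[] args) {
        CommutingPasses f1 = new CommutingPasses(Set.of("x", "y")); // x then y
        CommutingPasses f2 = new CommutingPasses(Set.of("y", "x")); // y then x
        System.out.println(f1.equals(f2)); // prints "true"
    }
}
```

The burden, as the parenthetical notes, is proving commutativity before you are allowed to build the set in the first place.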

2. 3

Scala can do everything Haskell can and more

That’s a strong claim and not one believed by anybody I’ve known that knows Scala and Haskell very well.

(and I find the way Haskell hides the distinction between a value and a thunk for a value to be a much more important inequivalence)

Hide is an interesting way to put it, since you can reason about when something will be a thunk or a value pretty easily in GHC Haskell.

And OCaml or F# can do the same if you’re willing to use modules to replace HKTs.

Replacing HKTs isn’t the main problem with using ML Modules to do something like what Haxl is doing.

I am slowly trying to put together a blog post with the thesis that functions are the biggest problem with modern functional programming, because you can’t compare functions for equality.

What relevance do intensionality or extensionality have to @apy’s question?

I think we may see a move towards free objects that make equational reasoning possible.

Equational reasoning is possible with Haskell code without resort to `free`. Free objects, for the purposes of a working programmer, are a DSL pattern, not a remedy to all of the ills of PL design.

If you can form an actual value - not a monadic value that might be hiding Turing-complete computation - to express your interpreter command, then it’s much easier to implement a reduction to canonical form (eliminating e.g. redundant fetches of the same data) before interpreting.

This is the code-as-data point Hickey made years ago. It’s a good tactic to have in your bag, but not at all appropriate for what Haxl does nor does it seem relevant to @apy’s question.

3. 1

It seems like this is only possible in a few languages (maybe only Haskell, even).

It’s hard to do in other languages, and it’s rarer. You absolutely can write reliable code in dynamic languages (see: Erlang, Clojure) but you need agreement among the developers that it’s important. You also need to invest a lot more effort into testing for dynamic languages, and coming up with good tests is hard, especially when you consider that a compiler can find the bulk of the random/stupid bugs (which is what most bugs, and even most costly bugs, are) in seconds.

I’m going to go out on a limb and say that Haskell is part of the success here, but it’s also that the developers have the autonomy and resources necessary to do things right. Those factors are intercorrelated, of course. If developers have that autonomy, they might choose to use different tools (e.g. OCaml, verified C) for application-specific reasons, but they probably won’t be slapping together enterprise-style Java classes.

1. 3

I was thinking more about the languages in which it’s actually possible to abstract away so much incidental complexity. Doing that in Java quickly becomes really leaky, IME. Even OCaml has trouble here (it’s hard to wrangle mutability).

2. 1

abstracted away the significant incidental complexities of data storage, concurrency, ordering, and composition

I may be naive, but I am not sure how plain old SQL couldn’t be said to abstract these things away. A SQL select is merely a request for data satisfying a particular logical condition (SQL itself being a logic language, though naturally problematic compared to an ideal logic language).

1. 1

It does, but it’s also not Turing complete, which Sigma is.

1. 1

Why is that a problem?

SQL has a variety of extensions that can solve many/most practical problems.

1. 1

It’s a problem because that’s not the context of what’s being discussed with Sigma, which is a Turing-complete solution that still manages to hide away a lot of incidental complexity.

2. 1

Given N functions running independently that may want to access data sources { D1, D2, … } concurrently, where access to D is at an arbitrary place in N and can take appreciable time, the standard way of handling it is to abstract access to D behind a cache with either a function, function-group, or preset-time lifetime. The only thing that I figure you get from reordering the cache fills to the front of a function or function-group is that none of the functions start until the cache is filled, which is irrelevant in the case of pure functions. Every language can employ a cache in this way. Some are less great at what happens when many functions are simultaneously waiting for a cache to fill, because sometimes running a function is relatively heavyweight; sometimes it’s not entirely clear when you want to invalidate the cache; and keeping an explicit cache might make you mad operationally. But those are orthogonal to whether or not a language has the possibility, IMO.

1. 1

This is not about N functions running independently, though, from how I understand Haxl (which could be wrong). It’s actually taking a piece of code that is expressed in a sequential-looking way and reorganizing it such that data access can be batched (not just cached) for latency. And it looks like natural Haskell code because all of the “magic” is outside of the block one is writing. Doing the same in Java, for example, will produce very non-Java-looking code. So that’s really what my question is about: not whether I could brute force something into working, but whether I can do it with the same elegance, removing all of the complexity of what is happening underneath from the user.
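To make the contrast concrete, here is a very reduced sketch of the batching idea - emphatically not Haxl’s real API, all names here are made up: a `Fetch` value records the keys it needs, and applicative-style composition unions those key sets, so independent fetches written side by side are satisfied by one batched round trip instead of two.

```java
import java.util.*;
import java.util.function.*;

final class Fetch<A> {
    final Set<String> keys;                      // keys this computation needs
    final Function<Map<String, String>, A> run;  // runs once the batch is loaded

    Fetch(Set<String> keys, Function<Map<String, String>, A> run) {
        this.keys = keys;
        this.run = run;
    }

    static Fetch<String> get(String key) {
        return new Fetch<>(Set.of(key), env -> env.get(key));
    }

    // Applicative-style combination: unions the key sets, so both sides land
    // in the same batch. A monadic bind couldn't do this, because the second
    // fetch's keys could depend on the first's result.
    <B, C> Fetch<C> zipWith(Fetch<B> other, BiFunction<A, B, C> f) {
        Set<String> all = new HashSet<>(keys);
        all.addAll(other.keys);
        return new Fetch<>(all, env -> f.apply(run.apply(env), other.run.apply(env)));
    }

    // The data source is called exactly once, with the whole batch of keys.
    A runWith(Function<Set<String>, Map<String, String>> dataSource) {
        return run.apply(dataSource.apply(keys));
    }
}
```

For example, `Fetch.get("user:1").zipWith(Fetch.get("user:2"), (a, b) -> a + b)` hands the data source the set {user:1, user:2} in a single call. What Haxl adds on top, and what is hard to fake in Java, is making ordinary sequential-looking code compile down to this applicative shape automatically.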

1. 2

Any idea why this was ever included in Python 2 as a ‘feature’?

1. 5

There are other languages that also define a total order for their terms. Erlang for example: http://erlang.org/doc/reference_manual/expressions.html#id80214

In this particular example, I blame the lenient error handling more. That an undefined variable has a length, or is equivalent to the empty string, are much more serious WATs than having a comparison operation that works over the whole term space. At least that is defined and has sound properties.

1. 4

Though I agree, I’ll also say that those are really django WATs, not Python WATs.

1. 4

Can we please get rid of these language specific package managers??

1. 9

No. The package managers of distributions have many disadvantages, the biggest one being the missing support for virtualenvs. Deployment of Python code is so much easier when you completely ignore the distribution’s package management. Python Wheels are also cross-platform (as long as they’re pure Python packages), Windows and OS X included.

1. 4

Nix actually solves these problems quite well, and even better because it can have packages from any language. It’s a shame that we have developers spending more time on yet another Python package manager (and every language community is doing this, even OCaml, my preferred language) rather than banding together and using a package manager that solves the problem once and for all for everyone.

1. 2

Nix doesn’t seem to support Windows, so there is one big use case that it doesn’t solve.

1. 3

I think it’s safe to assume that the effort involved in getting Nix to work on Windows is less than the effort involved in every programming language getting its own package manager that has to run on all the same platforms.

But it doesn’t have to be Nix, I just want everything in the same manager.

2. 3

No. The package managers of distributions have many disadvantages, the biggest one being the missing support for virtualenvs.

This is not a fundamental problem of packaging. It is maybe a problem of those systems (at least deb and rpm) being quite old and far from perfect. That doesn’t mean that packaging has to be solved for each and every single technology (pip, gems, melpa, cpan, ctan, opam, hex, npm, mvn, hackage, … ad infinitum).

All of those are versions of each other, and all of them share good things and bad things that are specific to packaging, not languages.

As mentioned repeatedly here on lobsters, it seems to me that nix solves most, if not all, of the issues one can find when using deb or rpm, in a generic way. Even if it didn’t, I still can’t see a good reason to invest in solving the packaging problem specifically for every language.

3. 1

Hell, I’d settle for not requiring packages at runtime. Preferably not in a way that also has an entire OS tag along.