I find interesting that this Haskell implementation of the Timsort sorting algorithm exhibits the same bug as Python and Java implementations, despite Haskell having a more advanced type system. This is a reminder that type checking is not a magic bullet, and is hardly enough to prove a program correctness. Formal proofs and whole program static analysis are still necessary, with or without an advanced type system.

Using a more powerful type system like Idris, you could rewrite the QuickCheck properties to types and then prove them. Type checking may not be a magic bullet but are actually perfect for solving this problem.

1/ How is Idris different from KeY (the formal verification tool used to find the bug in Timsort: http://en.wikipedia.org/wiki/KeY)? I just know that Idris is a pure functional language with dependent types that tries to blur the lines between general-purpose programming and theorem proving, but my knowledge stops here :)

2/ What would be the pros and cons of using Idris instead of KeY to verify an implementation of Timsort?

Pro: verification in the same language as the main code, so any Idris tools would understand the proof and e.g. preserve it when doing automated refactoring. Con: Idris probably has less tooling available than Java.

Pro: verification in the same language as the main code, so any Idris tools would understand the proof and e.g. preserve it when doing automated refactoring.

As far as I know, the JML (Java Modeling Language) specifications used by KeY are written directly into the Java source code files, as comments, which implies they can be preserved too.

Con: Idris probably has less tooling available than Java.

Does the fact that Idris is a total language limit in some way the kind of algorithm you can write and/or constrain the way you write them, compared to an imperative language like Java or even a pure functional language like Haskell?

As far as I know, the JML (Java Modeling Language) specifications used by KeY are written directly into the Java source code files, as comments

Which means most Java tools will treat them as simple comments, and not e.g. update them appropriately when performing automated refactoring.

Does the fact that Idris is a total language limit in some way the kind of algorithm you can write and/or constrain the way you write them, compared to an imperative language like Java or even a pure functional language like Haskell?

There are escape hatches even in Idris, so the difference is more that you’re explicit about which parts of your program contain assumptions. The difference between a total language and not is that a non-total language allows you to invert functions when you can’t prove the inverse exists. In practice it would only ever come up if you wanted to write a program that relied on some statement that you couldn’t prove but nevertheless thought was true. There are plenty of those in mathematics (e.g. a function that accepts an even number >= 4 and returns two primes that sum to that number), but it’s hard to think of one that would be useful in programming. But if you did come up with such a thing, in Idris all that would mean is that you’d have an explicit “this program assumes Goldbach’s conjecture” somewhere in your source.

You can’t get around doing the work somewhere if you want a proof - and even a language like Haskell will allow you to not prove all the properties of your program. But in something like Haskell you can do the work in the type system rather than in magic comments. To my mind that’s an improvement.

Yes, I agree that if you want a proof, you have to do the work somewhere. My point is that, in the specific case of this implementation of Timsort, it was not possible to do the “work” in the type system, even in Haskell.

I think it’s more likely that the author wasn’t bothered enough to do it at all than that it was impossible to do it in the type system. I don’t know Haskell but I could take a stab at porting the proof to Scala if you think that would be enlightening.

Are you saying that you think it would have been possible to “refine” the types enough to identify the bug, by just using Haskell’s type system, and without any external proof (like the verification system KeY used to identify the bug in Python and Java)?

But if you’re right, it would imply that the bug in Timsort could have been found by simply implementing the algorithm in Scala with a set of types restrictive enough to exhibit the bug? I’d be curious to see that.

The SKI calculus has been implemented using the Scala type system, which means it’s turing complete. I have no idea how hard it would be to implement something of this complexity using it, however.

So it’s a subtle point, but Turing completeness is true of several typesystems, including Haskell’s and Common Lisp’s, but that doesn’t mean they can necessarily catch this. Dependent types are not more expressive computationally, but they allow more categories of input into typing decisions, and are therefore able to express constraints that “weaker” systems cannot.

Rewriting Timsort in terms of SKI calculus is certainly possible since the SKI calculus has been proven Turing complete, but the code would be a lot harder to write and maintain than in Java, Scala or Python. Moreover, as @lrene wrote, it doesn’t necessarily means the bug would have been found after such a rewrite. It probably explains why people use verification tools like the KeY project.

I find interesting that this Haskell implementation of the Timsort sorting algorithm exhibits the same bug as Python and Java implementations, despite Haskell having a more advanced type system. This is a reminder that type checking is not a magic bullet, and is hardly enough to prove a program correctness. Formal proofs and whole program static analysis are still necessary, with or without an advanced type system.

Using a more powerful type system like Idris, you could rewrite the QuickCheck properties to types and then prove them. Type checking may not be a magic bullet but are actually perfect for solving this problem.

Interesting. Two questions on this topic:

1/ How is Idris different from KeY (the formal verification tool used to find the bug in Timsort: http://en.wikipedia.org/wiki/KeY)? I just know that Idris is a pure functional language with dependent types that tries to blur the lines between general-purpose programming and theorem proving, but my knowledge stops here :)

2/ What would be the pros and cons of using Idris instead of KeY to verify an implementation of Timsort?

Pro: verification in the same language as the main code, so any Idris tools would understand the proof and e.g. preserve it when doing automated refactoring. Con: Idris probably has less tooling available than Java.

As far as I know, the JML (Java Modeling Language) specifications used by KeY are written directly into the Java source code files, as comments, which implies they can be preserved too.

Does the fact that Idris is a total language limit in some way the kind of algorithm you can write and/or constrain the way you write them, compared to an imperative language like Java or even a pure functional language like Haskell?

Which means most Java tools will treat them as simple comments, and not e.g. update them appropriately when performing automated refactoring.

There are escape hatches even in Idris, so the difference is more that you’re explicit about which parts of your program contain assumptions. The difference between a total language and not is that a non-total language allows you to invert functions when you can’t prove the inverse exists. In practice it would only ever come up if you wanted to write a program that relied on some statement that you couldn’t prove but nevertheless thought was true. There are plenty of those in mathematics (e.g. a function that accepts an even number >= 4 and returns two primes that sum to that number), but it’s hard to think of one that would be useful in programming. But if you did come up with such a thing, in Idris all that would mean is that you’d have an explicit “this program assumes Goldbach’s conjecture” somewhere in your source.

You can’t get around doing the work

somewhereif you want a proof - and even a language like Haskell will allow you to not prove all the properties of your program. But in something like Haskell you can do the work in the type system rather than in magic comments. To my mind that’s an improvement.Yes, I agree that if you want a proof, you have to do the work somewhere. My point is that, in the specific case of this implementation of Timsort, it was not possible to do the “work” in the type system, even in Haskell.

I think it’s more likely that the author wasn’t bothered enough to do it at all than that it was impossible to do it in the type system. I don’t know Haskell but I could take a stab at porting the proof to Scala if you think that would be enlightening.

Are you saying that you think it would have been possible to “refine” the types enough to identify the bug, by just using Haskell’s type system, and without any external proof (like the verification system KeY used to identify the bug in Python and Java)?

Yes.

But if you’re right, it would imply that the bug in Timsort could have been found by simply implementing the algorithm in Scala with a set of types restrictive enough to exhibit the bug? I’d be curious to see that.

The SKI calculus has been implemented using the Scala type system, which means it’s turing complete. I have no idea how hard it would be to implement something of this complexity using it, however.

So it’s a subtle point, but Turing completeness is true of several typesystems, including Haskell’s and Common Lisp’s, but that doesn’t mean they can necessarily catch this. Dependent types are not more expressive computationally, but they allow more categories of input into typing decisions, and are therefore able to express constraints that “weaker” systems cannot.

Rewriting Timsort in terms of SKI calculus is certainly possible since the SKI calculus has been proven Turing complete, but the code would be a lot harder to write and maintain than in Java, Scala or Python. Moreover, as @lrene wrote, it doesn’t necessarily means the bug would have been found after such a rewrite. It probably explains why people use verification tools like the KeY project.

I would love to see that!

Would also be interesting to hear from someone verse in Haskell about an alternative, bug-free implementation.