Given that formal verification is expensive and does not produce components that can be composed into provably correct systems without additional work, one may despair and claim that the problem seems hopeless, distributed systems are awful, and we can’t have nice things.
. Formal methods start to look a lot cheaper when you consider the alternative is your entire company failing to scale due to lost writes that you can’t explain.
In my opinion the problem is time to market, period. Comments to the effect of cost generally come from companies that have never attempted it.
Ms. McCaffrey’s most recent employers have included Twitter and Microsoft. At Microsoft, she seems to have been part of a mixed team consisting of traditional industrial ops staff and Microsoft Research people, who came up with Orleans to serve the business-side need (http://research.microsoft.com/pubs/141999/pldi%2011%20submission%20public.pdf). So given that her Erdos-number equivalent to one of the most esteemed formal methods labs in industry is either 0 or 1, I’m inclined to give her story a tremendous amount of weight.
Additionally, her statement is simply obviously true on the face of it, if you’re familiar with the current state of the art. For example, there’s a proven Raft implementation in Coq (https://github.com/uwplse/verdi). Please demonstrate how to make use of this in a language that is not Coq. Or, if you like, please demonstrate how to adapt a program in the language of your choice to TLA+. She’s right; we’re in early days; and although one day we’ll get there, it’s easy to despair about what we have.
In a previous life I worked on a proven paxos implementation that did c&x86 asm codegen with massive piles of isabelle. It ended up taking less time then the initial paxos implementation that wasn’t proven, and lead to serious product improvements.
The fact that the vast majority of proven distributed systems code is not public or published means I cannot really point directly to my experiences.
See my reply below to catiem20 for the mea culpa, however.
Oh god now I’m reading through her previous work.
Embarrassed me is embarrassed.
The citation you’re looking for is in the composition section, which precedes this statement. 1. Alvaro, P., Rosen, J., Hellerstein, J. M. 2015. Lineage-driven fault injection; http://www.cs.berkeley.edu/~palvaro/molly.pdf.
Thank you for the follow up, my misreading. I think I missed a core part of the argument based on combining proved systems != a proved system, which the paper elucidated for me and I’m not sure how I could read this article so poorly. I’ve worked to prove distributed systems at past jobs, but they are largely single proprietary implementations of which only a portion is necessarily proven.
I guess I push back blindly on anyone saying provability is “impossible” or the additional work isn’t worth it based on my previous experience, and I should have been more detailed in my reading of this article.
If you are the author based on your user name, I appreciate you taking the time to comment.
yup no worries, and I am the author, I agree with you if you are building core infrastructure doing Formal Methods is a great idea. However, from what I’ve seen in industry and the proliferation of micro-services makes this near impossible from a cost/time perspective for general applications, thats why I wanted to provide some other techniques for increasing confidence.
Incidentally, I stumbled upon the MetaCLF project that addresses “Formal Reasoning about Languages for Distributed Computation.” Unfortaunately, the last publication is from 2014 and the source code has been in limbo since 2013.
See also Chapar: Certified Causally Consistent Distributed Key-Value Stores, which I discovered thanks to Lobsters.
Here’s a good paper about creating a formal model for Dropbox (by Benjamin C. Pierce): PDF.