Great advice! Thinking about your three points made me think of the following, I hope you find it useful.
In the comment you linked, I think the list of all the projects is extremely useful for seeing what’s going on. Related to your point (1): each of these projects is impressive because they built something within a theorem prover. No one would be impressed if someone wrote a C compiler in C++, but verifying non-trivial theorems about a C compiler is impressive.
Perhaps a better way to look at the problem of “how to get into formal methods?” is to look at it as “what do you want to get from your efforts?”.
This, I believe, expands on your point (2): if you want to reduce the number of bugs in your code, perhaps just fuzzing is the way to go. It probably has the highest return on investment for existing projects. I know this is probably no longer “formal” methods, but I would consider fuzzing at least a distant cousin. Extending beyond this are heavier-weight techniques like bounded model checking, abstract interpretation, and concolic execution: these again are not full-blown verification but provide some formal guarantees. These are maybe stepping stones before the lightweight stuff mentioned in your comment, or perhaps sit around the same area.
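To show how low the barrier to entry is for fuzzing, here’s a sketch of a dumb random fuzzer (the parser and its failure modes are entirely made up for illustration):

```python
import random

def parse_length_prefixed(data: bytes) -> bytes:
    """Toy parser (hypothetical): first byte is a length, rest is payload."""
    if not data:
        raise ValueError("empty input")
    n = data[0]
    if n > len(data) - 1:
        raise ValueError("length prefix exceeds payload")
    return data[1:1 + n]

def fuzz(iterations=1000, seed=0):
    """Throw random bytes at the parser; treat anything other than the
    documented ValueError as a crash worth reporting."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(16)))
        try:
            parse_length_prefixed(data)
        except ValueError:
            pass  # expected, documented failure mode
        except Exception as exc:
            crashes.append((data, exc))  # a real bug
    return crashes
```

Real fuzzers like AFL or libFuzzer add coverage feedback and input mutation on top of this, but the basic loop is about this simple.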
If you want to learn how to encode proofs into your type system, then maybe the first few chapters of Software Foundations or some other theorem prover textbook would be great: you won’t build anything “real”, but perhaps you’ll gain some extra appreciation for types. This relates to your point (1).
Related also to (1) is the difficulty of formalizing “correctness” of a program: often writing the specification for a program is more difficult than writing the program itself. A contrived example is specifying the correctness of int max(int a, int b). Especially for a “real engineering” problem where the problem itself is not very well specified, coming up with a notion of correctness is in and of itself difficult. Stuff like the runtime sanitizers (ASAN, UBSAN, MSAN) makes encoding some notion of correctness into your existing tests easy.
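To make the max example concrete: its spec can be written as plain runtime assertions, which is roughly what encoding a notion of correctness into your existing tests looks like (a toy sketch, not any particular tool):

```python
def max_int(a: int, b: int) -> int:
    return a if a >= b else b

def check_max_spec(a: int, b: int) -> None:
    r = max_int(a, b)
    # The spec, written as two runtime assertions:
    assert r >= a and r >= b, "result must be an upper bound on both inputs"
    assert r == a or r == b, "result must be one of the inputs"
```

Note that both clauses are needed: the first alone permits returning a+b, the second alone permits returning the minimum.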
Anyway, I feel this comment was a bit rambling, but perhaps the ideas will be useful to you: I agree that equating formal methods with dependent types goes maybe too far, and I wanted to show some of the other techniques for reducing the number of bugs, which sometimes are not officially included in “formal methods”. (As you can maybe tell, my own interests are in fuzzing/bounded model checking/symbolic execution/abstract interpretation rather than dependent types ;)
To illustrate then, here’s the max function written in Whiley:
function max(int a, int b) -> (int r)
ensures (r >= a) && (r >= b)
ensures (r == a) || (r == b):
    //
    if a >= b:
        return a
    else:
        return b
(try it out for yourself at http://whileylabs.com)
But, yes, in this case the specification is as long as the body itself. Here’s another example: the well-known indexOf function:
function indexOf(int[] items, int item) -> (int r)
// If valid index returned, element matches item
ensures r >= 0 ==> items[r] == item
// If invalid index returned, no element matches item
ensures r < 0 ==> all { i in 0..|items| | items[i] != item }
// Return value is between -1 and size of items
ensures r >= -1 && r < |items|:
    //
    int i = 0
    while i < |items|
        where i >= 0
        where all { k in 0 .. i | items[k] != item }:
        //
        if items[i] == item:
            return i
        i = i + 1
    //
    return -1
Again, overall, probably as much spec stuff as code. But, in the end, is it such a problem? Think about the lines of unit test code you would write to test this function. Likewise, the lines of documentation you would write for it anyway.
Thanks for the thorough response with the examples.
In retrospect the max example was probably not the best for the point I was trying to make: I didn’t mean to say that “writing the spec is too hard”. I definitely agree, especially compared with comments+tests, that it’s preferable to write the spec and prove it correct rather than write tests.
But what about cases where it’s hard to determine what correct even means? For example, what if you wanted to write a spec that captures an arbitrary webpage loading? You can’t look at stuff like exceptions being thrown and not caught, since so many webpages do this without being “incorrect.” This is probably a moot point when talking about verification, since verifying anything implies you need to know the specification first: what I was trying to say is that often actually coming up with the spec is hard, let alone proving it correct, which makes the problem even more difficult.
what if you wanted to write a spec that captures an arbitrary webpage loading?
Right, I see. Yeah, so this is really about whether or not we always try to make our specifications “complete” in some sense. The key thing to remember here is that we can still get value without specifying absolutely everything. In your example, it could be too hard to specify that. And that’s fine! Or we could try to capture it with e.g. a boolean flag to indicate when it’s loaded or not. Then we can specify things over this flag (e.g. you cannot call this method unless the flag is true, etc). That’s a kind of proxy approach.
Other things like e.g. “this page must load within 200ms” are also pretty hard to specify. So, it’s not a silver bullet … but it can still be useful (much like types are).
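A minimal sketch of the proxy-flag idea (all names hypothetical): we don’t specify what “fully loaded” actually means, we just track a boolean and state the other methods’ preconditions over it:

```python
class Page:
    """Proxy-flag spec sketch: 'loaded' stands in for the hard-to-specify
    notion of a page having fully loaded."""

    def __init__(self):
        self.loaded = False

    def load(self):
        # We don't specify *what* loading does, only that it sets the flag.
        self.loaded = True

    def query(self):
        # Precondition expressed against the proxy flag:
        assert self.loaded, "precondition violated: page not loaded"
        return "content"
```

The flag is a deliberately incomplete spec, but it’s still enough to catch a whole class of use-before-ready bugs.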
Sorry, just to add to that. My pet peeve (which I think is relevant to the original post) is that formal methodists do often seem to get carried away with always trying to specify everything. I personally think we need to move away from that mindset to a more middle ground.
I’m gonna start calling myself a formal methodist from now on.
Also, I completely agree. This is my favorite “don’t get carried away” piece on verification.
I’m gonna start calling myself a formal methodist from now on
You’ll like this quote then:
it is clear to “Formal Methodists” like ourselves, for some of whom formal methods can be something of a “religion”, that introducing greater rigor into software development will improve the software development process, and result in software that is better structured, more maintainable, and with fewer errors
https://pdfs.semanticscholar.org/60d8/aeb3f1c6a90b05ade53b638350463ae6d357.pdf
Oh wait, that’s good for a separate reason.
https://www.microsoft.com/en-us/research/publication/tlz-abstract/
I told Ron Pressler (pron) that his vision of future formal methods was the past. That is, verifying the algorithms, not code, in formal specifications w/ combo of model-checkers and provers depending on utility. He said TLA+ was superior because it was basic mathematics: set theory and (iirc) first-order logic with concurrency/time support. I told him he basically just described Z + CSP/SPIN used for high-assurance systems back in the day. The work proceeded to go deeper than algorithm level due to the problems at code, compiler, and assembly levels they ran into. He’s right there’s still tons of mileage to get out of that level as prior work showed with low defects.
Then, I kept telling @hwayne that one might have substituted TLA+ for SPIN, combined with Z for its benefits on sequential specification. Otherwise, emulated the same benefits in TLA+ itself if capable (not specialist enough to know). Then, your link has the TLA+ inventor himself doing exactly that. Wow. Then, the audience had no interest in it. Z’s tool support combined with TLA+’s ease of use could’ve sped up deployment of lightweight methods that would’ve drawn more funding to them after successes came in, like with ASM’s, SPIN, and Alloy. And they had no interest. Unbelievable…
Well, I appreciate you drawing it to my attention since it was neat to see he tried it. He was right that he did have the benefit of no Z baggage. Maybe it was for the better they didn’t listen and he made TLA+. Had Z really taken off, the opposite might have been true as in Worse is Better effect of hitching a ride on a bandwagon.
Then, I kept telling @hwayne that one might have substituted TLA+ for SPIN, combined with Z for its benefits on sequential specification.
I don’t remember you telling me that and now I feel really bad for forgetting. Sorry dude :(
Regardless, though, I agree. I’m a huge fan of TLA+, but I recognize that it’s in part due to sentimental reasons. TLA+ is the first formal method I learned and also how I’ve managed to contribute back to tech. I wrote Learn TLA+ because there weren’t good TLA+ resources but also because I didn’t really know about any other formal methods techniques when I started. If I started with, say, Alloy, I probably would have seen how much harder TLA+ was to learn and how much smaller the community was, gone “nah”, and never learned it.
I still think TLA+ is amazing, but now I recognize that 1) it’s “the best tool in a lot of cases” and not “the only tool in all cases” and 2) most of the difficulty using it is UI/UX, not essential complexity. It’s a rough diamond that needs to be cut.
Alternatively, someone may be inspired by it to make something better. That’d be awesome, too.
I don’t remember you telling me that and now I feel really bad for forgetting. Sorry dude :(
I tell people here about a lot of stuff. Forgetting something is inevitable. No worries.
He was right that he did have the benefit of no Z baggage.
Right, and I think that’s a big deal. Z is very big and somewhat bloated, I think. Even attempts to mechanise the language were challenging, with tools like CZT appearing to help. That’s why B spun out with its tool support, and then eventually Event-B.
Hey,
What book is that chapter you’ve referenced from? I found another chapter:
http://www.cypherpunks.to/~peter/05_verif_implementation.pdf
… ?
Peter Gutmann. He’s super nice! I emailed him gushing about how much I loved that essay and he didn’t even call me weird for reading his PhD thesis for fun.
ok, no worries I figured it out:
https://www.cs.auckland.ac.nz/~pgut001/pubs/thesis.html
Another New Zealander no less!!
It was Peter Gutmann. I eventually give it to people to try to keep them from being overly optimistic about what formal verification can accomplish in general or alone. Of course, high-assurance security required a lot of assurance activities that all acted as a check against each other. Formal verification itself was done only on the small, central component enforcing the security policy. As CompSci operates in silos, many in the formal methods community thought they’d solve everything that way, much like we saw with OOP and unit testing zealots.
Old saying goes, “Everything in moderation.” Just seems more true the longer I’m around. I do dig the whole-stack efforts like CLI, Verisoft, and DeepSpec, though. We can keep some teams trying to verify all the core or at least practical stuff while the rest of us exercise a bit more sense for more productivity. I expected trickle down effects to kick in eventually which were delivered at least with CompCert and CakeML given usefulness in security-critical C and CompSci tooling (esp compilers or extracted code). Push-button simple eliminating whole classes of errors. Probably should count those checking SCOOP concurrency model or certified WCET analyses given obvious benefits of both in various sectors.
So, final outlook is formal verification should be usually done to simplest, most-critical parts of probably hardly any software projects. Most do stuff like DbC w/ generated tests in memory-safe languages. That with the modifier that even those people can benefit if some teams push heavy stuff as hard as possible.
That was the position of reviews in Cleanroom methodology. They kept everything structured for easy analysis but level of formality depended on what they actually needed.
On top of it being in a theorem prover, most of them are done in a “LCF-style” theorem prover designed to make it hard to screw up the proof process. Both Isabelle and Coq also have work done on foundations where the verifiers or extractors can be verified themselves to some degree. So, people can focus more on the specs and extracted code than with a prover not designed that way. If the specs are peer-reviewed and right, this leads to a level of confidence that’s hard to achieve without formal methods. I dare say near impossible for at least optimizing compilers based on empirical evidence.
Far as reducing bugs, you first have to know what your code is supposed to do. That’s why I push behavioral contracts a la Design by Contract. This can be combined with methods such as fuzzing to help you spot where it broke with laser precision. Those same contracts can also be used for generation of tests (e.g. Eiffel) or verification conditions (e.g. SPARK). It’s harder to do it the other way around. Hence my recommending DbC combined with automated analysis and tests. That’s not a substitute for review or manual tests where appropriate (esp hard-to-specify stuff).
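A minimal sketch of contracts as executable checks (a homemade decorator, not Eiffel’s or SPARK’s mechanism): once pre/postconditions are assertions, any fuzzer or generated test that trips one points at exactly the routine whose obligation broke.

```python
import functools

def contract(pre=lambda *a: True, post=lambda r, *a: True):
    """Attach a precondition (on the arguments) and a postcondition
    (on the result and arguments) to a function."""
    def deco(f):
        @functools.wraps(f)
        def wrapper(*args):
            assert pre(*args), f"{f.__name__}: precondition violated"
            r = f(*args)
            assert post(r, *args), f"{f.__name__}: postcondition violated"
            return r
        return wrapper
    return deco

@contract(pre=lambda xs: all(isinstance(x, int) for x in xs),
          post=lambda r, xs: r == sorted(xs))
def insertion_sort(xs):
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)  # keep 'out' sorted as we insert
    return out
```

The postcondition here doubles as a test oracle: feed the function random lists and the contract itself decides pass/fail.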
The problem you mention on difficulty of specification is real. My shortcut as above is to just tell people to switch to tests for anything that’s hard. They can Google prior work if they want to see if anyone did it and how. As minimax was suggesting, might mention something about this when describing tradeoffs. Far as abstract interpretation, Astree Analyzer was one of my favorite results of software, assurance field given prevalence of C in embedded. I wonder what it would take for a CompSci team to crank out a FOSS version with equivalent capabilities. Far as dependent types equivalence, it’s definitely too far since people who give up on Coq and Idris often like TLA+, Alloy, SPIN, or even Z.
There’s an interesting similar line of work formalizing the C++11 memory model, which includes a tool to explore possible behaviors (relative to the stress tester here, it systematically explores possible behaviors based on the model).
Here are some links for those interested:
The artifact evaluations the author talks about are a step in the right direction, but they aren’t perfect. First, I don’t believe the submitted code is required to be published publicly; it’s just run by the committee. And second, there’s nothing stopping an unethical researcher from submitting an artifact which just does sleep(8000); printData(); return 0;.
I’m not sure what a better solution would look like.
http://blog.regehr.org/archives/1292 is the result of applying tis-interpreter to SQLite.
I’m lucky enough to be “that guy who always asks questions” in class. I justified it for two reasons: (1) it’s the instructor’s job to teach me, and (2) it’s almost disrespectful not to ask questions. Elaborating on (2): for a presenter/instructor, a person asking questions is both paying attention to what you’re doing and showing that your presentation is at least somewhat intelligible.
I was always curious talking to my friends after class and hearing them say “I have no idea what we were just taught.” My (naive) theory is there are a few different archetypes of quiet students.
In my mind, it’s a problem that the “dumb” person is quiet and that many questions get asked one-on-one: the questions help everyone (including the instructor).
The argument that questions slow down the class and prevent material from being covered is nonsense: the question indicates the material was not thoroughly presented; if there are so many questions such that all the material cannot be presented then something needs to be changed. (This does get into a gray area as to when questions are off-topic: the instructor needs to take some discretion and discuss things offline with individuals.)
I don’t have any experience with Frama-C, but valgrind is, as far as I know, a runtime analyzer. It requires hitting code paths to validate them according to whatever metrics it has. Frama-C is a source code analyzer, which doesn’t require running the code.
To extend on this, to the best of my knowledge, Frama-C is largely trying to reason about value properties in the program, for example deducing that “on line 5 the variable x >= 5.” Such a statement holds on all paths for all program inputs (i.e., it is always true in the program). This allows the user to check whether the program satisfies some specification. In contrast, valgrind can only generate information about the runtime paths it observed. Generally, something like Frama-C is used to verify safety properties (something bad never happens) in a program, whereas Valgrind is used to find bugs but not prove their absence.
Specifically, creating an algorithm to automatically deduce information which holds over all paths and over all inputs is intractable since the search space is huge. So, approximations to the program’s behaviour are used so that the calculated information contains at least all the behaviour in the program and perhaps some additional, spurious information. As a result, the analysis may produce false alarms, e.g., the analysis may say that “x >= 5” when in fact in all feasible executions of the program “x >= 10”.
However, when attempting to reason about a specification in this way, the analysis usually aims to find all the errors in the program. In other words, if there exists an error in the program it will be detected (along with some false alarms). In the previous example, notice how “x >= 10” implies “x >= 5”; the analysis deduced an over-approximated fact which includes all the valid program behaviour. If “x >= 5” is not sufficient to prove the program satisfies its specification, then either the analysis or the program needs to be modified.
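A toy interval domain makes the over-approximation concrete: joining the two branches of an `if` keeps every real behaviour of the program but can admit spurious values (a sketch, not how any real analyzer is implemented):

```python
class Interval:
    """Toy abstract domain: each variable is approximated by [lo, hi]."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # Abstract addition: add the bounds.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def join(self, other):
        # Least upper bound: the smallest interval covering both branches.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

# One branch sets x = 10, the other x = 12:
x = Interval(10, 10).join(Interval(12, 12))
# The join [10, 12] soundly contains both real values, but also the
# spurious value 11, which no execution ever produces.
```

This is exactly the trade-off in the text: every fact the analysis reports covers all real executions, at the price of possibly reporting behaviours that cannot happen.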
To make a long story short, runtime (dynamic) analyses are typically used for bug hunting (e.g., trying to find a path through the program which causes an error), whereas a static analysis in the style of Frama-C attempts to prove the absence of errors. (Though there are dynamic analyses which try to predict additional behaviour of the program given a dynamic trace.)
A cheap oscilloscope, multimeter, and bench power supply are always welcome additions to any workbench, as well as an electronics breadboard.
If money allows, I’d suggest a logic analyzer on top of these things. I’ve found that recording long traces of data on many IO ports to be useful for debugging, particularly for realtime systems where you do not have access to logging via printf. This lets you see when certain interrupts are triggered and how data is being processed without perturbing the timing constraints required by, e.g., the communication protocols. I imagine if the oscilloscope can write data to a file, this would be sufficient too.
Depending on your needs, some other things can also be used as substitutes for a logic analyzer. For example you can hook something up to a Raspberry Pi’s GPIO ports and run some software. If you don’t need a high sampling frequency, I’ve even used a soundcard as a cheap two-channel, 44.1 kHz voltage sampler: just wire your stuff up to the line-in, use some resistors to make sure it’s in the right voltage range, and hit record in Audacity. In my case that worked out pretty well to log bus traffic on some MIDI interfaces.
Admittedly, just throwing in a logic analyzer is a lot more straightforward.
I definitely agree. The logic analyzer I was using seemed way overpriced (hundreds of dollars) and had a just barely usable Windows-only interface. I’d prefer something more along the lines of what you described: dump the data to a text/audio file and use some additional analysis to decode, for example, the data flowing on the I2C bus and display it to the user.
Edit: while the interface was barely usable, meaning clunky, it was still a great debugging tool.
Depends on how low-level you want to go. Anything with GPIO and I2C will be useful, and the question is what you’d want to go for. If you don’t mind working in lower-level C, AVR has a great set of microcontrollers that are widely used and don’t require a lot of external parts. If your friend is more comfortable with python / sshing into stuff, you might be better off with an Arduino or Raspberry Pi. I would shy away from the esoteric stuff especially if your friend is starting out, you want to work with easily googlable parts that other people have used until you’re more confident. Buying random parts off digikey and discovering that the datasheet is wrong can be extremely frustrating, so it’s helpful to tread where others have tread before you.
I can highly recommend Adafruit’s shop, they explicitly design their modules to be beginner friendly with lots of documentation. If you wanted to go AVR they have nice premade boards, as well as the bigger beaglebone/arduino/raspberrypi stuff as well. They also have generic prototyping stuff, sensors, and a pretty comprehensive beginning electronics toolkit.
I’d agree that anything with GPIO and I2C to be useful. Some other things to look for depend on what your friend wants to do. For example, it may be useful to get a board with analog-to-digital and digital-to-analog converters running at a high enough sample rate if they want to interact with analog devices (like analog sensors, or making noise on a speaker). Where high enough sample rate depends on the application. Getting something with CAN or UART support would also be useful if you need these communication methods.
I started learning on PIC micro-controllers (e.g., Cerebot). I’ve found their manuals to be dense but provide enough information about all the hardware details. Probably these PIC boards are fairly similar to the AVR recommendations made by stip.
This was really cool - anyone have some good references to where I can learn more about SBV in particular and solving problems using SMT in general? (I did review the examples in the repo, and they were helpful, but I’m looking for more of a tutorial-style exposition if one exists.)
The Calculus of Computation by Bradley and Manna is pretty good overview of logic, program verification, and how a SAT/SMT solver works.
I think it’s great to see “combinator style” parsers; after writing my first parser using Parsec, I feel bad when writing quick-and-dirty parsers in C++/Java.
However, and this may be pedantic, but I feel the claim “safe by default” is a bit strong. It seems the author wants to say that it is memory safe (no use-after-free/out-of-bounds access), but in the absence of a safety specification for what the parser should do, and a way to show the parser meets that specification, I don’t think you could call it safe. Additionally, the fact that the library was fuzzed to “verify” it is safe seems to hint that it does not have a proof of the absence of such errors: such a proof is what I would call safe.
Again, this is probably just me having my own definitions of what “safe” and “verify” mean.
Again, this is probably just me having my own definitions of what “safe” and “verify” mean.
In the context of Rust, ‘unsafe’ means ‘memory safety’.
in the absence of a safety specification for what the parser should do and a way to show the parser meets that specification I don’t think you could call it safe
Well, Rust as a language guarantees memory safety as long as you don’t use unsafe, and then it’s up to you to use it correctly. nom currently uses unsafe four times, two to perform an unsafe cast, and two to copy some pointers. As long as these four uses are correct, then if nom isn’t memory safe, it’s a bug in Rust.
I know, absence of proof is not proof of absence. But outside of actual proof of correctness for the generated machine code (which exists in some specific cases, not applicable here yet), correct combinators able to withstand tools like AFL are a pretty good deal.
Great tutorial. I consider myself somewhat well versed in LLVM but I had no idea it was so easy to run a pass straight with clang.
The worst part about LLVM is that it spoils you: writing analyses for languages without LLVM frontends, like JavaScript, seems much more esoteric because you have to work directly on the existing compilers. Working with SpiderMonkey or V8 seems painful (though I’d love to be proven wrong).
Interesting to see a more modern Java concurrency approach compared to what I remember from The Art of Multiprocessor Programming.
What wasn’t clear to me on reading is how the example Atom satisfies the property stated by the author:
Every operation will succeed after the finite number of tries
For example, two threads could execute the swap function.
Thread 1                 Thread 2
=========================================
oldv = get()
                         oldv = get()
                         newv = swapOp(old)
                         cas(oldv, newv)
newv = swapOp(old)
cas(oldv, newv)
oldv = get()
                         oldv = get()
                         newv = swapOp(old)
                         cas(oldv, newv)
...
If this repeats infinitely then thread 1’s swap never succeeds. Did I miss something? I’m never too confident of pen-and-pencil proofs of concurrent programs.
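Replaying that interleaving against a toy Atom (a single-threaded model with a made-up API, not the library in the article) shows both halves of the argument: in each round exactly one CAS succeeds, so the system as a whole makes progress, but nothing stops the same thread from losing every round. “Every operation succeeds after a finite number of tries” only holds under some fairness assumption about the scheduler.

```python
class Atom:
    """Toy model of an atomic reference: get() plus compare-and-set."""

    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value

    def cas(self, old, new):
        # Succeeds only if nobody changed the value since 'old' was read.
        if self.value == old:
            self.value = new
            return True
        return False

atom = Atom(0)
swap_op = lambda v: v + 1

# One round of the interleaving in the table above:
t1_old = atom.get()                        # thread 1 reads 0
t2_old = atom.get()                        # thread 2 reads 0
t2_ok = atom.cas(t2_old, swap_op(t2_old))  # thread 2's CAS wins
t1_ok = atom.cas(t1_old, swap_op(t1_old))  # thread 1's CAS fails: value moved
```

Run under an adversarial scheduler that repeats this round forever, thread 1 starves; that is exactly the lock-free-but-not-wait-free distinction.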
What I always thought was a subtle design challenge for concurrent data structures is that you can have “bugs” in your API. For example, if you expose a read() operation and a write() operation on your lock-free data-structure then you open the user to do something like:
if (read()) {
write()
}
Allowing an atomicity violation to creep in.
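Hand-interleaving a check-then-act against a toy structure shows the lost invariant: even if read() and write() are each atomic in the real data structure, their composition is not (the model below just interleaves plain method calls by hand):

```python
class Counter:
    """Stand-in for a structure whose read/write are individually atomic."""

    def __init__(self):
        self.n = 0

    def read(self):
        return self.n

    def write(self, v):
        self.n = v

# Intended invariant: increment only if the counter is still 0.
c = Counter()
a_saw_zero = (c.read() == 0)   # thread A: check...
if c.read() == 0:              # thread B runs its whole check-then-act
    c.write(c.read() + 1)      # ...between A's check and A's act
if a_saw_zero:                 # thread A: ...act on a stale observation
    c.write(c.read() + 1)
# Both threads saw 0, so the counter ends at 2, breaking the invariant.
```

This is why APIs like compareAndSet exist: they collapse the check and the act into one atomic step, so the user can’t be bitten by this composition.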
The only exit condition from swap is a successful swap, so it has to succeed: there’s a for loop in there. This is pretty much the whole point of Atoms: to allow async swaps using the unary function.
If you want more guarantees, you can add a counter and make sure you exit after a certain amount of retries.
As regards API design: you’re absolutely right. It’s always hard. Luckily, Atom doesn’t have this problem, since the old value is always explicitly passed, and it’s the user’s responsibility to make sure they don’t leak the modifiable state.
Yeah, the author is not terribly precise here. My impression is that lock-free algorithms guarantee over-all system progress, but won’t prevent per-thread starvation. As you’ve noticed, an Atom can always starve–in particular, optimistic concurrency often does poorly under extremely high concurrency, as many threads may do work simultaneously, but only one will have its work actually be used.
I think one possible next step from using AtomicRefs together is something like STM, where you can combine several separate atomic operations into a single larger atomic operation. I’ve heard mixed things about how well it performs in general though, so I’m not sure that it’s the next step.
All the JVM-based implementations of STM that I have seen so far use lots of locks internally, so I wouldn’t say it’s a step forward against contention.
Although what could help is to design the system so that all writes go through a single writer.
Locks don’t necessarily mean you suffer more from contention. If you are contended the vast majority of the time, programming with locks can be much faster than optimistic concurrency; in this way, programming with locks can ensure much higher throughput.
Well, if we’re really looking at implementations, java compare and swap uses LOCK before CMPXCHG on x86, so we are all sinners. I think correct implementations of STM, even if they rely on locks eventually, can’t run into deadlocks, and are guaranteed to always make system-wide progress.
Note that lock-freedom doesn’t mean you can’t use locks, it just means that at least one thread must always be able to make progress, even if individual threads starve.
Single-threading your writes is OK if you don’t have to do a lot of writes, but it’s in no way a silver bullet, and can add latency unnecessarily, especially since you need to do locked bus traffic anyway if you want to read safely.
Well done writeup, with enough detail to be interesting but not tedious. I’m looking forward to Part 2.
My impression is that their approach is largely inspired by Google’s ClusterFuzz. Projects that meet Google’s criteria can use their “OSS-Fuzz” infrastructure, but hooray for DIY if you have the resources.
I’m looking forward to the next part as well. I think this is a little bit different from clusterfuzz/ossfuzz, since it is using a more customized (grammar-based) fuzzer, as opposed to a generic fuzzer like libfuzzer.
I’m mostly interested to see whether this is really a security issue, since the sanitizer reported a segmentation fault rather than a buffer overflow. Regardless, it’s still a bug they found, so it’s a worthy cause.
I’m also not a javascript expert but I’m wondering why the author was confused to find promise objects within the array: doesn’t mapping an async function over an array create an array of promises?
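For what it’s worth, the same thing happens in the Python analogue: mapping an async function produces coroutine (“promise-like”) objects, and you only get the values after awaiting them:

```python
import asyncio

async def fetch(x):
    # stand-in for some async work
    return x * 2

async def main():
    mapped = [fetch(x) for x in [1, 2, 3]]
    # Mapping an async function yields coroutine objects, not results...
    assert all(asyncio.iscoroutine(c) for c in mapped)
    # ...awaiting them (here via gather) yields the actual values.
    return await asyncio.gather(*mapped)

results = asyncio.run(main())
```

In JavaScript terms, that is why `items.map(asyncFn)` needs a `Promise.all` around it before the values are usable.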
If you read the first part of this series, this was the vulnerability they used in their pwn2own exploit.