Main page here:
Abstract of the PDF has both impressive results and another nail in the coffin of many eyeballs argument. Even if you count their tool seeing it, fifteen years is still quite the delay between publishing code & spotting the bugs.
See also: MAYHEM, the winner of the recent DARPA Cyber Grand Challenge, and SAGE, which has been in large-scale use at Microsoft for years (more).
Sadly neither is open-source, which makes it hard to verify their effectiveness on real-world software. Comparatively dumb coverage-guided fuzzers (AFL, LibFuzzer, syzkaller, etc.) certainly have a much bigger impact on open-source software than any tool employing symbolic execution. This might be simply because noone created a tool as streamlined and robust as AFL yet, but it might also reflect limitations of the technology itself: after all, most programs are orders of magnitude more complex than coreutils-style utilities.
I’m not so confident about evaluating ‘competing’ analysis technologies by their number of vulnerabilities produced. There is enormous flexibility in fuzzers: they are frequently run without any modification to the underlying source code, the creation of incidental machinery or even any understanding of the underlying codebase. Contrawise, in a prevailing atmosphere of intense focus on a small number of programs exhaustively analysed yet still suspected to be vulnerable (Browsers, PDF Readers), it’s hard to imagine a scenario that SAT solvers and symbolic analysis don’t play an important role.
I do understand the hesitation but the auction of hyperbole is trading in both houses – security analysts panning the state of the art while ‘Academia’ (whatever that designation might imply) prophesies a fundamental shift to symbolic or even artificially intelligent analysis.
In spite of this, I’ve personally used these tools and their siblings in the service of checking real-world code. Assisting me not only in finding vulnerabilities, but also generating several exploits (as well of plenty of ordinary bugs) in complex software like Plan9’s JPEG parser and the Erlang runtime system. Erlang has been reviewed by dozens of extremely intelligent people coding in a straightforward style, fuzzers have been run on virtually every major component along with the incidental artifacts like EPMD. Yet still, I found 4 (1 stack, 3 heap overflow) remotely exploitable vulnerabilities and all of which I can say with confidence could have never been found by a fuzzer.
It would have been impossible to comprehensively analyse even modestly complex software just as your comment suggests, but by relying on my own intuition about the weaknesses of software and the people who wrote it, along with the perseverance and extraordinary speed of my machine I could semi-exhaustively search for problems.
Yes, it was a huge hassle. The most flashy and visible symbolic execution tools turn out to either not work or be extremely special case, while the stuff that does (like KLEE) is like translating Greek. I had to learn enough about SAT solvers to get a Knuth Check (literally). I’m confident that none of this is in any way ‘essential to the art’ – All of this could be trivialized by a visionary team who decides to tackle this problem in the public domain.