This is a very long article, but it seems kind of vapid? Or maybe that’s not the word, but having read it, I’m not sure what information I can actually use: how to reproduce their results, and so on.
The early versions really sucked, too. I kept duclare’s review because it was hilarious:
“I tried infer on our standard build environment but it depends on newer libs than you get on rhel/centos7. So I moved on to the docker image but it ran out of memory while building infer. So I moved on to using the prebuilt binaries in a custom docker image running ubuntu, and after getting the deps right, I can finally run infer on a trivial one-line C program. Unfortunately the thing segfaults if I try to use it on real code.”
As far as reproducing goes, you can go to the website, download it, and run it on some code you have. They have a “try it in your browser” option, too. It can probably find the basic bugs. What’s really worth reproducing with benchmarks is anything involving separation logic and concurrency. Those have been harder to handle automatically than most other kinds of program analysis.
“Infer targets our mobile apps as well as our backend C++ code, codebases with 10s of millions of lines; it has seen over 100 thousand reported issues fixed by developers before code reaches production.”
That’s probably a world record for static analysis.
“Overall, the error trace found by Infer has 61 steps, and the source of null, the call to X509_gmtime_adj() goes five procedures deep and it eventually encounters a return of null at call-depth 4.”
@hwayne, here’s another one for illustrating the depth formal methods can reach versus what testing might have found. IIRC, that’s approaching double the steps in the Amazon example.