It’s a SCM problem. David A. Wheeler has the definitive write-up covering the various angles of it:
I’m just throwing out a few quick things rather than being thorough. There are several logical components at work here. There are the developers’ contributions, each of which might be malicious or modified. The system itself should keep track of them, sanitize/validate them where possible, store them in append-only storage with snapshots on offline media, and run automated tooling for building/analyzing/testing. It’s advantageous to use a highly-isolated machine for building and/or signing, with the input being text over a safe channel (e.g. serial).
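To make the "keep track of them in append-only storage" part concrete, here is a minimal sketch of a hash-chained contribution log. Every name in it (`GENESIS`, `entry_hash`, `append`, `verify`, the record fields) is an assumption made up for illustration, not something from Wheeler's write-up or any particular SCM.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def entry_hash(prev_hash: str, author: str, patch: str) -> str:
    """Hash one record together with the hash of the record before it."""
    payload = json.dumps(
        {"prev": prev_hash, "author": author, "patch": patch},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(log: list, author: str, patch: str) -> None:
    """Add a contribution; earlier entries are never rewritten."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {
            "prev": prev_hash,
            "author": author,
            "patch": patch,
            "hash": entry_hash(prev_hash, author, patch),
        }
    )


def verify(log: list) -> bool:
    """Recompute every link; an edited, reordered, or mid-chain deleted entry fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["author"], entry["patch"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Note that the chain alone does not catch someone chopping entries off the end. That is where the offline snapshot comes in: if the latest hash is copied to offline media, a truncated or rebuilt log will not match it.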
In parallel, you have the fast box(es) people actually use for day-to-day development. The isolated machine w/ secure OS periodically pulls in the text to do the things I already described, with whatever sandboxing or security tech is available. Signing might be done by coprocessors, dedicated machines, smartcards, or HSMs. The output goes over a separate line to separate computers that do distribution to the public Internet with no connection to development machines. Onboard software and/or a monitoring solution might periodically check that the source or binary hashes each machine is creating match, with the ability to shut off the distribution side automatically or with admin approval.
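The hash-matching monitor can be sketched roughly as follows. This is only an illustration under assumptions: the builder names, `check_release`, and the `Distribution` class are hypothetical, and a real setup would fetch the hashes over the isolated channel rather than hold the artifacts in one process.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a build artifact."""
    return hashlib.sha256(data).hexdigest()


def check_release(artifacts: dict) -> tuple:
    """artifacts maps a (hypothetical) builder name to the bytes it produced.

    Returns (ok, digests). ok is True only if there are at least two
    builders and all of them produced byte-identical output.
    """
    digests = {name: sha256_hex(blob) for name, blob in artifacts.items()}
    ok = len(digests) >= 2 and len(set(digests.values())) == 1
    return ok, digests


class Distribution:
    """Stand-in for the distribution side; starts disabled (fail closed)."""

    def __init__(self) -> None:
        self.enabled = False

    def update(self, artifacts: dict) -> None:
        # Any mismatch switches distribution off until the next clean check.
        ok, _ = check_release(artifacts)
        self.enabled = ok
```

Failing closed is the design choice here: distribution stays off unless independent machines agree, so a single compromised builder produces an outage rather than a poisoned release. An admin-approval step would sit where `update` flips `enabled`.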
Simply having updates and such isn’t good enough if the boxes can be hacked from the Internet. Targeted attacks have a lot of room to maneuver there. The development boxes ideally have no connection to the deployment servers or even the company web site, so knowing the latter can’t help hackers discover the former. Those untrusted boxes just have a wire of some sort, periodically requesting info they handle carefully or sending info they’ve authenticated. The dev boxes would get their own software using the local Internet, or off-the-wall wifi if the person is really paranoid. They’d also be hardened.
It was also common practice to have separate VMs or, especially, separate hardware w/ KVM switches for Internet or personal activities. As in, the software development was completely isolated from sources of malice such as email or the Web. The common theme is that evil bits can’t touch the source, build system, or signing key. So: separation, validation, POLA, and safe code everywhere possible.
It’s a SCM problem.
Unfortunately, it is not just a SCM problem. I wish the problem were that easy. Supply chain attacks can happen at many points along the software value chain. Wheeler himself brought reproducible builds to attention for this reason (e.g., a backdooring compiler). Software updates and distribution media are also common means of attack.
All in all, I think it’s a very underdeveloped field in cyber security, with a really wide attack surface and devastating consequences.
Needless to say, David A. Wheeler brought many issues to the table years ago and we’re finally realizing that we need to do something about it :P
The collection, protection, and distribution of software securely via repos is SCM security. It’s a subset of supply chain security, which entails other things such as hardware. Securing that is orthogonal, with different methods. Here’s an analysis I did on it if you’re interested in that kind of thing:
David A. Wheeler learned this stuff from the same people I did, the ones who invented INFOSEC and high-assurance security. They immediately told us how to counter a lot of the issues: high-assurance methods for developing the systems, SCM for the software, and trusted couriers for hardware developed similarly. Wheeler has a nice page called High Assurance FLOSS that surveys tools and methods. He turned the SCM stuff into that great summary that I send out. I also learned a few new things from it, such as the encumbrance attack. His goal was that FOSS developers learn high-assurance methods, apply at least medium assurance with safe languages, apply this to everything in the stack from OSes to compilers to apps, and also develop and use secure SCM like OpenCM and Aegis tried to be. The combination, basically what Karger et al. advised starting with the MULTICS evaluation, would eliminate most 0-days plus deliver the software securely. Many problems solved.
They didn’t do that, though. Both the proprietary sector and FOSS invested massive effort into insecure endpoints, languages, middleware, configurations, and so on. The repo software that got popular was anything but secure. Being pragmatic, he pivoted to trying to reduce the risk of issues such as Paul Karger’s compiler-compiler subversion and MITMing of binaries during distribution. His methods for this were Diverse Double-Compilation and reproducible builds. Nice tactics, though DDC is hard to evaluate, esp. given the compiler can still be malicious or buggy (esp. when optimizing security-critical code). Reproducible builds have their own issues: they rule out site-specific optimizations or obfuscations since the hashes won’t match. I debated that with him on Hacker News, with us just disagreeing on the risk/reward tradeoff of those. What we did agree on was that what’s needed and/or ideal is a combination of high-assurance endpoints, transports, SCM, and compilers. His site already pushes that. We also agreed economic and social factors have kept FOSS from developing or applying them. Hence, methods like the ones he pushes. The high-assurance proprietary sector and academia have continuously developed pieces of, or whole, components like I’ve described, with things occasionally FOSSed like CakeML, seL4, SAFEcode, and SPARK. So, it’s doable but they don’t do it.
If you’re wondering, the old guard did have a bag of tricks for an interim solution. The repo is on highly-secure OSes with mandatory access control. Two examples, the first products actually, are in the link below. The users connect with terminals, with each thing they submit being logged. The system does builds, tests, and so on. It can send things out to untrusted networks that can’t get things in, per security policy. Possibly via storage media instead of networking. Guard software also allows humans in the loop to review and allow/deny a code submission or software release. Signing keys are isolated or on a security coprocessor. The computers with source are in an access-controlled, TEMPEST-shielded room few can enter. Copies of the source in either digital or paper form are kept in a locked safe. The system has the ability to restore to a trusted state if compromise happens, with security-critical actions logged. The people themselves are thoroughly investigated to reduce risk, plus paid well. Any one of these helps reduce risk. Fully combining them would cover a lot of it.
In 2017, such methods combining isolation, paper, physical protection, and accountable submissions are still way more secure than how most security-critical software is developed today. If people desire, we also have highly-secure OSes, tons of old hardware that’s probably not subverted (esp. if you pay cash for throwaways), good implementations of various cryptosystems, verified or just robust compilers for stuff from crypto DSLs to C to ML, secure filesystems, secure schemes for external storage, cheap media for write-once backups or distribution, tons of embedded boards from countless suppliers for obfuscation, and so on. This is mostly not an open problem: it’s a problem whose key components have been solved to death, with dead simple solutions for the basics like the old guard had. Solving it the simple way is just really inconvenient for developers who value productivity and convenience over security. I mean, using GCC, Git, Linux, and 3rd-party services over a hostile Internet on hardware from sneaky companies is both so much easier and set up to fail in countless ways. And it has failed in countless ways. If people really care, I tell them to use low-risk components instead, with methods that worked in the past and might work again. It’s just not going to be as fun (FOSS) or cheap (proprietary).
Quick Note… I do have a cheat based on the old pattern of UntrustedProducer/TrustedChecker, where you develop everything on comfortable hardware, write the stuff that works down on paper, then manually retype it on trusted hardware. If it still works, it probably wasn’t subverted. Tedious but effective. I’ve never seen a targeted, remote, software attack that beat that. Sets the bar much higher. Clive Robinson and I also determined infrared was among the safest options if you wanted careful communication between electrically-isolated machines. Lots of suppliers, too. Hardware logic for anything trusted can be done in an ASIC on old nodes that are visually inspectable, w/ shuttle runs for cost reduction. All the bounds checks and interface protection built in. Lots of options to let one benefit from modern tooling while maintaining isolation of key components. It’s just still going to be inconvenient, cost more, or both.
I’m glad that supply chain attacks are finally being both detected and acknowledged as an issue. Here at NYU, we have been working on a framework called in-toto to address this for over a year now. Although I agree with the “just use [buzzword]” point, I think in-toto is a good way forward to start discussing and addressing the issue.
There are some videos of our talks at DebConf, DockerCon, and other venues on the website.
And sure enough, VLC is currently serving a security update 2.2.6 over HTTP by default.
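For anyone stuck fetching an update over plain HTTP, one partial mitigation is to check the download against a digest obtained some other, authenticated way. A minimal sketch, assuming you already have a trusted SHA-256 for the file (the function name and the sample bytes below are made up for illustration):

```python
import hashlib
import hmac


def verify_download(data: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if data hashes to the digest pinned out of band."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_sha256_hex.strip().lower())


# Usage: refuse to install anything that does not match the pinned digest.
blob = b"pretend these are the installer bytes"
pinned = hashlib.sha256(blob).hexdigest()  # in practice, from a trusted source
if not verify_download(blob, pinned):
    raise SystemExit("digest mismatch, not installing")
```

This only helps if the expected digest did not travel over the same unauthenticated channel as the file. A signature checked against a key you already trust is the stronger version of the same idea.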