Unlike the prototypical buffer overflow vulnerability, exploiting features is very reliable. Exploiting something like Shellshock or ImageTragick requires no customized assembly and is independent of CPU, OS, version, stack alignment, malloc implementation, etc.
^ buried the lede a bit here. I think this insight is worth as much as the rest of the article put together, so maybe it deserved more prominent placement than right at the end.
Yeah, that point may have been too obvious to me; I didn’t consider that anybody wouldn’t realize it. I’d emphasize it more in a do-over.
Nice points that I overall agree with. Then, following the original post, I see…
“Web browsers have existed as a category for about twenty years. In that time, nobody has ever produced one that I’d call secure.”
Is that all the code or the architecture? If the former, products going back to maybe 2005 (e.g. INTEGRITY-178B w/ Padded Cell) used separation kernels of 4-12Kloc to isolate browsers or other untrusted apps, with mediation of data going into or out of that partition. Isolated, partitioned networking stacks and filesystems ran on top of that. If the latter, was @tedu aware of OP/OP2, Gazelle, or the Illinois Browser Operating System? The high-assurance security side of things was at least trying to make a secure browser. The results were huge improvements on prior architectures. There was some impact, too, given that Chrome’s model was a watered down, albeit clever, version of OP’s strategy. Their modifications put raw performance above security, with predictable results in both market share and vulnerabilities. Haha.
The sandbox at the playground is pretty safe when it’s the morning, there’s only your kid inside, and you’re keeping a close eye on them. It’s a lot less safe when it’s the afternoon, it’s full of kids, and all the parents have mostly given up due to the insanity. But, it’s sure a lot more fun!
Is there an academic term for the inherent tension of any successful sandbox? “The paradox of the sandbox?”
Sandboxing is part of what I call mainstream security. The industry is to real security what popular psychology is to cognitive science. The high-assurance methodology is to solve as many root problems as possible with solutions as strong as possible. A number of these resisted NSA pentesting for years at a time, whereas the kinds of things mainstream security pushes often fail, and in repeating ways. Medium assurance is what we call a mix of regular stuff with the right amount of high-security stuff added. A bit more practical.
What’s popularly called sandboxing are “security solutions” that build on a pile of components built with low-assurance methods that nearly guarantee bypass opportunities. Those kinds of things keep getting hit, while their proponents justify continuing to use similar methods despite the failures. This will remain popular due to the network effects of those people and their widely distributed code. It’s a social phenomenon more than a technical inevitability, though.
“It’s a social phenomenon more than a technical inevitability, though.”
When as-is interfaces are all low-assurance and adhoc, is there a way to get from here to there without sacrificing backwards compatibility in a way that guarantees market rejection?
I think we agree the issue is social. Does that mean the solution(s) is/are too?
“is there a way to get from here to there without sacrificing backwards compatibility in a way that guarantees market rejection?”
Starting in the early 1980s, high-assurance security used a number of methods to increase the assurance of hardware. The basic ones in A1 certifications added 30-50% to the cost of a solid development. A bigger problem for companies was that time-to-market was way slower due to all the rigor. However, medium-assurance methods like Cleanroom can knock out tons of vulnerabilities with good time to market. I have a summary below of the assurance techniques & examples of their application from the decades the high-assurance field spent applying them:
Any solution will have to be socially or economically acceptable. That means it has to run something mainstream, like DNS or Linux, likely designed in the most insecure fashion. This requires isolation, transformation, and/or monitoring of those things. The transformation is either manual, clean-slate work or automated by tools. As bolt-on approaches have always failed, the solution must either provably eliminate root causes of problems (e.g. pointer modifications, buffer overflows) or provably isolate/detect them. Let’s go through a few examples of how a socially-acceptable solution might be made.
The oldest starts with a separation kernel and trusted components built in the strong ways. It adds a VM for the compatibility API that runs the legacy apps. Security-critical apps can run directly on the kernel natively or with a runtime for a safe language like Ada. The two can communicate through IPC mediated by the separation kernel and/or policy-enforcing middleware. Commercial products and academics keep doing stuff like this. I’ll give you the Nizza paper for its great illustrations/examples, plus a commercial separation kernel so you can see its features (ignore the marketing) and the certification data that went into evaluation. Btw, GenodeOS is a FOSS attempt at a Nizza-like architecture for desktops.
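As an illustrative sketch (nothing here is any real product’s API; all names are made up), the mediated-IPC idea boils down to this: partitions can only exchange messages through a kernel check against an explicit policy table, so a compromised legacy VM simply has no path to the security-critical apps.

```rust
// Toy model of separation-kernel mediated IPC. The policy table is the
// only thing deciding which partition may talk to which.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Partition { LegacyVm, SecureApp, NetStack }

// Explicit allow-list: the legacy VM may reach the network stack, the
// secure app may reach both, and nothing may message the secure app.
fn policy_allows(from: Partition, to: Partition) -> bool {
    use Partition::*;
    matches!(
        (from, to),
        (LegacyVm, NetStack) | (SecureApp, NetStack) | (SecureApp, LegacyVm)
    )
}

// All IPC funnels through this mediator; there is no other channel.
fn send(from: Partition, to: Partition, msg: &str) -> Result<String, String> {
    if policy_allows(from, to) {
        Ok(format!("{:?} -> {:?}: {}", from, to, msg))
    } else {
        Err(format!("denied: {:?} -> {:?}", from, to))
    }
}

fn main() {
    assert!(send(Partition::SecureApp, Partition::NetStack, "dns query").is_ok());
    // A compromised legacy VM can't message the secure app directly.
    assert!(send(Partition::LegacyVm, Partition::SecureApp, "pwn attempt").is_err());
    println!("policy enforced");
}
```

The point of the pattern is that the policy check lives in a few lines of trusted code, instead of being scattered through millions of lines of untrusted apps.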
For automated transformation, you’re talking tools such as SVA-OS + SAFECode or data-flow integrity that make a traditional codebase immune to many forms of code injection, often at a significant price in performance. There are CPUs designed to eliminate most of that price, but deploying new CPUs has its own, higher price. ;)
The next thing one can do is an appliance for specific jobs or protocols. This has a lot of potential on the marketing side. Green Hills has a whole subsidiary doing this with INTEGRITY-178B for everything from secure browsing to call-center software. Other examples are Secure64’s SourceT OS for DNS and Sentinel’s HYDRA (which also uses the INTEGRITY RTOS). Again, focus on the details that aren’t marketing BS. ;)
The next avenue is embedded, where you have a low-cost MCU or CPU w/ the necessary peripherals that’s itself worth selling. The added benefit is it has features to prevent code injection in the C language, or supports a safer one like Java. You can do the same thing in servers by selling a “platform” where the CPU or OS is done the same way. I’ll include an example of a secure CPU and a high-level-language OS. Note that the smartcard market uses more high-assurance hardware and software than about any of them. Just weak MCUs, though.
Saved the best for last. You can also just produce a product that people buy. You run that product on more secure tech. As money comes in, you use some of it to increase the assurance of the underlying tech or to build new tech. This is what big players like Microsoft do with Microsoft Research. However, small players do it too: MLstate’s Opa language for web apps (actually any safer language), the Ada community in the embedded sector, Rust or Eiffel in business apps, SQLite especially comes to mind, and all kinds of commercial spinoffs from academia where grants paid for the hardest stuff. The best advice I have is to make money however you can, then funnel it into stuff like the above list. Which also makes money, to produce more money and secure stuff. A virtuous cycle. :)
Thank you for such a knowledgeable and thoughtful reply.
Funny how different people look at the same thing and see different problems. I look at all these bugs and I see a lack of type safety. ftp has a bug when following redirects because it doesn’t distinguish between the filename and the local viewer to run; it just smooshes them together into a single string. Shellshock, because Bash doesn’t distinguish between function/variable definitions and commands to execute; it just accepts them all in the same input string. ImageTragick, because ImageMagick doesn’t distinguish between scripts and image data; it just accepts whatever as input. The ed and pdflatex vulnerabilities, because again there’s no distinction made between data and executable commands. The Vim vulnerability, because tmux is apparently configured by sending <F19> followed by some text over its keyboard input channel. (I think the NSF vulnerability is a fairly ordinary vulnerability that doesn’t follow the pattern of the rest of the examples; it’s not really a feature interaction, just a bug in an emulator, though I suppose the emulator implementation technique could be blamed.)
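The ftp case can be made concrete with a small hypothetical sketch (the type and function names are mine, not from any of the programs above): give the filename and the viewer distinct types, and the compiler stops you from smooshing them into one string that later gets reinterpreted.

```rust
// Newtype wrappers: a Filename is not a Viewer, and neither is a plain
// String, so they can't be accidentally concatenated or swapped.
struct Filename(String);
struct Viewer(String);

// The only way to combine them keeps the roles separate: the viewer is
// the program, the filename is a plain argument. The filename is never
// parsed as part of a command line.
fn view_command(viewer: &Viewer, file: &Filename) -> Vec<String> {
    vec![viewer.0.clone(), file.0.clone()]
}

fn main() {
    // A hostile server can pick any filename it likes; it still can't
    // smuggle a command, because the filename never becomes one.
    let file = Filename("|sh -c evil".to_string());
    let viewer = Viewer("xdg-open".to_string());
    let argv = view_command(&viewer, &file);
    assert_eq!(argv, vec!["xdg-open", "|sh -c evil"]);
    println!("argv = {:?}", argv);
}
```

Passing the result as an argv array (rather than joining it into one shell string) is the same distinction at the OS boundary: data stays data.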
Back in the 1940s, theorists realised that the lambda calculus (i.e. all programming languages of the time) was subtly but deeply flawed and that types were needed to fix it. And they added types, and added generics to make them practical to use, and got on with things. Unfortunately no-one told the inventors of Unix or C, so these things carried the fundamental brokenness within them, and in any case C is so bad at expressing structured values that this same author recommends passing strings instead and parsing them, perpetuating this kind of vulnerability as soon as someone wants to change that value in response to user input.
I don’t think Unix is salvageable. Constant vigilance can reduce your defect rate by a factor of 10 or 100, and ad-hoc mitigation measures like ASLR and pledge can reduce exploitability of defects by a similar rate, but that still just makes exploitability of a system a question of scale. The OpenBSD team have removed many features, but they still have more than 2 features and more than 100 lines of code, and probably always will do. Any new systems I produce will be OCaml unikernels. They’ll still get exploited for the time being because they have to actually run somewhere and at the moment that’s Xen and Xen’s codebase is pretty bad, but “run this unikernel image” is something that seems at least feasible to implement with less than 100 lines and fewer than 2 features.
It’s worse. They knew about the stuff but intentionally ignored it. Thompson also really liked BCPL. Holdovers from the bad hardware they started with were never changed. Even simple solutions from the INFOSEC field were ignored. An example was the setuid-root vulnerability. Trusted Xenix knocked that out by just clearing the bit every time the file was written to, with the admin having to manually reset it. I think they did one other thing that was small. The idea being that they’d only do that during a software update to begin with. Instead, they left us to hunt for and be careful with the setuid apps. Another favorite is how predecessor MULTICS had a stack whose contents grew away from the stack pointer, but UNIX kept it growing toward it, aiding attempts to clobber it. Justified initially by hardware, but it could’ve been eliminated in at least some builds when RISC showed up. Enough adoption would’ve likely had Intel or AMD supporting it as an execution option to generate more demand for their chips.
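A rough sketch of the Trusted Xenix idea, with a made-up helper name and path (modern Linux kernels do something similar when an unprivileged process writes a setuid file): any write drops the setuid/setgid bits, so they only survive if an admin deliberately restores them.

```rust
use std::fs;
use std::io::Write;
use std::os::unix::fs::PermissionsExt;

// Write to a file, then clear its setuid (04000) and setgid (02000) bits.
// An admin must consciously re-set them after an update; a tampered
// binary can't quietly stay privileged.
fn write_and_drop_setuid(path: &str, data: &[u8]) -> std::io::Result<()> {
    let mut f = fs::OpenOptions::new().write(true).truncate(true).open(path)?;
    f.write_all(data)?;
    let mode = f.metadata()?.permissions().mode() & 0o7777; // permission bits only
    fs::set_permissions(path, fs::Permissions::from_mode(mode & !0o6000))
}

fn main() -> std::io::Result<()> {
    // Demo on a throwaway temp file marked setuid.
    let path = std::env::temp_dir().join("setuid_demo");
    fs::write(&path, b"original")?;
    fs::set_permissions(&path, fs::Permissions::from_mode(0o4755))?;
    write_and_drop_setuid(path.to_str().unwrap(), b"updated")?;
    let mode = fs::metadata(&path)?.permissions().mode();
    assert_eq!(mode & 0o6000, 0, "setuid/setgid bits should be cleared");
    fs::remove_file(&path)?;
    Ok(())
}
```

The policy lives in one small, auditable place instead of relying on everyone remembering to audit every setuid binary after every write.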
“Any new systems I produce will be OCaml unikernels. ”
MirageOS is neat. Madhavapeddy is also one of my favorite CompSci people, just because of how he presents himself and gets audiences excited about computing in safe languages. Along with his work to increase ease of adoption.
“They’ll still get exploited for the time being because they have to actually run somewhere and at the moment that’s Xen and Xen’s codebase is pretty bad”
You’re already smarter about it than Joanna at QubesOS. I warned her to switch to a stronger TCB with isolation built-in, fast IPC, a trusted path for graphics, and user-mode drivers. I named off some projects & products you see me mentioning here. She shouted all kinds of shit at me saying Xen was good enough & other stuff didn’t matter. Years later she’s griping at Xen people about their security & has a graphics subsystem. I don’t know if she did my driver recommendation. It was funny stuff.
The MirageOS situation could improve over time if ported to something like Genode. The underlying components need to mature, though, before I’d believe in a security benefit. Redox also has potential to offer components for a safer MirageOS.
Really interesting talk. Even aside from security issues, I find that a significant proportion of bugs results from unintended interactions between different pieces of functionality, so reducing complexity is good for software quality in general. Except, of course, there is far too much temptation in new features, and nobody has really figured out how to keep a lid on complexity.
Real example of unintended consequences that I’m dealing with right now:
Nothing to do with that. I’ll come back to SSL later, but the punch line is bugs are temporary, workarounds are forever.
Want to expand on that a little here @tedu?
EDIT: also, as always, some wonderful burns:
You may ask why you need to be the one who handles tiff conversion. Shouldn’t the person with the tiff selfie convert it before uploading? Turns out there’s very little overlap between people who would choose to upload tiff files and people who can convert them to another format.
Two kinds of awesome. First, there’s SSL deployment in the wild, with people keeping SSLv2 etc. alive past its sell-by date. Or the fact that F5 breaks with certain packet sizes, so everybody works around it. This has two effects. In twenty years, we’ll still be sending fun-sized packets long after the bug is dead. There was OpenSSL code to work around such bugs in Netscape 1.0. And second, it prolongs the life of the bug, because “works for me.”
The other awesome is on the development side, where workarounds for obscure win16 linker bugs survive for 20 years. Again, well past the sell-by date. But nobody wants to be the one to decide that Windows 3.1 and the fucking Trumpet Winsock are obsolete.
In security terms, some of the engineering decisions OpenSSL has made in the past are suboptimal, and make other problems worse. Implementation problem. The broader community’s continued support for bullshit like export ciphers has also been bad for security. Culture problem.
I like this talk/essay a lot, A+. Good on substance and good on quips.