1. 44

  2. 11

    Oh, I did not expect the author to pop calc.exe with this.

    1. 6

      This is so hilariously awful.

      1. 5

        Man, this really is absurdly underestimated. It never occurred to me that a spreadsheet program could be coerced into executing arbitrary code in this way.
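
        For anyone who hasn’t seen the trick, a minimal sketch. The `=cmd|` DDE payload below is the classic Windows example (whether it actually fires depends on the app, version, and settings — this is an illustration, not a claim about any particular install):

        ```python
        import csv
        import io

        # A CSV file is just text, but many spreadsheet apps treat cells that
        # begin with "=" as formulas. The DDE payload below is the classic
        # Windows form; opening this file in a vulnerable app with DDE enabled
        # can launch calc.exe.
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["name", "comment"])
        writer.writerow(["alice", "=cmd|' /C calc'!A0"])
        print(buf.getvalue())
        ```

        Note there is nothing special in the file itself — it is perfectly valid CSV; the danger is entirely in how the consuming application interprets it.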

        1. 3

          They probably should’ve used a “Formula CSV” (.fcsv, or .fsv) file extension that extends “Data CSV” (.csv) with arbitrary formulas. Or even a “Program CSV” (.pcsv, or .psv) for network requests or shell commands, though that one might be too many choices of file format.

          1. 1

            So, there’s a program that operates on data that can be treated as code. That data is essentially an interpreted program that contains a lot of numbers but also code snippets. It has ways of exporting or importing that interpreted program. The CSV form of an interpreted program can, when loaded into an interpreter, be interpreted as a program. That’s a stunning realization the author has made. I’m sure he’ll go on to investigate scripting languages to find they’ve been running code from mere text files.

            The good news is that it’s been common advice for ages to sandbox any Microsoft apps or web browsers. How effective it is depends on the platform and sandboxing tech. Office apps being used by malware is a well-known vector, though.

            1. 7

              I think they’re pointing out something useful, which is that many of us, myself included, did not realize that apps evaluated the data in a CSV file as code. I expect this in .xlsx files, not in .csv.

              We in the tech community are often surprised when things we think of as readonly parsers do something surprising. See also: the strings command and ldd.

              http://www.catonmat.net/blog/ldd-arbitrary-code-execution/

              1. 4

                OK, I see the point about people assuming CSV would have been fine; this will bring awareness to that bad assumption. From a security standpoint, you are taking input whose integrity and authenticity are unknown. Any time a program performs operations on potentially-malicious input, an attack can occur. That goes double if the program is written in an unsafe language or the program data can contain executable code. All I saw was “malicious data into a program with a history of exploits whose format happened to be CSV.”

                That’s why I even sandboxed Notepad and was happy to see that Rust editor in development. Nothing is safe from malicious input if the code or runtime doesn’t validate it properly.

                1. 6

                  > That’s why I even sandboxed Notepad and was happy to see that Rust editor in development. Nothing is safe from malicious input if the code or runtime doesn’t validate it properly.

                  Xi editor? Rust won’t save you from any of the problems described in the post. Once the programmer decides to take remote input and interpret it as executable instructions, all bets are off (which is the whole problem here: where do you draw the line?).

                  1. 2

                    As I said, you draw the line by determining boundaries for the interpreters, enforced at parsing/compile-time and/or runtime. A text editor should only be able to render text, versus running arbitrary computations affecting system resources. If it supports richer text, then it should only perform operations on a data structure representing the image the user sees, which is passed to the graphics stack within its boundaries.

                    A text editor is an easy example; Excel is a harder one. I’d start by giving executable portions a label that was clear to the user. We might have metadata, static parts like text, and dynamic parts. Executable file formats already do this. Then, the user can optionally set the program to a Safe Mode where it only uses the static content. Just one way.
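
                    A rough sketch of that Safe Mode idea on the export side. The trigger characters and the quote-prefix trick are the commonly recommended mitigation for CSV injection, not anything specific from the post:

                    ```python
                    import csv
                    import io

                    # Cells starting with =, +, -, or @ can be interpreted as
                    # formulas by spreadsheet apps. Prefixing them with a single
                    # quote makes them render as literal text -- i.e., everything
                    # is forced into "static content".
                    FORMULA_TRIGGERS = ("=", "+", "-", "@")

                    def neutralize(cell: str) -> str:
                        return "'" + cell if cell.startswith(FORMULA_TRIGGERS) else cell

                    buf = io.StringIO()
                    writer = csv.writer(buf)
                    writer.writerow([neutralize(c) for c in ["alice", "=cmd|' /C calc'!A0"]])
                    print(buf.getvalue())
                    ```

                    The dynamic parts survive as visible text the user can inspect, which is roughly the labeling idea above applied at the file boundary.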

                  2. 3

                    I remember as late as 1999 telling people “no, don’t worry, you can’t get a virus just from reading an email”… thanks, Microsoft!

                    1. 1

                      Then, Apple topped them by saying Mac users can’t get viruses at all. Until iServices showed them that pirating Photoshop wasn’t a great idea. ;)

                  3. 3

                    I remember this also being the reaction when the first libjpeg vulnerabilities were found, after the library had been in wide use across platforms for a decade. Today, perhaps it’s “obvious” that image formats can be dangerous, or perhaps it’s generally been forgotten… I don’t have the perspective to say which.

                    What I wish is that we had more robust systems that could make these assumptions explicit and validate them. Strong type systems are a start, but I can easily imagine a CSV parser in Haskell that would still be vulnerable to this particular issue, since the problem isn’t in the parser itself but in how its output is used.
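
                    One way to make the assumption explicit even without Haskell: have the parser return a distinct wrapper type, so untrusted cells can’t silently flow into a sink as plain strings. A hypothetical sketch — the names are mine, and Python can only approximate at runtime what a strong type system would enforce at compile time:

                    ```python
                    from dataclasses import dataclass

                    @dataclass(frozen=True)
                    class UntrustedCell:
                        """Raw cell text from an untrusted CSV. Deliberately not a
                        str, so it can't be passed where plain text is expected
                        without an explicit, sanitizing conversion."""
                        raw: str

                        def as_literal_text(self) -> str:
                            # The only way out: neutralize formula triggers first.
                            if self.raw.startswith(("=", "+", "-", "@")):
                                return "'" + self.raw
                            return self.raw

                    def parse_row(line: str) -> list[UntrustedCell]:
                        # Simplified split; a real parser would handle quoting
                        # and embedded commas.
                        return [UntrustedCell(f) for f in line.split(",")]

                    cells = parse_row("alice,=2+2")
                    print([c.as_literal_text() for c in cells])
                    ```

                    The parser stays dumb; the type forces every downstream use to confront the trust question, which is exactly where the vulnerability lives.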

                    1. 2

                      Following up on my other comment: the three teams from the 1960s I usually cite… Barton of the B5000, Dijkstra’s THE, and Hamilton at Apollo… all discovered that one must validate data (esp. preconditions) at interfaces while avoiding risky constructs or mitigating them. The Burroughs, ALGOL, and Wirth lessons were widespread knowledge over the next decade or two. Industry and FOSS diverged to ad hoc methods of correctness in languages like C. Meyer’s Eiffel and the Ada folks followed the prior principles, preventing a lot of these attacks.

                      So, this is more an example of the mainstream ignoring proven lessons in favor of other things. Then they re-discovered that malicious data could cause malicious execution. For a while they mentally silo’d which programs they’d worry about for this, whereas the forerunners in the ’60s were clear it was a problem at every interface. They’re now catching up to the 1960s understanding that everything down to a CSV file or an overloaded MOV instruction can be a problem.

                      Whether they’ll adopt 1980s methods of solving this, like Design-by-Contract with a safe language, remains to be seen. I bet systems programmers keep coding in C with weak or no preconditions or other mitigations. Managed languages and Ada 2012 got contracts, though. So, some progress toward the 1960s–1980s level of robustness.
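
                      For anyone curious what that looks like outside Ada 2012 or Eiffel, a rough approximation of a precondition at an interface boundary (the decorator name is my own invention, not a standard library):

                      ```python
                      import functools

                      def precondition(check, message):
                          # DbC-style: the check runs before the body, so data
                          # violating the interface contract never reaches the
                          # function at all.
                          def decorate(fn):
                              @functools.wraps(fn)
                              def wrapper(*args, **kwargs):
                                  if not check(*args, **kwargs):
                                      raise ValueError(message)
                                  return fn(*args, **kwargs)
                              return wrapper
                          return decorate

                      @precondition(lambda cell: not cell.startswith(("=", "+", "-", "@")),
                                    "formula-like cell rejected at the interface")
                      def store_cell(cell: str) -> str:
                          return cell

                      print(store_cell("alice"))
                      ```

                      Runtime checks are weaker than what Eiffel or Ada 2012 can verify, but even this much would have rejected the malicious cell at the boundary instead of passing it on.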

                      1. 3

                        I generally agree with your characterization, and what I most want to focus on is designing systems which don’t require every programmer to understand these lessons.

                        I think most programmers aren’t thinking about security all the time; there’s a tendency to mentally model certain tasks as security-sensitive and the rest as not. And specific cognitive traps, such as “this file format is too simple to be an issue,” are worth knowing about when trying to build awareness. If nothing else, this exploit is useful as a starting point for discussion, to get buy-in to actually use better-engineered solutions.

                        1. 3

                          That’s a good point. They do silo it. I fell into the same trap as a developer with no experience in security long ago. I used text files almost exclusively to reduce trust required, esp for others. Easy to inspect. Then, when I learned actual secure coding, I found that anything might have a problem if code was bad.

                          It might be easier to argue for correctness/reliability/maintainability instead of security, since the methods I cite were originally for those, with security benefits as a side effect. People might care about their code working most of the time even if it’s not a security-critical app. We sell them on that.