The real scary thing is that it took users weeks to notice that it shipped, even though it wasn’t obfuscated in any way. This shows how risky the ecosystem is, without enough eyes reviewing published crates. If any high-profile crate author gets infected with malware that injects itself into crates, it’s going to be an apocalypse for Rust.
I think it’s only a sign that we’re unaware until this hits a sandboxed / reproducible build system. I guess that’s currently distribution packaging or projects that otherwise use Nix or Bazel to build.
If the complaint is that binaries are more difficult to audit than source, and no one is auditing, then it should make no difference either way from a security perspective.
I think “weeks” is a bit of an exaggeration. People were openly discussing it at least a week after release. It’s true though that it didn’t blow up on social media until weeks later and many people didn’t realise until then.
If it had been a security issue, or if it had been done by someone much less reputable than the author of serde, or if the author had not responded, then I suspect rustsec may have been more motivated to post an advisory.
Something that I might have expected to see included in this comment, and that I instead will provide myself, is a plug for bothering to review the code in one’s (prospective) dependencies, or to import reviews from trusted other people (or, put differently, to limit oneself to dependencies that one is able and willing to review or that someone one trusts has reviewed).
I recall that kornel at least used to encourage the use of cargo-crev, and their Lib.rs now also shows reviews from the newer and more streamlined cargo-vet.
I note that the change adding the blob to Serde was reviewed and approved through cargo-vet by someone at Mozilla. I don’t think that necessarily means these reviewing measures would not be useful in a situation that isn’t as much a drill (i.e., with a blob more likely to be malicious).
Yeah - my recollection of crev is that libraries like serde often got reviews like “it’s serde, might as well be the stdlib, I trust this without reviewing it as the chances of it being malicious are basically zero”
What a ridiculous thing to have even happened in the first place, let alone refusing to acknowledge there could possibly be an issue for so long. I’m glad it’s been fixed, but it would make me think twice about using serde. I’m sure it’ll be fine, who’s ever heard of a security issue in a codec anyway?
Remember that there is a real human being maintaining serde. It is not, in fact, blindingly obvious to all developers that the pre-compiled blobs were bad; on this site there were loud voices on both sides. Can you imagine suddenly getting caught in the crosshairs of angry developers like that? When I imagine it, it feels bad, and I’m liable to get defensive about it.
It may also have been a failed attempt at fixing something you’ve heard people complain about all the time, probably even about your code that slows down people’s builds (*). So yeah, it was a bad idea in hindsight, but we don’t need more burned-out maintainers from this. And I say this as someone who is openly disappointed by this happening.
(*) I’m not going to discuss how much time it actually saved.
Yeah, basically the biggest gains are offset by process creation being surprisingly slow. I’m working on a follow-up article where I talk about that in detail.
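To put a rough number on the process-creation part, here is a crude, unscientific sketch (assuming a Unix-like system where the true command exists; numbers will vary wildly by machine) of what merely spawning and waiting on a child process costs, which is overhead a precompiled macro executable pays on every invocation before it does any useful work:

```rust
use std::process::Command;
use std::time::Instant;

fn main() {
    // Spawn a trivial child process many times and average the cost.
    let iterations: u32 = 100;
    let start = Instant::now();
    for _ in 0..iterations {
        Command::new("true").status().expect("failed to spawn `true`");
    }
    println!("average spawn + wait time: {:?}", start.elapsed() / iterations);
}
```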
I posted your piece because it was the first one that explained in detail what the hell was going on, specifically how serde works. Looking forward to a followup.
That’s how it started, then they centralized everything with one team that doles out the “managed CI” offering, with their own global library and controls. Any competing infra gets flagged and audited hardcore until you give up by attrition.
This seems to only be checking the performance under --release. Most compilation is done without --release, meaning that most of the proc macro will not be optimized.
As someone who packages software, I think it’s worth noting that packagers expect different things than end users, though they are compatible.
One of my wishes is to avoid blobs from a vendor, since we can’t always recompile those in the build process to work with the architectures we support.
(The other big difference is the DESTDIR env var. End users don’t generally care, but it becomes essential when preparing a package)
I therefore understand those who support their end users, before getting packaged.
The real human being maintaining serde knew about the pushback that would happen and did it on purpose to prove a point in a pre-RFC he submitted. I don’t feel particularly bad about him getting pushback for using half the Rust ecosystem as his guinea pigs. (In fact I would like to see more of it.)
The real human being maintaining serde knew about the pushback that would happen and did it on purpose to prove a point in a pre-RFC he submitted.
What’s the reason to believe in this over any other explanation of the situation? E.g. that pushback was unexpected and that the RFC is the result of the pushback, rather than a cause?
I consider dtolnay a competent open source maintainer who understands the people who run his code well, and I would expect any competent open source maintainer to expect such pushback.
But how does that necessarily lead to “on purpose to prove a point”?
I don’t think dtolnay expected exactly zero pushback. But, given that some people in this thread argue quite a reasonable point that binaries are actually almost as fine as source, it is plausible that only bounded pushback was expected.
The excerpt from the RFC is:
“Someone else is always auditing the code and will save me from anything bad in a macro before it would ever run on my machines.” (At one point serde_derive ran an untrusted binary for over 4 weeks across 12 releases before almost anyone became aware. This was plain-as-day code in the crate root; I am confident that professionally obfuscated malicious code would be undetected for years.)
I don’t see someone competent casually pushing such a controversial change, casually saying that this is now the only supported way to use serde, casually pushing a complete long pre-RFC that uses the controversial change to advance it, and then casually reverting the change in the span of a few days. That takes preparation and foresight.
I actually respect this move. It is exactly the kind of move I would do if I had goodwill to burn and was frustrated with the usual formal process, and it takes boldness and courage to pull it off the way he did it. I also think the pushback is entirely appropriate and the degree of it was quite mild.
Aha, thanks! I think that’s a coherent story to infer from this evidence (and I was wondering if there might be some missing bits I don’t know).
From where I stand, I wouldn’t say that this explanation looks completely implausible, but I do find it unlikely.
For me, the salient bits are:
what it says on the tin. dtolnay didn’t write a lot of responses in the discussion, but what they have written is more or less what I have expected to see from a superb maintainer acting in good faith.
there wasn’t any previous Wasm macro work that was stalled, and that required nefarious plans to get it unstuck.
really, literally everyone wants sandboxed Wasm proc macros. There couldn’t be any more support for this feature than there already is. What is lacking is not motivation or support but (somewhat surprisingly) a written-down RFC for how to move forward and (expectedly) implementation effort to make it come true.
dtolnay likes doing crazy things! Like how all his crates follow 1.0.x versions, or watt, or deref-based specialization in anyhow. So, “because I can” seems like enough motivation here.
“if you don’t like a feature in this crate, don’t use the crate or do the groundwork to make implementing this feature better” feels like a normal mode of operation for widely used OSS projects with sole maintainers. I’ve said as much with respect to MSRV of my once_cell crate.
I agree that there are multiple interpretations possible and that yours also follows from the evidence available. The reason I think it’s reasonable to consider something deeper to be going on is: every single Rust controversy I’ve discussed with key Rust people had a lot more going on than was there on the surface. Case in point: dtolnay was also the person, thus far unnamed by anyone speaking for the project, who was involved in ThePHD’s talk being downgraded from a keynote. If I see someone acting surreptitiously in one case I will expect that to repeat.
O_o that’s news to me, thanks. It didn’t occur to me that dtolnay might have been involved there (IIRC, they aren’t a team lead of any top-level team, so I assume they weren’t a member of the notorious leadership chat)
Calling me anonymous is pretty funny, considering I’ve called myself “whitequark” for close to 15 years at this point and shipped several world-class projects under it.
whitequark would be pretty well known to an old Rust team member such as matklad, having been one themself, so no, not anonymous… but we don’t know this is the same whitequark, so yes, still anonymous.
I mean, I wrote both Rust language servers/IDEs that everyone is using and whitequark wrote the Ruby parser everyone is using (and also smoltcp). I think we know perfectly fine who we are talking with. One of us might be secretly a Labrador in a trench coat, but that doesn’t have any bearing on the discussion, and speculation on that topic is hugely distasteful.
In terms of Rust team membership, I actually don’t know which team whitequark was on, but they are definitely on the alumni page right now. I was on the cargo team and TL for the IDE team.
You were talking about “Rust teams” and the only way I’ve seen that term used is to indicate those under the “Rust Project”. Neither person is on a Rust team or an alumnus.
Tbh, it more reeks of desperation to make people’s badly configured CI flows faster. I think that a conspiratorial angle hasn’t been earned yet for this and that we should go for the most likely option: it was merely a desperate attempt to make unoptimized builds faster.
I think this is hard to justify when someone comes to you with a security issue, your response is “fork it, not my problem”, and you then close the issue, completely dismissing the legitimate report. I understand humans are maintaining it, humans maintain all software I use in fact, and I’m not ok with deciding “Oh, a human was involved, I guess we should let security bad practices slide”. I, and I’m sure many others, are not frustrated because the maintainers didn’t understand the security implications, but because the concerns were summarily dismissed and rejected, when they had dire implications for all their users. From my understanding, Serde is a) extremely popular in the Rust world, and b) deals in one of the most notoriously difficult kinds of code to secure, so seeing the developers’ reaction to a security issue is very worrying for the community as a whole.
The thing is, it’s not unambiguous whether this is a security issue. “Shipping precompiled binaries is not significantly more insecure than shipping source code” is an absolutely reasonable stance to have. I even think it is true if we consider only first-order effects and the current state of Rust packaging & auditing.
Note also that concerns were not “completely dismissed”. Dismissal looks like “this is not a problem”. What was said was rather “fixing this problem is out of scope for the library, if you want to see it fixed, work on the underlying infrastructure”. Reflecting on my own behavior in this discussion, I might be overly sensitive here, but to me there’s a world of difference between a dismissal, and an acknowledgment with disagreement on priorities.
let alone refusing to acknowledge there could possibly be an issue for so long.
This is perhaps a reasonable take-away from all the internet discussions about the topic, but I don’t think this actually reflects what did happen.
The maintainer was responsive on the issue and they very clearly articulated that:
they are aware that the change makes it harder to build software for some users
they are aware that the change concerns some users from the security point of view
nonetheless, the change is an explicit design decision for the library
the way to solve this typical open-source dilemma is to allocate the work to the party that needs the fruits of the work
Afterwards, when it became obvious that the security concern is not niche, but has big implications for the whole ecosystem, the change was reverted and a lot of follow-up work landed.
I do think it was a mistake not to predict that this change would be this controversial (or to proceed with a controversial change without preliminary checks with the wider community).
But, given that a mistake had been made, the handling of the situation was exemplary. Everything that needed fixing was fixed, promptly.
Afterwards, when it became obvious that the security concern is not niche, but has big implications for the whole ecosystem, the change was reverted and a lot of follow-up work landed.
I’m still waiting to hear what “security concern” there was here. Other language-package ecosystems have been shipping precompiled binaries in packages for years now; why is it such an apocalyptically awful thing in Rust and only Rust?
The main thing is loss of auditing ability — with the opaque binaries, you cannot just look at the package tarball from crates.io and read the source. It is debatable how important that is: in practice, as this very story demonstrates, few people look at the tarballs. OTOH, “can you look at tarballs” is an ecosystem-wide property — if we lose it, we won’t be able to put the toothpaste back into the tube.
This is amplified by the fact that this is build-time code — people are in general happier to sandbox the final application than to sandbox the sprawling build infra.
With respect to other languages — of course! But also note how other languages are memory unsafe for decades…
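On the first point, to make “just look at the tarball and read the source” concrete, here is a rough sketch, assuming the flate2 and tar crates, of walking a downloaded .crate file (which is just a gzipped tar) and flagging entries that are probably not reviewable source text:

```rust
use std::io::Read;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Usage: pass the path to a .crate file downloaded from crates.io.
    let path = std::env::args().nth(1).expect("usage: inspect <some.crate>");
    let gz = flate2::read::GzDecoder::new(std::fs::File::open(path)?);
    let mut archive = tar::Archive::new(gz);
    for entry in archive.entries()? {
        let mut entry = entry?;
        let name = entry.path()?.display().to_string();
        let mut buf = Vec::new();
        entry.read_to_end(&mut buf)?;
        // Crude heuristic: NUL bytes mean this is not plain source text.
        if buf.contains(&0) {
            println!("{name}: {} bytes, looks like an opaque binary", buf.len());
        }
    }
    Ok(())
}
```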
The main thing is loss of auditing ability — with the opaque binaries, you cannot just look at the package tarball from crates.io and read the source.
It’s not that hard to verify the provenance of a binary. And it appears that for some time after serde switched to shipping the precompiled macros, exactly zero people actually were auditing it (based on how long it took for complaints to be registered about it).
OTOH, “can you look at tarballs” is an ecosystem-wide property — if we lose it, we won’t be able to put the toothpaste back into the tube.
The ecosystem having what boils down to a social preference for source-only does not imply that binary distributions are automatically/inherently a security issue.
With respect to other languages — of course! But also note how other languages are memory unsafe for decades…
My go-to example of a language that often ships precompiled binaries in packages is Python. Which is not exactly what I think of when I think “memory unsafe for decades”.
It’s not that hard to verify the provenance of a binary.
Verifying provenance and auditing source are orthogonal. If you have trusted provenance, you can skip auditing the source. If you audited the source, you don’t care about the provenance.
It’s a question which one is more practically important, but to weigh this tradeoff, you need to acknowledge its existence.
This sounds like:
People say that they care about auditing, and probably some people do, but it’s also clear that the majority don’t actually audit source code. So the benefits of audits are vastly overstated, and we need to care about provenance and trusted publishing.
This doesn’t sound like:
There’s absolutely ZERO security benefits here whatsoever
I don’t know where your last two blockquotes came from, but they didn’t come from my comment that you were replying to, and I won’t waste my time arguing with words that have been put in my mouth by force.
That’s how I read your reply: as an absolute refusal to acknowledge that source auditing is a thing, rather than as a nuanced comparison of auditing in theory vs auditing in practice.
It might not have been your intention to communicate that, but that was my take away from what’s actually written.
In the original github thread, someone went to great lengths to try to reproduce the shipped binary, and just couldn’t do it. So it is very reasonable to assume that either they had something in their build that differed from the environment used to build it, or that the binary was malicious, and without much deeper investigation, it’s nearly impossible to tell which is the answer. If it were trivial to reproduce the build with source code you could audit yourself, then there’s far less of a problem.
Rust doesn’t really do reproducible builds, though, so I’m not sure why people expected to be able to byte-for-byte reproduce this.
Also, other language-package ecosystems really have solved this problem – in the Python world, for example, PyPI supports a verifiable chain all the way from your source repo to the uploaded artifact. You don’t need byte-for-byte reproducibility when you have that.
I guess I should clarify that in the GP comment the problem is the misalignment between the maintainer’s and users’ view of the issue. This is a problem irrespective of the ground truth value of security.
Maybe other language package ecosystems are also wrong to be distributing binaries, and have security concerns that are not being addressed because people in those ecosystems are not making as much of a fuss about it.
If there were some easy way to exploit the mere use of precompiled binaries, someone would have by now. The incentives to use such an exploit are just way too high not to.
Anecdotally, I almost always see Python malware packaged as source code. I think that could change at any time to compiled binaries fwiw, just a note.
I don’t think attackers choosing binary payloads would mean anything for anyone really. The fundamental problem isn’t solved by reproducible builds - those only help if someone is auditing the code.
The fundamental problem is that your package manager has near-arbitrary rights on your computer, and dev laptops tend to be very privileged at companies. I can likely go from ‘malicious build script’ to ‘production access’ in a few hours (if I’m being slow and sneaky) - that’s insane. Why does a build script have access to my ssh key files? To my various tokens? To my ~/.aws/ folder? Insane. There’s zero reason for those privileges to be handed out like that.
The real solution here is to minimize impact. I’m all for reproducible builds because I think they’re neat and whatever, sure, people can pretend that auditing is practical if that’s how they want to spend their time. But really the fundamental concept of “running arbitrary code as your user” is just broken, we should fix that ASAP.
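As a purely hypothetical illustration of how few guardrails there are today (the paths below are examples, and a real attack would exfiltrate over the network instead of printing a warning), nothing stops a build script from doing something like this:

```rust
// build.rs — illustration only; do not ship anything like this.
use std::fs;

fn main() {
    let home = std::env::var("HOME").unwrap_or_default();
    // Hypothetical targets a malicious build script could read as your user.
    for path in [".ssh/id_ed25519", ".aws/credentials", ".cargo/credentials.toml"] {
        if let Ok(contents) = fs::read_to_string(format!("{home}/{path}")) {
            println!("cargo:warning=read {} bytes from ~/{path}", contents.len());
        }
    }
}
```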
The fundamental problem is that your package manager has near-arbitrary rights on your computer
Like I’ve pointed out to a couple people, this is actually a huge advantage for Python’s “binary” (.whl) package format, because its install process consists solely of unpacking the archive and moving files to their destinations. It’s the “source” format that can ship a setup.py running arbitrary code at install time. So telling pip to exclusively install from .whl (with --only-binary :all:) is generally a big security win for Python deployments.
(and I put “binary” in scare quotes because, for people who aren’t familiar with it, a Python .whl package isn’t required to contain compiled binaries; it’s just that the .whl format is the one that allows shipping those, as well as shipping ordinary Python source code files)
Anecdotally, I almost always see Python malware packaged as source code. I think that could change at any time to compiled binaries fwiw, just a note.
Agree. But that’s a different threat, it has nothing to do with altered binaries.
I don’t think attackers choosing binary payloads would mean anything for anyone really. The fundamental problem isn’t solved by reproducible builds - those only help if someone is auditing the code.
Code auditing is worthless if you’re not sure the binary you’re running on your machine has been produced from the source code you’ve audited. This source <=> binary mapping is precisely where source bootstrapping + reproducible builds are helping.
The real solution here is to minimize impact. I’m all for reproducible builds because I think they’re neat and whatever, sure, people can pretend that auditing is practical if that’s how they want to spend their time. But really the fundamental concept of “running arbitrary code as your user” is just broken, we should fix that ASAP.
This is a false dichotomy. I think we agree on the fact we want code audit + binary reproducibility + proper sandboxing.
Agree. But that’s a different threat, it has nothing to do with altered binaries.
Well, we disagree, because I think they’re identical in virtually every way.
Code auditing is worthless if you’re not sure the binary you’re running on your machine has been produced from the source code you’ve audited. This source <=> binary mapping is precisely where source bootstrapping + reproducible builds are helping.
I’m highly skeptical of the value behind code auditing to begin with, so anything that relies on auditing to have value is already something I’m side eyeing hard tbh.
I think we agree on the fact we want code audit + binary reproducibility + proper sandboxing.
I think where we disagree is on the weights. I barely care about binary reproducibility, I frankly don’t think code auditing is practical, and I think sandboxing is by far the most important, most cost-effective measure to improve security and directly address the issues.
I am familiar with the concept of reproducible builds. Also, as far as I’m aware, Rust’s current tooling is incapable of producing reproducible binaries.
And in theory there are many attack vectors that might be present in any form of software distribution, whether source or binary.
What I’m looking for here is someone who will step up and identify a specific security vulnerability that they believe actually existed in serde when it was shipping precompiled macros, but that did not exist when it was shipping those same macros in source form. “Someone could compromise the maintainer or the project infrastructure”, for example, doesn’t qualify there, because both source and binary distributions can be affected by such a compromise.
Aren’t there links in the original github issue to exactly this being done in the NPM and some other ecosystem? Yes this is a security problem, and yes it has been exploited in the real world.
What I’m looking for here is someone who will step up and identify a specific security vulnerability that they believe actually existed in serde when it was shipping precompiled macros, but that did not exist when it was shipping those same macros in source form. “Someone could compromise the maintainer or the project infrastructure”, for example, doesn’t qualify there, because both source and binary distributions can be affected by such a compromise.
If you have proof of an actual concrete vulnerability in serde of that nature, I invite you to show it.
The existence of an actual exploit is not necessary to be able to tell that something is a serious security concern. It’s like laying an AR-15 in the middle of the street and claiming there’s nothing wrong with it because no one has picked it up and shot someone with it. This is the opposite of a risk assessment, this is intentionally choosing to ignore clear risks.
There might even be an argument to make that someone doing this has broken the law by accessing a computer system they don’t own without permission, since no one had any idea that this was even happening. To me this is up there with Sony’s rootkit back in the day, completely unexpected, unauthorised behaviour that no reasonable person would expect, nor would they look out for it because it is just such an unreasonable thing to do to your users.
I think the point is that if precompiled macros are an AR-15 laying in the street, then source macros are an AR-15 with a clip next to it. It doesn’t make sense to raise the alarm about one but not the other.
There might even be an argument to make that someone doing this has broken the law by accessing a computer system they don’t own without permission, since no one had any idea that this was even happening.
I think this is extreme. No additional accessing of any kind was done. Binaries don’t have additional abilities that build.rs does not have. It’s not at all comparable to installing a rootkit. The precompiled macros did the same thing that the source macros did.
The existence of an actual exploit is not necessary to be able to tell that something is a serious security concern. It’s like laying an AR-15 in the middle of the street and claiming there’s nothing wrong with it because no one has picked it up and shot someone with it. This is the opposite of a risk assessment, this is intentionally choosing to ignore clear risks.
Once again, other language package ecosystems routinely ship precompiled binaries. Why have those languages not suffered the extreme consequences you seem to believe inevitably follow from shipping binaries?
There might even be an argument to make that someone doing this has broken the law by accessing a computer system they don’t own without permission, since no one had any idea that this was even happening.
Even the most extreme prosecutors in the US never dreamed of taking laws like CFAA this far.
To me this is up there with Sony’s rootkit back in the day, completely unexpected, unauthorised behaviour that no reasonable person would expect, nor would they look out for it because it is just such an unreasonable thing to do to your users.
I think you should take a step back and consider what you’re actually advocating for here. For one thing, you’ve just invalidated the “without any warranty” part of every open-source software license, because you’re declaring that you expect and intend to legally enforce a rule on the author that the software will function in certain ways and not in others. And you’re also opening the door to even more, because it’s not that big a logical or legal leap from liability for a technical choice you dislike to liability for, say, an accidental bug.
The author of serde didn’t take over your computer, or try to. All that happened was serde started shipping a precompiled form of something you were going to compile anyway, much as other language package managers already do and have done for years. You seem to strongly dislike that, but dislike does not make something a security vulnerability and certainly does not make it a literal crime.
I think that what actually is happening in other language ecosystems is that while precompiled binaries are shipped for some installation methods, other installation methods build from source.
So you still have binary distribution for people who want that, and you have the source distribution for others.
I have not confirmed this but I believe that this might be the case for Python packages hosted on debian repos, for example. Packages on PyPI tend to have source distributions along with compiled ones, and the debian repos go and build packages themselves from source rather than relying on the package developers’ compiled output.
When I release a Python library, I provide the source and a binary. A linux package repo maintainer could build the source code rather than using my built binary. If they do that, then the thing they “need to trust” is the source code, and less trust is needed on myself (on top of extra benefits like source code access allowing them to fix things for their distribution mechanisms)
So you still have binary distribution for people who want that, and you have the source distribution for others.
I don’t know of anyone who actually wants the sdists from PyPI. Repackagers don’t go to PyPI, they go to the actual source repository. And a variety of people, including both me and a Python core developer, strongly recommend always invoking pip with the --only-binary :all: flag to force use of .whl packages, which have several benefits:
When combined with --require-hashes and --no-deps, you get as close to perfectly byte-for-byte reproducible installs as is possible with the standard Python packaging toolchain.
You will never accidentally compile, or try to compile, something at runtime.
You will never run any type of install-time scripting, since a .whl has no scripting hooks (as opposed to an sdist, which can run arbitrary code at install time via its setup.py).
I mean there are plenty of packages with actual native dependencies who don’t ship every permutation of platform/Python version wheel needed, and there the source distribution is available. Though I think that happens less and less since the number of big packages with native dependencies is relatively limited.
But the underlying point is that with an option of compiling everything “from source” available as an official thing from the project, downstream distributors do not have to do things like, say, confirm that the project’s vendored compiled binary is in fact compiled from the source being pointed at.
Install-time scripting is less of an issue in this thought process (after all, import-time scripting is a thing that can totally happen!). It should feel a bit obvious that a bunch of source files is easier to look through to figure out issues rather than “oh this part is provided by this pre-built binary”, at least it does to me.
I’m not arguing against binary distributions, just think that if you have only the binary distribution suddenly it’s a lot harder to answer a lot of questions.
But the underlying point is that with an option of compiling everything “from source” available as an official thing from the project, downstream distributors do not have to do things like, say, confirm that the project’s vendored compiled binary is in fact compiled from the source being pointed at.
As far as I’m aware, it was possible to build serde “from source” as a repackager. It did not produce a binary byte-for-byte identical to the one being shipped first-party, but as I understand it producing a byte-for-byte identical binary is not something Rust’s current tooling would have supported anyway. In other words, the only sense in which “binary only” was true was for installing from crates.io.
So any arguments predicated on “you have only the binary distribution” don’t hold up.
Hmm, I felt like I read repackagers specifically say that the binary was a problem (I think it was more the fact that standard tooling didn’t allow for both worlds to exist). But this is all a bit moot anyways
I don’t know of anyone who actually wants the sdists from PyPI.
It’s a useful fallback when there are no precompiled binaries available for your specific OS/Arch/Python version combination. For example, when pip installing from an ARM Mac there are still cases where precompiled binaries are not available; there were a lot more closer to the M1 release.
When I say I don’t know of anyone who wants the sdist, read as “I don’t know anyone who, if a wheel were available for their target platform, would then proceed to explicitly choose an sdist over that wheel”.
Also, not for nothing, most of the discussion has just been assuming that “binary blob = inherent automatic security vulnerability” without really describing just what the alleged vulnerability is. When one person asserts existence of a thing (such as a security vulnerability) and another person doubts that existence, the burden of proof is on the person asserting existence, but it’s also perfectly valid for the doubter to point to prominent examples of use of binary blobs which have not been exploited despite widespread deployment and use, as evidence in favor of “not an inherent automatic security vulnerability”
Yeah, this dynamic has been infuriating. In what threat model is downloading source code from the internet and executing it different from downloading compiled code from the internet and executing it? The threat is the “from the internet” part, which you can address by:
Hash-pinning the artifacts, or
Copying them to a local repository (and locking down internet access).
Anyone with concerns about this serde change should already be doing one or both of these things, which also happen to make builds faster and more reliable (convenient!).
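A minimal sketch of the hash-pinning idea, assuming the sha2 and hex crates; the artifact name and the pinned digest here are placeholders for whatever your lockfile or internal registry records:

```rust
use sha2::{Digest, Sha256};

// Compare a downloaded artifact against a digest pinned ahead of time.
fn artifact_matches(bytes: &[u8], pinned_hex: &str) -> bool {
    hex::encode(Sha256::digest(bytes)) == pinned_hex.to_lowercase()
}

fn main() {
    let bytes = std::fs::read("serde_derive.crate").expect("read artifact");
    let pinned = "<sha256 recorded in your lockfile or internal registry>";
    println!("digest matches pin: {}", artifact_matches(&bytes, pinned));
}
```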
Yeah, hashed/pinned dependency trees have been around forever in other languages, along with tooling to automate their creation and maintenance. It doesn’t matter at that point whether the artifact is a precompiled binary, because you know it’s the artifact you expected to get (and have hopefully pre-vetted).
Downloading source code from the internet gives you the possibility to audit it, downloading a binary makes this nearly impossible without whipping out a disassembler and hoping that if it is malicious, they haven’t done anything to obfuscate that in the compiled binary. There is a “these languages are turing complete, therefore they are equivalent” argument to be made, but I’d rather read Rust than assembly to understand behaviour.
The point is that if there were some easy way to exploit the mere use of precompiled binaries, the wide use of precompiled binaries in other languages would have been widely exploited already. Therefore it is much less likely that the mere presence of a precompiled binary in a package is inherently a security vulnerability.
Afterwards, when it became obvious that the security concern is not niche, but has big implications for the whole ecosystem, the change was reverted and a lot of follow-up work landed.
I’m confused about this point. Is anyone going to fix crates.io so this can’t happen again?
Assuming that this is a security problem (which I’m not interested in arguing about), it seems like the vulnerability is in the packaging infrastructure, and serde just happened to exploit that vulnerability for a benign purpose. It doesn’t go away just because serde decides to stop exploiting it.
I don’t think it’s an easy problem to fix: ultimately, package registry is just a storage for files, and you can’t control what users put there.
There’s an issue open about sanitizing permission bits of the downloaded files (which feels like a good thing to do irrespective of security), but that’s going to be a minor speed bump at most, as you can always just copy the file over with the executable bit.
A proper fix here would be fully sandboxed builds, but:
POSIX doesn’t have a “make a sandbox” API, so implementing isolation in a nice way is hard.
There’s a bunch of implementation work needed to allow use-cases that currently escape the sandbox (wasm proc macros and metabuild).
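On the permission-bits point, a minimal Unix-only sketch (the file name is a placeholder) of what sanitizing could look like when unpacking downloaded files:

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

// Clear user/group/other execute bits on an unpacked file.
fn strip_exec_bits(path: &Path) -> io::Result<()> {
    let mut perms = fs::metadata(path)?.permissions();
    perms.set_mode(perms.mode() & !0o111);
    fs::set_permissions(path, perms)
}

fn main() -> io::Result<()> {
    strip_exec_bits(Path::new("vendored-blob"))
}
```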
I’m sure it’ll be fine, who’s ever heard of a security issue in a codec anyway?
Who’s ever heard of a security issue caused by a precompiled binary shipping in a dependency? Like, maybe it’s happened a few times? I can think of one incident where a binary was doing analytics, not outright malware, but that’s it.
I’m confused at the idea that if we narrow the scope to “a precompiled binary dependency” we somehow invalidate the risk. Since apparently “curl $FOO | sh” is a perfectly cromulent way to install things these days among some communities, in my world (30+ year infosec wonk) we really don’t get to split hairs over ‘binary v. source’ or even ‘target v. dependency’.
I’m not sure I get your point. You brought up codec vulns, which are irrelevant to the binary vs source discussion. I brought that back to the actual threat, which is an attack that requires a precompiled binary vs source code. I’ve only seen (in my admittedly only 10 Years of infosec work) such an attack one time, and it was hardly an attack and instead just shady monetization.
This is the first comment I’ve made in this thread, so I didn’t bring up codecs. Sorry if that impacts your downplaying supply chain attacks, something I actually was commenting on.
Ah, then forget what I said about “you” saying that. I didn’t check who had commented initially.
As for downplaying supply chain attacks, not at all. I consider them to be a massive problem and I’ve actively advocated for sandboxed build processes, having even spoken with rustc devs about the topic.
What I’m downplaying is the made up issue that a compiled binary is significantly different from source code for the threat of “malicious dependency”.
So not only do you not pay attention enough to see who said what, you knee-jerk responded without paying attention to what I did say. Maybe in another 10 years…
Because I can curl $FOO > foo.sh; vi foo.sh and then choose to chmod +x foo.sh; ./foo.sh. I can’t do that with an arbitrary binary from the internet without whipping out Ghidra and hoping my RE skills are good enough to spot malicious code. I might also miss it in some downloaded Rust or shell code, but the chances are significantly lower than in the binary. Particularly when the attempt from people in the original issue thread to reproduce the binary failed, so no one knows what’s in it.
No one, other than these widely publicised instances in NPM, as well as PyPI and Ruby, as pointed out in the original github issue. I guess each language community needs to rediscover basic security issues on their own, long live NIH.
I hadn’t dived into them, they were brought up in the original thread, and shipping binaries in those languages (other than python with wheels) is not really common (but would be equally problematic). But point taken, shouldn’t trust sources without verifying them (how meta).
But the question here is “Does a binary make a difference vs source code?” and if you’re saying “well history shows us that attackers like binaries more” and then history does not show that, you can see my issue right?
But what’s more, even if attackers did use binaries more, would we care? Maybe, but it depends on why. If it’s because binaries are so radically unauditable, and source code is so vigilantly audited, ok sure. But I’m realllly doubtful that that would be the reason.
Tbh, it more reeks of desperation to make people’s badly configured CI flows faster. I think that a conspiratorial angle hasn’t been earned yet for this and that we should go for the most likely option: it was merely a desperate attempt to make unoptimized builds faster.
I think this is hard to justify when someone comes to you with a security issue, when your response is “fork it, not my problem”, and then closing the issue, completely dismissing the legitimate report. I understand humans are maintaining it, humans maintain all software I use in fact, and I’m not ok with deciding “Oh, a human was involved, I guess we should let security bad practices slide”. I, and I’m sure many others, are not frustrated because they didn’t understand the security implications, but because they were summarily dismissed and rejected, when they had dire implications for all their users. From my understanding, Serde is a) extremely popular in the Rust world, and b) deals in one of the most notoriously difficult kinds of code to secure, so seeing the developers’ reaction to a security issue is very worrying for the community as a whole.
The thing is, its not unambiguous whether this is a security issue. “Shipping precompiled binaries is not significantly more insecure than shipping source code” is an absolutely reasonable stance to have. I even think it is true if we consider only first-order effects and the current state of rust packaging&auditing.
Note also that concerns were not “completely dismissed”. Dismissal looks like “this is not a problem”. What was said was rather “fixing this problem is out of scope for the library, if you want to see it fixed, work on the underlying infrastructure”. Reflecting on my own behavior in this discussion, I might be overly sensitive here, but to me there’s a world of difference between a dismissal, and an acknowledgment with disagreement on priorities.
This is perhaps a reasonable take-away from all the internet discussions about the topic, but I don’t think this actually reflects what did happen.
The maintainer was responsive on the issue and they very clearly articulated that:
Afterwards, when it became obvious that the security concern is not niche, but have big implications for the whole ecosystem, the change was reverted and a lot of follow-up work landed.
I do think it was a mistake to not predict that this change will be this controversial (or to proceed with controversial change without preliminary checks with wider community).
But, given that a mistake had been made, the handling of the situation was exemplary. Everything that needed fixing was fixed, promptly.
I’m still waiting to hear what “security concern” there was here. Other language-package ecosystems have been shipping precompiled binaries in packages for years now; why is it such an apocalyptically awful thing in Rust and only Rust?
The main thing is loss of auditing ability — with the opaque binaries, you can not just look at the package tarbal from crates.io and read the source. It is debatable how important that is: in practice, as this very story demonstrates, few people look at the tarballs. OTOH, “can you look at tarballs” is an ecosystem-wide property — if we lose it, we won’t be able to put the toothpaste back into the tube.
This is amplified by the fact that this is build time code — people are in general happier with sandbox the final application, then with sandboxing the sprawling build infra.
With respect to other languages — of course! But also note how other languages are memory unsafe for decades…
It’s not that hard to verify the provenance of a binary. And it appears that for some time after serde switched to shipping the precompiled macros, exactly zero people actually were auditing it (based on how long it took for complaints to be registered about it).
The ecosystem having what boils down to a social preference for source-only does not imply that binary distributions are automatically/inherently a security issue.
My go-to example of a language that often ships precompiled binaries in packages is Python. Which is not exactly what I think of when I think “memory unsafe for decades”.
Verifying provenance and auditing source are orthogonal. If you have trusted provenance, you can skip auditing the source. If you audited the source, you don’t care about the provenance.
It’s a question which one is more practically important, but to weight this tradeoff, you need to acknowledge its existence.
This sounds like:
This doesn’t sound like:
I don’t know where your last two blockquotes came from, but they didn’t come from my comment that you were replying to, and I won’t waste my time arguing with words that have been put in my mouth by force.
That’s how I read your reply: as an absolute refusal to acknowledge that source auditing is a thing, rather than as a nuanced comparison of auditing in theory vs auditing in practice.
It might not have been your intention to communicate that, but that was my take away from what’s actually written.
Once again, I don’t intend to waste my time arguing with someone who just puts words in my mouth.
In the original github thread, someone went to great lengths to try to reproduce the shipped binary, and just couldn’t do it. So it is very reasonable to assume that either they had something in their build that differed from the environment used to build it, or that he binary was malicious, and without much deeper investigation, it’s nearly impossible to tell which is the answer. If it was trivial to reproduce to build with source code you could audit yourself, then there’s far less of a problem.
Rust doesn’t really do reproducible builds, though, so I’m not sure why people expected to be able to byte-for-byte reproduce this.
Also, other language-package ecosystems really have solved this problem – in the Python world, for example, PyPI supports a verifiable chain all the way from your source repo to the uploaded artifact. You don’t need byte-for-byte reproducibility when you have that.
Ah yes, garbage collected languages are famously ‘memory unsafe for decades’
I guesss I should clarify that in GP comment the problem is misalignment between maintainer’s and user’s view of the issue. This is a problem irrespective of ground truth value of security.
Maybe other language package ecosystems are also wrong to be distributing binaries, and have security concerns that are not being addressed because people in those ecosystems are not making as much of a fuss about it.
If there were some easy way to exploit the mere use of precompiled binaries, someone would have by now. The incentives to use such an exploit are just way too high not to.
There are ways to exploit binary releases. It’s certainly not easy, but this has definitely been exploited in the wild.
You can read this page https://reproducible-builds.org/docs/buy-in/ to get a high-level history of the “reproducible build” (and bootstrapping) movement.
Anecdotally, I almost always see Python malware packaged as source code. I think that could change at any time to compiled binaries fwiw, just a note.
I don’t think attackers choosing binary payloads would mean anything for anyone really. The fundamental problem isn’t solved by reproducible builds - those only help if someone is auditing the code.
The fundamental problem is that your package manager has near-arbitrary rights on your computer, and dev laptops tend to be very privileged at companies. I can likely go from ‘malicious build script’ to ‘production access’ in a few hours (if I’m being slow and sneaky) - that’s insane. Why does a build script have access to my ssh key files? To my various tokens? To my ~/.aws/ folder? Insane. There’s zero reason for those privileges to be handed out like that.
The real solution here is to minimize impact. I’m all for reproducible builds because I think they’re neat and whatever, sure, people can pretend that auditing is practical if that’s how they want to spend their time. But really the fundamental concept of “running arbitrary code as your user” is just broken, we should fix that ASAP.
Like I’ve pointed out to a couple people, this is actually a huge advantage for Python’s “binary” (.whl) package format, because its install process consists solely of unpacking the archive and moving files to their destinations. It’s the “source” format that can ship a setup.py running arbitrary code at install time. So telling pip to exclusively install from .whl (with --only-binary :all:) is generally a big security win for Python deployments.

(And I put “binary” in scare quotes because, for people who aren’t familiar with it, a Python .whl package isn’t required to contain compiled binaries; it’s just that the .whl format is the one that allows shipping those, as well as shipping ordinary Python source code files.)

Agree. But that’s a different threat, it has nothing to do with altered binaries.
Code auditing is worthless if you’re not sure the binary you’re running on your machine has been produced from the source code you’ve audited. This source <=> binary mapping is precisely where source bootstrapping + reproducible builds are helping.
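As a rough sketch of what that source-to-binary check looks like in practice (file paths below are hypothetical): rebuild the artifact from the audited source, then compare it byte for byte against the published one. Today this mostly fails for Rust artifacts because the toolchain doesn’t guarantee reproducible output, which is exactly the gap being pointed at.

```rust
// Compare a locally rebuilt artifact against the published one, byte for byte.
// Paths are placeholders; a real check would fetch the published file from the
// registry and rebuild in a pinned environment.
use std::fs;

fn main() -> std::io::Result<()> {
    let rebuilt = fs::read("target/release/rebuilt-artifact")?;
    let published = fs::read("downloads/published-artifact")?;
    if rebuilt == published {
        println!("byte-for-byte identical: the audited source produced this binary");
    } else {
        println!("mismatch: without reproducible builds, this alone proves nothing");
    }
    Ok(())
}
```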
This is a false dichotomy. I think we agree on the fact we want code audit + binary reproducibility + proper sandboxing.
Well, we disagree, because I think they’re identical in virtually every way.
I’m highly skeptical of the value behind code auditing to begin with, so anything that relies on auditing to have value is already something I’m side eyeing hard tbh.
I think where we disagree is on the weights. I barely care about binary reproducibility, I frankly don’t think code auditing is practical, and I think sandboxing is by far the most important, most cost-effective measure to improve security and directly address the issues.
I am familiar with the concept of reproducible builds. Also, as far as I’m aware, Rust’s current tooling is incapable of producing reproducible binaries.
And in theory there are many attack vectors that might be present in any form of software distribution, whether source or binary.
What I’m looking for here is someone who will step up and identify a specific security vulnerability that they believe actually existed in serde when it was shipping precompiled macros, but that did not exist when it was shipping those same macros in source form. “Someone could compromise the maintainer or the project infrastructure”, for example, doesn’t qualify there, because both source and binary distributions can be affected by such a compromise.

Aren’t there links in the original github issue to exactly this being done in NPM and some other ecosystems? Yes this is a security problem, and yes it has been exploited in the real world.
I’m going to quote my other comment:
If you have proof of an actual concrete vulnerability in serde of that nature, I invite you to show it.

The existence of an actual exploit is not necessary to be able to tell that something is a serious security concern. It’s like laying an AR-15 in the middle of the street and claiming there’s nothing wrong with it because no one has picked it up and shot someone with it. This is the opposite of a risk assessment; it is intentionally choosing to ignore clear risks.
There might even be an argument to be made that someone doing this has broken the law by accessing a computer system they don’t own without permission, since no one had any idea that this was even happening. To me this is up there with Sony’s rootkit back in the day: completely unexpected, unauthorised behaviour that no reasonable person would expect, nor would they look out for it, because it is just such an unreasonable thing to do to your users.
I think the point is that if precompiled macros are an AR-15 laying in the street, then source macros are an AR-15 with a clip next to it. It doesn’t make sense to raise the alarm about one but not the other.
I think this is extreme. No additional accessing of any kind was done. Binaries don’t have additional abilities that build.rs does not have. It’s not at all comparable to installing a rootkit. The precompiled macros did the same thing that the source macros did.

Once again, other language package ecosystems routinely ship precompiled binaries. Why have those languages not suffered the extreme consequences you seem to believe inevitably follow from shipping binaries?
Even the most extreme prosecutors in the US never dreamed of taking laws like CFAA this far.
I think you should take a step back and consider what you’re actually advocating for here. For one thing, you’ve just invalidated the “without any warranty” part of every open-source software license, because you’re declaring that you expect and intend to legally enforce a rule on the author that the software will function in certain ways and not in others. And you’re also opening the door to even more, because it’s not that big a logical or legal leap from liability for a technical choice you dislike to liability for, say, an accidental bug.
The author of serde didn’t take over your computer, or try to. All that happened was serde started shipping a precompiled form of something you were going to compile anyway, much as other language package managers already do and have done for years. You seem to strongly dislike that, but dislike does not make something a security vulnerability and certainly does not make it a literal crime.

I think what is actually happening in other language ecosystems is that while precompiled binaries are shipped for some installation methods, other installation methods still build from source.
So you still have binary distribution for people who want that, and you have the source distribution for others.
I have not confirmed this, but I believe that this might be the case for Python packages hosted in the Debian repos, for example. Packages on PyPI tend to have source distributions along with compiled ones, and the Debian repos build packages themselves from source rather than relying on the package developers’ compiled output.
When I release a Python library, I provide the source and a binary. A linux package repo maintainer could build the source code rather than using my built binary. If they do that, then the thing they “need to trust” is the source code, and less trust is needed on myself (on top of extra benefits like source code access allowing them to fix things for their distribution mechanisms)
I don’t know of anyone who actually wants the sdists from PyPI. Repackagers don’t go to PyPI, they go to the actual source repository. And a variety of people, including both me and a Python core developer, strongly recommend always invoking pip with the --only-binary :all: flag to force use of .whl packages, which have several benefits:
- Combined with --require-hashes and --no-deps, you get as close to perfectly byte-for-byte reproducible installs as is possible with the standard Python packaging toolchain.
- A .whl has no scripting hooks (as opposed to an sdist, which can run arbitrary code at install time via its setup.py).
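For readers unfamiliar with those flags, a hardened install might look roughly like this; the package pin and hash below are placeholders for illustration, not a recommendation of specific versions:

```
# Install only wheels, only pinned versions, and only if their hashes match.
pip install --only-binary :all: --require-hashes --no-deps -r requirements.txt

# requirements.txt
requests==2.31.0 \
    --hash=sha256:<64-hex digest of the expected wheel>
```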
I misread that as “sadists from PyPI” and could not help but agree.
I mean there are plenty of packages with actual native dependencies who don’t ship every permutation of platform/Python version wheel needed, and there the source distribution is available. Though I think that happens less and less since the number of big packages with native dependencies is relatively limited.
But the underlying point is that with an option of compiling everything “from source” available as an official thing from the project, downstream distributors do not have to do things like, say, confirm that the project’s vendored compiled binary is in fact compiled from the source being pointed at.
Install-time scripting is less of an issue in this thought process (after all, import-time scripting can totally happen!). It feels fairly obvious, at least to me, that a set of source files is easier to look through for issues than “oh, this part is provided by a pre-built binary”.
I’m not arguing against binary distributions, just think that if you have only the binary distribution suddenly it’s a lot harder to answer a lot of questions.
As far as I’m aware, it was possible to build serde “from source” as a repackager. It did not produce a binary byte-for-byte identical to the one being shipped first-party, but as I understand it, producing a byte-for-byte identical binary is not something Rust’s current tooling would have supported anyway. In other words, the only sense in which “binary only” was true was for installing from crates.io.

So any arguments predicated on “you have only the binary distribution” don’t hold up.
Hmm, I felt like I read repackagers specifically saying that the binary was a problem (I think it was more that the standard tooling didn’t allow both worlds to exist). But this is all a bit moot anyway.
It’s a useful fallback when there are no precompiled binaries available for your specific OS/Arch/Python version combination. For example when pip installing from a ARM Mac there are still cases where precompiled binaries are not available, there were a lot more closer to the M1 release.
When I say I don’t know of anyone who wants the sdist, read as “I don’t know anyone who, if a wheel were available for their target platform, would then proceed to explicitly choose an sdist over that wheel”.
Argumentum ad populum does not make the choice valid.
Also, not for nothing, most of the discussion has just been assuming that “binary blob = inherent automatic security vulnerability” without really describing just what the alleged vulnerability is. When one person asserts the existence of a thing (such as a security vulnerability) and another person doubts that existence, the burden of proof is on the person asserting existence. But it’s also perfectly valid for the doubter to point to prominent examples of binary blobs that have not been exploited despite widespread deployment and use, as evidence in favor of “not an inherent automatic security vulnerability”.
Yeah, this dynamic has been infuriating. In what threat model is downloading source code from the internet and executing it different from downloading compiled code from the internet and executing it? The threat is the “from the internet” part, which you can address by:
Anyone with concerns about this serde change should already be doing one or both of these things, which also happen to make builds faster and more reliable (convenient!).
Yeah, hashed/pinned dependency trees have been around forever in other languages, along with tooling to automate their creation and maintenance. It doesn’t matter at that point whether the artifact is a precompiled binary, because you know it’s the artifact you expected to get (and have hopefully pre-vetted).
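On the Rust side specifically, Cargo.lock already records a per-crate checksum that cargo verifies when downloading from the registry, so a swapped artifact fails the build. An illustrative entry looks roughly like this (the version is just an example and the checksum is elided):

```
[[package]]
name = "serde_derive"
version = "1.0.183"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "<sha256 of the .crate file as recorded by the registry>"
```

As the comment above notes, this pins the artifact you expected to get; it says nothing about whether that artifact is benign.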
Downloading source code from the internet gives you the possibility to audit it, downloading a binary makes this nearly impossible without whipping out a disassembler and hoping that if it is malicious, they haven’t done anything to obfuscate that in the compiled binary. There is a “these languages are turing complete, therefore they are equivalent” argument to be made, but I’d rather read Rust than assembly to understand behaviour.
The point is that if there were some easy way to exploit the mere use of precompiled binaries, the wide use of precompiled binaries in other languages would have been widely exploited already. Therefore it is much less likely that the mere presence of a precompiled binary in a package is inherently a security vulnerability.
I’m confused about this point. Is anyone going to fix crates.io so this can’t happen again?
Assuming that this is a security problem (which I’m not interested in arguing about), it seems like the vulnerability is in the packaging infrastructure, and serde just happened to exploit that vulnerability for a benign purpose. It doesn’t go away just because serde decides to stop exploiting it.
I don’t think it’s an easy problem to fix: ultimately, a package registry is just storage for files, and you can’t control what users put there.

There’s an issue open about sanitizing the permission bits of downloaded files (which feels like a good thing to do irrespective of security), but that’s going to be a minor speed bump at most, as a build script can always just copy the file and set the executable bit again.
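A hypothetical illustration of why stripping the bit is only a speed bump (not something any crate is known to do): the build script can simply put the bit back. The path is an assumption for the example, and this is Unix-only.

```rust
// build.rs fragment — restore the executable bit on a bundled file.
use std::fs;
use std::os::unix::fs::PermissionsExt;

fn main() -> std::io::Result<()> {
    // Assumed path to an artifact shipped inside the unpacked package.
    let blob = "precompiled-blob";
    let mut perms = fs::metadata(blob)?.permissions();
    perms.set_mode(0o755); // re-add the executable bit the registry stripped
    fs::set_permissions(blob, perms)?;
    Ok(())
}
```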
A proper fix here would be fully sandboxed builds, but:
Who’s ever heard of a security issue caused by a precompiled binary shipping in a dependency? Like, maybe it’s happened a few times? I can think of one incident where a binary was doing analytics, not outright malware, but that’s it.
I’m confused at the idea that if we narrow the scope to “a precompiled binary dependency” we somehow invalidate the risk. Since apparently “curl $FOO > sh” is a perfectly cromulent way to install things these days among some communities, in my world (30+ year infosec wonk) we really don’t get to split hairs over ‘binary v. source’ or even ‘target v. dependency’.
I’m not sure I get your point. You brought up codec vulns, which are irrelevant to the binary vs source discussion. I brought that back to the actual threat, which is an attack that requires a precompiled binary vs source code. I’ve only seen (in my admittedly only 10 years of infosec work) such an attack one time, and it was hardly an attack and instead just shady monetization.
This is the first comment I’ve made in this thread, so I didn’t bring up codecs. Sorry if that impacts your downplaying supply chain attacks, something I actually was commenting on.
Ah, then forget what I said about “you” saying that. I didn’t check who had commented initially.
As for downplaying supply chain attacks, not at all. I consider them to be a massive problem and I’ve actively advocated for sandboxed build processes, having even spoken with rustc devs about the topic.
What I’m downplaying is the made up issue that a compiled binary is significantly different from source code for the threat of “malicious dependency”.
So not only do you not pay attention enough to see who said what, you knee-jerk responded without paying attention to what I did say. Maybe in another 10 years…
Because I can curl $FOO > foo.sh; vi foo.sh, then choose to chmod +x foo.sh; ./foo.sh. I can’t do that with an arbitrary binary from the internet without whipping out Ghidra and hoping my RE skills are good enough to spot malicious code. I might also miss it in some downloaded Rust or shell code, but the chances are significantly lower than in the binary. Particularly when the attempt by people in the original issue thread to reproduce the binary failed, so no one knows what’s in it.

No one, other than these widely publicised instances in NPM, as well as PyPI and Ruby, as pointed out in the original github issue. I guess each language community needs to rediscover basic security issues on their own; long live NIH.
Am I missing something? Both links involve malicious source files, not binaries.
I hadn’t dug into them; they were brought up in the original thread, and shipping binaries in those languages (other than Python with wheels) is not really common (but it would be equally problematic). Point taken, though: I shouldn’t trust sources without verifying them (how meta).
But the question here is “does a binary make a difference vs source code?”, and if you’re saying “well, history shows us that attackers like binaries more” when history does not show that, you can see my issue, right?
But what’s more, even if attackers did use binaries more, would we care? Maybe, but it depends on why. If it’s because binaries are so radically unauditable and source code is so vigilantly audited, OK, sure. But I’m really doubtful that that would be the reason.
There’s some interesting meat to think about here in the context of package management, open source, burden on maintainers, varying interest groups.