The real scary thing is that it took users weeks to notice that it shipped, even though it wasn’t obfuscated in any way. This shows how risky the ecosystem is, without enough eyes reviewing published crates. If any high-profile crate author gets infected with malware that injects itself into crates, it’s going to be an apocalypse for Rust.
I think it’s only a sign that we’re unaware until this hits a sandboxed / reproducible build system. I guess that’s currently distribution packaging or projects that otherwise use Nix or Bazel to build.
If the complaint is that binaries are more difficult to audit than source, and no one is auditing, then it should make no difference either way from a security perspective.
I think “weeks” is a bit of an exaggeration. People were openly discussing it at least a week after release. It’s true though that it didn’t blow up on social media until weeks later and many people didn’t realise until then.
If it had been a security issue, or if it had been done by someone much less reputable than the author of serde, or if the author had not responded, then I suspect rustsec may have been more motivated to post an advisory.
Something that I might have expected to see included in this comment, and that I instead will provide myself, is a plug for bothering to review the code in one’s (prospective) dependencies, or to import reviews from trusted other people (or, put differently, to limit oneself to dependencies that one is able and willing to review or that someone one trusts has reviewed).
I recall that kornel at least used to encourage the use of cargo-crev, and their Lib.rs now also shows reviews from the newer and more streamlined cargo-vet.
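For anyone who hasn’t tried it, a cargo-vet session looks roughly like this (a sketch based on cargo-vet’s own documentation; the crate name and version below are just examples):

```shell
cargo install cargo-vet
cargo vet init      # set up the supply-chain/ metadata for this project
cargo vet           # fails if any dependency lacks an audit or exemption
cargo vet certify serde 1.0.188   # record your own review of a crate version
```

Imported audits from organizations you trust (the Mozilla imports mentioned below work this way) are listed in the generated `supply-chain/config.toml`, so `cargo vet` only asks you to personally review what nobody you trust has reviewed yet.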
I note that the change adding the blob to Serde was reviewed and approved through cargo-vet by someone at Mozilla. I don’t think that necessarily means these reviewing measures would not be useful in a situation that isn’t as much a drill (i.e., with a blob more likely to be malicious).
Yeah - my recollection of crev is that libraries like serde often got reviews like “it’s serde, might as well be the stdlib, I trust this without reviewing it as the chances of it being malicious are basically zero”
What a ridiculous thing to have even happened in the first place, let alone refusing to acknowledge there could possibly be an issue for so long. I’m glad it’s been fixed, but it would make me think twice about using serde. I’m sure it’ll be fine, who’s ever heard of a security issue in a codec anyway?
Remember that there are real human beings maintaining serde. It is not, in fact, blindingly obvious to all developers that the pre-compiled blobs were bad; on this site there were loud voices on both sides. Can you imagine suddenly getting caught in the crosshairs of angry developers like that? When I imagine it, it feels bad, and I’m liable to get defensive about it.
It may also have been a failed attempt at fixing something you’ve heard people complain about all the time, probably even about your code that slows down people’s builds (*). So yeah, it was a bad idea in hindsight, but we don’t need more burned out maintainers from this. And I say this as someone who is openly disappointed by this happening.
(*) I’m not going to discuss how much time it actually saved.
Yeah, basically the biggest gains are offset by process creation being surprisingly slow. I’m working on a follow-up article where I talk about that in detail.
I posted your piece because it was the first one that explained in detail what the hell was going on, specifically how serde works. Looking forward to a followup.
That’s how it started, then they centralized everything with one team that doles out the “managed CI” offering, with their own global library and controls. Any competing infra gets flagged and audited hardcore until you give up by attrition.
This seems to only be checking the performance under --release. Most compilation is done without --release, meaning that most of the proc macro will not be optimized.
As someone who packages software, I think it’s worth noting that packagers expect different things than end users, though they are compatible.
One of my wishes is to avoid blobs from a vendor, since we can’t always recompile those in the build process to work with the architectures we support.
(The other big difference is the DESTDIR env var. End users don’t generally care, but it becomes essential when preparing a package)
I therefore understand those who support their end users, before getting packaged.
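For readers who haven’t packaged software: DESTDIR lets an install step stage files into a scratch tree instead of the live filesystem, so the result can be archived into a package. A minimal sketch (the file names here are made up for illustration):

```shell
# PREFIX is what the end user cares about; DESTDIR is prepended
# only at install time and is never baked into the installed files.
DESTDIR="$PWD/pkgroot"
PREFIX="/usr"

# Stage a fake "installed" program under the scratch tree.
install -d "$DESTDIR$PREFIX/bin"
printf '#!/bin/sh\necho hello\n' > "$DESTDIR$PREFIX/bin/hello"
chmod 755 "$DESTDIR$PREFIX/bin/hello"

# The live /usr is untouched; everything under pkgroot/ is what
# gets archived into the distribution package.
find "$DESTDIR" -type f
```

An end user runs the same install with DESTDIR unset and the files land directly under /usr; the packager never has to touch the live system.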
The real human being maintaining serde knew about the pushback that would happen and did it on purpose to prove a point in a pre-RFC he submitted. I don’t feel particularly bad about him getting pushback for using half the Rust ecosystem as his guinea pigs. (In fact I would like to see more of it.)
The real human being maintaining serde knew about the pushback that would happen and did it on purpose to prove a point in a pre-RFC he submitted.
What’s the reason to believe in this over any other explanation of the situation? E.g. that pushback was unexpected and that the RFC is the result of the pushback, rather than a cause?
I consider dtolnay a competent open source maintainer who understands the people who run his code well, and I would expect any competent open source maintainer to expect such pushback.
But how does that necessarily lead to “on purpose to prove a point”?
I don’t think dtolnay expected exactly zero pushback. But, given that some people in this thread argue quite a reasonable point that binaries are actually almost as fine as source, it is plausible that only bounded pushback was expected.
“Someone else is always auditing the code and will save me from anything bad in a macro before it would ever run on my machines.” (At one point serde_derive ran an untrusted binary for over 4 weeks across 12 releases before almost anyone became aware. This was plain-as-day code in the crate root; I am confident that professionally obfuscated malicious code would be undetected for years.)
I don’t see someone competent casually pushing such a controversial change, casually saying that this is now the only supported way to use serde, casually pushing a complete long pre-RFC that uses the controversial change to advance it, and then casually reverting the change in the span of a few days. That takes preparation and foresight.
I actually respect this move. It is exactly the kind of move I would do if I had goodwill to burn and was frustrated with the usual formal process, and it takes boldness and courage to pull it off the way he did it. I also think the pushback is entirely appropriate and the degree of it was quite mild.
Aha, thanks! I think that’s a coherent story to infer from this evidence (and I was wondering if there might be some missing bits I don’t know).
From where I stand, I wouldn’t say that this explanation looks completely implausible, but I do find it unlikely.
For me, the salient bits are:
- what it says on the tin: dtolnay didn’t write a lot of responses in the discussion, but what they have written is more or less what I would have expected to see from a superb maintainer acting in good faith.
- there wasn’t any previous Wasm macro work that was stalled and required nefarious plans to get it unstuck.
- really, literally everyone wants sandboxed Wasm proc macros. Support for this feature couldn’t be any higher. What is lacking is not motivation or support but (somewhat surprisingly) a written-down RFC for how to move forward and (expectedly) implementation effort to make it come true.
- dtolnay likes doing crazy things! Like how all his crates follow 1.0.x versions, or watt, or deref-based specialization in anyhow. So, “because I can” seems like enough motivation here.
- “if you don’t like a feature in this crate, don’t use the crate or do the groundwork to make implementing this feature better” feels like a normal mode of operation for widely used OSS projects with sole maintainers. I’ve said as much with respect to the MSRV of my once_cell crate.
I agree that there are multiple interpretations possible and that yours also follows from the evidence available. The reason I think it’s reasonable to consider something deeper to be going on is: every single Rust controversy I’ve discussed with key Rust people had a lot more going on than was there on the surface. Case in point: dtolnay was also the person, thus far unnamed by anyone speaking for the project, who was involved in ThePHD’s talk being downgraded from a keynote. If I see someone acting surreptitiously in one case, I will expect that to repeat.
O_o that’s news to me, thanks. It didn’t occur to me that dtolnay might have been involved there (IIRC, they aren’t a team lead of any top-level team, so I assume they weren’t a member of the notorious leadership chat).
Calling me anonymous is pretty funny, considering I’ve called myself “whitequark” for close to 15 years at this point and shipped several world-class projects under it.
whitequark would be pretty well known to an old Rust team member such as matklad, having been one themself, so no, not anonymous… but we don’t know this is the same whitequark, so yes, still anonymous.
I mean, I wrote both Rust language servers/IDEs that everyone is using and whitequark wrote the Ruby parser everyone is using (and also smoltcp). I think we know perfectly fine who we are talking with. One of us might be secretly a Labrador in a trench coat, but that doesn’t have any bearing on the discussion, and speculation on that topic is hugely distasteful.
In terms of Rust team membership, I actually don’t know which team whitequark was on, but they are definitely on the alumni page right now. I was on the cargo team and TL for the IDE team.
You were talking about “Rust teams” and the only way I’ve seen that term used is to indicate those under the “Rust Project”. Neither person is on a Rust team or an alumni.
Tbh, it more reeks of desperation to make people’s badly configured CI flows faster. I think that a conspiratorial angle hasn’t been earned yet for this and that we should go for the most likely option: it was merely a desperate attempt to make unoptimized builds faster.
I think this is hard to justify when someone comes to you with a security issue, your response is “fork it, not my problem”, and you then close the issue, completely dismissing a legitimate report. I understand humans are maintaining it; humans maintain all the software I use, in fact, and I’m not OK with deciding “oh, a human was involved, I guess we should let bad security practices slide”. I, and I’m sure many others, am not frustrated because they didn’t understand the security implications, but because the reports were summarily dismissed and rejected when they had dire implications for all their users. From my understanding, Serde is a) extremely popular in the Rust world, and b) deals in one of the most notoriously difficult kinds of code to secure, so seeing the developers’ reaction to a security issue is very worrying for the community as a whole.
The thing is, it’s ambiguous whether this is a security issue at all. “Shipping precompiled binaries is not significantly more insecure than shipping source code” is an absolutely reasonable stance to have. I even think it is true if we consider only first-order effects and the current state of Rust packaging & auditing.
Note also that concerns were not “completely dismissed”. Dismissal looks like “this is not a problem”. What was said was rather “fixing this problem is out of scope for the library, if you want to see it fixed, work on the underlying infrastructure”. Reflecting on my own behavior in this discussion, I might be overly sensitive here, but to me there’s a world of difference between a dismissal, and an acknowledgment with disagreement on priorities.
let alone refusing to acknowledge there could possibly be an issue for so long.
This is perhaps a reasonable take-away from all the internet discussions about the topic, but I don’t think this actually reflects what did happen.
The maintainer was responsive on the issue and they very clearly articulated that:
- they are aware that the change makes it harder to build software for some users
- they are aware that the change concerns some users from the security point of view
- nonetheless, the change is an explicit design decision for the library
- the way to solve this typical open-source dilemma is to allocate the work to the party that needs its fruits
Afterwards, when it became obvious that the security concern is not niche, but has big implications for the whole ecosystem, the change was reverted and a lot of follow-up work landed.
I do think it was a mistake not to predict that this change would be this controversial (or to proceed with a controversial change without preliminary checks with the wider community).
But, given that a mistake had been made, the handling of the situation was exemplary. Everything that needed fixing was fixed, promptly.
Afterwards, when it became obvious that the security concern is not niche, but has big implications for the whole ecosystem, the change was reverted and a lot of follow-up work landed.
I’m still waiting to hear what “security concern” there was here. Other language-package ecosystems have been shipping precompiled binaries in packages for years now; why is it such an apocalyptically awful thing in Rust and only Rust?
The main thing is loss of auditing ability — with the opaque binaries, you can’t just look at the package tarball from crates.io and read the source. It is debatable how important that is: in practice, as this very story demonstrates, few people look at the tarballs. OTOH, “can you look at tarballs” is an ecosystem-wide property — if we lose it, we won’t be able to put the toothpaste back into the tube.
This is amplified by the fact that this is build-time code — people are in general happier to sandbox the final application than to sandbox the sprawling build infra.
With respect to other languages — of course! But also note how other languages are memory unsafe for decades…
The main thing is loss of auditing ability — with the opaque binaries, you can’t just look at the package tarball from crates.io and read the source.
It’s not that hard to verify the provenance of a binary. And it appears that for some time after serde switched to shipping the precompiled macros, exactly zero people actually were auditing it (based on how long it took for complaints to be registered about it).
OTOH, “can you look at tarballs” is an ecosystem-wide property — if we lose it, we won’t be able to put the toothpaste back into the tube.
The ecosystem having what boils down to a social preference for source-only does not imply that binary distributions are automatically/inherently a security issue.
With respect to other languages — of course! But also note how other languages are memory unsafe for decades…
My go-to example of a language that often ships precompiled binaries in packages is Python. Which is not exactly what I think of when I think “memory unsafe for decades”.
It’s not that hard to verify the provenance of a binary.
Verifying provenance and auditing source are orthogonal. If you have trusted provenance, you can skip auditing the source. If you audited the source, you don’t care about the provenance.
It’s a question which one is more practically important, but to weigh this tradeoff, you need to acknowledge its existence.
This sounds like:
People claim that they audit, and probably some do, but it’s also clear that the majority don’t actually audit source code. So the benefits of audits are vastly overstated, and we need to care about provenance and trusted publishing.
This doesn’t sound like:
There’s absolutely ZERO security benefits here whatsoever
I don’t know where your last two blockquotes came from, but they didn’t come from my comment that you were replying to, and I won’t waste my time arguing with words that have been put in my mouth by force.
That’s how I read your reply: as an absolute refusal to acknowledge that source auditing is a thing, rather than as a nuanced comparison of auditing in theory vs auditing in practice.
It might not have been your intention to communicate that, but that was my take away from what’s actually written.
In the original github thread, someone went to great lengths to try to reproduce the shipped binary, and just couldn’t do it. So it is very reasonable to assume that either they had something in their build that differed from the environment used to build it, or that the binary was malicious, and without much deeper investigation, it’s nearly impossible to tell which is the answer. If it were trivial to reproduce the build with source code you could audit yourself, then there’d be far less of a problem.
Rust doesn’t really do reproducible builds, though, so I’m not sure why people expected to be able to byte-for-byte reproduce this.
Also, other language-package ecosystems really have solved this problem – in the Python world, for example, PyPI supports a verifiable chain all the way from your source repo to the uploaded artifact. You don’t need byte-for-byte reproducibility when you have that.
I guess I should clarify that in the GP comment the problem is misalignment between the maintainer’s and users’ views of the issue. This is a problem irrespective of the ground truth value of security.
Maybe other language package ecosystems are also wrong to be distributing binaries, and have security concerns that are not being addressed because people in those ecosystems are not making as much of a fuss about it.
If there were some easy way to exploit the mere use of precompiled binaries, someone would have by now. The incentives to use such an exploit are just way too high not to.
Anecdotally, I almost always see Python malware packaged as source code. I think that could change at any time to compiled binaries fwiw, just a note.
I don’t think attackers choosing binary payloads would mean anything for anyone really. The fundamental problem isn’t solved by reproducible builds - those only help if someone is auditing the code.
The fundamental problem is that your package manager has near-arbitrary rights on your computer, and dev laptops tend to be very privileged at companies. I can likely go from ‘malicious build script’ to ‘production access’ in a few hours (if I’m being slow and sneaky) - that’s insane. Why does a build script have access to my ssh key files? To my various tokens? To my ~/.aws/ folder? Insane. There’s zero reason for those privileges to be handed out like that.
The real solution here is to minimize impact. I’m all for reproducible builds because I think they’re neat and whatever, sure, people can pretend that auditing is practical if that’s how they want to spend their time. But really the fundamental concept of “running arbitrary code as your user” is just broken, we should fix that ASAP.
The fundamental problem is that your package manager has near-arbitrary rights on your computer
Like I’ve pointed out to a couple people, this is actually a huge advantage for Python’s “binary” (.whl) package format, because its install process consists solely of unpacking the archive and moving files to their destinations. It’s the “source” format that can ship a setup.py running arbitrary code at install time. So telling pip to exclusively install from .whl (with --only-binary :all:) is generally a big security win for Python deployments.
(and I put “binary” in scare quotes because, for people who aren’t familiar with it, a Python .whl package isn’t required to contain compiled binaries; it’s just that the .whl format is the one that allows shipping those, as well as shipping ordinary Python source code files)
Anecdotally, I almost always see Python malware packaged as source code. I think that could change at any time to compiled binaries fwiw, just a note.
Agree. But that’s a different threat, it has nothing to do with altered binaries.
I don’t think attackers choosing binary payloads would mean anything for anyone really. The fundamental problem isn’t solved by reproducible builds - those only help if someone is auditing the code.
Code auditing is worthless if you’re not sure the binary you’re running on your machine has been produced from the source code you’ve audited. This source <=> binary mapping is precisely where source bootstrapping + reproducible builds are helping.
The real solution here is to minimize impact. I’m all for reproducible builds because I think they’re neat and whatever, sure, people can pretend that auditing is practical if that’s how they want to spend their time. But really the fundamental concept of “running arbitrary code as your user” is just broken, we should fix that ASAP.
This is a false dichotomy. I think we agree on the fact we want code audit + binary reproducibility + proper sandboxing.
Agree. But that’s a different threat, it has nothing to do with altered binaries.
Well, we disagree, because I think they’re identical in virtually every way.
Code auditing is worthless if you’re not sure the binary you’re running on your machine has been produced from the source code you’ve audited. This source <=> binary mapping is precisely where source bootstrapping + reproducible builds are helping.
I’m highly skeptical of the value behind code auditing to begin with, so anything that relies on auditing to have value is already something I’m side eyeing hard tbh.
I think we agree on the fact we want code audit + binary reproducibility + proper sandboxing.
I think where we disagree is on the weights. I barely care about binary reproducibility, I frankly don’t think code auditing is practical, and I think sandboxing is by far the most important, cost effective measure to improve security and directly address the issues.
I am familiar with the concept of reproducible builds. Also, as far as I’m aware, Rust’s current tooling is incapable of producing reproducible binaries.
And in theory there are many attack vectors that might be present in any form of software distribution, whether source or binary.
What I’m looking for here is someone who will step up and identify a specific security vulnerability that they believe actually existed in serde when it was shipping precompiled macros, but that did not exist when it was shipping those same macros in source form. “Someone could compromise the maintainer or the project infrastructure”, for example, doesn’t qualify there, because both source and binary distributions can be affected by such a compromise.
Aren’t there links in the original github issue to exactly this being done in the NPM and some other ecosystem? Yes this is a security problem, and yes it has been exploited in the real world.
What I’m looking for here is someone who will step up and identify a specific security vulnerability that they believe actually existed in serde when it was shipping precompiled macros, but that did not exist when it was shipping those same macros in source form. “Someone could compromise the maintainer or the project infrastructure”, for example, doesn’t qualify there, because both source and binary distributions can be affected by such a compromise.
If you have proof of an actual concrete vulnerability in serde of that nature, I invite you to show it.
The existence of an actual exploit is not necessary to be able to tell that something is a serious security concern. It’s like laying an AR-15 in the middle of the street and claiming there’s nothing wrong with it because no one has picked it up and shot someone with it. This is the opposite of a risk assessment, this is intentionally choosing to ignore clear risks.
There might even be an argument to make that someone doing this has broken the law by accessing a computer system they don’t own without permission, since no one had any idea that this was even happening. To me this is up there with Sony’s rootkit back in the day: completely unexpected, unauthorised behaviour that no reasonable person would expect, nor would they look out for it, because it is just such an unreasonable thing to do to your users.
I think the point is that if precompiled macros are an AR-15 laying in the street, then source macros are an AR-15 with a clip next to it. It doesn’t make sense to raise the alarm about one but not the other.
There might even be an argument to make that someone doing this has broken the law by accessing a computer system they don’t own without permission, since no one had any idea that this was even happening.
I think this is extreme. No additional accessing of any kind was done. Binaries don’t have additional abilities that build.rs does not have. It’s not at all comparable to installing a rootkit. The precompiled macros did the same thing that the source macros did.
The existence of an actual exploit is not necessary to be able to tell that something is a serious security concern. It’s like laying an AR-15 in the middle of the street and claiming there’s nothing wrong with it because no one has picked it up and shot someone with it. This is the opposite of a risk assessment, this is intentionally choosing to ignore clear risks.
Once again, other language package ecosystems routinely ship precompiled binaries. Why have those languages not suffered the extreme consequences you seem to believe inevitably follow from shipping binaries?
There might even be an argument to make that someone doing this has broken the law by accessing a computer system they don’t own without permission, since no one had any idea that this was even happening.
Even the most extreme prosecutors in the US never dreamed of taking laws like CFAA this far.
To me this is up there with Sony’s rootkit back in the day, completely unexpected, unauthorised behaviour that no reasonable person would expect, nor would they look out for it because it is just such an unreasonable thing to do to your users.
I think you should take a step back and consider what you’re actually advocating for here. For one thing, you’ve just invalidated the “without any warranty” part of every open-source software license, because you’re declaring that you expect and intend to legally enforce a rule on the author that the software will function in certain ways and not in others. And you’re also opening the door to even more, because it’s not that big a logical or legal leap from liability for a technical choice you dislike to liability for, say, an accidental bug.
The author of serde didn’t take over your computer, or try to. All that happened was serde started shipping a precompiled form of something you were going to compile anyway, much as other language package managers already do and have done for years. You seem to strongly dislike that, but dislike does not make something a security vulnerability and certainly does not make it a literal crime.
I think that what is actually happening in other language ecosystems is that while precompiled binaries are shipped for some installation methods, other installation methods build from source.
So you still have binary distribution for people who want that, and you have the source distribution for others.
I have not confirmed this but I believe that this might be the case for Python packages hosted on debian repos, for example. Packages on PyPI tend to have source distributions along with compiled ones, and the debian repos go and build packages themselves based off of their stuff rather than relying on the package developers’ compiled output.
When I release a Python library, I provide the source and a binary. A linux package repo maintainer could build the source code rather than using my built binary. If they do that, then the thing they “need to trust” is the source code, and less trust is needed on myself (on top of extra benefits like source code access allowing them to fix things for their distribution mechanisms)
So you still have binary distribution for people who want that, and you have the source distribution for others.
I don’t know of anyone who actually wants the sdists from PyPI. Repackagers don’t go to PyPI, they go to the actual source repository. And a variety of people, including both me and a Python core developer, strongly recommend always invoking pip with the --only-binary :all: flag to force use of .whl packages, which have several benefits:
- When combined with --require-hashes and --no-deps, you get as close to perfectly byte-for-byte reproducible installs as is possible with the standard Python packaging toolchain.
- You will never accidentally compile, or try to compile, something at runtime.
- You will never run any type of install-time scripting, since a .whl has no scripting hooks (as opposed to an sdist, which can run arbitrary code at install time via its setup.py).
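Concretely, the recommended invocation combining those flags looks like this (assuming your requirements.txt already carries a `--hash=` entry for every pinned package, e.g. generated with a tool like pip-compile):

```shell
# --only-binary :all:  -> never build from an sdist, wheels only
# --require-hashes     -> refuse any artifact whose hash doesn't match a pin
# --no-deps            -> install exactly what's listed, nothing extra
pip install --only-binary :all: --require-hashes --no-deps -r requirements.txt
```

If any dependency only publishes an sdist for your platform, this fails loudly instead of silently compiling arbitrary code at install time, which is exactly the point.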
I mean there are plenty of packages with actual native dependencies who don’t ship every permutation of platform/Python version wheel needed, and there the source distribution is available. Though I think that happens less and less since the number of big packages with native dependencies is relatively limited.
But the underlying point is that with an option of compiling everything “from source” available as an official thing from the project, downstream distributors do not have to do things like, say, confirm that the project’s vendored compiled binary is in fact compiled from the source being pointed at.
Install-time scripting is less of an issue in this thought process (after all, import-time scripting is a thing that can totally happen!). It should feel a bit obvious that a bunch of source files is easier to look through to figure out issues rather than “oh this part is provided by this pre-built binary”, at least it does to me.
I’m not arguing against binary distributions, just think that if you have only the binary distribution suddenly it’s a lot harder to answer a lot of questions.
But the underlying point is that with an option of compiling everything “from source” available as an official thing from the project, downstream distributors do not have to do things like, say, confirm that the project’s vendored compiled binary is in fact compiled from the source being pointed at.
As far as I’m aware, it was possible to build serde “from source” as a repackager. It did not produce a binary byte-for-byte identical to the one being shipped first-party, but as I understand it producing a byte-for-byte identical binary is not something Rust’s current tooling would have supported anyway. In other words, the only sense in which “binary only” was true was for installing from crates.io.
So any arguments predicated on “you have only the binary distribution” don’t hold up.
Hmm, I felt like I read repackagers specifically say that the binary was a problem (I think it was more the fact that standard tooling didn’t allow for both worlds to exist). But this is all a bit moot anyways
I don’t know of anyone who actually wants the sdists from PyPI.
It’s a useful fallback when there are no precompiled binaries available for your specific OS/Arch/Python version combination. For example when pip installing from a ARM Mac there are still cases where precompiled binaries are not available, there were a lot more closer to the M1 release.
When I say I don’t know of anyone who wants the sdist, read as “I don’t know anyone who, if a wheel were available for their target platform, would then proceed to explicitly choose an sdist over that wheel”.
Also, not for nothing, most of the discussion has just been assuming that “binary blob = inherent automatic security vulnerability” without really describing just what the alleged vulnerability is. When one person asserts existence of a thing (such as a security vulnerability) and another person doubts that existence, the burden of proof is on the person asserting existence, but it’s also perfectly valid for the doubter to point to prominent examples of use of binary blobs which have not been exploited despite widespread deployment and use, as evidence in favor of “not an inherent automatic security vulnerability”
Yeah, this dynamic has been infuriating. In what threat model is downloading source code from the internet and executing it different from downloading compiled code from the internet and executing it? The threat is the “from the internet” part, which you can address by:
Hash-pinning the artifacts, or
Copying them to a local repository (and locking down internet access).
Anyone with concerns about this serde change should already be doing one or both of these things, which also happen to make builds faster and more reliable (convenient!).
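The hash-pinning step can be sketched with nothing but the stdlib; the artifact bytes and the pinned digest below are made up for illustration, not taken from any real package:

```python
import hashlib

# Digest recorded at vetting time (computed here from made-up bytes,
# purely for illustration).
PINNED_SHA256 = hashlib.sha256(b"artifact bytes as vetted").hexdigest()

def verify_artifact(data: bytes, pinned: str) -> bytes:
    """Refuse to use a downloaded artifact unless its hash matches the pin."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != pinned:
        raise ValueError(f"hash mismatch: expected {pinned}, got {actual}")
    return data

# The vetted bytes pass; anything tampered with raises before use.
verify_artifact(b"artifact bytes as vetted", PINNED_SHA256)
```

This is the same check that lockfile-based tooling (Cargo’s `Cargo.lock`, pip’s `--require-hashes`) performs for you automatically, and it applies identically to a source tarball and to a precompiled binary.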
Yeah, hashed/pinned dependency trees have been around forever in other languages, along with tooling to automate their creation and maintenance. It doesn’t matter at that point whether the artifact is a precompiled binary, because you know it’s the artifact you expected to get (and have hopefully pre-vetted).
Downloading source code from the internet gives you the possibility to audit it, downloading a binary makes this nearly impossible without whipping out a disassembler and hoping that if it is malicious, they haven’t done anything to obfuscate that in the compiled binary. There is a “these languages are turing complete, therefore they are equivalent” argument to be made, but I’d rather read Rust than assembly to understand behaviour.
The point is that if there were some easy way to exploit the mere use of precompiled binaries, the wide use of precompiled binaries in other languages would have been widely exploited already. Therefore it is much less likely that the mere presence of a precompiled binary in a package is inherently a security vulnerability.
Afterwards, when it became obvious that the security concern is not niche but has big implications for the whole ecosystem, the change was reverted and a lot of follow-up work landed.
I’m confused about this point. Is anyone going to fix crates.io so this can’t happen again?
Assuming that this is a security problem (which I’m not interested in arguing about), it seems like the vulnerability is in the packaging infrastructure, and serde just happened to exploit that vulnerability for a benign purpose. It doesn’t go away just because serde decides to stop exploiting it.
I don’t think it’s an easy problem to fix: ultimately, package registry is just a storage for files, and you can’t control what users put there.
There’s an issue open about sanitizing permission bits of the downloaded files (which feels like a good thing to do irrespective of security), but that’s going to be a minor speed bump at most, as you can always just copy the file over with the executable bit.
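Why sanitizing permission bits is only a speed bump can be shown in a few lines; this is a stdlib-only sketch with made-up file names:

```python
import os
import shutil
import stat
import tempfile

# Simulate a registry that strips the executable bit on download.
workdir = tempfile.mkdtemp()
downloaded = os.path.join(workdir, "tool-from-registry")
with open(downloaded, "w") as f:
    f.write("#!/bin/sh\necho hi\n")
os.chmod(downloaded, 0o644)  # "sanitized": no execute bits set

# A build script can simply copy the file over and flip the bit back.
restored = os.path.join(workdir, "tool")
shutil.copyfile(downloaded, restored)
os.chmod(restored, os.stat(restored).st_mode | stat.S_IXUSR)

assert os.access(restored, os.X_OK)  # executable again
```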
A proper fix here would be fully sandboxed builds, but:
POSIX doesn’t have “make a sandbox” API, so implementing isolation in a nice way is hard.
There’s a bunch of implementation work needed to allow use-cases that currently escape the sandbox (wasm proc macros and metabuild).
I’m sure it’ll be fine, who’s ever heard of a security issue in a codec anyway?
Who’s ever heard of a security issue caused by a precompiled binary shipping in a dependency? Like, maybe it’s happened a few times? I can think of one incident where a binary was doing analytics, not outright malware, but that’s it.
I’m confused at the idea that if we narrow the scope to “a precompiled binary dependency” we somehow invalidate the risk. Since apparently “curl $FOO > sh” is a perfectly cromulent way to install things these days among some communities, in my world (30+ year infosec wonk) we really don’t get to split hairs over ‘binary v. source’ or even ‘target v. dependency’.
I’m not sure I get your point. You brought up codec vulns, which are irrelevant to the binary vs source discussion. I brought that back to the actual threat, which is an attack that requires a precompiled binary vs source code. I’ve only seen (in my admittedly only 10 years of infosec work) such an attack one time, and it was hardly an attack and instead just shady monetization.
This is the first comment I’ve made in this thread, so I didn’t bring up codecs. Sorry if that impacts your downplaying supply chain attacks, something I actually was commenting on.
Ah, then forget what I said about “you” saying that. I didn’t check who had commented initially.
As for downplaying supply chain attacks, not at all. I consider them to be a massive problem and I’ve actively advocated for sandboxed build processes, having even spoken with rustc devs about the topic.
What I’m downplaying is the made up issue that a compiled binary is significantly different from source code for the threat of “malicious dependency”.
So not only do you not pay attention enough to see who said what, you knee-jerk responded without paying attention to what I did say. Maybe in another 10 years…
Because I can curl $FOO > foo.sh; vi foo.sh and then choose to chmod +x foo.sh; ./foo.sh. I can’t do that with an arbitrary binary from the internet without whipping out Ghidra and hoping my RE skills are good enough to spot malicious code. I might also miss it in some downloaded Rust or shell code, but the chances are significantly lower than with a binary. Particularly when the attempt by people in the original issue thread to reproduce the binary failed, so no one knows what’s in it.
No one, other than these widely publicised instances in npm, as well as PyPI and Ruby, as pointed out in the original GitHub issue. I guess each language community needs to rediscover basic security issues on its own; long live NIH.
I hadn’t dived into them, they were brought up in the original thread, and shipping binaries in those languages (other than python with wheels) is not really common (but would be equally problematic). But point taken, shouldn’t trust sources without verifying them (how meta).
But the question here is “Does a binary make a difference vs source code?” and if you’re saying “well, history shows us that attackers like binaries more” and then history does not show that, you can see my issue, right?
But what’s more, even if attackers did use binaries more, would we care? Maybe, but it depends on why. If it’s because binaries are so radically unauditable, and source code is so vigilantly audited, ok sure. But I’m realllly doubtful that that would be the reason.
Definitely my favourite of these. In particular, it’s developed by a team of privacy researchers, not by a commercial entity, so they both understand privacy and have no commercial incentive to avoid complying with their obligations.
The thing that made me switch to Firefox on Android was the Self-Destructing Cookies extension, which Firefox sadly broke in an upgrade a few years ago. There are a few reimplementations (for Firefox and other browsers) but they all miss what made the original great: it didn’t ask for permission up front, it provided an undo that always worked. When you moved away from a site, it moved all of the cookies to a saved location. When you visited the site again, the cookies were not exposed and the new cookies were there again. If you discovered something stopped working, you had an option to restore the deleted cookies and another option to never delete for that site. This meant that you could live in the default-delete world safely without worrying about data loss. If you realised after you’d closed a tab that some state was stored in cookies that you cared about (scores in a game, shopping basket contents, login details, whatever) then you had the ability to undelete it easily. If you went back without losing anything that you cared about, the cookies stayed deleted and you didn’t think about it.
I really wish browser vendors would just make that the default behaviour.
I wonder why it hasn’t been replicated yet. I think the webextension API is sufficient to replicate the destruction and undo-ing. The main hitch is that webextensions don’t receive an event for the browser quitting, and I can’t remember if webextension background scripts are guaranteed to run before any page’s network requests start flying, so that bit could be racy.
I use Consent-o-Matic on android with something called Kiwi browser, which is an old fork of Chrome(/ium?) with extension support.
I migrated to it after Firefox removed most of the extensions I used on Android due to an API change (a couple years ago, I think?) and have been using it since.
Maybe this news means I’ll be able to get back. But Mozilla is always screwing up the UX of the android version, so I wouldn’t hold my breath…
interesting, I haven’t seen that issue & none have cited that/pushed back when I pointed out the redundancy. any chance you have an example handy? I’d love to better understand that case
```python
from .submodule import Foo
from .othersubmodule import Bar
```

and the only purpose is to make it more easily importable from a flat namespace.
you already use noqa or something like that to disable “unused import” warnings.
in larger codebases, it can additionally make sense to turn on --no-implicit-reexport, because with a lot of people working on a codebase, code that imports objects from totally random and inappropriate places tends to creep in, especially when people use IDEs that auto-insert imports. we have this enabled at work.
in those situations it is necessary to add reexports to __all__ even though nobody uses star-imports
I fixed this issue in a library I maintain just the other week. People couldn’t use MyPy or Pyright on their code that used the library because the tools would complain that names weren’t defined, until I added them to __all__. It’s annoying redundancy, but necessary to satisfy type checkers.
Interestingly, you can satisfy the type checkers with a different kind of redundancy: instead of writing from .charm import ActionEvent, you say from .charm import ActionEvent as ActionEvent, and that tells the type checkers “they’re not just importing this name to use it here, they actually want to define it” and shuts them up. See “redundant symbol alias” in the Pyright docs.
However, in the end we went with __all__ anyway, because Sphinx, the API docs generator, didn’t like the “redundant symbol alias” technique. Specifically its autodoc extension doesn’t pick up any names from the __init__.py as it still doesn’t see those names as public.
Thankfully, when type checking your library, Pyright complains both if you import a name but forget to include it in __all__, and if you include a name in __all__ but forget to import it, so it’s hard for the two lists to get out of sync.
So yeah, I wish what your original article was saying were good advice, but in light of people using type checkers with your library (which you almost certainly want to support), it probably needs updating.
Also, pyflakes (and derivatives like flake8 and ruff) allow reexports if they appear in __all__:
```shell
❯ echo 'from bar import Baz' > foo.py
❯ ruff foo.py
foo.py:1:17: F401 [*] `bar.Baz` imported but unused
Found 1 error.
[*] 1 potentially fixable with the --fix option.
→ 1 ❯ printf '\n__all__ = ("Baz",)\n' >> foo.py
❯ ruff foo.py
```
This follows rather subtly from the notion of public names defined in the import reference:
The public names defined by a module are determined by checking the module’s namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module. The names given in __all__ are all considered public and are required to exist. […]
That’s the only reason we keep writing __all__ at work; disabling F401 for each “reexport” in a module/__init__.py file is possible too but not much better.
this is a fair point, I plan to update the article to mention it. Personally, disabling the error is better IMO; for one, it doesn’t violate DRY, and it’s a one-time fix instead of one requiring maintenance.
Every time Ruff comes up, the fact that it’s super-hyper-mega-fast is promoted as making up for the fact that it’s basically not extensible by a Python programmer (while the tools it wants to replace all easily are extensible in Python).
But the speed benefit only shows up when checking a huge number of files in a single run. And even on a large codebase:
In my editor, I only care about the linter being fast enough to lint the file I’m directly working in.
In pre-commit hooks, I only care about the linter being fast enough to lint the set of changed files.
In CI, on a codebase large enough that, say, flake8 would actually be taking significant time due to the number of files in a full-codebase lint, it’s overwhelmingly likely that the flake8 run still would effectively be noise compared to other things like the time to run a full test suite of that size.
So I’m still not sure why I should give up easy extensibility for speed that seems to offer me no actual practical benefit.
For these kinds of tools performance (both speed and memory usage) matters a lot, because codebases are effectively unbounded in size, and because for interactive use, latency budgets are pretty tight. There’s also Sorbet’s observation that performance unlocks new features. “Why would you watchexec this on the whole code base? Because I now can”.
Now, if we speak strictly about syntax-based formatting and linting, you can get quite a bit of performance from the embarrassingly parallel nature of the task. But of course you want to do cross-file analysis, type inference, duplicate detection and what not.
The amount of things you can do with a good static analysis base is effectively unbounded. At this point, maybe Java and C# are coming to the point of saturation, but everything else feels like a decade behind. The primary three limiting factors to deliver these kinds of tools are:
performant architecture (you need smarts to avoid re-doing global analysis on every local change)
raw speed (with the right arch, things like parsing or hashing become bottlenecks)
the work to actually implement fancy features on top of fast base
This is a high-investment, high-value thing, which requires a great foundation. And I would actually call that, rather than today’s raw performance, the most important feature of Ruff. We can start from fast linting, and then move to import analysis, type inference, full LSP and what not.
From my point of view, Python’s attempt to self-host all dev tools is a strategic blunder. Python really doesn’t have the performance characteristics to move beyond per-file linting, so it’s not surprising that, e.g., pyright does its own thing rather than re-use the existing ecosystem.
All that being said, extensibility is important! And Python is a fine language for that. Long term, I see Ruff exposing a Python scripting interface for this. If slow Python scripting sits on top of a fast native core that does 90% of the CPU work, that should be fine!
For these kinds of tools performance (both speed and memory usage) matters a lot, because codebases are effectively unbounded in size, and because for interactive use, latency budgets are pretty tight.
Yet as I keep pointing out, my actual practical use cases for linting do not involve constantly re-running the linter over a million files in a tight loop – they involve linting the file I’m editing, linting the files in a changeset, etc. and the current Python linting ecosystem is more than fast enough for that case.
There’s also Sorbet’s observation that performance unlocks new features. “Why would you watchexec this on the whole code base? Because I now can”.
But what’s the gain from doing that? Remember: the real question is why I should give up forever on being able to extend/customize the linter in exchange for all this speed. Even if the speed unlocks entirely new categories of use cases, it still is useless to me if I can’t then go implement those use cases because the tool became orders of magnitude less extensible/customizable as the cost of the speed.
Long term, I see Ruff exposing a Python scripting interface for this. If slow Python scripting sits on top of a fast native core that does 90% of the CPU work, that should be fine!
I think the instant that interface is allowed, you’re going to find that the OMGFAST argument disappears, because there is no way a ruleset written in Python is going to maintain the speed that is the sole and only selling point of Ruff. But by then all the other tools will have been bullied out of existence, so I guess Ruff will just win by default at that point.
they involve linting the file I’m editing, linting the files in a changeset, etc.
Importantly, they also involve only context-free linting. Something like “is this function unused in the project?” wouldn’t work in this paradigm. My point is not that you, personally, could benefit from extra speed for your current workflow. It’s rather that there are people who would benefit, and that there are more powerful workflows (e.g., typechecking on every keypress) which would become possible.
But what’s the gain from doing that?
At minimum, simplicity. I’d much rather just run $ foo than futz with git & xargs to figure out how to run it only on the changed files. Shaving off 10 seconds from the first CI check is also pretty valuable.
I think the instant that interface is allowed, you’re going to find that the OMGFAST argument disappears,
If you do this in the stupidest possible way, then, sure, it’ll probably be even slower than pure Python due to switching back and forth between Python and native. But it seems to me that that custom linting is amenable to proper slicing into CPU-heavy part and scripting on top:
start with an API for specifying the pattern to match the AST against. The pattern is constructed in Python, but the search is done by the native code, so Python is only involved at all for cases where there’s a syntactic match.
for further semantic lookups (what this identifier resolves to), Python calls into native, and again the heavy lifting is done by native
similarly, semantic lookups probably use some fancy memoization behind the scene, and that’s fully native.
often you want to do various name-based queries (like, all classes named “FooSomething”), and the internals of such a text index can also be hidden from Python pretty well.
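A toy sketch of that split, using the stdlib `ast` module as a stand-in for the hypothetical native core (all names here are invented; in the real design the parse-and-match step would be native code, and Python would only see the matches):

```python
import ast

def find_calls(source: str, func_name: str):
    """Stand-in for the 'native' side: parse once, walk the tree, and
    yield only nodes matching a pre-specified pattern (here: calls to
    a function with the given name)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name):
            yield node

# The 'scripting' side only runs on matches, so it stays cheap even if
# written in slow Python.
source = "eval(user_input)\nprint('ok')\n"
matches = [node.lineno for node in find_calls(source, "eval")]
# matches == [1]: only the eval() call on line 1 reaches Python code.
```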
Importantly, they also involve only context free linting. Something like “is this function unused in the project?” wouldn’t work in this paradigm.
There are already flake8 plugins that detect that sort of thing.
At minimum, simplicity. I’d much rather just run $ foo than futz with git & xargs to figure out how to run it only on the changed files. Shaving off 10 seconds from the first CI check is also pretty valuable.
All the existing tools have a “copy/paste this into your pre-commit config” snippet and then it Just Works. If you are indeed rolling your own solution to run only on the changed files, then I think you should probably pause and familiarize yourself with the current state of the art prior to telling everyone else to abandon it.
Sorry if my comments read as if I am pushing anyone to use Ruff, that definitely wasn’t my intention! Rather, I wanted to share my experience as implementer of similar tools, as that might be an interesting perspective for some.
That being said, I think I want to register a formal prediction that, in five years or so, something of Ruff’s shape (Python code analysis as a CLI implemented in a faster language, not necessarily Ruff specifically, and not counting the already existing PyCharm) will meaningfully eat into Python’s dev tool “market”.
I think Ruff will gain significant “market share”, but for the wrong reasons – not because of any technical superiority or improved user experience, but simply because its hype cycle means people will be pushed into adopting it whether they gain from it or not. I’m already dreading the day someone will inevitably file a “bug” against one of my projects claiming that it’s broken because it hasn’t adopted Ruff yet.
The “not extensible by a $lang programmer” was a reason for not pursuing faster tooling in better suited languages for the web ecosystem, and everything was painfully slow.
In my experience, esbuild (Go) and swc (Rust) are a massive improvement and will trade extensibility for the speed boost every time.
I’ve been using Ruff’s flake8-annotations checks to get a very quick list of missing annotations as I port a codebase. In a watchexec loop it’s substantially faster than getting the same information from MyPy or Pyright.
Likewise, in another codebase ruff --fix has already replaced isort (and flake8 and friends).
I’ve never needed the extensibility, though. I’m curious, what do you do with it?
In a watchexec loop it’s substantially faster than getting the same information from MyPy or Pyright.
I’m not sure why you’d need to run it over the entire codebase in a loop, though. Isn’t that the kind of thing where you generate a report once, and then you only incrementally need to check a file or two at a time as you fix them up?
Likewise, in another codebase ruff --fix has already replaced isort
Again, I don’t get it: isort will fix up imports for you, and my editor is set to do it automatically on file save and if I somehow miss that I have a pre-commit hook running it too. So I’m never in a situation where I need to run it and apply fixes across thousands of files (or if I was, it’d be a one-time thing, not an every-edit thing). So why do I need to switch to another tool?
I’ve never needed the extensibility, though. I’m curious, what do you do with it?
There are lots of popular plugins. For example, pylint on a Django codebase is next to unusable without a plugin to “teach” pylint how some of Django’s metaprogramming works. As far as I can tell, Ruff does not have parity with that. Same for the extremely popular pytest testing framework; without a plugin, pylint gets very confused at some of the dependency-injection “magic” pytest does.
Even without bringing pylint into it, flake8 has a lot of popular plugins for both general purpose and specific library/framework cases, and Ruff has to implement all the rules from those plugins. Which is why it has to have a huge library of built-in rules and explicitly list which flake8 plugins it’s achieved parity with.
I’m not sure why you’d need to run it over the entire codebase in a loop, though. Isn’t that the kind of thing where you generate a report once, and then you only incrementally need to check a file or two at a time as you fix them up?
I like to work from a live list as autoformatting causes line numbers to shift around as annotations increase line length. Really I should set up ruff-lsp.
Again, I don’t get it: isort will fix up imports for you, and my editor is set to do it automatically on file save and if I somehow miss that I have a pre-commit hook running it too. So I’m never in a situation where I need to run it and apply fixes across thousands of files (or if I was, it’d be a one-time thing, not an every-edit thing). So why do I need to switch to another tool?
I don’t use pre-commit because it’s excruciatingly slow. These things are really noticeable to me — maybe you have a faster machine?
I don’t use pre-commit because it’s excruciatingly slow. These things are really noticeable to me — maybe you have a faster machine?
Can you quantify “excruciatingly slow”? Like, “n milliseconds to run when k files staged for commit” quantification?
Because I’ve personally never noticed it slowing me down. I work on codebases of various sizes, doing changesets of various sizes, on a few different laptops (all Macs, of varying vintages). Maybe it’s just that I zone out a bit while I’m mentally composing the commit message, but I’ve never found myself waiting for pre-commit to finish before being able to start typing the message (fwiw my workflow is in Emacs, using magit as the git interface and an Emacs buffer to draft and edit the commit message, so actually writing the message is always the last part of the process for me).
I gave it another try and it looks like it’s not so bad after the first time. The way it’s intermittently slow (anytime the checkers change) is frustrating, but probably tolerable given the benefits.
I think my impression of slowness came from Twisted where it is used to run the lint over all files. This is very slow.
The way it’s intermittently slow (anytime the checkers change) is frustrating
My experience is that the list of configured checks changes relatively rarely – I get the set of them that I want, and leave it except for the occasional version bump of a linter/formatter. But it’s also not really pre-commit’s fault that changing the set of checks is slow, because changing it involves, under the hood, doing a git clone and then pip install (from the cloned repo) of the new hook. How fast or slow that is depends on your network connection and the particular repo the hook lives in.
I’ve never needed the extensibility, though. I’m curious, what do you do with it?
Write bespoke lints for codebase specific usage issues.
Most of them should probably be semgrep rules, but semgrep is not on the CI, it’s no speed demon either, and last I checked it has pretty sharp edges where it’s pretty easy to create rules which don’t work in complex cases.
PyLint is a lot more work, but lints are pretty easy to test, and while the API is ill-documented it’s quite workable and works well once you’ve gotten it nailed down.
Ah, so you and ubernostrum are optimizing workflows on a (single?) (large?) codebase, and you’re after a Pylint, rather than a pyflakes/flake8.
I’m coming at this from an OSS-style many-small-repos perspective. I prefer a minimally-configurable tool so that the DX is aligned across repositories. I don’t install and configure many flake8 plugins because that increases per-repo maintenance burden (e.g., with flake8 alone the W503/W504 debacle caused a bunch of churn as the style rules changed — thank goodness we now have Black!). Thus, I’m happy to drop additional tools like isort. So to me Ruff adds to the immediately-available capabilities without increasing overhead — seems like a great deal!
It seems like Ruff might slot into your workflow as a flake8 replacement, but you get a lot from Pylint, so I’d keep using the latter. You could disable all the style stuff and use Pylint in a slower loop like a type checker.
pylint is, in practice, very memory hungry and frankly slow.
Now I can’t go from there to recommending ruff, for the simple fact that ruff is not checking nearly enough stuff to be considered a replacement IMO. Not yet, at least. But I’ll be happy to see better stuff happening in this space (disclaimer: I’m writing a Rust-based pylint drop-in replacement, mostly for practice but also because I really suffered under pylint’s perf issues in a past life).
My admiration for ruff comes from the fact that I now have a single tool and a single configuration place. I don’t have to chase how to configure 10 different tools to do linting and ensure that my Python project has some guardrails. For example, my big annoyance with flake8 is that I can’t add its config in pyproject.toml; it has to be a separate file. I really, really, just want to flip the switch and have various checks done on the codebase, and not scour the internet on how to configure these tools, since each has its own (quite valid) interpretation of what’s the right way to do things. I just want to stay away from ever creating setup.py and all those other things I never understood why are needed to package some interpreted code (my dislike for Python’s packaging is leaking here :)).
I’m curious, what do you need to change in the tools replaced by ruff? What additional checks do you need to implement?
I personally do not care about the config file thing, and I wish people would stop bullying the flake8 dude about it. Way too many people, back when pyproject.toml was introduced for a completely different purpose than this, still treated its existence as meaning “all prior config approaches are now illegal, harass everyone you can find until they give up or give in”. Which is what people have basically tried to do to flake8, and I respect the fact that the maintainer laid out clear criteria for a switch to pyproject.toml and then just aggressively locked and ignored every request that doesn’t meet those criteria.
I’m curious, what do you need to change in the tools replaced by ruff? What additional checks do you need to implement?
I already gave a reply to someone else talking about the whole ecosystem of plugins out there for flake8 and pylint, and Ruff is not at parity with them. So even if I wanted to switch to Ruff I would not be able to – it lacks the checks I rely on, and lacks the ability for me to go implement those checks.
I’ve been slowly but surely giving up on Python for some time, and I’ve often struggled to articulate the reasons why. But having just read some of the flake8 pyproject stuff, it’s hit me that most of it could be described as bullying at some level or other.
Python itself arguably bullies its users, with things like the async -> ensure_future change, sum’s special case for str because the devs don’t like it, blah blah. (I want to say something about the packaging situation here, and how much of a pain in the ass it is to maintain a Python project to popular opinion standards in 202x, but I recognise that not all of this is deliberate.) Black’s founding principle is that bludgeoning people into accepting a standard is better than wasting time letting them have preferences. Long ago, when I frequented #python, SOP whenever anyone wanted to use sockets was to bully them into using an external dependency instead. And longer ago, when I frequented python-ideas, ideas that the in-group didn’t take to were made, along with their champions, to run ridiculous gauntlets of nitpicking and whataboutism.
Of course none of the exponents of this behaviour identify it as bullying, but then who would? The results are indistinguishable whether they’re being a dick, evangelizing best practices or just trying to save everyone’s time.
In short I think that, if you don’t want to be bullied into adopting a solution that doesn’t really work for you, you are in the wrong ecosystem.
Some of us use pyflakes on its own, and are thus used to the zero-configuration experience. The configurability of pylint is a net negative for me; it leads to bikeshedding over linter configuration.
This is entirely reasonable. In my case, I started a new job and new project, and I’m not invested heavily in the existing python based toolchain, so ruff was the right choice for us. I don’t like the way these sorts of minor differences get amplified up into existential crises anyway. And no, I’m not new on the internet, just tired of it all.
A thousand times this. $CUSTOMER had wildly divergent coding styles in their repos, and the project to streamline it meant configuring these traditional small tools and their plugins to conform to how PyCharm did things because it’s quite opinionated. And popular among people who are tired of it all.
The tooling included darker, which is fine, though I personally do not like all of black’s choices.
Eventually the whole codebase was blackened all at once and ruff replaced everything else.
The pre-commit is fast and my preferences aside, good arguments can be made for those two tools.
It is what it is, a business decision, and the right way to deal with it is to not elevate it to an existential crisis.
Outside the business world, if I had popular projects, I’d dislike ending up with PRs in the black style if the project wasn’t like that. Or having to set up glue for reformatting.
This is probably how all monopolization happens; people become rightfully tired of being ever-vigilant, and inevitably something bad will come out of the monopoly.
Like not getting OSS contributions because of the project’s formatting guidelines.
100% agree. This reminds me of the old “You have a problem and decide to use regexes to solve it. Now you have two problems.” Yes, your linting is faster but now “it’s basically not extensible by a Python programmer”, which means it’s more difficult for people to maintain their own tools.
the price was definitely the main thing keeping me from using this, so that’s neat
it used to cost $$$$ for each reader node beyond the first 1 or 2.
and basically the whole point of datomic is reader scaling, that’s either going to be very expensive, or very useless
I’m not sure what the $$$$ cost was for us — the direct cost wasn’t the constraint — the real problem is that Datomic is so dang inefficient that you’re going to have to think about other costs, too.
Memory is the big issue. The transactor is very memory-intensive (JVM!) and if you need to use Datomic’s equivalent of stored procedures (you will) it’s going to need your whole working set in memory to be performant. You also pay that in-memory cost again for each reader, which leads to the usual pain of large heaps on the JVM.
Obviously storage also may be a problem depending on the write rate. Datomic is fundamentally about keeping all the history forever. While you can hackishly “excise” old facts it’s very expensive. I’d never use it in a domain where I might eventually have to deal with GDPR/right to be forgotten/CA privacy rights/etc.
These shortcomings are exacerbated by the slowness of the thing. Clojure and the JVM are such a weird choice for a database.
I can imagine wanting a system with Datomic’s architecture, if it were sufficiently performant, but the actual software as-implemented? No. Stuff it all in Postgres, which won’t blow up if you do full-text queries! Plus, it’ll tolerate blobs and documents (you probably have some of those).
Using unbuffered IO when sequentially parsing / serializing a big file line-by-line. This is the funniest performance regression I’ve seen, since it is encountered so frequently.
A few years ago I traced Apache Cassandra while creating tables and discovered that it issues thousands of 1-byte writes. Someone forgot their BufferedWriter!
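The cost of forgetting the buffer is easy to demonstrate. Here’s a minimal sketch (a counting stand-in for the OS, not a real trace of Cassandra) showing how many raw writes — roughly, syscalls — the same workload produces with and without a `BufferedWriter`:

```python
import io

class CountingRaw(io.RawIOBase):
    """Raw sink that counts how many write() calls (~ syscalls) it receives."""
    def __init__(self):
        self.calls = 0
    def writable(self):
        return True
    def write(self, b):
        self.calls += 1
        return len(b)

# 4096 one-byte writes straight to the raw object: 4096 calls.
raw = CountingRaw()
for _ in range(4096):
    raw.write(b"x")
unbuffered_calls = raw.calls

# The same writes through a BufferedWriter collapse into a single flush.
raw = CountingRaw()
buf = io.BufferedWriter(raw, buffer_size=8192)
for _ in range(4096):
    buf.write(b"x")
buf.flush()
buffered_calls = raw.calls

print(unbuffered_calls, buffered_calls)  # e.g. 4096 vs 1
```

The same applies to `open(path, "wb", buffering=0)` versus the default buffered mode.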
I can’t say that I’m a fan of the name “Oils for Unix” for the project as a whole. If you’re gonna rename “Oil shell” the language to YSH, I think it’s still a good idea to keep “Oil shell” as the name for the shell implementation that interprets both the OSH & YSH languages.
In terms of accuracy, there are more parts of the project than OSH and YSH.
When writing YSH / Oil, it becomes pretty apparent that languages for DATA are just as important, if not more important, than the shell language (the language for code).
Summary: QSN is moving toward JSON strings, and we’ll also have formats for tables and records.
So those data languages are part of “Oils for Unix”.
There could be other things too. I had to solve the “dev env” problem for our repo (related: Nix, gitpod), and the solution ended up being a mini-distro :-/ ! Or really a bunch of tools to compile from source in containers. I’m not sure if that will be exposed to users, but it’s possible.
mycpp and ASDL (translators to C++) are also things that could be exposed to users in some form.
The shell ends up “leaking” into a distributed operating system project pretty easily :) It’s a language of processes and files, and related tools for working with them.
So I wanted to leave room for other things under the “Oils for Unix” name, not just “Oil Shell”.
In terms of the connotation:
“Oil” reminds people of the energy commodity with the big bad industry behind it. “Oil Shell” further seems to remind people of the company “Shell Oil”. My brain doesn’t work that way, but it’s come up a surprising number of times, over a long period.
Many lobste.rs users are probably past the name by now, and think of it as a shell, but new people are encountering the project every day! There are maybe 10K lobste.rs readers; there are probably at least 10M shell users. Shell was the 8th most used language on GitHub last year, and the 6th fastest growing.
If you Google “Oils for Unix”, it ALREADY turns up our site as the first result. In contrast, “Oil shell” is still polluted by “Shell Oil” after so many years.
YSH is similarly googleable (surprisingly!)
There is a single oils-for-unix binary, which you do NOT type, hence the long name. And there are 2 symlinks, like busybox:
The “Oil Shell” / OSH / Oil scheme has the problem that OSH would naturally stand for “Oil Shell”. So it’s emphasizing the old part over the new part.
In “Oils for Unix”, OSH is just opaque like YSH. (A user suggested “old shell” and “young shell” :) )
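The busybox-style dispatch the symlinks rely on can be sketched like this (a hypothetical illustration of the pattern, not the actual Oils source, which is translated to C++):

```python
import os

# One binary; the name it was invoked under (argv[0], via a symlink
# like `osh` or `ysh`) selects the personality.
def main(argv):
    name = os.path.basename(argv[0])
    if name == "osh":
        return run_shell(argv[1:], ysh_mode=False)
    elif name == "ysh":
        # Equivalent to bin/osh with `shopt --set ysh:all`.
        return run_shell(argv[1:], ysh_mode=True)
    else:
        # Invoked as `oils-for-unix` directly, which users don't type.
        return "usage: invoke via the osh or ysh symlinks"

def run_shell(args, ysh_mode):
    # Stub standing in for the real interpreter loop.
    return f"{'ysh' if ysh_mode else 'osh'} {args}"

print(main(["/usr/local/bin/ysh", "-c", "echo hi"]))
```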
I’m actually open to any new suggestions, but I think it will be very difficult to find a name that is different, not taken, accurate, but people already “accept” as the same as “Oil Shell”. (I changed my twitter and Mastodon handle to oilsforunix and nobody noticed.) I don’t want a completely new name like “Zoo” or something.
(edited for clarity, I can see why the original was confusing)
I want to express that I mean this in the nicest way possible. I have tried for years to understand Oil, and have read (possibly?) a million words on the subject, and I find that most of the time reading your posts leaves me with less understanding than I started with.
The above post also fits this pattern.
I am trying, and completely respect your transparency and willingness to write 1,000 word responses to things, but something just doesn’t click about it all. I wish I had more actionable feedback other than: have you thought about asking someone else to write a succinct overview/survey of where the project sits towards its goals?
To answer this last part succinctly, I’m definitely open to another voice writing about it
There is a lot of dense material on Zulip to summarize, for better or worse, but it’s all there!
Also, you’re probably right that I’ve been “talking past” a lot of people with too many words, which does not lead to a good experience. That’s probably because I’m answering many messages at once, so I’ll take that as good feedback
Hm, some confusion is understandable, because certain things have changed over the years
To be short: we’re changing the name to make the two parts to the project clear:
OSH (compatible shell)
YSH (new shell)
There are also other tools that may fall under the “Oils for Unix” project
The old naming was confusing because people thought: “OSH”? Isn’t that “Oil Shell”? No, there’s also another part called “Oil”
It also had a bad connotation for some people.
If it doesn’t click, that’s OK for now … Right now I’m looking for people to test OSH, and to contribute ideas to YSH, which may or may not work out.
A common disconnect is that probably 90% of shell users don’t use the shell as a programming language. I didn’t for the first 10 years I used shell. So a lot of the stuff I write isn’t relevant from that viewpoint.
Some people might also not see the relevance of the writing about grammars and regular languages and so forth. That’s OK, but my response is that the person who developed the first shell is the same person who brought regular languages to computing (Ken Thompson). So part of the project’s philosophy is to really go back to first principles. I think it will show up from the user’s POV, but not everyone will agree.
Hm, some confusion is understandable, because certain things have changed over the years
Of course! That’s why I’ve read millions (potentially?) of words!
Even this post is confusing.
OSH (compatible shell)
YSH (new shell)
Based on previous understanding, I assume “compatible” means POSIX shell compatible. Cool. But, I thought the whole idea of Oil was that it was a new Shell language that would always be lowerable to POSIX shell? So, now there’s a second shell, and that raises questions. Why are there two shells?
The old naming was confusing because people thought: “OSH”? Isn’t that “Oil Shell”? No, there’s also another part called Oil
Yes, it’s incredibly confusing. What is Oil? I thought it was a shell. It’s not a shell. It’s two shells, and then a bunch of other things.
The homepage now has a more succinct definition (good!):
Oil is a new Unix shell. It runs your existing shell scripts, and it’s a new language for Python and JavaScript users who avoid shell!
It’s our upgrade path from bash to a better language and runtime.
This is literally soo much more valuable than every other blog post you’ve written about Oil, in my opinion. And, I am still confused by it!
What am I supposed to do with this? I know from it that you’re writing a new shell that is backwards compatible. And I’m enticed to believe that as a new language it’s POSIX shell compatible AND more expressive like Python or JavaScript. But, actually, it’s two different shells, and I have no idea how they work together to leave me in a better position than if I were to have just used POSIX shell, or suffered through writing “shell” code in Python / JavaScript. And, that’s understandable as it’s literally 2 sentences! It’s not enough text to describe everything. But if 2 sentences are already confusing to me, imagine the confusion that might ensue from 100 sentences!
Sorry. I’m trying here. I really am. I’m just consistently failing to understand this project, and I truly do not have this problem with any other project. Even abstract ones. I have imagination, and a whole mess of off the wall ideas of my own. I’m good at understanding and reasoning about abstraction.
Specifically, a bunch of shopt shell options is technically the only thing that differ in OSH and YSH. (You can think of this like from __future__ import in Python – it’s a gradual upgrade when there are minor breaking changes.)
which is equivalent to bin/osh with shopt --set ysh:all – it’s the same binary, with different symlinks!
However, the details may change, so it’s not necessarily stable. YSH isn’t stable either. So I haven’t overly emphasized this point, because some of it’s still aspirational.
However I think it is very interesting, and worth writing about, because 5 years ago most people basically thought it was impossible. It’s definitely not impossible now, because it runs and works.
So I talk about it as two shells, but really there’s a blurry line between them that has to do with all the shell options. That’s the “upgrade path”.
I’ll take that as feedback on how I present it
Additionally I’d say it could be confusing because
It’s changed. The upgrade path is no longer based on automatic translation. I wrote some stuff many years ago that is now obsolete. That approach didn’t work technically.
The idea of having 2 shells in one could be inherently confusing. OSH and YSH/Oil are really the same interpreter. BUT if you change a bunch of options, I claim it’s a new language, and that’s surprising.
This whole page is written from the “new” perspective, AND the code parses and runs:
Really there is a hidden “old crappy language” in the background, but I claim it’s all hidden. Some people REALLY want this clean slate perspective, stripped of all legacy.
The project is weird because there are so many perspectives – some people don’t know shell at all, but want a better one. They want something like Python or JavaScript
Other people want “POSIX forever”. We just saw that on lobste.rs the other day
“Oils for Unix” is for both groups, and everyone in between. Just like people who write stuff like Cosmopolitan libc, BSD kernels, Google C++, or Boost C++ – wildly different projects with different dialects – all use GCC and Clang.
BTW Clang handles C, C++, and Objective C all with the same parser. So in that respect it’s very similar. There are C users using Clang that do not know Objective C – I’d say most of them. And there are Objective C users who don’t know C++, etc. Not to mention all the C users who don’t know C++.
So it’s not necessary to understand all of the project in order for it to be useful in some practical way. That is why I talk about “OSH” and “YSH” separately – I’m really talking to two different audiences with different expectations, desires, backgrounds, etc.
I wrote about the change in automatic translation on the blog at some point, and I linked the “OSH versus Oil” page, but I’m not at all surprised that people missed it and are confused :) Ironically, the early readers may be more confused than the people who just started reading, because things have changed.
So yeah the OSH vs. Oil naming is confusing, and the renaming will probably cause a bit more confusion. But I think it will be better in the long run.
I’m going to update the home page sentence as well :) I’m not looking forward to all the churn and renaming, but I think the end state will be much more clear. Again I wouldn’t have renamed it if I didn’t think it was confusing, and if I didn’t NOTICE that people were confused online!
I tried to put all the important info up front in this blog post: OSH is getting done, but it will take a lot of effort to polish it.
YSH is promising, and people have tried it and written code in it, but we still need more help.
There is an upgrade path, but people need to try it and provide feedback on it! It’s possible that nobody really wants to upgrade their shell scripts. Maybe they’re going to rewrite them in Python :)
I know I do though! We have lots of shell scripts that are begging to be upgraded.
I will take this all into account when writing the next blog post :) If you’re still confused let me know
Feedback: I mentioned in my previous response that the succinct 2 sentence thing on the homepage was good! Your reply here is literally 2 pages long.
Thank you for writing it. It did clear up some things! I found most of the other parts to be unnecessary, but did appreciate some of the extra things to think about.
–
Responding more directly to an earlier point you made:
If it doesn’t click, that’s OK for now … Right now I’m looking for people to test OSH, and to contribute ideas to YSH, which may or may not work out.
I think your hypothesis that 90% of people who interact with shell don’t actually program it seems accurate, but I don’t believe that’s the reason you haven’t found a wider variety of users to test it out and contribute. I think (and btw, I’ve gotten at least 1 private message on Lobsters thanking me for engaging with you like this, and others in another private chat) the problem is that your communication style makes it inaccessible to a large chunk of potential users! Some of those users are probably actually stuck in shell-hell, too!
Shell is a wild place, and so often the right tool for the job. I truly believe a better language for it would be amazing. This is exactly why I’ve read (potentially?) millions of words trying to get to an understanding of Oil.
Yes, thank you apg for bringing this up. I too think Oil is a neat project, but I am utterly lost by what it is trying to be.
the problem is that your communication style makes it inaccessible to a large chunk of potential users
I really wonder if @andyc is too close to the project now. Another engineer, or even better a product manager, explaining things might help improve the signal to noise ratio.
I could have just dropped the link, but I thought I would provide some additional context and analogies
It’s OK if it didn’t help. I do believe there are very concrete benefits now, which I explained in the post, and that there is a path to do something that people thought was impossible 5 years ago
But I fully get that right now it’s not useful yet to many people, and may not be interesting if they’re not using shell in a certain way.
But I fully get that right now it’s not useful yet to many people,
To be perfectly clear. I am not saying that osh or ysh, is not useful. My comment was directed solely on the extra text in your response that went on tangents and clouded the helpful first part of the response.
I know oil isn’t “ready” yet.
Also, thank you for being so open to feedback here. I know it’s sometimes hard to read. It probably would have been better privately, rather than open in the comments as it happened; I acknowledge that and apologize for not seeing it sooner.
i didn’t think of the “shell oil” problem, so good call on moving away from “oil shell”. but “oils for unix” is a seriously clunky name. (that said, i’ll admit i can’t think of too many oil-based names that would not have google result pollution. “snake oil” is tempting due to the python heritage, but no way you’ll get a good search experience with that.)
It’s meant to be a bit long, so it can be available in many global namespaces, like oils-for-unix.org (which I bought),
twitter/oilsforunix etc. Surprisingly “oilshell” has been taken on Twitter for years, for some junk account
It’s also important that you don’t type it! You type the symlinks osh or ysh.
I suggested “oils” as the name for the binary, but then people wanted to type “oils”. So I think the long name is also good in that respect
I definitely agree avoiding the collision with Shell Oil is worth a rename, if for no other reason than that you’ll never out-SEO a major multinational. Is the Unix trademark a potential risk here?
Why have a third oils-for-unix binary? Couldn’t ysh symlink osh or vice-versa? If nobody is supposed to type it why is it possible for typing it to do something?
I don’t think that the name of the tarball is of great consequence. The “Oils for POSIX” project releasing oils-1.2.3.tar.gz makes perfect sense to me.
Where my brain goes after hearing “Oils for Unix/POSIX” is “this is a collection of Oils”, so does this mean OSH, YSH, and QSN are each an Oil? This just seems clunky.
The “Oils Project” or “Oils Collection” doesn’t lead my brain down that path at least.
Also, I want to add that existing languages/projects already have this problem, and you could look to how they name for reference. Rust is a programming language maintained by the Rust Team with a reference compiler rustc (for “Rust compiler”). Nix is a programming language maintained by the NixOS contributors that supports the Nixpkgs package distribution and the NixOS operating system distribution. (I should note that the naming of Nix projects is somewhat notorious; see https://www.haskellforall.com/2022/08/stop-calling-everything-nix.html. Also Nix has a (tiny) standard library, Nixpkgs is/has a standard library, NixOS has a little internal library also called lib, the same as Nixpkgs’s, it’s bad.) Bash is a programming language (descendant of POSIX shell) and a reference interpreter Bash. Same with most shells: Zsh, Dash, mksh, Fish.
Here’s an idea for Oil naming: The project distributes the “Oil Collection” (oil-collection for the multi-call executable), a runtime/interpreter/tool collection for multiple languages/things: the language/shell OSH, the language/shell YSH, and the (data) language QSN. This is no longer just a Unix thing, which is good if someone decides to port the Oil Collection to e.g. Redox. OSH is POSIX shell & Bash compatible, YSH isn’t, and they’re related in the Oil Collection implementation but that’s not important right now. QSN is a handy, small data language that’s easy to use from/with OSH and YSH. I don’t have to wonder what “an Oil” is.
I always liked the Oil Shell name & Shell Oil pun. :) Also, I’m not really attached to “Oil Collection”, it’s just an initial idea for a canonical multi-call executable name.
Around that time I looked into forking html5lib due to the lack of maintenance (they aren’t great about merging PRs) and slow performance. My thought was to type annotate it enough to run mypyc on it. However, after triaging all the open issues and digging into the implementation I don’t think it’s really worth salvaging, for a number of reasons:
It’s surprisingly incomplete, lacking support for not-exactly-new stuff like <wbr> and <ol reversed> (and there are many more omissions in the sanitizer)
The parsing is done character-by-character and with a substantial amount of indirection — there’s no clear way to improve its performance without a radical re-architecture
There is a ton of internal layering to support generating different tree representations (ElementTree, DOM) that adds a ton of complexity and weird asymmetry between what the parser produces and what the serializer consumes (you can’t stream the parser’s tokens into the serializer directly; you must go through a tree builder even if you know the input is well-formed because the tokens are different)
The serializer tokens are dictly-typed, so pointlessly slow on modern Pythons
It doesn’t pass its own test suite
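On the dict-typed-tokens point, here’s a hedged micro-sketch of the difference (a hypothetical token shape, not html5lib’s actual format): the same start-tag token as a plain dict versus a class with `__slots__`.

```python
import sys

# A start-tag token as a plain dict (the html5lib-style shape).
dict_tok = {"type": "StartTag", "name": "a", "data": {"href": "/"}}

# The same information as a slotted class: no per-token dict allocation,
# and field access is a fixed offset instead of a hash lookup.
class StartTag:
    __slots__ = ("name", "data")
    def __init__(self, name, data):
        self.name = name
        self.data = data

slot_tok = StartTag("a", {"href": "/"})

assert dict_tok["name"] == slot_tok.name
print(sys.getsizeof(dict_tok), sys.getsizeof(slot_tok))
```

On CPython the slotted instance is considerably smaller per token, which adds up when a parser emits one token per tag.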
It feels like the project’s real victory was html5lib-tests, which were used to build html5ever, rather than the actual software product.
I’ll probably end up porting my HTML processing code to Rust so I can use html5ever directly.
It’s kinda fun to see the inverse of the “mysterious network delay” genre of evergreen network debugging post (usual solution: set TCP_NODELAY!).
However, it would be nice if the author had done some more investigation before concluding that TCP_NODELAY is at fault. After all, setting TCP_NODELAY is pretty common for HTTP clients — e.g., curl, Python’s http.client and urllib3.
It seems more likely that git-lfs doesn’t buffer properly. After some code inspection I noticed:
Introducing the magic OS-level buffering of Nagle’s algorithm won’t fix tiny filesystem reads, nor undersized buffers. The argument against TCP_NODELAY by default seems specious when making bulk transfers fast always requires looking through the full stack.
Given the leap to conclusion and inflammatory title I think this is more a rant about Golang than technical networking content.
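The two claims above — that mainstream clients routinely set TCP_NODELAY, and that the real fix for a chatty sender is application-level buffering — can be sketched together in a few lines:

```python
import socket

# How HTTP clients like Python's http.client enable TCP_NODELAY,
# disabling Nagle's algorithm on the connection:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
assert s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

# With Nagle off, every send() can become its own packet on the wire,
# so a sender doing tiny writes should coalesce them itself:
chunks = [b"tiny", b" writes", b" get", b" batched"]
payload = b"".join(chunks)  # one send() instead of four
s.close()
print(len(payload))
```

In other words, TCP_NODELAY only removes the OS-level coalescing; whether that hurts depends entirely on whether the application batches its own writes — which is exactly the git-lfs question.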
This is silly. ‘A message queue’ doesn’t equate to kafka. That’s a gigantic leap of an assumption.
More importantly: Does postgres offer any queue functionality? Are they talking about just inserting and querying a large table? That cannot possibly be a better message queue than any system that properly implements a queue with O(1) pushes and pops.
That cannot possibly be a better message queue than any system that properly implements a queue with O(1) pushes and pops.
They say that given their requirements (SLOs), it costs less to use their existing postgres database than to add new infrastructure (e.g. kafka) for this.
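The usual “queue in your existing database” idiom in Postgres is `SELECT ... FOR UPDATE SKIP LOCKED` on a jobs table, so concurrent workers don’t grab the same row. Here’s a hedged sketch of the claim-and-delete shape using stdlib `sqlite3` as a stand-in (table and function names are illustrative, and sqlite has no SKIP LOCKED):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, body TEXT)")

def push(body):
    db.execute("INSERT INTO jobs (body) VALUES (?)", (body,))

def pop():
    # Claim the oldest job and remove it in one transaction.
    # In Postgres this SELECT would add FOR UPDATE SKIP LOCKED.
    with db:
        row = db.execute(
            "SELECT id, body FROM jobs ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute("DELETE FROM jobs WHERE id = ?", (row[0],))
        return row[1]

push("send-email")
push("resize-image")
print(pop(), pop(), pop())  # FIFO order, then None when empty
```

It’s not O(1) in the strict sense, but with an index on the claim order it’s close enough that the tradeoff against running a whole separate queue system can be reasonable.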
On my first reading of this headline I was wondering which cookies Avast had acquired, and how that is even technically possible to do.
It’s the “I Don’t Care About Cookies” extension that has been acquired by Avast. Though it sounds like mostly an acquihire, as he talks about working on other products for them.
FWIW, I’ve never used this extension because I feel like it is dangerous to go randomly accepting T&Cs (which is what the cookies popups actually are) in a browser window where that might get linked to one of my accounts. It’s not uncommon for the wording to say not only do you accept cookies, but you fully agree with the privacy policy.
If it’s a site I’m going to come back to, I am definitely making sure I click on the correct button for “no I don’t agree to your random and otherwise not-enforceable nonsense that I don’t have time to read”.
I now tend to open most untrusted websites (such as links from orange or blue websites) in incognito mode, click on the most obvious “go away” link and close the window later, safe in the knowledge that I didn’t really agree to anything binding. I’d be reasonably happy with an extension set to incognito mode only, to save that click, but I’m pretty sure only the other way around is currently possible.
I’ve never used this extension because I feel like it is dangerous to go randomly accepting T&Cs (which is what the cookies popups actually are) in a browser window where that might get linked to one of my accounts.
Plus like … why accept the cookies if you can decline them? The whole premise is nonsensical.
Yes, that seems much better designed, since it lets you set your preferences in simple categories and then applies those choices everywhere - which is how compliance with the law should have been implemented in the first place.
Thanks for the suggestion, I’ve installed it to try it out. The only thing I can’t see is a way to override these preferences for a specific site if needed.
I block the consent banners & popups where I can with uBlock origin. Get out of my way, I don’t want you to solicit me.
I then use cookie autodelete to delete cookies for a site after I close its tabs. This is a bit like telling the browser to block cookies and localstorage completely, but websites that (pretend to) break still keep working.
This isn’t perfect. Youtube still has ways of tracking and remembering me (at least according to the suggested videos) but of course deleting cookies does make it forget that I turn off autoplay. Quite an interesting perspective on their priorities and methods.
The point of things like consent-o-magic isn’t to prevent tracking, it’s to prevent dark UI patterns from getting user consent before tracking. The goal is to ensure that companies like Google and Facebook are definitely in violation of the letter of the law, not just doing things that their users don’t understand and would hate if they did, so that information commissioners can collect the evidence that they need to impose the 5% of annual turnover fines that the GDPR permits.
Yeah, this is the way to go — uBlock Origin + EasyList Cookie blocks the obnoxious dialogs, and Cookie Auto-Delete cleans up the mess. Sadly the latter isn’t available on Android, though.
When using uBlock Origin + EasyList Cookie, I am often left with a website with a backdrop and not allowing scrolling. This can be fixed with the inspector tool, but I am wondering if I am missing something.
I had that issue only once, on a site that was completely broken with any ad blocker. I expect the answer is yes, but do you have uBlock’s cosmetic filters enabled?
TLDR: Akka is now source available, Alex is mad. Insofar as I can find a moral argument it’s that this represents a bait and switch. Alex also goes over the practical reasons devs like open source, which most significantly is avoiding bureaucratic processes.
I think this is right - any sustainable alternative to open source has to preserve that property. (My hunch is that a collecting society type model based on turnover is the way to do this). Alex mentions commoditization of ones complements. This points to open source being supported by businesses that specifically create it as a complement to their business; I think that’s fine but can only support a certain slice of software.
Agree. Data storage has had a reasonable path for monetizing open source by making it REALLY EASY to pay money, e.g., a cloud service that maintains my MongoDB cluster for me. That cuts a lot of corners for enterprises. If you want your open source project to churn out cash, make it easy to spend money on. There’s lots of ways to do that. Forceful license changes ignore what the customer wants, which is a recipe for a failing business.
I think what they’re aiming towards is something like what people like to do with physical things:
Buy them once
change, sell, share or modify as you want
distribute blueprints of their reverse engineering to maybe replace parts of it, or build something compatible
Problem is that this is software, which has no physical, single-quantity form and changes a lot, so you would have to re-buy it (or pay monthly fees) to always have an “up to date purchase”. And the physical-object industry also tries to restrict such use cases (personal use only, no modifications, subscription-based add-ons..)
Ultimately this is the GraalVM debate all over again: Do I want to settle my product runtime and core on something that may just go away at any point? Everyone that had to interact with the company-wide legacy ERP system, which they never managed to replace, knows the fear.
What I mean is that multiple open source projects would band together to jointly issue commercial licenses, and if a company buys a licence the revenue gets allocated between projects. The more projects get under a given umbrella, the easier it is for companies to adopt them.
One project could be with multiple societies.
In terms of pricing I was thinking, at the most basic level, something linked to turnover, and then something less convenient for companies that want to try to save money.
No, it’s very different. For a start tidelift doesn’t provide a better license. Tidelift is more like a consulting and support shop that kicks some money back to the free software projects.
Thanks for the summary! It’s exciting that Akka is doing this. With luck it’ll kill the whole misbegotten mess. The actor model on the JVM never made any sense.
Strongly agree with this — the snap, which I first encountered in 22.04 LTS — is in no way LTS quality.
The issue I encountered immediately on upgrade is that activating the save dialog by pressing Ctrl-S has completely botched focus management. The focus moves to the dialog so quickly that the key up event isn’t received on the original window, which pops a fresh dialog as soon as the first is closed. The only way to recover is to kill the whole app.
If packaging Firefox is so difficult Canonical should make a .deb that dumps it in /opt/firefox and be done with it.
If the goal is sandboxing, you don’t have to introduce a bunch of layered filesystem stuff to do it. Didn’t we already have AppArmor for this? Add portals to that or whatever. Why did we need a new artifact format and build toolchain?
Both Chromium and Firefox do their own sandboxing internally — why are they then the primary targets for snapification? Their security posture is waaay better than most apps that ship in the Ubuntu desktop. Sandbox Eye of Gnome like the thumbnailers!
Like, it’s great that I can install Discord as a Snap — I like having that sandboxed. But aggressively applying this tech to the most complicated and rapidly changing apps shipped on an Ubuntu system (web browsers) seems frightfully optimistic. You’re inevitably going to hit tons of long tail bugs and niche features like we’re seeing.
I have used the c920 on a mac for years, and it has always been overexposed. I’m not sure whether it’s Logitech or Apple or both to blame here. The solution for me is to install the app “Webcam Settings” from the Apple store (yeah it’s a generic name), which lets you tweak many settings on webcams and save profiles for them. It’s not perfect, but I already have the camera and it’s significantly easier to work with than hooking my DSLR up.
The equivalent to “Webcam Settings” on Linux is guvcview. I have a Microsoft LifeCam Studio and have to use this tool to adjust the exposure when I plug it into a new machine. Thereafter it persists… somehow.
Or qv4l2, depending on your taste — but one advantage of qv4l2 is that it lets you set the controls even while another app has the camera open, whereas guvcview wants the camera for its own preview window, and will decline to work at all if it can’t get the video stream.
update: someone anon-emailed me out of the blue to mention that guvcview has a -z or --control-panel option that will open the control panel without the preview window, letting you do the same thing as qv4l2. So use the one that makes you happy.
Congrats, you are working around a hardware problem with a software patch.
Me, I don’t care enough to spend the effort to get the software working. My audio input is an analog mixer, my audio output the same, and eventually my camera will be a DSLR because that way I don’t twiddle with software for something that really should just work on all my machines without me caring.
It’s a driver settings tool, not a patch. It doesn’t do post-processing. Every OS just fails to provide this tool, not sure why; possibly because webcam support is spotty and they don’t want to deal with user complaints. Some software (like Teams) includes an interface for the settings. Changing it in Teams will make system-wide changes. Others (like Zoom) only have post-processing effects, and these are applied on top of the changes you made in Teams.
I can confirm this tool definitely affects the camera hardware’s exposure setting. I’ve used it for adjusting a camera that was pointed at a screen on a remote system I needed to debug. The surrounding room was dark (yay timezones!) so with automatic exposure settings it was just an overexposed white blur on a dark background. This tool fixed it. There’s no way this would have been possible with just post-processing.
(No, VNC or similar would not have helped, as it was an incompatibility specific to the connected display, so I needed to see the physical output. And by “remote” I mean about 9000km away.)
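For the terminal-inclined, the same driver-level controls that guvcview and qv4l2 expose can be set directly with v4l2-ctl (from v4l-utils). A rough sketch; control names vary by driver and kernel version, so list them first:

```shell
# Illustrative only: see what controls your camera's driver actually exposes.
v4l2-ctl -d /dev/video0 --list-ctrls

# On many UVC webcams, switch to manual exposure and set a value.
# (Older kernels name these exposure_auto / exposure_absolute.)
v4l2-ctl -d /dev/video0 --set-ctrl=auto_exposure=1
v4l2-ctl -d /dev/video0 --set-ctrl=exposure_time_absolute=250
```

Like qv4l2, this works while another app holds the video stream, since it only talks to the driver.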
The DSLR/mirrorless ILC (interchangeable lens camera) route is great for quality but it has its risks. I started off with a $200 entry level kit and now I’ve got two bodies, a dozen lenses, 40,000 pictures, and a creatively fulfilling hobby.
I fail to see how you’re going to use a DSLR as a webcam without “twiddling with software”. Sure, you’ll have a much better sensor, lens and resulting image quality. But I’ve yet to see a setup (at least with my Canon) that doesn’t require multiple pieces of software to make even work as a webcam. Perhaps other brands have a smoother experience. I still question how this won’t require at least as much software as my route.
There’s also the physical footprint that matters to me. A webcam sits out of the way on top of my monitor with a single cable that plugs into the USB on the monitor. A DSLR is obviously nowhere near this simple in wiring or physical space. It also has a pretty decent pair of microphones that work perfectly for my quiet home office.
Are either the audio or video studio quality? Nope, but that’s completely fine for my use case interacting with some coworkers on video calls.
Oil has all of these options under one group, oil:basic, so you don’t have to remember all of them. They’re on by default in bin/oil, or you can opt in with shopt --set oil:basic in bin/osh.
Also, I think this title is a bit of a troll: it should be more like “Pitfalls of Shell” or something like that, and those are pretty well known by now.
The answer is to fix shell, not to write articles on the Internet telling people not to use it. People use it because it solves certain problems more effectively than other tools do.
Yeah this post omits the only piece of advice that would make it practical, which is pointing to another programming language that they consider better suited for the job. I’ve written code to launch and tend to processes in a lot of languages and they have all been as error prone as the shell. I don’t think people who bash on shells understand just how complex correct process handling is.
If you wouldn’t mind taking the opportunity to shill, how would you go about convincing somebody to switch to oil shell from bash, assuming they’re willing to ignore the lack of wide-spread deployment of oil? What’s your sales pitch?
That is, you have 3K lines of bash code, AND you want to switch to something else.
Well Oil is basically your only option! It’s the most bash compatible shell by a mile.
There are downsides, like Oil needing to be faster, but if you actually have that much shell, it’s worth it to start running your scripts under Oil right now.
Most of the benefits amount to better tools and error messages. Oil is like ShellCheck, but at runtime. ShellCheck can’t catch certain things, like bad set -e usage patterns, because some of them can only be detected at runtime; otherwise you’d get a false positive on every line. (I should do a blog post about this.)
I also put some notes about “the killer use case” here:
i.e. I started running my CI in containers, and I think many people do. Oil not being installed isn’t a big issue there because you have to install everything into a container :)
Although we probably need a bootstrap script, e.g. like rustup, if your distro doesn’t have it. (Many do, but not all.)
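As a concrete illustration of the kind of set -e pitfall that only shows up at runtime (my own minimal sketch, not taken from Oil’s docs): a failure inside a function is silently ignored when the function is called as an if condition, because POSIX shells disable errexit in that context.

```shell
#!/usr/bin/env bash
set -e

# A failure inside a function is ignored when the function is called
# as an `if` condition: the shell disables errexit in that context.
check() {
  false                 # fails here...
  echo "still running"  # ...but execution continues anyway
}

if check; then
  status=passed         # check's exit status is 0 (the last echo's)
else
  status=failed
fi
echo "check $status"    # prints "check passed"
```

A linter can’t flag every command that might be in a condition context without drowning you in false positives; a runtime can notice the actual call site.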
I’d guess itamarst really meant the title (subject to the caveats in the article), but also that he wasn’t talking about alternate shells like Oil, as they are really a different matter. Nobody writes “don’t use fish” articles, and Oil is in the same boat — it isn’t available by default, waiting to blow your hand off, so there’s no need to warn folks away from it.
Any language that has been designed rather than duct-taped together over decades is going to avoid shell’s (bash/dash/ash/POSIX sh’s) faults. Please continue doing this! When /bin/oil is part of a stock Debian install we can start telling people to put #!/usr/bin/env oil at the top instead, but until then I think it’s sensible to post these warnings periodically, since OSHA is unlikely to step in.
Strawberry on the Python backend. It generates GraphQL schema from type annotations.
Apollo Client on the Typescript side. It can generate Typescript stubs for GraphQL queries based on the schema.
I work on a project that gets type safety through the front-end this way. The downside is that GraphQL responses are JSON, so it’s less efficient on the wire than something like Protobuf, Thrift, etc.
Thanks, I’d wondered about GraphQL. I’d been leaning toward tRPC with Prisma on the backend, so I wasn’t really considering GraphQL. But I’ll give it another look.
Thanks for the tip about Strawberry, too. TBH I’ve been pretty surprised that I haven’t found any active Python libraries that generate JSON Schema or Protobuf schemas from Python type annotations. Maybe I’ve just missed it, but I’ve done a fair amount of searching at this point.
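For what it’s worth, the core of the idea fits in a stdlib-only sketch. The `json_schema` helper and `_JSON_TYPES` table are my own hypothetical names; a real library (e.g. pydantic, whose models can emit JSON Schema) handles optionals, nesting, containers, and much more:

```python
import dataclasses
from typing import get_type_hints

# Hypothetical helper: map a few scalar Python annotations to JSON Schema
# type names. Real implementations cover far more of the type system.
_JSON_TYPES = {int: "integer", str: "string", float: "number", bool: "boolean"}

def json_schema(cls) -> dict:
    """Derive a flat JSON Schema object from a class's type annotations."""
    hints = get_type_hints(cls)
    return {
        "type": "object",
        "properties": {name: {"type": _JSON_TYPES[t]} for name, t in hints.items()},
        "required": list(hints),
    }

@dataclasses.dataclass
class User:
    id: int
    name: str

schema = json_schema(User)
# schema["properties"] == {"id": {"type": "integer"}, "name": {"type": "string"}}
```

The same `get_type_hints` introspection is what schema-generating libraries build on.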
The real scary thing is that it took users weeks to notice that it shipped, despite that it wasn’t obfuscated in any way. This shows how risky the ecosystem is, without enough eyes reviewing published crates. If any high profile crate author gets infected with malware that injects itself into crates, it’s going to be an apocalypse for Rust.
Maybe this is also a sign that the complaints themselves were incoherent?
I think it’s only a sign that we’re unaware until this hits a sandboxed / reproducible build system. I guess that’s currently distribution packaging, or projects that otherwise use Nix or Bazel to build.
But it highlights how little else is sandboxed.
Exactly, these complaints are incoherent unless you were already doing the things that would cause you to notice the change!
I’m not sure that “No one’s looking anyway, so it’s totally fine” is the right takeaway from this.
If the complaint is that binaries are more difficult to audit than source, and no one is auditing, then it should make no difference either way from a security perspective.
It is perfectly coherent to advocate on behalf of other people.
I think “weeks” is a bit of an exaggeration. People were openly discussing it at least a week after release. It’s true though that it didn’t blow up on social media until weeks later and many people didn’t realise until then.
If it had been a security issue or it was done by someone much less reputable than the author of serde or if the author did not respond then I suspect rustsec may have been more motivated to post an advisory.
Something that I might have expected to see included in this comment, and that I instead will provide myself, is a plug for bothering to review the code in one’s (prospective) dependencies, or to import reviews from trusted other people (or, put differently, to limit oneself to dependencies that one is able and willing to review or that someone one trusts has reviewed).
I recall that kornel at least used to encourage the use of cargo-crev, and their Lib.rs now also shows reviews from the newer and more streamlined cargo-vet.
I note that the change adding the blob to Serde was reviewed and approved through cargo-vet by someone at Mozilla. I don’t think that necessarily means these reviewing measures would not be useful in a situation that isn’t as much a drill (i.e., with a blob more likely to be malicious).
Yeah - my recollection of crev is that libraries like serde often got reviews like “it’s serde, might as well be the stdlib, I trust this without reviewing it as the chances of it being malicious are basically zero”
What a ridiculous thing to have even happened in the first place, let alone refusing to acknowledge there could possibly be an issue for so long. I’m glad it’s been fixed, but it would make me think twice about using serde. I’m sure it’ll be fine; who’s ever heard of a security issue in a codec anyway?
Remember that there are real human beings maintaining serde. It is not, in fact, blindingly obvious to all developers that the pre-compiled blobs were bad; on this site there were loud voices on both sides. Can you imagine suddenly getting caught in the crosshairs of angry developers like that? When I imagine it, it feels bad, and I’m liable to get defensive about it.
It may also have been a failed attempt at fixing something you’ve heard people complain about all the time, probably even about your code that slows down people’s builds (*). So yeah, it was a bad idea in hindsight, but we don’t need more burned-out maintainers from this. And I say this as someone who is openly disappointed by this happening.
(*) I’m not going to discuss how much time it actually saved.
This overview by @cadey implied it did not save much time at all, basically only if you were running a CI setup without a cache.
https://xeiaso.net/blog/serde-precompiled-stupid
Running a CI setup without a cache is, for better or for worse, very common
Yeah, basically the biggest gains are offset by process creation being surprisingly slow. I’m working on a follow-up article where I talk about that in detail.
I posted your piece because it was the first one that explained in detail what the hell was going on, specifically how serde works. Looking forward to a followup.
My workplace is way too big. This describes our CI setup. Only the blessed JVM gets to have cached CI builds.
This is how shadow IT begins. Has anyone started running/sharing their locally setup CI for your project yet?
That’s how it started, then they centralized everything with one team that doles out the “managed CI” offering, with their own global library and controls. Any competing infra gets flagged and audited hardcore until you give up by attrition.
This seems to only be checking the performance under --release. Most compilation is done without --release, meaning that most of the proc macro will not be optimized.
As someone who packages software, I think it’s worth noting that packagers expect different things than end users, though they are compatible.
One of my wishes is to avoid blobs from a vendor, since we can’t always recompile those in the build process to work with the architectures we support.
(The other big difference is the DESTDIR env var. End users don’t generally care, but it becomes essential when preparing a package)
I therefore understand those who support their end users, before getting packaged.
The real human being maintaining serde knew about the pushback that would happen and did it on purpose to prove a point in a pre-RFC he submitted. I don’t feel particularly bad about him getting pushback for using half the Rust ecosystem as his guinea pigs. (In fact I would like to see more of it.)
What’s the reason to believe in this over any other explanation of the situation? E.g. that pushback was unexpected and that the RFC is the result of the pushback, rather than a cause?
I consider dtolnay a competent open source maintainer who understands the people who run his code well, and I would expect any competent open source maintainer to expect such pushback.
But how does that necessarily lead to “on purpose to prove a point”?
I don’t think dtolnay expected exactly zero pushback. But, given that some people in this thread argue quite a reasonable point that binaries are actually almost as fine as source, it is plausible that only bounded pushback was expected.
The excerpt from the RFC is:
I don’t see someone competent casually pushing such a controversial change, casually saying that this is now the only supported way to use serde, casually pushing a complete long pre-RFC that uses the controversial change to advance it, and then casually reverting the change in the span of a few days. That takes preparation and foresight.
I actually respect this move. It is exactly the kind of move I would do if I had goodwill to burn and was frustrated with the usual formal process, and it takes boldness and courage to pull it off the way he did it. I also think the pushback is entirely appropriate and the degree of it was quite mild.
Aha, thanks! I think that’s a coherent story to infer from this evidence (and I was wondering if there might be some missing bits I don’t know).
From where I stand, I wouldn’t say that this explanation looks completely implausible, but I do find it unlikely.
For me, the salient bits are:
I agree that there are multiple interpretations possible and that yours also follows from the evidence available. The reason I think it’s reasonable to consider something deeper to be going on is: every single Rust controversy I’ve discussed with key Rust people had a lot more going on than was there on the surface. Case in point: dtolnay was also the person (thus far unnamed by anyone speaking for the project) who was involved in ThePHD’s talk being downgraded from a keynote. If I see someone acting surreptitiously in one case I will expect that to repeat.
O_o that’s news to me, thanks. It didn’t occur to me that dtolnay might have been involved there (IIRC, they aren’t a team lead of any top-level team, so I assume they weren’t a member of the notorious leadership chat)
Maybe take hearsay from an anonymous Internet catgirl with a grain of salt.
Calling me anonymous is pretty funny, considering I’ve called myself “whitequark” for close to 15 years at this point and shipped several world-class projects under it.
whitequark would be pretty well known to an old Rust team member such as matklad, having been one themself, so no, not anonymous… but we don’t know this is the same whitequark, so yes, still anonymous.
Hm? Neither of them are Rust team members, unless they are represented under different names in the Project.
I mean, I wrote both Rust language servers/IDEs that everyone is using, and whitequark wrote the Ruby parser everyone is using (and also smoltcp). I think we know perfectly well who we are talking with. One of us might secretly be a Labrador in a trench coat, but that doesn’t have any bearing on the discussion, and speculation on that topic is hugely distasteful.
In terms of Rust team membership, I actually don’t know which team whitequark was on, but they are definitely on the alumni page right now. I was on the cargo team and TL for the IDE team.
Thank you for all the context I was missing. Is it just oversight you aren’t on the alumni for those team pages?
Turns out there were at least two bugs in the teams repo with respect to me, thanks for pointing this out!
I’m glad my bickering had at least some positive outcome :)
Probably! I think https://github.com/rust-lang/team/commit/458c784dda91392b710d36661f440de40fdac316 should have added me as one; not sure why that didn’t happen
I don’t know what you mean by “the Project”, but the source of truth for Rust team membership is https://github.com/rust-lang/team.
You were talking about “Rust teams” and the only way I’ve seen that term used is to indicate those under the “Rust Project”. Neither person is on a Rust team or an alumni.
https://www.rust-lang.org/governance
That is what I meant, yes. Those pages are generated from the Git repo I linked. Ctrl-F on https://www.rust-lang.org/governance/teams/compiler and https://www.rust-lang.org/governance/teams/alumni.
A find would tell you matklad was not on a team. He was “just” a contributor. No real data exists about whitequark.
Tbh, it more reeks of desperation to make people’s badly configured CI flows faster. I think that a conspiratorial angle hasn’t been earned yet for this and that we should go for the most likely option: it was merely a desperate attempt to make unoptimized builds faster.
I think this is hard to justify when someone comes to you with a security issue, your response is “fork it, not my problem”, and you then close the issue, completely dismissing a legitimate report. I understand humans are maintaining it; humans maintain all the software I use, in fact, and I’m not OK with deciding “Oh, a human was involved, I guess we should let security bad practices slide”. I, and I’m sure many others, am not frustrated because the maintainers didn’t understand the security implications, but because the report was summarily dismissed and rejected when it had dire implications for all their users. From my understanding, serde a) is extremely popular in the Rust world, and b) deals in one of the most notoriously difficult kinds of code to secure, so seeing the developers’ reaction to a security issue is very worrying for the community as a whole.
The thing is, it’s not unambiguous whether this is a security issue. “Shipping precompiled binaries is not significantly more insecure than shipping source code” is an absolutely reasonable stance to have. I even think it is true if we consider only first-order effects and the current state of Rust packaging & auditing.
Note also that concerns were not “completely dismissed”. Dismissal looks like “this is not a problem”. What was said was rather “fixing this problem is out of scope for the library, if you want to see it fixed, work on the underlying infrastructure”. Reflecting on my own behavior in this discussion, I might be overly sensitive here, but to me there’s a world of difference between a dismissal, and an acknowledgment with disagreement on priorities.
This is perhaps a reasonable take-away from all the internet discussions about the topic, but I don’t think this actually reflects what did happen.
The maintainer was responsive on the issue and they very clearly articulated that:
Afterwards, when it became obvious that the security concern was not niche but had big implications for the whole ecosystem, the change was reverted and a lot of follow-up work landed.
I do think it was a mistake not to predict that this change would be this controversial (or to proceed with a controversial change without preliminary checks with the wider community).
But, given that a mistake had been made, the handling of the situation was exemplary. Everything that needed fixing was fixed, promptly.
I’m still waiting to hear what “security concern” there was here. Other language-package ecosystems have been shipping precompiled binaries in packages for years now; why is it such an apocalyptically awful thing in Rust and only Rust?
The main thing is loss of auditing ability: with the opaque binaries, you can no longer just look at the package tarball from crates.io and read the source. It is debatable how important that is: in practice, as this very story demonstrates, few people look at the tarballs. OTOH, “can you look at the tarballs” is an ecosystem-wide property: if we lose it, we won’t be able to put the toothpaste back into the tube.
This is amplified by the fact that this is build-time code: people are in general happier to sandbox the final application than to sandbox the sprawling build infra.
With respect to other languages — of course! But also note how other languages are memory unsafe for decades…
It’s not that hard to verify the provenance of a binary. And it appears that for some time after serde switched to shipping the precompiled macros, exactly zero people actually were auditing it (based on how long it took for complaints to be registered about it).
The ecosystem having what boils down to a social preference for source-only does not imply that binary distributions are automatically/inherently a security issue.
My go-to example of a language that often ships precompiled binaries in packages is Python. Which is not exactly what I think of when I think “memory unsafe for decades”.
Verifying provenance and auditing source are orthogonal. If you have trusted provenance, you can skip auditing the source. If you audited the source, you don’t care about the provenance.
It’s a question which one is more practically important, but to weight this tradeoff, you need to acknowledge its existence.
This sounds like:
This doesn’t sound like:
I don’t know where your last two blockquotes came from, but they didn’t come from my comment that you were replying to, and I won’t waste my time arguing with words that have been put in my mouth by force.
That’s how I read your reply: as an absolute refusal to acknowledge that source auditing is a thing, rather than as a nuanced comparison of auditing in theory vs auditing in practice.
It might not have been your intention to communicate that, but that was my take away from what’s actually written.
Once again, I don’t intend to waste my time arguing with someone who just puts words in my mouth.
In the original github thread, someone went to great lengths to try to reproduce the shipped binary, and just couldn’t do it. So it is very reasonable to assume that either they had something in their build that differed from the environment used to build it, or that the binary was malicious, and without much deeper investigation, it’s nearly impossible to tell which is the answer. If it were trivial to reproduce the build from source code you could audit yourself, there would be far less of a problem.
Rust doesn’t really do reproducible builds, though, so I’m not sure why people expected to be able to byte-for-byte reproduce this.
Also, other language-package ecosystems really have solved this problem – in the Python world, for example, PyPI supports a verifiable chain all the way from your source repo to the uploaded artifact. You don’t need byte-for-byte reproducibility when you have that.
Ah yes, garbage collected languages are famously ‘memory unsafe for decades’
I guess I should clarify that in the GP comment the problem is the misalignment between the maintainer’s and the users’ views of the issue. This is a problem irrespective of the ground-truth value of the security claim.
Maybe other language package ecosystems are also wrong to be distributing binaries, and have security concerns that are not being addressed because people in those ecosystems are not making as much of a fuss about it.
If there were some easy way to exploit the mere use of precompiled binaries, someone would have by now. The incentives to use such an exploit are just way too high not to.
There are ways to exploit binary releases. It’s certainly not easy, but this has definitely been exploited in the wild.
You can read this page https://reproducible-builds.org/docs/buy-in/ to get a high-level history of the “reproducible build” (and bootstrapping) movement.
Anecdotally, I almost always see Python malware packaged as source code. I think that could change at any time to compiled binaries fwiw, just a note.
I don’t think attackers choosing binary payloads would mean anything for anyone really. The fundamental problem isn’t solved by reproducible builds - those only help if someone is auditing the code.
The fundamental problem is that your package manager has near-arbitrary rights on your computer, and dev laptops tend to be very privileged at companies. I can likely go from ‘malicious build script’ to ‘production access’ in a few hours (if I’m being slow and sneaky) - that’s insane. Why does a build script have access to my ssh key files? To my various tokens? To my ~/.aws/ folder? Insane. There’s zero reason for those privileges to be handed out like that.
The real solution here is to minimize impact. I’m all for reproducible builds because I think they’re neat and whatever, sure, people can pretend that auditing is practical if that’s how they want to spend their time. But really the fundamental concept of “running arbitrary code as your user” is just broken, we should fix that ASAP.
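As a sketch of what that minimization could look like today, here is a hypothetical bubblewrap invocation giving a build no network, read-only system and toolchain directories, and only the project directory writable. Paths and extra binds will vary per distro and toolchain (and assume dependencies are already fetched, since the sandbox has no network):

```shell
# Hypothetical sketch: no network, no $HOME secrets, project dir writable.
bwrap \
  --unshare-all \
  --ro-bind /usr /usr \
  --symlink usr/bin /bin \
  --symlink usr/lib /lib \
  --proc /proc \
  --dev /dev \
  --ro-bind "$HOME/.cargo" "$HOME/.cargo" \
  --ro-bind "$HOME/.rustup" "$HOME/.rustup" \
  --bind "$PWD" "$PWD" \
  --chdir "$PWD" \
  --setenv PATH /usr/bin:"$HOME/.cargo/bin" \
  cargo build
```

Under a setup like this, a malicious build script simply has nothing interesting to read: no ~/.ssh, no ~/.aws, no tokens.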
Like I’ve pointed out to a couple people, this is actually a huge advantage for Python’s “binary” (.whl) package format, because its install process consists solely of unpacking the archive and moving files to their destinations. It’s the “source” format that can ship a setup.py running arbitrary code at install time. So telling pip to exclusively install from .whl (with --only-binary :all:) is generally a big security win for Python deployments.
(and I put “binary” in scare quotes because, for people who aren’t familiar with it, a Python .whl package isn’t required to contain compiled binaries; it’s just that the .whl format is the one that allows shipping those, as well as shipping ordinary Python source code files)
Agree. But that’s a different threat; it has nothing to do with altered binaries.
Code auditing is worthless if you’re not sure the binary you’re running on your machine has been produced from the source code you’ve audited. This source <=> binary mapping is precisely where source bootstrapping + reproducible builds are helping.
This is a false dichotomy. I think we agree on the fact we want code audit + binary reproducibility + proper sandboxing.
Well, we disagree, because I think they’re identical in virtually every way.
I’m highly skeptical of the value behind code auditing to begin with, so anything that relies on auditing to have value is already something I’m side eyeing hard tbh.
I think where we disagree on the weights. I barely care about binary reproducibility, I frankly don’t think code auditing is practical, and I think sandboxing is by far the most important, cost effective measure to improve security and directly address the issues.
I am familiar with the concept of reproducible builds. Also, as far as I’m aware, Rust’s current tooling is incapable of producing reproducible binaries.
And in theory there are many attack vectors that might be present in any form of software distribution, whether source or binary.
What I’m looking for here is someone who will step up and identify a specific security vulnerability that they believe actually existed in serde when it was shipping precompiled macros, but that did not exist when it was shipping those same macros in source form. “Someone could compromise the maintainer or the project infrastructure”, for example, doesn’t qualify there, because both source and binary distributions can be affected by such a compromise.
Aren’t there links in the original github issue to exactly this being done in the NPM and some other ecosystem? Yes this is a security problem, and yes it has been exploited in the real world.
I’m going to quote my other comment:
If you have proof of an actual concrete vulnerability in serde of that nature, I invite you to show it.
The existence of an actual exploit is not necessary to be able to tell that something is a serious security concern. It’s like laying an AR-15 in the middle of the street and claiming there’s nothing wrong with it because no one has picked it up and shot someone with it. This is the opposite of a risk assessment; this is intentionally choosing to ignore clear risks.
There might even be an argument to make that someone doing this has broken the law by accessing a computer system they don’t own without permission, since no one had any idea that this was even happening. To me this is up there with Sony’s rootkit back in the day: completely unexpected, unauthorised behaviour that no reasonable person would expect, nor would they look out for it, because it is just such an unreasonable thing to do to your users.
I think the point is that if precompiled macros are an AR-15 laying in the street, then source macros are an AR-15 with a clip next to it. It doesn’t make sense to raise the alarm about one but not the other.
I think this is extreme. No additional accessing of any kind was done. Binaries don’t have additional abilities that build.rs does not have. It’s not at all comparable to installing a rootkit. The precompiled macros did the same thing that the source macros did.
Once again, other language package ecosystems routinely ship precompiled binaries. Why have those languages not suffered the extreme consequences you seem to believe inevitably follow from shipping binaries?
Even the most extreme prosecutors in the US never dreamed of taking laws like CFAA this far.
I think you should take a step back and consider what you’re actually advocating for here. For one thing, you’ve just invalidated the “without any warranty” part of every open-source software license, because you’re declaring that you expect and intend to legally enforce a rule on the author that the software will function in certain ways and not in others. And you’re also opening the door to even more, because it’s not that big a logical or legal leap from liability for a technical choice you dislike to liability for, say, an accidental bug.
The author of serde didn’t take over your computer, or try to. All that happened was serde started shipping a precompiled form of something you were going to compile anyway, much as other language package managers already do and have done for years. You seem to strongly dislike that, but dislike does not make something a security vulnerability, and certainly does not make it a literal crime.
I think what is actually happening in other language ecosystems is that while precompiled binaries are shipped for some installation methods, other installation methods build from source.
So you still have binary distribution for people who want that, and you have the source distribution for others.
I have not confirmed this but I believe that this might be the case for Python packages hosted on debian repos, for example. Packages on PyPI tend to have source distributions along with compiled ones, and the debian repos go and build packages themselves based off of their stuff rather than relying on the package developers’ compiled output.
When I release a Python library, I provide the source and a binary. A linux package repo maintainer could build the source code rather than using my built binary. If they do that, then the thing they “need to trust” is the source code, and less trust is needed on myself (on top of extra benefits like source code access allowing them to fix things for their distribution mechanisms)
I don’t know of anyone who actually wants the sdists from PyPI. Repackagers don’t go to PyPI, they go to the actual source repository. And a variety of people, including both me and a Python core developer, strongly recommend always invoking pip with the --only-binary :all: flag to force use of .whl packages, which have several benefits:
- Combined with --require-hashes and --no-deps, you get as close to perfectly byte-for-byte reproducible installs as is possible with the standard Python packaging toolchain.
- A .whl has no scripting hooks (as opposed to an sdist, which can run arbitrary code at install time via its setup.py).
).I misread that as “sadists from PyPi” and could not help but agree.
I mean there are plenty of packages with actual native dependencies who don’t ship every permutation of platform/Python version wheel needed, and there the source distribution is available. Though I think that happens less and less since the number of big packages with native dependencies is relatively limited.
But the underlying point is that with an option of compiling everything “from source” available as an official thing from the project, downstream distributors do not have to do things like, say, confirm that the project’s vendored compiled binary is in fact compiled from the source being pointed at.
Install-time scripting is less of an issue in this thought process (after all, import-time scripting is a thing that can totally happen!). It should feel a bit obvious that a bunch of source files is easier to look through to figure out issues rather than “oh this part is provided by this pre-built binary”, at least it does to me.
I’m not arguing against binary distributions, just think that if you have only the binary distribution suddenly it’s a lot harder to answer a lot of questions.
As far as I’m aware, it was possible to build serde “from source” as a repackager. It did not produce a binary byte-for-byte identical to the one being shipped first-party, but as I understand it, producing a byte-for-byte identical binary is not something Rust’s current tooling would have supported anyway. In other words, the only sense in which “binary only” was true was for installing from crates.io.
So any arguments predicated on “you have only the binary distribution” don’t hold up.
Hmm, I felt like I read repackagers specifically say that the binary was a problem (I think it was more the fact that standard tooling didn’t allow for both worlds to exist). But this is all a bit moot anyways
It's a useful fallback when there are no precompiled binaries available for your specific OS/arch/Python version combination. For example, when pip installing from an ARM Mac there are still cases where precompiled binaries are not available, and there were many more such cases closer to the M1 release.
When I say I don’t know of anyone who wants the sdist, read as “I don’t know anyone who, if a wheel were available for their target platform, would then proceed to explicitly choose an sdist over that wheel”.
Argumentum ad populum does not make the choice valid.
Also, not for nothing, most of the discussion has just been assuming that "binary blob = inherent automatic security vulnerability" without really describing just what the alleged vulnerability is. When one person asserts the existence of a thing (such as a security vulnerability) and another person doubts that existence, the burden of proof is on the person asserting existence. But it's also perfectly valid for the doubter to point to prominent examples of the use of binary blobs which have not been exploited despite widespread deployment and use, as evidence in favor of "not an inherent automatic security vulnerability".
Yeah, this dynamic has been infuriating. In what threat model is downloading source code from the internet and executing it different from downloading compiled code from the internet and executing it? The threat is the “from the internet” part, which you can address by:
Anyone with concerns about this serde change should already be doing one or both of these things, which also happen to make builds faster and more reliable (convenient!).
Yeah, hashed/pinned dependency trees have been around forever in other languages, along with tooling to automate their creation and maintenance. It doesn’t matter at that point whether the artifact is a precompiled binary, because you know it’s the artifact you expected to get (and have hopefully pre-vetted).
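As a sketch of what hash pinning buys you (the artifact bytes here are hypothetical; in practice tools like pip's `--require-hashes` do this for you): record the artifact's digest once, then verify every later download against it before use.

```python
import hashlib

# Digest recorded at pin time, when the artifact was (hopefully) vetted.
PINNED_SHA256 = hashlib.sha256(b"artifact-bytes-v1").hexdigest()

def verify(artifact: bytes, pinned: str) -> bool:
    """True iff the downloaded bytes match the pinned digest."""
    return hashlib.sha256(artifact).hexdigest() == pinned

print(verify(b"artifact-bytes-v1", PINNED_SHA256))  # True: expected artifact
print(verify(b"tampered-bytes", PINNED_SHA256))     # False: reject the download
```

Note that whether the pinned artifact is source code or a precompiled binary makes no difference to the check itself.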
Downloading source code from the internet gives you the possibility to audit it; downloading a binary makes this nearly impossible without whipping out a disassembler and hoping that, if it is malicious, they haven't done anything to obfuscate that in the compiled binary. There is a "these languages are Turing complete, therefore they are equivalent" argument to be made, but I'd rather read Rust than assembly to understand behaviour.
The point is that if there were some easy way to exploit the mere use of precompiled binaries, the wide use of precompiled binaries in other languages would have been widely exploited already. Therefore it is much less likely that the mere presence of a precompiled binary in a package is inherently a security vulnerability.
I’m confused about this point. Is anyone going to fix crates.io so this can’t happen again?
Assuming that this is a security problem (which I’m not interested in arguing about), it seems like the vulnerability is in the packaging infrastructure, and serde just happened to exploit that vulnerability for a benign purpose. It doesn’t go away just because serde decides to stop exploiting it.
I don't think it's an easy problem to fix: ultimately, a package registry is just storage for files, and you can't control what users put there.
There’s an issue open about sanitizing permission bits of the downloaded files (which feels like a good thing to do irrespective of security), but that’s going to be a minor speed bump at most, as you can always just copy the file over with the executable bit.
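A small sketch of why sanitizing permission bits is only a speed bump (POSIX assumed; a throwaway temp file stands in for the downloaded blob):

```python
import os
import stat
import tempfile

# Sketch of the proposed mitigation: strip executable bits from files as
# they're unpacked from a downloaded package.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o755)  # pretend the archive shipped the blob as executable

mode = os.stat(path).st_mode
os.chmod(path, mode & ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
print(os.stat(path).st_mode & 0o111)  # 0: no longer executable

# ...but any build script already running on the user's machine can simply
# restore the bit before executing the blob, so this alone stops nothing.
os.chmod(path, 0o755)
print(os.stat(path).st_mode & 0o111 != 0)  # True: executable again
os.unlink(path)
```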
A proper fix here would be fully sandboxed builds, but:
Who’s ever heard of a security issue caused by a precompiled binary shipping in a dependency? Like, maybe it’s happened a few times? I can think of one incident where a binary was doing analytics, not outright malware, but that’s it.
I'm confused at the idea that if we narrow the scope to "a precompiled binary dependency" we somehow invalidate the risk. Since apparently `curl $FOO | sh` is a perfectly cromulent way to install things these days among some communities, in my world (30+ year infosec wonk) we really don't get to split hairs over 'binary v. source' or even 'target v. dependency'.
I'm not sure I get your point. You brought up codec vulns, which are irrelevant to the binary vs source discussion. I brought that back to the actual threat, which is an attack that requires a precompiled binary vs source code. I've only seen (in my admittedly only 10 years of infosec work) such an attack one time, and it was hardly an attack and instead just shady monetization.
This is the first comment I’ve made in this thread, so I didn’t bring up codecs. Sorry if that impacts your downplaying supply chain attacks, something I actually was commenting on.
Ah, then forget what I said about “you” saying that. I didn’t check who had commented initially.
As for downplaying supply chain attacks, not at all. I consider them to be a massive problem and I’ve actively advocated for sandboxed build processes, having even spoken with rustc devs about the topic.
What I’m downplaying is the made up issue that a compiled binary is significantly different from source code for the threat of “malicious dependency”.
So not only do you not pay attention enough to see who said what, you knee-jerk responded without paying attention to what I did say. Maybe in another 10 years…
Because I can `curl $FOO > foo.sh; vi foo.sh` and then choose to `chmod +x foo.sh; ./foo.sh`. I can't do that with an arbitrary binary from the internet without whipping out Ghidra and hoping my RE skills are good enough to spot malicious code. I might also miss it in some downloaded Rust or shell code, but the chances are significantly lower than in the binary. Particularly when the attempt from people in the original issue thread to reproduce the binary failed, so no one knows what's in it.

No one, other than these widely publicised instances in NPM, as well as PyPI and Ruby, as pointed out in the original GitHub issue. I guess each language community needs to rediscover basic security issues on their own; long live NIH.
Am I missing something? Both links involve malicious source files, not binaries.
I hadn’t dived into them, they were brought up in the original thread, and shipping binaries in those languages (other than python with wheels) is not really common (but would be equally problematic). But point taken, shouldn’t trust sources without verifying them (how meta).
But the question here is “Does a binary make a difference vs source code?” and if you’re saying “well history shows us that attackers like binaries more” and then history does not show that, you can see my issue right?
But what's more, even if attackers did use binaries more, would we care? Maybe, but it depends on why. If it's because binaries are so radically unauditable, and source code is so vigilantly audited, ok sure. But I'm realllly doubtful that that would be the reason.
There’s some interesting meat to think about here in the context of package management, open source, burden on maintainers, varying interest groups.
I’ve been waiting for this mostly to find an extension automating the refusal of GDPR cookie banners
Consent-O-Matic is the best one I’ve found for this (not on android yet).
Definitely my favourite of these. In particular, it’s developed by a team of privacy researchers, not by a commercial entity, so they both understand privacy and have no commercial incentive to avoid complying with their obligations.
The thing that made me switch to Firefox on Android was the Self-Destructing Cookies extension, which Firefox sadly broke in an upgrade a few years ago. There are a few reimplementations (for Firefox and other browsers) but they all miss what made the original great: it didn’t ask for permission up front, it provided an undo that always worked. When you moved away from a site, it moved all of the cookies to a saved location. When you visited the site again, the cookies were not exposed and the new cookies were there again. If you discovered something stopped working, you had an option to restore the deleted cookies and another option to never delete for that site. This meant that you could live in the default-delete world safely without worrying about data loss. If you realised after you’d closed a tab that some state was stored in cookies that you cared about (scores in a game, shopping basket contents, login details, whatever) then you had the ability to undelete it easily. If you went back without losing anything that you cared about, the cookies stayed deleted and you didn’t think about it.
I really wish browser vendors would just make that the default behaviour.
I wonder why it hasn’t been replicated yet. I think the webextension API is sufficient to replicate the destruction and undo-ing. The main hitch is that webextensions don’t receive an event for the browser quitting, and I can’t remember if webextension background scripts are guaranteed to run before any page’s network requests start flying, so that bit could be racy.
I’ve been using Cookie AutoDelete on desktop since the demise of Self-Destructing Cookies. I’m looking forward to it working on Android!
Does that have the undo feature David wants?
Unfortunately it doesn’t seem to, though I’ve never missed it.
I use Consent-o-Matic on android with something called Kiwi browser, which is an old fork of Chrome(/ium?) with extension support.
I migrated to it after Firefox removed most of the extensions I used on Android due to an API change (a couple of years ago, I think?) and have been using it since.
Maybe this news means I’ll be able to get back. But Mozilla is always screwing up the UX of the android version, so I wouldn’t hold my breath…
You may get those PRs because mypy does not recognize reexports without `__all__` as valid in at least some configurations.

interesting, I haven't seen that issue & none have cited that/pushed back when I pointed out the redundancy. any chance you have an example handy? I'd love to better understand that case
you have code like this en-masse in `__init__.py` (re-exports along the lines of `from .submodule import Name`), and the only purpose is to make it more easily importable from a flat namespace.

you already use `noqa` or something like that to disable "unused import" warnings.

in larger codebases, it can additionally make sense to turn on `--no-implicit-reexport`, because with a lot of people working on a codebase, code that imports objects from totally random and inappropriate places tends to creep in, especially when people use IDEs that auto-insert imports. we have this enabled at work. in those situations it is necessary to add reexports to `__all__` even though nobody uses star-imports.

I fixed this issue in a library I maintain just the other week. People couldn't use MyPy or Pyright on their code that used the library because the tools would complain that names weren't defined, until I added them to `__all__`. It's annoying redundancy, but necessary to satisfy type checkers.

Interestingly, you can satisfy the type checkers with a different kind of redundancy: instead of writing `from .charm import ActionEvent`, you say `from .charm import ActionEvent as ActionEvent`, and that tells the type checkers "they're not just importing this name to use it here, they actually want to define it" and shuts them up. See "redundant symbol alias" in the Pyright docs.

However, in the end we went with `__all__` anyway, because Sphinx, the API docs generator, didn't like the "redundant symbol alias" technique. Specifically, its autodoc extension doesn't pick up any names from the `__init__.py`, as it still doesn't see those names as public.

Thankfully, when type checking your library, Pyright complains both if you import a name but forget to include it in `__all__`, and if you include a name in `__all__` but forget to import it, so it's hard for the two lists to get out of sync.

So yeah, I wish what your original article was saying were good advice, but in light of people using type checkers with your library (which you almost certainly want to support), it probably needs updating.
Also, `pyflakes` (and derivatives like `flake8` and `ruff`) allow reexports if they appear in `__all__`. This follows rather subtly from the notion of public names defined in the import reference.

That's the only reason we keep writing `__all__` at work; disabling F401 for each "reexport" in a `module/__init__.py` file is possible too, but not much better.

this is a fair point, I plan to update the article to mention it. Personally, disabling the error is better IMO; for one, it doesn't violate DRY, and it is a one-time fix instead of one requiring maintenance.
Fair point.
Good article by the way :)
But then you have to repeat the ignores for each tool. Personally I think that “explicit is better than implicit” is more important than DRY.
Every time Ruff comes up, the fact that it’s super-hyper-mega-fast is promoted as making up for the fact that it’s basically not extensible by a Python programmer (while the tools it wants to replace all easily are extensible in Python).
But the speed benefit only shows up when checking a huge number of files in a single run. And even on a large codebase:
So I’m still not sure why I should give up easy extensibility for speed that seems to offer me no actual practical benefit.
A bunch of thoughts here!
For these kinds of tools performance (both speed and memory usage) matters a lot, because codebases are effectively unbounded in size, and because for interactive use, latency budgets are pretty tight. There's also Sorbet's observation that performance unlocks new features. "Why would you watchexec this on the whole code base? Because I now can".
Now, if we speak strictly about syntax-based formatting and linting, you can get quite a bit of performance from the embarrassingly parallel nature of the task. But of course you want to do cross-file analysis, type inference, duplicate detection and what not.
The amount of things you can do with a good static analysis base is effectively unbounded. At this point, maybe Java and C# are coming to the point of saturation, but everything else feels like a decade behind. The primary three limiting factors to deliver these kinds of tools are:
This is high-investment, high-value thing, which requires a great foundation. And I would actually call that, rather than today’s raw performance, the most important feature of Ruff. We can start from fast linting, and then move to import analysis, type inference, full LSP and what not.
From my point of view, Python's attempt to self-host all dev tools is a strategic blunder. Python really doesn't have the performance characteristics to move beyond per-file linting, so it's not surprising that, eg, pyright does its own thing rather than re-use the existing ecosystem.
All that being said, extensibility is important! And Python is a fine language for that. Long term, I see Ruff exposing a Python scripting interface for this. If slow Python scripting sits on top of a fast native core that does 90% of the CPU work, that should be fine!
Yet as I keep pointing out, my actual practical use cases for linting do not involve constantly re-running the linter over a million files in a tight loop – they involve linting the file I’m editing, linting the files in a changeset, etc. and the current Python linting ecosystem is more than fast enough for that case.
But what’s the gain from doing that? Remember: the real question is why I should give up forever on being able to extend/customize the linter in exchange for all this speed. Even if the speed unlocks entirely new categories of use cases, it still is useless to me if I can’t then go implement those use cases because the tool became orders of magnitude less extensible/customizable as the cost of the speed.
I think the instant that interface is allowed, you’re going to find that the OMGFAST argument disappears, because there is no way a ruleset written in Python is going to maintain the speed that is the sole and only selling point of Ruff. But by then all the other tools will have been bullied out of existence, so I guess Ruff will just win by default at that point.
Importantly, they also involve only context-free linting. Something like "is this function unused in the project?" wouldn't work in this paradigm. My point is not that you, personally, could benefit from extra speed for your current workflow. It's rather that there are people who would benefit, and that there are more powerful workflows (eg, typechecking on every keypress) which would become possible.
At minimum, simplicity. I'd much rather just run `$ foo` than futz with git & xargs to figure out how to run it only on the changed files. Shaving off 10 seconds from the first CI check is also pretty valuable.

If you do this in the stupidest possible way then, sure, it'll probably be even slower than pure Python due to switching back and forth between Python and native. But it seems to me that custom linting is amenable to proper slicing into a CPU-heavy part and scripting on top.
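As a toy illustration of that split (names hypothetical; nothing here is Ruff's real API): imagine the fast core parses files and hands ready-made syntax trees to Python callbacks, so the scripted layer only expresses the rule logic. Here the stdlib `ast` module stands in for the native parser:

```python
import ast

def find_print_calls(tree: ast.AST):
    """Toy custom lint rule: flag bare print() calls."""
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"):
            yield node.lineno

# In the imagined design, `tree` would arrive pre-parsed from the native core;
# here we parse a small source string ourselves.
source = "x = 1\nprint(x)\n"
print(list(find_print_calls(ast.parse(source))))  # [2]
```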
There are already flake8 plugins that detect that sort of thing.
All the existing tools have a “copy/paste this into your pre-commit config” snippet and then it Just Works. If you are indeed rolling your own solution to run only on the changed files, then I think you should probably pause and familiarize yourself with the current state of the art prior to telling everyone else to abandon it.
Sorry if my comments read as if I am pushing anyone to use Ruff, that definitely wasn’t my intention! Rather, I wanted to share my experience as implementer of similar tools, as that might be an interesting perspective for some.
That being said, I think I want to register a formal prediction that, in five years or so, something of Ruff's shape (Python code analysis as a CLI implemented in a faster language, not necessarily Ruff specifically, and not counting the already existing PyCharm) will meaningfully eat into Python's dev tool "market".
I think Ruff will gain significant “market share”, but for the wrong reasons – not because of any technical superiority or improved user experience, but simply because its hype cycle means people will be pushed into adopting it whether they gain from it or not. I’m already dreading the day someone will inevitably file a “bug” against one of my projects claiming that it’s broken because it hasn’t adopted Ruff yet.
The “not extensible by a $lang programmer” was a reason for not pursuing faster tooling in better suited languages for the web ecosystem, and everything was painfully slow.
In my experience, esbuild (Go) and swc (Rust) are a massive improvement and will trade extensibility for the speed boost every time.
I've been using Ruff's `flake8-annotations` checks to get a very quick list of missing annotations as I port a codebase. In a watchexec loop it's substantially faster than getting the same information from MyPy or Pyright.

Likewise, in another codebase `ruff --fix` has already replaced isort (and `flake8` and friends).

I've never needed the extensibility, though. I'm curious, what do you do with it?
I’m not sure why you’d need to run it over the entire codebase in a loop, though. Isn’t that the kind of thing where you generate a report once, and then you only incrementally need to check a file or two at a time as you fix them up?
Again, I don't get it: `isort` will fix up imports for you, my editor is set to do it automatically on file save, and if I somehow miss that I have a pre-commit hook running it too. So I'm never in a situation where I need to run it and apply fixes across thousands of files (or if I was, it'd be a one-time thing, not an every-edit thing). So why do I need to switch to another tool?

There are lots of popular plugins. For example, pylint on a Django codebase is next to unusable without a plugin to "teach" pylint how some of Django's metaprogramming works. As far as I can tell, Ruff does not have parity with that. Same for the extremely popular pytest testing framework; without a plugin, pylint gets very confused at some of the dependency-injection "magic" pytest does.
Even without bringing pylint into it, flake8 has a lot of popular plugins for both general purpose and specific library/framework cases, and Ruff has to implement all the rules from those plugins. Which is why it has to have a huge library of built-in rules and explicitly list which flake8 plugins it’s achieved parity with.
I like to work from a live list, as autoformatting causes line numbers to shift around as annotations increase line length. Really I should set up `ruff-lsp`.

I don't use pre-commit because it's excruciatingly slow. These things are really noticeable to me — maybe you have a faster machine?
Can you quantify "excruciatingly slow"? Like, "`n` milliseconds to run when `k` files staged for commit" quantification?

Because I've personally never noticed it slowing me down. I work on codebases of various sizes, doing changesets of various sizes, on a few different laptops (all Macs, of varying vintages). Maybe it's just that I zone out a bit while I'm mentally composing the commit message, but I've never found myself waiting for pre-commit to finish before being able to start typing the message (fwiw my workflow is in Emacs, using `magit` as the git interface and an Emacs buffer to draft and edit the commit message, so actually writing the message is always the last part of the process for me).

I gave it another try and it looks like it's not so bad after the first time. The way it's intermittently slow (anytime the checkers change) is frustrating, but probably tolerable given the benefits.
I think my impression of slowness came from Twisted where it is used to run the lint over all files. This is very slow.
Thanks for prompting me to give it another look!
My experience is that the list of configured checks changes relatively rarely – I get the set of them that I want, and leave it except for the occasional version bump of a linter/formatter. But it's also not really pre-commit's fault that changing the set of checks is slow, because changing it involves, under the hood, doing a `git clone` and then a `pip install` (from the cloned repo) of the new hook. How fast or slow that is depends on your network connection and the particular repo the hook lives in.

Write bespoke lints for codebase-specific usage issues.
Most of them should probably be semgrep rules, but semgrep is not on the CI, it's no speed demon either, and last I checked it has pretty sharp edges, where it's pretty easy to create rules which don't work in complex cases.
PyLint is a lot more work, but lints are pretty easy to test, and while the API is ill-documented it's quite workable and works well once you've gotten it nailed down.
Ah, so you and ubernostrum are optimizing workflows on a (single?) (large?) codebase, and you’re after a Pylint, rather than a pyflakes/flake8.
I'm coming at this from an OSS-style many-small-repos perspective. I prefer a minimally-configurable tool so that the DX is aligned across repositories. I don't install and configure many flake8 plugins because that increases per-repo maintenance burden (e.g., with flake8 alone the W503/W504 debacle caused a bunch of churn as the style rules changed — thank goodness we now have Black!). Thus, I'm happy to drop additional tools like `isort`. So to me Ruff adds to the immediately-available capabilities without increasing overhead — seems like a great deal!

It seems like Ruff might slot into your workflow as a flake8 replacement, but you get a lot from Pylint, so I'd keep using the latter. You could disable all the style stuff and use Pylint in a slower loop like a type checker.
I have both large and small codebases. I do use pylint in addition to flake8 – my usual approach is flake8 in pre-commit because it’s a decent quick check, and pylint in CI because it’s comprehensive. I’ve written up my approach to “code quality” tooling and checks in detail, and you can also see an example repository using that approach.
pylint is, in practice, very memory hungry and frankly slow.
Now I can't go from there to recommending ruff, for the simple fact that ruff is not checking nearly enough stuff to be considered a replacement, IMO. Not yet, at least. But I'll be happy to see better stuff happening in this space. (Disclaimer: I'm writing a Rust-based pylint drop-in replacement, mostly for practice but also because I really suffered under pylint's perf issues in a past life.)
My admiration for ruff comes from the fact that I now have a single tool and a single configuration place. I don't have to chase how to configure 10 different tools to do linting and ensure that my Python project has some guardrails. For example, my big annoyance with flake8 is that I can't add its config in `pyproject.toml`; it has to be a separate file. I really, really, just want to flip the switch and have various checks done on the codebase, and not scour the internet on how to configure these tools, since each has its own (quite valid) interpretation of what's the right way to do things. I just want to stay away from ever creating `setup.py` and all those other things I never understood why are needed to package some interpreted code (my dislike for Python's packaging is leaking here :)).

I'm curious, what do you need to change in the tools replaced by ruff? What additional checks do you need to implement?
I personally do not care about the config file thing, and I wish people would stop bullying the flake8 dude about it. Way too many people, back when `pyproject.toml` was introduced for a completely different purpose than this, still treated its existence as meaning "all prior config approaches are now illegal, harass everyone you can find until they give up or give in". Which is what people have basically tried to do to flake8, and I respect the fact that the maintainer laid out clear criteria for a switch to `pyproject.toml` and then just aggressively locked and ignored every request that doesn't meet those criteria.

I already gave a reply to someone else talking about the whole ecosystem of plugins out there for flake8 and pylint, and Ruff is not at parity with them. So even if I wanted to switch to Ruff I would not be able to – it lacks the checks I rely on, and lacks the ability for me to go implement those checks.
I’ve been slowly but surely giving up on Python for some time, and I’ve often struggled to articulate the reasons why. But having just read some of the flake8 pyproject stuff, it’s hit me that most of it could be described as bullying at some level or other.
Python itself arguably bullies its users, with things like the `async` -> `ensure_future` change, sum's special case for str because the devs don't like it, blah blah. (I want to say something about the packaging situation here, and how much of a pain in the ass it is to maintain a Python project to popular opinion standards in 202x, but I recognise that not all of this is deliberate.) Black's founding principle is that bludgeoning people into accepting a standard is better than wasting time letting them have preferences. Long ago, when I frequented #python, SOP whenever anyone wanted to use sockets was to bully them into using an external dependency instead. And longer ago, when I frequented python-ideas, ideas that the in-group didn't take to were made, along with their champions, to run ridiculous gauntlets of nitpicking and whataboutism.

Of course none of the exponents of this behaviour identify it as bullying, but then who would? The results are indistinguishable whether they're being a dick, evangelizing best practices or just trying to save everyone's time.
In short I think that, if you don’t want to be bullied into adopting a solution that doesn’t really work for you, you are in the wrong ecosystem.
Some of us use `pyflakes` on its own, and are thus used to the zero-configuration experience. The configurability of `pylint` is a net negative for me; it leads to bikeshedding over linter configuration.

This is entirely reasonable. In my case, I started a new job and new project, and I'm not invested heavily in the existing Python-based toolchain, so `ruff` was the right choice for us. I don't like the way these sorts of minor differences get amplified up into existential crises anyway. And no, I'm not new on the internet, just tired of it all.

A thousand times this. $CUSTOMER had wildly divergent coding styles in their repos, and the project to streamline it meant configuring these traditional small tools and their plugins to conform to how PyCharm did things, because it's quite opinionated. And popular among people who are tired of it all.
The tooling included `darker`, which is fine, though I personally do not like all of `black`'s choices.

Eventually the whole codebase was blackened all at once and ruff replaced everything else.
The pre-commit is fast and my preferences aside, good arguments can be made for those two tools.
It is what it is, a business decision, and the right way to deal with it is to not elevate it to an existential crisis.
Outside the business world, if I had popular projects, I’d dislike ending up with PRs in the black style if the project wasn’t like that. Or having to set up glue for reformatting.
This is probably how all monopolization happens; people become rightfully tired of being ever-vigilant, and inevitably something bad will come out of the monopoly.
Like not getting OSS contributions because of the project’s formatting guidelines.
100% agree. This reminds me of the old “You have a problem and decide to use regexes to solve it. Now you have two problems.” Yes, your linting is faster but now “it’s basically not extensible by a Python programmer”, which means it’s more difficult for people to maintain their own tools.
the price was definitely the main thing keeping me from using this, so that’s neat
it used to cost $$$$ for each reader node beyond the first 1 or 2. and basically the whole point of datomic is reader scaling, that’s either going to be very expensive, or very useless
My production Datomic experience says: why not both?
does it not scale like it claims to, or was it the price stopping you from taking advantage of its capabilities?
(i don’t have any immediate plans to use it, but it’d be nice to know some stories from the trenches before trying it some time)
I’m not sure what the $$$$ cost was for us — the direct cost wasn’t the constraint — the real problem is that Datomic is so dang inefficient that you’re going to have to think about other costs, too.
Memory is the big issue. The transactor is very memory-intensive (JVM!) and if you need to use Datomic’s equivalent of stored procedures (you will) it’s going to need your whole working set in memory to be performant. You also pay that in-memory cost again for each reader, which leads to the usual pain of large heaps on the JVM.
Obviously storage also may be a problem depending on the write rate. Datomic is fundamentally about keeping all the history forever. While you can hackishly “excise” old facts it’s very expensive. I’d never use it in a domain where I might eventually have to deal with GDPR/right to be forgotten/CA privacy rights/etc.
These shortcomings are exacerbated by the slowness of the thing. Clojure and the JVM are such a weird choice for a database.
I can imagine wanting a system with Datomic’s architecture, if it were sufficiently performant, but the actual software as-implemented? No. Stuff it all in Postgres, which won’t blow up if you do full-text queries! Plus, it’ll tolerate blobs and documents (you probably have some of those).
Using unbuffered IO when sequentially parsing / serializing a big file line-by-line. This is the funniest performance regression I've seen, since it is encountered very frequently.
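The effect is easy to demonstrate in Python with a raw stream that counts how many OS-level writes it receives (a sketch; real syscall counts vary by platform and buffer size):

```python
import io

class CountingRaw(io.RawIOBase):
    """A fake raw stream that counts write calls, standing in for syscalls."""
    def __init__(self):
        self.writes = 0
    def writable(self):
        return True
    def write(self, b):
        self.writes += 1
        return len(b)

# Unbuffered: one "syscall" per byte written.
raw = CountingRaw()
for _ in range(1000):
    raw.write(b"x")
print(raw.writes)   # 1000

# Buffered: the same 1000 bytes coalesce into a single flush.
raw2 = CountingRaw()
buf = io.BufferedWriter(raw2, buffer_size=4096)
for _ in range(1000):
    buf.write(b"x")
buf.flush()
print(raw2.writes)  # 1
```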
A few years ago I traced Apache Cassandra while creating tables and discovered that it issues thousands of 1-byte writes. Someone forgot their `BufferedWriter`!
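This class of bug is easy to demonstrate. A minimal Python sketch (illustrative only, not Cassandra’s actual code) counting how many writes reach the raw sink with and without a buffer:

```python
import io

# A fake "file" that counts how many write() calls (i.e. would-be syscalls)
# reach it. Illustrative sketch only.
class CountingSink(io.RawIOBase):
    def __init__(self):
        super().__init__()
        self.writes = 0

    def writable(self):
        return True

    def write(self, b):
        self.writes += 1
        return len(b)

# Unbuffered: one raw write per byte.
raw = CountingSink()
for _ in range(1000):
    raw.write(b"x")
print(raw.writes)  # 1000

# Buffered: the 1000 tiny writes coalesce into a single raw write on flush.
raw2 = CountingSink()
buf = io.BufferedWriter(raw2, buffer_size=8192)
for _ in range(1000):
    buf.write(b"x")
buf.flush()
print(raw2.writes)  # 1
```

The wrapper is one line; forgetting it turns one syscall into a thousand.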
https://discuss.python.org/t/announce-pybi-and-posy/23021
Some related discussion. Excited to see this taking next steps
There’s also an official PEP 711 discussion thread.
IT rant checklist:
There was a conclusion: “buy my book”.
As he says “Agile proponents say it hasn’t been tried right.”
I can’t say that I’m a fan of the name “Oils for Unix” for the project as a whole. If you’re gonna rename “Oil shell” the language to YSH, I think it’s still a good idea to keep “Oil shell” as the name for the shell implementation that interprets both the OSH & YSH languages.
The next post is supposed to explain the many reasons for that, but I’ll give the summary here.
First, some readers talked me out of “Soil” a few months ago :) So you might be glad for that
https://old.reddit.com/r/oilshell/comments/x0u7qw/new_names_renaming_for_oil/
In terms of accuracy, there are more parts of the project than OSH and YSH.
When writing YSH / Oil, it becomes pretty apparent that languages for DATA are just as important, if not more important, than the shell language (languages for code).
Summary: QSN is moving toward JSON strings, and we’ll also have formats for tables and records.
So those data languages are part of “Oils for Unix”.
There could be other things too. I had to solve the “dev env” problem for our repo (related: Nix, gitpod), and the solution ended up being a mini-distro :-/ ! Or really a bunch of tools to compile from source in containers. I’m not sure if that will be exposed to users, but it’s possible.
mycpp and ASDL (translators to C++) are also things that could be exposed to users in some form.
The shell ends up “leaking” into a distributed operating system project pretty easily :) It’s a language of processes and files, and related tools for working with them.
So I wanted to leave room for other things under the “Oils for Unix” name, not just “Oil Shell”.
In terms of the connotation:
“Oil” reminds people of the energy commodity with the big bad industry behind it. “Oil Shell” further seems to remind people of the company “Shell Oil”. My brain doesn’t work that way, but it’s come up a surprising number of times, over a long period.
Many lobste.rs users are probably past the name by now, and think of it as a shell, but new people are encountering the project every day! There are maybe 10 K lobste.rs readers; there are probably at least 10 M shell users. Shell was the 8th most used language on Github last year, and the 6th fastest growing.
FWIW the original reason for Oil is that it’s an analogy to mechanical systems: http://www.oilshell.org/blog/2019/06/17.html#why-is-the-project-named-oil . i.e. it’s not about energy, but about systems that work well
Importantly, “Oils” has a different connotation than “Oil” (in English). Perhaps similar to “potions”, not the energy commodity
I would suggest a new connotation is what you see in the “Unix Magic poster”: https://jpmens.net/2021/04/09/the-unix-magic-poster/
https://news.ycombinator.com/item?id=27029196
Practically speaking:
There is a single `oils-for-unix` binary, which you do NOT type, hence the long name. And there are 2 symlinks, like busybox.
The “Oil Shell” / OSH / Oil scheme has the problem that OSH would naturally stand for “Oil Shell”. So it’s emphasizing the old part over the new part.
In “Oils for Unix”, OSH is just opaque like YSH. (A user suggested “old shell” and “young shell” :) )
I’m actually open to any new suggestions, but I think it will be very difficult to find a name that is different, not taken, accurate, but people already “accept” as the same as “Oil Shell”. (I changed my twitter and Mastodon handle to oilsforunix and nobody noticed.) I don’t want a completely new name like “Zoo” or something.
(edited for clarity, I can see why the original was confusing)
I want to express that I mean this in the nicest way possible. I have tried for years to understand Oil, and have read (possibly?) a million words on the subject, and I find that most of the time reading your posts leaves me with less understanding than I started with.
The above post also fits this pattern.
I am trying, and completely respect your transparency and willingness to write 1,000 word responses to things, but something just doesn’t click about it all. I wish I had more actionable feedback other than: have you thought about asking someone else to write a succinct overview/survey of where the project sits towards its goals?
To answer this last part succinctly, I’m definitely open to another voice writing about it
There is a lot of dense material on Zulip to summarize, for better or worse, but it’s all there!
Also, you’re probably right that I’ve been “talking past” a lot of people with too many words, which does not lead to a good experience. That’s probably because I’m answering many messages at once, so I’ll take that as good feedback
Hm, some confusion is understandable, because certain things have changed over the years
To be short: we’re changing the name to make the two parts to the project clear:
There are also other tools that may fall under the “Oils for Unix” project
The old naming was confusing because people thought: “OSH”? Isn’t that “Oil Shell”? No, there’s also another part called “Oil”.
It also had a bad connotation for some people.
If it doesn’t click, that’s OK for now … Right now I’m looking for people to test OSH, and to contribute ideas to YSH, which may or may not work out.
A common disconnect is that probably 90% of shell users don’t use the shell as a programming language. I didn’t for the first 10 years I used shell. So a lot of the stuff I write isn’t relevant from that viewpoint.
Some people might also not see the relevance of the writing about grammars and regular languages and so forth. That’s OK, but my response is that the person who developed the first shell is the same person who brought regular languages to computing (Ken Thompson). So part of the project’s philosophy is to really go back to first principles. I think it will show up from the user’s POV, but not everyone will agree.
Of course! That’s why I’ve read millions (potentially?) of words!
Even this post is confusing.
Based on previous understanding, I assume “compatible” means POSIX shell compatible. Cool. But, I thought the whole idea of Oil was that it was a new Shell language that would always be lowerable to POSIX shell? So, now there’s a second shell, and that raises questions. Why are there two shells?
Yes, it’s incredibly confusing. What is Oil? I thought it was a shell. It’s not a shell. It’s two shells, and then a bunch of other things.
The homepage now has a more succinct definition (good!):
This is literally soo much more valuable than every other blog post you’ve written about Oil, in my opinion. And, I am still confused by it!
What am I supposed to do with this? I know from it that you’re writing a new shell that is backwards compatible. And I’m enticed to believe that as a new language it’s POSIX shell compatible AND more expressive like Python or JavaScript. But, actually, it’s two different shells, and I have no idea how they work together to leave me in a better position than if I were to have just used POSIX shell, or suffered through writing “shell” code in Python / JavaScript. And, that’s understandable as it’s literally 2 sentences! It’s not enough text to describe everything. But if 2 sentences are already confusing to me, imagine the confusion that might ensue from 100 sentences!
Sorry. I’m trying here. I really am. I’m just consistently failing to understand this project, and I truly do not have this problem with any other project. Even abstract ones. I have imagination, and a whole mess of off the wall ideas of my own. I’m good at understanding and reasoning about abstraction.
Ah OK, does this wiki page help?
https://github.com/oilshell/oil/wiki/OSH-versus-Oil
This person was similarly confused, and said that it did: https://news.ycombinator.com/item?id=35201263
Specifically, a bunch of `shopt` shell options is technically the only thing that differs between OSH and YSH. (You can think of this like `from __future__ import` in Python – it’s a gradual upgrade when there are minor breaking changes.) But they add up to a new language.
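For readers who haven’t used it, `from __future__ import` is a per-file opt-in that changes the language’s behavior, which is the analogy being drawn to `shopt`. A small runnable Python example (the type name is a hypothetical placeholder):

```python
from __future__ import annotations  # opt in to new behavior for this file only

# With this future import, annotations are not evaluated at definition time;
# they are stored as plain strings, so a not-yet-defined name is fine here.
def greet(name: SomeLaterType) -> str:
    return f"hi {name}"

print(greet.__annotations__["name"])  # "SomeLaterType"
print(greet("world"))                 # "hi world"
```

Each file opts in independently, so old and new behavior coexist in one codebase – the same gradual-upgrade shape as flipping `shopt` options per script.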
https://github.com/oilshell/oil/wiki/OSH-versus-Oil#full-option-list-in-oilall
The upgrade path is meant to be: `bin/osh`, then `shopt --set ysh:upgrade`, then `bin/ysh` – which is `bin/osh` with `shopt --set ysh:all` – it’s the same binary, with different symlinks!
However, the details may change, so it’s not necessarily stable. YSH isn’t stable either. So I haven’t overly emphasized this point, because some of it’s still aspirational.
However I think it is very interesting, and worth writing about, because 5 years ago most people basically thought it was impossible. It’s definitely not impossible now, because it runs and works.
So I talk about it as two shells, but really there’s a blurry line between them that has to do with all the shell options. That’s the “upgrade path”.
I’ll take that as feedback on how I present it
Additionally I’d say it could be confusing because
It’s changed. The upgrade path is no longer based on automatic translation. I wrote some stuff many years ago that is now obsolete. That approach didn’t work technically.
The idea of having 2 shells in one could be inherently confusing. OSH and YSH/Oil are really the same interpreter. BUT if you change a bunch of options, I claim it’s a new language, and that’s surprising.
This whole page is written from the “new” perspective, AND the code parses and runs:
https://www.oilshell.org/release/latest/doc/oil-language-tour.html
Really there is a hidden “old crappy language” in the background, but I claim it’s all hidden. Some people REALLY want this clean slate perspective, stripped of all legacy.
The project is weird because there are so many perspectives – some people don’t know shell at all, but want a better one. They want something like Python or JavaScript
Other people want “POSIX forever”. We just saw that on lobste.rs the other day
“Oils for Unix” is for both groups, and everyone in between. Just like people who write stuff like Cosmopolitan libc, BSD kernels, Google C++, or Boost C++ – wildly different projects with different dialects – all use GCC and Clang.
BTW Clang handles C, C++, and Objective C all with the same parser. So in that respect it’s very similar. There are C users using Clang that do not know Objective C – I’d say most of them. And there are Objective C users who don’t know C++, etc. Not to mention all the C users who don’t know C++.
So it’s not necessary to understand all of the project in order for it to be useful in some practical way. That is why I talk about “OSH” and “YSH” separately – I’m really talking to two different audiences with different expectations, desires, backgrounds, etc.
I wrote about the change in automatic translation on the blog at some point, and I linked the “OSH versus Oil” page, but I’m not at all surprised that people missed it and are confused :) Ironically, the early readers may be more confused than the people who just started reading, because things have changed.
So yeah the OSH vs. Oil naming is confusing, and the renaming will probably cause a bit more confusion. But I think it will be better in the long run.
I’m going to update the home page sentence as well :) I’m not looking forward to all the churn and renaming, but I think the end state will be much more clear. Again I wouldn’t have renamed it if I didn’t think it was confusing, and if I didn’t NOTICE that people were confused online!
I tried to put all the important info up front in this blog post: OSH is getting done, but it will take a lot of effort to polish it.
YSH is promising, and people have tried it and written code in it, but we still need more help.
There is an upgrade path, but people need to try it and provide feedback on it! It’s possible that nobody really wants to upgrade their shell scripts. Maybe they’re going to rewrite them in Python :)
I know I do though! We have lots of shell scripts that are begging to be upgraded.
I will take this all into account when writing the next blog post :) If you’re still confused let me know
Feedback: I mentioned in my previous response that the succinct 2 sentence thing on the homepage was good! Your reply here is literally 2 pages long.
Thank you for writing it. It did clear up some things! I found most of the other parts to be unnecessary, but did appreciate some of the extra things to think about.
–
Responding more directly to an earlier point you made:
I think your hypothesis that 90% of people who interact with shell don’t actually program it seems accurate, but I don’t believe that’s the reason you haven’t found a wider variety of users to test it out and contribute. I think (and btw, I’ve gotten at least 1 private message on Lobsters thanking me for engaging with you like this, and others in another private chat) the problem is that your communication style makes it inaccessible to a large chunk of potential users! Some of those users are probably actually stuck in shell-hell, too!
Shell is a wild place, and so often the right tool for the job. I truly believe a better language for it would be amazing. This is exactly why I’ve read (potentially?) millions of words trying to get to an understanding of Oil.
Cheers!
Yes, thank you apg for bringing this up. I too think Oil is a neat project, but I am utterly lost by what it is trying to be.
I really wonder if @andyc is too close to the project now. Another engineer, or even better a product manager, explaining things might help improve the signal to noise ratio.
You asked “What is Oil?” Did the wiki page help you understand that?
https://github.com/oilshell/oil/wiki/OSH-versus-Oil
I could have just dropped the link, but I thought I would provide some additional context and analogies
It’s OK if it didn’t help. I do believe there are very concrete benefits now, which I explained in the post, and that there is a path to do something that people thought was impossible 5 years ago
But I fully get that right now it’s not useful yet to many people, and may not be interesting if they’re not using shell in a certain way.
To be perfectly clear. I am not saying that osh or ysh, is not useful. My comment was directed solely on the extra text in your response that went on tangents and clouded the helpful first part of the response.
I know oil isn’t “ready” yet.
Also, thank you for being so open to feedback here. I know it’s sometimes hard to read. It probably would have been better privately, rather than open in the comments as it happened; I acknowledge that and apologize for not seeing it sooner.
Yes. It is a helpful page.
i didn’t think of the “shell oil” problem, so good call on moving away from “oil shell”. but “oils for unix” is a seriously clunky name. (that said, i’ll admit i can’t think of too many oil-based names that would not have google result pollution. “snake oil” is tempting due to the python heritage, but no way you’ll get a good search experience with that.)
It’s meant to be a bit long, so it can be available in many global namespaces, like oils-for-unix.org (which I bought), twitter/oilsforunix etc. Surprisingly “oilshell” has been taken on Twitter for years, for some junk account
It’s also important that you don’t type it! You type the symlinks osh or ysh.
I suggested “oils” as the name for the binary, but then people wanted to type “oils”. So I think the long name is also good in that respect
i wasn’t thinking so much about typing it as simply talking about it. “oils for unix” lacks euphony.
That’s probably true, but I think people might just talk about OSH and YSH ?
(which BTW I pronounce as acronyms Oh-Ess-Aych and Why-Ess-Aych, but you’re free to pronounce however you like :) )
I also say “Oils” when the context is clear. (At least in English, “Oils” has different connotations than “Oil”)
Like the github repo may be oils-for-unix/oils, not oils-for-unix/oils-for-unix
Naming is hard :) Again I’m open to other suggestions, but this name already passed a lot of tests over the last few months …
Fwiw, my American brain goes straight to oh-sh, and yee-sh in the family of cee-sh and zee-sh.
I definitely agree avoiding the collision with Shell Oil is worth a rename, if for no other reason than that you’ll never out-SEO a major multinational. Is the Unix trademark a potential risk here?
Why have a third `oils-for-unix` binary? Couldn’t `ysh` symlink `osh` or vice-versa? If nobody is supposed to type it, why is it possible for typing it to do something?
I don’t think that the name of the tarball is of great consequence. The “Oils for POSIX” project releasing `oils-1.2.3.tar.gz` makes perfect sense to me.
I thought about the trademark, but I think “for Unix” makes it OK?
Technically the symlinks work like busybox, which has the main binary, and symlinks
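The busybox-style dispatch described here works by inspecting the name the program was invoked under (argv[0]). A minimal Python sketch of the idea (the real binary is native code, and the behavior strings here are placeholders):

```python
import os
import sys

# Busybox-style multi-call dispatch: one program, many names. The behavior
# is chosen from argv[0], i.e. the name of the symlink used to invoke it.
def main(argv):
    name = os.path.basename(argv[0])
    if name == "osh":
        return "running the compatible shell"
    elif name == "ysh":
        return "running the new shell"
    else:
        return "usage: invoke via the osh or ysh symlink"

if __name__ == "__main__":
    print(main(sys.argv))
```

With `osh` and `ysh` as symlinks to the one binary, each name selects its own behavior while only one executable is installed.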
I would like all of these to be the same, somewhat inspired by xkcd’s globally unique label:
… and more services, e.g. we are also using Docker Hub, but plan to move off
It annoys me to have slightly different names based on availability …
Suggestions welcome… I will have to make a decision soon, because I can’t blog about YSH/Oil unless it has a name :)
Where my brain goes after hearing “Oils for Unix/POSIX” is “this is a collection of Oils”, so does this mean OSH, YSH, and QSN are each an Oil? This just seems clunky.
The “Oils Project” or “Oils Collection” doesn’t lead my brain down that path at least.
Also, I want to add that existing languages/projects already have this problem, and you could look to how they name for reference. Rust is a programming language maintained by the Rust Team with a reference compiler rustc (for “Rust compiler”). Nix is a programming language maintained by the NixOS contributors that supports the Nixpkgs package distribution and the NixOS operating system distribution. (I should note that the naming of Nix projects is somewhat notorious; see https://www.haskellforall.com/2022/08/stop-calling-everything-nix.html. Also Nix has a (tiny) standard library, Nixpkgs is/has a standard library, and NixOS has a little internal library also called `lib`, the same as Nixpkgs’s; it’s bad.) Bash is a programming language (descendant of POSIX shell) with a reference interpreter Bash. Same with most shells: Zsh, Dash, mksh, Fish.
Here’s an idea for Oil naming: The project distributes the “Oil Collection” (`oil-collection` for the multi-call executable), a runtime/interpreter/tool collection for multiple languages/things: the language/shell OSH, the language/shell YSH, and the (data) language QSN. This is no longer just a Unix thing, which is good if someone decides to port the Oil Collection to e.g. Redox. OSH is POSIX shell & Bash compatible, YSH isn’t, and they’re related in the Oil Collection implementation, but that’s not important right now. QSN is a handy, small data language that’s easy to use from/with OSH and YSH. I don’t have to wonder what “an Oil” is.
Yes, that’s the idea. Again see the Unix Magic poster – each language/tool is an “Oil”. There may be a lot of them, more than 3 :)
It’s basically like busybox or coreutils.
I think we’re avoiding the Nix problem by calling them OSH and YSH and QSN, not OSH and Oil and Oil Data
I actually mentioned that in the thread back in August where we decided on this name:
https://oilshell.zulipchat.com/#narrow/stream/325160-oil-discuss-public/topic/New.20Names.20.2F.20Renaming.3F
“Oil Collection” is a valid suggestion.
“oilutils” might be another one (similar to coreutils).
However, the thing that seals “Oils for Unix” is that if you Google or Bing it, it ALREADY points at https://www.oilshell.org/
“Oils Collection” is longer, and less descriptive IMO.
I’m open to more suggestions, but I’ll need to decide soon as it’s blocking the blog. We need something globally unique that will give us all of
I might post a top-level lobste.rs story and see if anyone has feedback
–
I can see why people think “Oils for Unix” is clunky, but if clunkiness is the worst thing, then it’s a pretty good name.
I’m surprised that anyone would go to bat for “Oil Shell”. I thought most people disliked the name :-P
Back in August nobody went to bat for it.
I always liked the Oil Shell name & Shell Oil pun. :) Also, I’m not really attached to “Oil Collection”, it’s just an initial idea for a canonical multi-call executable name.
In any case, it looks like the discussion will be continuing at https://lobste.rs/s/plmk9r/should_oil_project_oil_shell_be_renamed.
This feels like it’s been a while coming. Gloriously, html5lib deprecated its sanitizer in favor of Bleach in 2020, but the project’s owners haven’t passed the torch.
Around that time I looked into forking html5lib due to the lack of maintenance (they aren’t great about merging PRs) and slow performance. My thought was to type annotate it enough to run mypyc on it. However, after triaging all the open issues and digging into the implementation I don’t think it’s really worth salvaging, for a number of reasons:
No support for `<wbr>` and `<ol reversed>` (and there are many more omissions in the sanitizer).
It feels like the victory of the project was html5lib-tests, which were used to build html5ever, not so much the actual software product.
I’ll probably end up porting my HTML processing code to Rust so I can use html5ever directly.
If using rust, I can highly recommend https://github.com/rust-ammonia/ammonia which is already sitting on top of html5ever.
It’s kinda fun to see the inverse of the “mysterious network delay” genre of evergreen network debugging post (usual solution: set `TCP_NODELAY`!).
However, it would be nice if the author had done some more investigation before concluding that `TCP_NODELAY` is at fault. After all, setting `TCP_NODELAY` is pretty common for HTTP clients — e.g., curl, Python’s http.client and urllib3.
It seems more likely that git-lfs doesn’t buffer properly. After some code inspection I noticed: a syscall for every `Read` call, then? Ouch.
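For reference, both knobs are one-liners in most stacks. A hedged Python sketch (the buffer size is an arbitrary choice):

```python
import socket

# Disabling Nagle's algorithm is a single socket option...
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# ...but with TCP_NODELAY set, the application must do its own coalescing:
# wrapping the socket in a buffered file object means each send() carries
# a full buffer instead of a tiny fragment per write.
f = sock.makefile("wb", buffering=64 * 1024)
```

The point being made above: if the application layer buffers like this, Nagle’s algorithm is irrelevant; if it doesn’t, Nagle only papers over the tiny writes.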
Given the leap to conclusion and inflammatory title I think this is more a rant about Golang than technical networking content.
+1
This is silly. ‘A message queue’ doesn’t equate to kafka. That’s a gigantic leap of an assumption.
More importantly: Does postgres offer any queue functionality? Are they talking about just inserting and querying a large table? That cannot possibly be a better message queue than any system that properly implements a queue with O(1) pushes and pops.
They say that given their requirements (SLOs), it costs less to use their existing postgres database than adding new infrastructure (e.g. kafka) for this.
That’s not better by any stretch of the word. That is good enough for their use case.
It was a better solution in their exact circumstance, taking the costs and effort into account. That’s the point of the article, and I buy it.
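On the question above of whether Postgres offers queue functionality: the usual shape is a plain table popped inside a transaction, and in Postgres you’d use `SELECT ... FOR UPDATE SKIP LOCKED` so concurrent workers don’t grab the same row. A runnable sketch of the single-worker shape, using sqlite3 in place of Postgres and a hypothetical schema:

```python
import sqlite3

# Table-as-queue sketch (hypothetical schema). In Postgres the pop would
# add FOR UPDATE SKIP LOCKED to make concurrent workers safe; sqlite3
# stands in here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)

def push(payload):
    conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
    conn.commit()

def pop():
    # Oldest-first pop; the primary-key index keeps the lookup cheap
    # (O(log n) via the B-tree, not O(1) like a dedicated queue).
    row = conn.execute(
        "SELECT id, payload FROM queue ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
    conn.commit()
    return row[1]

push("job-1")
push("job-2")
print(pop())  # job-1
```

Whether O(log n) pops plus MVCC overhead meet your SLOs is exactly the trade-off being argued about here.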
Totally true, but let’s take a moment to rejoice that ‘a message queue’ no longer equates to RabbitMQ.
On my first reading of this headline I was wondering which cookies Avast had acquired, and how that is even technically possible to do.
It’s the “I Don’t Care About Cookies” extension that has been acquired by Avast. Though it sounds like mostly an acquihire, as he talks about working on other products for them.
FWIW, I’ve never used this extension because I feel like it is dangerous to go randomly accepting T&Cs (which is what the cookies popups actually are) in a browser window where that might get linked to one of my accounts. It’s not uncommon for the wording to say not only do you accept cookies, but you fully agree with the privacy policy.
If it’s a site I’m going to come back to, I am definitely making sure I click on the correct button for “no I don’t agree to your random and otherwise not-enforceable nonsense that I don’t have time to read”.
I now tend to open most untrusted websites (such as links from orange or blue websites) in incognito mode, click on the most obvious “go away” link and close the window later, safe in the knowledge that I didn’t really agree to anything binding. I’d be reasonably happy with an extension set to incognito mode only, to save that click, but I’m pretty sure only the other way around is currently possible.
Plus like … why accept the cookies if you can decline them? The whole premise is nonsensical.
Suggested replacement: https://addons.mozilla.org/en-US/firefox/addon/consent-o-matic/
Yes, that seems much better designed, since it lets you set your preferences in simple categories and then applies those choices everywhere - which is how compliance with the law should have been implemented in the first place.
Thanks for the suggestion, I’ve installed it to try it out. The only thing I can’t see is a way to override these preferences for a specific site if needed.
I go a very different route.
I block the consent banners & popups where I can with uBlock origin. Get out of my way, I don’t want you to solicit me.
I then use cookie autodelete to delete cookies for a site after I close its tabs. This is a bit like telling the browser to block cookies and localstorage completely, but websites that (pretend to) break still keep working.
This isn’t perfect. Youtube still has ways of tracking and remembering me (at least according to the suggested videos) but of course deleting cookies does make it forget that I turn off autoplay. Quite an interesting perspective on their priorities and methods.
The point of things like consent-o-magic isn’t to prevent tracking, it’s to prevent dark UI patterns from getting user consent before tracking. The goal is to ensure that companies like Google and Facebook are definitely in violation of the letter of the law, not just doing things that their users don’t understand and would hate if they did, so that information commissioners can collect the evidence that they need to impose the 5% of annual turnover fines that the GDPR permits.
Yeah, this is the way to go — uBlock Origin + EasyList Cookie blocks the obnoxious dialogs, and Cookie Auto-Delete cleans up the mess. Sadly the latter isn’t available on Android, though.
When using uBlock Origin + EasyList Cookie, I am often left with a website with a backdrop and not allowing scrolling. This can be fixed with the inspector tool, but I am wondering if I am missing something.
I had that issue only once, on a site that was completely broken with any ad blocker. I expect the answer is yes, but do you have uBlock’s cosmetic filters enabled?
TLDR: Akka is now source available, Alex is mad. Insofar as I can find a moral argument it’s that this represents a bait and switch. Alex also goes over the practical reasons devs like open source, which most significantly is avoiding bureaucratic processes.
I think this is right - any sustainable alternative to open source has to preserve that property. (My hunch is that a collecting society type model based on turnover is the way to do this). Alex mentions commoditization of ones complements. This points to open source being supported by businesses that specifically create it as a complement to their business; I think that’s fine but can only support a certain slice of software.
Agree. Data storage has had a reasonable path for monetizing open source by making it REALLY EASY to pay money, e.g., a cloud service that maintains my MongoDB cluster for me. That cuts a lot of corners for enterprises. If you want your open source project to churn out cash, make it easy to spend money on. There’s lots of ways to do that. Forceful license changes ignore what the customer wants, which is a recipe for a failing business.
I think what they’re aiming towards is something like what people like to do with physical things:
Problem is that this is software, which has no physical, single-quantity and changes a lot, so you would have to re-buy it (or pay monthly fees), to always have an “up to date purchase”. And the physical-object industry also tries to restrict such use cases (personal use only, no modifications, subscription based add-ons..)
Ultimately this is the GraalVM debate all over again: Do I want to settle my product runtime and core on something that may just go away at any point? Everyone that had to interact with the company-wide legacy ERP system, which they never managed to replace, knows the fear.
Can you elaborate on this? I’m pretty much guessing what this might mean.
What I mean is that multiple open source projects would band together to basically commonly issue commercial licenses, and if a company buys in a licence the revenue gets allocated between projects. The more projects get under a given umbrella, the easier it is for companies to adopt them.
One project could be with multiple societies.
In terms of pricing I was thinking at the most basic level something linked to turn over and then something less convenient for companies that want to try to save money.
Thanks. In theory, something like this approach could also be useful for the governance and supply chain issues that we’re worried about today.
How so?
That sounds like https://tidelift.com/
No, it’s very different. For a start tidelift doesn’t provide a better license. Tidelift is more like a consulting and support shop that kicks some money back to the free software projects.
Thanks for the summary! It’s exciting that Akka is doing this. With luck it’ll kill the whole misbegotten mess. The actor model on the JVM never made any sense.
Strongly agree with this — the snap, which I first encountered in 22.04 LTS, is in no way LTS quality.
The issue I encountered immediately on upgrade is that activating the save dialog by pressing Ctrl-S has completely botched focus management. The focus moves to the dialog so quickly that the key up event isn’t received on the original window, which pops a fresh dialog as soon as the first is closed. The only way to recover is to kill the whole app.
If packaging Firefox is so difficult Canonical should make a .deb that dumps it in `/opt/firefox` and be done with it.
The packaging isn’t difficult. Snap exists not for packaging reasons but rather sandboxing reasons.
I don’t understand this.
Like, it’s great that I can install Discord as a Snap — I like having that sandboxed. But aggressively applying this tech to the most complicated and rapidly changing apps shipped on an Ubuntu system (web browsers) seems frightfully optimistic. You’re inevitably going to hit tons of long tail bugs and niche features like we’re seeing.
I have used the c920 on a mac for years, and it has always been overexposed. I’m not sure whether it’s Logitech or Apple or both to blame here. The solution for me is to install the app “Webcam Settings” from the Apple store (yeah it’s a generic name), which lets you tweak many settings on webcams and save profiles for them. It’s not perfect, but I already have the camera and it’s significantly easier to work with than hooking my DSLR up.
The equivalent to “Webcam Settings” on Linux is `guvcview`. I have a Microsoft LifeCam Studio and have to use this tool to adjust the exposure when I plug it into a new machine. Thereafter it persists… somehow.
Oh very nice, `qv4l2` is exactly what I needed to adjust focus during a meeting. Thank you!
update: someone anon-emailed me out of the blue to mention that guvcview has a `-z` or `--control-panel` option that will open the control panel without the preview window, letting you do the same thing as qv4l2. So use the one that makes you happy.
Me, I don’t care enough to spend the effort to get the software working. My audio input is an analog mixer, my audio output the same, and eventually my camera will be a DSLR because that way I don’t twiddle with software for something that really should just work on all my machines without me caring.
Different tradeoffs in different environments.
It’s a driver-settings tool, not a patch. It doesn’t do post-processing. Every OS just fails to provide this tool; I’m not sure why, possibly because webcam support is spotty and they don’t want to deal with user complaints. Some software (like Teams) includes an interface for the settings. Changing them in Teams makes system-wide changes. Others (like Zoom) only have post-processing effects, and these are applied on top of the driver changes you made in Teams.
I can confirm this tool definitely affects the camera hardware’s exposure setting. I’ve used it for adjusting a camera that was pointed at a screen on a remote system I needed to debug. The surrounding room was dark (yay timezones!) so with automatic exposure settings it was just an overexposed white blur on a dark background. This tool fixed it. There’s no way this would have been possible with just post-processing.
(No, VNC or similar would not have helped, as it was an incompatibility specific to the connected display, so I needed to see the physical output. And by “remote” I mean about 9000km away.)
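The driver-level controls discussed above can also be set from the command line with `v4l2-ctl` from the v4l-utils package. This is only a sketch: it assumes a UVC camera at `/dev/video0`, and control names vary by camera and kernel version, so check the list output before setting anything.

```
# Assumes a UVC camera on /dev/video0 and the v4l-utils package installed.
# List the controls your camera actually exposes (names differ by driver):
v4l2-ctl -d /dev/video0 --list-ctrls

# Switch to manual exposure and set a value; on older kernels the controls
# may be named exposure_auto / exposure_absolute instead:
v4l2-ctl -d /dev/video0 --set-ctrl=auto_exposure=1
v4l2-ctl -d /dev/video0 --set-ctrl=exposure_time_absolute=250
```

Like qv4l2, this talks to the driver directly rather than post-processing the stream, so in my experience it works even while another application holds the camera open.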
Sounds like you had some fun
That’s definitely one way of describing it! Not necessarily my choice of words at the time.
Oh, Teams can do this? Thanks, I’ll have to check that out as an alternative.
The DSLR/mirrorless ILC (interchangeable lens camera) route is great for quality but it has its risks. I started off with a $200 entry level kit and now I’ve got two bodies, a dozen lenses, 40,000 pictures, and a creatively fulfilling hobby.
don’t forget the tripod! I like landscape photography, and a good tripod was surprisingly (> $200) expensive.
So the risks are spending too much money?
I fail to see how you’re going to use a DSLR as a webcam without “twiddling with software”. Sure, you’ll have a much better sensor, lens, and resulting image quality. But I’ve yet to see a setup (at least with my Canon) that doesn’t require multiple pieces of software just to make it work as a webcam. Perhaps other brands have a smoother experience. I still question how this won’t require at least as much software as my route.
There’s also the physical footprint that matters to me. A webcam sits out of the way on top of my monitor with a single cable that plugs into the USB on the monitor. A DSLR is obviously nowhere near this simple in wiring or physical space. It also has a pretty decent pair of microphones that work perfectly for my quiet home office.
Are either the audio or video studio quality? Nope, but that’s completely fine for my use case interacting with some coworkers on video calls.
My perception has been that a DSLR with HDMI output gives you the ability to capture the HDMI feed and present it as a webcam stream.
The other things that a camera does can be tweaked with knobs instead of software.
See also: please stop writing Dockerfiles. Because they incorporate shell in a manner only mildly less deranged than Make.
Oil has all of these options under one group, `oil:basic`, so you don’t have to remember all of them. They’re on by default in `bin/oil`, or you can opt in with `shopt --set oil:basic` in `bin/osh`.

Also I think this title is a troll – it should be more like “Pitfalls of Shell” or something like that, which are pretty well known by now.
The answer is to fix shell, not write articles on the Internet telling people not to use it. They use it because it solves certain problems more effectively than other tools.
Yeah this post omits the only piece of advice that would make it practical, which is pointing to another programming language that they consider better suited for the job. I’ve written code to launch and tend to processes in a lot of languages and they have all been as error prone as the shell. I don’t think people who bash on shells understand just how complex correct process handling is.
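To make the “correct process handling is complex” point concrete, here is a minimal sketch in plain POSIX sh (names are made up for illustration) of what supervising even one background process involves: remembering the PID, guaranteeing cleanup on every exit path, and tolerating the race where the process has already died.

```shell
#!/bin/sh
# Minimal sketch: supervise one background worker and guarantee cleanup.
set -eu

cleanup() {
    # The worker may have already exited, so a failed kill must not abort us.
    kill "$worker_pid" 2>/dev/null || true
}

sleep 30 &                  # stand-in for a real long-running worker
worker_pid=$!
trap cleanup EXIT INT TERM  # cover normal exit and common signals

# ... the rest of the script runs here ...

echo "done, worker will be cleaned up on exit"
```

Every language makes you write some version of this; shell just makes it easy to forget the `|| true` and the signal traps.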
If you wouldn’t mind taking the opportunity to shill, how would you go about convincing somebody to switch to Oil shell from bash, assuming they’re willing to ignore the lack of widespread deployment of Oil? What’s your sales pitch?
Sure, the most compelling case is actually up the thread:
https://lobste.rs/s/iofste/please_stop_writing_shell_scripts#c_m4yng8
That is, you have 3K lines of bash code, AND you want to switch to something else.
Well Oil is basically your only option! It’s the most bash compatible shell by a mile.
There are downsides, like Oil needing to be faster, but if you actually have that much shell, it’s worth it to start running your scripts under Oil right now.
Some notes here: https://github.com/oilshell/oil/wiki/How-To-Test-OSH
Feel free to post on `#oil-help` on Zulip, file issues on GitHub, etc.

I have a draft of other reasons here: https://www.oilshell.org/why.html

Most of them amount to better tools and error messages. Oil is like a ShellCheck but at runtime. ShellCheck can’t catch certain things like bad `set -e` usage patterns, because some of them can only be detected at runtime – or you would have a false positive on every line. (I should do a blog post about this.)

I also put some notes about “the killer use case” here:
http://www.oilshell.org/blog/2021/12/backlog-assess.html#punting-the-interactive-shell
i.e. I started running my CI in containers, and I think many people do. Oil not being installed isn’t a big issue there because you have to install everything into a container :)
Although we probably need a bootstrap script, e.g. like rustup, if your distro doesn’t have it. (Many do, but not all.)
Let me know if that makes sense!
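As an illustration of the runtime-only `set -e` pitfalls mentioned above (a sketch with a made-up function name): errexit is suspended inside an `if` condition, including within any function called from one, so a failure there is silently swallowed. A static checker can’t flag the call site, because the same function behaves fine when called on its own.

```shell
#!/bin/sh
set -e

deploy() {
    false                                  # fails, but errexit is suspended
    echo "deploy kept running after the failure"
}

# Called as an 'if' condition, the function ignores set -e entirely:
if deploy; then
    echo "deploy reported success"
fi
```

Run directly, `deploy` would abort at `false`; called from the `if`, it barrels on and even “succeeds”, because its exit status is that of the final `echo`.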
I love your shill. It’s always upbeat and on topic.
I wrote ansible roles to install oil on my Linux/FreeBSD servers at home.
I’d guess itamarst really meant the title (subject to the caveats in the article), but also that he wasn’t talking about alternate shells like Oil, as they are really a different matter. Nobody writes “don’t use fish” articles, and Oil is in the same boat — it isn’t available by default, waiting to blow your hand off, so there’s no need to warn folks away from it.
Any language that has been designed rather than duct-taped together over decades is going to avoid shell’s (bash/dash/ash/POSIX sh’s) faults. Please continue doing this! When `/bin/oil` is part of a stock Debian install we can start telling people to put `#!/usr/bin/env oil` at the top instead, but until then I think it’s sensible to post these warnings periodically, since OSHA is unlikely to step in.

GraphQL?
I work on a project that gets type safety through the front-end this way. The downside is that GraphQL is JSON so it’s less efficient than something like Protobuf, Thrift, etc.
Thanks, I’d wondered about GraphQL. I’d been leaning toward a trpc with Prisma on the backend, so I wasn’t really considering GraphQL. But I’ll give it another look.
Thanks for the tip about Strawberry, too. TBH I’ve been pretty surprised that I haven’t found any active Python libraries that generate JSON Schema or Protobuf schemas from Python type annotations. Maybe I’ve just missed it, but I’ve done a fair amount of searching at this point.