If you go to his web site (nexbridge.com), it appears he is actually selling a version of Git for NonStop computers. So he’s advocating for his own personal business, which is fine I guess, but doesn’t really compare to the needs of tens of millions of mainstream OS users who need Git.
If you want to play with this model and don’t have a Mac with 64 GB of RAM, you can try it on together.ai. They also have all the other open-source LLMs. There is a generous free tier, and after that the costs are extremely low (orders of magnitude cheaper than the big companies).
Amusing fact: Windows still ships the command “pentnt”, which exists to detect the 1993 FDIV bug and enable a software workaround if necessary.
A lot of the JS ecosystem was designed back in a time when web browsers sucked. DOM updates were very slow (necessitating a virtual DOM), there were all kinds of idiosyncrasies between browsers, and there weren’t tons of built-in tools like web workers, client-side storage, crypto, canvas, components, etc. So you’d end up with hundreds of dependencies just to fill in holes in browser functionality. But now browsers are much better and most of that stuff is unnecessary.
I strongly disagree. The same problems that existed years ago exist today: https://bower.sh/my-love-letter-to-front-end-web-development
That article (from 2021) doesn’t seem to contradict the idea that browsers are way better now in a way that makes a lot of the JS ecosystem less necessary.
Browser differences really are far less common these days. I write a ton of vanilla JavaScript and I very rarely need to think about the differences between different browsers at all, beyond checking https://caniuse.com/ to make sure the feature I want to use is widely supported.
I’m concerned that Bluesky has taken money from VCs, including Blockchain Capital. The site is still in the honeymoon phase, but eventually they will be expected to pay that money back tenfold.
This is Twitter all over again, including the risk of a hostile takeover. I don’t think they’re stupid enough to just let the allegedly-decentralized protocol take away their control when billions are at stake. They will keep users captive if they have to.
Hypothetically, if BlueSky turned evil, they could:
ban outside PDSes to be able to censor more content
block outside AppViews from reading their official PDS
This would give them more or less total control. Other people could start new BlueSky clones, but they wouldn’t have the same network.
Is this a real risk? I’m not sure. I do know it’s better than Twitter or Threads which are already monolithic. Mastodon is great but I haven’t been able to get many non-nerds to switch over.
Hypothetically, the admins of a handful of the biggest Mastodon instances, or even just the biggest one, could turn evil and defederate from huge swathes of the network, fork and build in features that don’t allow third-party clients to connect, require login with a local account to view, etc. etc.
Other people could start clones, of course, but they wouldn’t have the same network.
(also, amusingly, the atproto PDS+DID concept actually enables a form of account portability far above and beyond what Mastodon/ActivityPub allow, but nobody ever seems to want to talk about that…)
The two situations are not comparable. If mastodon.social disappeared or defederated, the rest of the Mastodon (and AP) ecosystem would continue functioning just fine. The userbase is reasonably well distributed. For example, in my personal feed only about 15% of the toots are from mastodon.social, and in the 250 most recent toots I see 85 different instances.
This is not at all the case for Bluesky today. If bsky.network went away the rest of the network (if you could call it that at that point) would be completely dead in the water.
While I generally agree with your point (my timelines on both accounts probably look similar), just by posting here we’ve probably disqualified ourselves from the mainstream ;) I agree with the post you replied to in that joe random (not a software developer) who came from Twitter will probably end up on one of the big instances.
For what it’s worth, I did the sampling on the account where I follow my non-tech interests. A lot of people ended up on smaller instances dedicated to a topic or geographical area.
While it’s sometimes possible to get code at scale without paying – via open source – it’s never possible to get servers and bandwidth at scale without someone dumping in a lot of money. Which means there is a threshold past which anything that connects more than a certain number of people must receive serious cash to remain in operation. Wikipedia tries to do it on the donation model, Mastodon is making a go at that as well, but it’s unclear if there are enough people willing to kick in enough money to multiple different things to keep them all running. I suspect Mastodon (the biggest and most “central” instances in the network) will not be able to maintain their present scale through, say, an economic downturn in the US.
So there is no such thing as a network which truly connects all the people you’d want to see connected and which does not have to somehow figure out how to get the money to keep the lights on. Bluesky seems to be proactively thinking about how they can make money and deal with the problem, which to me is better than the “pray for donations” approach of the Fediverse.
Your point is valid, though a notable difference with the fediverse is the barrier to entry is quite low - server load starts from zero and scales up more or less proportionally to following/follower activity, such that smallish but fully-functional instances can be funded out of the hobby money of the middle+ classes of the world. If they’re not sysadmins they can give that money to masto.host or another vendor and the outcome is the same. This sort of decentralisation carries its own risks (see earlier discussion about dealing with servers dying spontaneously) but as a broader ecosystem it’s also profoundly resilient.
a notable difference with the fediverse is the barrier to entry is quite low
The problem with this approach is the knowledge and effort and time investment required to maintain one’s own Mastodon instance, or an instance for one’s personal social circle. The average person simply is never going to self-host a personal social media node, and even highly technical and highly motivated people often talk about regretting running their own personal single-account Mastodon instances.
I think Mastodon needs a better server implementation, one that is very low-maintenance and cheap to run. The official server has many moving parts, and the protocol de-facto needs an image cache that can get expensive to host. This is solvable.
Right! I’ve been eyeing off GoToSocial but haven’t had a chance to play with it yet. They’re thinking seriously about how to do DB imports from Mastodon, which will be really cool if they can pull it off: https://github.com/superseriousbusiness/gotosocial/issues/128
Worst case one moves off again. That’s a problem for a future date.
That’s true, but I’ve been hooked on Twitter quite heavily (I was an early adopter) and invested in having a presence there.
The Truth Social switcheroo has been painful for me, so now I’d rather have a smaller network than risk falling into the same trap again.
Relevant blog post from Bluesky. I’d like to think VCs investing into a PBC with an open source product would treat this differently than Twitter, but only time will tell.
OpenAI was a “non profit” until it wasn’t.
OpenAI never open sourced their code, so Bluesky is a little bit different. It still has risks, but the level of risk is quite a bit lower than OpenAI’s was.
OpenAI open sourced a lot and of course made their research public before GPT3 (whose architecture didn’t change much[1]). I understand the comparison, but notably OpenAI planned to do this pseudo-non-profit crap from the start. Bluesky in comparison seems to be “more open”. If Bluesky turned evil, the protocols and software would still exist beyond their official servers, which cannot be said for ChatGPT.
[1]: not that we actually know that for a fact since their reports are getting ever more secretive. I forget exactly how open that GPT3 paper was, but regardless the industry already understood how to build LLMs at that point.
Thanks for writing this up, Alice!
I was actually talking to a friend yesterday about how hard it would be to host an AppView, which hosts the actual “live” copy of all the posts viewable on the BlueSky site. To my knowledge, BlueSky hasn’t released any hardware specs on this; however, I went back through some old posts from Jake Gold, their infrastructure master, and it looks like as of about 9 months ago there were 2 ScyllaDB clusters (one on each coast), each with 8 maxed-out nodes. (By “maxed out” - 384 threads, 1.5 TB RAM, 360 TB NVMe storage.) The two clusters are redundant, so if you were hosting yourself you would only need one. Presumably there are more nodes per cluster today now that the network is bigger.
So that’s roughly ~$500,000 in hardware just to host the AppView. Not sure on bandwidth and other costs. Not really something you could do on your own, I think! But also orders of magnitude less expensive than the big social media sites, even proportional for the number of users.
If you were trying to self-host the AppView, I imagine you could restrict it to just a small number of users and a small time-window, which would greatly reduce the cost of course.
Another note: ScyllaDB does not have full-text search, so presumably there is another large cluster somewhere used for ElasticSearch or another full-text search engine.
If you’re just trying to host all the posts, hosting a relay is enough, and a few weeks ago it was claimed it’s like, $150/month. That was pre-Brazil though, so it’s probably more, but far cheaper than $500,000.
My current ballpark estimates put it at roughly a 200 EUR/month Hetzner server if you want your own Relay, which is still neat.
That being said, this gets us into the whole “what counts as hosting Bluesky” thing which is, eh. Yes you’ll have a copy of it all but no AppView to look at it unless you write your own or set up the official one, which is, well, complicated.
Two related questions:
Why are there not Rails-like frameworks for Go and Rust? I mean, there are some attempts, but in those communities it seems much more popular to roll your own APIs.
Is anyone working on a serverless Rails-like framework? I know there are things like Serverless Framework, but they don’t fit the goal of rapidly creating an API and storage from specification and then being able to modify it.
There have certainly been a number of attempts at making “Rails but for Go.” See https://gobuffalo.io for example. I think it’s not really a market demand, so it hasn’t taken off the way Sinatra/Flask-style API routers for Go have.
Why are there not Rails-like frameworks for Go and Rust?
Because the old server-side “webapp” with a sprinkle of JavaScript doesn’t make too much sense in an age where people expect GUI-style interactivity in their browsers. Rails is from the age of clicking from document to document and adding some forms for data input.
It survived because it has inertia, and because it is still a useful way of building web applications, but it’s not what the market wants.
I, for one, wouldn’t mind a simpler web, retaining more of the old click-and-refresh paradigm. This very website is a good example. But that’s me. The market has spoken loud and clear in the other direction.
IMO, microframeworks are easier to write, so there are more of them by quantity. Fuller frameworks take many years, so there are only a few of them (if any) in each language. I like to think that web apps and frameworks are near the tippy-top of complexity and high-level-ness. I imagine a full framework that could tackle many of the common problems you will run into takes 5-7 years to make and refine under real developer use. Most people won’t go that far. So you end up with HTTP routers; I’ve written about why I think this isn’t enough.
If you want to go back, in a way, Remix is sort of a throwback. I wouldn’t say Remix is an auto-win framework to pick, but I think it’s pretty interesting. Remix has actions/submits going to the same file/route you are currently on; it stays close to web standards, and you can get end-to-end type safety like tRPC if you use typedjson. Remix is not a click-to-refresh type of experience though.
Why are there not Rails-like frameworks for Go and Rust?
Go’s web story was really bizarre to me when I was still spending time in that language. Rust has Axum and even Typescript has something like Hono. API based microservice frameworks can (should?) be small and tight.
I like that Google Cloud is easy to use, and has some nice modern features like OIDC authentication that are just newer than AWS’ method. Some very common things on AWS require writing custom Lambda functions which is error-prone and annoying, and GCP has that stuff built in. The Kubernetes support is top-notch, and the observability tooling is really good. It’s also substantially cheaper than AWS for most things.
On the other hand, Google Cloud is REALLY rough around the edges. For example, Cloud SQL will lose connections during its maintenance window. On AWS, VPCs are a first-class networking concept – on Google they support the bare minimum of functionality. When you get into more esoteric stuff like key management, Google has almost no support or documentation, but AWS has a rich set of functionality.
If you’re doing a startup, Google Cloud is a decent choice because it’s cheaper and you can work around the issues as you build. But for an established company I’d almost always recommend AWS.
I tried a number of textbook math and physics problems today, and o1 got most of them right. Granted this is not very scientific, because it could have been exposed to those exact problems in training data, but other LLMs including Claude Sonnet 3.5 and GPT 4o could not solve them correctly. We’re in for some interesting times.
I don’t think I learned from this what stacking is or how to do it, just the claimed benefits.
I think the idea is you send out one PR, and then do a new PR that builds on the first (rather than waiting). Sounds like merge hell though if the first needs changes.
The merge hell is what the stacking tools are supposed to help with.
In my experience with just using git rebase --update-refs, if a PR lower in your stack needs changes you can amend those commits and then do a rebase against main or whatever branch is below you in the stack and it’ll also propagate the rebase in the upwards direction.
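For readers in the same position as the comment above asking what stacking actually looks like, here is a minimal sketch of the bare-git workflow being described (branch names are made up; --update-refs needs git 2.38 or newer):
```
# Two stacked branches, each meant to become its own PR.
git switch -c feature-db main          # PR 1: the bottom of the stack
# ...commit the database changes...
git switch -c feature-api feature-db   # PR 2: builds on top of PR 1
# ...commit the API changes...

# Review asks for changes on PR 1. From the top of the stack, rebase
# interactively against main, mark the offending commit as "edit" and
# amend it; --update-refs moves the intermediate feature-db branch ref
# along with the rewritten commits.
git switch feature-api
git rebase -i --update-refs main

# Both branches were rewritten, so both need a careful force push.
git push --force-with-lease origin feature-db feature-api
```
Tools like Graphite mostly wrap this sequence (plus creating and retargeting the PRs on the forge) so you don’t have to run it by hand.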
You know how you build one commit on top of the other, and it’s nice to be able to rebase commits around, or reorder them, squash them, etc?
PR stacking is basically the same but with PRs (i.e. git branches) rather than commits as the atomic unit of change, because GitHub and other mainstream code forges kind of force you to operate at the granularity of a PR (the whole PR review UI, in particular, is PR-oriented, not commit-oriented).
PR stacking needs tooling on top of git to work nicely. Graphite is the only such tool that I have experience with and it does it ok.
GitHub+PR stacking makes for a better developer experience than vanilla GitHub, but we are still fundamentally working around the fact that the patch review UX of vanilla GitHub is pretty bad, as often discussed on this site (and other similar code forges have the same issues, e.g. GitLab is the same in this respect)
the patch review UX of vanilla GitHub is pretty bad, as often discussed on this site
I see it mentioned often on this site, but I’ve never really understood why. The only major issue that I see is not being able to comment on commit messages, but there are workarounds for that that are simple enough, like squashing and merging every PR such that at the level of the trunk every PR becomes a single commit, and therefore you can discuss the commit message at the level of the PR.
I’ve also rarely seen a critique of a commit message that didn’t border on pedantic, but that’s neither here nor there.
Check out this gist, which is the best writing I know on the internet about this subject: https://gist.github.com/thoughtpolice/9c45287550a56b2047c6311fbadebed2
Thank you! So if I understand this correctly, the improvement the author would like would be for GitHub to make it easier to see the whole stack at a time while being able to drill into each individual change and its versions, which is what Graphite and the rest make possible. That’s a fair critique, and I guess working in orgs where we’ve mostly moved fast enough for stacks to not get too large has made it easier for me to not run into these issues.
Plus day-to-day stuff like iterating on iterations of the same PR without losing track of the conversations or losing the context of the conversation; or easily seeing the diff since your last review; or setting the state of a review to “just commenting”, “requesting changes before this can be merged” and “approved, but see comments”
Reading this was kind of depressing, because it makes Canonical hiring sound primarily like a charade based on how well you write and how well you can charm the reader. Like cool, but maybe you should be asking better questions that aren’t engineered to alienate people that don’t know the word games you’re expecting them to play.
I have not heard of a single good experience with Canonical’s hiring/interview process, but I have heard of plenty of bad ones. I feel like, if you can anticipate and identify these issues (e.g. “useless self-deprecation”, nervousness, lack of stories), as a recruiter or interviewer, part of your job is to help defuse them so you can assess the candidate better.
Some might say it’s on the interviewee, but the whole hiring process is so synthetic and unusual by comparison to what it’s like being on the job that I feel a company is doing themselves a huge disservice to leave it at that — you might miss out on the perfect candidate because they didn’t study some engineering director’s twelve rules for job applications.
I feel a company is doing themselves a huge disservice to leave it at that — you might miss out on the perfect candidate because they didn’t study some engineering director’s twelve rules for job applications.
Unless what they’re looking for specifically is candidates who will religiously study the edicts of every director and apply them right away :-). Some organisations thrive on autonomy, initiative and a profession-oriented culture, some thrive on orthodoxy, hierarchy, and a company-oriented culture.
Don’t get me wrong, I would hate to work for the latter kind. Things like asking engineering professionals about high school grades are a huge red flag for me, I wouldn’t work for a company like that if it were the last engineering job in the world and I’d have to go do something else. But there’s more than one way to skin a cat (and this particularly gory variant of this idiom is in fact the appropriate one in this case, I think :-) ).
I didn’t intend this to be a guide for Canonical (I don’t think it even mentions Canonical by name) but for a role anywhere. I think the rules apply generally.
It’s particularly aimed at people who don’t have the advantage of networks of good advice to help them navigate the way that hiring works in software companies, or don’t understand what the expectations are.
I’ve had multiple people at Canonical tell me that their application process is the most selective of any tech company. A lot of people want to work there, and they don’t have a lot of openings, so I guess they can do that. Most companies are selective but not THAT selective.
Let’s assume this is happening. What could the source be?
In most of the world, recording your audio and sending it to a server would be considered “wiretapping” and be very illegal. However, doing speech-to-text on device and looking for keywords would probably be legal. Let’s assume it’s a legit company and the people involved don’t want to go to jail.
Most phones can’t do speech-to-text onboard without you noticing (they’d lose battery and get hot). Maybe the very latest iPhones have enough spare cycles, but most phones don’t.
Apple or Google would get mad if their app stores were being used to distribute this kind of spyware. So probably no major apps.
Then we’re looking for devices with a large power budget, that aren’t under a lot of scrutiny. Amazon Echo is obvious since it’s got a great high-quality microphone, but Amazon has put a ton of marketing money into convincing people that Echo isn’t listening to your voice all the time, and they don’t want to blow it.
So we’re looking at other devices that are plugged in to the wall, and have a microphone. Cable set-top boxes stand out because they draw a lot of power, and have an independent network connection separate from your home Internet connection so you can’t see what they’re doing. They do have a microphone for voice commands now.
Smart TVs also stand out. No independent network connection, but they often have a microphone and a powerful CPU. Or other TV gadgets like Roku. These are often greatly subsidized by advertisers already.
Home security cameras might be another source, but they normally aren’t in living spaces.
Game consoles are another possibility, although maybe even less likely because the mic is typically in the controller (and would lose charge quickly).
Just some thoughts. If I were trying to track this down, I might start reverse engineering TV-related stuff.
Funny how they all suddenly “remove the Partnership” or “investigate” it. Why does an ad-partner even have the right to access the microphone?
Unless something has drastically changed since I worked at FB a few years ago, they don’t.
I think they removed the partnership because their partner was lying about what their technology could do in order to scam that partner’s customers.
Could be, though we’ve seen the clipboard-sniffing already, so who knows.
See comment here: https://lobste.rs/s/mf7guc/leak_facebook_partner_brags_about#c_jxayna
I am firmly convinced that the story here is CMG media lying to their potential clients (in a high touch sales process), not that CMG media blew the lid on a multi-year conspiracy.
This really matters to me. If companies are genuinely doing this it shouldn’t be a “who knows?” situation, it should be a national/international scandal with legislative consequences.
Yeah the more I read about it, the more I’m not so convinced by it being actually true.
But it’s definitely good to keep an eye on this. We’ve had the Samsung patent about saying the brand name loud in front of the TV to pass the app break.
Yeah, Samsung are so bad around this stuff. That’s part of the problem: it’s hard to argue companies aren’t doing creepy unscrupulous things when there are so many companies out there blatantly doing creepy unscrupulous things.
This is why I care so much about accuracy: we need to know exactly who is doing what in order to effectively campaign for them to stop.
It doesn’t pass the “my smartphone isn’t constantly 50C and the battery hasn’t died in an hour” smell test. Checking some text is absolutely trivial in comparison.
It’s unlikely they can access the mic through the FB app itself. Maybe there were some weird workarounds, but I’d be surprised if you could upload any custom code to FB - it’s too tightly controlled.
Until now it was a conspiracy theory that everyday conversations are converted into personalized advertising. It was unthinkable that Mark Zuckerberg would sell your data until the Facebook–Cambridge Analytica data scandal, even though he made his approach to personal data quite clear much earlier (‘People just submitted it. I don’t know why. They “trust me”. Dumb fucks’). It was unthinkable that the good guys sporting a slogan saying “Don’t be evil” would become the evil.
One may be looking at this from an everyday developer’s diluted perspective of restricted ABI/API access, which may (my assumption) be less limited for certain so-called “partners” (other big tech, advertisers, insurance, other entities interested in purchasing personal data).
At this point, there are no conspiracy theories around personal data farming. For me it’s pure distrust towards entities which lose one privacy-related lawsuit after another, and appear to have the costs of personal data abuse lawsuits simply written into annual budgets as the “cost of doing business”.
Facebook exploited bugs in Android to read all of your text messages (without permission) and do targeted advertising based on that. They also did an MITM attack using a VPN product that they bought to read all your Snapchat messages (although that likely affected only a small number of people). I’d believe they’d do almost anything to get your private data. Fortunately they’re under a lot of scrutiny now so it’s harder for them.
I find SSH CAs and TPMs/Trusted enclaves super fascinating, so I love this. I wish there was a fully featured open source SSH CA that one could use that also handled user/sudo provisioning.
that one could use that also handled user/sudo provisioning.
I’m not quite sure why you would want provisioning though? With ssh-tpm-ca-authority you can have the sudo delegation as part of the config. The user you are requesting doesn’t need to be your user, it could be root or some sudo-able user.
Of course this solution is not fully featured and has quite a road until it would be something you can actually deploy, but as a basic solution for this problem I think it works?
I guess I don’t really need provisioning, it was more that my understanding was that there’s no out of the box pam module that will take certificate attributes and use them for user/sudo privileges. I will say that true user provisioning is better for some places as you can easily tie audit logs to specific users.
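For anyone following along, the CA half of that wish list is already in stock OpenSSH; it’s the user/sudo provisioning around it that you have to bring yourself. A minimal sketch, with made-up key names and principals:
```
# On the CA host: create a user CA and sign a user's public key.
ssh-keygen -t ed25519 -f user_ca -C "example user CA"
ssh-keygen -s user_ca -I alice@example.com -n alice,admins -V +8h id_ed25519.pub
#   -s  sign with this CA key        -I  certificate identity (shows in logs)
#   -n  principals baked into cert   -V  validity window

# On each server: trust the CA and map cert principals to local accounts.
#
#   /etc/ssh/sshd_config
#     TrustedUserCAKeys /etc/ssh/user_ca.pub
#     AuthorizedPrincipalsFile /etc/ssh/principals/%u
#
#   /etc/ssh/principals/alice
#     alice
#     admins
#
# The local "alice" account (and any sudoers entry for it) still has to be
# provisioned separately, which is exactly the gap discussed above.
```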
This is a very good post. It is difficult to justify contributing to OSS when, if your project is successful, it will just be taken wholesale by a commercial vendor, and when the employees of those vendors make stratospheric salaries but will generally not hire people with that open source background. This applies to both hobbyist projects and commercial open source projects.
but will generally not hire people with that open source background
That’s odd - why wouldn’t they hire the most experienced and most influential people on the project they’re using? There are many reasons why they would do that:
The person in question is known to be qualified, no lengthy interview process needed
The person in question already has the experience, so no training period needed before they become productive
The person in question has direct commit access, so (depending on the policy of course), no need to make PRs and submit them to review
It’d allow the company to influence the direction of the project without having to fork it
It buys goodwill in the community
And in fact, there have been plenty of cases where a core developer got hired precisely due to their work on an essential piece of technology. Or they just started a business themselves and hired a big chunk of the rest of the core team.
What a contrast in technology levels. They could launch and land a 100-ton reusable crewed orbiter, almost fully automatically, a capability we still can’t match today. But printing out text messages was a huge problem.
There are companies that build airgapped platforms (usually for governments or government contractors). They have everything you’re used to, with YUM, Docker, Kubernetes, NPM, pip, etc, but all maintained with no internet access. And they are very, very expensive.
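At a much smaller scale than those commercial platforms, the underlying move is the same: fetch on a connected machine, carry the artifacts across, and install from the local copy with the remote index disabled. A rough sketch for Python packages and container images (names and paths are only examples):
```
# On an internet-connected staging machine:
pip download -r requirements.txt -d ./wheelhouse        # packages + deps
docker pull nginx:1.27 && docker save -o images.tar nginx:1.27

# Carry ./wheelhouse and images.tar across the airgap, then:
pip install --no-index --find-links ./wheelhouse -r requirements.txt
docker load -i images.tar

# RPM-based distros follow the same pattern with a synced repo, e.g.
#   dnf reposync --repoid=baseos -p /srv/mirror
# served to clients as an ordinary local repository.
```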
Crowdstrike is primarily an antivirus product. I think it’s telling that we don’t call it “antivirus” because Norton and McAfee gave the entire field of antivirus a terrible reputation, since they crashed all the time and demanded payment frequently. But basically it does the same thing: monitor all the processes on your machine and make sure none have the traits of known malware.
It is generally considered a very good product, and many security consultancies and even insurance companies recommend it be installed on all machines. It is pretty silent so it could be on your work computer and you wouldn’t know unless you looked for it.
It’s available for Linux too, and is often baked into companies’ server and cloud images. Fortunately the Linux version didn’t crash, since that would have been even more damaging. There is also a Mac version, which didn’t crash either.
Also notable that if a Windows cloud instance BSODs, there is usually no way to boot it in safe mode and remove the offending component. So many Windows cloud shops are just stuck rebooting over and over, hoping it will recover itself.
I yell a lot about the software crisis, but this is a symptom of it.
The trigger on this update was presumably pulled without testing. With the effects being as widespread as they are, I doubt this was a “works on my machine” situation.
With the pervasive devaluation of programming as a trade by managerial forces, and the progressive migration to fungible workers, I doubt this will be the last time this happens on this scale.
If you’ve ever done any driver programming, especially Windows drivers, you will know that it is difficult to test the full matrix of every possible version / configuration / combination of other drivers on the machine.
I would expect driver developers working on something as widespread as CrowdStrike to be well versed in the nuances… but when I worked at Microsoft even the full-time kernel team folks would make mistakes that would cause major issues internally for people self-hosting Windows. I maintained a relatively small boot driver that ran on Windows 8 through Windows 11 (client and server), and the matrix of possible interactions your code could have was overwhelming to test at times (https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/managing-hardware-priorities).
I’m not making excuses for them, but I imagine this is not something you can chalk up to just pure negligence.
Oh, I guarantee you the things that get shipped are difficult to test. That’s a given.
This isn’t on the engineers. This is on management. CrowdStrike has been undergoing rolling layoffs and I guarantee you some of the folks that were gutted were QA-related.
It’s unfortunately a common tale. Cut staff, cut morale, micromanage, construct a meat grinder, and things like this will happen.
I agree, but consider that it is pretty hard to test antivirus updates since they are extremely time-sensitive. They typically try to respond to the latest threats within a few days. There is no real possibility of weeks of testing, or staged rollouts, or other practices companies normally use.
Heartbreaking to find out how many embedded systems in the world run Windows (and then with a security snake-oil rootkit on top) when they really have no reason to.
None? Or one of the minimal embedded OSes? I get where the convenience comes from, but when a fully dedicated terminal like an ATM or checkout machine boots Windows, I just have to question it - 99% of the deployed code there does nothing but expand the vulnerable scope and wait for an opportunity to fail.
Not true. A Windows environment helps them debug and speeds up development, even in deployment. It’s unreasonable to disregard the amount of engineering time saved because of a black swan event like today’s.
Speed up development: you can develop a Qt app mostly on Windows even if you run it under QNX on the target.
Debug: you do lose a full user interactive environment, but for proper testing you needed to run the target device anyway. From my experience once you have a debug interface working, it’s not much harder. On the other hand the number of moving pieces drops significantly - if it’s not in your code, it doesn’t happen.
But I’m serious about the inactive code. How many user-specific things and random services are there, even on a cut-down, embedded version of Windows? Why do they need to exist there at all?
Speed up development: you can develop a Qt app mostly on Windows even if you run it under QNX on the target.
This is tangential, but a development environment that doesn’t mirror your deployment environment is a recipe for disaster.
But I’m serious about the inactive code. How many user-specific things and random services are there, even on a cut-down, embedded version of Windows? Why do they need to exist there at all?
Yes but my point is that this is cherry picking. An equally fair and rational assessment could be “that code is never activated or exposed and we can afford having it there so it doesn’t matter,” and that would be good engineering. In this specific example, the downtime was not due to Windows but rather extra third-party components chosen by the developers.
Development environment not matching production is a given. It’s just a question of how different and how much testing we can do. I can’t get a graviton CPU that AWS uses in production on my machine for example. But we deploy to a test environment and cope. I’m not developing on ATmega chips either, because it’s impossible. It’s ok.
The dead code matters for security though - the multi-user mode, the kernel parsers for unused formats, the various RPC daemons, the ability to install drivers - that’s why CrowdStrike was installed there in the first place. If you get rid of the ability to install known malware and run from a read-only device, you don’t need to worry about this protection mechanism in the first place. You can get rid of the whole vulnerability class.
Development environment not matching production is a given.
Why does it have to be? I previously did ATM OS engineering for a bank that rolled their own (uncommon versus just paying Diebold or whoever). The devs had their own ATM lab so they didn’t face this problem.
I think we have a naming issue. I totally agree with you that devs getting the target platform is normal and perfectly fine. I was responding to the previous poster about the dev environment not mirroring target as in “you don’t have to run QNX to develop for QNX”. Both remote debugging over JTAG/whatever and testing parts of the system on a different architecture is fine.
Not true. A Windows environment helps them debug and speeds up development, even in deployment.
Do you have any data on this? I’m fairly sure not just money but also lives were lost during this outage, as hospitals and emergency services were impacted from it. It goes completely against my own personal experience, and also against Microsoft’s own messaging of including WSL as a feature aimed at developers. You wouldn’t condemn people to death without hard data on quantifiable comforts their deaths bring you, yes?
No I don’t have data on whether or not a familiar OS environment speeds up development as a base platform upon which to build.
I’m fairly sure not just money but also lives were lost during this outage, as hospitals and emergency services were impacted from it.
You wouldn’t condemn people to death without hard data on quantifiable comforts their deaths bring you, yes?
There is a lot wrong with this “Windows is killing people!” argument.
The root cause had nothing to do with Windows. This was due to a CrowdStrike update.
On the same basis of data, what data do you have that Windows specifically causes people to die any more than any other software platform?
Asking whether I personally condemn people to death because I’m questioning your logic is a pretty blatant loaded question and an example of ad hominem. Let’s get the facts straight first before we start assessing who is to blame for human deaths, and to what extent.
The root cause had nothing to do with Windows. This was due to a CrowdStrike update.
Actually, if Windows didn’t crash when its drivers segfaulted (microkernel designs solve this), or even if the design of Windows didn’t make IT departments believe it was necessary to purchase and install so much additional IT infrastructure beyond what the base system provides, this problem would’ve been avoided completely. So this is in fact also a problem with Windows for high-reliability environments.
Garbage in, garbage out. You can trade crashing for a different erroneous behavior, but you can’t avoid the erroneous behavior altogether. There are ways to build high-reliability systems with Windows, despite the existence of BSODs. These systems just weren’t designed to be highly reliable; that’s not the fault of the operating system.
A product is either made to achieve high reliability or is modified to achieve high reliability. The latter does not constitute a highly reliable product but a potentially reliable one. This is my point, which you agree with in other words.
I believe your original point was that using a microkernel would have avoided this issue because the BSOD would instead result in a process crash.
My counterpoint is that whether the error manifested itself as a BSOD or a process crash, you still need to design your software to recover from a crash. I.e. microkernels don’t give you high reliability for free; you still need to handle the failure state. E.g. your microkernel process could end up crash-looping, effectively locking out the system in the same way as a BSOD.
The critical reason for the hard failure isn’t that it manifested as a BSOD, it’s that the software wasn’t designed to handle crashes. This isn’t the role of the operating system, it’s the role of application software. So no, windows is not to blame here.
And just FYI BSODs can be recovered from via a watchdog timer.
On the same basis of data, what data do you have that Windows specifically causes people to die any more than any other software platform?
Actually, I bet this is correct. But it’s correct because of its pervasiveness. This is like looking at how many people die in a country but not normalizing it to population; the most populous countries will invariably have the most deaths. Epidemiology 101 would demand a rate here, not an absolute count.
You wouldn’t condemn people to death without hard data on quantifiable comforts their deaths bring you, yes?
I’m pretty sure almost any lifestyle comfortable enough to give one a personal computer and free time to spend on Lobste.rs kills people (e.g. through contributing to greenhouse gas emissions that contribute to storms, floods, droughts, and famines; or through the mining of “conflict minerals” (even if the latter one is changing, and I’m not sure it is, we at least used to kill people through it for our technological conveniences)). I doubt most of us have hard data on the trade-off.
I’d guess it’s absurdly difficult to develop ATM software and then distribute it to bare metal or even Linux. Like have you seen the state of packaged software on Linux even recently?
Automated boot assessment means that if a system is updated and put into an unbootable state, even if it hard-freezes, a watchdog timer will reset it, and the bootloader will revert back to the last successfully booted whole-OS image.
Windows does not allow you to do this. Every software you run on it has its own ad-hoc update mechanism. You are completely misunderstanding what “A/B” means here; it has nothing to do with A/B testing in user studies.
even if it hard-freezes, a watchdog timer will reset it, and the bootloader will revert back to the last successfully booted whole-OS image.
Windows does not allow you to do this.
I see. I assume that generally developers would not be updating the bootloader / early-kernel level components of the system. This is all something that would be done in user-space after you’ve been passed control and Windows does allow you to build in this functionality.
Windows doesn’t have it built in, but I don’t think it prevents you from implementing it either. The watchdog can be enabled, the default boot entry can be changed from the booted system, and even an A/B/scratch split can be implemented with a minimum addition in the bootloader.
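For comparison, here is roughly how the mechanism described a few comments up fits together on the Linux side, as implemented by systemd-boot’s Automatic Boot Assessment. This is a sketch from memory, with illustrative entry names, not a reference:
```
# A freshly installed OS image gets a "tries left" counter in its boot entry:
#   /boot/loader/entries/myos-42+3.conf      # 3 attempts remaining
# systemd-boot decrements the counter on every attempt (+3 -> +2-1 -> +1-2).

# If the boot reaches boot-complete.target, the entry is blessed (the
# counter is stripped) and becomes the known-good default:
systemd-bless-boot good      # normally run by systemd-bless-boot.service
systemd-bless-boot status    # inspect the current entry's state

# If the counter runs out without a successful boot, the loader sorts the
# entry last and falls back to the previous known-good image. A hardware
# watchdog covers the hard-freeze case by forcing the reset.
```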
I read that a Linux sysadmin was facing a similar issue recently because CrowdStrike was pushing a package incompatible with the distro they’re running.
It has two different modes of operation on Linux depending on whether they have a build that matches your kernel. In one mode, it installs a kernel module, which is pretty much exactly as root-kitty as the Windows flavor. In the other, it runs as root and installs a lot of eBPF hooks to observe system behavior, which is slightly less intrusive and slightly less likely to take everything down.
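If you are curious which of those two modes a particular Linux box ended up in, a quick and admittedly rough check is to look for a vendor kernel module versus loaded eBPF programs. The “falcon” name below is an assumption about the sensor’s naming; adjust for whatever is actually installed:
```
# Any vendor kernel module loaded? (module name is a guess)
lsmod | grep -i falcon

# Which eBPF programs are attached? (needs root) A sensor running in
# eBPF mode shows up here with its probes.
bpftool prog show

# The sensor's user-space pieces are visible in either mode:
systemctl list-units | grep -i falcon
ps aux | grep -i falcon
```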
What I always find fascinating in these projects is that commercial niche entities act as if the FOSS projects are vendors.
Do they significantly contribute, monetarily or in time or in expertise, to the project?
If not, listening to them is… Highly optional
Like, if git is so essential to strategic financial infrastructure, I am sure someone can find a budget for git.
If not… Well i suppose it is not that strategic
If you go to his web site (nexbridge.com), it appears he is actually selling a version of Git for NonStop computers. So he’s advocating for his own personal business, which is fine I guess, but doesn’t really compare to the needs of tens of millions of mainstream OS users who need Git.
If you want to play with this model and don’t have a Mac with 64gb RAM, you can on together.ai. They also have all the other open-source LLMs. There is a generous free tier, and then after that the costs are extremely low (orders of magnitude cheaper than the big companies).
Amusing fact: Windows still ships the command “pentnt” which exists to detect the 1993 FDIV bug, and enable a software workaround if necessary.
A lot of the JS ecosystem was designed back in a time when web browsers sucked. DOM updates were very slow (necessitating a shadow DOM), there were all kinds of idiosyncrasies between browsers, and there weren’t tons of built-in tools like web workers, client side storage, crypto, canvas, components, etc. So you’d end up with hundreds of dependencies just to fill in holes in the browser functionality. But now browsers are much better and most of that stuff is unnecessary.
I strongly disagree. The same problems that existed years ago exist today: https://bower.sh/my-love-letter-to-front-end-web-development
That article (from 2021) doesn’t seem to contradict the idea that browsers are way better now in a way that makes a lot of the JS ecosystem less necessary.
Browser differences really are far less common these days. I write a ton of vanilla JavaScript and I very rarely need to think about the differences between different browsers at all, beyond checking https://caniuse.com/ to make sure the feature I want to use is widely supported.
I’m concerned that Bluesky has taken money from VCs, including Blockchain Capital. The site is still in the honeymoon phase, but they will have to pay 10x of that money back.
This is Twitter all over again, including risk of a hostile takeover. I don’t think they’re stupid enough to just let the allegedly-decentralized protocol to take away their control when billions are at stake. They will keep users captive if they have to.
Hypothetically, if BlueSky turned evil, they could:
This would give them more or less total control. Other people could start new BlueSky clones, but they wouldn’t have the same network.
Is this a real risk? I’m not sure. I do know it’s better than Twitter or Threads which are already monolithic. Mastodon is great but I haven’t been able to get many non-nerds to switch over.
Hypothetically, the admins of a handful of the biggest Mastodon instances, or even just the biggest one, could turn evil and defederate from huge swathes of the network, fork and build in features that don’t allow third-party clients to connect, require login with a local account to view, etc. etc.
Other people could start clones, of course, but they wouldn’t have the same network.
(also, amusingly, the atproto PDS+DID concept actually enables a form of account portability far above and beyond what Mastodon/ActivityPub allow, but nobody ever seems to want to talk about that…)
The two situations are not comparable. If mastodon.social disappeared or defederated the rest of the Mastodon (and AP) ecosystem would continue functioning just fine. The userbase is reasonably well distributed. For example in my personal feed only about 15% of the toots are from mastodon.social and in the 250 most recent toots I see 85 different instances.
This is not at all the case for Bluesky today. If bsky.network went away the rest of the network (if you could call it that at that point) would be completely dead in the water.
While I generally agree with your point (my timelines on both accounts probably look similar) just by posting here we’ve probably disqualified ourselves from the mainstream ;) I agree with the post you replied to in a way that joe random (not a software developer) who came from twitter will probably on one of the big instances.
For what its worth I did the sampling on the account where I follow my non-tech interests. A lot of people ended up on smaller instances dedicated to a topic or geographical area.
While it’s sometimes possible to get code at scale without paying – via open source – it’s never possible to get servers and bandwidth at scale without someone dumping in a lot of money. Which means there is a threshold past which anything that connects more than a certain number of people must receive serious cash to remain in operation. Wikipedia tries to do it on the donation model, Mastodon is making a go at that as well, but it’s unclear if there are enough people willing to kick in enough money to multiple different things to keep them all running. I suspect Mastodon (the biggest and most “central” instances in the network) will not be able to maintain their present scale through, say, an economic downturn in the US.
So there is no such thing as a network which truly connects all the people you’d want to see connected and which does not have to somehow figure out how to get the money to keep the lights on. Bluesky seems to be proactively thinking about how they can make money and deal with the problem, which to me is better than the “pray for donations” approach of the Fediverse.
Your point is valid, though a notable difference with the fediverse is the barrier to entry is quite low - server load starts from zero and scales up more or less proportionally to following/follower activity, such that smallish but fully-functional instances can be funded out of the hobby money of the middle+ classes of the world. If they’re not sysadmins they can give that money to masto.host or another vendor and the outcome is the same. This sort of decentralisation carries its own risks (see earlier discussion about dealing with servers dying spontaneously) but as a broader ecosystem it’s also profoundly resilient.
The problem with this approach is the knowledge and effort and time investment required to maintain one’s own Mastodon instance, or an instance for one’s personal social circle. The average person simply is never going to self-host a personal social media node, and even highly technical and highly motivated people often talk about regretting running their own personal single-account Mastodon instances.
I think Mastodon needs a better server implementation, one that is very low-maintenance and cheap to run. The official server has many moving parts, and the protocol de-facto needs an image cache that can get expensive to host. This is solvable.
Right! I’ve been eyeing off GoToSocial but haven’t had a chance to play with it yet. They’re thinking seriously about how to do DB imports from Mastodon, which will be really cool if they can pull it off: https://github.com/superseriousbusiness/gotosocial/issues/128
Worst case one moves off again. That’s a problem for a future date.
That’s true, but I’ve been hooked on Twitter quite heavily (I’ve been an early adopter), and invested in having a presence there. The Truth Social switcheroo has been painful for me, so now I’d rather have a smaller network than risk falling into the same trap again.
Relevant blog post from Bluesky. I’d like to think VCs investing into a PBC with an open source product would treat this differently than Twitter, but only time will tell.
OpenAI was a “non profit” until it wasn’t.
OpenAI never open sourced their code, so Bluesky is a little bit different. It sill has risks but the level of risk is quite a bit lower than OpenAI was.
OpenAI open sourced a lot and of course made their research public before GPT3 (whose architecture didn’t change much[1]). I understand the comparison, but notably OpenAI planned to do this pseudo-non-profit crap from the start. Bluesky in comparison seems to be “more open”. If Bluesky turned evil, then the protocols and software will exist beyond their official servers, which cannot be said for ChatGPT.
[1]: not that we actually know that for a fact since their reports are getting ever more secretive. I forget exactly how open that GPT3 paper was, but regardless the industry already understood how to build LLMs at that point.
Thanks for writing this up, Alice!
I was actually talking to a friend yesterday about how hard it would be to host an AppView, which hosts the actual “live” copy of all the posts viewable in the BlueSky site. To my knowledge, BlueSky hasn’t released any hardware specs on this; however I went back through some old posts from Jake Gold, their infrastructure master, and it looks like as of about 9 months ago there were 2 ScyllaDB clusters (one on each coast), each with 8 maxed-out nodes. (By “maxed out” - 384 threads, 1.5 TB RAM, 360 TB NVMe storage.) The two clusters are redundant, so if you were hosting yourself you would only need one. Presumably there are more nodes per cluster today now that the network is bigger.
So that’s roughly ~$500,000 in hardware just to host the AppView. Not sure on bandwidth and other costs. Not really something you could do on your own, I think! But also orders of magnitude less expensive than the big social media sites, even proportional for the number of users.
If you were trying to self-host the AppView, I imagine you could restrict it to just a small number of users and a small time-window, which would greatly reduce the cost of course.
Another note: ScyllaDB does not have full-text search, so presumably there is another large cluster somewhere used for ElasticSearch or another full-text search engine.
If you’re just trying to host all the posts, hosting a relay is enough, and a few weeks ago it was claimed it’s like, $150/month. That was pre-Brazil though, so it’s probably more, but far cheaper than $500,000.
My current ballpark estimates make it into a rougly 200 EUR/month Hetzner server if you want your own Relay, which is still neat.
That being said, this gets us into the whole “what counts as hosting Bluesky” thing which is, eh. Yes you’ll have a copy of it all but no AppView to look at it unless you write your own or set up the official one, which is, well, complicated.
Two related questions:
There have certainly been a number of attempts at making “Rails but for Go.” See https://gobuffalo.io for example. I think it’s not really a market demand, so it hasn’t taken off the way Sinatra/Flask-style API routers for Go have.
Because the old server side “webapp” with a sprinkle of JavaScript doesn’t make too much sense in the age where people expect GUI style interactivity in their browsers. Rails os from the age of clicking from document to document and add some forms for data input.
It survived because it has inertia, and because it is still a useful way of building web applications, but it’s not what the market wants.
I for one, wouldn’t mind a simpler web, retaining more of the old click and refresh paradigm. This very website is a good example. But that’s me. The market has spoken loud and clear the other direction.
Imo, microframeworks are easier to write so there are more of them by quantity. Fuller frameworks take many years so there are only a few of them (if any) in each language. I like to think that web apps or frameworks are near the tippy-top of complexity and high-level-ness. I like to imagine that a full framework that could tackle many of the common problems you will run into will take 5-7 years to make and refine under real developer use. Most people won’t go that far. So you end up with http routers, I wrote why I think this isn’t enough.
If you want to go back, in a way, Remix is sort of a throw-back. I wouldn’t say Remix is an auto-win framework to pick but I think it’s pretty interesting. Remix has a kind of actions/submits going to the same thing/file/route you are currently on, it stays close to web standards and you can get end-to-end type safety like trpc if you use typedjson. Remix is not a click to refresh type of experience though.
Go’s web story was really bizarre to me when I was still spending time in that language. Rust has Axum and even Typescript has something like Hono. API based microservice frameworks can (should?) be small and tight.
I like that Google Cloud is easy to use, and has some nice modern features like OIDC authentication that are just newer than AWS’ method. Some very common things on AWS require writing custom Lambda functions which is error-prone and annoying, and GCP has that stuff built in. The Kubernetes support is top-notch, and the observability tooling is really good. It’s also substantially cheaper than AWS for most things.
On the other hand, Google Cloud is REALLY rough around the edges. For example Cloud SQL will lose connections during its maintenance window. On AWS, VPC’s are a first-class networking concept – on Google they support the bare minimum of functionality. When you get into more esoteric stuff like key management, Google has almost no support or documentation, but AWS has a rich set of functionality.
If you’re doing a startup, Google Cloud is a decent choice because it’s cheaper and you can work around the issues as you build. But for an established company I’d almost always recommend AWS.
I tried a number of textbook math and physics problems today, and o1 got most of them right. Granted this is not very scientific, because it could have been exposed to those exact problems in training data, but other LLMs including Claude Sonnet 3.5 and GPT 4o could not solve them correctly. We’re in for some interesting times.
I don’t think I learned from this what stacking is or how to do it, just the claimed benefits.
I think the idea is you send out one PR, and then do a new PR that builds on the first (rather than waiting). Sounds like merge hell though if the first needs changes.
The merge hell is what the stacking tools are supposed to help with.
In my experience with just using
git rebase --update-refs, if a PR lower in your stack needs changes you can amend those commits and then do a rebase against main or whatever branch is below you in the stack and it’ll also propagate the rebase in the upwards direction.You know how you build one commit on top of the other, and it’s nice to be able to rebase commits around, or reorder them, squash them, etc?
PR stacking is basically the same but with PRs (i.e. git branches) rather than commits as the atomic unit of change, because GitHub and other mainstream code forges kind of force you to operate at the granularity of a PR (the whole PR review UI, in particular, is PR-oriented, not commit-oriented).
PR stacking needs tooling on top of git to work nicely. Graphite is the only such tool that I have experience with and it does it ok.
GitHub+PR stacking makes for a better developer experience than vanilla GitHub, but we are still fundamentally working around the fact that the patch review UX of vanilla GitHub is pretty bad, as often discussed on this site (and other similar code forges have the same issues, e.g. GitLab is the same in this respect)
I see it mentioned often on this site, but I’ve never really understood why. The only major issue that I see is not being able to comment on commit messages, but there’s workarounds for that that are simple enough, like squashing and merging very PR such that at the level of the trunk every PR becomes a single commit, and therefore you can discuss the commit message at the level of the PR.
I’ve also rarely seen a critique of a comment message that didn’t border on pedantic, but that’s neither here nor there.
Check out this gist, which is the best writing I know on the internet about this subject: https://gist.github.com/thoughtpolice/9c45287550a56b2047c6311fbadebed2
Thank you! So if I understand this correctly, the improvement the author would like would be for GitHub to make it easier to see the whole stack at a time while being able to drill into each individual change and its versions, which is what Graphite and the rest make possible. That’s a fair critique, and I guess working in orgs where we’ve mostly moved fast enough for stacks to not get too large has made it easier for me to not run into these issues.
Plus day-to-day stuff like iterating on iterations of the same PR without losing track of the conversations or losing the context of the conversation; or easily seeing the diff since your last review; or setting the state of a review to “just commenting”, “requesting changes before this can be merged” and “approved, but see comments”
Reading this was kind of depressing, because it makes Canonical hiring sound primarily like a charade based on how well you write and how well you can charm the reader. Like cool, but maybe you should be asking better questions that aren’t engineered to alienate people that don’t know the word games you’re expecting them to play.
I have not heard of a single good experience about Canonical’s hiring/interview process, but I have heard of plenty of bad ones. I feel like, if you can anticipate and identify these issues (e.g. “useless self-deprecation”, nervousness, lack of stories), as an recruiter or interviewer, part of your job is to help defuse them so you can assess the candidate better.
Some might say it’s on the interviewee, but the whole hiring process is so synthetic and unusual by comparison to what it’s like being on the job that I feel a company is doing themselves a huge disservice to leave it at that — you might miss out on the perfect candidate because they didn’t study some engineering director’s twelve rules for job applications.
Unless what they’re looking for specifically is candidates who will religiously study the edicts of every director and apply them right away :-). Some organisations thrive on autonomy, initiative and a profession-oriented culture, some thrive on orthodoxy, hierarchy, and a company-oriented culture.
Don’t get me wrong, I would hate to work for the latter kind. Things like asking engineering professionals about high school grades are a huge red flag for me; I wouldn’t work for a company like that even if it were the last engineering job in the world, I’d go do something else instead. But there’s more than one way to skin a cat (and this particularly gory variant of the idiom is in fact the appropriate one in this case, I think :-) ).
I didn’t intend this to be a guide for Canonical (I don’t think it even mentions Canonical by name) but for a role anywhere. I think the rules apply generally.
It’s particularly aimed at people who don’t have the advantage of networks of good advice to help them navigate the way that hiring works in software companies, or don’t understand what the expectations are.
I’ve had multiple people at Canonical tell me that their application process is the most selective of any tech company. A lot of people want to work there, and they don’t have a lot of openings, so I guess they can do that. Most companies are selective but not THAT selective.
Let’s assume this is happening. What could the source be?
Just some thoughts. If I were trying to track this down, I might start reverse engineering TV-related stuff.
Funny how they all suddenly “remove the Partnership” or “investigate” it. Why does an ad-partner even have the right to access the microphone?
Unless something has drastically changed since I worked at FB a few years ago, they don’t.
I think they removed the partnership because their partner was lying about what their technology could do in order to scam that partner’s customers.
Could be, though we’ve seen the clipboard-sniffing already, so who knows.
See comment here: https://lobste.rs/s/mf7guc/leak_facebook_partner_brags_about#c_jxayna
I am firmly convinced that the story here is CMG media lying to their potential clients (in a high touch sales process), not that CMG media blew the lid on a multi-year conspiracy.
This really matters to me. If companies are genuinely doing this it shouldn’t be a “who knows?” situation, it should be a national/international scandal with legislative consequences.
Yeah, the more I read about it, the less convinced I am that it’s actually true.
But it’s definitely good to keep an eye on this. We’ve had the Samsung patent about saying the brand name out loud in front of the TV to skip the ad break.
Yeah, Samsung are so bad around this stuff. That’s part of the problem: it’s hard to argue companies aren’t doing creepy unscrupulous things when there are so many companies out there blatantly doing creepy unscrupulous things.
This is why I care so much about accuracy: we need to know exactly who is doing what in order to effectively campaign for them to stop.
It doesn’t pass the “my smartphone isn’t constantly at 50°C and the battery hasn’t died within an hour” smell test. Checking some text is absolutely trivial in comparison.
It’s unlikely they can access the mic through the FB app itself. Maybe there were some weird workarounds, but I’d be surprised if you could upload any custom code to the FB app - it’s too tightly controlled.
Until now it was dismissed as a conspiracy theory that everyday conversations are converted into personalized advertising. It was unthinkable that Mark Zuckerberg would sell your data until the Facebook–Cambridge Analytica data scandal, even though he made his approach to personal data quite clear much earlier (‘People just submitted it. I don’t know why. They “trust me”. Dumb fucks’). It was unthinkable that the good guys sporting a slogan saying “Don’t be evil” would become the evil.
One may be looking at this from an everyday developer’s diluted perspective of restricted ABI/API access, which may (my assumption) be far less restricted for certain so-called “partners” (other big tech, advertisers, insurance, other entities interested in purchasing personal data).
At this point, there are no conspiracy theories around personal data farming. For me it’s pure distrust towards entities which lose one privacy-related lawsuit after another and appear to have the costs of personal-data-abuse lawsuits simply written into their annual budgets as a “cost of doing business”.
Facebook exploited bugs in Android to read all of your text messages (without permission) and do targeted advertising based on that. They also did an MITM attack using a VPN product that they bought to read all your Snapchat messages (although that likely affected only a small number of people). I’d believe they’d do almost anything to get your private data. Fortunately they’re under a lot of scrutiny now so it’s harder for them.
I find SSH CAs and TPMs/trusted enclaves super fascinating, so I love this. I wish there were a fully featured open source SSH CA one could use that also handled user/sudo provisioning.
smallstep can create SSH keys, but it’s not really intended for this.
I’m not quite sure why you would want provisioning though? With ssh-tpm-ca-authority you can have the sudo delegation as part of the config. The user you are requesting doesn’t need to be your user; it could be root or some sudo-able user.

Of course this solution is not fully featured and has quite a road until it would be something you can actually deploy, but as a basic solution for this problem I think it works?
I guess I don’t really need provisioning; it was more that my understanding was that there’s no out-of-the-box PAM module that will take certificate attributes and use them for user/sudo privileges. I will say that true user provisioning is better for some places, as you can easily tie audit logs to specific users.
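For context, a minimal sketch of the plain-OpenSSH pieces this all sits on top of (the CA name, principals and paths are made up; turning a principal into sudo rights is exactly the missing part):

    # CA side: sign a user key, granting principals and a short lifetime
    ssh-keygen -s user_ca -I alice@example.com -n alice,ops-sudo -V +12h id_ed25519.pub

    # server side (/etc/ssh/sshd_config):
    #   TrustedUserCAKeys /etc/ssh/user_ca.pub
    #   AuthorizedPrincipalsFile /etc/ssh/principals/%u
    # /etc/ssh/principals/root lists the principals allowed to log in as root; mapping
    # something like "ops-sudo" onto sudoers is the part with no off-the-shelf PAM module.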
This is a very good post. It is difficult to justify contributing to OSS when, if your project is successful, it will just be taken wholesale by a commercial vendor, especially when the employees of those vendors make stratospheric salaries but the vendors generally won’t hire people with that open source background. This applies to both hobbyist projects and commercial open source projects.
That’s odd - why wouldn’t they hire the most experienced and most influential people on the project they’re using? There are many reasons why they would do that.
And in fact, there have been plenty of cases where a core developer got hired precisely due to their work on an essential piece of technology. Or they just started a business themselves and hired a big chunk of the rest of the core team.
What a contrast in technology levels. They could launch and land a 100-ton reusable crewed orbiter, almost fully automatically, a capability we can’t get back to today. But printing out text messages was a huge problem.
There are companies that build airgapped platforms (usually for governments or government contractors). They have everything you’re used to, with YUM, Docker, Kubernetes, NPM, pip, etc, but all maintained with no internet access. And they are very, very expensive.
Crowdstrike is primarily an antivirus product. I think it’s telling that we don’t call it “antivirus” because Norton and McAfee gave the entire field of antivirus a terrible reputation, since they crashed all the time and demanded payment frequently. But basically it does the same thing: monitor all the processes on your machine and make sure none have the traits of known malware.
It is generally considered a very good product, and many security consultancies and even insurance companies recommend it be installed on all machines. It is pretty silent so it could be on your work computer and you wouldn’t know unless you looked for it.
It’s available for Linux too, and is often baked into companies’ server and cloud images. Fortunately the Linux version didn’t crash, since that would have been even more damaging. There is also a Mac version, which didn’t crash either.
Also notable that if a Windows cloud instance BSODs, there is usually no way to boot it in safe mode and remove the offending component. So many Windows cloud shops are just stuck rebooting over and over, hoping it will recover itself.
I yell a lot about the software crisis, but this is a symptom of it.
The trigger on this update was pulled, presumably without testing. With the effects being as widespread as they are, I doubt this was a “works on my machine” situation.
With the pervasive devaluation of programming as a trade by managerial forces, and the progressive migration to fungible workers, I doubt this will be the last time this happens on this scale.
If you’ve ever done any driver programming, especially Windows drivers, you will know that it is difficult to test the full matrix of every possible version / configuration / combination of other drivers on the machine.
I would expect driver developers working on something as widespread as CrowdStrike to be well versed in the nuances… but when I worked at Microsoft, even the full-time kernel team folks would make mistakes that would cause major issues internally for people self-hosting Windows. I maintained a relatively small boot driver that ran on Windows 8 through Windows 11 (client and server), and the matrix of possible interactions your code could have was overwhelming to test at times (https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/managing-hardware-priorities).
I’m not making excuses for them, but I imagine this is not something you can chalk up to just pure negligence.
Oh, I guarantee you the things that get shipped are difficult to test. That’s a given.
This isn’t on the engineers. This is on management. CrowdStrike has been undergoing rolling layoffs and I guarantee you some of the folks that were gutted were QA-related.
It’s unfortunately a common tale. Cut staff, cut morale, micromanage, construct a meat grinder, and things like this will happen.
Bugs happen. Rolling them out to your entire deployment this fast is negligence.
I agree, but consider that it is pretty hard to test antivirus updates since they are extremely time-sensitive. They typically try to respond to the latest threats within a few days. There is no real possibility of weeks of testing, or staged rollouts, or other practices companies normally use.
I mean, you can absolutely stage rollouts; even if it takes 24 hours, you’d still probably have quite a bit less damage.
Fully agreed. My bet is on management/staff cuts rather than engineering incompetence.
Heartbreaking to find out how many embedded systems in the world run Windows (and then with a security snake-oil rootkit on top) when they really have no reason to.
Why is that heartbreaking? What would be a non-heartbreaking OS choice?
None? Or one of the minimal embedded OSes? I get where the convenience comes from, but when a fully dedicated terminal like an ATM or checkout machine boots Windows, I just have to question it - 99% of the deployed code there does nothing but expand the vulnerable surface and wait for an opportunity to fail.
Not true. A Windows environment helps them debug and speeds up development, even in deployment. It’s unreasonable to disregard the amount of engineering time saved because of a black swan event like today’s.
Speed up development: you can develop a Qt app mostly on Windows even if you run it under QNX on the target.
Debug: you do lose a full interactive user environment, but for proper testing you needed to run on the target device anyway. In my experience, once you have a debug interface working, it’s not much harder. On the other hand the number of moving pieces drops significantly - if it’s not in your code, it doesn’t happen.
But I’m serious about the inactive code. How many user-specific things and random services are there, even on a cut-down, embedded version of Windows? Why do they need to exist there at all?
This is tangential, but a development environment that doesn’t mirror your deployment environment is a recipe for disaster.
Yes but my point is that this is cherry picking. An equally fair and rational assessment could be “that code is never activated or exposed and we can afford having it there so it doesn’t matter,” and that would be good engineering. In this specific example, the downtime was not due to Windows but rather extra third-party components chosen by the developers.
Development environment not matching production is a given. It’s just a question of how different and how much testing we can do. I can’t get a graviton CPU that AWS uses in production on my machine for example. But we deploy to a test environment and cope. I’m not developing on ATmega chips either, because it’s impossible. It’s ok.
The dead code matters for security though: the multi-user mode, the kernel parsers for unused formats, the various RPC daemons, the ability to install drivers - that’s why CrowdStrike was installed there in the first place. If you get rid of the ability to install anything (including known malware) and run from a read-only device, you don’t need to worry about this protection mechanism in the first place. You can get rid of the whole vulnerability class.
Why does it have to be? I previously did ATM OS engineering for a bank that rolled their own (uncommon versus just paying Diebold or whoever). The devs had their own ATM lab so they didn’t face this problem.
I think we have a naming issue. I totally agree with you that devs getting the target platform is normal and perfectly fine. I was responding to the previous poster about the dev environment not mirroring target as in “you don’t have to run QNX to develop for QNX”. Both remote debugging over JTAG/whatever and testing parts of the system on a different architecture is fine.
Do you have any data on this? I’m fairly sure not just money but also lives were lost during this outage, as hospitals and emergency services were impacted by it. It goes completely against my own personal experience, and also against Microsoft’s own messaging, which pitches WSL as a feature aimed at developers. You wouldn’t condemn people to death without hard data on the quantifiable comforts their deaths bring you, yes?
No I don’t have data on whether or not a familiar OS environment speeds up development as a base platform upon which to build.
There is a lot wrong with this “Windows is killing people!” argument.
Actually, if Windows didn’t crash when its drivers segfaulted (microkernel designs solve this), or even if the design of Windows didn’t make IT departments believe it was necessary to purchase and install so much additional IT infrastructure beyond what the base system provides, this problem would’ve been avoided completely. So this is in fact also a problem with Windows in high-reliability environments.
Garbage in, garbage out. You can trade crashing in for a different erroneous behavior, but you can’t avoid the erroneous behavior altogether. There are ways to build high-reliability systems with Windows, despite the existence of BSODs. These systems just weren’t designed to be highly reliable; that’s not the fault of the operating system.
A product is either made to achieve high reliability or is modified to achieve high reliability. The latter does not constitute a highly reliable product but a potentially reliable one. This is my point, which you agree with in other words.
I believe your original point was that using a microkernel would have avoided this issue because the BSOD would instead result in a process crash.
My counterpoint is that whether the error manifested as a BSOD or a process crash, you still need to design your software to recover from a crash. I.e. microkernels don’t give you high reliability for free; you still need to handle the failure state. E.g. your microkernel process could end up crash-looping, effectively locking up the system in the same way as a BSOD.
The critical reason for the hard failure isn’t that it manifested as a BSOD; it’s that the software wasn’t designed to handle crashes. That isn’t the role of the operating system, it’s the role of the application software. So no, Windows is not to blame here.
And just FYI BSODs can be recovered from via a watchdog timer.
I just said what my point is; you don’t have to believe it.
Actually, I bet this is correct. But it’s correct because of its pervasiveness. This is like looking at how many people die in a country but not normalizing it to population; the most populous countries will invariably have the most deaths. Epidemiology 101 would demand a rate here, not an absolute count.
A rate was implied, otherwise the point is clearly obtuse. It’s common knowledge that windows is one of the most deployed platforms.
I’m pretty sure almost any lifestyle comfortable enough to give one a personal computer and free time to spend on Lobste.rs kills people (e.g. through contributing to greenhouse gas emissions that contribute to storms, floods, droughts, and famines; or through the mining of “conflict minerals” (even if the latter one is changing, and I’m not sure it is, we at least used to kill people through it for our technological conveniences)). I doubt most of us have hard data on the trade-off.
I’d guess it’s absurdly difficult to develop ATM software and then distribute it to bare metal or even Linux. Like have you seen the state of packaged software on Linux even recently?
QNX.
Anything where you have some unified control over the update mechanism and can do whole system A/B updates with boot assessment.
I’m not sure what boot assessment is but Windows gives you control over the update mechanism and nothing is stopping you from doing A/B testing.
Automated boot assessment means that if a system is updated and put into an unbootable state, even if it hard-freezes, a watchdog timer will reset it, and the bootloader will revert back to the last successfully booted whole-OS image.
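On the Linux side, systemd-boot’s automatic boot assessment is one concrete implementation of this (the entry name below is made up):

    # boot entries carry a tries-left counter in the file name:
    #   /boot/loader/entries/myos-42+3.conf    -> 3 attempts remaining
    # every failed attempt decrements the counter; at zero the boot loader falls back
    # to the previous known-good entry. Once the new image reaches boot-complete.target,
    # it gets marked good (normally by systemd-bless-boot.service):
    systemd-bless-boot good
    systemd-bless-boot status    # inspect the state of the current entry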
Windows does not allow you to do this. Every piece of software you run on it has its own ad hoc update mechanism. You are completely misunderstanding what “A/B” means here; it has nothing to do with A/B testing in user studies.
I see. I assume that generally developers would not be updating the bootloader / early-kernel level components of the system. This is all something that would be done in user-space after you’ve been passed control and Windows does allow you to build in this functionality.
Windows doesn’t have it built in, but I don’t think it prevents you from implementing it either. The watchdog can be enabled, the default boot entry can be changed from the booted system, and even an A/B/scratch split can be implemented with a minimal addition to the bootloader.
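A rough sketch of such a hand-rolled scheme with stock Windows tooling (GUID placeholders only; this is not a built-in feature, and the “passed its health check” step is up to you):

    rem create a second boot entry pointing at the B image
    bcdedit /copy {current} /d "OS image B"
    rem boot the B entry exactly once; if it crashes and the machine resets,
    rem the next boot falls back to the existing default (the A image)
    bcdedit /bootsequence {guid-of-B}
    shutdown /r /t 0
    rem only after the B image has booted and passed its health check:
    bcdedit /default {guid-of-B}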
I learned about Windows To Go today, and perhaps this could be used in conjunction with dual image storage to swap between two versions.
the obvious one
Crowdstrike is also commonly used on Linux machines… although in this case, the Linux version did not crash.
Yeah, it’s a CrowdStrike quality issue.
I read that a Linux sysadmin was facing a similar issue recently because CrowdStrike was pushing a package incompatible with the distro they were running.
Does it run with as elevated privileges on Linux as it does on Windows?
It has two different modes of operation on Linux depending on whether they have a build that matches your kernel. In one mode, it installs a kernel module, which is pretty much exactly as root-kitty as the Windows flavor. In the other, it runs as root and installs a lot of eBPF hooks to observe system behavior, which is slightly less intrusive and slightly less likely to take everything down.
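To give a feel for what that second mode is doing conceptually, here’s a generic bpftrace one-liner (not anything from the actual product) that logs every process execution, which is the kind of telemetry such agents collect via eBPF:

    # print the parent command and the binary it executes, for every execve on the box
    bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(args->filename)); }'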