Or you can, you know, use language the way it is intended to be used. What is the value of starting a comment with suggestion: rather than “This is just a suggestion, but ..”?
Metadata aggregation might’ve been an argument before, but isn’t anymore now that we’ve entered a time when software can categorize and understand context almost as well as another human can.
I’ll never cease to be surprised by the desire some programmers have to formalize and constrain the interactions they have with their coworkers.
we’ve entered a time when software can categorize and understand context almost as well as another human can.
I don’t think “software misunderstands no worse than humans” is a great sell. I’ll take completely reliable interpretation over that, for both humans and machines.
I’ll never cease to be surprised by the desire some programmers have to formalize and constrain the interactions they have with their coworkers.
And I’ll never cease to be surprised by the desire of some people to go through endless rounds of clarifications because they refuse to be clear and precise. High context cultures are an impediment to science and engineering, and a major safety problem.
High context cultures are an impediment to science and engineering, and a major safety problem.
Hyperbole much?
Maybe this rubbed me the wrong way precisely because I’m from one of those so-called high context cultures (Argentinian), but I mean, come on, wouldn’t that statement sound awful if you replaced “high context cultures” with “Latin American and Indian cultures”? (I know there are other cultures considered high context too.)
Thing is, yes, we love having high context, having internal jokes, saying a lot without saying much. And we do find it stupid to make all the most obvious things explicit, even infantilizing. And no, I don’t think this is any impediment to our work. I cannot speak for science, but I feel pretty confident about this regarding software engineering. I’ve worked with people from all around the world, and I’ve seen Argentinian coworkers produce code of higher or equal quality to that of overpaid developers from FAANG companies in the US. There are good and bad developers everywhere.
And regarding conventional comments in particular, I suggested using them at my previous company, and even tried using them myself for a couple of PRs. It didn’t go well. People unanimously preferred just saying things in normal language (myself included). It’s quite easy to communicate when something is just a suggestion vs. when something should be changed before merging, and it feels very forced and unnatural to say that with a strict syntax. If we can’t communicate that effectively with natural language, we have a bigger communication problem in general. My teammates were mostly Min/Eastern Europeans BTW, and they were totally against this convention.
but isn’t anymore now that we’ve entered a time when software can categorize and understand context almost as well as another human can.
A simple formalized grammar can be parsed by a simple and transparent program/script that is cheap to write and cheap to run (CPU, RAM, bandwidth…). Compare that to an AI/LLM: it often runs in the cloud (on other people’s computers), consumes tremendous amounts of CPU/GPU/RAM, and is not transparent at all.
AI is useful for some tasks and sometimes can do a job that would be impossible (too expensive) to do without it. But intentionally generating garbage instead of structured, well-defined data just because “AI can process garbage” is not a good idea (at least in its current state: resource consumption, lack of transparency, centralization, monopoly/oligopoly…).
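To make the cost point concrete, here is a minimal sketch of what that “simple and transparent script” could look like. The label set and sample comments are made up for illustration; real input would come from your code review tool’s API.

import re
from collections import Counter

# Hypothetical review comments; real ones would come from the forge's API.
comments = [
    "suggestion: rename this to parse_header for clarity",
    "nitpick (non-blocking): trailing whitespace",
    "issue: this leaks the file handle on the error path",
    "praise: nice test coverage",
]

# "label (decorations): subject", per the conventional-comments format.
pattern = re.compile(r"^(?P<label>\w+)(?:\s*\([^)]*\))?:\s*(?P<subject>.+)$")

counts = Counter()
for comment in comments:
    match = pattern.match(comment)
    if match:
        counts[match.group("label")] += 1

print(counts)  # e.g. Counter({'suggestion': 1, 'nitpick': 1, 'issue': 1, 'praise': 1})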
desire some programmers have to formalize and constrain the interactions they have with their coworkers.
I agree that in this case the formalization is a bit unnecessary. Of course, I also have a script that parses and summarizes TODO/FIXME comments, but… when something is really important, it should be managed in a bug/requirement/change tracking system, not just scattered in some comments (structured or unstructured, it does not matter).
Why are you looking at this as “generate structured data” or “generate garbage”? You’ve very quickly assumed the only alternative is garbage, while michiel is suggesting that writing your comments in a way that a human can disambiguate what you mean is just as likely to be automatically classified by an LLM.
I agree with your cost analysis, but you lost me after that.
What is the value of starting a comment with suggestion: rather than “This is just a suggestion, but ..”?
Less friction on the commenter. That alone would be worth it for me.
I think it may also help reduce the amount of effort a newcomer spends trying not to come across as too aggressive before they can properly integrate into the project culture. And that’s assuming they’ll be interacting on a regular basis in the first place. For me personally, each time I interact with a new community or project tends to be fraught with an awkward and inordinate amount of time spent on phrasing.
I came across the site a while ago and, funny enough, machine parsability was the one thing I didn’t remember about the idea. It was the soft benefits alone that resonated with me.
What is the value of starting a comment with suggestion: rather than “This is just a suggestion, but ..”?
If I’m looking at a PR with sixty comments, being able to quickly scan them by prefix match is nice. This kind of consistency lets me look for a particular comment (or skip over comments) without really engaging the language center of my brain. That means I am less likely to lose (or corrupt) the context I have in my head.
Metadata aggregation might’ve been an argument before, but isn’t anymore now that we’ve entered a time when software can categorize and understand context almost as well as another human can.
This does not reflect my experience of software, and certainly not with the degree of confidence I can get from fixed tokens.
The biggest drawback of all [of identity being a key-pair] is that securely and resiliently maintaining custody of a cryptographic key over a long period of time is hard. If you want your service to have truly mainstream adoption, you can’t expect end-users to be able (or willing) to do so.
This is indeed a problem. But I’m not satisfied with the alternatives; in the case of DIDs you either assume every user owns a DNS record (nope) or fall back to a centralized system like Bluesky’s.
In the system I’m (interminably) developing, the private key is held in the most secure storage available, like the Mac/iOS Keychain. The drawback is that an identity is tied to a physical device, so if you use multiple devices you’re a different user on each one. It’s ok for now but I’m not happy with it.
The best solution I see is to treat a person’s identity as an aggregate of device identities. Under the hood, each device identity cross-certifies the others (requiring some sort of pairing UX when setting up a new device). At the UI layer, all identities belonging to the same user are displayed as that user.
The remaining problem is losing control of an active device, like if your phone gets stolen and you can’t remotely wipe it. In that case your other devices would post revocations of the lost ID. (What happens if you had only two devices, and you and the attacker each revoke the other’s, is left as an exercise for the reader.)
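For what it’s worth, the cross-certification step itself is small. Here is a rough sketch using Ed25519 via the pyca/cryptography package; the device names are illustrative, and a real system also needs revocation, expiry, and the pairing UX mentioned above.

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# Each device holds its own key pair (in practice inside a Secure Element / Keychain).
phone = Ed25519PrivateKey.generate()
laptop = Ed25519PrivateKey.generate()

def raw(public_key):
    return public_key.public_bytes(Encoding.Raw, PublicFormat.Raw)

# Pairing step: each device signs the other's public key.
phone_endorses_laptop = phone.sign(raw(laptop.public_key()))
laptop_endorses_phone = laptop.sign(raw(phone.public_key()))

# A verifier that already trusts the phone can accept the laptop as the same user:
phone.public_key().verify(phone_endorses_laptop, raw(laptop.public_key()))  # raises if invalid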
In the system I’m (interminably) developing, the private key is held in the most secure storage available, like the Mac/iOS Keychain. The drawback is that an identity is tied to a physical device, so if you use multiple devices you’re a different user on each one. It’s ok for now but I’m not happy with it.
The Passkey mechanism is intended to somewhat address this. The keys stored in one device’s Secure Element can be encrypted with a public key of another Secure Element. This makes iCloud a somewhat scary single point of failure, because (as I understand the totally undocumented protocol) the key exchange happens via iCloud and iCloud attests to the target system’s key being valid. At least it depends on user action (though someone who can compromise iCloud could probably push out a malicious OS update that removed that requirement).
The best solution I see is to treat a person’s identity as an aggregate of device identities
This was the model for the Pico project from the University of Cambridge. Their idea was to have large numbers of ‘pico siblings’ that would, between them, provide an attestation of identity. If you lost a few of your devices, that wouldn’t be enough for an attacker to impersonate you. You might put some more-privileged ones in a safe somewhere so someone who can open the safe and provide the device with the relevant biometrics would be able to invalidate all other devices. Their goal was to make these things cheap so that they could be embedded in things like earrings, watches, and so on, so you’d typically carry half a dozen of them around with you and that would be sufficient for most systems.
All of these feel somewhat unsatisfactory. I’ve never found a system that seems like it works in both the case that someone breaks into your house and steals most of your devices and in the case that your house burns down with all of your devices inside. Either you allow the attacker to impersonate you in the first case, or you lose the ability to become you in the latter case.
I thought Passkeys were an open protocol (FIDO?) Or is the mechanism to propagate them between devices an Apple extension? I was just assuming the passkeys’ private keys lived in the Keychain, which is shareable between devices and E2E encrypted.
(I could share my protocol’s private keys via the Keychain, but that’s platform-specific. And also the protocol is based on an append-only signed log, so multiple devices would create conflicts…)
I’ve never found a system that seems like it works in both the case that…
Those are both very bad/rare situations; maybe it’s ok if a system can’t survive every disaster. Or if disaster resilience requires you take extra measures like putting backup keys in a bank safety-deposit box.
Or is the mechanism to propagate them between devices an Apple extension?
That’s the proprietary bit.
I was just assuming the passkeys’ private keys lived in the Keychain, which is shareable between devices and E2E encrypted.
Yes, but the mechanism for sharing is complicated. You don’t want to allow arbitrary software to dump the contents of the keychain, but for passwords you need to, because the password is often the thing that you send to the server. For passkeys, you perform a computation that proves that you hold the key (for example, sign a nonce with the private key, which the server can then verify with the corresponding public key). Much of the increase in security comes from the fact that these things never leave trusted hardware. On Windows, they’re typically stored in the TPM; in the Apple ecosystem they’re stored in Apple’s Secure Element (this might be one key per server, but often it’s just one private key in the hardware and a key-derivation function that derives a key from it and something OS-provided, and then allows using the derived key for signing).
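Stripped down to its essence, the sign-a-nonce step looks roughly like this sketch (again using the pyca/cryptography package; in a real authenticator the private key never leaves the secure hardware):

import os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

device_key = Ed25519PrivateKey.generate()          # lives inside the SE/TPM in real life
registered_public_key = device_key.public_key()    # what the server stored at enrollment

challenge = os.urandom(32)                         # server-chosen nonce
signature = device_key.sign(challenge)             # computed by the authenticator

registered_public_key.verify(signature, challenge) # raises InvalidSignature if it doesn't check out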
You really don’t want to allow a generic API to extract these keys, because an attacker could use it to dump things. The normal workaround for this (and I’m completely guessing at what Apple does) is to use a key-wrapping protocol. You do some communication between two HSMs that allows them to exchange a key, and then you encrypt the key with that negotiated key. For this to be safe, you need some kind of attestation that the target HSM is one that the source should trust. I believe Apple does this via iCloud, with iCloud providing that guarantee. Or possibly part of that guarantee (I’d probably do it with something baked into the hardware and a revocation list in iCloud, so iCloud could say ‘don’t trust this device, I have reason to believe it’s compromised’ but not ‘trust this device, I promise it’s not a SE-emulator designed to steal all of your credentials, honest!’).
Those are both very bad/rare situations; maybe it’s ok if a system can’t survive every disaster. Or if disaster resilience requires you take extra measures like putting backup keys in a bank safety-deposit box.
The problem with that as a backup is that now the bank needs some mechanism for establishing my identity. Which they often do with login to some computer system these days. And I can’t do that if my keys are lost. I’m more inclined to think in the direction of Shamir Secret Sharing and share a recovery key between several friends and family members so that a sufficient number of them colluding can recover my key, and set that number to be large enough that the ones that know each other can’t do it. With a sufficiently dysfunctional family, you get an extra layer of protection if you share with the people that have personal reasons not to collude under any circumstances.
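In case the Shamir idea is unfamiliar, here is a toy sketch of it. This is not production crypto: a real deployment should use a vetted implementation and worry about how the shares are stored and authenticated.

import secrets

PRIME = 2**521 - 1  # Mersenne prime comfortably larger than a 256-bit key

def split(secret, n_shares, threshold):
    # Random polynomial f(x) of degree threshold-1 with f(0) = secret, over GF(PRIME).
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n_shares + 1)]

def recover(points):
    # Lagrange interpolation at x = 0 recovers f(0), i.e. the secret.
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * -xm % PRIME
                den = den * (xj - xm) % PRIME
        total = (total + yj * num * pow(den, -1, PRIME)) % PRIME
    return total

key = secrets.randbelow(2**256)               # the recovery key
shares = split(key, n_shares=5, threshold=3)  # one share per friend/family member
assert recover(shares[:3]) == key             # any three of them can reconstruct it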
Safe deposit boxes (at least here in Sweden) are on their way out. They used to be an attempt to monetize the area of the vault, which is needed way less nowadays as less and less physical cash is stored in a bank branch. Hence, safe deposit boxes are less and less prevalent, and they are pretty expensive.
It’s economically infeasible to demand that each and every bank customer have a safe deposit box just to enable a backup identity token.
so iCloud could say ‘don’t trust this device, I have reason to believe it’s compromised’ but not ‘trust this device, I promise it’s not a SE-emulator designed to steal all of your credentials, honest!’).
If memory serves correctly, I’ve been asked for the user password of an existing device when provisioning a new device.
I’m inclined to agree that the author’s proposed changes could be small improvements, but:
My biggest problem with writing on my Android phone is how such phones have become tall and skinny nowadays. I used a 2014 phone until the early 2020s, and its slightly more squarish aspect ratio gave it slightly wider keys that I found easy to use. After finally switching to a modern phone that’s slightly narrower and so has slightly narrower keys, I find I press the wrong keys significantly more often, which slows me down as I either slow down in typing to avoid typos or stop to fix the typos I made. (If I rotate the phone into landscape mode, then, because the phone is tall (now wide), I find the keys become impractically wide, and, because the phone is skinny (now short), I find the area of text I can see impractically short.)
The narrowness of phones also exacerbates the difficulty of text selection that the author laments, as it makes the letters smaller much like it makes the keys smaller (although this can be changed by raising the default font size, at the cost of seeing less text at once, unlike the width of keys, which would require a fundamentally changed keyboard layout to be unconstrained by the width of the display).
Second after that problem, I would put the lack of undo/redo as a globally available function of the keyboard (although, e.g., Google Docs implements its own undo/redo).
Hey, I’ve just started learning GPU programming too. If you want, I’d love to share resources as we learn. Currently, I’m focused on Metal, though I think everything except shared memory should be applicable.
Here are some resources that helped me understand how modern GPU APIs are architected:
or as some people say, how “PostgreSQL used fsync incorrectly for 20 years”)
I don’t think there is a correct usage of fsync. AFAIK, Linux still marks unflushed dirty pages as clean. Crashing the application isn’t enough; you need to purge the page cache, maybe going as far as rebooting the entire OS.
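That is essentially the lesson from the PostgreSQL incident: if fsync fails you can’t just retry it, because the pages it failed to write may already be marked clean. A rough sketch of the defensive pattern, assuming you still hold the data somewhere else (e.g. in a WAL):

import os

def durable_write(path, data):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)          # sketch: a real version also handles short writes
        try:
            os.fsync(fd)
        except OSError:
            # The dirty pages may now be marked clean, so a second fsync can
            # "succeed" without the data ever reaching disk. Treat this as data
            # loss: re-write from a copy you still hold, or crash and recover.
            raise
    finally:
        os.close(fd)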
It’s platform specific. I believe illumos systems have a correct implementation of fsync, for example. We inherited it from Solaris all those years ago. I expect at least FreeBSD does as well.
This article singlehandedly convinced me that time zones are a good idea and solve a real human problem. I am not in favor of abolishing time zones.
I do think that not having daylight saving time would be a good idea, although not because it would make computer timekeeping meaningfully simpler - once you have the concept of time zones, you need something like the tz database to keep track of when the administrative boundaries of time zones move, and adding DST rules to that is not a huge amount of additional complexity. The reason I care about DST is mostly because an extra hour of sleep in November doesn’t make up for the lost hour in March.
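A small illustration of why the tz database has to carry those DST rules, using Python’s zoneinfo (dates chosen around the 2023 US spring-forward):

from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")   # offsets and DST rules come from the tz database

before = datetime(2023, 3, 11, 12, 0, tzinfo=eastern)  # the day before spring-forward
after = before + timedelta(days=1)

print(before.utcoffset())  # -1 day, 19:00:00  (UTC-5, EST)
print(after.utcoffset())   # -1 day, 20:00:00  (UTC-4, EDT)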
I also think that it might be a good idea to split the continental US into two time zones, the boundary following the current Mountain/Central time zone boundary, where the clock times of the zones differ by two hours. This would reduce the effective time difference between the east and west coasts, without significantly changing the mapping of the position of the sun to clock time compared to the status quo (at least not any more than daylight savings currently does).
I remember there being measurable health effects depending on which side of a time zone you lived on. Essentially: “How dark is it outside when you wake up?”
I enjoyed reading this, but Fastly’s Fast Forward provides free CDN services for open source projects which might help with the “most Linux distributions are free and thus don’t have a project budget to spin up a global CDN” problem.
I have no experience of Fast Forward myself, but I’ve heard that OpenStreetMap and meta::cpan have benefited from it, and I’m sure others have too.
Still, running a CDN off repurposed thin clients is very impressive.
It’s also unknown when Fastly will do a rug-pull and stop supporting Open Source projects with a free CDN, so building your own, as well as using Fastly’s, is probably the best way to go.
Interesting, I know for certain there are some void hosts in that region. Probably need to make sure their mirrors are pointed correctly to warm the cache.
I’ve been using discord heavily since 2015, and I’m fine with moving to standard usernames. If you put a gun to my head, I couldn’t tell you my discriminator, and I’m not sure that I knew it was called that before the announcement. It has always been a completely random number to me, and I have to open discord every time I want to share it with someone.
However, some users wanted to be unsearchable, because they had stalkers or were very popular and didn’t want random people finding their discord account. Discriminators and case-sensitivity essentially created a searchability problem which users were utilizing on purpose to make it harder for people to search them.
& from the linked tweet:
Now that an account won’t use a discriminator, users are more prone to getting harassed and/or scammed, be it a targeted campaign or simply for owning that specific username.
I 100% agree people should be able to opt out of user searching. There are already some good privacy options in place, such as disabling messages from people that aren’t friends with you. I don’t, however, think this change impacts harassment much, or at least I’d need to see some data to back up that claim.
What I’m failing to understand is how switching your discriminator from jado#1234 to jado#4321 to avoid harassment is substantially different from switching @jado1234 to @jado4321. They seem identical to me, but maybe the search feature could/would interact with this change differently.
Additionally, impersonation is already a problem on discord, and I would say the discriminator system facilitates it because (ignoring Nitro) everyone has a random number. Prominent community members in one server will probably have a rank, but when they go to another server they look like every other member.
If your users are confused and angry about it, then it’s your fault, not theirs.
This can certainly be true, but I don’t think it’s guaranteed. Every time we push a user facing change that breaks some workflow, it is met with 1000 twitter replies saying “this new XYZ looks awesome” and 1000 replies saying “this new XYZ looks like trash”.
One aspect is massively psychological. A “discriminator” discriminates, visually, literally. You are not jado1234. You are jado, the 1,234th. Nobody can take the identity of jado away from you. Nobody is going to try and steal your Discord account because they’re upset they can’t be jado or their new cryptocurrency scam is jadocoin. They can be jado too.
(re: impersonation - you can’t impersonate someone’s discriminator, which is why people often stick with the same one. Even between servers you’re still jado#1234; people often persist with one tag for this consistency to avoid impersonation.)
Can someone explain what this shows? All I see is a screenshot from an unknown[0] website with a username @s3.amazonaws.com. So what are the security implications of this?
[0] As far as I understand, it’s from Bluesky. This doesn’t help, because I don’t know what Bluesky is or does.
Bluesky is a social media site. Part of their take on ‘verified users’ (and a part of how they notionally do ‘portable identity’) is to have a process where a user can claim a domain, and it gets verified. But they used a janky home rolled solution which did not think it through (rather than webfinger or something like the verification part of ACME/letsencrypt or one of the cert signing/identification standards or one of many other already existing options), making it easier than it should be for random people to claim your identity and get verified.
That’s disappointing. I saw that they use DNS names in their IDs but I thought it was the more typical/scalable “user@domain”, not just “@domain”… [facepalm]
Could you elaborate on scalability? I’m assuming you mean creating DNS records vs serving http responses, but I don’t see where wildcard records + Origin: fall short?
Don’t lock your FOSS project into proprietary communications. If you believe in FOSS for your project, you can believe in FOSS for your communications too… and the wealth of alternative clients and supported platforms (and decentralization!) will lead to greater accessibility.
I am fully on board with decentralization, federation, open platforms and clients. I think they will always win in the end, and they should be given the priority. I use Matrix and pay Element for a server even though I am capable of hosting it myself. I use Mastodon. I use IRC, but only through the Matrix clients.
I also use Discord. I have spent a lot of time on this platform as well, and I’m sorry to say that it’s one of the best experiences out there. None are perfect, but the Discord client is years ahead of all the Matrix clients combined. Drew mentions accessibility in his post, but that information is a bit dated as Discord has progressed in this area. Enough that a blind student of mine could easily navigate and use the platform a year ago.
FOSS projects should definitely prioritize FOSS modes of communication. But they should also be accepting of proprietary paths, in an effort to grow their communities. Let’s face it, there are large groups of people on this planet that will never sign up for Matrix, but use Discord on a daily basis. If we want to grow our projects, and make things accessible to everyone, we also need to consider where all the users are instead of requiring them to join yet another service. This is surely an area where I disagree with Drew, as he seems to neglect the value of anyone else’s preferred modes of communication/work.
Chat isn’t going anywhere, so projects need to accept and embrace it as best as we can. My personal preference is that projects should bridge the chat systems, and Matrix obviously provides that capability. With Matrix bridges one can also chat with Discord and IRC users. Bridges have their problems, but we can meet the users where they are without compromising our beliefs.
It doesn’t hurt a project to give people from proprietary networks the ability to join the conversation, in fact I’d argue it benefits them.
Aren’t people always posting about FOSS maintainer burnout? I’d wager that adding additional users who aren’t willing or able to use simple FOSS tools will be more of the burnout-inducing kind than the helpful community-minded sort. If you present your FOSS project as a product with all the regular support channels that proprietary products use, you’re gonna get people whose relationship to your project is as consumers of a product.
I do wonder how closely I will follow this idea myself, though. I’m developing a project on sourcehut and am considering setting up a copy of it on github so people can find it and file issues and such.
Yes, each maintainer should make their own decisions on how wide of a net they wish to cast. If you’re making a personal project and don’t want a bunch of consumers, then you’re probably not to the level I was thinking of when I wrote that. A single Matrix channel is actually a lot less maintenance than an entire Discord server.
Personally I mirror my repos across providers for a number of reasons. That includes discoverability, but also resilience and availability. I use codeberg for some things, but have scaled back from sourcehut because I personally don’t like the interfaces as much (and I can’t automate pushing from my gitea instance to sourcehut yet).
Bridging from Matrix also has (what some would call) an advantage - Matrix folks get a first-class experience because Matrix was designed for bridging, whereas Discord looks just a little funky because all the bridged users show up as “bots”. This way there’s a natural encouragement to “upgrade” to Matrix to get a more natural experience. And it lets people show up to the Discord and ask why half the people are labeled bots, instead of bailing out on Matrix before anyone gets a chance to engage in a conversation with them. (Disclaimer: I’ve never used this bridge, I’m going off educated guesses and the screenshot I linked above.)
Discord bridges on Libera are comically bad. I’ve been in multiple channels where introducing a Discord bridge caused the channel to just disintegrate.
Matrix bridges are like night and day. There’s a few quirks around messages that get edited, but overall the integration is just leaps and bounds better. Messages show as being sent by the sender instead of the bridge-user.
I accidentally closed this tab and everything after the horizontal break was OCRed from a frame buffer in Android Recents which was still there for some reason. Also I’m sleep deprived
Bridges can have bad, heavy handed translations of certain semantics such as replies and message links. Doesn’t help that those features are my first go-to when referencing past knowledge in conversation.
Especially when they’re implemented as Rich Embeds (Rich Embeds being the part of the message representation that OpenGraph and Cards are parsed into; bots can generate their own freely; not mere media embeds aka attachments). You don’t mind when it’s a handful of people in a small friendly community, but at some threshold you really do start to mind. (Though it might really have everything to do with how a bridge decides to render and translate everything, and I’m merely misplacing my frustrations. There are definitely communities I lurk in where I don’t even notice the seams.)
The ultimate answer is that it doesn’t matter if some minor thing like the Bot tag looks funky, or the bot embeds the username in the message content instead of changing the represented username per message, so long as the experience is enjoyable for natives on both sides.
(If you post a message via webhook you can add the username and avatar to be used for that message but you can’t use replies. Pick your poison)
FOSS projects should definitely prioritize FOSS modes of communication. But they should also be accepting of proprietary paths
I hope I didn’t imply that we should completely shun such platforms, however there are five big issues with the current state of choosing something like Discord or Slack:
An increasing number of projects, especially by newer to FOSS maintainers, exclusively support Discord or Slack. This shouldn’t be acceptable.
Matrix/XMPP/IRC are treated as subordinate rather than as the home, with a bridge or secondary support on the proprietary networks. The goal should be to get users to cross over, and to plan for long-term maintenance, since the proprietary options make no promises about maintaining backwards compatibility for bridges, or even a stable ToS.
With Matrix Spaces/Rooms and XMPP MUCs, users can join in a decentralized manner so they’re not required to create an account with your provider, and it should be totally acceptable to host a server behind a proxy for anonymity if those are a user’s desires.
Like a workplace water cooler or the hallway track of a conference, community building happens between humans, and private discussions should be encouraged, but E2EE should be the default. Just as you wouldn’t want eavesdroppers recording a private conversation (and then selling it to data brokers or giving it to the cops), private comms should remain private, and this should be valued by the community.
Public communications should in most cases be search engine indexable and archivable.
Yeah, and Python, Julia’s language of choice, has about the world’s only easily accessible implementation of IEEE 754 decimals. Little-known fact: Python’s Decimal class is IEEE 754-compliant arithmetic!
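A quick illustration with the decimal module (which follows the General Decimal Arithmetic Specification that IEEE 754’s decimal arithmetic is based on):

from decimal import Decimal, getcontext

print(0.1 + 0.2 == 0.3)                                    # False: 0.1 has no exact binary form
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True: exact in base 10

getcontext().prec = 6
print(Decimal(1) / Decimal(3))   # 0.333333: 1/3 still isn't exact in any decimal precision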
I mean it solves this loosely. The places where decimal vs. non-decimal matters - certainly where this seems to come up - are generally places where I would question the use of floating vs fixed point (of any or arbitrary precision).
Base 10 only resolves the multiples of 1/10 that binary can’t represent, but it still can’t represent 1/3, so it seems like base 30 would be better, as it can also accurately represent 1/3 and 1/6, in addition to 1/2, 1/5, and 1/10. Supporting this non-binary format necessarily results in slower operations.
Interestingly, to avoid a ~20% reduction in precision, decimal IEEE 754 actually works in base 1000.
I have yet to see a currency that is not expressed in the decimal system.
I have yet to see an order form that does not take its quantities in the decimal system.
Yes, which is my point, there are lots of systems for which base 10 is good for humans, but that floating point in any base is inappropriate.
In fact, if there’s any type that we do not need, it’s binary floating point, i.e. what programmers strangely call “float” and “double”.
Every use case for floating point requires speed and accuracy. Every decimal floating point format is significantly more expensive to implement in hardware area, and is necessarily slower than binary floating point. The best case we have for accuracy is ieee754’s packed decimal (or compressed? I can’t recall exactly) which takes a 2.3% hit to precision, but is even slower than the basic decimal form which takes a 20% precision hit.
For real applications the operations being performed typically cannot be exactly represented in base 10 (or 1000) or base 2, so the belief that base 10 is “better” is erroneous. It is only a very small set of cases where a result would be exactly representable in base 10 where this comes up. If the desire is simply “be correct according to my intuition” then a much better format would be base-30, which can also represent 1/(3^n) correctly. But the reality is that the average precision is necessarily lower than base-2 for every non-power of 2 base, and the performance will be slower.
Floating point is intended for scientific and similar operations which means it needs to be as fast as possible, with as much precision as possible.
Places where human decimal behaviour is important are almost universally places where floating point is wrong: people don’t want their bank or order systems doing maths that says x+y==x when y is not zero, which is what floating point does. That’s because people are dealing with quantities that generally have a minimum fractional quantity. Once you recognize that, your number format should become an integer count of that minimum quantity.
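The absorption problem is easy to demonstrate, and it isn’t specific to binary; a decimal float does the same thing once the precision runs out (a quick sketch):

x = 1e16                     # a sixteen-digit balance as a binary double
print(x + 1.0 == x)          # True: the 1.0 is silently absorbed

from decimal import Decimal, getcontext
getcontext().prec = 16
print(Decimal(10) ** 16 + Decimal(1) == Decimal(10) ** 16)  # True again, just in base 10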
Yes, for currencies, you can use integers. Who would want to say x * 1.05 when they could say multFixPtDec(x, 105, 2);
To some extent, this is why we use standards like IEEE 754. Some of us remember the bad old days, when every CPU had a different way of dealing with things. 80 bit floats for example. Packed and unpacked decimal types on x86 for example. Yay, let’s have every application solve this in its own unique way!
Or maybe instead, let’s just use the standard IEEE 754 type that was purpose-built to hold decimal values without shitting itself 🤷♂️
[minor edit: I just saw both my wall of text replies were to u/cpurdy which I didn’t notice. This isn’t meant to have been a series of “target cpurdy” comments]
Yes, for currencies, you can use integers. Who would want to say x * 1.05 when they could say multFixPtDec(x, 105, 2);
I mean, sure, if you have a piss poor language that doesn’t let you define a currency quantity it will be annoying. It sounds like a poor language choice if you’re writing something that is intended to handle money, but more importantly, using floating point for currency is going to cause much bigger problems.
And this has nothing to do with ieee754; that is merely a specific standard detailing how the storage bits for the format work. The issue is fundamental to any floating point format: floating point is not appropriate for anything where you are expecting exact quantities to be maintained (currencies, order quantities, etc.), and it will bite you.
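To make the alternative concrete, here is a sketch of the integer-count-of-the-minimum-quantity approach for the x * 1.05 case (the amount and the half-up rounding rule are made up for illustration):

price_cents = 299                         # $2.99 held as an integer number of cents

# "x * 1.05" as a ratio of integers, with the rounding decision made explicit:
numerator, denominator = 105, 100
total_cents = (price_cents * numerator + denominator // 2) // denominator  # round half up

print(total_cents)                                        # 314, i.e. 2.99 * 1.05 = 3.1395 -> 3.14
print(f"{total_cents // 100}.{total_cents % 100:02d}")    # 3.14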
Some of us remember the bad old days, when every CPU had a different way of dealing with things. 80 bit floats for example.
So as a heads up, assuming you’re complaining about x87’s 80-bit floats: those are ieee754 floating point, and they are the reason ieee754 exists: every other manufacturer said ieee754 could not be implemented efficiently, until Intel went and produced it. The only issue is that, being created before finalization of the ieee754 specification, it uses an explicit leading-1 bit, which turned out to be a mistake.
Packed and unpacked decimal types on x86 for example.
You’ll be pleased to know ieee754’s decimal variant has packed and unpacked decimal formats - unpacked taking a 20% precision hit but being implementable in software without being catastrophically slow, and packed having only a 2.3% precision hit but being pretty much hardware only (though to be clear as I’ve said elsewhere, still significantly and necessarily slower than binary floating point)
Or maybe instead, let’s just use the standard IEEE 754 type that was purpose-built to hold decimal values without shitting itself 🤷♂️
If you are hell-bent on using an inappropriate format for your data then maybe decimal is better, but you went wrong when you started using a floating point representation for values that don’t have significant dynamic range and where gaining or losing value due to precision limits is not acceptable.
[minor edit: I just saw both my wall of text replies were to u/cpurdy which I didn’t notice. This isn’t meant to have been a series of “target cpurdy” comments]
No worries. I’m not feeling targeted.
I mean, sure if you have a piss poor language that doesn’t let you define a currency quantity it will be annoying.
C. C++. Java. JavaScript.
Right there we have 95% of the applications in the world. 🤷♂️
How about newer languages with no decimal support? Hmm … Go. Rust.
And this has nothing to do with ieee754
Other than it actually specifies a standard binary format, operations, and defined behaviors thereof for decimal numbers.
So as a heads up assuming you’re complaining about x87’s 80bit floats: those are ieee754 floating point
Yes, there are special carve-outs (e.g. defining “extended precision format”) in IEEE754 to allow 8087 80-bit floats to be legal. That’s not surprising, since Intel was significantly involved in writing the IEEE754 spec.
ieee754’s decimal variant has packed and unpacked decimal formats - unpacked taking a 20% precision hit but being implementable in software without being catastrophically slow, and packed having only a 2.3% precision hit but being pretty much hardware only
I’ve implemented IEEE754 decimal with both declet and binary encoding in the past. Both formats have the same ranges, so there is no “precision hit” or “precision difference”. I’m not sure what you mean by packed vs unpacked; that seems to be a reference to the ancient 8086 instruction set, which supported both packed (nibble) and unpacked (byte) decimal arithmetic. (I used both, in x86 assembly, but probably not in the last 30 years.)
you went wrong when you started using a floating point representation for values that don’t have significant dynamic range where gaining and adding value due to precision limits is not acceptable
I really do not understand this. It is true that IEEE754 floating point is very good for large dynamic ranges, but that does not mean that it should only be used for values with a large dynamic range. In fact, quite often IEEE754 is used to deal with values limited between zero and one 🤷♂️
How about newer languages with no decimal support? Hmm … Go. Rust.
You can also do something similar in Rust. I did not say “has a built-in currency type”.
You can also add one to Python, or a variety of other languages. I’m only partially surprised that Java still doesn’t provide support for operator overloading.
And this has nothing to do with ieee754
Other than it actually specifies a standard binary format, operations, and defined behaviors thereof for decimal numbers.
No. It defines the operations on floating point numbers. Which is a specific numeric structure, and as I said one that is inappropriate for the common cases where people are super concerned about handling 1/(10^n) accurately.
I’ve implemented IEEE754 decimal with both declet and binary encoding in the past. Both formats have the same ranges, so there is no “precision hit” or “precision difference”. I’m not sure what you mean by packed vs unpacked; that seems to be a reference to the ancient 8086 instruction set, which supported both packed (nibble) and unpacked (byte) decimal arithmetic.
I had to go back and re-read the spec; I misunderstood the two significand encodings. Derp. I assumed your reference to packed and unpacked was to those.
On the plus side, this means that you’re only throwing out 2% of precision for both forms.
I really do not understand this. It is true that IEEE754 floating point is very good large dynamic ranges, but that does not mean that it should only be used for values with a large dynamic range.
No, I mean that the kinds of things people care about, the ones that need accurate representation of multiples of 1/(10^n), do not have dynamic range; fixed-point (or no point at all) is the correct representation. So decimal optimizes the floating point format for fixed-point data, instead of for the actual use cases that have widely varying ranges (scientific computation, graphics, etc.).
In fact, quite often IEEE754 is used to deal with values limited between zero and one 🤷♂️
There is a huge dynamic range between 0 and 1. The entire point of floating point is that every number is represented as a significand in [1..Base) scaled by an exponent, which is what gives it dynamic range. The point I am making is that the examples where decimal formats are valuable do not need that at all.
What is the multiplication supposed to represent? Are you adding a 5% fee? You need to round the value anyway; the customer isn’t going to give you 3.1395 dollars. And what if the fee was 1/6 of the price? Decimals aren’t going to help you there.
It never ceases to amaze me how many people really work hard to avoid obvious, documented, standardized solutions to problems when random roll-your-own solutions can be tediously written, incrementally-debugged, and forever-maintained instead.
Help me understand why writing your own decimal support is superior to just using the standard decimal types?
I’m going to go out on a limb here and guess that you don’t write your own “int”, “float”, and “double”. Why is decimal any different?
This whole conversation seems insane to me. But I recognize that maybe I’m the one who is insane, so please explain it to me.
No, I’m saying that you don’t need a decimal type at all. If you need to represent an integral value, use an integer. If you want to represent an approximation of a real number, use a float. What else would you want to represent?
I would like to have a value that is a decimal value. I am not the only developer who has needed to do this. I have needed it many times in financial services applications. I have needed it many times in ecommerce applications. I have needed it many times in non-financial business applications. This really is not a crazy or rare requirement. Again, why would you want to use a type that provides an approximation of the desired value, when you could just use a type that actually holds the desired value? I’m not talking crazy, am I?
What do you mean by “a decimal value”? That’s not an established mathematical term. If you mean any number that can be expressed as m/10ⁿ for some integers m, n, you need to explain precisely why you’d want to use that in a real application. If you mean any number that can be expressed as m/10ⁿ for some integer m and a fixed integer n, why not just use an integer?
Being able to say x * 1.05 isn’t a property of the type itself, it’s just language support. If your language supports operator overloading you could use that syntax for fixed point too.
Oh, you are using a language with fixed point literals? I have (in the past). I know that C#/VB.NET has its 128-bit non-standard floating point decimal type, so you’re not talking about that. Python has some sort of fixed point decimal support (and also floating point decimal). What language are you referring to?
Oh, you are using a language with fixed point literals?
You don’t need to. Strings are a good substitute
For Kotlin it doesn’t really even matter what the left operand is
fun main() {
    println("1.05" * 3)  // prints 3.15
}

// Parse the receiver as a fixed-point decimal, then multiply by an integer.
operator fun String.times(rightOperand: Int): FixedDecimal {
    val scale = substringAfter('.', "").length   // fraction digits: 2 for "1.05"
    val units = replace(".", "").toLong()        // scaled integer: 105
    return FixedDecimal(units * rightOperand, scale)
}

class FixedDecimal(private val units: Long, private val scale: Int) {
    override fun toString() =
        units.toString().padStart(scale + 1, '0').let { it.dropLast(scale) + "." + it.takeLast(scale) }
}
So your idea is to write your own custom decimal type? And that is somehow better than using a well-established international standard like IEEE 754?
I think Kotlin is a nice language, and it’s cool that it allows you to write new classes, but being forced to build your own basic data types (”hey look ma! I invented a character string!”) seems a little crazy to me 🤷♂️
The idea is that the type represents an underlying standard as well as its defined operations. You don’t need native support for a standard in order to support said standard
Edit:
but being forced to build your own basic data types
I was giving an example about ergonomics and language support rather than using an opaque dependency
The world is boring enough as is. Let’s add more whimsy and cuteness through our service and project names.
I find that I have more time for whimsy when I’m not fielding the umpteenth “what was the purpose of omega star again?” request. I’m not at work for cuteness–I’m there to get paid, support my family, and reduce the annoyance of the lives of my coworkers.
There’s also the unfortunate thing about whimsy: team composition changes over time, and what you find funny today may not be funny later to other folks. Consider the issues you might run into with:
fargo as the name for a service for removing user objects
gestapo for the auditing system
dallas as a name for the onboarding service (since everybody does it)
kali as a name for a resource management and quota enforcement engine
miniluv for your customer service back-office suite
hydra2 because hydra sucks but version 2 needs to coexist with it for a while yet
Quetzalcoatl after a character from that silly anime Kobayashi’s Dragon Maid (bonus points if you are concerned about second-hand cultural appropriation)
basically any character from Neon Genesis Evangelion
fido (or whatever your dog’s name is) for the fetching service might not be so pleasant after the namesake is sunset from production
2consumers1queue for a data ingest worker pool probably isn’t gonna go over well if anybody on staff has any ranks in knowledge (cursed)
And so on and so forth. I’m put in mind of an old article by Rachel Kroll talking about some logging service or format that was named rather blatantly in reference to either a porno or a sexual act–while this might be a source of chuckles for the team at launch, later hires may object.
As an industry we haven’t gotten over the whimsy of blacklists and whitelists, master and slave, female and male connectors–or hell, even the designation of “user”. If we can’t handle those things, what chance do we think we have with names that seem entertaining in the spur of the moment?
If you still doubt me, consider that AWS seems to engage in both conventions. Which is the easier product to identify, Kinesis or Application Load Balancer? Athena or Secrets Manager? IOT Core or Cognito?
~
I’ll grant that for hobby projects, sure, go nuts. I enjoy thematic character names from anime for my home lab. I used to use names from a particularly violent action movie for cluster nodes.
I’ll also grant that for marketing purposes (at least at a project level) it isn’t bad to have a placeholder name sometimes while things are getting hashed out–though those names often stick around and end up serving as inside baseball for folks to flaunt their tenure.
Lastly, I’ll totally grant that if you by design are trying to exclude people, then by all means really just go hog wild. There are all kinds of delightfully problematic names that function like comic sans to filter folks. Just don’t be surprised if you get called on it.
Lastly, I’ll totally grant that if you by design are trying to exclude people, then by all means really just go hog wild. There are all kinds of delightfully problematic names that function like comic sans to filter folks. Just don’t be surprised if you get called on it.
Meh. The problem with this is that doing gratuitously offensive stuff, like deliberately making your presentations harder to look at, also attracts a bunch of people who think being gratuitously offensive is, like, the coolest thing ever. And when those people find a home in the tech world they soon set about being offensive in the wider community.
Having helped moderate a once fairly significant FOSS thing, I’m pretty convinced that the assholery-positive branding is a bad thing for all of us. It breeds a culture of assholery that we all have to live with no matter where we are.
With all of that said, some cute names don’t concern any sensitive subjects, so I feel like you’re tearing down a straw man, or at least the more strawy half of a composite man. At a previous job we created a service called “counter”, which did have a serious job, but we liked to describe it to management—mostly truthfully—as just being for counting things. You know, like, one apple, two apples… I don’t know if this is funny in anyone’s mind but mine, but the name certainly wasn’t chosen to be a useful description.
Homebrew seems to be using beer-brewing names and metaphors throughout. As someone who doesn’t know anything about brewing beer, nothing makes sense to me there. It feels like they’ve taken a subject I know something about (packaging software) and deliberately made it obscure by renaming everything.
I’m similarly put off Kubernetes. Why invent a whole new vocabulary to name existing concepts? I don’t care enough about Kubernetes to try deciphering their jargon.
If you take an existing domain and change the names of all the things then anyone wanting to participate has to relearn everything you’ve made up (poorly in most cases.)
It makes interacting with you like speaking in a second language you have little command of.
EDIT: Just pause for a second and imagine reading a codebase where classes, functions and variables all have cute names…
I think cutesy naming has its place, but I agree that sub-naming of cutesy things (e.g. cheese shop, wheels, eggs from Python packaging) is bad. Your cutesy name should just be a pun with a more or less deducible relationship to the thing it does (e.g. Celery feeds RabbitMQ). You can have jokes in the documentation, but no joke nomenclature.
TIL! I worked at a company where we used Celery – but with Redis as the backing store – for years and I never made this Celery/RabbitMQ connection before.
I don’t know that I disagree in general, either with you or with friendlysock’s comment. I was responding to a specific thing in it that I found significant. If you want to talk about naming more broadly, know that—at least from my perspective—people don’t all engage with names in the same way, and the purpose of names is situational, learned (think of mathematician vs programmer attitudes to naming variables), and at least in some cases a tradeoff between the interests of the various different people who will be using the name. So I don’t think it’s very useful to have a blanket opinion.
I find that I have more time for whimsy when I’m not fielding the umpteenth “what was the purpose of omega star again?” request. I’m not at work for cuteness–I’m there to get paid, support my family, and reduce the annoyance of the lives of my coworkers.
So much this, and also:
The joke is not going to hold up. It probably wasn’t that funny the first time except to you and maybe one other person, and it’s certainly not going to stand the test of time. I can count on one hand the number of “jokes” in comments and names I’ve come across that have actually made me laugh.
You might think you are adding more “whimsy” into a drab world… in fact, you are probably inducing eye rolls for the decade of programmers who will have to maintain your work.
I’m put in mind of an old article by Rachel Kroll talking about some logging service or format that was named rather blatantly in reference to either a porno or a sexual act–while this might be a source of chuckles for the team at launch, later hires may object.
As an industry we haven’t gotten over the whimsy of blacklists and whitelists, master and slave, female and male connectors–or hell, even the designation of “user”. If we can’t handle those things, what chance do we think we have with names that seem entertaining in the spur of the moment?
Honestly, I think this is very easy to avoid if you have a diverse team and take a minimum amount of care in choosing a name. These examples are valid, but I think they are clearly influenced by the insularity and whiteness of software developers.
I’ve built a series of services over the past few years, and my preference has been animal names.
Coyote is an ACME service
Hedgehog is a routing layer for our Fastly ingress
Mockingbird is an API polling service
Cerberus is an IAM/OAuth/SAML service (I will grant you this one is the least creative)
They were not chosen at random, they were designed to be mnemonic.
While I take your point about AWS, Google does the opposite approach and names everything generically, which makes searching for them a pain in the ass. Also, I think there are distinctly different tradeoffs involved in choosing internal names vs external product names.
I think this is very easy to avoid if you have a diverse team and take a minimum amount of care in choosing a name.
As a practical matter, early-stage startups and small companies do not tend to have diverse teams. Remember, naming is an issue that happens with a team size of 1 and which impacts a team size of…however many future selves or people work on a codebase.
You use Coyote as a harmless example (because ACME like in the coyote and roadrunner cartoons, right?) but similarly banal things pulled from the cartoons like gonzales for a SPDY library in this day and age cannot be guaranteed to be inoffensive. Even if you take care for today’s attitudes there is no guarantee of tomorrow.
On a long enough timeline, anything not strictly descriptive becomes problematic to somebody (and if you don’t like where the industry is heading in that regard, well, that ship has sailed).
As a practical matter, early-stage startups and small companies do not tend to have diverse teams.
The less diverse your team is, the more care you should take.
You use Coyote as a harmless example (because ACME like in the coyote and roadrunner cartoons, right?) but similarly banal things pulled from the cartoons like gonzales for a SPDY library in this day and age cannot be guaranteed to be inoffensive.
I think this only buttresses my point: Gonzalez is clearly a racial/ethnic stereotype. I would never even think of choosing that. This is not rocket science!
On a long enough timeline, anything not strictly descriptive becomes problematic to somebody
Whereas using a purely descriptive name is much more likely to become problematic on a shorter timeline for the reasons stated in the article.
According to Wikipedia, even though Speedy Gonzales is clearly a problematic ethnic stereotype, there has been a thing where the Mexican community has decided they like him, so he’s been uncanceled. Who can predict these things? https://en.wikipedia.org/wiki/Speedy_Gonzales#Concern_about_stereotypes
You use Coyote as a harmless example (because ACME like in the coyote and roadrunner cartoons, right?) but similarly banal things pulled from the cartoons like gonzales for a SPDY library in this day and age cannot be guaranteed to be inoffensive.
Gonzales was once taken off the air out of racial concerns. Hispanic groups campaigned against this decision because Speedy was a cultural icon and a hero. Eventually he went back on air.
Assuming a person’s or demographic’s viewpoint is its own danger. I am not innocent in this regard.
I’m well aware of the history there, never fear. That’s why I used him as an example: using that name may annoy well-meaning people, changing that name may annoy somebody with that Hispanic background.
If we’d gone with the boring utilitarian name of spdy-client, instead of being cutesy, we could’ve sidestepped the issue altogether.
Assuming a person’s or demographic’s viewpoint is its own danger.
Sadly, that is not the direction well-meaning people influencing our industry have taken. So, in the meantime, I suggest boring anodyne utilitarian names until the dust settles.
Sadly, that is not the direction well-meaning people influencing our industry have taken. So, in the meantime, I suggest boring anodyne utilitarian names until the dust settles.
That’s a good point
Also sorry for my wording and tone. I’m not happy about how I wrote that
As an industry we haven’t gotten over the whimsy of blacklists and whitelists, master and slave, female and male connectors–or hell, even the designation of “user”.
I’d argue that such metaphors have always been viewed as more on the descriptive side of the spectrum than whimsical or cute. In fact the idea that these terms are descriptive and in some sense “objective” is the most common defense I’ve seen of keeping such terms around, not that people enjoy them. I didn’t have to have anyone explain to me the meaning of master/slave replication when I first heard that term, the meaning is very intuitive. That’s an indictment of our culture and origins, not of metaphor.
My point is not that cute names are always great, or that it’s easy to avoid their pitfalls. It’s not. But I think holding space to be playful with language is often more generative than trying to be dry and descriptive. And it’s when we think we’re at our most objective that our subjectivity can lead us furthest astray.
I think you’re completely missing that every product and open source project that you simply use will have such a name (maybe except OBS), so why is it different for local homegrown stuff?
For the same reason that brand names can be “Amazon” or “Apple” but we don’t start renaming “computers” or “shopping websites” to something else. OSS projects exist in the same space as “brands” – typically a competitive space in which there are multiple competing solutions to the same problem. In that context, differentiating serves a purpose. However, it also has a cost: what your product does needs to be taught and remembered.
It’s possible that at a very large company some of those same forces take effect, but at small and medium companies they don’t. There, following the same principle would be like insisting everyone in the house call the kitchen sink “waterfall”. Why would I do that? “kitchen sink” works perfectly.
So I guess we slightly differ on what cute and descriptive mean. sqlite has sql in it, descriptive, but -lite gives it a double meaning and makes it unique.
logstash_json is a horrible name, imho - because you could theoretically do the same project in many languages, it could be for logstash itself, or some intermediary product. (I don’t remember the specific one in question). Also libpng is very old and the assumption would be “this is the most popular PNG image library, written in C”, maybe because of the (weakly held) naming convention. These days we get many more new projects per year, but in theory a PNG lib in Erlang/Ruby/Brainfuck could also be called libpng, it’s just that the name was taken.
Maybe I am completely wrong here, but I understood the OP’s post more as “don’t use bland, descriptive, pigeonholed names” and you argue more over “don’t do cute names” - so maybe it’s a middle ground.
And yes, I still remember when a coworker wanted to call 2 projects “Red Baby” and “Blue Frog” or something and no one had any clue why, he couldn’t explain it, and we said: Why would one be red and one be blue?
logstash_json has the project slug “Formats logs as JSON, forwards to Logstash via TCP, or to console.”. That’s roughly what I’d expect from the name, something to do with Logstash and something to do with JSON.
libpng is…“libpng is the official PNG reference library.” Sure, a slightly better name would’ve been “png_ref_implementation” or whatever, but again, the name tells me what to expect.
sqlite “implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. “ So, you know, a SQL thingy but not large. Light, if you would. Again, maybe the name could mention the standalone nature, but that’s neither here nor there.
I think that bland, descriptive names are in fact exactly the right answer.
Again, examples from the other side:
nokogiri is a library for Japanese saws…no, wait, it’s for parsing XML and HTML.
Angular is a library for angles…no, wait, it’s an HTML templating and data binding library (among other things).
Beautiful soup is a library about soup…no, wait, another HTML/XML munging library.
Ristretto is a library for short shots of espresso…no, wait, it’s a caching library.
kubernetes is some tool for pilots…wait, no, it’s container orchestration and provisioning.
Prometheus is tool for giving fire to mortals…wait, crap, it’s a time series DB and monitoring setup.
He’s obviously being defensive, but he has a good point about considering other types of safety than just memory. For example, languages without something like RAII don’t have a good way to enforce the cleanup of resources in a timely way — you have to remember to use optional constructs like “defer” to call cleanup code, otherwise the cleanup won’t happen until the GC decides to finalize the owning object, or maybe never. The arbitrary nature of finalizers has been a pain point of Java code for as long as I can remember, when working with any resource that isn’t Pure Java(tm).
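For contrast, scope-tied cleanup in an RAII language looks roughly like this (a Rust sketch, names made up); the cleanup runs on every exit path without anyone having to remember a defer or wait for a finalizer:
// Sketch: cleanup tied to scope rather than to a GC finalizer.
struct TempFile {
    path: std::path::PathBuf,
}

impl Drop for TempFile {
    fn drop(&mut self) {
        // Runs when the value goes out of scope: normal return, early return, or panic.
        let _ = std::fs::remove_file(&self.path);
    }
}

fn do_work() -> std::io::Result<()> {
    let tmp = TempFile { path: "scratch.dat".into() };
    std::fs::write(&tmp.path, b"intermediate state")?;
    // ... more fallible work; whichever way we leave this function,
    // `tmp` is dropped and the file is removed.
    Ok(())
}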
Part of the problem though is that:
a) That is a deflection from the entire point of the NSA thing that Stroustrup is ostensibly replying to, which is that almost all serious safety problems are memory safety problems of some kind, which C++ can not seriously mitigate
b) The ‘other forms of safety’ that Stroustrup talks about in the linked letter and positions as being better without actually explicitly arguing for it (what he calls ‘type-and-resource safety’) are also things that C++ just can fundamentally never do - the linked documents are about as serious an approach to getting the described properties for C++ as the smart pointer work was about getting memory safety for C++.
Like, C++ doesn’t have memory safety (and also some related things like ‘iterator invalidation safety’) and fundamentally cannot get it without massively breaking changes, and (specifically) the lack of temporal memory safety and aliasing safety means that their approaches to ‘type-and-resource safety’ will fundamentally do essentially nothing.
This is part of a long pattern of Stroustrup trying to stop any possibility of progress on safety by (amongst other things) using his name, reputation, and any position he is able to get to push for the diversion of effort and resources into big and difficult projects of work that will look like progress but fundamentally cannot ever achieve anything good.
I would argue that memory safety is not a problem of the C++ language, it’s a problem of implementations. Real Soon Now[1], my team is going to be open sourcing a clean slate RTOS targeting a CHERI RISC-V core. The hardware enforces memory safety, the key RTOS components are privilege separated and the platform has been an absolute joy to develop.
Languages like Rust have a stronger guarantee: they check a lot of properties at compile time, which avoids the bugs, rather than simply preventing them from being exploitable. This comes with the caveat that the only data structure you can express without dipping into unsafe (either explicitly or via the standard library) is a tree, and then you need to reason about all of the ways in which those unsafe behaviours interact, without any help from the type system. The Oakland paper from a while back that found a couple of hundred CVEs in Rust crates by looking for three idioms where people misuse things that hide unsafe behind ‘safe’ interfaces suggests that people are not good at this.
The other problem that we’ve seen with Rust is that the compiler trusts the type system. This is fine if all of the code is within the Rust abstract machine, but is a nightmare for systems that interact with an adversary. For example, we saw some code that read a Rust enumeration from an MMIO register and checked that it was in the expected range. The compiler knew that enumerations were type safe so elided the check, introducing a security hole. The correct fix for this is to move the check into the unsafe block that reads from the MMIO register, but that’s the kind of small detail that’s likely to get overlooked (and, in code review, someone may well say ‘this check isn’t doing anything unsafe, can you move it out of the unsafe block?’ because minimising the amount of unsafe code is normally good practice). We need to check a bunch of things at API boundaries to ensure that the caller isn’t doing anything malicious and, in Rust, all of those things would be things that the compiler would want to assume can never happen.
We will probably rewrite a chunk of the code in Rust at some point (once the CHERI support in the Rust compiler is more mature) because there are some nice properties of the language, but we have no illusions that a naive Rust port will be secure.
[1] I think we have all of the approvals sorted now…
The other problem that we’ve seen with Rust is that the compiler trusts the type system. This is fine if all of the code is within the Rust abstract machine, but is a nightmare for systems that interact with an adversary. For example, we saw some code that read a Rust enumeration from an MMIO register and checked that it was in the expected range. The compiler knew that enumerations were type safe so elided the check, introducing a security hole.
Heh, Mickens was right – you can’t just place a LISP book on top of an x86 chip and hope that the hardware learns about lambda calculus (or, in this case, type theory…) by osmosis :-).
This is one of the things I also struggled with back when I thought I knew enough Rust to write a small OS kernel and I was a) definitely wrong and b) somewhat disappointed. I ran into basically the same problem – reading an enum from a memory-mapped config register. As usual, you don’t just read it, because some ranges are valid, some are reserved, some are outright invalid, and of course they’re not all consecutive ranges, so “reading” is really just the happy ending of the range checks you do after a memory fetch.
At the time, I figured the idiomatic way to do it would be via the TryFrom trait, safely mapping config register values to my enum data type. The unsafe code block would read a word and not know/care what it means, then I’d try build the enum separately from that word, which was slower and more boilerplatey than I’d wished but would prevent the compiler from “helping” me along. That looked cool both on paper and on screen, until I tried to support later revisions of the same hardware. Teaching it to deal with different hardware revisions, where valid and reserved ranges differ, turned out to be really stringy and more bug-prone than I would’ve wanted.
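Roughly the shape I had in mind (the register address, names, and value ranges are all made up; this is just a sketch):
use core::convert::TryFrom;
use core::ptr;

const CFG_REG: usize = 0x4000_0000; // hypothetical MMIO address

#[derive(Clone, Copy, Debug)]
enum ClockSource {
    Internal,
    External,
    Pll,
}

#[derive(Debug)]
struct InvalidConfig(u32);

impl TryFrom<u32> for ClockSource {
    type Error = InvalidConfig;
    fn try_from(raw: u32) -> Result<Self, InvalidConfig> {
        match raw {
            0x0 => Ok(ClockSource::Internal),
            0x1 => Ok(ClockSource::External),
            0x2 => Ok(ClockSource::Pll),
            // reserved and invalid ranges all end up here
            other => Err(InvalidConfig(other)),
        }
    }
}

fn read_clock_source() -> Result<ClockSource, InvalidConfig> {
    // The unsafe block only reads a word; it neither knows nor cares what it means.
    let raw = unsafe { ptr::read_volatile(CFG_REG as *const u32) };
    ClockSource::try_from(raw)
}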
My first instinct had been to read and range-check the values in the unsafe block, then build the enum From that, which was at least slightly faster and more condensed (since it was guaranteed to succeed) – or skip enumerating values separately altogether. However, it seemed that was just safety theater, as the conversion was guaranteed to succeed only insofar as the unsafe check was right, thus reducing the whole affair to C with a very verbose typecast syntax.
Frankly, I’m still not sure what the right answer would be, or rather, I haven’t found a satisfactory one yet :(.
It’s hard to say without looking at a specific example, but a common trap with Rust enums is that often you don’t want an enum, you want an integer with a bunch of constants (then every bit pattern is a valid value, so a defensive range check can’t be assumed away):
struct Flag(u8);

impl Flag {
    const Foo: Flag = Flag(0);
    const Bar: Flag = Flag(1);
}
I may be misunderstanding some details about how this works, but in the context of interfacing with the underlying hardware I think I generally want both: a way to represent related values (so a struct Flag(u8) with a bunch of constant values) and an enumerated set of valid flag values, so that I can encode range checks in TryFrom/TryInto. Otherwise, if I do this:
let flags = cfg.get_flags(dev.id)?
where
fn get_flags(&self, id: DeviceId) -> Result<Flag>
I will, sooner or later, write get_flags in terms of reading a byte from a corrupted flash device and I’ll wind up trying to write Flag(42) to a config register that only takes Flag::Foo or Flag::Bar.
Having both means that my config read/write chain looks something like this: I get a byte from storage, I build my enum Flag instance based on it. If that worked, I now know I have a valid flag setting that I can pass around, modulo TryFrom<u8> implementation bugs. To write it, I hand it over to a function which tl;dr will turn my flags into an u8 and yell it on the bus. If that function worked, I know it passed a valid flag, modulo TryInto<u8> implementation bugs.
Otherwise I need to hope that my read_config function checked the byte to make sure it’s a valid flag, and that my set_config function checked the flag I got before bus_writeing it, and I do not want to be that optimistic :(.
I would argue that memory safety is not a problem of the C++ language, it’s a problem of implementations. Real Soon Now[1], my team is going to be open sourcing a clean slate RTOS targeting a CHERI RISC-V core. The hardware enforces memory safety, the key RTOS components are privilege separated and the platform has been an absolute joy to develop.
That’s cool. I’m quite excited for CHERI. My question is this - when you do run into a memory safety issue with CHERI what is the dev experience? In Rust you get a nice compiler error, which feels much “cheaper” to handle. With CHERI it feels like it would be a lot more expensive to have these bugs show up so late - although wayyyyyyy better than having them show up and be exploitable.
The Oakland paper from a while back that found a couple of hundred CVEs in Rust crates by looking for three idioms where people misuse things that hide unsafe behind ‘safe’ interfaces suggests that people are not good at this.
For sure. Rudra is awesome. Unsafe is hard. Thankfully, the tooling around unsafe for Rust is getting pretty insane - miri, rudra, fuzzing, etc. I guess it’s probably worth noting that the paper is actually very positive about Rust’s safety.
My opinion, and what I have observed, is that while there will be unsafety in rust it’s quite hard to exploit it. The bug density tends to be very low, low enough that chaining them together can be tough.
This is fine if all of the code is within the Rust abstract machine, but is a nightmare for systems that interact with an adversary. For example, we saw some code that read a Rust enumeration from an MMIO register and checked that it was in the expected range. The compiler knew that enumerations were type safe so elided the check, introducing a security hole.
I don’t understand this. What are you referring to with regards to “an adversary”. Did an attacker already have full code execution and then leveraged a lack of check elsewhere? Otherwise if the compiler eliminated the check it shouldn’t be possible to reach that without unsafe elsewhere. Or did you do something like cast the enum from a value without checking? I don’t really understand.
We need to check a bunch of things at API boundaries to ensure that the caller isn’t doing anything malicious and, in Rust, all of those things would be things that the compiler would want to assume can never happen.
That’s cool. I’m quite excited for CHERI. My question is this - when you do run into a memory safety issue with CHERI what is the dev experience? In Rust you get a nice compiler error, which feels much “cheaper” to handle. With CHERI it feels like it would be a lot more expensive to have these bugs show up so late - although wayyyyyyy better than having them show up and be exploitable.
It’s all run-time trapping. This is, I agree, much worse than catching things at compile time. On the other hand, running existing code is a better developer experience than asking people to rewrite it. If you are writing new code, please use a memory-safe (and, ideally, type-safe) language.
I don’t understand this. What are you referring to with regards to “an adversary”.
One of the problems with Rust is that all non-Rust code is intrinsically unsafe. For example, in our model, we can pull in things like the FreeRTOS network stack, mBedTLS, and the Microvium JavaScript VM without having to rewrite them. In Rust, any call to these is unsafe. If an attacker compromises them, then it’s game over for Rust (this is no different from C/C++, so Rust at least gives you attack-surface reduction).
If a Rust component is providing a service to untrusted components then it can’t trust any of its arguments. You (the programmer) still need to explicitly check everything.
Did an attacker already have full code execution and then leveraged a lack of check elsewhere?
This case didn’t have an active adversary in software. It had an attacker who could cause power glitches that caused a memory-mapped device to return an invalid value from a memory-mapped register. This is a fairly common threat model for embedded devices. If the out-of-range value is then used to index something else, you can leverage it to gain memory corruption and possibly hijack control flow and then you can use other paths to get arbitrary code execution.
I’m just not understanding who this attacker is.
Everyone else who provides any code that ends up in your program, including authors of libraries that you use. Supply chain vulnerabilities are increasingly important.
On the other hand, running existing code is a better developer experience than asking people to rewrite it.
For sure. Mitigations like CHERI are critical for that reason - we can’t just say “well you should have used Rust”, we need practical ways to make all code safer. 100%.
If an attacker compromises them,
So basically the attacker has full code execution over the process. Yeah, unless you have a virtual machine (or hardware support) I don’t think that’s a problem you can solve in Rust or any other language. At that point the full address space is open to the attacker.
It had an attacker who could cause power glitches that caused a memory-mapped device to return an invalid value from a memory-mapped register.
This sounds like rowhammer, which I can’t imagine any language ever being resistant to. That has to happen at a hardware level - I think that’s your point? Because even if the compiler had inserted the check, if the attacker here can flip arbitrary bits I don’t think it matters.
Supply chain vulnerabilities are increasingly important.
For sure, and I think perhaps we’re on the same page here - any language without a virtual machine / hardware integration is going to suffer from these problems.
So basically the attacker has full code execution over the process
That’s the attacker’s goal. Initially, the attacker has the ability to corrupt some data. They may have the ability to execute arbitrary code in some sandboxed environment. They are trying to get arbitrary code outside of the sandbox.
This sounds like rowhammer, which I can’t imagine any language ever being resistant to. That has to happen at a hardware level - I think that’s your point? Because even if the compiler had inserted the check, if the attacker here can flip arbitrary bits I don’t think it matters.
You get equivalent issues from converting an integer from C code into an enumeration where an attacker is able to do something like a one-byte overwrite and corrupt the value.
Typically, attacks start with something small, which can be a single byte corruption. They then chain together exploits until they have full arbitrary code execution. The problem is when the Rust compiler elides some of the checks that someone has explicitly inserted defensively to protect against this kind of thing. Note that this isn’t unique to Rust. C/C++ also has this problem to a degree (for example, eliding NULL checks if you accidentally dereference the pointer on both paths) but it’s worse in Rust because the language abstract machine guarantees more in type-safe Rust than it does in C.
You get equivalent issues from converting an integer from C code into an enumeration where an attacker is able to do something like a one-byte overwrite and corrupt the value.
I’m confused, you mean copying the int into a rust enum too narrow for it?
The problem is when the Rust compiler elides some of the checks that someone has explicitly inserted defensively to protect against this kind of thing.
Are you referring to checks at the boundary, or checks far behind it?
I’m confused, you mean copying the int into a rust enum too narrow for it?
No, the flow is a C function returning an enumeration that you coerce into a Rust enumeration that holds the same values. An attacker is able to trigger a one-byte overwrite in the C code that means that the value returned is not actually a valid value in that enumeration range. The Rust programmer doesn’t trust the C code and so inserts an explicit check that the enumeration is a valid value. The Rust compiler knows that enumerations are type safe and so elides the check. Now you have a way for an attacker with a one-byte overwrite in C code to start a control-flow hijacking attack on the Rust code.
Are you referring to checks at the boundary, or checks far behind it?
Checks in the trusted (Rust) code, outside of unsafe blocks.
The correct fix for this is to move the check into the unsafe block that reads from the MMIO register, but that’s the kind of small detail that’s likely to get overlooked (and, in code review, someone may well say ‘this check isn’t doing anything unsafe, can you move it out of the unsafe block?’ because minimising the amount of unsafe code is normally good practice).
Nit: moving code out of an unsafe block will never affect its semantics - the only thing it might do is stop the code from compiling.
Unsafe is a magic keyword that’s required when calling certain functions, dereferencing raw pointers, and accessing mutable statics (there might be a few other rare ones I’m forgetting). Beyond allowing those three operations to compile, it doesn’t affect semantics; if a statement/expression compiles without an unsafe block (i.e. it doesn’t use any of those three operations), wrapping it in an unsafe block will not change your program.
The correct fix here is to check the value is within range before casting it to the enum (incidentally, an operation that requires an unsafe block).
All that being said, your broader point is true: Rust’s stricter rules mean that it may well be easier to write undefined behavior in unsafe Rust than C.
The Oakland paper from a while back that found a couple of hundred CVEs in Rust crates by looking for three idioms where people misuse things that hide unsafe behind ‘safe’ interfaces suggests that people are not good at this.
For example, we saw some code that read a Rust enumeration from an MMIO register and checked that it was in the expected range. The compiler knew that enumerations were type safe so elided the check, introducing a security hole.
Does the compiler at least emit a warning like “this comparison is always true” that could signal that one’s doing this incorrectly?
Languages like Rust have a stronger guarantee: they check a lot of properties at compile time, which avoids the bugs, rather than simply preventing them from being exploitable. This comes with the caveat that the only data structure that you can express is a tree without dipping into unsafe
(Tracing) gc has no trouble with actual graphs, and still prevents all those nasty bugs by construction.
The other problem that we’ve seen with Rust is that the compiler trusts the type system
Yes—I am still waiting for capability-safety to be table stakes. Basically no one should get the ‘unsafe’ god-capability.
(Tracing) gc has no trouble with actual graphs, and still prevents all those nasty bugs by construction.
But it does have problems with tail latency and worst-case memory overhead, which makes it unfeasible in the kind of scenarios where you should consider C++. If neither of those are constraints for your problem domain, C++ is absolutely the wrong tool for the job.
Yes—I am still waiting for capability-safety to be table stakes. Basically no one should get the ‘unsafe’ god-capability.
Unfortunately, in Rust, core standard-library things like Rc depend on unsafe and so everything would need to hold the capability to perform unsafe to be able to pass it down to those crates, unless you have a compile-time capability model at the module level.
Unsafe can be switched off at the module level, and the module is indeed also the boundary of unsafe in Rust.
A mistake with unsafe may be triggered from the outside, but a correct unsafe implementation is well-encapsulated. That very effectively reduces the scope of review.
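For example (a sketch; unsafe_code is the stock compiler lint, so this is enforced at build time):
// Crate-wide: put this at the top of lib.rs or main.rs.
// #![forbid(unsafe_code)]

// Or per module: only this module rejects unsafe; a sibling module can still use it.
#[deny(unsafe_code)]
mod api {
    // any `unsafe { ... }` block or `unsafe fn` in here fails to compile
}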
I basically agree with you! I haven’t been aware of these tendencies of his, but I’m not surprised.
But I think the types of safety provided by RAII are valuable too. My day-job these days is mostly coding in Go and I miss RAII a lot. Just yesterday I had to debug a deadlock produced by a high-level resource issue (failure to return an object to a pool) that wouldn’t have occurred in C++ because I would have used some RAII mechanism to return it.
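The same RAII idea, sketched in Rust rather than C++ (hypothetical pool, just to show the shape): the checkout guard hands the object back in Drop, so an early-return path can’t forget to do it.
use std::sync::{Arc, Mutex};

struct Connection;

struct Pool {
    free: Mutex<Vec<Connection>>,
}

// Checking an object out gives you a guard, not the bare object.
struct Checkout {
    pool: Arc<Pool>,
    conn: Option<Connection>,
}

impl Drop for Checkout {
    fn drop(&mut self) {
        // Runs on every exit path, so the connection always goes back to the pool.
        if let Some(conn) = self.conn.take() {
            self.pool.free.lock().unwrap().push(conn);
        }
    }
}

fn checkout(pool: &Arc<Pool>) -> Option<Checkout> {
    let conn = pool.free.lock().unwrap().pop()?;
    Some(Checkout { pool: Arc::clone(pool), conn: Some(conn) })
}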
a) That is a deflection from the entire point of the NSA thing that Stroustrup is ostensibly replying to, which is that almost all serious safety problems are memory safety problems of some kind, which C++ can not seriously mitigate
Thank you. I’m soooo sick of seeing “but Rust doesn’t solve all forms of safety so is it even safe?”. “Rust is safe” means “Rust is memory safe”. That’s a big deal, memory safety vulnerabilities are highly prevalent and absolutely worst-case.
That is a deflection from the entire point of the NSA thing that Stroustrup is ostensibly replying to, which is that almost all serious safety problems are memory safety problems of some kind,
That would have to be heavily qualified to a domain – otherwise I’d say it’s just plain untrue.
String injection like HTML / SQL / Shell are arguably worse problems in the wide spectrum of the computing ecosystem, in addition to plain mistakes like logic errors and misconfiguration.
As far as I can tell, none of these relate to memory safety:
The way this PR was written made it almost seem like a joke
Nobody really likes C++ or CMake, and there’s no clear path for getting off old toolchains. Every year the pain will get worse.
and
Being written in Rust will help fish continue to be perceived as modern and relevant.
To me this read a lot like satire poking fun at the Rust community. Took me some digging to realize this was actually serious! I personally don’t care what language fish happens to be written in. As a happy user of fish I just really hope this doesn’t disrupt the project too much. Rewrites are hard!
Poe’s Law is strong with this one. Not knowing the author of Fish, I genuinely can’t tell whether the commentary is 100% in earnest, or an absolutely brilliant satire.
For sure! After I looked deeper and found that this person is a main contributor to fish things made more sense. I totally respect their position and hope things go well. I just thought the way it was phrased made it hard to take seriously at first!
The author understands some important but often underappreciated details. Since they aren’t paying anyone to work on the project, it has to be pleasant and attractive for new contributors to want to join in.
It only “has to be” if the project wants to continue development at an undiminished pace. For something like a shell that seems like a problematic mindset, albeit an extremely common one.
For something like a shell that seems like a problematic mindset
Must it?
Fish seldom plays the role of “foundational scripting language”. More often it’s the interactive frontend to the rest of your system. This port enables further pursuit of UX and will allow for features I’ve been waiting for for ages
For something like an interactive shell, I generally feel that consistency beats innovation when it comes to real usability. But if there are features that still need to be developed to satisfy the fish user base, I suppose more development is needed. What features have you been waiting for?
One large project has been to run multiple fish builtins and functions “at the same time”, to enable things like backgrounding functions (ideally without using “subshells” because those are an annoying environment boundary that shows up in surprising places in other shells), and to simply be able to pipe two builtins into each other and have them actually process both ends of the pipe “simultaneously”.
Nobody really likes C++ or CMake, and there’s no clear path for getting off old toolchains. Every year the pain will get worse.
I think that the “Nobody” and “pain” there may have been referring to the dev team, not so much everyone in the world. In that context it’s a little less outlandish a statement.
It’s also not really outlandish in general. Nobody likes CMake. How terrible CMake is, is a common topic of conversation in the C++ world, and C++ itself doesn’t exactly have a reputation for being the language everyone loves to use.
I say as someone who does a whole lot of C++ development and would pick it above Rust for certain projects.
Recent observation from Walter Bright on how C++ is perceived:
He then said that he had noticed in discussions on HN and elsewhere a tectonic shift appears to be going on: C++ appears to be sinking. There seems to be a lot more negativity out there about it these days. He doesn’t know how big this is, but it seems to be a major shift. People are realizing that there are intractable problems with C++, it’s getting too complicated, they don’t like the way code looks when writing C++, memory safety has come to the fore and C++ doesn’t deal with it effectively, etc.
My retirement gig: maintaining and rescuing old C++ codebases that most devs are too scared/above working on. I expect it to be gross, highly profitable, and not require a ton of time.
And yet, it was the ‘language of the year’ from TIOBE’s end-of-year roundup for 2022, because it showed the largest growth of all of the languages in their list, sitting comfortably at position 3 below Python and C. D shows up down at number 46, so might be subject to some wishful-thinking echo-chamber effects. Rust was in the top 20 again, after slipping a bit.
TIOBE has +/- 50% error margin and even if the data wasn’t unusable, it’s misrepresented (measuring mentions picked by search engine algorithms over a historical corpus, not just current year, not actual usage). It’s so bad that I think it’s wrong to even mention it with “a grain of salt”. It’s a developer’s horoscope.
TIOBE thinks C popularity has halved one year and tripled next year. It thinks a niche db query language from a commercial product discontinued in 2007 is more popular in 2023 than TypeScript. I can’t emphasize enough how garbage this data is, even the top 10. It requires overlooking so many grave errors that it exists only to reinforce preexisting beliefs.
Out of all flawed methods, I think RedMonk is the least flawed one: https://redmonk.com/rstephens/2022/10/20/top20-jun2022/ although both RedMonk and OpenHub are biased towards open-source, so e.g. we may never learn how much Ada DoD actually uses.
My favourite part about the RedMonk chart is that it shows Haskell going out through the bottom of the chart, and Rust emerging shortly afterwards, but in a slightly darker shade of red which, erm, explains a lot of things.
The rationale provided tracks for me as someone who is about to replace an unpopular C++ project at work with Rust. Picking up maintenance of someone else’s C++ project who is no longer at the company vs. picking up someone else’s Rust project have looked very different in terms of expected pain / risk IME.
“Getting better at C++” isn’t on my team’s dance card but “getting better at Rust” is which helps here. Few working programmers know anything about or understand native build tooling these days. I’m the resident expert because I know basics like why you provide a path argument to cmake. I’m not actually an expert but compared to most others in my engineering-heavy department I’m as good as it gets. Folks who do a lot of C++ at work or at home might not know how uncommon any thoroughgoing familiarity with C and C++ is getting these days. You might get someone who took one semester of C to say “yeah I know C!” but if you use C or C++ in anger you know how far that doesn’t go.
I’m 34 years old and got my start compiling C packages for Slackware and the like. I don’t know anyone under 30 that’s had much if any exposure unless they chose to work in embedded software. I barely know what I’m doing with C/C++ despite drips and drabs over the years. I know enough to resolve issues with native libraries, FFI, dylibs, etc. That’s about it beyond modest modifications though.
tl;dr it’s difficult getting paid employees to work on a C++ project. I can’t imagine what it’s like getting unpaid volunteers to do so.
It does seem weird. We find it easier to hire C programmers than Rust programmers and easier to hire C++ programmers than either. On the other hand, there do seem to be a lot of people that want a project to hack on to help them learn Rust, which might be a good opportunity for an open source project (assuming that you are happy with the code quality of learning-project Rust contributions).
The difficulty is that you need to hire good C++ programmers. Every time some vulnerability or footgun in C++ is discussed, people say it’s not C++’s fault, it’s just a crappy programmer.
OTOH my experience from hiring at Cloudflare is that it’s surprisingly easy to onboard new Rust programmers and have them productively contribute to complex projects. You tell them not to use unsafe, and they literally won’t be able to cause UB in the codebase.
I personally don’t care what language fish happens to be written in
You might not, but a lot of people do.
I wrote a tool for myself on my own time that I used often at work. Folks really liked what it could do, there’s not a tool like it, and it handled “real” workloads being thrown at it. Not a single person wanted anything to do with it, since it was written in an esoteric language. I’m rewriting it in a “friendlier” language.
It seems like the Fish team thought it through, weighed risks and benefits, have a plan, and have made good progress, so I wish them the best.
I’d rather not say, I don’t want anyone to feel bad. It’s sufficient to say, “As of today, not in the TIOBE Index top 20.”
The bigger point is that it was a tool I had been using for over a year, which significantly improved my efficiency and quality of life, and it got rejected for being an esoteric tech, even though I provided executable binaries.
That sucks. Yeah, I don’t mean to ask to hurt anyone’s feelings, I’m just always curious to know what people think are “esoteric”, cuz esoteric on lobste.rs (Factor, J, one of the advent of code langs) is going to be very different than esoteric at my job (haskell, rust).
As a happy user of fish I just really hope this doesn’t disrupt the project too much. Rewrites are hard!
Same here. As a user, it doesn’t bother me in which language it is written in.
They should absolutely pick the language that allows them to be more productive and deliver more.
I have been a happy fish user for 13 years; it is software that proved useful from day one. And every release there are clear, important improvements, often new UX additions. I wish them a smooth migration.
If you’re curious about the size of the rewriting project: I ran tokei on the repo and it counted
49k lines of C++
8k lines of headers
1k lines of CMake
(and 57k lines of Fish, so there’s also a lot that won’t need to be rewritten)
Since this PR clearly escaped our little bubble, I feel like we should add some context, because I don’t think everyone caught on to the joking tone of the opening message (check https://fishshell.com/ for similar writing - we are the shell for the 90s, after all), and really got what the idea here is.
Fish is a fairly old codebase. It was started in 2005
Which means I still can’t tell the degree to which he’s joking. The idea that a codebase from 2005 is old is mind boggling to me. It’s not even 20 years old. I’ve worked on a lot of projects with code more than twice that age.
To put things into perspective, 2005 to 2023 is 18 years — that is the entire lifespan of the classic MacOS.
Modern macOS is a direct descendent of NeXTSTEP though, which originally shipped in 1989 and was, itself, descended from 4BSD and CMU Mach, which are older. Most of the GNU tools are a similar age. Bash dates back to 1989.
Most software projects just rot away in 18 years because needs or the surrounding ecosystems change.
That’s probably true, but it’s a pretty depressing reflection on the state of the industry. There are a lot of counter examples and a lot of widely deployed software is significantly older. For example, all of the following have been in development for longer than fish:
The Linux kernel (1991)
*BSD (1991ish, depending on when you count, pre-x86 BSD is older)
Most of the GNU tools (1980s)
zsh (1990)
NeXTSTEP / OPENSTEP / macOS (1989)
Windows NT (1993)
MS Office (1990)
SQL Server (1989)
PostgreSQL (1996)
Apache (1995)
StarOffice / OpenOffice / LibreOffice (original release was 1985!)
I find it far easier to understand than Makefiles and automake.
Plus it runs on ancient versions of Windows (like XP) and Linux, which is not something most build systems support. And it mostly “just works” with whatever compiler you have on your system.
Cargo can’t do 90% of the things that CMake can, but it’s so loved, because most projects don’t need to write any build script at all. You put your files in src/ and they build, on every Rust-supported platform. You put #[test] on unit tests, and cargo test runs them, in parallel. You can’t write your own doxygen workflow, but cargo doc gives you generated reference out of the box for every project. The biggest criticism Cargo gets about dependency management is that it’s too easy to use dependencies.
This convention-over-configuration makes any approach requiring maintaining a DIY snowflake build script a chore. It feels archaic like writing header files by hand.
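Concretely, the whole “build system” for a typical crate is little more than the file layout plus a couple of attributes (a sketch):
// src/lib.rs -- no makefile, no test-runner config, no doc pipeline.

/// Adds two numbers; `cargo doc` renders this comment as API docs.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::add;

    #[test]
    fn adds() {
        // `cargo test` discovers and runs this automatically, in parallel with other tests.
        assert_eq!(add(2, 2), 4);
    }
}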
I find it far easier to understand than Makefiles and automake.
Why does everyone hate being punched in the face? I find it far more pleasant than being ritually disemboweled.
And it mostly “just works” with whatever compiler you have on your system.
CMake is three things:
A set of core functionality for running some build tasks.
A truly awful macro language that’s been extended to be a merely quite bad configuration language.
A set of packages built on the macro language.
If the things that you want to do are well supported by the core functionality then CMake is fairly nice. If it’s supported by existing packages, then it’s fine. If it isn’t, then extending it is horrible. For example, when using clang-cl, I was bitten by the fact that there’s hard-coded logic in CMake that adds the /TC or /TP flags to override the language detection based on the filename and tell it to use C or C++. This made it impossible to compile Objective-C. A few releases later, CMake got support for Objective-C, but I can’t use that support to build the Objective-C runtime because it has logic in the core packages that checks that it can compile and link an Objective-C program, and it can’t do that without the runtime already existing.
I’ve tried to use CMake for our RTOS project, but adding a new kind of target is incredibly hard because CMake’s language is really just a macro language and so you can’t add a new kind of object with properties of it, you are just using a macro language to set strings in a global namespace.
I’ve been using xmake recently and, while there’s a lot I’ve struggled with, at least targets are objects and you can set and get custom properties on them trivially.
It’s an entire set of new things to learn, and it generates a makefile, so I worry that I’ll still have to deal with the problems of makefiles as well as the new problems CMake brings.
the hacker news discussion is horrifying, the lack of humor and the smartassery there is astonishing :/ I really enjoyed the post and I learned something (the setuid bit)
I did not like the post, mainly because the “teacher” character, Cadey, came across more as a shitposter than a teacher in the original version (it was later changed to read better). I was also upset at the lack of historical context in the post, implying that the authors of sudo were beyond stupid for picking C over Rust (at least, that’s how I read it—wonder what that says about me).
What language do I use? What stack? What toolchain? What testing suite? A fuzzer? Which? Do I use formal verification? What does packaging and deployment look like? Do I have beta builds?
You don’t think about these each day, but you’re still making an implicit choice to go with what you decided before
Yes, I agree. The html appended seems very explicit. But I suppose that is also good. You might have a greet module somewhere else you don’t want confused with the templates directory.
Programmers have a long and rich history with C, and that history has taught us many lessons. The chief lesson from that history must surely be that human beings, demonstrably, cannot write C code which is reliably safe over time. So I hope nobody says C is simple! It’s akin to assembly, appropriate as a compilation target, not as an implementation language except in extreme circumstances.
Which human beings?
Did history also teach us that operating a scalpel on human flesh cannot be done reliably safe over time?
Perhaps the lesson is that the barrier of entry for an engineering job was way higher 40 years ago. If you would admit surgeons to a hospital after a “become a gut-slicer in four weeks” program, I don’t think I need to detail what the result would be.
There’s nothing wrong with C, just like there’s nothing wrong with a scalpel. We might have more appropriate tools for some of its typical applications, but C is still a proven, useful tool.
Those who think their security burdens will be solved by a gimmick such as changing programming language are in for a very unpleasant surprise.
Perhaps the lesson is that the barrier of entry for an engineering job was way higher 40 years ago
Given the number of memory safety bugs that have been found in 40-year-old code, I doubt it. The late ‘90s and early 2000s exposed a load of these bugs because this C code written by skilled engineers was exposed to a network full of malicious individuals for the first time. In the CHERI project, we’ve found memory safety bugs in code going back to the original UNIX releases. The idea that there was some mythical time in the past when programmers were real men who never introduced security bugs is just plain wrong. It’s also a weird attitude: a good workman doesn’t blame his tools because a good workman chooses good tools. Given a choice between a tool that can be easily operated to produce good results and one that, if used incredibly carefully, might achieve the same results, it’s not a sign of a good engineer to choose the latter.
Given the number of memory safety bugs that have been found in 40-year-old code, I doubt it.
Back then, C programmers didn’t know about memory safety bugs and the kinds of vulnerabilities we have now known about for two decades. Similarly, JavaScript and HTML are surely two languages which are somewhat easier to write than C and don’t suffer from the same class of vulnerabilities. However, 20 years ago people wrote code in these two languages that suffers from XSS and other web-based vulns. Heck, XSS and SQLi are still a thing nowadays.
What I like about C is that it forces the programmer to understand the OS below. Writing C without knowing about memory management, file descriptors, and processes is doomed to fail. And this is what I miss today, and maybe what @pm hinted at in their comment. I conduct job interviews with people who consider themselves senior, and they only know the language and have little knowledge about the environment they’re working in.
Yes, and what we have now is a vast trove of projects written by very smart programmers, who do know the OS (and frequently work on it), and do know how CPUs work, and do know about memory safety problems, and yet still cannot avoid writing code that has bugs in it, and those bugs are subsequently exploitable.
Knowing how the hardware, OS (kernel and userspace), and programming language work is critical for safety or you will immediately screw up, rather than it being an eventual error.
People fail to understand that the prevalence of C/C++ and other memory-unsafe languages has a massive performance cost: ASLR, stack and heap canaries, etc., and, in hardware, PAC, CFI, MTE, etc. all carry huge performance costs on modern hardware, and all are necessary solely due to the need for the platform to mitigate the terrible safety of the code being run. That’s now all sunk cost of course: if you magically shifted all code today to something that was memory safe, the ASLR and various canary costs would still be there by default; if you were super confident, your OS could turn ASLR off and you could compile canary-free, but the underlying hardware is permanently stuck with those costs.
Forcing the programmer to understand the OS below could (and can) happen in languages other than C. The main reason it doesn’t happen is that OS APIs, while being powerful, are also sharp objects that are easy to get wrong (I’ve fixed bugs in Janet at the OS/API level, I have a little experience there), so many languages that are higher level end up with wrappers that help encode assumptions that need to not be violated.
But, a lot of those low level functions are simply the bottom layer for userland code, rather than being The Best Possible Solution as such.
Not to say that low level APIs are necessarily bad, but given the stability requirements, they accumulate cruft.
The programmer and project that I have sometimes used as a point of comparison is more recent. I’m now about the same age that Richard Hipp was when he was doing his early work on SQLite. I admire him for writing SQLite from scratch in very portable C; the “from scratch” part enabled him to make it public domain, thus eliminating all (or at least most) legal barriers to adoption. And as I mentioned, it’s very portable, certainly more portable than Rust at this point (my current main open-source project is in Rust), though I suppose C++ comes pretty close.
Do you have any data on memory safety bugs in SQLite? I especially wonder how prone it was to memory safety bugs before TH3 was developed.
Did history also teach us that operating a scalpel on human flesh cannot be done reliably safe over time?
I think it did. It’s just that the alternative (not doing it) is generally much much worse.
There’s nothing wrong with C, just like there’s nothing wrong with a scalpel.
There is no alternative to the scalpel (well, except there is in many circumstances and we do use them). But there can be alternatives to C. And I say that as someone who chose to write a new cryptographic library 5 years ago in C, because that was the only way I could achieve the portability I wanted.
C does have quite a few problems, many of which could be solved with a pre-processor similar to CFront. The grammar isn’t truly context free, and the syntax has a number of quirks we have since learned to steer clear of. switch falls through by default. Macros are textual instead of acting at the AST level. Everything is mutable by default. It is all too easy to read uninitialised memory. Cleanup could use some more automation, either with defer or destructors. Not sure about generics, but we need easy-to-use ones. There is enough undefined behaviour that we have to treat compilers like sentient adversaries now.
When used very carefully, with a stellar test suite and sanitisers all over the place, C is good enough for many things. It’s also the best I have in some circumstances. But it’s far from the end game even in its own turf. We can do better.
And I say that as someone who chose to write a new cryptographic library 5 years ago in C, because that was the only way I could achieve the portability I wanted.
I was wondering why the repo owner seemed so familiar!
Those who think their security burdens will be solved by a gimmick such as changing programming language are in for a very unpleasant surprise.
I don’t think that moving from a language that e.g. permits arbitrary pointer arithmetic, or memory copy operations without bounds checking, to a language that disallows these things by construction, can be reasonably characterized as a gimmick.
There’s nothing wrong with C, just like there’s nothing wrong with a scalpel.
This isn’t a great analogy, but let’s roll with it. I think it’s uncontroversial to say that neither C nor scalpels can be used at a macro scale without significant (and avoidable) negative outcomes. I don’t know if that means there is something wrong with them, but I do know that it means nobody should be reaching for them as a general or default way to solve a given problem. Relatively few problems of the human body demand a scalpel; relatively few problems in computation demand C.
What we would consider “modern” surgery had a low success rate, and a high outright fatality rate.
If we are super generous, let’s say C is a scalpel. In that case we can look at the past and see a great many deaths were caused by people using a scalpel, long after it was established that there was a significant difference in morbidity between a plain scalpel and a sterilized one.
What we have currently is a world where we have C (and similar), which will work significantly better than all the tools the preceded it, but is also very clearly less safe than any modern safe language.
This reminds me of an issue I have with Discord and gmail. Every time I get a Discord invite to my gmail account the link is invalid. Is it possible that Google is accessing my link before me and ruining the invite?
I agree that in this case the formalization is a bit unnecessary. Of course, I also have a script that parses and summarizes TODO/FIXME comments, but… when something is really important, it should be managed in a bug/requirement/change tracking system, not just be scattered in some comments (structured or unstructured, does not matter).
Why are you looking at this as “generate structured data” or “generate garbage”? You’ve very quickly assumed the only alternative is garbage, while michiel is suggesting that writing your comments in a way that a human can disambiguate what you mean is just as likely to be automatically classified by an LLM.
I agree with your cost analysis, but you lost me after that.
Less friction on the commenter. That alone would be worth it for me.
I think it may also help reduce the amount of effort a newcomer spends trying not to come across as too aggressive before they can properly integrate into the project culture. And that’s assuming they’ll be interacting on a regular basis in the first place. For me personally, each time I interact with a new community or project tends to be fraught with an awkward and inordinate amount of time spent on phrasing.
I came across the site a while ago and funny enough, machine parsability was the one thing I didn’t remember about the idea. It was the soft benefits alone that resonated for me
If I’m looking at a PR with sixty comments, being able to quickly scan them by prefix match is nice. This kind of consistency lets me look for a particular comment (or skip over comments) without really engaging the language center of my brain. That means I am less likely to lose (or corrupt) the context I have in my head.
This does not reflect my experience of software, and certainly not with the degree of confidence I can get from fixed tokens.
This is indeed a problem. But I’m not satisfied with the alternatives; in the case of DIDs you either assume every user owns a DNS record (nope) or fall back to a centralized system like Bluesky’s.
In the system I’m (interminably) developing, the private key is held in the most secure storage available, like the Mac/iOS Keychain. The drawback is that an identity is tied to a physical device, so if you use multiple devices you’re a different user on each one. It’s ok for now but I’m not happy with it.
The best solution I see is to treat a person’s identity as an aggregate of device identities. Under the hood each device identity cross-certifies the others (requiring some sort of pairing UX when setting up a new device.). At the UI layer, all identities belonging to the same user are displayed as that user.
The remaining problem is losing control of an active device, like if your phone gets stolen and you can’t remotely wipe it. In that case your other devices would post revocations of the lost ID. (What happens if you had only two devices, and you and the attacker each revoke the other’s, is left as an exercise for the reader.)
The Passkey mechanism is intended to somewhat address this. The keys stored in one device’s Secure Element can be encrypted with a public key of another Secure Element. This makes iCloud a somewhat scary single point of failure, because (as I understand the totally undocumented protocol) the key exchange happens via iCloud and iCloud attests to the target system’s key being valid. At least it depends on user action (though someone who can compromise iCloud could probably push out a malicious OS update that removed that requirement).
This was the model for the Pico project from the University of Cambridge. Their idea was to have large numbers of ‘pico siblings’ that would, between them, provide an attestation of identity. If you lost a few of your devices, that wouldn’t be enough for an attacker to impersonate you. You might put some more-privileged ones in a safe somewhere so someone who can open the safe and provide the device with the relevant biometrics would be able to invalidate all other devices. Their goal was to make these things cheap so that they could be embedded in things like earrings, watches, and so on, so you’d typically carry half a dozen of them around with you and that would be sufficient for most systems.
All of these feel somewhat unsatisfactory. I’ve never found a system that seems like it works in both the case that someone breaks into your house and steals most of your devices and in the case that your house burns down with all of your devices inside. Either you allow the attacker to impersonate you in the first case, or you lose the ability to become you in the latter case.
I thought Passkeys were an open protocol (FIDO?) Or is the mechanism to propagate them between devices an Apple extension? I was just assuming the passkeys’ private keys lived in the Keychain, which is shareable between devices and E2E encrypted.
(I could share my protocol’s private keys via the Keychain, but that’s platform-specific. And also the protocol is based on an append-only signed log, so multiple devices would create conflicts…)
Those are both very bad/rare situations; maybe it’s ok if a system can’t survive every disaster. Or if disaster resilience requires you take extra measures like putting backup keys in a bank safety-deposit box.
The client-server bits are WebAuthn.
That’s the proprietary bit.
Yes, but the mechanism for sharing is complicated. You don’t want to allow arbitrary software to dump the contents of the keychain, but for passwords you need to because the password is often the thing that you send to the server. For passkeys, you perform a computation that proves that you hold the key (for example, sign a nonce with the private key, which the server can then verify with the corresponding public key). Much of the increase in security comes from the fact that these things never leave trusted hardware. On Windows, they’re typically stored in the TPM; in the Apple ecosystem they’re stored in Apple’s Secure Element (this might be one key per server, but often it’s just one private key in the hardware and a key-derivation function that derives a key from it and something OS-provided and then allows using the derived key for signing).
You really don’t want to allow a generic API to extract these keys because an attacker can use it to dump things. The normal workaround for this (and I’m completely guessing at what Apple does) is to use a key-wrapping protocol. You do some communication between two HSMs that allows them to exchange a key and then you encrypt the key with that negotiated key. For this to be safe, you need some kind of attestation that the target HSM is one that the source should trust. I believe Apple does this via iCloud, with iCloud providing that guarantee. Or possibly part of that guarantee (I’d probably do it with something baked into the hardware and a revocation list in iCloud, so iCloud could say ‘don’t trust this device, I have reason to believe it’s compromised’ but not ‘trust this device, I promise it’s not a SE-emulator designed to steal all of your credentials, honest!’).
The problem with that as a backup is that now the bank needs some mechanism for establishing my identity. Which they often do with login to some computer system these days. And I can’t do that if my keys are lost. I’m more inclined to think in the direction of Shamir Secret Sharing and share a recovery key between several friends and family members so that a sufficient number of them colluding can recover my key, and set that number to be large enough that the ones that know each other can’t do it. With a sufficiently dysfunctional family, you get an extra layer of protection if you share with the people that have personal reasons not to collude under any circumstances.
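For what it’s worth, here’s a rough Python sketch of the Shamir split/recover idea over a prime field. The parameters are toy values chosen for illustration; a real deployment would use an audited library and careful key encoding.

```python
# Toy Shamir Secret Sharing: split a secret into n shares so that any
# k of them can reconstruct it. Illustrative only; use a vetted
# library for anything real.
import secrets

PRIME = 2**127 - 1  # a Mersenne prime larger than the secrets we share


def split(secret: int, k: int, n: int):
    """Return n shares (x, y); any k of them recover the secret."""
    assert 0 <= secret < PRIME and 1 < k <= n
    # Random polynomial of degree k-1 with constant term = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def recover(shares):
    """Lagrange interpolation at x=0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbelow(PRIME)
    shares = split(key, k=3, n=5)       # e.g. 5 friends, any 3 can collude
    assert recover(shares[:3]) == key   # any 3 shares are enough
    assert recover(shares[1:4]) == key
```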
If I were a bank I’d go with biometrics, at least for something physical like a safe deposit box, at least as a last resort.
This has problems if you’re injured. Fingerprints and face recognition can both fail after a serious injury, unfortunately.
Safe deposit boxes (at least here in Sweden) are on their way out. They used to be an attempt to monetize the area of the vault, which is needed way less nowadays as less and less physical cash is stored in a bank branch. Hence, safe deposit boxes are less and less prevalent, and they are pretty expensive.
It’s economically infeasible to demand that each and every bank customer has a safe deposit box just to enable a backup identity token.
If memory serves correctly, I’ve been asked for the user password of an existing device when provisioning a new device.
I’m inclined to agree that the author’s proposed changes could be small improvements, but:
My biggest problem with writing on my Android phone is how such phones have become tall and skinny nowadays. I used a 2014 phone until the early 2020s, and its slightly more squarish aspect ratio gave it slightly wider keys that I found easy to use. After finally switching to a modern phone that’s slightly narrower and so has slightly narrower keys, I find I press the wrong keys significantly more often, which slows me down as I either slow down in typing to avoid typos or stop to fix the typos I made. (If I rotate the phone into landscape mode, then, because the phone is tall (now wide), I find the keys become impractically wide, and, because the phone is skinny (now short), I find the area of text I can see impractically short.)
The narrowness of phones also exacerbates the difficulty of text selection that the author laments, since it makes the letters smaller in much the same way it makes the keys smaller (the former can be mitigated by raising the default font size, at the cost of seeing less text at once, whereas the width of the keys would need a fundamentally different keyboard layout to be unconstrained by the width of the display).
Second after that problem, I would put the lack of undo/redo as a globally available function of the keyboard (although, e.g., Google Docs implements its own undo/redo).
I don’t mind fixing typos, since I’m used to either fixing them before hitting send or editing my message immediately after, regardless of platform.
Hey, I’ve just started learning GPU programming too. If you want, I’d love to share resources as we learn. Currently, I’m focused on Metal, though I think everything except shared memory should be applicable.
Here are some resources that helped me understand how modern GPU APIs are architected:
Short YT video:
Getting Started with Metal
WWDC 2014:
Working with Metal: Overview
Working with Metal: Fundamentals
I don’t think there is a correct usage of fsync. AFAIK, Linux still marks unflushed dirty pages as clean. Crashing the application isn’t enough; you need to purge the page cache, maybe going as far as rebooting the entire OS.
It’s platform specific. I believe illumos systems have a correct implementation of fsync, for example. We inherited it from Solaris all those years ago. I expect at least FreeBSD does as well.
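For reference, the sequence usually recommended for durably creating a file on POSIX systems looks roughly like the Python sketch below: write, fsync the file, rename, then fsync the containing directory. Whether the guarantees actually hold after an fsync error is exactly the platform-specific question being debated above; the path names and `O_DIRECTORY` usage assume a Linux-ish platform.

```python
# Commonly recommended "durable write" pattern: fsync the file's
# contents, atomically rename into place, then fsync the directory so
# the new directory entry is durable too. Sketch only.
import os


def durable_write(path: str, data: bytes) -> None:
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())     # flush file contents to stable storage
    os.replace(tmp, path)        # atomic rename over the target
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)         # make the rename itself durable
    finally:
        os.close(dir_fd)
```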
This article singlehandedly convinced me that time zones are a good idea and solve a real human problem. I am not in favor of abolishing time zones.
I do think that not having daylight savings time would be a good idea, although not because it would make computer timekeeping meaningfully simpler - once you have the concept of time zones, you need something like the tz database to keep track of when the administrative boundaries of time zones move, and adding dst rules to that is not a huge amount of additional complexity. The reason I care about DST is mostly because an extra hour of sleep in November doesn’t make up for the lost hour in March.
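As a small illustration that zone offsets and DST rules come out of the same lookup (the IANA tz database), here is what Python’s zoneinfo reports around the March 2024 transition in one US zone; the zone and dates are just an example.

```python
# The tz database supplies both a zone's base offset and its DST
# rules, so DST handling comes along with ordinary zone handling.
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib since Python 3.9, backed by the IANA tz database

ny = ZoneInfo("America/New_York")
before = datetime(2024, 3, 10, 1, 30, tzinfo=ny)  # before the spring-forward jump
after = datetime(2024, 3, 10, 3, 30, tzinfo=ny)   # after it (2:00-3:00 local never exists)

print(before.tzname(), before.utcoffset())  # EST, an offset of UTC-5
print(after.tzname(), after.utcoffset())    # EDT, an offset of UTC-4
```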
I also think that it might be a good idea to split the continental US into two time zones, the boundary following the current Mountain/Central time zone boundary, where the clock times of the zones differ by two hours. This would reduce the effective time difference between the east and west coasts, without significantly changing the mapping of the position of the sun to clock time compared to the status quo (at least not any more than daylight savings currently does).
I remember there being measurable health effects depending on which side of a time zone you lived on, essentially: “How dark is it outside when you wake up?”
Wider time zones would probably exacerbate that.
I enjoyed reading this, but Fastly’s Fast Forward provides free CDN services for open source projects which might help with the “most Linux distributions are free and thus don’t have a project budget to spin up a global CDN” problem.
I have no experience of Fast Forward myself, but I’ve heard that OpenStreetMap and meta::cpan have benefited from it, and I’m sure others have too.
Still, running a CDN off repurposed thin clients is very impressive.
It’s also an unknown when Fastly will do a rug-pull and stop supporting Open Source projects with a free CDN, so building your own, as well as using Fastly’s, is probably the best way to go.
Hosting files on CloudFlare R2 is $0.015 per GB-month with no egress costs.
https://developers.cloudflare.com/r2/pricing/
I could reasonably see a small homebrew Linux distro having 100 GB in downloadable assets; that would be $1.50 a month.
Void recently started offering a Fastly mirror, but I find it slow compared to the other Tier 1 mirrors
Really? That’s interesting! What region are you in?
Montgomery County, Maryland, US
Checking again, it seems like the issue is that Fastly almost never has anything in cache
Interesting, I know for certain there are some void hosts in that region. Probably need to make sure their mirrors are pointed correctly to warm the cache.
I’ve been using discord heavily since 2015, and I’m fine with moving to standard usernames. If you put a gun to my head, I couldn’t tell you my discriminator, and I’m not sure that I knew it was called that before the announcement. It has always been a completely random number to me, and I have to open discord every time I want to share it with someone.
& from the linked tweet:
I 100% agree people should be able to opt out of user searching. There are already some good privacy options in place, such as disabling messages from people that aren’t friends with you. I don’t, however, think this change impacts harassment much, or at least I’d need to see some data to back up that claim.
What I’m failing to understand is how switching your discriminator from jado#1234 to jado#4321 to avoid harassment is substantially different from switching @jado1234 to @jado4321. They seem identical to me, but maybe the search feature could/would interact with this change differently.

Additionally, impersonation is already a problem on discord, and I would say the discriminator system facilitates it because (ignoring Nitro) everyone has a random number. Prominent community members in one server will probably have a rank, but when they go to another server they look like every other member.
This can certainly be true, but I don’t think it’s guaranteed. Every time we push a user facing change that breaks some workflow, it is met with 1000 twitter replies saying “this new XYZ looks awesome” and 1000 replies saying “this new XYZ looks like trash”.
One aspect is massively psychological. A “discriminator” discriminates, visually, literally. You are not jado1234. You are jado, the 1,234th. Nobody can take the identity of jado away from you. Nobody is going to try and steal your Discord account because they’re upset they can’t be jado or their new cryptocurrency scam is jado coin. They can be jado too.

(r.e. impersonation - you can’t impersonate someone’s discriminator, which is why people often stick with the same one. Even between servers you’re still jado#1234; people often persist with one tag for this consistency to avoid impersonation.)

Could you elaborate on this? My understanding is that display names already don’t tell you anything about the user.
Can someone explain what this shows? All I see is a screenshot from an unknown[0] website with a username @s3.amazonaws.com. So what are the security implications of this?

[0] As far as I understand it’s from Bluesky. This doesn’t help, because I don’t know what Bluesky is or does.
Bluesky is a social media site. Part of their take on ‘verified users’ (and a part of how they notionally do ‘portable identity’) is to have a process where a user can claim a domain, and it gets verified. But they used a janky home rolled solution which did not think it through (rather than webfinger or something like the verification part of ACME/letsencrypt or one of the cert signing/identification standards or one of many other already existing options), making it easier than it should be for random people to claim your identity and get verified.
That’s disappointing. I saw that they use DNS names in their IDs but I thought it was the more typical/scalable “user@domain”, not just “@domain”… [facepalm]
Could you elaborate on scalability? I’m assuming you mean creating DNS records vs serving HTTP responses, but I don’t see where wildcard records + Origin: fall short?

Sorry, mixed up what Origin: meant. There are still other ways of indicating what domain you’re requesting, though.
I just meant that user@domain scales to any number of users per domain, vs. everyone needing to register their own.
Bluesky: https://bsky.app/ Powered by: https://atproto.com/
Which is basically a competitor to Mastodon/Fediverse and ActivityPub.
He may have been banned but Drew’s words ring true: https://drewdevault.com/2021/12/28/Dont-use-Discord-for-FOSS.html
Don’t lock your FOSS project into proprietary communications. If you believe in FOSS for your project, you can believe in FOSS for your communications too… and the wealth of alternative clients and supported platforms (and decentralization!) will lead to greater accessibility.
I am fully on board with decentralization, federation, open platforms and clients. I think they will always win in the end, and they should be given the priority. I use Matrix and pay Element for a server even though I am capable of hosting it myself. I use Mastodon. I use IRC, but only through the Matrix clients.
I also use Discord. I have spent a lot of time on this platform as well, and I’m sorry to say that it’s one of the best experiences out there. None are perfect, but the Discord client is years ahead of all the Matrix clients combined. Drew mentions accessibility in his post, but that information is a bit dated as Discord has progressed in this area. Enough that a blind student of mine could easily navigate and use the platform a year ago.
FOSS projects should definitely prioritize FOSS modes of communication. But they should also be accepting of proprietary paths, in an effort to grow their communities. Let’s face it, there are large groups of people on this planet that will never sign up for Matrix, but use Discord on a daily basis. If we want to grow our projects, and make things accessible to everyone, we also need to consider where all the users are instead of requiring them to join yet another service. This is surely an area where I disagree with Drew, as he seems to neglect the value of anyone else’s preferred modes of communication/work.
Chat isn’t going anywhere, so projects need to accept and embrace it as best as we can. My personal preference is that projects should bridge the chat systems, and Matrix obviously provides that capability. With Matrix bridges one can also chat with Discord and IRC users. Bridges have their problems, but we can meet the users where they are without compromising our beliefs.
It doesn’t hurt a project to give people from proprietary networks the ability to join the conversation, in fact I’d argue it benefits them.
Aren’t people always posting about FOSS maintainer burnout? I’d wager that adding additional users who aren’t willing or able to use simple FOSS tools will be more of the burnout-inducing kind than the helpful community-minded sort. If you present your FOSS project as a product with all the regular support channels that proprietary products use, you’re gonna get people whose relationship to your project is as consumers of a product.
I do wonder how closely I will follow this idea myself, though. I’m developing a project on sourcehut and am considering setting up a copy of it on github so people can find it and file issues and such.
Yes, each maintainer should make their own decisions on how wide of a net they wish to cast. If you’re making a personal project and don’t want a bunch of consumers, then you’re probably not to the level I was thinking of when I wrote that. A single Matrix channel is actually a lot less maintenance than an entire Discord server.
Personally I mirror my repos across providers for a number of reasons. That includes discoverability, but also resilience and availability. I use codeberg for some things, but have scaled back from sourcehut because I personally don’t like the interfaces as much (and I can’t automate pushing from my gitea instance to sourcehut yet).
Bridging from Matrix also has (what some would call) an advantage - Matrix folks get a first-class experience because Matrix was designed for bridging, whereas Discord looks just a little funky because all the bridged users show up as “bots”. This way there’s a natural encouragement to “upgrade” to Matrix to get a more natural experience. And it lets people show up to the Discord and ask why half the people are labeled bots, instead of bailing out on Matrix before anyone gets a chance to engage in a conversation with them. (Disclaimer: I’ve never used this bridge, I’m going off educated guesses and the screenshot I linked above.)
Discord bridges on Libera are comically bad. I’ve been in multiple channels where introducing a Discord bridge caused the channel to just disintegrate.
Matrix bridges are like night and day. There’s a few quirks around messages that get edited, but overall the integration is just leaps and bounds better. Messages show as being sent by the sender instead of the bridge-user.
Speaking as a heavy discord user, bridges spark an immediate self-question asking if whatever I’m about to do is worth it.
Because bridges are irritating or because they remind you that Discord is proprietary? Or something else?
I accidentally closed this tab and everything after the horizontal break was OCRed from a frame buffer in Android Recents which was still there for some reason. Also I’m sleep deprived
Bridges can have bad, heavy handed translations of certain semantics such as replies and message links. Doesn’t help that those features are my first go-to when referencing past knowledge in conversation.
Especially when they’re implemented as Rich Embeds (Rich Embeds being the part of the message representation OpenGraph and Cards are parsed into; bots can generate their own freely; not mere media embeds aka attachments). You don’t mind when it’s a handful of people in a small friendly community, but at some threshold you really do start to mind. (Though it might really have everything to do with how a bridge decides to render and translate everything, and I may merely be misplacing my frustrations. There are definitely communities I lurk in where I don’t even notice the seams.)
Ultimate answer is that it doesn’t matter if some minor thing like the Bot tag looks funky, or the bot embeds the username in the message content instead of changing the represented username per message, so long as the experience is enjoyable for both natives and bridged users.
(If you post a message via webhook you can add the username and avatar to be used for that message but you can’t use replies. Pick your poison)
I hope I didn’t imply meaning we should completely shun such platforms; however, there are five big issues with the current state of choosing something like Discord or Slack:
An increasing number of projects, especially by newer to FOSS maintainers, exclusively support Discord or Slack. This shouldn’t be acceptable.
Matrix/XMPP/IRC are treated as subordinate rather than as the home, with a bridge or secondary support on proprietary networks. The goal should be to get users to cross over, and to plan for long-term maintenance, as proprietary options make no promises about maintaining backwards compatibility to support bridges, or even a stable ToS.
With Matrix Spaces/Rooms and XMPP MUCs, users can join in a decentralized manner so they’re not required to create an account with your provider, and it should be totally acceptable to host a server behind a proxy for anonymity if that is a user’s desire.
Like a workplace water cooler or the hallway track of a conference, community building happens between humans; private discussions should be encouraged, but E2EE should be the default. Just as you wouldn’t want eavesdroppers recording a private conversation (and then selling it to data brokers or giving it to the cops), private comms should remain private, and this should be valued by the community.
Public communications should in most cases be search engine indexable and archivable.
Since 2008, there has been an IEEE 754 standard for decimal floating point values, which fixes this.
The fundamental problem illustrated here is that we are still using binary floating point values to represent (i.e. approximate) decimal values.
Yeah, and Python, Julia’s language of choice, has about the world’s only easily accessible implementation of IEEE 754 decimals. Little-known fact: Python’s Decimal class is IEEE 754-compliant arithmetic!

I was flabbergasted to learn that Julia is not Julia’s language of choice.
Cool! I didn’t realize that :)
Ecstasy’s decimal types are all built around the IEEE 754 spec as well. Not 100% implemented at this point, though.
Is this expected?
If you want to create a literal Decimal, pass a string:
When you pass a float, you’re losing information before you do any arithmetic:
The problem is that 0.1 is not one tenth; it’s some other number very close to it:

Whereas if you create a Decimal from a string, the Decimal constructor can see the actual digits and represent it correctly:
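A quick sketch of the difference (the long digit string is the exact value of the binary double nearest to 0.1, as reported by CPython’s decimal module):

```python
from decimal import Decimal

# Constructing from a float faithfully converts the binary double,
# which already is not exactly one tenth:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# Constructing from a string preserves the decimal digits exactly:
print(Decimal("0.1"))
# 0.1

print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
print(0.1 + 0.2 == 0.3)                                   # False with binary floats
```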
I mean it solves this loosely. The places where decimal vs. non-decimal matters - certainly where this seems to come up - are generally places where I would question the use of floating vs fixed point (of any or arbitrary precision).
Base 10 only resolves the multiples of 1/10 that binary can’t represent, but it still can’t represent 1/3, so it seems like base 30 would be better as it can also accurately represent 1/3, 1/6, in addition to 1/2, 1/5, and 1/10. Supporting this non binary format necessarily results in slower operations.
Interestingly, to avoid a ~20% reduction in precision, the decimal IEEE 754 encoding actually works in base 1000.
“Base 10 only resolves the multiples of 1/10 that binary can’t represent”
That is quite convenient, since humans almost always work in decimals.
I have yet to see a currency that is not expressed in the decimal system.
I have yet to see an order form that does not take its quantities in the decimal system.
In fact, if there’s any type that we do not need, it’s binary floating point, i.e. what programmers strangely call “float” and “double”.
Yes, which is my point, there are lots of systems for which base 10 is good for humans, but that floating point in any base is inappropriate.
Every use case for floating point requires speed and accuracy. Every decimal floating point format is significantly more expensive to implement in hardware area, and is necessarily slower than binary floating point. The best case we have for accuracy is ieee754’s packed decimal (or compressed? I can’t recall exactly) which takes a 2.3% hit to precision, but is even slower than the basic decimal form which takes a 20% precision hit.
For real applications the operations being performed typically cannot be exactly represented in base 10 (or 1000) or base 2, so the belief that base 10 is “better” is erroneous. It is only a very small set of cases where a result would be exactly representable in base 10 where this comes up. If the desire is simply “be correct according to my intuition” then a much better format would be base-30, which can also represent 1/(3^n) correctly. But the reality is that the average precision is necessarily lower than base-2 for every non-power of 2 base, and the performance will be slower.
Floating point is intended for scientific and similar operations which means it needs to be as fast as possible, with as much precision as possible.
Places where human decimal behaviour is important are almost universally places where floating point is wrong: people don’t want their bank or order systems doing maths that says x+y==x when y is not zero, which is what floating point does. That’s because people are dealing with quantities that generally have a minimum fractional quantity. Once you recognize that, your number format should become an integer count of that minimum quantity.
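For example (a Python sketch; any language’s 64-bit binary doubles behave the same way):

```python
# With 64-bit binary floats, adding a small amount to a large value
# can be silently absorbed, because the spacing between representable
# values near 1e16 is 2.
x = 1e16   # say, a balance in the smallest currency unit
y = 1.0
print(x + y == x)            # True: the addition had no effect

# Counting in integer units of the minimum quantity avoids this.
print(10**16 + 1 == 10**16)  # False: Python ints are exact
```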
For currencies, you can just use integers, floats are not meant for that anyway. Binary is the most efficient to evaluate on a computer.
Yes, for currencies, you can use integers. Who would want to say x * 1.05 when they could say multFixPtDec(x, 105, 2);
To some extent, this is why we use standards like IEEE 754. Some of us remember the bad old days, when every CPU had a different way of dealing with things. 80 bit floats for example. Packed and unpacked decimal types on x86 for example. Yay, let’s have every application solve this in its own unique way!
Or maybe instead, let’s just use the standard IEEE 754 type that was purpose-built to hold decimal values without shitting itself 🤷♂️
[minor edit: I just saw both my wall of text replies were to u/cpurdy which I didn’t notice. This isn’t meant to have been a series of “target cpurdy” comments]
I mean, sure, if you have a piss poor language that doesn’t let you define a currency quantity it will be annoying. It sounds like a poor language choice if you’re writing something that is intended to handle money, but more importantly, using floating point for currency is going to cause much bigger problems.
And this has nothing to do with ieee754, that is merely a specific standard detailing how the storage bits for the format work, the issue is fundamental to any floating point format: floating point is not appropriate to anything where use are expecting exact quantities to be maintained (currencies, order quantities, etc) and it will bite you.
So as a heads up assuming you’re complaining about x87’s 80bit floats: those are ieee754 floating point, and are the reason ieee754 exists: every other manufacturer said the ieee754 could not be implemented efficiently until intel went and produced it. The only issue is that being created before finalization of the ieee754 specification it uses an explicit 1-bit which turns out to be a mistake.
You’ll be pleased to know ieee754’s decimal variant has packed and unpacked decimal formats - unpacked taking a 20% precision hit but being implementable in software without being catastrophically slow, and packed having only a 2.3% precision hit but being pretty much hardware only (though to be clear as I’ve said elsewhere, still significantly and necessarily slower than binary floating point)
If you are hell bent on using an inappropriate format for your data then maybe decimal is better, but you went wrong when you started using a floating point representation for values that don’t have significant dynamic range and where gaining or losing value due to precision limits is not acceptable.
No worries. I’m not feeling targeted.
C. C++. Java. JavaScript.
Right there we have 95% of the applications in the world. 🤷♂️
How about newer languages with no decimal support? Hmm … Go. Rust.
Other than it actually specifies a standard binary format, operations, and defined behaviors thereof for decimal numbers.
Yes, there are special carve-outs (e.g. defining “extended precision format”) in IEEE754 to allow 8087 80-bit floats to be legal. That’s not surprising, since Intel was significantly involved in writing the IEEE754 spec.
I’ve implemented IEEE754 decimal with both declet and binary encoding in the past. Both formats have the same ranges, so there is no “precision hit” or “precision difference”. I’m not sure what you mean by packed vs unpacked; that seems to be a reference to the ancient 8086 instruction set, which supported both packed (nibble) and unpacked (byte) decimal arithmetic. (I used both, in x86 assembly, but probably not in the last 30 years.)
I really do not understand this. It is true that IEEE754 floating point is very good for large dynamic ranges, but that does not mean that it should only be used for values with a large dynamic range. In fact, quite often IEEE754 is used to deal with values limited between zero and one 🤷♂️
C++:
You can also do similar in rust. I did not say “has a built in currency type”.
You can also add one to python, or a variety of other languages. I’m only partially surprised that Java still doesn’t provide support for operator overloading.
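As a sketch of what that looks like in a language with operator overloading (Python here; the Money class and its half-up rounding policy are invented for the example, not taken from any library):

```python
# Minimal fixed-point money type: an exact integer count of cents,
# with operator overloading so call sites still read naturally.
from decimal import Decimal, ROUND_HALF_UP


class Money:
    def __init__(self, cents: int):
        self.cents = cents

    def __add__(self, other: "Money") -> "Money":
        return Money(self.cents + other.cents)

    def __mul__(self, factor) -> "Money":
        # Multiply exactly, then round to whole cents (half-up is just
        # one possible policy).
        exact = Decimal(self.cents) * Decimal(str(factor))
        return Money(int(exact.quantize(Decimal("1"), rounding=ROUND_HALF_UP)))

    def __repr__(self) -> str:
        return f"${self.cents // 100}.{self.cents % 100:02d}"


price = Money(299)       # $2.99
print(price * 1.05)      # $3.14 -- reads like `x * 1.05`, stays exact underneath
print(price + Money(1))  # $3.00
```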
No. It defines the operations on floating point numbers. Which is a specific numeric structure, and as I said one that is inappropriate for the common cases where people are super concerned about handling 1/(10^n) accurately.
I had to go back and re-read the spec, I misunderstood the two significand encodings. derp. I assumed your reference to the packed and unpacked was those.
On the plus side, this means that you’re only throwing out 2% of precision for both forms.
No, I mean the kind of things that people care about/need accurate representation over multiples 1/(10^n) do not have dynamic range, fixed/no-point are the correct representation. So optimizing the floating point format for fixed point data, instead of the actual use cases that have widely varying ranges (scientific computation, graphics, etc)
There is a huge dynamic range between 0 and 1. The entire point of floating point is that all numbers can be represented as a value between [1..Base) with a dynamic range. The point I am making is that the examples where decimal formats is valuable do not need that at all.
What is the multiplication supposed to represent? Are you adding a 5% fee? You need to round the value anyway, the customer isn’t going to give you 3.1395 dollars. And what if the fee was 1/6 of the price? Decimals aren’t going to help you there.
It never ceases to amaze me how many people really work hard to avoid obvious, documented, standardized solutions to problems when random roll-your-own solutions can be tediously written, incrementally-debugged, and forever-maintained instead.
Help me understand why writing your own decimal support is superior to just using the standard decimal types?
I’m going to go out on a limb here and guess that you don’t write your own “int”, “float”, and “double”. Why is decimal any different?
This whole conversation seems insane to me. But I recognize that maybe I’m the one who is insane, so please explain it to me.
No, I’m saying that you don’t need a decimal type at all. If you need to represent an integral value, use an integer. If you want to represent an approximation of a real number, use a float. What else would you want to represent?
I would like to have a value that is a decimal value. I am not the only developer who has needed to do this. I have needed it many times in financial services applications. I have needed it many times in ecommerce applications. I have needed it many times in non-financial business applications. This really is not a crazy or rare requirement. Again, why would you want to use a type that provides an approximation of the desired value, when you could just use a type that actually holds the desired value? I’m not talking crazy, am I?
What do you mean by “a decimal value”? That’s not an established mathematical term. If you mean any number that can be expressed as m/10ⁿ for some integers m, n, you need to explain precisely why you’d want to use that in a real application. If you mean any number that can be expressed as m/10ⁿ forsome integer m and a fixed integer n, why not just use an integer?
My proposal is that we switch to a base 30 floating point format, and that could handle a 1/6th fee :D :D :D
You’re almost there. https://en.wikipedia.org/wiki/Sexagesimal
Being able to say x * 1.05 isn’t a property of the type itself, it’s just language support. If your language supports operator overloading you could use that syntax for fixed point too.

Oh, you are using a language with fixed point literals? I have (in the past). I know that C#/VB.NET has its 128-bit non-standard floating point decimal type, so you’re not talking about that. Python has some sort of fixed point decimal support (and also floating point decimal). What language are you referring to?
You don’t need to. Strings are a good substitute
For Kotlin it doesn’t really even matter what the left operand is
https://pl.kotl.in/7FDdqQdSo
So your idea is to write your own custom decimal type? And that is somehow better than using an international well-established standard IEEE-754?
I think Kotlin is a nice language, and it’s cool that it allows you to write new classes, but being forced to build your own basic data types (”hey look ma! I invented a character string!”) seems a little crazy to me 🤷♂️
The idea is that the type represents an underlying standard as well as its defined operations. You don’t need native support for a standard in order to support said standard
Edit:
I was giving an example about ergonomics and language support rather than using an opaque dependency
I find that I have more time for whimsy when I’m not fielding the umpteenth “what was the purpose of omega star again?” request. I’m not at work for cuteness–I’m there to get paid, support my family, and reduce the annoyance of the lives of my coworkers.
There’s also the unfortunate thing about whimsy: team composition changes over time, and what you find funny today may not be funny later to other folks. Consider the issues you might run into with:

- fargo as the name for a service for removing user objects
- gestapo for the auditing system
- dallas as a name for the onboarding service (since everybody does it)
- kali as a name for a resource management and quota enforcement engine
- miniluv for your customer service back-office suite
- hydra2 because hydra sucks but version 2 needs to coexist with it for a while yet
- Quetzalcoatl after a character from that silly anime Kobayashi’s Dragon Maid (bonus points if you are concerned about second-hand cultural appropriation)
- fido (or whatever your dog’s name is) for the fetching service, which might not be so pleasant after the namesake is sunset from production
- 2consumers1queue for a data ingest worker pool, which probably isn’t gonna go over well if anybody on staff has any ranks in knowledge (cursed)
And so on and so forth. I’m put in mind of an old article by Rachel Kroll talking about some logging service or format that was named rather blatantly in reference to either a porno or a sexual act–while this might be a source of chuckles for the team at launch, later hires may object.
As an industry we haven’t gotten over the whimsy of blacklists and whitelists, master and slave, female and male connectors–or hell, even the designation of “user”. If we can’t handle those things, what chance do we think we have with names that seem entertaining in the spur of the moment?
If you still doubt me, consider that AWS seems to engage in both conventions. Which is the easier product to identify, Kinesis or Application Load Balancer? Athena or Secrets Manager? IOT Core or Cognito?
~
I’ll grant that for hobby projects, sure, go nuts. I enjoy thematic character names from anime for my home lab. I used to use names from a particularly violent action movie for cluster nodes.
I’ll also grant that for marketing purposes (at least at a project level) it isn’t bad to have a placeholder name sometimes while things are getting hashed out–though those names often stick around and end up serving as inside baseball for folks to flaunt their tenure.
Lastly, I’ll totally grant that if you by design are trying to exclude people, then by all means really just go hog wild. There are all kinds of delightfully problematic names that function like comic sans to filter folks. Just don’t be surprised if you get called on it.
Meh. The problem with this is that doing gratuitously offensive stuff, like deliberately making your presentations harder to look at, also attracts a bunch of people who think being gratuitously offensive is, like, the coolest thing ever. And when those people find a home in the tech world they soon set about being offensive in the wider community.
Having helped moderate a once fairly significant FOSS thing, I’m pretty convinced that the assholery-positive branding is a bad thing for all of us. It breeds a culture of assholery that we all have to live with no matter where we are.
With all of that said, some cute names don’t concern any sensitive subjects, so I feel like you’re tearing down a straw man, or at least the more strawy half of a composite man. At a previous job we created a service called “counter”, which did have a serious job, but we liked to describe it to management—mostly truthfully—as just being for counting things. You know, like, one apple, two apples… I don’t know if this is funny in anyone’s mind but mine, but the name certainly wasn’t chosen to be a useful description.
It is not only about being offensive.
Homebrew seems to be using beer-brewing names and metaphors throughout. As someone who doesn’t know anything about brewing beer, nothing there makes sense to me. It feels to me like they’re taking a subject I know something about (packaging software) and deliberately making it obscure by renaming everything.
I’m similarly put off Kubernetes. Why invent a whole new vocabulary to name existing concepts? I don’t care enough about Kubernetes to try deciphering their jargon.
If you take an existing domain and change the names of all the things then anyone wanting to participate has to relearn everything you’ve made up (poorly in most cases.)
It makes interacting with you like speaking in a second language you have little command of.
EDIT: Just pause for a second and imagine reading a codebase where classes, functions and variables all have cute names…
I think cutesy naming has its place, but I agree that sub-naming of cutesy things (e.g. cheese shop, wheels, eggs from Python packaging) is bad. Your cutesy name should just be a pun with a more or less deducible relationship to the thing it does (e.g. Celery feeds RabbitMQ). You can have jokes in the documentation, but no joke nomenclature.
TIL! I worked at a company where we used Celery – but with Redis as the backing store – for years and I never made this Celery/RabbitMQ connection before.
I don’t know that I disagree in general, either with you or with friendlysock’s comment. I was responding to a specific thing in it that I found significant. If you want to talk about naming more broadly, know that—at least from my perspective—people don’t all engage with names in the same way, and the purpose of names is situational, learned (think of mathematician vs programmer attitudes to naming variables), and at least in some cases a tradeoff between the interests of the various different people who will be using the name. So I don’t think it’s very useful to have a blanket opinion.
Indeed I agree with you. I do seem to have grossly misread your comment and replied to it out of context, my apologies.
Edgelord Simon Peyton Jones
So much this, and also:
The joke is not going to hold up. It probably wasn’t that funny the first time except to you and maybe one other person, and it’s certainly not going to stand the test of time. I can count on one hand the number of “jokes” in comments and names I’ve come across that have actually made me laugh.
You might think you are adding more “whimsy” into a drab world… in fact, you are probably inducing eye rolls for the decade of programmers who will have to maintain your work.
Okay but I’m just going to say that 2consumers1queue is amazing.

I am more partial to 1cookie1jar, but that’s also cool.

Honestly, I think this is very easy to avoid if you have a diverse team and take a minimum amount of care in choosing a name. These examples are valid, but I think they are clearly influenced by the insularity and whiteness of software developers.
I’ve built a series of services over the past few years, and my preference has been animal names.
They were not chosen at random, they were designed to be mnemonic.
While I take your point about AWS, Google takes the opposite approach and names everything generically, which makes searching for them a pain in the ass. Also, I think there are distinctly different tradeoffs involved in choosing internal names vs external product names.
As a practical matter, early-stage startups and small companies do not tend to have diverse teams. Remember, naming is an issue that happens with a team size of 1 and which impacts a team size of…however many future selves or people work on a codebase.
You use Coyote as a harmless example (because ACME, like in the coyote and roadrunner cartoons, right?) but similarly banal things pulled from the cartoons, like gonzales for a SPDY library, cannot in this day and age be guaranteed to be inoffensive. Even if you take care for today’s attitudes there is no guarantee of tomorrow.

On a long enough timeline, anything not strictly descriptive becomes problematic to somebody (and if you don’t like where the industry is heading in that regard, well, that ship has sailed).
The less diverse your team is, the more care you should take.
I think this only buttresses my point: Gonzalez is clearly a racial/ethnic stereotype. I would never even think of choosing that. This is not rocket science!
Whereas using a purely descriptive name is much more likely to become problematic on a shorter timeline for the reasons stated in the article.
According to Wikipedia, even though Speedy Gonzales is clearly a problematic ethnic stereotype, there has been a thing where the Mexican community has decided they like him, so he’s been uncanceled. Who can predict these things? https://en.wikipedia.org/wiki/Speedy_Gonzales#Concern_about_stereotypes
Gonzales was once taken off the air out of racial concerns. Hispanic groups campaigned against this decision because Speedy was a cultural icon and a hero. Eventually he went back on air.
Assuming a person’s or demographic’s viewpoint is its own danger. I am not innocent in this regard.
I’m well aware of the history there, never fear. That’s why I used him as an example: using that name may annoy well-meaning people, changing that name may annoy somebody with that Hispanic background.
If we’d gone with the boring utilitarian name of spdy-client, instead of being cutesy, we could’ve sidestepped the issue altogether.

Sadly, that is not the direction well-meaning people influencing our industry have taken. So, in the meantime, I suggest boring anodyne utilitarian names until the dust settles.
Also sorry for my wording and tone. I’m not happy about how I wrote that
I’d argue that such metaphors have always been viewed as more on the descriptive side of the spectrum than whimsical or cute. In fact the idea that these terms are descriptive and in some sense “objective” is the most common defense I’ve seen of keeping such terms around, not that people enjoy them. I didn’t have to have anyone explain to me the meaning of master/slave replication when I first heard that term, the meaning is very intuitive. That’s an indictment of our culture and origins, not of metaphor.
My point is not that cute names are always great, or that it’s easy to avoid their pitfalls. It’s not. But I think holding space to be playful with language is often more generative than trying to be dry and descriptive. And it’s when we think we’re at our most objective that our subjectivity can lead us furthest astray.
I think you’re completely missing that every product and open source project that you simply use will have such a name (maybe except OBS), so why is it different for local homegrown stuff?
For the same reason that brand names can be “Amazon” or “Apple” but we don’t start renaming “computers” or “shopping websites” to something else. OSS projects exist in the same space as “brands” – typically a competitive space in which there are multiple competing solutions to the same problem. In that context, differentiating serves a purpose. However, it also has a cost: what your product does needs to be taught and remembered.
It’s possible that at a very large company some of those same forces take effect, but at small and medium companies they don’t. There, following the same principle would be like insisting everyone in the house call the kitchen sink “waterfall”. Why would I do that? “kitchen sink” works perfectly.
I dunno, left-pad is pretty self-explanatory, as are sqlite and kernel-based virtual machine and logstash_json and open asset importer and lib_png.

There are people using clear, jargon-free naming conventions.
So I guess we slightly differ on what cute and descriptive mean. sqlite has sql in it, descriptive, but -lite gives it a double meaning and makes it unique.
logstash_json is a horrible name, imho - because you could theoretically do the same project in many languages, it could be for logstash itself, or some intermediary product. (I don’t remember the specific one in question). Also libpng is very old and the assumption would be “this is the most popular PNG image library, written in C”, maybe because of the (weakly held) naming convention. These days we get many more new projects per year, but in theory a PNG lib in Erlang/Ruby/Brainfuck could also be called libpng, it’s just that the name was taken.
Maybe I am completely wrong here, but I understood the OP’s post more as “don’t use bland, descriptive, pigeonholed names” and you argue more over “don’t do cute names” - so maybe it’s a middle ground.
And yes, I still remember when a coworker wanted to call 2 projects “Red Baby” and “Blue Frog” or something and no one had any clue why, he couldn’t explain it, and we said: Why would one be red and one be blue?
logstash_json has the project slug “Formats logs as JSON, forwards to Logstash via TCP, or to console.” That’s roughly what I’d expect from the name, something to do with Logstash and something to do with JSON.

libpng is… “libpng is the official PNG reference library.” Sure, a slightly better name would’ve been “png_ref_implementation” or whatever, but again, the name tells me what to expect.

sqlite “implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine.” So, you know, a SQL thingy but not large. Light, if you would. Again, maybe the name could mention the standalone nature, but that’s neither here nor there.

I think that bland, descriptive names are in fact exactly the right answer.
Again, examples from the other side:

- nokogiri is a library for Japanese saws… no, wait, it’s for parsing XML and HTML.
- Angular is a library for angles… no, wait, it’s an HTML templating and data binding library (among other things).
- Beautiful Soup is a library about soup… no, wait, another HTML/XML munging library.
- Ristretto is a library for short shots of espresso… no, wait, it’s a caching library.
- kubernetes is some tool for pilots… wait, no, it’s container orchestration and provisioning.
- Prometheus is a tool for giving fire to mortals… wait, crap, it’s a time series DB and monitoring setup.

These names do not enhance understanding.
He’s obviously being defensive, but he has a good point about considering other types of safety than just memory. For example, languages without something like RAII don’t have a good way to enforce the cleanup of resources in a timely way — you have to remember to use optional constructs like “defer” to call cleanup code, otherwise the cleanup won’t happen until the GC decides to finalize the owning object, or maybe never. The arbitrary nature of finalizers has been a pain point of Java code for as long as I can remember, when working with any resource that isn’t Pure Java(tm).
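For example, in a GC’d language the timely path is opt-in. A small Python sketch of the difference (the file path is purely illustrative):

```python
# Without RAII, timely cleanup depends on remembering an explicit
# construct; otherwise it waits for the garbage collector, or may
# never run at all.

# Relies on finalization: the file stays open until the GC gets to it.
def leaky_read(path):
    f = open(path)
    return f.read()          # `f` is closed... eventually, maybe

# Opt-in deterministic cleanup, analogous to `defer`/`using`:
def tidy_read(path):
    with open(path) as f:    # closed when the block exits, even on error
        return f.read()
```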
Part of the problem though is that: a) That is a deflection from the entire point of the NSA thing that Stroustrup is ostensibly replying to, which is that almost all serious safety problems are memory safety problems of some kind, which C++ can not seriously mitigate b) The ‘other forms of safety’ that Stroustrup talks about in the linked letter and positions as being better without actually explicitly arguing for it (what he calls ‘type-and-resource safety’) are also things that C++ just can fundamentally never do - the linked documents are about as serious an approach to getting the described properties for C++ as the smart pointer work was about getting memory safety for C++.
Like, C++ doesn’t have memory safety (and also some related things like ‘iterator invalidation safety’) and fundamentally cannot get it without massively breaking changes, and (specifically) the lack of temporal memory safety and aliasing safety means that their approaches to ‘type-and-resource safety’ will fundamentally do essentially nothing.
This is part of a long pattern of Stroustrup trying to stop any possibility of progress on safety by (amongst other things) using his name, reputation, and any position he is able to get to push for the diversion of effort and resources into big and difficult projects of work that will look like progress but fundamentally cannot ever achieve anything good.
I would argue that memory safety is not a problem of the C++ language, it’s a problem of implementations. Real Soon Now[1], my team is going to be open sourcing a clean slate RTOS targeting a CHERI RISC-V core. The hardware enforces memory safety, the key RTOS components are privilege separated and the platform has been an absolute joy to develop.
Languages like Rust have a stronger guarantee: they check a lot of properties at compile time, which avoids the bugs, rather than simply preventing them from being exploitable. This comes with the caveat that the only data structure that you can express is a tree without dipping into unsafe (either explicitly or via the standard library) and then you need to reason about all of the ways in which those unsafe behaviours interact, without any help from the type system. The Oakland paper from a while back that found a couple of hundred CVEs in Rust crates by looking for three idioms where people misuse things that hide unsafe behind ‘safe’ interfaces suggests that people are not good at this.
The other problem that we’ve seen with Rust is that the compiler trusts the type system. This is fine if all of the code is within the Rust abstract machine, but is a nightmare for systems that interact with an adversary. For example, we saw some code that read a Rust enumeration from an MMIO register and checked that it was in the expected range. The compiler knew that enumerations were type safe so elided the check, introducing a security hole. The correct fix for this is to move the check into the unsafe block that reads from the MMIO register, but that’s the kind of small detail that’s likely to get overlooked (and, in code review, someone may well say ‘this check isn’t doing anything unsafe, can you move it out of the unsafe block?’ because minimising the amount of unsafe code is normally good practice). We need to check a bunch of things at API boundaries to ensure that the caller isn’t doing anything malicious and, in Rust, all of those things would be things that the compiler would want to assume can never happen.
We will probably rewrite a chunk of the code in Rust at some point (once the CHERI support in the Rust compiler is more mature) because there are some nice properties of the language, but we have no illusions that a naive Rust port will be secure.
[1] I think we have all of the approvals sorted now…
Heh, Mickens was right – you can’t just place a LISP book on top of an x86 chip and hope that the hardware learns about lambda calculus (or, in this case, type theory…) by osmosis :-).
This is one of the things I also struggled with back when I thought I knew enough Rust to write a small OS kernel and I was a) definitely wrong and b) somewhat disappointed. I ran into basically the same problem – reading an enum from a memory-mapped config register. As usual, you don’t just read it, because some ranges are valid, some are reserved, some are outright invalid, and of course they’re not all consecutive ranges, so “reading” is really just the happy ending of the range checks you do after a memory fetch.
At the time, I figured the idiomatic way to do it would be via the TryFrom trait, safely mapping config register values to my enum data type. The unsafe code block would read a word and not know/care what it means, then I’d try to build the enum separately from that word, which was slower and more boilerplatey than I’d wished but would prevent the compiler from “helping” me along. That looked cool both on paper and on screen, until I tried to support later revisions of the same hardware. Teaching it to deal with different hardware revisions, where valid and reserved ranges differ, turned out to be really stringy and more bug-prone than I would’ve wanted.

My first instinct had been to read and range-check the values in the unsafe block, then build the enum From that, which was at least slightly faster and more condensed (since it was guaranteed to succeed) – or to skip enumerating values separately altogether. However, it seemed that was just safety theater, as the conversion was guaranteed to succeed only insofar as the unsafe check was right, thus reducing the whole affair to C with a very verbose typecast syntax.

Frankly, I’m still not sure what the right answer would be, or rather, I haven’t found a satisfactory one yet :(.
Haha, great quote … the way I phrase it is “When models and reality collide, reality wins.”
Type systems are models, not reality … I see a lot of solipsistic views of software that mistake the map for the territory
Previous comment on “the world”: https://lobste.rs/s/9rrxbh/on_types#c_qanywm
Not coincidentally, it also links to an article about interfacing Rust with hardware
It’s hard to say without looking at a specific example, but a common trap with Rust enums is that oftentimes you don’t want an enum, you want an integer with a bunch of constants:

I may be misunderstanding some details about how this works, but in the context of interfacing with the underlying hardware I think I generally want both: a way to represent related values (so a struct Flag(u8) with a bunch of constant values) and an enumerated set of valid flag values, so that I can encode range checks in TryFrom/TryInto. Otherwise, if I do this:

where

I will, sooner or later, write get_flags in terms of reading a byte from a corrupted flash device and I’ll wind up trying to write Flag(42) to a config register that only takes Flag::Foo or Flag::Bar.

Having both means that my config read/write chain looks something like this: I get a byte from storage, and I build my enum Flag instance based on it. If that worked, I now know I have a valid flag setting that I can pass around, modulo TryFrom<u8> implementation bugs. To write it, I hand it over to a function which, tl;dr, will turn my flags into a u8 and yell it on the bus. If that function worked, I know it passed a valid flag, modulo TryInto<u8> implementation bugs.

Otherwise I need to hope that my read_config function checked the byte to make sure it’s a valid flag, and that my set_config function checked the flag I got before bus_write-ing it, and I do not want to be that optimistic :(.

That’s cool. I’m quite excited for CHERI. My question is this - when you do run into a memory safety issue with CHERI, what is the dev experience? In Rust you get a nice compiler error, which feels much “cheaper” to handle. With CHERI it feels like it would be a lot more expensive to have these bugs show up so late - although wayyyyyyy better than having them show up and be exploitable.
For sure. Rudra is awesome. Unsafe is hard. Thankfully, the tooling around unsafe for Rust is getting pretty insane - miri, rudra, fuzzing, etc. I guess it’s probably worth noting that the paper is actually very positive about Rust’s safety.
My opinion, and what I have observed, is that while there will be unsafety in rust it’s quite hard to exploit it. The bug density tends to be very low, low enough that chaining them together can be tough.
I don’t understand this. What are you referring to with regards to “an adversary”? Did an attacker already have full code execution and then leverage a lack of a check elsewhere? Otherwise, if the compiler eliminated the check, it shouldn’t be possible to reach that without unsafe elsewhere. Or did you do something like cast the enum from a value without checking? I don’t really understand.

I’m just not understanding who this attacker is.
It’s all run-time trapping. This is, I agree, much worse than catching things at compile time. On the other hand, running existing code is a better developer experience than asking people to rewrite it. If you are writing new code, please use a memory-safe (and, ideally, type-safe language).
One of the problems with Rust is that all non-Rust code is intrinsically unsafe. For example, in our model, we can pull in things like the FreeRTOS network stack, mBedTLS, and the Microvium JavaScript VM without having to rewrite them. In Rust, any call to these is unsafe. If an attacker compromises them, then it’s game over for Rust (this is no different from C/C++, so Rust at least gives you attack-surface reduction).
If a Rust component is providing a service to untrusted components then it can’t trust any of its arguments. You (the programmer) still need to explicitly check everything.
This case didn’t have an active adversary in software. It had an attacker who could cause power glitches that caused a memory-mapped device to return an invalid value from a memory-mapped register. This is a fairly common threat model for embedded devices. If the out-of-range value is then used to index something else, you can leverage it to gain memory corruption and possibly hijack control flow and then you can use other paths to get arbitrary code execution.
Everyone else who provides any code that ends up in your program, including authors of libraries that you use. Supply chain vulnerabilities are increasingly important.
For sure. Mitigations like CHERI are critical for that reason - we can’t just say “well you should have used Rust”, we need practical ways to make all code safer. 100%.
So basically the attacker has full code execution over the process. Yeah, unless you have a virtual machine (or hardware support) I don’t think that’s a problem you can solve in Rust or any other language. At that point the full address space is open to the attacker.
This sounds like rowhammer, which I can’t imagine any language ever being resistant to. That has to happen at a hardware level - I think that’s your point? Because even if the compiler had inserted the check, if the attacker here can flip arbitrary bits I don’t think it matters.
For sure, and I think perhaps we’re on the same page here - any language without a virtual machine / hardware integration is going to suffer from these problems.
That’s the attacker’s goal. Initially, the attacker has the ability to corrupt some data. They may have the ability to execute arbitrary code in some sandboxed environment. They are trying to get arbitrary code outside of the sandbox.
You get equivalent issues from converting an integer from C code into an enumeration where an attacker is able to do something like a one-byte overwrite and corrupt the value.
Typically, attacks start with something small, which can be a single byte corruption. They then chain together exploits until they have full arbitrary code execution. The problem is when the Rust compiler elides some of the checks that someone has explicitly inserted defensively to protect against this kind of thing. Note that this isn’t unique to Rust. C/C++ also has this problem to a degree (for example, eliding NULL checks if you accidentally dereference the pointer on both paths), but it’s worse in Rust because the abstract machine guarantees more in type-safe Rust than it does in C.
I don’t really agree with this premise but that’s fine.
I’m confused, you mean copying the int into a Rust enum too narrow for it?
Are you referring to checks at the boundary, or checks far behind it?
No, the flow is a C function returning an enumeration that you coerce into a Rust enumeration that holds the same values. An attacker is able to trigger a one-byte overwrite in the C code that means that the value returned is not actually a valid value in that enumeration range. The Rust programmer doesn’t trust the C code and so inserts an explicit check that the enumeration is a valid value. The Rust compiler knows that enumerations are type safe and so elides the check. Now you have a way for an attacker with a one-byte overwrite in C code to start a control-flow hijacking attack on the Rust code.
Checks in the trusted (Rust) code, outside of unsafe blocks.

Nit: moving code out of an unsafe block will never affect its semantics - the only thing it might do is stop the code from compiling.
Unsafe is a magic keyword that’s required when calling certain functions, dereferencing raw pointers, and accessing mutable statics (there might be a few other rare ones I’m forgetting). Beyond allowing those three operations to compile, it doesn’t affect semantics; if a statement/expression compiles without an unsafe block (i.e. it doesn’t use any of those three operations), wrapping it in an unsafe block will not change your program.
The correct fix here is to check the value is within range before casting it to the enum (incidentally, an operation that requires an unsafe block).
All that being said, your broader point is true: Rust’s stricter rules mean that it may well be easier to write undefined behavior in unsafe Rust than C.
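To make the range check mentioned above concrete, here is a minimal sketch (the enum and function names are invented) of validating a raw value coming back from C before it ever becomes a Rust enumeration. Because the match is on a plain integer rather than on an already-typed enum, there is nothing the compiler is entitled to optimise away:

    /// Hypothetical mirror of a C `enum state { IDLE = 0, RUNNING = 1, DONE = 2 }`.
    #[repr(u8)]
    #[derive(Debug, Clone, Copy)]
    enum State {
        Idle = 0,
        Running = 1,
        Done = 2,
    }

    /// Validate the raw byte *before* converting it to a `State`.
    fn state_from_raw(raw: u8) -> Option<State> {
        match raw {
            0 => Some(State::Idle),
            1 => Some(State::Running),
            2 => Some(State::Done),
            _ => None, // an out-of-range value from the C side is rejected here
        }
    }

    fn main() {
        let raw: u8 = 7; // stand-in for a value returned over FFI
        match state_from_raw(raw) {
            Some(s) => println!("valid state: {:?}", s),
            None => eprintln!("rejected out-of-range value {}", raw),
        }
    }

Only a value that passes the integer check ever becomes the enum, so the defensive check cannot be elided.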
Can you share a link to that paper?
I misremembered, it was at SOSP, not Oakland (it was covered here). The title was:
Rudra: Finding Memory Safety Bugs in Rust at the Ecosystem Scale
Thanks! That one I do remember 😄
Does the compiler at least emit a warning like “this comparison is always true” that could signal that one’s doing this incorrectly?
(Tracing) gc has no trouble with actual graphs, and still prevents all those nasty bugs by construction.
Yes—I am still waiting for capability-safety to be table stakes. Basically no one should get the ‘unsafe’ god-capability.
But it does have problems with tail latency and worst-case memory overhead, which makes it unfeasible in the kind of scenarios where you should consider C++. If neither of those are constraints for your problem domain, C++ is absolutely the wrong tool for the job.
Unfortunately, in Rust, core standard-library things like Rc depend on unsafe, and so everything would need to hold the capability to perform unsafe to be able to pass it down to those crates, unless you have a compile-time capability model at the module level.

Unsafe can be switched off at the module level, and the module is indeed also the boundary of unsafe in Rust.
A mistake with unsafe may be triggered from the outside, but a correct unsafe implementation is well-encapsulated. That very effectively reduces the scope of review.
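For what it’s worth, here is a minimal sketch of what that module- or crate-level switch looks like in practice (the crate and function are hypothetical): the forbid(unsafe_code) lint turns any unsafe in the scope into a compile error, while dependencies that use unsafe internally are unaffected.

    // lib.rs of a hypothetical crate. The inner attribute makes any use of
    // `unsafe` written in this crate a hard compile error, so contributors
    // cannot introduce UB here even by accident.
    #![forbid(unsafe_code)]

    use std::rc::Rc;

    // Dependencies (including std's Rc, which uses unsafe internally) are
    // unaffected; the lint only constrains the code written in this crate.
    pub fn shared_list() -> Rc<Vec<i32>> {
        Rc::new(vec![1, 2, 3])
    }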
I basically agree with you! I haven’t been aware of these tendencies of his, but I’m not surprised.
But I think the types of safety provided by RAII are valuable too. My day-job these days is mostly coding in Go and I miss RAII a lot. Just yesterday I had to debug a deadlock produced by a high-level resource issue (failure to return an object to a pool) that wouldn’t have occurred in C++ because I would have used some RAII mechanism to return it.
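Sketched in Rust rather than C++ (to keep the examples here in one language), and with all the names invented, the RAII pattern in question looks roughly like this: the guard hands the object back to the pool in its destructor, so “forgot to return it” cannot happen on any exit path.

    use std::sync::Mutex;

    // Hypothetical pool of reusable connections.
    struct Connection { id: u32 }
    struct Pool { free: Mutex<Vec<Connection>> }

    // Guard that returns the connection when it goes out of scope,
    // including on early returns and panics.
    struct PooledConn<'a> {
        pool: &'a Pool,
        conn: Option<Connection>,
    }

    impl<'a> Drop for PooledConn<'a> {
        fn drop(&mut self) {
            if let Some(conn) = self.conn.take() {
                self.pool.free.lock().unwrap().push(conn);
            }
        }
    }

    impl Pool {
        fn get(&self) -> Option<PooledConn<'_>> {
            let conn = self.free.lock().unwrap().pop()?;
            Some(PooledConn { pool: self, conn: Some(conn) })
        }
    }

    fn main() {
        let pool = Pool { free: Mutex::new(vec![Connection { id: 1 }]) };
        {
            let c = pool.get().expect("one connection available");
            println!("using connection {}", c.conn.as_ref().unwrap().id);
        } // guard dropped here, connection goes back into the pool
        assert_eq!(pool.free.lock().unwrap().len(), 1);
    }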
Thank you. I’m soooo sick of seeing “but Rust doesn’t solve all forms of safety so is it even safe?”. “Rust is safe” means “Rust is memory safe”. That’s a big deal, memory safety vulnerabilities are highly prevalent and absolutely worst-case.
The whole post by him is really ignorant.
That would have to be heavily qualified to a domain – otherwise I’d say it’s just plain untrue.
String injection like HTML / SQL / Shell are arguably worse problems in the wide spectrum of the computing ecosystem, in addition to plain mistakes like logic errors and misconfiguration.
As far as I can tell, none of these relate to memory safety:
https://www.checkpoint.com/cyber-hub/cloud-security/what-is-application-security-appsec/owasp-top-10-vulnerabilities/
The way this PR was written made it almost seem like a joke
and
To me this read a lot like satire poking fun at the Rust community. Took me some digging to realize this was actually serious! I personally don’t care what language fish happens to be written in. As a happy user of fish I just really hope this doesn’t disrupt the project too much. Rewrites are hard!
This is what it looks like when someone is self-aware :-)
They looked at the tradeoffs, made a technical decision, and then didn’t take themselves too seriously.
Poe’s Law is strong with this one. Not knowing the author of Fish, I genuinely can’t tell whether the commentary is 100% in earnest, or an absolutely brilliant satire.
Given the almost 6,000 lines of seemingly high quality Rust code, I’m going to say it’s not a joke.
Gotta commit to the bit.
Oh, sure! I meant the explanation in the PR, not the code itself.
Same. After doing some research into the PR though, I’m pretty sure it’s in earnest. XD
For sure! After I looked deeper and found that this person is a main contributor to fish things made more sense. I totally respect their position and hope things go well. I just thought the way it was phrased made it hard to take seriously at first!
The author understands some important but often underappreciated details. Since they aren’t paying anyone to work on the project, it has to be pleasant and attractive for new contributors to want to join in.
It only “has to be” if the project wants to continue development at an undiminished pace. For something like a shell that seems like a problematic mindset, albeit an extremely common one.
Must it?
Fish seldom plays the role of “foundational scripting language”. More often it’s the interactive frontend to the rest of your system. This port enables further pursuit of UX and will allow for features I’ve been waiting for for ages
For something like an interactive shell, I generally feel that consistency beats innovation when it comes to real usability. But if there are features that still need to be developed to satisfy the fish user base, I suppose more development is needed. What features have you been waiting for?
https://github.com/fish-shell/fish-shell/pull/9512#issuecomment-1410820102
There have been multiple maintainer comments over the years in various issues alluding to the difficulty of adding concurrency features to the codebase. e.g. https://github.com/fish-shell/fish-shell/issues/238#issuecomment-150705108
I think that the “Nobody” and “pain” there may have been referring to the dev team, not so much everyone in the world. In that context it’s a little less outlandish a statement.
It’s also not really outlandish in general. Nobody likes CMake. How terrible CMake is, is a common topic of conversation in the C++ world, and C++ itself doesn’t exactly have a reputation for being the language everyone loves to use.
I say as someone who does a whole lot of C++ development and would pick it above Rust for certain projects.
Recent observation from Walter Bright on how C++ is perceived:
From https://forum.dlang.org/post/uhcopuxrlabibmgrbqpe@forum.dlang.org
That’s totally fine with me.
My retirement gig: maintaining and rescuing old C++ codebases that most devs are too scared/above working on. I expect it to be gross, highly profitable, and not require a ton of time.
C programmers gonna have their COBOL programmer in 1999 moment by the time 2037 rolls around.
And yet, it was the ‘language of the year’ from TIOBE’s end-of-year roundup for 2022, because it showed the largest growth of all of the languages in their list, sitting comfortably at position 3 below Python and C. D shows up down at number 46, so might be subject to some wishful-thinking echo-chamber effects. Rust was in the top 20 again, after slipping a bit.
TIOBE’s rankings need to be taken with a bit of a grain of salt, because they’re tracking a lot of secondary factors. OpenHub tracks more objective things, and they’re also showing a steady increase in the number of lines of C++ code changed each month over the last few years.
TIOBE has +/- 50% error margin and even if the data wasn’t unusable, it’s misrepresented (measuring mentions picked by search engine algorithms over a historical corpus, not just current year, not actual usage). It’s so bad that I think it’s wrong to even mention it with “a grain of salt”. It’s a developer’s horoscope.
TIOBE thinks C popularity has halved one year and tripled next year. It thinks a niche db query language from a commercial product discontinued in 2007 is more popular in 2023 than TypeScript. I can’t emphasize enough how garbage this data is, even the top 10. It requires overlooking so many grave errors that it exists only to reinforce preexisting beliefs.
Out of all flawed methods, I think RedMonk is the least flawed one: https://redmonk.com/rstephens/2022/10/20/top20-jun2022/ although both RedMonk and OpenHub are biased towards open-source, so e.g. we may never learn how much Ada DoD actually uses.
My favourite part about the RedMonk chart is that it shows Haskell going out through the bottom of the chart, and Rust emerging shortly afterwards, but in a slightly darker shade of red which, erm, explains a lot of things.
The rationale provided tracks for me as someone who is about to replace an unpopular C++ project at work with Rust. Picking up maintenance of someone else’s C++ project who is no longer at the company vs. picking up someone else’s Rust project have looked very different in terms of expected pain / risk IME.
“Getting better at C++” isn’t on my team’s dance card but “getting better at Rust” is, which helps here. Few working programmers know anything about or understand native build tooling these days. I’m the resident expert because I know basics like why you provide a path argument to cmake. I’m not actually an expert, but compared to most others in my engineering-heavy department I’m as good as it gets. Folks who do a lot of C++ at work or at home might not know how uncommon any thoroughgoing familiarity with C and C++ is getting these days. You might get someone who took one semester of C to say “yeah I know C!” but if you use C or C++ in anger you know how far that doesn’t go.

I’m 34 years old and got my start compiling C packages for Slackware and the like. I don’t know anyone under 30 that’s had much if any exposure unless they chose to work in embedded software. I barely know what I’m doing with C/C++ despite drips and drabs over the years. I know enough to resolve issues with native libraries, FFI, dylibs, etc. That’s about it beyond modest modifications though.
tl;dr it’s difficult getting paid employees to work on a C++ project. I can’t imagine what it’s like getting unpaid volunteers to do so.
It does seem weird. We find it easier to hire C programmers than Rust programmers and easier to hire C++ programmers than either. On the other hand, there do seem to be a lot of people that want a project to hack on to help them learn Rust, which might be a good opportunity for an open source project (assuming that you are happy with the code quality of learning-project Rust contributions).
The difficulty is that you need to hire good C++ programmers. Every time some vulnerability or footgun in C++ is discussed, people say it’s not C++’s fault, it’s just a crappy programmer.
OTOH my experience from hiring at Cloudflare is that it’s surprisingly easy to onboard new Rust programmers and have them productively contribute to complex projects. You tell them not to use unsafe, and they literally won’t be able to cause UB in the codebase.

You might not, but a lot of people do.
I wrote a tool for myself on my own time that I used often at work. Folks really liked what it could do, there’s not a tool like it, and it handled “real” workloads being thrown at it. Not a single person wanted anything to do with it, since it was written in an esoteric language. I’m rewriting it in a “friendlier” language.
It seems like the Fish team thought it through, weighed risks and benefits, have a plan, and have made good progress, so I wish them the best.
Oo which language?
I’d rather not say, I don’t want anyone to feel bad. It’s sufficient to say, “As of today, not in the TIOBE Index top 20.”
The bigger point is that it was a tool I had been using for over a year, which significantly improved my efficiency and quality of life, and it got rejected for being an esoteric tech, even though I provided executable binaries.
That sucks. Yeah, I don’t mean to ask to hurt anyone’s feelings, I’m just always curious to know what people think are “esoteric”, cuz esoteric on lobste.rs (Factor, J, one of the advent of code langs) is going to be very different than esoteric at my job (haskell, rust).
Same here. As a user, it doesn’t bother me which language it is written in. They should absolutely pick the language that allows them to be more productive and deliver more. I have been a happy fish user for 13 years; it is software that proved useful from day one. And every release there are clear, important improvements, oftentimes new UX additions. I wish them a smooth migration.
If you’re curious about the size of the rewriting project: I ran tokei on the repo and it counted 49k lines of C++, 8k lines of headers, and 1k lines of CMake (and 57k lines of Fish, so there’s also a lot that won’t need to be rewritten).
They posted this little bit later:
The follow up contains:
Which means I still can’t tell the degree to which he’s joking. The idea that a codebase from 2005 is old is mind boggling to me. It’s not even 20 years old. I’ve worked on a lot of projects with code more than twice that age.
To put things into perspective, 2005 to 2023 is 18 years — that is the entire lifespan of the classic MacOS.
Or, to put things into perspective, the Mac has switched processor architectures twice since the Fish project was started.
Most software projects just rot away in 18 years because needs or the surrounding ecosystems change.
Modern macOS is a direct descendant of NeXTSTEP though, which originally shipped in 1989 and was, itself, descended from 4BSD and CMU Mach, which are older. Most of the GNU tools are a similar age. Bash dates back to 1989.
That’s probably true, but it’s a pretty depressing reflection on the state of the industry. There are a lot of counter examples and a lot of widely deployed software is significantly older. For example, all of the following have been in development for longer than fish:
This is the actual world we live in. This is what people really think.
Why does everyone hate CMake so much?
I find it far easier to understand than Makefiles and automake.
Plus it runs on ancient versions of Windows (like XP) and Linux, which is not something most build systems support. And it mostly “just works” with whatever compiler you have on your system.
Makefiles and automake are a very low bar.
Cargo can’t do 90% of the things that CMake can, but it’s so loved because most projects don’t need to write any build script at all. You put your files in src/ and they build, on every Rust-supported platform. You put #[test] on unit tests, and cargo test runs them, in parallel. You can’t write your own doxygen workflow, but cargo doc gives you a generated reference out of the box for every project. The biggest criticism Cargo gets about dependency management is that it’s too easy to use dependencies.

This convention-over-configuration makes any approach that requires maintaining a DIY snowflake build script a chore. It feels archaic, like writing header files by hand.
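As a rough illustration (a hypothetical crate), the whole convention-over-configuration story fits in one source file: no build script, no test-harness wiring, no doc tool configuration.

    // src/lib.rs of a hypothetical crate.

    /// Adds two numbers. `cargo doc` renders this comment as API documentation.
    pub fn add(a: i32, b: i32) -> i32 {
        a + b
    }

    #[cfg(test)]
    mod tests {
        use super::add;

        // Discovered and run (in parallel with any other tests) by a plain `cargo test`.
        #[test]
        fn adds() {
            assert_eq!(add(2, 2), 4);
        }
    }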
Why does everyone hate being punched in the face? I find it far more pleasant than being ritually disemboweled.
CMake is three things:
If the things that you want to do are well supported by the core functionality then CMake is fairly nice. If it’s supported by existing packages, then it’s fine. If it isn’t, then extending it is horrible. For example, when using clang-cl, I was bitten by the fact that there’s hard-coded logic in CMake that adds the /TC or /TP flags to override the language detection based on the filename and tell it to use C or C++. This made it impossible to compile Objective-C. A few releases later, CMake got support for Objective-C, but I can’t use that support to build the Objective-C runtime because it has logic in the core packages that checks that it can compile and link an Objective-C program, and it can’t do that without the runtime already existing.

I’ve tried to use CMake for our RTOS project, but adding a new kind of target is incredibly hard because CMake’s language is really just a macro language, and so you can’t add a new kind of object with properties on it; you are just using a macro language to set strings in a global namespace.
I’ve been using xmake recently and, while there’s a lot I’ve struggled with, at least targets are objects and you can set and get custom properties on them trivially.
Only versions no one wants to run anymore (i.e. 3.5 and older).
It’s an entire set of new things to learn, and it generates a makefile, so I worry that I’ll still have to deal with the problems of makefiles as well as the new problems CMake brings.
the hacker news discussion is horrifying, the lack of humor and the smartassery there is astonishing :/ I really enjoyed the post and I learned something (the setuid bit)
I did not like the post, mainly because the “teacher” character, Cadey, came across more as a shitposter than a teacher in the original version (it was later changed to read better). I was also upset at the lack of historical context in the post, implying that the authors of sudo were beyond stupid for picking C over Rust (at least, that’s how I read it—wonder what that says about me).

I’m fairly certain the whole thing was a joke and not anything serious.
The choice here to me is not the initial one at the beginning of the project, but the implicit one that every project makes every day
Which would be … ? Switching to anything other than C?
What language do I use? What stack? What toolchain? What testing suite? A fuzzer? Which? Do I use formal verification? What does packaging and deployment look like? Do I have beta builds?
You don’t think about these each day, but you’re still making an implicit choice to go with what you decided before
I’m really happy about this. The implicit mapping from value to template location always bothered me
P.S. I believe the contents of the tree should be greet/ rather than greet_html/

Yes, I agree. The html appended seems very explicit. But I suppose that is also good. You might have a greet module somewhere else you don’t want confused with the templates directory.

Sorry, I meant that I didn’t know where the _html suffix came from since the folder is referenced as greet/ everywhere else in the article.

Oh yes, I see. Maybe a typo.
Programmers have a long and rich history with C, and that history has taught us many lessons. The chief lesson from that history must surely be that human beings, demonstrably, cannot write C code which is reliably safe over time. So I hope nobody says C is simple! It’s akin to assembly, appropriate as a compilation target, not as an implementation language except in extreme circumstances.
Which human beings? Did history also teach us that operating a scalpel on human flesh cannot be done reliably safe over time?
Perhaps the lesson is that the barrier of entry for an engineering job was way higher 40 years ago. If you would admit surgeons to a hospital after a “become a gut-slicer in four weeks” program, I don’t think I need to detail what the result would be.
There’s nothing wrong with C, just like there’s nothing wrong with a scalpel. We might have more appropriate tools for some of its typical applications, but C is still a proven, useful tool.
Those who think their security burns will be solved by a gimmick such as changing programming language are in for a very unpleasant surprise.
Given the number of memory safety bugs that have been found in 40-year-old code, I doubt it. The late ‘90s and early 2000s exposed a load of these bugs because this C code written by skilled engineers was exposed to a network full of malicious individuals for the first time. In the CHERI project, we’ve found memory safety bugs in code going back to the original UNIX releases. The idea that there was some mythical time in the past when programmers were real men who never introduced security bugs is just plain wrong. It’s also a weird attitude: a good workman doesn’t blame his tools because a good workman chooses good tools. Given a choice between a tool that can be easily operated to produce good results and one that, if used incredibly carefully, might achieve the same results, it’s not a sign of a good engineer to choose the latter.
Back then, the C programmers didn’t know about memory safety bugs and the kinds of vulnerabilities we have seen for the last two decades. Similarly, JavaScript and HTML are surely two programming languages which are somewhat easier to write than C and don’t suffer from the same class of vulnerabilities. However, 20 years ago people wrote code in these two languages that suffers from XSS and other web-based vulns. Heck, XSS and SQLi are still a thing nowadays.
What I like about C is that it forces the programmer to understand the OS below. Writing C without knowing about memory management, file descriptors, and processes is doomed to fail. And this is what I miss today, and maybe what @pm hinted at in their comment. I conduct job interviews with people who consider themselves senior, and they only know the language and have little knowledge about the environment they’re working in.
Yes, and what we have now is a vast trove of projects written by very smart programmers, who do know the OS (and frequently work on it), and do know how CPUs work, and do know about memory safety problems, and yet still cannot avoid writing code that has bugs in it, and those bugs are subsequently exploitable.
Knowing how the hardware, OS (kernel and userspace), and programming language work is critical for safety or you will immediately screw up, rather than it being an eventual error.
People fail to understand that the prevalence of C/C++ and other memory-unsafe languages has a massive performance cost: ASLR, stack and heap canaries, etc. in software, and PAC, CFI, MTE, etc. in hardware, all carry huge performance costs on modern hardware, and all are necessary solely because the platform has to mitigate the terrible safety of the code being run. That’s now all sunk cost, of course: if you magically shifted all code today to something that was memory safe, the ASLR and various canary costs would still be there - if you were super confident, your OS could turn ASLR off and you could compile canary-free, but the underlying hardware is permanently stuck with those costs.
Forcing the programmer to understand the OS below could (and can) happen in languages other than C. The main reason it doesn’t happen is that OS APIs, while being powerful, are also sharp objects that are easy to get wrong (I’ve fixed bugs in Janet at the OS/API level, so I have a little experience there), so many languages that are higher level end up with wrappers that help encode assumptions that need to not be violated.
But, a lot of those low level functions are simply the bottom layer for userland code, rather than being The Best Possible Solution as such.
Not to say that low level APIs are necessarily bad, but given the stability requirements, they accumulate cruft.
The programmer and project that I have sometimes used as a point of comparison is more recent. I’m now about the same age that Richard Hipp was when he was doing his early work on SQLite. I admire him for writing SQLite from scratch in very portable C; the “from scratch” part enabled him to make it public domain, thus eliminating all (or at least most) legal barriers to adoption. And as I mentioned, it’s very portable, certainly more portable than Rust at this point (my current main open-source project is in Rust), though I suppose C++ comes pretty close.
Do you have any data on memory safety bugs in SQLite? I especially wonder how prone it was to memory safety bugs before TH3 was developed.
I think it did. It’s just that the alternative (not doing it) is generally much much worse.
There is no alternative to the scalpel (well, except there is in many circumstances and we do use them). But there can be alternatives to C. And I say that as someone who chose to write a new cryptographic library 5 years ago in C, because that was the only way I could achieve the portability I wanted.
C does have quite a few problems, many of which could be solved with a pre-processor similar to CFront. The grammar isn’t truly context free, and the syntax has a number of quirks we have since learned to steer clear of. switch falls through by default. Macros are textual instead of acting at the AST level. Everything is mutable by default. It is all too easy to read uninitialised memory. Cleanup could use some more automation, either with defer or destructors. Not sure about generics, but we need easy-to-use ones. There is enough undefined behaviour that we have to treat compilers like sentient adversaries now.

When used very carefully, with a stellar test suite and sanitisers all over the place, C is good enough for many things. It’s also the best I have in some circumstances. But it’s far from the end game even in its own turf. We can do better.
I was wondering why the repo owner seemed so familiar!
I don’t think that moving from a language that e.g. permits arbitrary pointer arithmetic, or memory copy operations without bounds checking, to a language that disallows these things by construction, can be reasonably characterized as a gimmick.
This isn’t a great analogy, but let’s roll with it. I think it’s uncontroversial to say that neither C nor scalpels can be used at a macro scale without significant (and avoidable) negative outcomes. I don’t know if that means there is something wrong with them, but I do know that it means nobody should be reaching for them as a general or default way to solve a given problem. Relatively few problems of the human body demand a scalpel; relatively few problems in computation demand C.
That’s a poor analogy.
What we would consider “modern” surgery had a low success rate, and a high straight up fatality rate.
If we are super generous, let’s say C is a scalpel. In that case we can look at the past and see a great many deaths were caused by people using a scalpel, long after it was established that there was a significant difference in morbidity when comparing a scalpel to a sterilized scalpel.
What we have currently is a world where we have C (and similar), which will work significantly better than all the tools that preceded it, but is also very clearly less safe than any modern safe language.
Some facts about me:
This reminds me of an issue I have with Discord and gmail. Every time I get a Discord invite to my gmail account the link is invalid. Is it possible that Google is accessing my link before me and ruining the invite?
I’m unable to replicate this with an invite sent from outlook to gmail. Is it possible that the invites are expiring? How are the invites sourced?
Edit: Actually, merely viewing the invite isn’t enough to decrease the number of uses on a limited-use invite
They are expiring within 2 minutes of generation. Weird. Thanks for thinking about it.
Slug AA is mediocre. Better to go with SDFs for interactive use (to avoid licensing stuff), and then render the vectors on the CPU for final export.
Are there a lot of IP pits in font rendering? How do you know what to avoid?
Not afaik. There’s slug, and there’s loop&blinn, and I think pretty much everything else is unencumbered. And loop&blinn expires in a year or two.