Question for Andrew: how do you make this style work in the presence of multithreading? The advantage with having each object be its own memory allocation and accessed through its own pointer is that you can modify it atomically with a copy + atomic pointer swap, and there’s tricks one can do to do fast memory reclamation without going all the way to a GC.
This is kinda lost when you coalesce all the objects into a single memory allocation, but maybe there’s another way to handle the problem?
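For concreteness, a minimal sketch of the copy + atomic-pointer-swap pattern with C11 atomics (assuming a single writer and an already-initialized pointer; reclaiming the old copy is exactly the hard part the question is about):

#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef struct config { int threshold; /* ... */ } config;

static _Atomic(config *) g_config;    /* assume initialized at startup */

config *config_snapshot(void) {       /* readers: one atomic load */
    return atomic_load_explicit(&g_config, memory_order_acquire);
}

void config_update(int new_threshold) {   /* single writer assumed */
    config *old = atomic_load_explicit(&g_config, memory_order_relaxed);
    config *fresh = malloc(sizeof *fresh);
    memcpy(fresh, old, sizeof *fresh);                /* copy   */
    fresh->threshold = new_threshold;                 /* modify */
    atomic_store_explicit(&g_config, fresh, memory_order_release);  /* swap */
    /* 'old' can only be freed once no reader still holds it:
     * epochs / hazard pointers / RCU - the reclamation problem. */
}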
Counter-based RNGs make it much easier to calculate skip-ahead states – adding 0,1,2,3,4 is more obvious and faster than two multiplications and an add – so they are a lot easier to vectorize than PCG!
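To make that concrete, a hedged sketch (a SplitMix64-style mixer, not any particular published counter-based generator): the state is just an index, so the k-th draw is mix(i + k), and four lanes can be filled independently instead of threading a multiply-and-add state through every step.

#include <stdint.h>

/* Counter-based: output = mix(counter).  Skip-ahead is just "i + k",
 * so each vector lane mixes i+0, i+1, i+2, i+3 independently. */
static uint64_t mix64(uint64_t z) {
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

void random_block4(uint64_t i, uint64_t out[4]) {
    for (int lane = 0; lane < 4; lane++)   /* trivially vectorizable */
        out[lane] = mix64(i + (uint64_t)lane);
}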
I haven’t looked at counter-based RNGs in detail; I decided a few years ago that PCG is good enough (TM) so I could just use my standard implementation whenever I need some random numbers. But I can’t stop tinkering with it, and occasionally it attacks me with a cunning idea that I need to blog about.
Reading this makes me wonder, if you did have to come up with a system for shared hosting that was easier to manage nowadays, what would it be? It doesn’t feel like fastCGI is the answer, but is there one?
optionally supports persistent processes like FastCGI, but not through sockets and threads
I’d support a single-threaded while (1) server
Or perhaps optionally a fork() server
parses all the crap in HTTP into structured data for you:
URL escaping like %20
URL query params: x=42&y=99
multipart/MIME form POSTs - the format that the browser sends file uploads in, which is very messy
cookies - these have a few fields
…
maybe sandboxed by default, e.g.
database connections are a “capability”
persistent disk is a capability – the default is ephemeral disk?
I think it can be useful for newer languages like YSH, Inko, Zig, etc.
They will relieve you of having to write say a multipart/MIME parser, which is very annoying. That can happen in a different process.
Instead you would just use a length-prefixed format for everything, like netstrings
It could also be JSON/JSON8, since those parsers already exist
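For the curious, the classic netstring framing (“<length>:<bytes>,”) is small enough to sketch in a few lines of C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Netstring framing: "<decimal length>:<bytes>,"
 * e.g. encoding "hello" gives "5:hello,"  */
void netstring_write(FILE *out, const char *buf, size_t len) {
    fprintf(out, "%zu:", len);
    fwrite(buf, 1, len, out);
    fputc(',', out);
}

/* Returns a malloc'd payload and its length, or NULL on malformed input. */
char *netstring_read(FILE *in, size_t *len_out) {
    size_t len = 0;
    int c;
    while ((c = fgetc(in)) != ':') {
        if (c < '0' || c > '9') return NULL;   /* includes EOF */
        len = len * 10 + (size_t)(c - '0');
    }
    char *buf = malloc(len + 1);
    if (!buf || fread(buf, 1, len, in) != len || fgetc(in) != ',') {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}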
Actually almost 15 years ago I wrote almost exactly this with https://tnetstrings.info/ , used by the Mongrel 2 web server
My implementation was actually used by a few people, but it wasn’t very good
I still think this is sorely needed… it’s weird that nothing has changed since 1997, but I guess that’s because there’s a lot of money to be made in the cloud, and locking customers in
The 12 factor app by Heroku (linked in post) is probably the only evidence I’ve seen of any hosting provider thinking about protocols and well-specified platforms, in the last 20+ years
If anyone has a nascent programming language that needs “CGI” / web support, feel free to contact me! It would be nice to iron out this protocol with someone who wants to use it
I think it can be deployed easily, because the first step would just be a simple CGI exec wrapper … So it would be a CGI script that execs your program, and hence can be deployed anywhere. This won’t be slow because Unix processes are not slow :-)
I’d say that proxying is fine and that’s what Heroku does, but it’s also easier to write a CGI script than an HTTP server.
It’s not easier to write a FastCGI script than an HTTP server, because you always need a FastCGI library, which is annoying.
But I think we can preserve the good parts of FastCGI in a “CGI version 2”
There is actually a 3rd dimension I’m thinking of - “lazy” process management, which is similar to the cloud. It’s not just config and the protocol.
Also, WSGI / Rack / PSGI all use CGI-style requests, in process. So there a CGI-like protocol is still very natural for programming languages, more so than HTTP.
Just for reference, CGI has no state, right? I’ve always kinda wondered if there was some trick that could be used that would get us to something kinda like CGI but without paying bootstrap costs per request (though maybe that’s… fine)
Yes, CGI is stateless – it starts a process for every request.
The simplest way to avoid that would be to have the subprocesses run in a while (1) loop as mentioned. So you can reuse your database connections and so forth between requests.
And then a higher level would handle concurrency, e.g. starting 5 of these while (1) processes in parallel, and multiplexing between them.
You will get an occasional “cold start”, which is exactly how “serverless” products work, so that seems fine. Or of course you can always warm them up beforehand.
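A sketch of the shape being described - do the expensive setup once, then service one request per loop iteration over stdin/stdout. Here a “request” is just a line of text and “handling” it is echoing it back; a real worker would read netstring/JSON frames and keep its database connection open across iterations:

#include <stdio.h>

int main(void) {
    /* ... open database connections, load config, warm caches (once) ... */

    char line[4096];
    while (fgets(line, sizeof line, stdin)) {   /* one request per iteration */
        fprintf(stdout, "handled: %s", line);   /* ... real work goes here ... */
        fflush(stdout);                         /* reply before the next one  */
    }
    /* EOF: the supervisor closed our pipe; exit cleanly. */
    return 0;
}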
(FastCGI as deployed on Dreamhost has a weird quirk where it will start 2 processes, but then those 2 processes may each have say 4 threads, which to me seems to make provisioning more complex)
bootstrap costs per [CGI] request (though maybe that’s… fine)
I’m guessing that it’s rare to run into any real-world problems with CGI now that machines have enough memory that they no longer have to hit the disk in order to find the code to run for each request.
It depends on the startup cost of the CGI program. You’ll run into problems at low load if it’s Python or if it’s library-heavy Perl. My link log is a CGI that ingests its entire database of tens of thousands of links on every request, which needed FastCGI when it was Perl; after rewriting with Rust+Serde it is fast enough that I could revert to much simpler CGI. At some point I might have to move it from a flat file to something with indexes…
CGI that ingests its entire database of tens of thousands of links on every request […]
At some point I might have to move it from a flat file to something with indexes
If you use CGI anyway, can’t you embed the links in the binary at compile time? New links => recompile and swap the binary. Should just work, as there’s no state.
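In Rust that’s include_str!/include_bytes!; as a sketch of the same idea in C23 terms, assuming the links live in a hypothetical links.tsv file next to the source:

#include <stdio.h>

/* Bake the link database into the binary at build time; adding links means
 * regenerating links.tsv and recompiling.  (#embed is C23; Rust's
 * include_bytes! does the same job.) */
static const unsigned char links_tsv[] = {
#embed "links.tsv"
};

int main(void) {
    /* ... parse links_tsv in memory on every request; no disk I/O ... */
    printf("link database is %zu bytes\n", sizeof links_tsv);
    return 0;
}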
I’m starting to believe more and more that the IT security field is a huge scam.
This is based on anecdotal data:
I used to work at a German fintech. We were about to get a big German bank as a customer, and they requested “a security audit”, whatever that meant. My startup, looking to save as much money as possible, went to some random security company in Germany. The security “experts” ran some off-the-shelf vulnerability scanner on our Python/Flask application and sent us a “list of vulnerabilities.”
We had a bug where, if you accessed anything other than / or /login/ while logged off, you would get a 500 error. This was because your user was None and user.email would raise an AttributeError: 'NoneType' object has no attribute 'email'.
The report was mostly vulnerabilities like:
Possible SQL injection: staging.example.com/phpmyadmin/index.php?statement=INSERT+... returned a 500 error.
Imagine this repeated for /wp-admin/ and all the other PHP cliché paths. We fixed the bug, and the bank was happy with our security audit…
I used to work at another firm, which had some automaker as a customer, I mean this is Germany after all. They requested to run an audit on our system, before sending us any data. What was the audit? Sending a phishing email to all the employees. Of course somebody fell for it, so we had to go through phishing training. (There is a great article about how this is fake security: Don’t include social engineering in penetration tests)
My partner worked in a large non-tech firm. Their internal software had no granular ACLs. People had global read/write access to everything by default; you just needed to log in. If you managed to compromise an account, you could wreak total havoc on their systems. They had a dedicated security department, but if you didn’t update your computer to the latest Windows version when it was suggested to you, a warning would be sent to your manager…
I was working at a multi-national firm, tech related, and we had to improve a risky system. Users had wide access, and there was severe insider risk. We designed a new solution; it was not fully secure, but it was definitely an improvement over the current state. We wrote down the design and sent it for security review, and the security council said “no! there are these problems and those problems. You can’t move forward.” We explained that these problems already existed, and that we were open to suggestions for solutions. They told us we were responsible for finding solutions, and blocked the project, thus leaving the current, worse situation in place indefinitely. Basically, it was not about security, it was about bureaucracy and justifying their jobs… But I still needed to reboot my computer every X days for security reasons…
All of this leads me to believe the IT security industry, CVEs included, is entirely bullshit. I’m convinced that the only people doing real security are people like Theo de Raadt and Colin Percival…
Selecting examples of bad approaches doesn’t work. It’s like saying art is a scam because there’s lots of terrible quality work created on Fiverr. It’s certainly there, but that doesn’t define the field itself. It’s a problem that companies can’t easily identify good quality security consultants. It’s a problem that PCI, SOX, etc. are silly checkbox exercises. It’s a problem that standards are hard to apply to very custom internal solutions. It’s a problem that security testing is hard. But every field will have problems like that - it doesn’t mean the whole field is a scam.
The problem is that if (my guess) 90% of people only ever experience audits like the ones described, you can try to call those bad examples, but if that’s the majority it’s a structural problem. And I think it is.
When I worked at a bank we also had mandatory security “checks” (not a proper audit by an external company) and 90% of the findings were simply bogus or non-exploitable, leaning mostly towards ridiculous. The more regulated the market the more it just attracts charlatans who know nothing except to check boxes.
In every industry there are good people doing good work, that doesn’t make the industry not bullshit overall.
In every industry there are good people doing good work, that doesn’t make the industry not bullshit overall.
This! Thank you. This was exactly my point.
Take the Multi-Level-Marketing industry as an example: it is basically a scam, and I’m pretty sure one can find a few multi-level-marketing companies which actually focus on selling products instead of running a pyramid scheme of salespeople. But one cannot use these isolated cases to dismiss the scam-ish behavior of MLMs as a whole.
If people have paid attention to the previous stories from Daniel (= the author of this story), he has been tilting at windmills for years because people are gaming the CVE mechanism for their own benefit. (i.e. if you find a lot of critical vulnerabilities it helps your reputation as a security researcher, so there is an incentive to inflate the score of any vulnerability you find.)
I’m reading these stories, plugging in my anecdotal experiences with the industry, and folks go “no, you can’t just throw a rotten industry under the bus, there are some competent people”. I’m sure there are some smart and well-intentioned people. In fact, I mentioned two: Theo de Raadt and Colin Percival. I’m convinced there are more, but this doesn’t mean the industry is healthy.
if you find a lot of critical vulnerabilities it helps your reputation as a security researcher, so there is an incentive to inflate the score of any vulnerability you find.
I’ve started referring to most CVEs as Curriculum Vitae Enhancers. I can’t even remember the last time we had a legitimate problem flagged by the scanner at work compared to the bogus ones. It makes it very difficult to take any of them seriously.
But every field will have problems like that - it doesn’t mean the whole field is a scam.
Depends on the prevalence of such problems. If a sufficient proportion of the field is such bullshit, then the field is bullshit.
I’ll add another example, that I found was almost ubiquitous: bogus password management at work:
First login (on Windows of course), they ask me to set up my password. Okay then: correct battery horse staple
“Your password does not respect our complexity metrics”. Okay, let’s try again, Correct battery horse staple.
“Your password does not respect our complexity metrics”. Fuck you, why don’t you tell me what you want?!? But okay, I can guess: Correct battery horse staple1.
“All good”. At last.
1-3 months later: “you must change your password”. That’s it, I’m going Postal.
It’s been some years now since NIST updated its recommendations to “don’t use complexity rules, just ban the most common passwords” and “don’t force password renewal unless there’s an actual suspicion of a breach”. Yet for some reason the people in charge keep to the old, less secure, more cumbersome way of doing things. This is at best incompetence on a global scale. (I mean, okay, I live in the EU, so the applicable guidelines may be obsolete, but the larger point remains.)
Oh, password management blowing up in people’s faces is one of my favourite examples of best intentions not surviving contact with the real world.
My favourite is from a few years back, in a place that had a particularly annoying rule. When people were onboarded, they were offered a password that they could not change. They could only change a password when it expired.
Now, all passwords issued to new hires had a similar format, they were something like 8 random letters followed by 4 random numbers. Since you couldn’t change it in the first three months, people really did learn them by heart, it was more or less the only way.
When the time to change them came, the system obviously rejected previous passwords. But it also rejected a bunch of other things, like passwords that didn’t have both letters and numbers, or passwords based on a dictionary word – so “correct battery horse staple” didn’t work, I mean, it had not one, but four dictionary words, so it had to be, like, really bad, right?
Most people wouldn’t type in their “favourite” password (since they knew they had to change it soon), and couldn’t come up with something more memorable, so they did the most obvious thing: they used exactly the same password and just incremented the number. So if they’d started out with ETAOINSH-9421, their new password would be ETAOINSH-9422.
Turned out that this was juuuust long enough that most people had real difficulty learning it by heart (especially those who, not being nerds, weren’t used to remembering weird passwords). So most of them kept their onboarding sheet around forever – which contained their password. Well, not forever; they kept it around for a few weeks, after which they just forgot it in a drawer somewhere.
If you got a hold of one of those – it was conveniently dated and all – you could recover their password pretty reliably by just dividing the time since they’d been hired by the password-change interval, adding that to their original password, and there you had it.
It’s been some years now since NIST updated its recommendations to “don’t use complexity rules, just ban the most common passwords” and “don’t force password renewal unless there’s an actual suspicion of a breach”.
I mean, even M$ is saying this nowadays. Nevertheless, our (Czech) beloved Cybernetic Security Agency decided to publish the following recommendations:
minimum password length is 10 characters,
prohibition of using the same password (last 12 passwords),
maximum password validity period is 18 months,
account lockout after 10 invalid password attempts in a row,
So not only do they force expiration, they also introduce a DoS vector. :facepalm:
And they are literally in M$’s pocket, running 100% on their stack, cloud, etc…
Microsoft is weird here; they’re forcing my users to enter a PIN which is apparently as good as their complex password, yet it’s limited to 4-6 characters.
The lord knows how quickly one can blow through the entire keyspace of even a 10-digit number, and numbers aren’t more memorable than phrases. I’m not sure how this is better security, but they are convinced it is, and market it as such.
PINs on Windows machines are backed by the TPM, which will lock itself down in the event that someone is trying to brute-force the PIN. That’s basically the entire point of those PINs: Microsoft basically says “we introduced an extra bit of hardware that protects you from brute force attacks, so your employees can use shorter, more memorable passwords (PINs)”.
Those PINs are actually protecting a (much longer) private key that is stored in the TPM chip itself. The chip then releases this key only if the PIN is correct. You can read more about the whole system here: Windows Hello.
To add to what @tudorr said: the reason that passwords need to be complex is that the threat model is usually an offline attack. If someone grabs a copy of your password database (this happens. A lot.) then they can create rainbow tables for your hash salt and start brute forcing it. If you’re using MD5, which was common until fairly recently, this cracking is very fast on a vaguely modern GPU for short passwords. If you’re using a modern password hash such as one of the Argon2 variants, you can tune it such that each attempt costs 128 MiB of RAM (or more) and a couple of seconds of compute, so attacking it with a GPU is quite hard (you’ll be RAM-limited on parallelisation, and the compute has a load of sequential paths so making it take a second is not too hard; a GPU with 8 GiB of RAM (cheap now) may be able to do 64 hashes per second, and it takes a long time to crack at that speed). Unless you care about attackers willing to burn hundreds of thousands of dollars on GPU cloud compute, you’re probably fine.
The PIN on Windows and most mobile devices is not used in the same way as the password. It is passed to the TPM or other hardware root of trust as one of the inputs to a key derivation function. This is then used to create a private key. When you log in, the OS sends this and a random number to the TPM. The TPM then generates the key with the PIN and some on-TPM secrets (which may include some secure boot attestation, so a different OS presenting the same PIN will generate a different key) and then encrypts the random number with this key. The OS then decrypts it with the public key that it has on file for you. If it matches the number that was sent to the TPM, you have logged in. Some variants also store things like home-directory encryption keys encrypted with this public key and so you can’t read the user’s files unless the TPM has the right inputs to the KDF and decrypts the encryption key. Even if you have root, you can’t access a user’s files until they log in.
If you compromise the password database with PINs, you get the public key. To log in, you need to create the private key that corresponds to the public key associated with the account. This is computationally infeasible.
The set of PINs may be small, but the PIN is only one of the inputs to the KDF. You need to know the others for the PIN to be useful. The TPM (or equivalent) is designed to make it hard to exfiltrate the keys even if you’re willing and able to decap the chips and attack them with a scanning-tunnelling electron microscope. If you can’t do that, they do rate limiting (in the case of TPMs, often by just being slow. Apple’s Secure Element chips implement exponential backoff, so require longer waits after each failed attempt). If you get three guesses and then have to wait a few minutes, and that wait gets longer every failed attempt, even a 4-digit PIN is fine (assuming it isn’t leaked or guessed some other way, such as being your birthday, but in those cases an arbitrarily long PIN is also a problem).
1-3 months later: “you must change your password”. That’s it, I’m going Postal.
I knew someone who hated that rule, but he figured out that the system would forget his previous passwords after ten changes, so every time he was forced to change his password he changed it eleven times, putting his old password back at the end.
Thus (at least partially) defeating the very purpose of the bogus policy to begin with. Oh well, as long as I can just number my password that’s not too bad.
At university I had to change my password on a regular basis. There was no policy requiring it, but my 128-character password would trip up the occasional university system. And the ’ character (or some character) would trip up ASP.NET’s anti-SQL-injection protections, and a couple of sites that I had to use very infrequently were ASP.NET. So I would change my password to something simpler, then change it back.
Eventually they instituted a password history thing, or I just got tired of picking a new password, or something like that. I can’t remember. I just remember sitting there trying over and over again to exhaust the history mechanism. I got up to multiple dozens of password changes before I gave up.
Your analogy works better than you think, but your conclusion is wrong.
The art field is absolutely a scam, and what art professionals do (which is different from artists) defines the field, making it a total con. Same with these security “professionals”.
There are real artists and there are real security researchers. They account for an insignificant share of the activity in their respective industries. Clearly a very important part, of course, otherwise the grifters wouldn’t have anything to grift on. But the dynamics of their industries come from the grifters, not from the researchers or the real artists.
Security is a huge problem so there are a lot of vendors and a lot of people in it. It’s a very immature industry and, frankly, the standards for getting into it are extremely low. Most security analysts will have virtually no ability to read or write code, it’s shocking but it’s true. Their understanding of networking isn’t even particularly strong - your average developer probably has a better understanding.
You’re describing the “bad” that I’ve seen a bit of, but it’s quite the opposite of my personal experiences. At Dropbox, when I was on the security team at least (years ago), we didn’t say things like “no” or “that’s your responsibility”. We had to pass a coding test. We built and maintained software used at the company. When we did security reviews we explained risks and how to mitigate, we never said “no” because it wasn’t our place - risk is a business decision, we just informed the process. Lots of companies operate this way but it takes investment.
Unfortunately the need is high and the bar is low, so this is where we find ourselves.
I would not write off the entire security industry other than a few people as a scam.
Some of us do understand networks. I’ve been doing infosec for about 30 years, and I do agree that most of the auditors and analysts tend not to know enough. It’s very sad to see the complete lack of technical competence within the field. Even with supposed standards such as CISSP etc. I find a lack of understanding. I’m working to change that by teaching the younger folks about networking and intelligent analysis. So I’m doing my bit.
Oh I’ve worked with a ton of security people who know a ton of stuff and are impressively technical. I just mean that the skillset levied at a company can vary wildly. What most security people do bring to the table is information on actual attacks, what attackers go for, an interest in the space, etc. But in terms of specific technical skills the bar is all over the place.
Kinda makes me think computer software business is a huge scam :P The examples read to me as if “business does not care about actual security, but does the bare minimum they can get away with and then is surprised that this isn’t appropriate”
It’s no more BS than almost everything else. Any even moderately sized organization will consist largely of people who are some combination of: completely out of depth, unable to think about nuance, do not care about the result of their actions, selfish and so on. Most people just want to get paid and not get in trouble, and expect the same from other people. It takes a huge determination and wisdom from the leadership to steer any larger group towards desirable outcomes. Governments, corporations, etc are doomed to complete inefficiency for this very reason.
All of this leads me to believe the IT security industry, CVEs included, is entirely bullshit.
It’s unfortunately just not a thing, IMO, that’s well-suited to being commodified, reduced to a set of checklists (though checklists are a helpful tool in the field, to be sure) and turned into an “industry” at all. The people you list are especially good at making exploits harder. There are some people who are doing excellent work at detecting and responding to exploits as well.
Apart from those two areas, which are important, I feel like the big advances to be had are in the study of cognitive psychology (specifically human/machine interaction) but haven’t quite been able to persuade myself to go back to school for that yet and pursue them.
I would argue that checklists being created by people not understanding the subject matter is worse than no checklists. I’ve seen things be objectively made worse because of security check lists.
I think we probably agree, but I’d argue that the problem is that they’re often used to supplant rather than supplement expert judgement.
Two examples I can think of offhand are flight readiness checklists and pre-surgery checklists. Those are often created by managers or hospital administrators (but with expert input) who don’t understand the details of the subject matter the same way as the pilots, mechanics and surgeons who will execute them. And they’ve been shown to reduce errors by the experts who execute them.
What we’re doing with IT security checklists, though, is both having non-experts create them, then having non-experts execute them, then considering them sufficient to declare a system “secure”. A checklist, even created by non-experts, in the hands of someone who well understands the details of the system is helpful. A checklist, no matter who created it, being executed by someone who doesn’t then being signed off by someone who doesn’t, makes things objectively worse.
I was under the impression (mainly from Atul Gawande’s book) that flight checklists and surgical checklists were created mostly by experts. The managers are mostly responsible for requiring that checklists are used.
While I have heard about that book, I haven’t read it. My only experience with flight checklists is from a mechanic’s perspective, so I can’t authoritatively say who created them, but the impression I got was that it was project managers synthesizing input from pilots, mechanics and engineers. Working with hospital administrators, I would definitely argue that the administrators had creative input into the checklists. Not into the technical details, exactly. But they were collecting and curating input with help, because they were responsible for the “institutional” perspective, where the experts who were detailing specific list items were taking a more procedure-specific lens.
The big beneficial input of the managers was certainly requiring that checklists be used at all, and maybe “synthesized” is a better word than “created” in the context of their input to the lists themselves.
I think it’s worth mentioning that air travel is highly standardized and regulated. Commercial flight especially, but even private planes must comply with extensive FAA regulations (and similar outside the US, for most countries).
The world of IT isn’t even close to that level of industrial maturity on a log scale.
That’s a really good point. “Industrial maturity” is the phrase I was grasping for with my first sentence in my first reply, and kind of described but didn’t arrive at.
I think the meaningful difference is that people would push back on actively wrong things on these checklists.
Imagine “poke your finger into the open wound” levels of bad, not just “make sure to wash your hands three times” – where someone would say “we used to do two, and it was fine, but ok, it takes me 1/10 the time of the operation and hurts no one” – versus “make sure to install Norton Antivirus on your server, and no we don’t care if you run OpenBSD”, which is an absolutely horrible take on security that would make everything worse even if it were possible.
I remember one checklist where it said “make sure antivirus X is installed” - the problem is that it made every Linux box work at 20% speed (we measured) when it scanned git checkouts. So we made sure it was installed and not running. Check.
Sending a phishing email to all the employees. Of course somebody fell for it,
Actually, just sending a phishing email to everyone, and then checking who clicked, would be a great thing to do. I work for a tiny company and not everyone is that savvy.
At previous $workplace, we had that cheesy warning in Outlook telling you the mail was from someone outside your organization. Of course when the security team ran a phishing test, they disabled that, and people fell for it. I guess because it’s really easy to spoof a sender that looks like they’re internal? If so, what’s the use of the warning in the first place?
Yes. Well, my point is that the security team disabled one part of security theater to sneakily trap people to click on a link in an email that should reasonably have been flagged as outside the organization if the first piece of theater was active.
In my experience it’s way too noisy to be a useful indicator. I’d estimate that 95% of the emails I get have a yellow WARNING: EXTERNAL EMAIL banner at the top. Every email from Jira, GitLab, Confluence, Datadog, AWS, you name it… they’re all considered “external” by the stupid mail system so they all get the warning. People develop something like ad banner blindness and tune this warning out very quickly.
I suppose this is a benefit to my employer self-hosting Jira, GitLab, Confluence, etc. - their emails all go through internal mail servers, and as such don’t have the “EXTERNAL EMAIL” warning on them (unless they’re not legit)
I tested this when I was at Microsoft. The SPF and DMARC records were correctly set up for microsoft.com, yet an email from my own Microsoft address to me, sent from my private mail server, arrived and was not marked as coming from an external sender. The fact that it failed both SPF and DKIM checks was sufficient for it to go in my spam folder, but real emails from colleagues sometimes went there too so I had to check it every few days.
My partner worked in a large non-tech firm. Their internal software had no granular ACLs. People had global read/write access to everything by default; you just needed to log in. If you managed to compromise an account, you could wreak total havoc on their systems. They had a dedicated security department, but if you didn’t update your computer to the latest Windows version when it was suggested to you, a warning would be sent to your manager…
There are many similar situations here at $WORK. I guess you could say I am the very security department you speak of.
Hypothetically if I were in this scenario you speak of (I’m not, but I am in similar), the thing is we can convince people to install updates - they’re automatic these days anyways.
But we cannot convince internal dev teams that are understaffed and overworked (and cough under qualified) to create some perfect ACLs in the decade old internal crapware they’ve been handed to maintain.
We tell them it’s vulnerable, add it to the ever-growing risk register, make sure assurance is aware, and move on to the next crap heap.
There are just not enough “resources” (i.e. staff, people) around sometimes.
The paper doesn’t say how big their Bloom filters are. You can’t fit a very large Bloom filter in a cache line, certainly not large enough to be useful for speeding up a slow join. As a rough estimate you need about 1 byte of filter for each item for a 1% false positive rate. https://hur.st/bloomfilter/
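For reference, the textbook sizing rule (not from the paper): bits per item m/n = -ln(p) / (ln 2)^2, with optimal hash count k = (m/n)·ln 2. For p = 1% that works out to roughly 9.6 bits ≈ 1.2 bytes per item and k ≈ 7 hash functions, which is where the “about 1 byte per item” rule of thumb comes from.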
I think they mean that the signature fits in a cache line. In the source they say
/* The Bloom filter is a Blob held in a register. Initialize it
** to zero-filled blob of at least 80K bits, but maybe more if the
** estimated size of the table is larger. We could actually
** measure the size of the table at run-time using OP_Count with
** P3==1 and use that value to initialize the blob. But that makes
** testing complicated. By basing the blob size on the value in the
** sqlite_stat1 table, testing is much easier.
*/
And based on the return type of filterHash, I think they are using 64-bit signatures.
Sometimes people also call them “fingerprints”. Personally, I find “truncated hash” the most precise.
In point of fact, most of the time people truncate to some round number of bytes (8, 16, 32, 64 bits, really) because this is very cheap, but it’s actually barely more expensive to truncate to any number of bits smaller than 64 with masks & shifts. Doing so with a back-to-back array of such truncated b-bit numbers in a hash table structure (such as robin hood linear probing, where a hit or miss will likely only induce a nearly guaranteed single cache line miss) lets you construct a filter that has many fewer CPU cache misses than a Bloom filter. It’s really just an existence-set table of fingerprints/signatures/truncated hashes. There are some example numbers and a “calculus analysis” at the bottom of the test program, but it’s all only a few hundred lines of Nim and you could re-do the test in your favorite ProgLang.
Of course, as usual “everything all depends”, but Bloom filters may save you a small multiple (2..4X) of space while probably costing a much bigger multiple (10X) in cache-miss time - unless that space savings is right at some cache-size cliff for your workloads. There are also fancier things you can do, like Cuckoo filters { though those will tend to be harder to get down to just 1 cache miss from 1.2 to 1.4 or so, and tend to rely upon much stronger hashes to avoid bad failure modes }.
“Partitioned bloom filters” allow all hash lookups for a given key to span a small region of memory (say a page, or several cache lines). In practice, you just use a few bits from the key hash to pick a partition where you evaluate the bloom filter’s hash functions.
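Roughly what that looks like (a generic sketch, not SQLite’s or anyone’s production code; the block count, probe count, and hash-bit choices are placeholders):

#include <stdint.h>

/* Partitioned ("blocked") Bloom filter: a few bits of the hash pick one
 * 64-byte block, and all k probes land inside that block, so a query
 * touches a single cache line. */
#define BLOCK_BITS 512                     /* 64 bytes = one cache line */
#define NBLOCKS    (1u << 16)              /* total filter: 4 MiB       */

typedef struct { uint64_t w[BLOCK_BITS / 64]; } block;
static block filter[NBLOCKS];

static void bloom_add(uint64_t h) {
    block *b = &filter[(h >> 48) & (NBLOCKS - 1)];   /* top bits pick block */
    for (int i = 0; i < 4; i++) {                    /* k = 4 probes        */
        unsigned bit = (unsigned)(h >> (i * 9)) & (BLOCK_BITS - 1);
        b->w[bit / 64] |= 1ull << (bit % 64);
    }
}

static int bloom_maybe_contains(uint64_t h) {
    const block *b = &filter[(h >> 48) & (NBLOCKS - 1)];
    for (int i = 0; i < 4; i++) {
        unsigned bit = (unsigned)(h >> (i * 9)) & (BLOCK_BITS - 1);
        if (!(b->w[bit / 64] & (1ull << (bit % 64))))
            return 0;                                /* definitely absent   */
    }
    return 1;                                        /* possibly present    */
}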
One question for the author. The series is called “Pragmatic Category Theory”, but there is no category theory? (e.g. the definition of Semigroup is given in terms of elements instead of morphisms.)
Also, two notes. 1. Arguably this is incorrect:
Associativity is great but it’s not strong enough for MapReduce.
Associativity means that, for MapReduce on N elements, the reduce step takes log(N) time instead of N.
Commutativity would let you start the reduce step before the map step has finished, but whether that is an improvement depends on how variable the map step is.
Often you can compensate for variance in the completion time of the map step by having more jobs than processors/nodes, and dynamically filling nodes that finish early with extra jobs.
Associativity lets you do not only reduce in log(N) time on parallel computers, but also prefix sums.
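A sequential sketch of the combining schedule a parallel reduce exploits - adjacent pairs only, in order, so associativity is all that’s required (never commutativity); there are ceil(log2(N)) levels, and a parallel implementation gives each level its own output buffer so the per-level combines can run concurrently:

#include <stddef.h>

typedef long T;

T tree_reduce(T *a, size_t n, T (*op)(T, T)) {   /* caller ensures n >= 1 */
    while (n > 1) {
        size_t half = n / 2, odd = n % 2;
        for (size_t i = 0; i < half; i++)
            a[i] = op(a[2 * i], a[2 * i + 1]);   /* keeps left-to-right order */
        if (odd)
            a[half] = a[n - 1];                  /* leftover carries upward   */
        n = half + odd;
    }
    return a[0];
}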
With P2169 there is also now a much saner std::ignore that works in structured bindings:
auto [x, _, z] = f();
Which also leads to this fun semantical difference between the two:
auto _ = f(); // (1)
std::ignore = f(); // (2)
(2) throws away the return value immediately, whereas with (1) the returned object stays alive for the remainder of the current scope.
The slightly cursed reason is that for backwards compatibility, _ acts the same as a variable name, except with C++26 you can now redeclare as many _ as you want. But it’s actually really useful for RAII wrappers where you only care about the side effects - a lock guard you hold only for its side effect, for instance.
Oh, I need to put a pin in this! Recently there was an article complaining about an identical semantic difference in Rust, from the angle of “this isn’t spelled out in the spec”
It would be interesting to compare corresponding wordings from the Rust reference and C++ standard! Would the C++ docs be really more clear than the Rust ones?
Searched the document for the word “fast” and it did not turn up.
The thing where dev tooling for other languages is written in Rust and then becomes much much faster… Somebody maybe should be doing that for Rust itself.
The goal of this project is to make it easier to integrate Rust into existing C and C++ projects that are traditionally built with gcc (Linux being the most prominent).
Given that gcc and clang are comparable at compilation speed, with clang maybe being slightly better, I wouldn’t expect this project to improve compilation speed, nor do I believe it should be within scope for this project.
Wow, newlines in filenames being officially deprecated?!
Re. modern C, multithreaded code really needs to target C11 or later for atomics. POSIX now requires C17 support; C17 is basically a bugfix revision of C11 without new features. (hmm, I have been calling it C18 owing to the publication year of the standard, but C23 is published this year so I guess there’s now a tradition of matching nominal years with C++ but taking another year for the ISO approval process…)
Nice improvements to make, and plenty of other good stuff too.
It seems like both C and POSIX have woken up from a multi-decade slumber and are improving much faster than they used to. Have a bunch of old farts retired or something?
I have been calling it C18 owing to the publication year of the standard, but C23 is published this year so I guess there’s now a tradition of matching nominal years with C++
I believe the date is normally the year when the standard is ratified. Getting ISO to actually publish the standard takes an unbounded amount of time and no one cares, because everyone works from the ratified draft.
As a fellow brit, you may be amused to learn that the BSI shut down the BSI working group that fed WG14 this year because all of their discussions were on the mailing list and so they didn’t have the number of meetings that the BSI required for an active standards group. The group that feeds WG21 (of which I am a member) is now being extra careful about recording attendance.
Unfortunately, there were a lot of changes after the final public draft and the document actually being finished. ISO is getting harsher about this and didn’t allow the final draft to be public. This time around people will probably reference the “first draft” of C2y instead, which is functionally identical to the final draft of C23.
There are a bunch of web sites that have links to the free version of each standard. The way to verify that you are looking at the right one is
look at the committee mailings which include a summary of the documents for a particular meeting
look for the editor’s draft and the editor’s comments (two adjacent documents)
the comments will say if the draft is the one you want
Sadly I can’t provide examples because www.open-std.org isn’t working for me right now :-( It’s been unreliable recently, does anyone know what’s going on?
For C23, Cppreference links to N3301, the most recent C2y draft. Unfortunate that the site is down, so we can’t easily check whether all those June 2024 changes were also made to C23. The earlier C2y draft (N3220) only has minor changes listed. Cppreference also links to N3149, the final WD of C23, which is protected by low quality ZIP encryption.
I think for C23 the final committee draft was last year but they didn’t finish the ballot process and incorporating the feedback from national bodies until this summer. Dunno how that corresponds to ISO FDIS and ratification. Frankly, the less users of C and C++ (or any standards tbh) have to know or care about ISO the better.
Re modern C: are there improvements in C23 that didn’t come from either C++ or are standardization of stuff existing implementations have had for ages?
I think the main ones are _BitInt, <stdbit.h>, <stdckdint.h>, #embed
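A minimal illustration of two of those, with syntax per the C23 drafts (checked arithmetic from <stdckdint.h> and a bit-precise integer):

#include <stdckdint.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* <stdckdint.h>: checked arithmetic; ckd_add returns true on overflow. */
    int32_t sum;
    if (ckd_add(&sum, INT32_MAX, 1))
        puts("overflow detected");
    else
        printf("sum = %d\n", sum);

    /* _BitInt(N): an integer of an exact bit width, e.g. a 12-bit counter. */
    unsigned _BitInt(12) counter = 4095uwb;   /* uwb = unsigned bit-precise literal */
    counter += 1;                             /* ends up as 0 (mod 2^12) */
    printf("counter = %u\n", (unsigned)counter);
    return 0;
}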
Generally the standard isn’t the place where innovation should happen, though that’s hard to avoid if existing practice is a load of different solutions for the same problem.
They made realloc(ptr, 0) undefined behaviour. Oh, sorry, you said improvements.
I learned about this yesterday in the discussion of rebasing C++26 on C23 and the discussion from the WG21 folks can be largely summarised as ‘new UB, in a case that’s trivial to detect dynamically? WTF? NO!’. So hopefully that won’t make it back into C++.
realloc(ptr,0) was broken by C99 because since then you can’t tell when NULL is returned whether it successfully freed the pointer or whether it failed to malloc(0).
POSIX has changed its specification so realloc(ptr, 0) is obsolescent so you can’t rely on POSIX to save you. (My links to old versions of POSIX have mysteriously stopped working which is super annoying, but I’m pretty sure the OB markers are new.)
C ought to require that malloc(0) returns NULL and (like it was before C99) realloc(ptr,0) is equivalent to free(ptr). It’s tiresome having to write the stupid wrappers to fix the spec bug in every program.
Maybe C++ can fix it and force C to do the sensible thing and revert to the non-footgun ANSI era realloc().
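For reference, the sort of wrapper being complained about - pin down one behaviour for size == 0 (here the pre-C99 “acts as free” convention) so the rest of the program doesn’t care what the underlying realloc does in that case:

#include <stdlib.h>

void *xrealloc(void *ptr, size_t size) {
    if (size == 0) {        /* old convention: size 0 frees and returns NULL */
        free(ptr);
        return NULL;
    }
    return realloc(ptr, size);
}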
C ought to require that malloc(0) returns NULL and (like it was before C99) realloc(ptr,0) is equivalent to free(ptr). It’s tiresome having to write the stupid wrappers to fix the spec bug in every program.
98% sure some random vendor with a representative via one of the national standards orgs will veto it.
In cases like this it would be really helpful to know who are the bad actors responsible for making things worse, so we can get them to fix their bugs.
It was already UB in practice. I guarantee that there are C++ compilers / C stdlib implementations out there that together will make 99% of C++ programs that do realloc(ptr, 0) have UB.
Not even slightly true. POSIX mandates one of two behaviours for this case, which are largely compatible. I’ve seen a lot of real-world code that is happy with either of those behaviours but does trigger things that are now UB in C23.
But POSIX is not C++. And realloc(ptr, 0) will never be UB with a POSIX-compliant compiler, since POSIX defines the behavior. Compilers and other standards are always free to define things that aren’t defined in the C standard. realloc(ptr, 0) was UB “in practice” for C due to non-POSIX compilers. They could not find any reasonable behavior for it that would work for every vendor. Maybe there just aren’t enough C++ compilers out there for this to actually be a problem for C++, though.
And realloc(ptr, 0) will never be UB with a POSIX-compliant compiler
In general, POSIX does not change the behaviour of compiler optimisations. Compilers are free to optimise based on UB in accordance with the language semantics.
They could not find any reasonable behavior for it that would work for every vendor
Then make it IB, which comes with a requirement that you document what you do, but doesn’t require that you do a specific thing, only that it’s deterministic.
Maybe there just aren’t enough C++ compilers out there for this to actually be a problem for C++, though.
No, the C++ standards committee just has a policy of not introducing new kinds of UB in a place where they’re trivially avoidable.
In general, POSIX does not change the behaviour of compiler optimisations. Compilers are free to optimise based on UB in accordance with the language semantics.
C23 does not constrain implementations when it comes to the behavior of realloc(ptr, 0), but POSIX does. POSIX C is not the same thing as standard C. Any compiler that wants to be POSIX-compliant has to follow the semantics laid out by POSIX. Another example of this is function pointer to void * casts and vice versa. UB in C, but mandated by POSIX.
No, the C++ standards committee just has a policy of not introducing new kinds of UB in a place where they’re trivially avoidable.
They introduced lots of new UB in C++20, so I don’t believe this.
When I’ve tried emulating X86-64 on Apple Silicon using QEMU it’s been incredibly slow, like doing ls took like 1-2 seconds. So if these fine people manage to emulate games then I’m very impressed!
QEMU emulation (TCG) is very slow! Its virtue is that it can run anything on anything, but it’s not useful for productivity or gaming. I used to use it to hack around a FEX RootFS as root, and even just downloading and installing packages with dnf was excruciatingly slow.
Emulators that optimize for performance (such as FEX, box64, and Rosetta, and basically every modern game console emulator too) are in a very different league. Of course, the tradeoff is they only support very specific architecture combinations.
As @lina says, QEMU is general. It works a few instructions at a time, generates an IR (TCG IR, which was originally designed for TCC, which was originally an IOCCC entry), does a small amount of optimisation, and emits the result.
Rosetta 2 works on much larger units but, more importantly, AArch64 was designed to support x86 emulation and it can avoid the intermediate representation entirely. Most x86-64 instructions are mapped to 1-2 instructions. The x86-64 register file is mapped into 16 of the AArch64 registers, with the rest used for emulator state.
Apple has a few additional features that make it easier:
They use some of the reserved bits in the flags register for x86-compatible flags emulation.
They implement a TSO mode, which automatically sets the fence bits on loads and stores.
FEX doesn’t (I think) take advantage of these (or possibly does, but only on Apple hardware?), but even without them it’s quite easy (as in, it’s a lot of engineering work, but each bit of it is easy) to translate x86-64 binaries to AArch64. Arm got a few things wrong but both Apple and Microsoft gave a lot of feedback and newer AArch64 revisions have a bunch of extensions that make Rosetta 2-style emulation easy.
RISC-V’s decision to not have a flags register would make this much harder.
There are two more hardware features: SSE denormal handling (FTZ/DAZ) and a change in SIMD vector handling. Those are standardized as FEAT_AFP in newer ARM architectures, but Apple doesn’t implement the standard version yet. The nonstandard Apple version is not usable in FEX due to a technicality in how they implemented it (they made the switch privileged and global, while FEX needs to be able to switch between modes efficiently, unlike Rosetta, and calling into the kernel would be too slow).
FEX does use TSO mode on Apple hardware though, that’s by far the biggest win and something you can’t just emulate performantly if the hardware doesn’t support it. Replacing all the loads/stores with synchronized ones is both slower and also less flexible (fewer addressing modes) so it ends up requiring more instructions too.
them it’s quite easy […] to translate x86-64 binaries to AArch64
[…]
RISC-V’s decision to not have a flags register would make this much harder.
Dumb question: is there a reason not to always ahead-of-time compile to the native arch anyway?
(i believe that is what RPCS3 does, see the LLVM recompiler option).
As I understand it, that’s more or less what Rosetta 2 does: it hooks into mmap calls and binary translates libraries as they’re loaded. The fact that the mapping is simple means that this can be done with very low latency. It has a separate mode for JIT compilers that works more incrementally. I’m impressed by how well the latter works: the Xilinx tools are Linux Java programs (linked to a bunch of native libraries) and they work very well in Rosetta on macOS, in a Linux VM.
The DynamoRIO work 20 or so years ago showed that JITs can do better by taking advantage of execution patterns. VirtualPC for Mac did this kind of thing to avoid the need to calculate flags (which were more expensive on PowerPC) when they weren’t used. In contrast, Apple Silicon simply makes it sufficiently cheap to calculate the flags that this is not needed.
Rosetta does do this, but you have to support runtime code generation (that has to be able to interact with AOT-generated code) at minimum because of JITs (though ideally a JIT implementation should check to see if it is being translated and not JIT), and also, if you don’t support JIT translating, you can get a huge latency spike/pause when a new library is loaded.
So no matter what you always have to support some degree of runtime codegen/translation, so it’s just a question of can you get enough of a win from an AOT as well as the runtime codegen to justify the additional complexity.
Ignore the trashy title, this is actually really neat. They offload IO to separate threads, which means the main thread now gets commands in batches; it can then interleave the data structure traversals for multiple keys in the batch, making much better use of the memory system’s concurrency.
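The batching idea in generic terms (this is not their code - just a sketch of interleaving hash-table probes so the cache misses overlap; __builtin_prefetch is the GCC/Clang builtin, and the single-probe, direct-mapped table layout is a placeholder):

#include <stddef.h>
#include <stdint.h>

#define NBUCKETS (1u << 20)

typedef struct { uint64_t key; uint64_t value; } slot;

static uint64_t hash64(uint64_t k) { return k * 0x9E3779B97F4A7C15ULL; }

/* Instead of finishing one lookup before starting the next, start the
 * (likely cache-missing) bucket loads for the whole batch first, then
 * probe.  The misses overlap rather than paying for them one by one.
 * Assumes batch <= 64. */
void lookup_batch(const slot *table, const uint64_t *keys,
                  uint64_t *values, size_t batch) {
    size_t idx[64];
    for (size_t i = 0; i < batch; i++) {          /* pass 1: start the loads  */
        idx[i] = hash64(keys[i]) & (NBUCKETS - 1);
        __builtin_prefetch(&table[idx[i]]);
    }
    for (size_t i = 0; i < batch; i++)            /* pass 2: lines now cached */
        values[i] = (table[idx[i]].key == keys[i]) ? table[idx[i]].value : 0;
}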
I wonder whether anyone has tried to fold system load into the concept of time. I.e. time flows slower if the system is under load from other requests.
Oops, that’s a total mistake on our part! I’ve (re)added a reference (it was lost along the way somehow) and included it back in our latter discussion. Thanks for pointing out that omission!
The original mailing-list thread started when someone came back to their workstation to find it magically unlocked: while they were gone, the system had run out of memory and the OOM killer had chosen to kill the xlock process!
If anything, this says more about how badly X fits modern use cases than anything else. There are lots of reasons the locker can die (not only OOM), but the fact that this can “unlock” your desktop is the actually absurd part.
If wayland was good, we’d all be using it by now. It has had so, so much time to prove itself.
My experience with wayland has been:
it’s never the default when I install a popular window manager
every 5 years I see if I should tweak settings to upgrade, and find out that if I do that it’s going to break some core use case, for example gaming, streaming, or even screenshots for god’s sake.
it’s been like 20 years now
I think the creators of Wayland have done us a disservice, by convincing everyone it is the way forward, while not actually addressing all the use cases swiftly and adequately, leaving us all in window manager limbo for two decades.
Maybe my opinion will change when I upgrade to Plasma 6. Although, if you search this page for “wayland” you see a lot of bugs…
it’s never the default when I install a popular window manager
Your information might be outdated: not only is it the default in Plasma 6 and GNOME 46, they’ve actually worked to allow compiling them with zero Xorg support. I believe a lot of distros are now not only enabling it by default but have express plans to no longer ship Xorg at all outside of xwayland.
If wayland was good, we’d all be using it by now. It has had so, so much time to prove itself.
Keep in mind that I gave Wayland as an example of how they should have fixed the issue (e.g.: having a protocol where, if the locker fails, the session is just not opened to everyone).
My experience with Wayland is that it works for my use cases. I can understand the frustration of not working for yours (I had a similar experience 5 years ago, but since switching to Sway 2 years ago it seems finally good enough for me), but this is not a “Wayland is good and X is bad”, it is “X is not designed for modern use cases”.
This is a thread about OS kernels and memory management. There are lots of us who use Linux but don’t need a desktop environment there. With that in mind, please consider saving the Wayland vs X discussion for another thread.
If Linux was any good, we’d all be using it by now.
If Dvorak was any good, we’d all be typing on it by now.
If the metric system was any good, the US would be using it by now.
None of the above examples are perfect, I just want to insist that path dependence is a thing. Wayland, being so different from X, introduced many incompatibilities, so it had much inertia to overcome right from the start. People need clear, substantial, and immediate benefits to consider paying even small switching costs, and Wayland’s switching costs are pretty significant.
Except on the desktop. You could argue it’s just one niche among many, but it remains a bloody visible one.
Dvorak isn’t very good (although I personally use it). Extremely low value-add.
Hmm, you’re effectively saying that no layout is very good, and all have extremely low value-add… I’m not sure I believe that, even if we ignore chording layouts that let stenotypists type faster than human speech: really, we can’t do significantly better than what was effectively a fairly random layout?
The metric system is ubiquitous, including in the US. […]
I call bullshit on this one. Last time I went there it was all about miles and inches and square feet. Scientists may use it, but day to day you’re still stuck with the imperial system, even down to your standard measurements: wires are gauged, your wood is 2 by 4 inches, even your screws use imperial threads.
Oh, and there was this Mars probe that went crashing down because of an imperial/metric mismatch. I guess they converted everything to metric since then, but just think of what it took to do that even for this small, highly technical niche.
That being said…
I think Wayland sucked a lot.
I can believe it did (I don’t have a first hand opinion on this).
Just on a detail, I believe it doesn’t matter to most people what their keyboard layout is, and I’ve wasted a lot of time worrying about it. A basically random one like qwerty is just fine. That doesn’t affect your main point though, especially since the example of stenography layouts is a slam dunk. Many people still do transcription using qwerty, and THAT is crazy path-dependence.
The display protocol used by a system developed in the bazaar style is not just a question of design, but of community support/network effect. It can be the very best thing ever; it doesn’t matter if no client supports it.
Also, the creators of Wayland are the ex-maintainers of X; it’s not like they were not familiar with the problem at hand. You sometimes have to break backwards compatibility for good.
Sure, it’s good to be finally happening. My point is if we didn’t have Wayland distracting us, a different effort could have gotten us there faster. It’s always the poorly executed maybe-solution that prevents the ideal solution from being explored. Still, I’m looking forward to upgrading to Plasma 6 within the next year or so.
Designing Wayland was a massive effort, it wasn’t just the Xorg team going “we got bored of this and now you have to use the new thing”, they worked very closely with DE developers to design something that wouldn’t make the same mistakes Xorg did.
Take the perspective of a client developer chasing after the tumbleweed of ‘protocols’ drifting around and try to answer ‘what am I supposed to implement and use’? To me it looked like a Picasso painting of ill-fitting and internally conflicted ideas. Let this continue a few cycles more and X11 will look clean and balanced by comparison. Someone should propose a desktop icon protocol for the sake of it; then again, someone probably already has.
It might even turn out so well that one of these paths will have a fighting chance against the open desktop being further marginalised as a thin client in the Azure clouded future; nothing more than a silhouette behind unwashed Windows, a virtualized ghost of its former self.
That battle is quickly being lost.
The unabridged story behind Arcan should be written down (and maybe even published) during the coming year or so, as the next thematic shift is around the corner. That will cover how it’s just me but also not. A lot of people have indirectly used the thing without ever knowing, which is my preferred strategy for most things.
Right now another fellow is on his way from another part of Europe for a hackathon in my fort out in the wilderness.
Arcan does look really cool in the demos, and I’d like to try it, but last time I tried to build it I encountered a compilation bug (and submitted a PR to fix it) and I’ve never been able to get any build of it to give me an actually usable DE.
I’m sure it’s possible, but last time I tried I gave up before I worked out how.
I also wasn’t able to get Wayland to work adequately, but I got further and it was more “this runs but is not very good” instead of “I don’t understand how to build or run this”.
Maybe being a massive effort is not actually a good sign.
Arguably Wayland took so long because it decided to fix issues that didn’t need fixing. Did somebody actually care about a rogue program already running on your desktop being able to capture the screen and clipboard?
edit: I mean, I guess so since they put in the effort. It’s just hard for me to fathom.
Was it a massive effort? Its development starting 20 years ago does not equate to a “massive effort,” especially considering that the first 5 years involved a mere handful of people working on it as a hobby. The remainder of the time was spent gaining enough network effect rather than on technical effort.
Sorry, but this appeal to the effort it took to develop wayland is just embarrassing.
Vaporware does not mean good. Au contraire, it usually means terrible design by committee, as is the case with wayland.
Besides, do you know how much effort it took to develop X?
It’s so tiring to keep reading this genre of comment over and over again, especially given that we have crazyloglad in this community utterly deconstructing it every time.
This is true but I do think it was also solved in X. Although there were only a few implementations as it required working around X more than using it.
IIRC GNOME and GDM would coordinate so that when you lock your screen it actually switched back to GDM. This way if anything started or crashed in your session it wouldn’t affect the screen locking. And if GDM crashed it would just restart without granting any access.
That being said it is much simpler in Wayland where the program just declares itself a screen locker and everything just works.
Crashing the locker hasn’t been a very good bypass route in some time now (see e.g. xsecurelock, which is more than 10 years old, I think). xlock, the program mentioned in the original mailing list thread, is literally 1980s software.
X11 screen lockers do have a lot of other problems (e.g. input grabbing) primarily because, unlike Wayland, X11 doesn’t really have a lock protocol, so screen lockers mostly play whack-a-mole with other clients. Technically Wayland doesn’t have one, either, as the session lock protocol is in staging, but I think most Wayland lockers just go with that.
Unfortunately, last time I looked at it Wayland delegated a lot of responsibilities to third-parties, too. E.g. session lock state is usually maintained by the compositor (or in any case, a lot of ad-hoc solutions developed prior to the current session lock protocol did). Years ago, “resilient” schemes that tried to restart the compositor if it crashed routinely suffered from the opposite problem: crashing the screen locker was fine, but if the OOM reaped the compositor, it got unlocked.
I’d have thought there was a fairly narrow space where CPU SIMD mattered in game engines. Most of the places where SIMD will get an 8x speedup, GPU offload will get a 1000x speedup. This is even more true on systems with a unified memory architecture where there’s much less cost in moving data from CPU to GPU. It would be nice for the article to discuss this.
Early SIMD (MMX, 3DNow, SSE1) on mainstream CPUs typically gave a 10-30% speedup in games, but then adding a dedicated GPU more than doubled the resolution and massively increased the rendering quality.
The big advantage of CSV and TSV is that you can edit them in a text editor. If you’re putting non-printing characters in as field separators you lose this. If you don’t need that property then there are a lot of better options up to, and including, sqlite databases.
Though I suppose that it has the advantage of not coming with any meaning pre-loaded into it. Yet. If we use these delimiter tokens for data files then people will be at least slightly discouraged from overloading them in ways that break those files.
A text editor or grep is one of M, and TSV is one of N.
If you have an arbitrary DSV, you’re not really in that world any more – now you need to write your own tools.
FWIW I switched from CSV to TSV, because the format is much simpler. As far as I can tell, there is exactly one TSV format, but multiple different CSV formats in practice. There’s less room for misunderstanding.
grep (GNU grep 3.11) does pass the non-printables through but doesn’t recognise \x1e as a line separator (and has no option to specify that either) which means you get the whole wash of data whatever you search for.
And there are more tools, like head and tail and shuf.
xargs -0 and find -print0 actually have the same problem – I pointed this out somewhere on https://www.oilshell.org
It kind of “infects” into head -0, tail -0, sort -0, … Which are sometimes spelled sort -z, etc.
The Oils solution is “TSV8” (not fully implemented) – basically you can optionally use JSON-style strings within TSV cells.
So head tail grep cat awk cut work for “free”. But if you need to represent something with tabs or with \x1f, you can. (It handles arbitrary binary data, which is a primary rationale for the J8 Notation upgrade of JSON - https://www.oilshell.org/release/latest/doc/j8-notation.html)
I don’t really see the appeal of \x1f because it just “pushes the problem around”.
Now instead of escaping tab, you have to escape \x1f. In practice, TSV works very well for me – I can do nearly 100% of my work without tabs.
If I need them, then there’s TSV8 (or something different like sqlite).
head can be done in awk; the rest likely require converting the newline-terminated output to zero-terminated output and passing it to the zero-terminated version of themselves. With both DSV and zero-terminated commands, I’d make a bunch of aliases and call it a day.
I guess that counts as “writing your own tools”, but I end up turning commonly used commands into functions and scripts anyway, so I don’t see it as a great burden. I guess to each their workflow.
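For what it’s worth, the “write your own tools” step can be very small. Here is a minimal sketch (my own illustration, not any existing tool) that converts \x1e/\x1f-delimited records into TSV so the usual head/tail/grep/awk pipeline can take over; it assumes the data itself never contains tab or newline:

```c
#include <stdio.h>

/* Convert ASCII-delimited data (records separated by \x1e, fields by
 * \x1f) into TSV on stdout.  Illustration only: it assumes the payload
 * never contains tab or newline itself. */
int main(void) {
    int c;
    while ((c = getchar()) != EOF) {
        if (c == 0x1e)
            putchar('\n');      /* record separator -> newline */
        else if (c == 0x1f)
            putchar('\t');      /* field separator  -> tab     */
        else
            putchar(c);
    }
    return 0;
}
```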
Question for Andrew: how do you make this style work in the presence of multithreading? The advantage with having each object be its own memory allocation and accessed through its own pointer is that you can modify it atomically with a copy + atomic pointer swap, and there’s tricks one can do to do fast memory reclamation without going all the way to a GC. This is kinda lost when you coalesce all the objects into a single memory allocation, but maybe there’s another way to handle the problem?
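For reference, a minimal sketch of the copy-plus-atomic-pointer-swap pattern being described (an illustration only, assuming a single writer; safely reclaiming the old copy is exactly the hard part the question alludes to):

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    int x, y;
} config;

/* Assume g_config is set to a heap-allocated initial object at startup. */
static _Atomic(config *) g_config;

/* Readers: one atomic load gives a consistent snapshot of the object. */
const config *config_get(void) {
    return atomic_load_explicit(&g_config, memory_order_acquire);
}

/* Writer (assumed to be a single thread): copy, modify the private copy,
 * then publish it with one atomic store. */
void config_update(int new_x, int new_y) {
    config *old = atomic_load_explicit(&g_config, memory_order_acquire);
    config *fresh = malloc(sizeof *fresh);
    *fresh = *old;          /* copy the current object */
    fresh->x = new_x;       /* modify the private copy */
    fresh->y = new_y;
    atomic_store_explicit(&g_config, fresh, memory_order_release);
    /* old cannot be freed here until all readers are done with it;
     * that is the memory-reclamation problem mentioned above. */
}
```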
Non-cursed answer: you don’t
cursed answer: click if you dare
I wonder, how does this compare to counter-based pRNGs (e.g. Philox)?
Counter-based RNGs make it much easier to calculate skip-ahead states – adding 0,1,2,3,4 is more obvious and faster than two multiplications and an add – so they are a lot easier to vectorize than PCG!
I haven’t looked at counter-based RNGs in detail; I decided a few years ago that PCG is good enough (TM) so I could just use my standard implementation whenever I need some random numbers. But I can’t stop tinkering with it, and occasionally it attacks me with a cunning idea that I need to blog about.
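To make the skip-ahead point concrete, here is a toy counter-based generator. The mixer is a splitmix64-style finaliser, not the real Philox round function, so treat it as an illustration of the interface rather than of Philox itself:

```c
#include <stdint.h>

/* Output i of stream `key` is just mix(key, counter + i), so skipping
 * ahead by n is literally `counter + n`, and each SIMD lane can be
 * filled independently. */
static uint64_t cbrng(uint64_t key, uint64_t counter) {
    uint64_t x = key ^ (counter * 0x9e3779b97f4a7c15ULL);
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Four consecutive outputs, e.g. one per vector lane. */
void cbrng4(uint64_t key, uint64_t counter, uint64_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = cbrng(key, counter + i);   /* skip-ahead is just +i */
}
```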
Reading this makes me wonder, if you did have to come up with a system for shared hosting that was easier to manage nowadays, what would it be? It doesn’t feel like fastCGI is the answer, but is there one?
That’s a very good question, and I want to make CGI version 2 :-)
CGI hasn’t been updated since 1997 apparently - https://en.wikipedia.org/wiki/Common_Gateway_Interface
What I have in mind:
a single-threaded while (1) server, which parses the messy parts of HTTP into structured data for you: URL escaping like %20, query params like x=42&y=99, and so on.
I think it can be useful for newer languages like YSH, Inko, Zig, etc.
They will relieve you of having to write say a multipart/MIME parser, which is very annoying. That can happen in a different process.
Instead you would just use a length-prefixed format for everything, like netstrings
It could also be JSON/JSON8, since those parsers already exist
Actually almost 15 years ago I wrote almost exactly this with https://tnetstrings.info/ , used by the Mongrel 2 web server
My implementation was actually used by a few people, but it wasn’t very good
I still think this is sorely needed… it’s weird that nothing has changed since 1997, but I guess that’s because there’s a lot of money to be made in the cloud, and locking customers in
The 12 factor app by Heroku (linked in post) is probably the only evidence I’ve seen of any hosting provider thinking about protocols and well-specified platforms, in the last 20+ years
If anyone has a nascent programming language that needs “CGI” / web support, feel free to contact me! It would be nice to iron out this protocol with someone who wants to use it
I think it can be deployed easily, because the first step would just be a simple CGI exec wrapper … So it would be a CGI script that execs your program, and hence can be deployed anywhere. This won’t be slow because Unix processes are not slow :-)
Later it can be built into a web server
my previous comments on FastCGI
FastCGI is pointless (2014)
I’d say that proxying is fine and that’s what Heroku does, but it’s also easier to write a CGI script than an HTTP server.
It’s not easier to write a FastCGI script than an HTTP server, because you always need a FastCGI library, which is annoying.
But I think we can preserve the good parts of FastCGI in a “CGI version 2”
There is actually a 3rd dimension I’m thinking of - “lazy” process management, which is similar to the cloud. It’s not just config and the protocol.
Also, WSGI / Rack / PSGI all use CGI-style requests, in process. So there a CGI-like protocol is still very natural for programming languages, more so than HTTP.
Just for reference, CGI has no state, right? I’ve always kinda wondered if there was some trick that could be used that would get us to something kinda like CGI but without paying bootstrap costs per request (though maybe that’s… fine)
Yes, CGI is stateless – it starts a process for every request.
The simplest way to avoid that would be to have the subprocesses run in a while (1) loop as mentioned. So you can reuse your database connections and so forth between requests.
And then a higher level would handle concurrency, e.g. starting 5 of these while (1) processes in parallel, and multiplexing between them.
You will get an occasional “cold start”, which is exactly how “serverless” products work, so that seems fine. Or of course you can always warm them up beforehand.
(FastCGI as deployed on Dreamhost has a weird quirk where it will start 2 processes, but then those 2 processes may each have say 4 threads, which to me seems to make provisioning more complex)
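To make the shape of that concrete, here is a toy sketch of such a while (1) worker. The netstring framing and the response fields are invented for illustration; this is not a proposed spec:

```c
#include <stdio.h>
#include <stdlib.h>

/* A persistent worker: the web server (or a CGI exec wrapper) starts it
 * once, feeds it one request per netstring ("<len>:<payload>,") on stdin,
 * and reads one netstring response back on stdout. */
int main(void) {
    /* one-time setup would go here: database connections, config, ... */
    for (;;) {
        unsigned long len;
        if (scanf("%lu", &len) != 1 || getchar() != ':')
            break;                              /* EOF or framing error */
        char *req = malloc(len + 1);
        if (!req || fread(req, 1, len, stdin) != len || getchar() != ',') {
            free(req);
            break;
        }
        req[len] = '\0';

        /* "handle" the request; a real worker would parse structured
         * fields (method, path, headers, body) out of the payload */
        char body[256];
        int n = snprintf(body, sizeof body,
                         "Status: 200\r\nContent-Type: text/plain\r\n\r\n"
                         "you sent %lu bytes\n", len);

        printf("%d:%s,", n, body);              /* netstring response */
        fflush(stdout);
        free(req);
    }
    return 0;
}
```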
I’m guessing that it’s rare to run into any real-world problems with CGI now that machines have enough memory that they no longer have to hit the disk in order to find the code to run for each request.
It depends on the startup cost of the CGI program. You’ll run into problems at low load if it’s Python or if it’s library-heavy Perl. My link log is a CGI that ingests its entire database of tens of thousands of links on every request, which needed FastCGI when it was Perl; after rewriting with Rust+Serde it is fast enough that I could revert to much simpler CGI. At some point I might have to move it from a flat file to something with indexes…
If you use CGI anyway, can’t you embed the links in the binary at compile time? New links => recompile and swap the binary. Should just work, as there’s no state.
On an emotional level I dislike the idea of spinning up Python and doing parsing, compilation etc on every. Single. Request.
State and cache is “the enemy” for lots of stable systems but it would be neat to have something.
The stylesheet seems broken for me (Upvote icon replaced by black rectangle. Also text seems a deeper/harsher black? I’m on Chromium on Linux).
Confirm, in Chromium on Linux the triangle is now a black rectangle. (Also the border around the text editing box for comments is not showing.)
I get a white rectangle, and it doesn’t change when I upvote something.
Yup. stylesheet is completely broken and upvoting no longer works.
It still works AFAICT, it just isn’t reflected on the UI.
I’m starting to believe more and more that the IT security field is a huge scam.
This is based on anecdotal data:
I used to work at German Fintech. We were about to get a big German bank as a customer, they requested “a security audit”, whatever that meant. My startup looking to save as much money as possible went to some random security company in Germany. The security “experts” ran some off-the-shelf vulnerability scanner on our Python/Flask application, and sent us a “list of vulnerabilities.”
We had a bug where if you accessed something other than / or /login/ while being logged off, you would get a 500 error. This was because your user was None and user.email would raise an AttributeError: 'NoneType' object has no attribute 'email'.
The report was mostly vulnerabilities like:
Imagine this declined for /wp-admin/, and all the other PHP cliché paths. We fixed the bug, and the bank was happy with our security audit…
I used to work at another firm, which had some automaker as a customer, I mean this is Germany after all. They requested to run an audit on our system before sending us any data. What was the audit? Sending a phishing email to all the employees. Of course somebody fell for it, so we had to go through phishing training. (There is a great article about how this is fake security: Don’t include social engineering in penetration tests)
My partner worked in a large non-tech firm. Their internal software has no granular ACL. People had global read/write access to everything by default, you just needed to log in. If you were to manage to compromise an account, you could wreak total havoc on their systems. They had a dedicated security department; if you didn’t update your computer to the latest Windows version when it was suggested to you, you would get a warning sent to your manager…
I was working at a multi-national firm, tech related, and we had to improve a risky system. Users had wide access, and there was severe insider risk. We designed a new solution; it was not fully secure, but it was definitely an improvement over the current state. We wrote down the design and sent it for security review, and the security council was like “no! there are these problems, and those problems. You can’t move forward.” We explained that these problems already existed, and we were open to suggestions for solutions. They told us we were responsible for finding solutions, and blocked the project, thus leaving the current, worse situation as the indefinite solution. Basically, it was not about security, it was about bureaucracy and justifying their jobs… But I still needed to reboot my computer every X days for security reasons…
All of this leads me to believe IT security industry, such as CVEs, is entirely bullshit. I’m convinced that the only people doing real security are people like Theo de Raadt and Colin Percival…
Selecting examples of bad approaches doesn’t work. It’s like saying art is a scam because there’s lots of terrible quality work created on Fiverr. It’s sure there, but that doesn’t define the field itself. It’s a problem that companies can’t easily identify good quality security consultants. It’s a problem that PCI, SOX, etc are silly checkbox exercises. It’s a problem that standards are hard to apply to very custom internal solutions. It’s a problem that security testing is hard. But every field will have problems like that - it doesn’t mean the whole field is a scam.
The problem is that if (my guess) 90% of people only have experiences with audits like the ones described, then you can try to argue these are just bad examples, but if it’s the majority it’s a structural problem. And I think it is.
When I worked at a bank we also had mandatory security “checks” (not a proper audit by an external company) and 90% of the findings were simply bogus or non-exploitable, leaning mostly towards ridiculous. The more regulated the market the more it just attracts charlatans who know nothing except to check boxes.
In every industry there are good people doing good work, that doesn’t make the industry not bullshit overall.
This! Thank you. This was exactly my point.
One can take the Multi-Level-Marketing industry, as an example, which is basically a scam, and I’m pretty sure one can find a few multi-level-marketing companies which actually focus on selling products instead of running a pyramid scheme of salespeople. But one cannot use these isolated cases to dismiss the scam-ish behavior of MLMs as a whole.
If people have paid attention to the previous stories from Daniel (= the author of this story), he has been trying to fight windmills for years because people are gaming the CVE mechanism for their own benefits. (i.e. if you find a lot of critical vulnerabilities it helps your reputation as a security researcher, so there is an incentive to inflate the score of any vulnerability you find.)
I’m reading these stories, plugging in my anecdotal experiences with the industry, and folks go “no you can’t just throw a rotten industry under the bus, there are some competent people”. I’m sure there are some smart and well-intentioned people. In fact, I mentioned two: Theo de Raadt and Colin Percival. I’m convinced there are more, but this doesn’t mean the industry is healthy.
I’ve started referring to most CVEs as Curriculum Vitae Enhancers. I can’t even remember the last time we had a legitimate problem flagged by the scanner at work compared to the bogus ones. It makes it very difficult to take any of them seriously.
Depends on the prevalence of such problems. If a sufficient proportion of the field is such bullshit, then the field is bullshit.
I’ll add another example, that I found was almost ubiquitous: bogus password management at work:
“correct battery horse staple”, then “Correct battery horse staple.”, then “Correct battery horse staple1.”
It’s been some years now since NIST updated their recommendations to “don’t use complexity rules, just ban the most common passwords” and “don’t force password renewal unless there’s an actual suspicion of a breach”. Yet for some reason the people in charge keep to the old, less secure, more cumbersome way of doing things. This is at best incompetence at a global scale. (I mean, okay, I live in the EU, so applicable guidelines may be obsolete, but the larger point remains.)
Oh, password management blowing up in people’s faces is one of my favourite types of best intentions just not surviving contact with the real world.
My favourite is from a few years back, in a place that had a particularly annoying rule. When people were onboarded, they were offered a password that they could not change. They could only change a password when it expired.
Now, all passwords issued to new hires had a similar format, they were something like 8 random letters followed by 4 random numbers. Since you couldn’t change it in the first three months, people really did learn them by heart, it was more or less the only way.
When the time to change them came, the system obviously rejected previous passwords. But it also rejected a bunch of other things, like passwords that didn’t have both letters and numbers, or passwords based on a dictionary word – so “correct battery horse staple” didn’t work, I mean, it had not one, but four dictionary words, so it had to be, like, really bad, right?
Most people wouldn’t type in their “favourite” password (since they knew they had to change it soon), and couldn’t type something more memorable, so they did the most obvious thing: they used exactly the same password and just incremented the number. So if they’d started out with ETAOINSH-9421, their new password would be ETAOINSH-9422.
Turned out that this was juuuust long enough that most people had real difficulty learning it by heart (especially those who, not being nerds, weren’t used to remembering weird passwords). So most of them kept their onboarding sheet around forever – which contained their password. Well, not forever; they kept it around for a few weeks, after which they just forgot it in a drawer somewhere.
If you got a hold of one of those – it was conveniently dated and all – you could compromise their password pretty reliably by just dividing the time between their hire date and now by the password change interval, adding that to their original password, and there you had it.
I mean even M$ is saying this nowadays. Nevertheless, our (Czech) beloved Cybernetic Security Agency decided to publish the following recommendation:
So not only do they force expiration, but they also introduce a DoS vector. :facepalm:
And they are literally in M$ pocket, running 100% on their stack, cloud, etc…
Microsoft is weird here; they’re forcing my users to enter a PIN which is apparently as good as their complex password, yet it’s limited to 4-6 characters.
The lord knows how quickly one can blow through the entire keyspace of even a 10-digit number, and numbers aren’t more memorable than phrases. I’m not sure how this is better security, but they are convinced it is, and market it as such.
PINs on Windows machines are backed by the TPM, which will lock itself down in the event that someone is trying to brute-force the PIN. That’s basically the entire point of those PINs: Microsoft basically says “we introduced an extra bit of hardware that protects you from brute force attacks, so your employees can use shorter, more memorable passwords (PINs)”.
Those PINs are actually protecting a (much longer) private key that is stored in the TPM chip itself. The chip then releases this key only if the PIN is correct. You can read more about the whole system here: Windows Hello.
To add to what @tudorr said: the reason that passwords need to be complex is that the threat model is usually an offline attack. If someone grabs a copy of your password database (this happens. A lot.) then they can create rainbow tables for your hash salt and start brute forcing it. If you’re using MD5, which was common until fairly recently, this cracking is very fast on a vaguely modern GPU for short passwords. If you’re using a modern password hash such as one of the Argon2 variants, you can tune it such that each attempt costs 128 MiB of RAM (or more) and a couple of seconds of compute, so attacking it with a GPU is quite hard (you’ll be RAM limited on parallelisation, and the compute has a load of sequential paths so making it take a second is not too hard; a GPU with 8 GiB of RAM (cheap now) may be able to do 64 hashes per second, and it takes a long time to crack at that speed). Unless you care about attackers willing to burn hundreds of thousands of dollars on GPU cloud compute, you’re probably fine.
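As a concrete illustration of that tuning, here is a small sketch using libsodium’s Argon2id wrapper; the opslimit/memlimit values are only an example of “128 MiB and a noticeable amount of compute per guess”, not a recommendation:

```c
#include <sodium.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (sodium_init() < 0)
        return 1;

    const char *password = "example password only";
    char hash[crypto_pwhash_STRBYTES];

    /* 3 passes over 128 MiB of memory per hashing attempt */
    if (crypto_pwhash_str(hash, password, strlen(password),
                          3, 128 * 1024 * 1024) != 0)
        return 1;                       /* out of memory */

    printf("%s\n", hash);

    /* verification pays the same cost, which is the point */
    return crypto_pwhash_str_verify(hash, password, strlen(password)) == 0
               ? 0 : 1;
}
```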
The PIN on Windows and most mobile devices is not used in the same way as the password. It is passed to the TPM or other hardware root of trust as one of the inputs to a key derivation function. This is then used to create a private key. When you log in, the OS sends this and a random number to the TPM. The TPM then generates the key with the PIN and some on-TPM secrets (which may include some secure boot attestation, so a different OS presenting the same PIN will generate a different key) and then encrypts the random number with this key. The OS then decrypts it with the public key that it has on file for you. If it matches the number that was sent to the TPM, you have logged in. Some variants also store things like home-directory encryption keys encrypted with this public key and so you can’t read the user’s files unless the TPM has the right inputs to the KDF and decrypts the encryption key. Even if you have root, you can’t access a user’s files until they log in.
If you compromise the password database with PINs, you get the public key. To log in, you need to create the private key that corresponds to the public key associated with the account. This is computationally infeasible.
The set of PINs may be small, but the PIN is only one of the inputs to the KDF. You need to know the others for the PIN to be useful. The TPM (or equivalent) is designed to make it hard to exfiltrate the keys even if you’re willing and able to decap the chips and attack them with a scanning-tunnelling electron microscope. If you can’t do that, they do rate limiting (in the case of TPMs, often by just being slow. Apple’s Secure Element chips implement exponential backoff, so require longer waits after each failed attempt). If you get three guesses and then have to wait a few minutes, and that wait gets longer every failed attempt, even a 4-digit PIN is fine (assuming it isn’t leaked or guessed some other way, such as being your birthday, but in those cases an arbitrarily long PIN is also a problem).
I knew someone that hated that rule, but he figured out that the system would forget his previous passwords after ten changes, so every time he needed to change it he changed his password eleven times, putting his old password back at the end.
Thus (at least partially) defeating the very purpose of the bogus policy to begin with. Oh well, as long as I can just number my password that’s not too bad.
At university I had to change my password on a regular basis. There was no requirement, but my 128 character password would trip up the occasional university system. And the ’ character (or some character) would trip up ASP.NET’s anti-SQL injection protections, and a couple of sites that I had to use very infrequently were ASP.NET. So I would change my password to something simpler, then change it back.
Eventually they instituted a password history thing, or I just got tired of picking a new password, or something like that. I can’t remember. I just remember sitting there trying over and over again to exhaust the history mechanism. I got up to multiple dozens of password changes before I gave up.
Your analogy works better than you think, but your conclusion is wrong.
The art field is absolutely a scam, and what art professionals do (which is different from artists) defines the field, making it a total con. Same with these security “professionals”.
There are real artists and there are real security researchers. They make for an insignificant amount of activity in their respective industries. Clearly a very important part, of course, otherwise the grifters wouldn’t have anything to grift on. But the dynamics of their industries come from the grifters, not from the researchers or the real artists.
It is mostly compliance theater, so one cannot be sued. Nobody really cares, just make sure all boxes are ticked.
Security is a huge problem so there are a lot of vendors and a lot of people in it. It’s a very immature industry and, frankly, the standards for getting into it are extremely low. Most security analysts will have virtually no ability to read or write code, it’s shocking but it’s true. Their understanding of networking isn’t even particularly strong - your average developer probably has a better understanding.
You’re describing the “bad” that I’ve seen a bit of, but it’s quite the opposite of my personal experiences. At Dropbox, when I was on the security team at least (years ago), we didn’t say things like “no” or “that’s your responsibility”. We had to pass a coding test. We built and maintained software used at the company. When we did security reviews we explained risks and how to mitigate, we never said “no” because it wasn’t our place - risk is a business decision, we just informed the process. Lots of companies operate this way but it takes investment.
Unfortunately the need is high and the bar is low, so this is where we find ourselves.
I would not write off the entire security industry other than a few people as a scam.
Some of us do understand networks. I’ve been doing infosec for about 30 years, and I do agree that most of the auditors and folks that are analysts tend not to know enough. It’s very sad to see the complete lack of technical competence within the field. Even with supposed standards such as CISSP etc. I find a lack of understanding. I’m working to change that: I teach the younger folks about networking and intelligent analysis. So I’m doing my bit.
Oh I’ve worked with a ton of security people who know a ton of stuff and are impressively technical. I just mean that the skillset levied at a company can vary wildly. What most security people do bring to the table is information on actual attacks, what attackers go for, an interest in the space, etc. But in terms of specific technical skills the bar is all over the place.
Kinda makes me think computer software business is a huge scam :P The examples read to me as if “business does not care about actual security, but does the bare minimum they can get away with and then is surprised that this isn’t appropriate”
Isn’t that how businesses do everything, though?
It’s no more BS than almost everything else. Any even moderately sized organization will consist largely of people who are some combination of: completely out of depth, unable to think about nuance, do not care about the result of their actions, selfish and so on. Most people just want to get paid and not get in trouble, and expect the same from other people. It takes a huge determination and wisdom from the leadership to steer any larger group towards desirable outcomes. Governments, corporations, etc are doomed to complete inefficiency for this very reason.
It’s unfortunately just not a thing, IMO, that’s well-suited to being commodified, reduced to a set of checklists (though checklists are a helpful tool in the field, to be sure) and turned into an “industry” at all. The people you list are especially good at making exploits harder. There are some people who are doing excellent work at detecting and responding to exploits as well.
Apart from those two areas, which are important, I feel like the big advances to be had are in the study of cognitive psychology (specifically human/machine interaction) but haven’t quite been able to persuade myself to go back to school for that yet and pursue them.
I would argue that checklists being created by people not understanding the subject matter is worse than no checklists. I’ve seen things be objectively made worse because of security check lists.
I think we probably agree, but I’d argue that the problem is that they’re often used to supplant rather than supplement expert judgement.
Two examples I can think of offhand are flight readiness checklists and pre-surgery checklists. Those are often created by managers or hospital administrators (but with expert input) who don’t understand the details of the subject matter the same way as the pilots, mechanics and surgeons who will execute them. And they’ve been shown to reduce errors by the experts who execute them.
What we’re doing with IT security checklists, though, is both having non-experts create them, then having non-experts execute them, then considering them sufficient to declare a system “secure”. A checklist, even created by non-experts, in the hands of someone who well understands the details of the system is helpful. A checklist, no matter who created it, being executed by someone who doesn’t then being signed off by someone who doesn’t, makes things objectively worse.
I was under the impression (mainly from Atul Gawande’s book) that flight checklists and surgical checklists were created mostly by experts. The managers are mostly responsible for requiring that checklists are used.
While I have heard about that book, I haven’t read it. My only experience with flight checklists is from a mechanic’s perspective, so I can’t authoritatively say who created them, but the impression I got was that it was project managers synthesizing input from pilots, mechanics and engineers. Working with hospital administrators, I would definitely argue that the administrators had creative input into the checklists. Not into the technical details, exactly. But they were collecting and curating input with help, because they were responsible for the “institutional” perspective, where the experts who were detailing specific list items were taking a more procedure-specific lens.
The big beneficial input of the managers was certainly requiring that checklists be used at all, and maybe “synthesized” is a better word than “created” in the context of their input to the lists themselves.
I think it’s worth mentioning that air travel is highly standardized and regulated. Commercial flight especially, but even private planes must comply with extensive FAA regulations (and similar outside the US, for most countries).
The world of IT isn’t even close to that level of industrial maturity on a log scale.
That’s a really good point. “Industrial maturity” is the phrase I was grasping for with my first sentence in my first reply, and kind of described but didn’t arrive at.
I think the meaningful difference is that people would push back on actively wrong things on these checklists.
Imagine “poke your finger into the open wound” levels of bad, not just “make sure to wash your hands three times” where someone would say “we used to do two, and it was fine, but ok, it takes me 1/10 the time of the operation and hurts no one”, versus “make sure to install Norton Antivirus on your server, and no we don’t care if you run OpenBSD” - which is “an absolutely horrible take on security that would make everything worse even if it was possible”.
I remember one check list where it said “make sure antivirus X is installed” - the problem is that it made every linux box work at 20% speed (we measured) when it scanned git checkouts. So we made sure it was installed and not running. Check.
I don’t think the field is scam, however, some people working in IT security are scammers or amateurs.
Actually, just sending a phishing email to everyone, and then checking who clicked, would be a great thing to do. I work for a tiny company and not everyone is that savvy.
At previous $workplace, we had that cheesy warning in Outlook telling you the mail was from someone outside your organization. Of course when the security team ran a phishing test, they disabled that, and people fell for it. I guess because it’s really easy to spoof a sender that looks like they’re internal? If so, what’s the use of the warning in the first place?
Your mail server should reject mail from its domain that’s coming from other servers, let alone DKIM and SPF.
Yes. Well, my point is that the security team disabled one part of security theater to sneakily trap people to click on a link in an email that should reasonably have been flagged as outside the organization if the first piece of theater was active.
I don’t think it’s theater. If the detection works as intended, it’s a very good indicator of potential phishing.
In my experience it’s way too noisy to be a useful indicator. I’d estimate that 95% of the emails I get have a yellow WARNING: EXTERNAL EMAIL banner at the top. Every email from Jira, GitLab, Confluence, Datadog, AWS, you name it… they’re all considered “external” by the stupid mail system so they all get the warning. People develop something like ad banner blindness and tune this warning out very quickly.
Also helps that it’s the first line in an email, and we’re already trained to skip right past the “Dear So and so,”
I suppose this is a benefit to my employer self-hosting Jira, GitLab, Confluence, etc. - their emails all go through internal mail servers, and as such don’t have the “EXTERNAL EMAIL” warning on them (unless they’re not legit)
I tested this when I was at Microsoft. The SPF and DMARC records were correctly set up for microsoft.com, yet an email from my own Microsoft address to me, sent from my private mail server, arrived and was not marked as coming from an external sender. The fact that it failed both SPF and DKIM checks was sufficient for it to go in my spam folder, but real emails from colleagues sometimes went there too so I had to check it every few days.
There are many similar situations here at $WORK. I guess you could say I am the very security department you speak of. Hypothetically if I were in this scenario you speak of (I’m not, but I am in similar), the thing is we can convince people to install updates - they’re automatic these days anyways.
But we cannot convince internal dev teams that are understaffed and overworked (and, cough, underqualified) to create some perfect ACLs in the decade-old internal crapware they’ve been handed to maintain.
We tell them it’s vulnerable, add it to the ever-growing risk register, make sure assurance is aware, and move on to the next crap heap.
There are just not enough “resources” (i.e. staff, people) around sometimes.
Apparently bloom filters can fit in a cache line. That’s pretty interesting. Any particular implementations?
The paper doesn’t say how big their Bloom filters are. You can’t fit a very large bloom filter in a cache line, certainly not large enough to be useful for speeding up a slow join. As a rough estimate you need about 1 byte of filter for each item for a 1% false positive rate. https://hur.st/bloomfilter/
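For reference, the usual sizing formula is bits-per-item ≈ -ln(p) / (ln 2)² ≈ 1.44 · log₂(1/p); for p = 1% that is roughly 9.6 bits, i.e. a bit over one byte per item, which is where that rule of thumb comes from.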
I think they mean that the signature fits in a cache line. In the source they say
And based on the return type of filterHash, I think they are using 64-bit signatures.
I think they mean register of the sqlite virtual machine (see here).
What do you mean by “signature”?
The (lossy) hash of each row. Just borrowing terminology from the postgres documentation.
Sometimes people also call them “fingerprints”. Personally, I find “truncated hash” the most precise.
In point of fact, most of the time people truncate to some round number of bytes (8, 16, 32, 64 bits, really) because this is very cheap, but it’s actually barely more expensive to truncate to any number of bits smaller than 64 with masks & shifts. Doing so with a back-to-back array of such truncated b-bit numbers in a hash table structure (such as robin hood linear probing, where a hit or miss will likely only induce a nearly guaranteed single cache line miss) lets you construct a filter that has many fewer CPU cache misses than a Bloom filter. It’s really just an existence set table of fingerprints/signatures/truncated hashes. There are some example numbers and a “calculus analysis” at the bottom of the test program, but it’s all only a few hundred lines of Nim and you could re-do a test in your favorite ProgLang.
Of course, as usual “everything all depends”, but Bloom filters may save you a small multiple (2..4X) of space, but probably cost a much bigger multiple (10X) of cache-miss time - unless that space savings is right at some cache size cliff for your workloads. There are also fancier things you can do, like Cuckoo filters { though that will tend to be harder to get down to just 1 cache miss from 1.2 to 1.4 or so and tend to rely upon much stronger hashes to avoid bad failure modes }.
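A minimal sketch of that kind of fingerprint existence set (plain linear probing rather than robin hood, with sizes chosen arbitrarily for illustration, not taken from the test program mentioned above):

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint16_t *fp;    /* 0 means empty; fingerprints are forced nonzero */
    size_t    mask;  /* table size - 1; table size is a power of two   */
} fpset;

static uint64_t mix64(uint64_t x) {            /* splitmix64 finaliser */
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

fpset fpset_new(size_t pow2_slots) {
    fpset s = { calloc(pow2_slots, sizeof(uint16_t)), pow2_slots - 1 };
    return s;
}

/* Caller must keep the table well below 100% full. */
void fpset_add(fpset *s, uint64_t key) {
    uint64_t h = mix64(key);
    uint16_t f = (uint16_t)(h >> 48) | 1;      /* never zero */
    size_t   i = h & s->mask;
    while (s->fp[i] != 0 && s->fp[i] != f)     /* linear probe */
        i = (i + 1) & s->mask;
    s->fp[i] = f;
}

/* Like a Bloom filter: false positives possible, false negatives not. */
int fpset_maybe_contains(const fpset *s, uint64_t key) {
    uint64_t h = mix64(key);
    uint16_t f = (uint16_t)(h >> 48) | 1;
    for (size_t i = h & s->mask; s->fp[i] != 0; i = (i + 1) & s->mask)
        if (s->fp[i] == f)
            return 1;
    return 0;
}
```

A hit or miss usually touches a single cache line because the slot index and the fingerprint both come from one 8-byte hash and neighbouring slots are contiguous in memory.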
“Partitioned bloom filters” allow all hash lookups for a given key to span a small region of memory (say a page, or several cache lines). In practice, you just use a few bits from the key hash to pick a partition where you evaluate the bloom filter’s hash functions.
Not exactly a cache, but a somewhat similar case https://bitcoinops.org/en/topics/transaction-bloom-filtering/
One question for the author. The series is called “Pragmatic Category Theory”, but there is no category theory? (e.g. the definition of Semigroup is given in terms of elements instead of morphisms.)
Also, two notes. 1. Arguably this is incorrect:
Associativity means that, for MapReduce on N elements, the reduce step is going to take log(N) time instead of N. Commutativity would let you start the reduce step before the map step has finished, but whether that is an improvement depends on how variable the map step is. Often you can compensate for variance in the completion time of the map step by having more jobs than processors/nodes, and dynamically filling nodes that finish early with extra jobs.
2. Not only reduction can be done in log(N) time on parallel computers, but also prefix sums.
With P2169 there is also now a much saner std::ignore that works in structured bindings:
Which also leads to this fun semantical difference between the two:
(2) will throw away the return value, whereas (1) will stay alive for the remainder of the current scope.
The slightly cursed reason is that for backwards compatibility, _ acts the same as a variable name, except with C++26 you can now redeclare as many _ as you want. But it’s actually really useful for RAII wrappers where you only care about the side effects, e.g.:
Oh, I need to put a pin in this! Recently there was an article complaining about an identical semantic difference in Rust, from the angle of “this isn’t spelled out in the spec”
It would be interesting to compare corresponding wordings from the Rust reference and C++ standard! Would the C++ docs be really more clear than the Rust ones?
I would love to see a link to that Rust article!
For the C++ side, you can see the wording for the draft here.
https://lobste.rs/s/zmz8zv/rust_needs_official_specification
Seems that _ works like in Python.
Searched the document for the word “fast” and it did not turn up.
The thing where dev tooling for other languages is written in Rust and then becomes much much faster… Somebody maybe should be doing that for Rust itself.
The goal of this project is to make it easier to integrate Rust into existing C and C++ projects that are traditionally built with gcc (Linux being the most prominent).
Given that gcc and clang are comparable at compilation speed, with clang maybe being slightly better, I wouldn’t expect this project to improve compilation speed, nor do I believe it should be within scope for this project.
Wow, newlines in filenames being officially deprecated?!
Re. modern C, multithreaded code really needs to target C11 or later for atomics. POSIX now requires C17 support; C17 is basically a bugfix revision of C11 without new features. (hmm, I have been calling it C18 owing to the publication year of the standard, but C23 is published this year so I guess there’s now a tradition of matching nominal years with C++ but taking another year for the ISO approval process…)
Nice improvements to make, and plenty of other good stuff too.
It seems like both C and POSIX have woken up from a multi-decade slumber and are improving much faster than they used to. Have a bunch of old farts retired or something?
Even in standard naming they couldn’t avoid off by 1 error. ¯\_(ツ)_/¯
I believe the date is normally the year when the standard is ratified. Getting ISO to actually publish the standard takes an unbounded amount of time and no one cares because everyone works from the ratified draft.
As a fellow brit, you may be amused to learn that the BSI shut down the BSI working group that fed WG14 this year because all of their discussions were on the mailing list and so they didn’t have the number of meetings that the BSI required for an active standards group. The group that feeds WG21 (of which I am a member) is now being extra careful about recording attendance.
Unfortunately, there were a lot of changes after the final public draft and the document actually being finished. ISO is getting harsher about this and didn’t allow the final draft to be public. This time around people will probably reference the “first draft” of C2y instead, which is functionally identical to the final draft of C23.
There are a bunch of web sites that have links to the free version of each standard. The way to verify that you are looking at the right one is
Sadly I can’t provide examples because www.open-std.org isn’t working for me right now :-( It’s been unreliable recently, does anyone know what’s going on?
Or just look at cppreference …
https://en.cppreference.com/w/cpp/language/history
https://en.cppreference.com/w/c/language/history
For C23, Cppreference links to N3301, the most recent C2y draft. Unfortunate that the site is down, so we can’t easily check whether all those June 2024 changes were also made to C23. The earlier C2y draft (N3220) only has minor changes listed. Cppreference also links to N3149, the final WD of C23, which is protected by low quality ZIP encryption.
I think most of open-std is available via the Archive, e.g. here is N3301: https://web.archive.org/web/20241002141328/https://open-std.org/JTC1/SC22/WG14/www/docs/n3301.pdf
For C23 the documents are
I think for C23 the final committee draft was last year but they didn’t finish the ballot process and incorporating the feedback from national bodies until this summer. Dunno how that corresponds to ISO FDIS and ratification. Frankly, the less users of C and C++ (or any standards tbh) have to know or care about ISO the better.
Re modern C: are there improvements in C23 that didn’t come from either C++ or are standardization of stuff existing implementations have had for ages?
It’s best to watch the standard editor’s blog and sometimes twitter for this information.
https://thephd.dev/
https://x.com/__phantomderp
I think the main ones are _BitInt, <stdbit.h>, <stdckdint.h>, #embed
Generally the standard isn’t the place where innovation should happen, though that’s hard to avoid if existing practice is a load of different solutions for the same problem.
They made realloc(ptr, 0) undefined behaviour. Oh, sorry, you said improvements.
I learned about this yesterday in the discussion of rebasing C++26 on C23, and the discussion from the WG21 folks can be largely summarised as ‘new UB, in a case that’s trivial to detect dynamically? WTF? NO!’. So hopefully that won’t make it back into C++.
realloc(ptr,0) was broken by C99 because since then you can’t tell when NULL is returned whether it successfully freed the pointer or whether it failed to malloc(0).
POSIX has changed its specification so realloc(ptr, 0) is obsolescent so you can’t rely on POSIX to save you. (My links to old versions of POSIX have mysteriously stopped working which is super annoying, but I’m pretty sure the OB markers are new.)
C ought to require that malloc(0) returns NULL and (like it was before C99) realloc(ptr,0) is equivalent to free(ptr). It’s tiresome having to write the stupid wrappers to fix the spec bug in every program.
Maybe C++ can fix it and force C to do the sensible thing and revert to the non-footgun ANSI era realloc().
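The wrapper in question is tiny. A sketch that restores the old “size 0 frees and returns NULL” behaviour and treats allocation failure as fatal (my illustration, not code from any particular project):

```c
#include <stdlib.h>

void *xrealloc(void *ptr, size_t size) {
    if (size == 0) {            /* ANSI-era behaviour: just free */
        free(ptr);
        return NULL;
    }
    void *p = realloc(ptr, size);
    if (p == NULL)
        abort();                /* or report the error however you prefer */
    return p;
}
```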
98% sure some random vendor with a representative via one of the national standards orgs will veto it.
In cases like this it would be really helpful to know who are the bad actors responsible for making things worse, so we can get them to fix their bugs.
Alas, I don’t know. I’ve just heard from people on the C committee that certain things would be vetoed by certain vendors.
Oh good grief, it looks like some of the BSDs did not implement C89 properly, and failed to implement realloc(ptr, 0) as free(ptr) as they should have
FreeBSD 2.2 man page / phkmalloc source
OpenBSD also used phkmalloc; NetBSD’s malloc was conformant with C89 in 1999.
It was already UB in practice. I guarantee that there are C++ compilers / C stdlib implementations out there that together will make 99% of C++ programs that do realloc(ptr, 0) have UB.
Not even slightly true. POSIX mandates one of two behaviours for this case, which are largely compatible. I’ve seen a lot of real-world code that is happy with either of those behaviours but does trigger things that are now UB in C23.
But POSIX is not C++. And realloc(ptr, 0) will never be UB with a POSIX-compliant compiler, since POSIX defines the behavior. Compilers and other standards are always free to define things that aren’t defined in the C standard. realloc(ptr, 0) was UB “in practice” for C due to non-POSIX compilers. They could not find any reasonable behavior for it that would work for every vendor. Maybe there just aren’t enough C++ compilers out there for this to actually be a problem for C++, though.
In general, POSIX does not change the behaviour of compiler optimisations. Compilers are free to optimise based on UB in accordance with the language semantics.
Then make it IB, which comes with a requirement that you document what you do, but doesn’t require that you do a specific thing, only that it’s deterministic.
No, the C++ standards committee just has a policy of not introducing new kinds of UB in a place where they’re trivially avoidable.
C23 does not constrain implementations when it comes to the behavior of realloc(ptr, 0), but POSIX does. POSIX C is not the same thing as standard C. Any compiler that wants to be POSIX-compliant has to follow the semantics laid out by POSIX. Another example of this is function pointer to void * casts and vice versa. UB in C, but mandated by POSIX.
They introduced lots of new UB in C++20, so I don’t believe this.
It doesn’t really list the integers in question, but an interesting article anyway!
“Integer multiplies” in this context means “x64 instruction that treats register content as integers and, among other things, multiplies them”.
When I’ve tried emulating X86-64 on Apple Silicon using QEMU it’s been incredibly slow, like doing ls took like 1-2 seconds. So if these fine people manage to emulate games then I’m very impressed!
QEMU emulation (TCG) is very slow! Its virtue is that it can run anything on anything, but it’s not useful for productivity or gaming. I used to use it to hack around a FEX RootFS as root, and even just downloading and installing packages with dnf was excruciatingly slow.
Emulators that optimize for performance (such as FEX, box64, and Rosetta, and basically every modern game console emulator too) are in a very different league. Of course, the tradeoff is they only support very specific architecture combinations.
As @lina says, QEMU is general. It works a few instructions at a time, generates an IR (TCG IR, which was originally designed for TCC, which was originally an IOCCC entry), does a small amount of optimisation, and emits the result.
Rosetta 2 works on much larger units but, more importantly, AArch64 was designed to support x86 emulation and it can avoid the intermediate representation entirely. Most x86-64 instructions are mapped to 1-2 instructions. The x86-64 register file is mapped into 16 of the AArch64 registers, with the rest used for emulator state.
Apple has a few additional features that make it easier:
FEX doesn’t (I think) take advantage of these (or possibly does, but only on Apple hardware?), but even without them it’s quite easy (as in, it’s a lot of engineering work, but each bit of it is easy) to translate x86-64 binaries to AArch64. Arm got a few things wrong but both Apple and Microsoft gave a lot of feedback and newer AArch64 revisions have a bunch of extensions that make Rosetta 2-style emulation easy.
RISC-V’s decision to not have a flags register would make this much harder.
There are two more hardware features: SSE denormal handling (FTZ/DAZ) and a change in SIMD vector handling. Those are standardized as FEAT_AFP in newer ARM architectures, but Apple doesn’t implement the standard version yet. The nonstandard Apple version is not usable in FEX due to a technicality in how they implemented it (they made the switch privileged and global, while FEX needs to be able to switch between modes efficiently, unlike Rosetta, and calling into the kernel would be too slow).
FEX does use TSO mode on Apple hardware though, that’s by far the biggest win and something you can’t just emulate performantly if the hardware doesn’t support it. Replacing all the loads/stores with synchronized ones is both slower and also less flexible (fewer addressing modes) so it ends up requiring more instructions too.
Dumb question: is there a reason not to always ahead-of-time compile to the native arch anyway? (i believe that is what RPCS3 does, see the LLVM recompiler option).
As I understand it, that’s more or less what Rosetta 2 does: it hooks into mmap calls and binary translates libraries as they’re loaded. The fact that the mapping is simple means that this can be done with very low latency. It has a separate mode for JIT compilers that works more incrementally. I’m impressed by how well the latter works: the Xilinx tools are Linux Java programs (linked to a bunch of native libraries) and they work very well in Rosetta on macOS, in a Linux VM.
The Dynamo Rio work 20 or so years ago showed that JITs can do better by taking advantage of execution patterns. VirtualPC for Mac did this kind of thing to avoid the need to calculate flags (which were more expensive on PowerPC) when they weren’t used. In contrast, Apple Silicon simply makes it sufficiently cheap to calculate the flags that this is not needed.
Rosetta does do this, but you have to support runtime code generation (that has to be able to interact with AOT-generated code) at minimum because of JITs (though ideally a JIT implementation should check to see if it is being translated and not JIT), but also if you don’t support JIT translating you can get a huge latency spike/pause when a new library is loaded.
So no matter what you always have to support some degree of runtime codegen/translation, so it’s just a question of can you get enough of a win from an AOT as well as the runtime codegen to justify the additional complexity.
Ignore the trashy title, this is actually really neat. They offload IO to separate threads which means the main thread now gets commands in batches; so the main thread can interleave the data structure traversals for multiple keys from the batch, so it can make much better use of the memory system’s concurrency.
That’s similar to the famous talk by Gor Nishanov about using coroutines to interleave multiple binary searches.
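A rough sketch of that interleaving idea for plain binary searches, using the GCC/Clang __builtin_prefetch intrinsic (my own illustration, not code from the paper or the talk):

```c
#include <stddef.h>

enum { BATCH = 16 };

/* Run up to BATCH lower-bound searches over the same sorted array in
 * lockstep: each round first issues a prefetch for every search's next
 * probe, then consumes the probes, so the cache misses overlap. */
void batched_lower_bound(const long *a, size_t n,
                         const long *keys, size_t *out, size_t nkeys)
{
    for (size_t base = 0; base < nkeys; base += BATCH) {
        size_t b = (nkeys - base < BATCH) ? nkeys - base : BATCH;
        size_t lo[BATCH], hi[BATCH];
        for (size_t i = 0; i < b; i++) { lo[i] = 0; hi[i] = n; }

        int active = 1;
        while (active) {
            active = 0;
            for (size_t i = 0; i < b; i++)          /* pass 1: prefetch */
                if (lo[i] < hi[i])
                    __builtin_prefetch(&a[lo[i] + (hi[i] - lo[i]) / 2]);
            for (size_t i = 0; i < b; i++) {        /* pass 2: compare  */
                if (lo[i] >= hi[i])
                    continue;
                size_t mid = lo[i] + (hi[i] - lo[i]) / 2;
                if (a[mid] < keys[base + i])
                    lo[i] = mid + 1;
                else
                    hi[i] = mid;
                active = 1;
            }
        }
        for (size_t i = 0; i < b; i++)
            out[base + i] = lo[i];   /* index of first element >= key */
    }
}
```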
Can’t ignore the trashy title, it’s spam.
I wonder whether anyone has tried to fold system load into the concept of time. I.e. time flows slower if the system is under load from other requests.
It sounds like you’re looking for queue management algorithms, such as CoDel, PIE or BBR.
Slightly surprised there’s no mention of the seemingly-similar Prolly Tree. (At least, not in the first half of the paper…)
Oops, that’s a total mistake on our part! I’ve (re)added a reference (it was lost along the way somehow) and included it back in our latter discussion. Thanks for pointing out that omission!
That’s another very cute B-tree-like data structure. Thanks for the pointer!
The funny thing about the airline example is that airlines do overcommit tickets, and they will ask people not to board if too many people show up…
If anything, this speaks more how badly X is for modern use cases than anything. There are lots of reasons that the locker can die (not only for OOM), but the fact that this can “unlock” your desktop is the actual absurd part.
This would be impossible in Wayland, for example.
If wayland was good, we’d all be using it by now. It has had so, so much time to prove itself.
My experience with wayland has been:
I think the creators of Wayland have done us a disservice, by convincing everyone it is the way forward, while not actually addressing all the use cases swiftly and adequately, leaving us all in window manager limbo for two decades.
Maybe my opinion will change when I upgrade to Plasma 6. Although, if you search this page for “wayland” you see a lot of bugs…
Your information might be outdated: not only is it the default in Plasma 6 and GNOME 46, but they’ve actually worked to allow compiling them with zero Xorg support. I believe a lot of distros are now not only enabling it by default but have express plans to no longer ship Xorg at all outside of xwayland.
Keep in mind that I gave Wayland as an example in how they should have fixed the issue (e.g.: having a protocol where if the locker fails, the session is just not opened to everyone).
My experience with Wayland is that it works for my use cases. I can understand the frustration of not working for yours (I had a similar experience 5 years ago, but since switching to Sway 2 years ago it seems finally good enough for me), but this is not a “Wayland is good and X is bad”, it is “X is not designed for modern use cases”.
Yeah I realize I’m changing the topic. Your point stands.
This is a thread about OS kernels and memory management. There are lots of us who use Linux but don’t need a desktop environment there. With that in mind, please consider saving the Wayland vs X discussion for another thread.
Lone nerd tries to stop nerd fight. Gets trampled. News at 11
By the same logic, we could argue that:
None of the above examples are perfect, I just want to insist that path dependence is a thing. Wayland, being so different than X, introduced many incompatibilities, so it had much inertia to overcome right from the start. People need clear, substantial, and immediate benefits to consider paying even small switching costs, and Wayland’s are pretty significant.
I think the logic works fine:
I think Wayland sucked a lot. And it has finally started to be good enough to get people to switch. And I’m mad that it took so long.
Except on the desktop. You could argue it’s just one niche among many, but it remains a bloody visible one.
Hmm, you’re effectively saying that no layout is very good, and all have extremely low value-add… I’m not sure I believe that, even if we ignore chording layouts that let stenotypists type faster than human speech: really, we can’t do significantly better than what was effectively a fairly random layout?
I call bullshit on this one. Last time I went there it was all about miles and inches and square feet. Scientists may use it, but day to day you’re still stuck with the imperial system, even down to your standard measurements: wires are gauged, your wood is 2 by 4 inches, even your screws use imperial threads.
Oh, and there was this Mars probe that went crashing down because of an imperial/metric mismatch. I guess they converted everything to metric since then, but just think of what it took to do that even for this small, highly technical niche.
That being said…
I can believe it did (I don’t have a first-hand opinion on this).
Just on a detail, I believe it doesn’t matter to most people what their keyboard layout is, and I’ve wasted a lot of time worrying about it. A basically random one like qwerty is just fine. That doesn’t affect your main point though, especially since the example of stenography layouts is a slam dunk. Many people still do transcription using qwerty, and THAT is crazy path-dependence.
Linux isn’t very good on the desktop, speaking as a Linux desktop user since 2004.
The display protocol used by a system developed in the bazaar style is not a question of design, but of community support and network effects. It can be the very best thing ever; if no client supports it, that doesn’t matter.
Also, the creators of Wayland are the ex-maintainers of X; it’s not like they were not familiar with the problem at hand. You sometimes have to break backwards compatibility for good.
Seems to be happening though? Disclaimer: self-reported data.
The other survey I could find puts Wayland at 8%, but it dates to early 2022.
Sure, it’s good that it’s finally happening. My point is that if we didn’t have Wayland distracting us, a different effort could have gotten us there faster. It’s always the poorly executed maybe-solution that prevents the ideal solution from being explored. Still, I’m looking forward to upgrading to Plasma 6 within the next year or so.
Designing Wayland was a massive effort. It wasn’t just the Xorg team going “we got bored of this and now you have to use the new thing”; they worked very closely with DE developers to design something that wouldn’t make the same mistakes Xorg did.
Meanwhile, Arcan is basically just @crazyloglad and does a far better job of solving the problems with X11 than Wayland ever will.
The appeal to effort argument in the parent comment is just https://imgur.com/gallery/many-projects-GWHoJMj which aptly describes the entire thing.
To be a little smug: https://www.divergent-desktop.org/blog/2020/10/29/improving-x/ has this little thing:
I think we are at 4 competing icon protocols now. Mechanism over policy: https://arcan-fe.com/2019/05/07/another-low-level-arcan-client-a-tray-icon-handler/
The closing bit:
That battle is quickly being lost.
The unabridged story behind Arcan should be written down (and maybe even published) during the coming year or so, as the next thematic shift is around the corner. That will cover how it’s just me but also not. A lot of people have indirectly used the thing without ever knowing, which is my preferred strategy for most things.
Right now another fellow is on his way from another part of Europe for a hackathon in my fort out in the wilderness.
Arcan does look really cool in the demos, and I’d like to try it, but last time I tried to build it I encountered a compilation bug (and submitted a PR to fix it) and I’ve never been able to get any build of it to give me an actually usable DE.
I’m sure it’s possible, but last time I tried I gave up before I worked out how.
I also wasn’t able to get Wayland to work adequately, but I got further and it was more “this runs but is not very good” instead of “I don’t understand how to build or run this”.
Maybe being a massive effort is not actually a good sign.
Arguably Wayland took so long because it decided to fix issues that didn’t need fixing. Did somebody actually care about a rogue program already running on your desktop being able to capture the screen and clipboard?
edit: I mean, I guess so since they put in the effort. It’s just hard for me to fathom.
Was it a massive effort? The fact that its development started 20 years ago does not equate to a “massive effort,” especially considering that the first 5 years involved a mere handful of people working on it as a hobby. The rest of the time was spent building network effect rather than on technical effort.
Sorry, but this appeal to the effort it took to develop wayland is just embarrassing.
Vaporware does not mean good. Au contraire, it usually means terrible design by committee, as is the case with Wayland.
Besides, do you know how much effort it took to develop X?
It’s so tiring to keep reading this genre of comment over and over again, especially given that we have crazyloglad in this community utterly deconstructing it every time.
This is true, but I do think it was also solved in X, although there were only a few implementations, since it required working around X more than using it.
IIRC GNOME and GDM would coordinate so that when you lock your screen it actually switched back to GDM. This way if anything started or crashed in your session it wouldn’t affect the screen locking. And if GDM crashed it would just restart without granting any access.
That being said, it is much simpler in Wayland, where the program just declares itself a screen locker and everything just works.
Crashing the locker hasn’t been a very good bypass route for some time now (see e.g. xsecurelock, which is more than 10 years old, I think).
xlock, the program mentioned in the original mailing list thread, is literally 1980s software. X11 screen lockers do have a lot of other problems (e.g. input grabbing) primarily because, unlike Wayland, X11 doesn’t really have a lock protocol, so screen lockers mostly play whack-a-mole with other clients. Technically Wayland doesn’t have one, either, as the session lock protocol is in staging, but I think most Wayland lockers just go with that.
Unfortunately, last time I looked at it, Wayland delegated a lot of responsibilities to third parties too. E.g. session lock state is usually maintained by the compositor (or at least that’s what a lot of ad-hoc solutions developed prior to the current session lock protocol did). Years ago, “resilient” schemes that tried to restart the compositor if it crashed routinely suffered from the opposite problem: crashing the screen locker was fine, but if the OOM killer reaped the compositor, it got unlocked.
I’d have thought there was a fairly narrow space where CPU SIMD mattered in game engines. Most of the places where SIMD will get an 8x speedup, GPU offload will get a 1000x speedup. This is even more true on systems with a unified memory architecture where there’s much less cost in moving data from CPU to GPU. It would be nice for the article to discuss this.
Early SIMD (MMX, 3DNow, SSE1) on mainstream CPUs typically gave a 10-30% speedup in games, but then adding a dedicated GPU more than doubled the resolution and massively increased the rendering quality.
An earlier article from the same author mentions that to switch to the GPU they’d have to change algorithms:
The big advantage of CSV and TSV is that you can edit them in a text editor. If you’re putting non-printing characters in as field separators, you lose this. If you don’t need that property then there are a lot of better options, up to and including sqlite databases.
Obvious solution is to put non-printing characters on the keyboard
…APL user?
Close; he uses J.
And after some time, people would start using them for crazy stuff that no one anticipated and this solution wouldn’t work anymore 👌
Though I suppose that it has the advantage of not coming with any meaning pre-loaded into it. Yet. If we use these delimiter tokens for data files then people will be at least slightly discouraged from overloading them in ways that break those files.
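To make that trade-off concrete, here is a small sketch (mine, not from the thread) of what “ASCII delimited” data looks like, using the unit separator 0x1f between fields and the record separator 0x1e between records in a made-up people.dsv file:

    # Two records, two fields each: \037 (0x1f) separates fields,
    # \036 (0x1e) separates records. printf understands octal escapes.
    printf 'alice\03742\036bob\03799\036' > people.dsv

    # The data is unambiguous, but cat or a text editor shows nothing
    # useful; od reveals the invisible separators.
    od -c people.dsv

This is exactly the editability problem: the file is fine for programs, but a human can’t comfortably eyeball or hand-edit it.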
Also, grep works on both CSV and TSV, which is very useful … it won’t end up printing crap to your terminal. diff and git merge can work to a degree as well.
Bytes and text are essential narrow waists :) I may change this to “M x N waist” to be more clear.
A text editor or grep is one of M, and TSV is one of N. If you have an arbitrary DSV, you’re not really in that world any more – now you need to write your own tools.
FWIW I switched from CSV to TSV, because the format is much simpler. As far as I can tell, there is exactly one TSV format, but multiple different CSV formats in practice. There’s less room for misunderstanding.
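As a quick illustration of that point (my own example, not from the comment): a field containing commas and quotes needs no special treatment in TSV, while CSV dialects disagree about how to escape it:

    # TSV: fields are separated by a tab and simply may not contain
    # tabs or newlines; there is nothing else to decide.
    printf 'name\tquote\nbob\tsaid "hi, there"\n' > data.tsv

    # The same record in CSV depends on the dialect, e.g.:
    #   bob,"said ""hi, there"""     (RFC 4180: double the inner quotes)
    #   bob,"said \"hi, there\""     (some tools: backslash escapes)

That ambiguity is where CSV parsers tend to diverge in practice.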
Do you? I believe awk and tr deal with it just fine. E.g. tr to convert from DSV to TSV for printing, and awk for selecting single columns, printing it as TSV. Also, I think grep shouldn’t have any problems either; it should pass the non-printable characters through as-is?
grep (GNU grep 3.11) does pass the non-printables through but doesn’t recognise \x1e as a line separator (and has no option to specify that either), which means you get the whole wash of data whatever you search for. You’d have to pipe it through tr to swap \x1e for \n before grep.
Fair, I didn’t know. You can use awk as a grep substitute though.
It’s cool that that works, but I’d argue it is indeed a case of writing your own tools! Compare that with just using grep.
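What follows is just a sketch of the tr/awk/grep juggling described above (my own guesses, not the commenters’ one-liners), using the made-up people.dsv from earlier, with 0x1f between fields and 0x1e between records:

    # Sample data: \037 (0x1f) between fields, \036 (0x1e) between records.
    printf 'alice\03742\036bob\03799\036' > people.dsv

    # tr to convert from DSV to TSV for printing:
    tr '\037\036' '\t\n' < people.dsv

    # awk for selecting columns and printing them as TSV:
    awk 'BEGIN { RS = "\036"; FS = "\037"; OFS = "\t" } NF { print $1, $2 }' people.dsv

    # grep passes 0x1e through but will not treat it as a line separator,
    # so swap it for \n first, as suggested above:
    tr '\036' '\n' < people.dsv | grep alice | tr '\037' '\t'

It works, but every tool needs its own little incantation, which is the “writing your own tools” point being made here.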
And there are more tools, like head and tail and shuf.
xargs -0 and find -print0 actually have the same problem – I pointed this out somewhere on https://www.oilshell.org
It kind of “infects” into head -0, tail -0, sort -0, … which are sometimes spelled sort -z, etc.
The Oils solution is “TSV8” (not fully implemented) – basically you can optionally use JSON-style strings within TSV cells.
So head, tail, grep, cat, awk, and cut work for “free”. But if you need to represent something with tabs or with \x1f, you can. (It handles arbitrary binary data, which is a primary rationale for the J8 Notation upgrade of JSON - https://www.oilshell.org/release/latest/doc/j8-notation.html)
I don’t really see the appeal of
\x1fbecause it just “pushes the problem around”.Now instead of escaping tab, you have to escape
\x1f. In practice, TSV works very well for me – I can do nearly 100% of my work without tabs.If I need them, then there’s TSV8 (or something different like sqlite).
headcan be done inawk, the rest likely require to output to zero on newline terminated output and pass it to the zero version of themselves. With both DSV and zero-terminated commands, I’d make a bunch of aliases and call it a day.I guess that counts as “writing your own tools”, but I end up turning commonly used commands into functions and scripts anyway, so I don’t see as a great burden. I guess to each their workflow.
The other major advantage is the ubiquity of the format. You lose a lot of tools if you aren’t using the common formats.
Interesting stuff for certain types of number-crunching nerds. It’s impressive what AMD’s pulled off here.
Or if you just want an ironic laugh rather than anything too useful, open it up and ^F VP2INTERSECT.
Not just number crunching. Double shuffles is a big deal for certain kinds of string processing.