I really enjoyed reading this. There are good summaries at the end of each section that both authors agree represent their viewpoints.
Of course, this transcript resulted in me agreeing with Ousterhout entirely, and being puzzled by Martin’s takes on the points where they disagree. I’m curious to hear if that’s because I was already primed by anti-Clean Code rhetoric in advance. Do fans of Clean Code read this and think it’s fair and accurate?
The main feeling I have is that Uncle Bob is more of an evangelist than a technologist. He advocated extremely strongly for TDD, for SOLID, for short functions, for self-documenting code… And back when I hadn’t tried these things at all, his evangelism was part of what convinced me to make the attempt, and I observed my code quality improve significantly as a result. It’s certainly possible to go too far, and I’ve done that, too… But I don’t know if I’d have as much of a sense for the limitations of the approach if I hadn’t attempted the dogma first.
I, too, find Uncle Bob’s reasoning puzzling. The two hardest problems in computer science are cache invalidation and naming things, and Uncle Bob wants to create yet more names?
Uncle Bob’s arguments against comments are also puzzling. One place where comments are gold is for workarounds for bugs in code you don’t control, or to provide a reference to where an algorithm was described (I notice that neither of them mentioned the Knuth article describing the primes generator in a comment).
The Reddit thread has some commenters who side with Martin more than with Ousterhout. Others described Ousterhout in various shades of “not cool”, though that may just be another way of siding with Martin.
Redditors tend to like Bob Martin’s personality or pseudo-“scientific method” approach, and so I often see TDD and Martin defended with cult-like ferociousness.
Lobsters tend to like the little mermaid’s personality or pseudo-“fish” approach, and so I often see Tail Driven Development and Ariel defended with cult-like ferociousness.
One problem with this critique of Clean Code is that Uncle Bob presents a rule of thumb, which Ousterhout goes on to interpret as a law, and critiques it as such.
Before 1.0, I wrote a linear types RFC, so I’m personally annoyed by how influential this article seems to have been. The costs are, as Aria admits, not as bad as they had thought when they started the article (rewriting APIs would certainly be highly annoying. I’m reasonably certain it could also be significantly automated, but the ecosystem churn could be pretty horrid. I also no longer think linear types can be reconciled with panic=unwind, but I always strongly preferred panic=abort anyway), and the benefits of linear types seem to have gotten stronger over time (they could solve perhaps the biggest foot-gun in async Rust, preventing implicit cancellation when some operations are expected to run to completion). A code pattern shown in the article (quoting)
let mut token = None;
while cond1 {
    if cond2 {
        token = Some(step1()); // ERROR 1: assignment to already initialized must-use value
    }
    if cond3 {
        step2(token.take().unwrap());
    }
}
// ERROR 2: must-use value must be used
isn’t something you’d want to encourage if you’re using linear types, in the same way that using bare pointers – even when provably safe! – isn’t something the Rust language encourages. (Rust can be comfortable ruling out some classes of correct programs in order to encourage more compiler-verifiable correctness.) Being able to just write array[i] = val; wouldn’t work (it would silently drop the old value), but you wouldn’t want it to work. The point of linear types is to force users to avoid letting values fall on the floor. old_val = std::mem::replace(&mut array[i], val); verifies that they’ve done so. The pain of linear types is the benefit.
I continue to think Rust would have been better off with linear types, but I never had the time I needed when I needed to push the argument forward, and now I feel like the ship has sailed. Oh well.
Because lexical shadowing causes lots of bugs in C and C++, which is why D and Zig forbid it.
comptime is probably not as interesting as it looks
You would have to compare comptime to the swath of C++ templates it replaces.
Memory safety is highly underestimated
Because it turns out memory safety is highly overrated by some, and is only one of the desirable properties of a language, and probably behind learnability and productivity in terms of priority.
Because lexical shadowing causes lots of bugs in C and C++, which is why D and Zig forbid it.
The point in the post is that there are languages that solved it.
comptime is probably not as interesting as it looks
You would have to compare comptime to the swath of C++ templates it replaces.
The post directly refers to C++ templates and doesn’t think it’s a valid solution.
comptime is one of those features that feels like candy, but has a bitter aftertaste. It feels like C++ templates back again, and the community will not address that.
Sure, people may disagree, but please be more specific. The post particularly lays out the point where they feel like they are recommitting previous mistakes.
Memory safety is highly underestimated
Because it turns out memory safety is highly overrated by some, and is only one of the desirable properties of a language, and probably behind learnability and productivity in terms of priority.
Memory safety is a tangible objective with a reasonably good definition. I have yet (as a trainer and someone very interested in that space) to find a tangible definition of learnability and productivity. Productivity is often mistaken for familiarity, and learnability is often heavily biased toward what the speaker already knows.
Capers Jones’s function points (FP) are metrics of the business value of a program. They’re a good productivity metric for the whole process of programming. Sometimes you can find productivity metrics that give you the relationship of FP to LOC on a per-language basis, but even this is very far from a measure of programmer productivity in a language and is influenced by other effects, like the surrounding domain or the domain expertise of the programmer.
I looked quite hard at the 259-page document you linked. The term “Learnability” is mentioned once. It seems to be a very good meta-study; however, reading this:
In this study, I have deliberately avoided taking any position as to the conclusions one should make regarding the actual design decisions. For example, I do not offer any analysis on whether static or dynamic typing is better. That task belongs properly to focused systematic literature reviews and is beyond the scope of any mapping study.
I’m not sure it supports your point, especially not at a whole language scale.
Because it turns out memory safety is highly overrated by some, and is only one of the desirable properties of a language, and probably behind learnability and productivity in terms of priority.
To me, memory safety is like the baseline, the bare minimum. Anything without it simply isn’t worth using for any reason.
When people say things like your parent, what they always mean is “by default.” Even programs written in GC’d languages can have memory safety issues via FFI, but we still call them memory safe. Unsafe is like an FFI to a (slightly larger) language.
Maybe, but I really wish engineers, at least, wouldn’t talk that way. Engineering almost always involves trade-offs, and the maximalist stance (“bare minimum”, “for any reason”) makes those harder to identify and discuss. For example, the follow-up comment is much more interesting (IMO) than the one I responded to.
So, rereading your comment, I think I missed a word:
Is unsafe Rust not worth using for any reason?
Emphasis mine. I thought you were saying (like I hear often on the internet) that the existence of unsafe Rust means that Rust is not a memory safe language, and was trying to talk about that definition, not about the judgement.
I think doing something potentially unsafe can be worth it, if you are willing to pay the cost of spending a lot of time making sure the code is correct. For large C, C++, or Zig code bases, this isn’t really viable. Easily 10xes your development time.
I see unsafe Rust the same way I see the code generating assembly inside a compiler. Rustc or the Zig compiler can contain bugs that make them emit incorrect code. Same with Go or javac or the JIT inside the HotSpot VM. So you are never free of unsafe code. But the unsafety can be contained, be placed inside a box. You can design an interface around unsafety that doesn’t permit incorrect use. Then you can spend a lot of time proving it correct. And after that, you can write an infinite amount of code depending on that unsafe module without worry!
Essentially, I view unsafe in Rust as a method of adding a new primitive to the language, the same way you could add a new feature to Python by writing C code for the Python interpreter. The novelty is just that you write the new feature in the same language as you use the feature in.
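To make that analogy concrete, here’s a minimal sketch of the Python side of it (the module and function names are made up; this is just the standard CPython extension boilerplate, nothing Rust-specific):

/* fastmod.c: a made-up "primitive" added to Python by writing C. */
#include <Python.h>

static PyObject *fast_add(PyObject *self, PyObject *args) {
    long a, b;
    if (!PyArg_ParseTuple(args, "ll", &a, &b))
        return NULL;               /* propagate the argument-parsing error */
    return PyLong_FromLong(a + b); /* a safe interface wrapping "unsafe" C */
}

static PyMethodDef methods[] = {
    {"fast_add", fast_add, METH_VARARGS, "Add two integers in C."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef fastmod = {
    PyModuleDef_HEAD_INIT, "fastmod", NULL, -1, methods
};

PyMODINIT_FUNC PyInit_fastmod(void) { return PyModule_Create(&fastmod); }

Python code calling fastmod.fast_add never sees the raw pointers inside; all the care goes into the small C surface, which is exactly the unsafe-module-with-a-safe-interface pattern.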
If 1% of your code is unsafe modules with safe interfaces, and you 10x the cost of developing those parts by being very careful and proving the code correct, your overall development cost only went up by 9%. That’s a much lower cost, and thus what I mentioned at the start, being willing to pay the cost of doing it right, becomes much smaller. It will be worth it in many more situations.
That got a little rambly but I hope I communicated my stance correctly.
For large C, C++, or Zig code bases, this isn’t really viable. Easily 10xes your development time.
An effect that large probably requires poor coding practices. Stuff like global variables, shared state between threads, lack of separation between effects and computation… When you make such a mess of your code base, OK, you need strong static guarantees, like “no arbitrary code execution is ever possible no matter how buggy my program is” — that is, memory safety. Now don’t get me wrong, I love static guarantees. I love when my compiler is disciplined so I don’t have to be. But even then you want a nice and maintainable program.
My hypothesis is that if you do the right thing, that is, properly decompose your program into deep modules and avoid cutting corners too often, then memory safety doesn’t boost your productivity nearly as much as a whopping 10x. Especially if you go for some safety, like just adding bounds checks.
I will concede that in my experience, we rarely do the right thing.
I don’t know; when Java introduced a wider audience to GC, software development definitely experienced a huge boost. Maybe 10x in itself is not the correct figure. But you don’t only ship software, you also maintain it, fix bugs, etc. - and here, guaranteed memory safety can easily have an order-of-magnitude advantage, especially as the software grows.
The important point, to me, is reducing the potential for harm, and there are many ways to do that. For example, memory unsafety can be mitigated by running code in a sandbox of some kind (such as by compiling and running in a WASM environment), or by only running it against trusted input in simple scenarios (my local programs I use for experimenting), or by aggressively using fuzzers and tools like valgrind, ASAN, LSAN, and static analyzers like Coverity. All of these are valid approaches for mitigating memory unsafe code, and depending on my risk model, I can be OK with some level of memory unsafety.
Further, there are domains in which harm is more likely to come from too-loose type systems than it is from memory unsafety. I can implement my CRUD app in pure memory-safe Python, but that won’t protect me from a SQL injection… Something that enforces that the database only interacts with SanitizedString’s might. On the other hand, in exploratory development, too strong of a type system might slow you down too much to maintain flow.
Anyways, I generally don’t like maximalist stances. If multiple reasonably popular approaches exist in the wild, there are more than likely understandable motivations for each of them.
Can you explain what exactly memory safety means to you?
I’m writing a lot of code that is 100% “memory safe” (in my view), but would not be possible to be written in memory safe language conveniently (like Rust).
They probably mean the definition you get if you stop at the summary of the Wikipedia page, which is easy to misinterpret as saying that the guarantees have to be enforced by the toolchain (in which I include things like linters and theorem provers) or the VM, via compile-time or runtime guard rails that cannot suffer from false negatives.
That’s why I’m asking. The “summary definition” of Wikipedia isn’t really helpful at all.
The chapter “Classification of memory safety errors” is much better in listing potential problems, but those are language dependent.
Zig (in safe modes!) will have checks against Buffer overflow and Buffer over-read, and has increased awareness for Uninitialized variables, as undefined is an explicit decision by the programmer. This doesn’t mean Undefined variables are a non-problem, but it’s less likely as you have to actively decide not to initialize.
<pedantic>Use-after-free, Double-free, Mismatched-free are not possible in Zig either, as the language has no concept of memory allocation.</pedantic> This doesn’t mean a userland allocator won’t suffer from these problems, but the GeneralPurposeAllocator has mitigations to detect and prevent them.
In C, you actually have a problem with the built-in support for malloc and free. Clang, for example, knows that malloc is special, and can be considered “pure” and returning a unique object. This is how the function is defined in the C standard.
This yields interesting problems with the assumptions of the programmers and what the compiler actually does.
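The original snippet isn’t reproduced here, so this is only a sketch of the kind of example being discussed (the glob and p1 names match the references below; the exact folding depends on the compiler and optimization level):

#include <stdbool.h>
#include <stdlib.h>

int *glob;

bool distinct(void) {
    int *p1 = malloc(sizeof(int));
    int *p2 = malloc(sizeof(int));
    glob = p1;        /* keeps p1 observable; try deleting this line */
    return p1 != p2;  /* malloc results are assumed never to alias, so this can fold to true */
}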
This is because malloc will always return a pointer to a new object, so it can never alias with another result of malloc, ever. Also malloc won’t have side effects besides yielding a new object. So eliding the invocation of malloc is fine, as we don’t use the result.
If we do remove the glob = p1; line, the compiler will just make the function return true; and won’t emit any calls to malloc at all!
As Zig has no builtin allocator in the language (but it’s a userland concept inside the standard library), the compiler can’t make the assumptions above and thus actually integrate the checks.
All the Java tools (gradle, maven, bazel): I’m sorry, I can’t.
Buck2 is not… I found Bazel fairly impenetrable when I tried using it for something with a fairly complicated build, possibly because so much of what I needed to understand was built into the system itself. If I were evaluating something similar today, I’d look a lot closer at buck2. I don’t know if it would work for the author, though.
In an old team, I argued that adding a dependency is, in large part, a political decision: in addition to the technical implications, one should ask whether the incentives of the dependency’s stewards are aligned with our own. If not, while things may look good now, there may be trouble down the line. (See also Platform as a Reflection of Values). If incentives and tech are well aligned, then it’s often a mistake not to take on the dependency.
I was reminded of a data structure @trishume described here; I’m honestly still not completely clear on the difference between what Tristan described and Fenwick trees… That is, by following the breadcrumbs from Tristan’s post, I found this alternative write-up. Figure 1 in that paper looks very similar to Tristan’s graphic…
This article is making it sound more profound than it really is.
Yes, as the performance of IO devices increased, the latency of IO became a problem, making it hard to saturate the bandwidth. The simplistic (but convenient) abstraction of blocking IO became increasingly insufficient.
Almost everything in computing eventually gains async interfaces: CPU instruction pipelining, CPUs communicating asynchronously over a coherence bus with each other or with memory, IO device interfaces, remote systems, threads, processes. Asynchronicity due to latency is just a fact of the physical world.
But notably this model has a downside: reasoning about the state (especially atomicity) of things is much harder, and the programming model and architecture need to be redone in a new paradigm.
The idea of sending all your system calls in the form of messages to what feels like a little mini server feels completely different from just making async function calls.
You’re right about the change in programming model - though of course it’s a very common and successful one to use between machines. I was surprised it was this universal.
The other thing that pushes towards synchronous interfaces (besides making reasoning easier) is latency sensitivity. The things you’re describing as gaining async interfaces are (generally) making a trade-off to sacrifice some latency in order to gain higher bandwidth, but the opposite trade-off may make sense, sometimes. For example, persistent storage with very low latency (e.g. optane memory, if it weren’t cancelled) likely prefers to be used synchronously.
Following up for posterity – another really interesting thing happening in high-performance computer architectures that do optimize for latency is the attempt to align execution with data location. More examples may exist, but consider that eBPF moves execution into the kernel, enabling a new class of synchronous execution primitives, and that SmartNICs make PCI device virtualization much more efficient by moving the device virtualization onto a dedicated PCI card. CUDA optimization is highly concerned with unblocking the speed of synchronous execution by aligning the data location with the execution structure, and the Chapel programming language makes code and data location a central concern for unlocking synchronous code performance.
Minix is designed to be readable, but it probably isn’t a good reflection of modern hardware.
For simple devices, I’d recommend looking at an embedded OS. Embedded systems have many of the same concepts but the hardware that they support is much simpler.
At the core, there are really only two ways that the OS-hardware interface works. The first is memory-mapped I/O (MMIO)[1]. The device has a bunch of registers and they are exposed into the CPU’s physical address space. You interact with them as if they were memory locations. If you look at an RTOS, you’ll often see these accessed directly. On a big kernel, there’s often an abstraction layer where you say ‘write a 32-bit value at offset X in device Y’s mapping’. These can often be exposed directly to userspace by just mapping the physical page containing the device registers into userspace.
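As a rough illustration of what ‘write a 32-bit value at offset X in device Y’s mapping’ boils down to (the UART base address and register offset below are invented, and real code also has to worry about ordering and barriers):

#include <stdint.h>

#define UART_BASE   0x10000000u  /* made-up mapped address of the device */
#define UART_TXDATA 0x00u        /* made-up offset of the transmit register */

static inline void mmio_write32(uintptr_t base, uintptr_t offset, uint32_t value) {
    /* volatile so every store really reaches the device register */
    *(volatile uint32_t *)(base + offset) = value;
}

void uart_putc(char c) {
    mmio_write32(UART_BASE, UART_TXDATA, (uint32_t)c);
}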
The next step up from that is direct memory access (DMA). This is one of the places you see a lot of variation across different kinds of systems. The big split in philosophies is whether DMA is a thing devices do or a thing that some dedicated hardware does. Most modern busses (including things like AXI in small SoCs) support multiple masters. Rather than just passively exporting MMIO registers, a device can initiate transactions to read or write memory. This is usually driven by commands from the CPU: you write a command to an MMIO register with an address and the device will do some processing there. The best place to look at for this is probably VirtIO, which is a simplified model of how real devices work. It has a table of descriptors and a ring buffer, which is how a lot of higher-performance devices work. You write commands into the ring and then do an MMIO write to tell the device that more commands are ready. It will then read data from and write it to the descriptors for the devices. I think the NetBSD VirtIO drivers are probably the easiest to read. You might also look at DPDK, which has userspace drivers for a bunch of network interfaces, which remove a lot of the kernel-specific complexity.
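In case it helps to picture the descriptor-table-plus-ring shape, here is a very rough C sketch; it mirrors the general idea, not the exact VirtIO struct layout:

#include <stdint.h>

struct descriptor {
    uint64_t addr;   /* physical address of a buffer the device should read or write */
    uint32_t len;    /* length of that buffer */
    uint16_t flags;  /* e.g. device-writable, or chained to the next descriptor */
    uint16_t next;   /* index of the next descriptor in a chain */
};

struct avail_ring {
    uint16_t flags;
    uint16_t idx;        /* producer index, bumped after publishing new entries */
    uint16_t ring[256];  /* indices into the descriptor table */
};

/* Driver side, schematically: fill a descriptor, put its index into the ring,
   bump idx, then do one MMIO write to the device's notify register so it
   knows more commands are ready. */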
The other model for DMA is similar but the DMA controller is a separate device. This is quite common on embedded systems and mainframes, less common on things in between. The DMA controller is basically a simplified processor that has a small instruction set that’s optimised for loads and stores. If a device exposes a MMIO FIFO, you can use a programmable DMA engine to read from that FIFO and write into memory, without the device needing to do DMA itself. More complex ones let you set up pipelines between devices. More recent Intel chips have something like this now.
Once you get DMA working, you realise that you can’t just expose devices to userspace (or guest VMs) because they can write anywhere in memory. This is when you start to need a memory management unit for IO (IOMMU)[2], which Arm calls a System MMU (SMMU). These do for devices what the MMU does for userspace: let you set up mappings that expose physical pages into the device’s address space for DMA. If you have one of these, you can map some pages into both userspace and a device’s address space and then you can do userspace command submission. Modern GPUs and NICs support this, so the kernel is completely off the fast path. The busdma framework in NetBSD and FreeBSD is a good place to look for this. It’s designed to support both DMA via the direct map and DMA via IOMMU regions.
For this kind of kernel-bypass abstraction to be useful, you need the device to pretend to be multiple devices. Today, that’s typically done with Single Root I/O Virtualisation (SR-IOV). There’s a lot here that you don’t need to care about unless you’re building a PCIe device. From a software perspective you basically have a control-plane interface to a device that lets you manage a set of virtual contexts. You can expose one to userspace or to a guest VM and treat it as a separate device with independent IOMMU mappings.
To do any of this, you need to have some kind of device enumeration. On simple SoCs, this can be entirely static. You get something like a flattened device tree and it tells you where all of the devices are, which you either compile into the kernel or get from a bootloader. Systems with expansion need to do this via ACPI, PCIe device enumeration, and things like USB. This is universally horrible and I’d recommend that you never look at how any of it works unless you really need to because it will cause lasting trauma. OpenFirmware was the least bad way of doing this, and so naturally was replaced by something much worse.
Beyond that, the way devices work is in the process of changing with a few related technologies. In PCIe, IDE and TDISP let you establish an end-to-end encrypted and authenticated connection between some software abstraction (for example a confidential VM, or Realm in Arm-speak) and a device context. This lets you communicate with a device and know that the hypervisor and physical attackers on the bus can’t tamper with or intercept your communication. This is probably going to cause a bunch of things to move onto the SoC (there’s no point doing TLS offload on a NIC if you need to talk AES to the NIC).
The much more interesting thing is what Intel calls Scalable I/O Virtualisation (S-IOV, not to be confused with SR-IOV), and Arm calls Revere. This tags each PCIe message generated from an MMIO read or write with an address-space identifier. This makes it possible to create devices with very large numbers of virtual functions because the amount of on-device state is small (and can be DMA’d out to host memory when a context is not actively being used). This is the thing that will make it possible for every process in every VM to have its own context on a NIC and talk to the network without the kernel doing anything.
The end of this trend is that the kernel-hardware interface becomes entirely control plane (much like mainframes and supercomputers 30 years ago) and anything that involves actually using the device moves entirely into userspace. The Nouveau drivers are a good place to look for an example of how this can work, though they’re not very readable. They have some documentation, which is a bit better.
[1] x86 also has IO ports but they’re really just MMIO in a different namespace done using extra stupidity and it’s safe to pretend it isn’t real.
[2] These weren’t originally created for security. I believe the first ones were in Sun workstations, where Sun wanted to ship cheap 32-bit NICs in a machine with 8 GiB of RAM and wanted the device to be able to DMA anywhere into physical memory.
x86 also has IO ports but they’re really just MMIO in a different namespace done using extra stupidity and it’s safe to pretend it isn’t real.
To give some background on this, I/O mapped I/O (which is what this method is called) was used to make computers cheaper by keeping I/O decoding to a minimum (using fewer logic gates) while at the same time allowing as much memory as possible. CPUs with I/O mapped I/O have dedicated instructions (on the x86, these are IN and OUT) with restrictions that other movement instructions don’t have.
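In code, the dedicated-instruction style looks roughly like the usual inb/outb helpers (GCC/Clang inline assembly on x86; the port number comes from the platform, e.g. 0x3F8 for the legacy COM1 UART):

#include <stdint.h>

static inline void outb(uint16_t port, uint8_t value) {
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}

static inline uint8_t inb(uint16_t port) {
    uint8_t value;
    __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
    return value;
}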
It really makes sense only when memory is tightly coupled. As soon as you start having something that looks like a memory bus, it’s much easier for the CPU to just send a bus message and let a bridge chip (or block on an SoC) deal with it for regions that are mapped to non-memory devices. This gives a cleaner separation of concerns.
The other benefit is that, on systems where you control the memory, you know loads respond in a bounded number of cycles, whereas I/O might block and you can handle interrupts in the middle differently. If you have external DRAM and moderately high clocks, this is much less of a benefit but it can be useful on small microcontrollers with tightly-coupled SRAM.
For another esoteric benefit, I think I’ve seen some QEMU docs that say that PMIO is more performant to dispatch. I think that’s because you end up doing less software address decoding in the virtual machine monitor. I don’t know how relevant these concerns still are, but I thought that was worth mentioning.
So maybe a para-virtualized device might want its doorbell to the host to use PMIO instead of MMIO when possible.
MMIO is fairly slow on QEMU because it has an emulated memory map and needs to make normal memory accesses as fast as possible so anything that leaves that path ends up being very slow. Some other emulators just map I/O pages as no access and catch the fault. This is fairly fast with most hypervisors but most PV devices tend to favour hypercalls for prodding their doorbells because hypercalls can avoid saving and restoring most registers (just zero them on return) and so can be faster.
At the core, there are really only two ways that the OS-hardware interface works.
A third one: some old CPUs — I only know about the 8080 & Z80 — had dedicated instructions to read and write a numbered I/O port. I think they were called IN and OUT.
To do any of this, you need to have some kind of device enumeration. On simple SoCs, this can be entirely static. You get something like a flattened device tree and it tells you where all of the devices are, which you either compile into the kernel or get from a bootloader. Systems with expansion need to do this via ACPI, PCIe device enumeration, and things like USB. This is universally horrible and I’d recommend that you never look at how any of it works unless you really need to because it will cause lasting trauma. OpenFirmware was the least bad way of doing this, and so naturally was replaced by something much worse.
This is the truth.
One thing to note for others (since I’m sure David already knows)…
all of the current ‘device tree’ stuff we have today is originally from OpenFirmware.
OpenFirmware was just a standardized ‘forth’ used for booting systems. (A system monitor)
There were some other nice things in OpenFirmware. My favourite was that it provided simple device drivers in Forth that were portable across operating systems, so you just needed the OS to provide a generic driver to get basic functionality. These were not as fast as a native driver (though, for something like a UART, this might not matter) but they were enough to get the device basically working.
The downside of this was that the Forth drivers were larger than the tiny amount of BIOS firmware that most devices needed on PCs, and so needed larger ROM or flash chips to hold, which pushed up the price a lot for simpler devices and required a different board run (with much lower volumes) for other things.
Vendors then stuck a bigger markup on the OpenFirmware version knowing that you couldn’t just use the PC version. It was quite entertaining that, while PC users complained about Apple’s markup on ATi video cards, Sun users were buying the Apple versions because they were a third the price of the Sun packaging of the identical hardware.
The only firmware standard in existence with its own song! (Continuing the theme of OpenFirmware being replaced with worse tech, I can no longer find the .au file download, but https://www.youtube.com/watch?v=b8Wyvb9GotM exists.)
This seems an excellent guide to the history and near future of HW/SW interfaces. Going forward, I’m personally interested in how computation is being distributed through the system. SmartNICs, as I understand the term, include general-purpose CPUs to virtually implement PCI interfaces in physical hardware. GPUs also have heavy compute (obviously) and are becoming more tightly integrated with CPUs through things like Linux’s Heterogeneous Memory Management (I was motivated to post this largely because I’m not aware of other HMM implementations besides Linux’s, though the idea feels generally important – are there other implementations of the idea?), and through CXL. Compute-In-Memory may be interesting, and I posted this here a couple of years ago (!).
There are a bunch of interesting research projects looking at how you properly program heterogeneous distributed systems. The first one I knew about (almost certainly not the first) was BarrelFish, though it didn’t really do much hardware-software co-design.
Because software is possibly the most consequential field affecting humanity today. Anthropogenic climate change is accelerated and meliorated through software. The human social experience today is highly interactive with software. Whole industries are created and destroyed through software. Attacks against and defenses of our society writ large are made and mediated through software. I feel if I have an ability to be part of these really existential struggles, if I have the capacity to develop and defend an opinion about how I hope these struggles will resolve, and if I’m given the opportunity to help “my team” with my skills, with (honestly) not very much sacrifice on my part to do so, then how could I not participate?
I left my previous job when my company was taken over by (what I think of as) rapacious private equity. I took my current job to help make charitable giving more effective. I’m trying to make the right software, and whether I’m remembered for it isn’t as important as trying to do the part I can.
As someone who has only done some Zig, I didn’t find anything particularly revolutionary in here. Though, the section on asserts feels like it’s (mostly) solving the problem at the wrong level. If you parse data correctly at the boundaries, then asserts shouldn’t be needed later without coding errors or intentional manipulation of memory. Some of the sections would benefit from some concrete code examples. I think the asserts section would be one of those, to demonstrate where asserts catch things that code review / tests / parsing wouldn’t.
This is a replicated state machine, which maintains a cluster-consistent log of events. Events in the log are hash-chained — next event contains a content hash sum of the previous event.
Here, we check that the latest event we have hash-chains to our known-good checkpoint.
Crucially, this invariant does not always hold — your log might be incomplete, or there might be a corruption in your log, or you might actually be truncating erroneous events as a part of consensus protocol.
This thing is relatively easy to assert, but relatively painful to type. This is not a simple structural thing like “this string is a valid email”, which you check once at the boundary and are done with it. This is a complex invariant that spans multiple objects and evolves with time. You could still encode it in the types using a phantom type parameter, but, because it is non-structural phantom thing, you’d still have to audit a lot of code to double check that the types are not lying (a lot of things manipulate the log), and you’d probably assert there for a good measure anyway.
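For a rough sense of the shape (TigerBeetle is Zig and the real invariant is richer; the field and function names here are invented), something like:

#include <assert.h>
#include <stddef.h>
#include <string.h>

struct event {
    unsigned char parent_checksum[16];  /* checksum of the previous event */
    unsigned char checksum[16];
};

/* Walk the suffix of the log and assert it hash-chains back to the
   known-good checkpoint. Only valid when the log is actually complete,
   which is exactly why this is an assert rather than an always-on check. */
void assert_chained_to_checkpoint(const struct event *log, size_t len,
                                  const unsigned char checkpoint[16]) {
    const unsigned char *expected = checkpoint;
    for (size_t i = 0; i < len; i++) {
        assert(memcmp(log[i].parent_checksum, expected, 16) == 0);
        expected = log[i].checksum;
    }
}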
without coding errors
We absolutely do want to have extra safety nets, to make sure that, even if we mess up our code, the thing safely crashes instead of returning a wrong result. As I like to say, it’s impossible to have a bug in TigerBeetle, any issue is at least three bugs:
the first bug is the absence of an assert stating the correct invariant positively
the second bug is that the randomized testing setup doesn’t cover the issue reliably
and the least interesting bug is the actual wrong behavior
or intentional manipulation of memory.
And, while we do recommend running with ECC RAM, we’d also love to opportunistically catch random bit flips, if possible and not too costly.
Another important thing here is thorough randomized testing. A failing assert, if you can reproduce it, and a type error are comparable in terms of developer experience. Easy type errors are better than easy crashes, but complex type errors are worse than easy crashes.
The catch of course is that typically you don’t have an easy crash. An assert is just a programmer’s guess which sits there until once in a blue moon a specific sequence of events trips it. Types have that superpower that they work even if you never ever run the code. And actually running the code is very tricky: everything off the happy path is typically uncovered.
But if you actually do have thorough explore-all-the-code-paths testing, that significantly narrows the gap here between assertions and types, and you can lean on asserts more.
Combined with the fact that we care a lot about data representation, we ended up mostly using types to express just the physical representation of data, and using mostly assertions for logical invariants.
I guess, my overall thrust is that this all is very context specific and nuanced. These things are all exceptionally good advice, and tremendously improved my own personal style, but they might be rather dangerous if not properly digested :P
I wrote at one point about using the type-system to enforce invariants. In that post, I described a time where I had to change the granularity of times in the TSDB I worked on from seconds to milliseconds. This effort would have been highly error-prone without newtypes. I didn’t talk about this in the post, but one of the things I found interesting from this effort was that the cost/benefit of using the type-system to enforce invariants changed over time: While the system was in transition from seconds-based timings to milliseconds-based timings, getting the units wrong would have been really bad. But once the system fully transitioned to milliseconds, using the type system no longer provided a clear benefit, while the costs (increased friction whenever needing to work with time) remained obvious. The types only paid for themselves in the transitional phase, and once that was past, if I recall correctly, they were removed.
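As a sketch of the newtype idea (illustrative only; the TSDB’s own code isn’t shown here and these names are made up), wrapper structs are enough to make the compiler catch unit mix-ups:

#include <stdint.h>

typedef struct { int64_t value; } Seconds;
typedef struct { int64_t value; } Millis;

static inline Millis seconds_to_millis(Seconds s) {
    return (Millis){ .value = s.value * 1000 };
}

/* A record_point(Millis ts, double v) rejects a Seconds argument at
   compile time, which is the whole benefit during the transition; once
   everything speaks milliseconds, the wrappers are mostly friction. */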
I definitely agree on the last part – if you can approach 100% state space test coverage, then assertions become like the most powerful and expressive system, in terms of the experience you have writing the code, and the confidence it gives you
It’s hard to measure when you have such coverage (it’s not line or branch coverage)
And IME you need to constantly and drastically simplify your code to achieve small state spaces … and I don’t even know how to get everybody on the team on board with it
Another thing I do is write BESPOKE harnesses for specific pieces of code / types of correctness – IME this makes it more ergonomic to get to this happy place
But if you get to the point where you can randomly change if statement conditions in your code, and tests fail within ~1 minute or ~5 minutes, then that’s like a superpower
It’s not too hard to get there with a parser, but it’s indeed much harder to get there with a stateful and concurrent runtime
IME most large projects never achieve this, and small projects that achieve it have to fight to keep it that way
So in large projects people only know how to add features in a suboptimal way. i.e. there starts to be “superstition” and “dark corners” of the code
It’s also why I am continually puzzled when people put seg faults in the same category as memory safety. Seg faults are what PREVENT unsafety. You HOPE for a seg fault. If you don’t get it, now you have a big problem.
The doc has a good way of putting it:
Assertions downgrade catastrophic correctness bugs into liveness bugs.
It’s also why I am continually puzzled when people put seg faults in the same category as memory safety. Seg faults are what PREVENT unsafety.
Other way round. A segfault is caused by a memory safety bug. When a segfault occurs you don’t know what other havoc the bug has wrought, like how much memory was corrupted before the crash.
The affected versions of OpenSSL allocate a memory buffer for the message to be returned based on the length field in the requesting message, without regard to the actual size of that message’s payload. Because of this failure to do proper bounds checking, the message returned consists of the payload, possibly followed by whatever else happened to be in the allocated memory buffer
So in that case, you hoped for a seg fault, but you didn’t get one!!! You hoped for it for 10 or 20 years I think! Instead you got silent data exfiltration, for years, in one of the most common network-facing services in the world.
In my case, I have scars from an intermittent audio bug that I tried to debug for weeks very early in my career, which was ultimately caused by memory unsafety. That’s why I draw a HUGE distinction between segfault and no segfault. If there was a segfault, I would have had a clue, but I didn’t
Related - What ASAN does is precisely what the Tiger Beetle doc says:
downgrade catastrophic correctness bugs into liveness bugs
It gives you a crash / seg fault / stack trace, e.g. on
use after free (lifetime safety)
array out of bounds (spatial safety)
uninitialized vars (initialization safety)
none of which cause seg faults in normal C implementations! The segfault is great!
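A tiny example of the kind of thing that usually sails through a plain build but becomes an immediate, diagnosable crash under -fsanitize=address:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *buf = malloc(16);
    strcpy(buf, "hello");
    free(buf);
    /* Use after free: often prints stale data silently in a normal build,
       but ASAN aborts here with a heap-use-after-free report and traces
       for the allocation, the free, and this read. */
    printf("%c\n", buf[0]);
    return 0;
}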
e.g. ASAN was a game-changer when developing a garbage collector, because the state space explodes at every malloc – GC or no GC. It’s a 2**N state space, where N is the number of allocations in a program.
The only way to make it correct is to test exhaustively, with the guarantee that memory unsafety will cause hard crashes – there is no type system that will help you
Any nascent garbage collector is basically a big pile of memory safety bugs :-)
Garbage collection bugs tend to be among the last to be removed from a new programming language implementation.
— Garbage Collection in an Uncooperative Environment (Boehm, 1988)
Previous comments on Cardelli’s definition of “Safe language”, which is IMO the most correct and useful one:
It is useful to distinguish between two kinds of execution errors: the ones that cause the computation to stop immediately, and the ones that go unnoticed (for a while) and later cause arbitrary behavior. The former are called trapped errors, whereas the latter are untrapped errors.
…
A program fragment is safe if it does not cause untrapped errors to occur. Languages where all program fragments are safe are called safe languages. Therefore, safe languages rule out the most insidious form of execution errors: the ones that may go unnoticed.
– Luca Cardelli, Type Systems
So seg faults are trapped errors. Heartbleed is an untrapped error. Languages where all errors are (immediately) trapped are safe.
But I take your point that it sometimes does NOT happen immediately, and you can have unsafe behavior first, and then a segfault. That is indeed bad – the seg fault did not protect you.
But I’d say it’s just as common that it never causes a seg fault, which is the absolute worst.
The other example I gave in one of those threads is that an MMU makes stack overflow safe, by giving you a segfault
Without an MMU, the stack could overflow into heap memory or some other memory, which would be unsafe
There is no possibility of catching it statically – there is no type system that can help you with stack overflow / infinite recursion. Whether it exceeds a bound is an inherently dynamic property of a program
That isn’t true, a wild offset can easily jump past any stack guard pages.
You also need stack probing when anything large is allocated on the stack. If you don’t have stack probes then a wild alloca() or variable-length array can move the stack pointer to an arbitrary address.
I have used this to implement stackful coroutines without any assembly: I allocated space for a stack on the heap, calculated the difference between the heap space and address of a local variable (as a proxy for the current stack pointer), created a VLA of the appropriate size with a bit of slop, and the next function call’s stack frame was on the heap. Criminal.
If you parse data correctly at the boundaries, then asserts shouldn’t be needed later without coding errors or intentional manipulation of memory.
I think, much like perimeter-only firewalls are eschewed for defense in depth, that asserts serve as a seatbelt/guardian against future changes in expectations for code/state logic.
Assertions also double as inline descriptions of invariants (compared to a comment, they are more concise and at less risk of becoming stale). Even if the assertion is never tripped, it’s valuable for readers of the code so they know which assumptions they can make.
This property is perhaps less valuable in languages with more sophisticated type systems.
Asserts can also catch bad usage of functions. I know of at least one coworker thanking me for using assert() in my DNS library that caught an incorrect usage for him. In the same library, I have several functions, used for assert(), that ensure invariants in structures used to parse DNS packets, to help ensure that data is parsed correctly at the boundaries.
It struck me reading this that a big theme is that the difficulty of round-tripping is a significant part of what doomed 4GLs… But round-tripping, if actually practiced, would be accidental complexity – and probably accidental complexity of the sort that AI can help reduce? For example, one could ask an LLM something like, “How does this code differ from this UML diagram?”, or, “Please create a new diagram that reflects the code as written. Keep it as similar to this old version of the diagram as possible, highlighting the changes you make.” The output of such prompts might be useful to Subject Matter Experts without needing to involve Software Engineers – though one might want to pick a different 4GL than UML to better appeal to SMEs. Would this idea (making 4GLs actually productive) yield the order-of-magnitude improvement that Brooks would have called a “silver bullet”? I don’t think I buy it, but it’s food for thought…
I find this interesting to contrast with another entry on the front-page today, Ratchets in software development: The other article provides what seems like a counter-example to the strong version of Goodhart’s Law: The count of locally-deprecated uses of an API is successfully used as both a measure and a target. To be a bit more explicit:
The value is to encourage common understanding and use of current engineering patterns;
The measure is the count of patterns in the code-base that run contrary to what local engineers currently prefer;
This is targeted such that deprecated patterns monotonically decrease across the lifetime of the code-base.
100% honest use of a metric, taken far enough, is probably contrary to the org’s goals (acknowledged in the other article, where the author allows that circumstances can sometimes justify increasing the target)… But the “far enough” clause is doing a lot of work. In this case at least, for a fair-minded reader, Goodhart’s Law doesn’t seem to have held.
Edit to add: I think the important variable in whether Goodhart’s Law exhibits or not will be how aligned incentives are. If the folks who use the measure have the same incentives as those who affect the measure, then Goodhart’s Law will not apply. OTOH, if these incentives go out of alignment (e.g. a PHB observes “the number of deprecated uses hasn’t gone down in 6 months” with some form of implicit or explicit threat), then expect the negative implications of the law to arise.
That article is interesting in conjunction with ~lcapaldo’s comment on using metrics… That ratchet mechanism is a very dumb metric, and a good part of that article is taken up by explaining exactly how dumb it is. But that’s fine because the goal of the metric isn’t to reduce use of that sort of code to zero. That would have costs that are basically unrelated to improving the operation of the program. The goal is to not let it increase, which is much cheaper (in terms of code not needing to be rewritten/modified/re-tested/etc) and still a good proxy for the real goal, which is something like “make future development easier, in this case by avoiding particular antipattern-prone constructs”.
He holds std::sync::Mutex across block_on. Diving into the tokio source code, this seems to poll on the Future until it is ready. This is essentially the same thing as .await, which desugars into poll upon compilation. Then, the Mutex exhibits broken behavior as documented:
Note that, although the compiler will not prevent the std Mutex from holding its guard across .await points in situations where the task is not movable between threads, this virtually never leads to correct concurrent code in practice as it can easily lead to deadlocks.
As I understand it, block_on is a synchronous call, and shouldn’t be called from an async task in general: block_on is used to run async code from a synchronous context, and the author is not in a synchronous context. The named fix (to use tokio::sync::Mutex) may address this specific issue, but they’re almost certain to run into another issue later.
Right you are, I misread. My overall impression that the author is confused about sync and async contexts still seems correct, to me. Blindly replacing std::sync::Mutex with tokio::sync::Mutex still won’t work correctly (I haven’t checked, sorry) without also switching the synchronous caller to use blocking_lock.
This was a fun article! The dynamic linker ended up being a real focus of mine after working on Project Radium at VMware, I’m happy to learn something new in the area. That said, I’m surprised they aren’t using OCI images for CI? That’s what we did, and unless I misunderstand something, it totally addressed the concern raised in the article, without needing to build any custom tooling. Just run the same image that CI uses locally, and you can trivially debug your crashes.
Yeah that would have worked too. I consider this more a learning experience with something (potentially) useful at the end.
If I have to argue for myself, I’d say this tool probably would work in more cases, e.g. when the source environment isn’t a container. And also debugging inside a container image can be limiting, maybe?
I’ll try to distill this: there’s a tension between mathematical purity and mechanical sympathy, in which functional languages have tended to value the former significantly more than the latter, which I guess is frustrating to the author. I happen to think that the actor model can fit in this space neatly, and personally wish the actor-based approach was more broadly used.
I view the underlying issue as one of bringing more rigor to mutation. Actors are, in essence, mutable state (previously received messages affect the behavior of later messages), but they’re mutable state that can be rigorously analyzed more easily than Arc<Mutex<T>>. Further, they’re mutable state with (IMO) good mechanical sympathy: in an actor system, you are much less vulnerable to memory-model foot-guns than you are even in traditional imperative languages, because the main interface for interactions is serial (the actor isn’t simultaneously doing multiple things), and comes with clear data ownership. For example, I suspect false sharing can be completely avoided. This comes at the cost of making some forms of parallelism more difficult, true, and actors aren’t ideal for many application domains (I wouldn’t try to rewrite a GPU shader to use actors for parallelism). I’m not trying to say that actors fit everything well. But I do think they can occupy a space closer to “mechanical sympathy” than most FP languages, and closer to “mathematical purity” than most imperative languages.
For what it’s worth, here’s a raytracer in Monte with per-pixel parallelism via subprocesses. Each subprocess only knows the geometry relevant to its current pixel; if you were rendering multiple frames at a time, each frame would be isolated. The application domain isn’t the problem; the issue is that GPUs are programmed like straight-line DSPs and can’t support the pointer-juggling implied by fat pointers. (If you have something more efficient than fat pointers for implementing actors, let me know; they are the current state of the art.)
Could you expand this, perhaps? When I implemented an actor language, I found myself using techniques from Scheme and Smalltalk implementations; actors are like lambdas and also like objects. I think that mutation is largely orthogonal, in the sense that there are actor languages with no mutation, local mutation, tuple spaces, or global mutation.
When I implemented an actor language, I eventually realized that even if I used a mutation-free functional language for actor behaviour, I ended up getting a system that behaved like one with mutation, though with a somewhat more Smalltalkish flavour than Cish. That is to say: It was very easy to make an actor that behaves like an assignable variable (just have it implement a protocol that implements set! and get, the latter simply returning the last value that was set! into it), and I found that most of the actors I made for practical tasks essentially just used their call stacks to store state.
So I ended up with a stateful system, even though all the actors themselves had their behaviour defined in a purely functional manner. In practice message-send was a side-effecting operation, that often observably behaved like mutation with extra steps, meaning that reasoning about programs had to account for all the annoyances of just using mutation - but without the raw performance thereof. It was pretty fun to use this to play with the kinds of state management that could be built using nothing but send, receive and various patterns of recursion and continuations that are normally considered “functional” - but you also couldn’t reason about it like you would a pure functional language.
(This is not to say that actors are a bad idea; there are lots of other benefits to them. And something new and exciting may have happened in the actor space in the nearly 20 years since then.)
First of all, you’re definitely a step ahead of me in actually implementing a language – I formed most of my ideas by implementing an actor system in C++. But if one is implementing a language (rather than bolting a system onto an existing language), then I don’t think that this:
reasoning about programs had to account for all the annoyances of just using mutation - but without the raw performance thereof.
needs to be true. For example, if your actor implements “set” and “get” of a 32-bit integer (using a modified Pony syntax):
actor IntActor
  var _state: U32

  new create(initial: U32) =>
    _state = initial

  be set(value: U32) =>
    _state = value

  be get(callback: {(U32): None}) =>
    callback(_state)

actor Main
  new create(env: Env) =>
    let intActor = IntActor(42)
    intActor.get(|value| =>
      env.out.print("The value is: " + value.string()))
I don’t see why, in principle, IntActor can’t optimize to what in Rust would be an AtomicU32, so that performance of mutation could be recoverable: all the information needed to perform this optimization is available at compile-time. Or, if you wanted to keep the interface and throttle the speed of change for some reason, one could do something like:
actor IntActor
  ...
  be set(value: U32) =>
    _timer.delay(_delay_time, |()| => _state = value)
demonstrating that the mutation is easier to work with (though, admittedly, eliminating the opportunity for optimization described above). I’m skipping a lot since this is too long, but we did build something reasonably effective in C++, so I suggest that many possible objections can be overcome.
I have found it useful in the context of Erlang-like systems to distinguish between mutability and effects. It is indeed possible to immutably interact with other actors via message passing, but message send/receive is effectful, i.e. non-idempotent. Thus we give up one of the benefits of pure functional computation when we introduce actors. That said, IMO an actor-based system with explicitly tracked mutability and effects is an excellent place to land in the design space, precisely what I’m exploring with the Hemlock programming language.
It helps that the name “Hemlock” is remarkably underutilized for programming projects. Perhaps that’s due to people associating it with poison, but where I live it’s a native tree. Also, at the time I was trying hard to have “ML” in the name. It’s a happy accident that those things lined up, and there was the bonus double meaning added to BranchTaken – the root of the domain name I was already hosting the project at.
There are a couple of references I’d bring to the conversation. I’m going to get details of these recommendations wrong, since they’re larger reads than I have time to check at the moment, but I hope they’re helpful in broad strokes.
First and foremost, Fred Brooks’ No Silver Bullet. One of the things I took away from Brooks’s essay is that resolving some part of the “software crisis” does not actually make engineering easier. Rather, it increases the scope and complexity of the domains to which we apply engineering. Whatever else you’d say, software is vastly more important to a functioning society than it was in the ’60s when the Software Crisis was described.
Secondly, and I’m not as confident here, Peter Drucker’s The Effective Executive. I have the idea from Drucker (I believe) that “executives” (the way he used the term) are characterized by those whose decisions (as opposed to more straightforward execution of stereotyped activities) make material difference to the success of an organization. This is, I think, most of us, though not all in the same way. I bring this up because in the software industry, we’re working largely with thought-stuff, rather than physical or chemical processes. When the medium is thought, the bottleneck of forward progress is the speed of thought, and of communication. Being a bottleneck is difficult, and (I seem to recall) Drucker’s book had much to say about managing priorities in such circumstances. When forward progress is (more-or-less) always possible, time spent on business needs to be explicitly balanced with other desires and obligations. Feeling that the set of technical demands on practitioners constitutes a crisis feels (to me) like a failure of prioritization or of realistic planning.
I agree that unstructured plaintext is not ideal for logs – it’s painful to do automated log analysis when there’s no schema at all – but I don’t see why structured logging is so quickly dismissed. JSON-per-line is not that hard to read, and even if it is more painful than one would like, it’s trivial to translate to something more readable.
JSON per line is less brittle than text+regex if you want a modicum of structure, imho. With text, all the edge cases are terrible: can you emit a log message that spans multiple lines? Can you easily retrieve entries’ timestamps?
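For example, a single (made-up) entry like this keeps a multi-line message and the timestamp trivially machine-readable:

{"ts": "2024-05-01T12:34:56Z", "level": "error", "msg": "step failed:\nline 1 of traceback\nline 2 of traceback", "request_id": "abc123"}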
It’s, as usual, all about expectations, often unspoken. Do I want reporting? Then logfile analysis may be inferior to a dedicated data outlet. Do I want bug or incident forensics? Then flexibility and readability may be more important than structure. I expect logging to log the unforeseen, the surprises. The fewer assumptions made upfront, the more likely it is to succeed.
I have yet to see the cases where complexity in logging is beneficial (indirectly, naturally) to the end user.
In the future, you can find information like this more quickly by grepping through /usr/include for the definition of errno and/or reading the glibc source.
If you want to understand libc, I’d recommend you steer away from glibc (it’s a torturous mess of indirection, macros, and incomprehensibility) and instead read musl or a *BSD libc which are much easier to grok.
I agree that glibc is really tough to follow… But if you want to know how this behaves for your system, then you have to read glibc, not musl. And it may even tell you interesting things. For errno, for example, even if we restrict to just Linux on x86_64, it works differently in different places. Follow the breadcrumbs, and you’ll eventually find the SYSCALL_SET_ERRNO macro. And we see that there’s a different errno in different contexts: the dynamic linker uses its own copy, which does not appear to be thread-local; the C library uses the __libc_errno symbol, and other parts of the distribution (such as libpthread) use errno (though my guess is that these resolve to the same address most of the time), which are at known offsets from the thread-local-storage base register. This suggests that dlopen (which is largely implemented in dynamic linker code) doesn’t set errno if it fails? Now I feel like testing this… I wouldn’t have wondered if I hadn’t actually gone through my own system’s code.
It’s not necessarily clear from header files alone. For example stuff gets weird with vDSO and address space mapping. Also the thread local variable stuff gets confusing if you’re not familiar with the details. But yes, you are right in theory.
What I don’t understand is why everyone should have to go through this trouble (which isn’t all that complicated in the end, I realise), instead of this being upfront in documentation/man pages?
cppreference.com is your friend here. It’s the best resource for reading stuff from the C and C++ standards. The actual standards documents are a tough slog.
As for Linux man pages, it seems to be pretty clear about it (although this one is for C99, not C11).
errno is defined by the ISO C standard to be a modifiable lvalue of type int, and must not be explicitly declared; errno may be a macro. errno is thread-local; setting it in one thread does not affect its value in any other thread.
That doesn’t tell you how it’s implemented. There are at least three plausible ways of implementing it given that description:
Have the kernel return unambiguous success or errno values and have libc maintain errno.
Have the VDSO expose an initial-exec thread-local variable and have the kernel always write the errno value at that offset from the userspace thread pointer (this could also be done in a completely ad-hoc way, without the VDSO).
Have a system call that allows a userspace thread to specify its errno location and have the kernel write into that.
It happens that most (all?) *NIX systems, including Linux, pick the first option from this list. If I were designing a POSIX system today, I’d be somewhat tempted by option 2 so that the system calls could implement the POSIX semantics directly even without libc, at the cost of one extra copyout per failed system call. The main down side is that system calls would then have no mechanism for reporting failure as a result of the thread pointer being invalid, but signals can handle that kind of everything-is-broken failure.
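For reference, option 1 is the familiar thin-wrapper shape. A simplified sketch of what a write() wrapper might do on x86-64 Linux (made-up helper names, not glibc’s actual code):

#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Raw x86-64 Linux syscall: returns the result, or a negated errno value
 * in the range -4095..-1 on failure (hypothetical helper, not a libc API). */
static long raw_syscall3(long nr, long a1, long a2, long a3) {
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                     : "rcx", "r11", "memory");
    return ret;
}

/* Option 1 in practice: the wrapper, not the kernel, maintains errno. */
static ssize_t my_write(int fd, const void *buf, size_t len) {
    long ret = raw_syscall3(1 /* SYS_write on x86-64 */, fd, (long)buf, (long)len);
    if (ret < 0 && ret > -4096) {
        errno = (int)-ret;
        return -1;
    }
    return (ssize_t)ret;
}

int main(void) {
    if (my_write(-1, "hi\n", 3) < 0)
        perror("my_write"); /* prints "Bad file descriptor", via the errno we just set */
    return 0;
}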
True, the documentation doesn’t say anthing about implementation (thankfully, at least in the case of the C standard), but as I understood the OP the question was about whether errno is kernel-based or libc-based in general. Given the fact that it is documented as part of the C standard that should be a big clue that it is libc-based. On the systems I support it can only be libc based because there is no operating system.
If the OP question was really about whether errno is libc or kernel based on Linux, then there is some room for ambiguity. Perhaps the article should have phrased the question better.
but as I understood the OP the question was about whether errno is kernel-based or libc-based in general. Given the fact that it is documented as part of the C standard that should be a big clue that it is libc-based
Why? Signals are part of the C standard, but are implemented in the kernel on most *NIX systems, for example. The POSIX standard doesn’t differentiate between kernel and libc functionality at all: it is defined in terms of C interfaces, but some things are implemented in the kernel and some in libc. It’s entirely reasonable to ask what the division of responsibilities between kernel and libc is for any part of the C or POSIX standard, particularly a part that is set on system call returns.
On the systems I support it can only be libc based because there is no operating system.
That doesn’t mean that file I/O is a purely libc service in a hosted environment, yet it is also specified in the C standard.
When I was working on a toy kernel, my idea was that syscalls would return carry-zero for success and an opaque handle on error with the carry bit set.
You could interrogate the kernel and vDSO to learn more: finding out whether you can retry would be relatively simple and fast (that information is stored in the vDSO), but you could also get stack traces over the various nanokernel services that were touched and tell the user what went wrong; (in pseudocode)
let result: SyscallResult = syscall_open_file("/etc/passwd");
if result.carry_bit() {
    if vdso_err_retryable(result) {
        goto retry;
    } else {
        panic("could not read file: {reason}\n{stacktrace}",
            reason = syscall_err_message(result),
            stacktrace = syscall_err_stacktrace(result)
        );
    }
}
let file_handle: FileHandle = result.cast();
goto do_stuff;
I keep pondering teaching LLVM about the carry-bit-on-failure calling convention. I think it would be a nice way of implementing lightweight exceptions: set carry on exception return and implement exceptions in the caller as branch-on-carry to the unwind handler. You’d get one extra branch per call, but in exchange for that you don’t need an unwind library.
The extra branch per call is virtually free if you branch to the error case and the error is rare (and it should be). Both on big OoO super scalar and small in order microarchs.
Also you shouldn’t place a subroutine call in your hot loop 😇.
I don’t think Herb proposed a calling convention in that document (it’s purely C++, which regards the ABI as a separable concern). I did discuss this as a possibility with him around the time that he wrote that though.
“A robot will be truly autonomous when you instruct it to go work and it decides to go to the beach instead.” Brad Templeton
https://arstechnica.com/ai/2025/03/anthropics-ceo-wonders-if-future-ai-should-have-option-to-quit-unpleasant-tasks/
I continue to think Rust would have been better off with linear types, but I never had the time I needed when I needed to push the argument forward, and now I feel like the ship has sailed. Oh well.
Some of the points are highly debatable.
Because lexical shadowing causes lots of bugs in C and C++, which is why D and Zig forbid it.
You would have to compare comptime to the swath of C++ templates it replaces.
Because it turns out memory safety is highly overrated by some, and is only one of the desirable properties of a language, and probably behind learnability and productivity in terms of priority.
The point in the post is that there are languages that solved it.
The post directly refers to C++ templates and doesn’t think it’s a valid solution.
Sure, people may disagree, but please be more specific. The post particularly lays out the point where they feel like they are recommitting previous mistakes.
Memory safety is a tangible objective with a reasonably good definition. I have yet (as a trainer and someone very interested in that space) to find a tangible definition of learnability and productivity. Productivity is often mistaken for familiarity; learnability is often very biased toward what the speaker knows.
No, I’ve read that correctly. It describes scoping exactly like in C++. The Rust solution does create bugs. https://rules.sonarsource.com/cpp/RSPEC-1117/?search=shadow
Please refer to Capers Jones for data points about various languages, expressed in cost per function point.
https://jyx.jyu.fi/handle/123456789/47698 languages with “pub” and “fn” as keywords avoid the fact that this reduces learnability for non-english speakers.
Capers Jones FPs are metrics of the business value of a program. It’s a good productivity metric for the whole process of programming. Sometimes you can find productivity metrics that give you the relationship of FP to LOC on a programming-language basis, but even this is very far away from a measure of programmer productivity on a language, and it is influenced by other effects, like the surrounding domain or the domain expertise of the programmer.
I looked at the 259-page document you linked quite hard. The term “Learnability” is mentioned once. It seems to be a very good meta-study; however, reading this:
I’m not sure it supports your point, especially not at a whole language scale.
To me, memory safety is like the baseline, the bare minimum. Anything without it simply isn’t worth using for any reason.
This must be more extreme than you mean. Unsafe Rust forgoes memory safety. Is unsafe Rust not worth using for any reason?
When people say things like your parent, what they always mean is “by default.” Even programs written in GC’d languages can have memory safety issues via FFI, but we still call them memory safe. Unsafe is like an FFI to a (slightly larger) language.
Maybe, but I really wish engineers, at least, wouldn’t talk that way. Engineering almost always involves trade-offs, and the maximalist stance (“bare minimum”, “for any reason”) makes those harder to identify and discuss. For example, the follow-up comment is much more interesting (IMO) than the one I responded to.
So, rereading your comment, I think I missed a word:
Emphasis mine. I thought you were saying (like I hear often on the internet) that the existence of unsafe Rust means that Rust is not a memory safe language, and was trying to talk about that definition, not about the judgement.
My apologies.
I think doing something potentially unsafe can be worth it, if you are willing to pay the cost of spending a lot of time making sure the code is correct. For large C, C++, or Zig code bases, this isn’t really viable. Easily 10xes your development time.
I see unsafe Rust the same way I see the code generating assembly inside a compiler. Rustc or the Zig compiler can contain bugs that make them emit incorrect code. Same with Go or javac or the JIT inside the HotSpot VM. So you are never free of unsafe code. But the unsafety can be contained, be placed inside a box. You can design an interface around unsafety that doesn’t permit incorrect use. Then you can spend a lot of time proving it correct. And after that, you can write an infinite amount of code depending on that unsafe module without worry!
Essentially, I view unsafe in Rust as a method of adding a new primitive to the language, the same way you could add a new feature to Python by writing C code for the Python interpreter. The novelty is just that you write the new feature in the same language as you use the feature in.
If 1% of your code is unsafe modules with safe interfaces, and you 10x the cost of developing those parts by being very careful and proving the code correct, your overall development cost only went up by 9%. That’s a much lower cost, and thus what I mentioned at the start, being willing to pay the cost of doing it right, becomes much smaller. It will be worth it in many more situations.
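To make that concrete, a toy sketch of the shape I mean (entirely made-up example; the interesting part is the safe boundary, not the primitive itself):

// The "new primitive": a fixed-capacity stack that skips the bounds check on pop,
// kept correct by a safe interface that maintains the invariant len <= buf.len().
pub struct TinyStack {
    buf: [u32; 8],
    len: usize,
}

impl TinyStack {
    pub fn new() -> Self {
        TinyStack { buf: [0; 8], len: 0 }
    }

    pub fn push(&mut self, v: u32) -> bool {
        if self.len == self.buf.len() {
            return false; // full; refuse rather than overflow
        }
        self.buf[self.len] = v;
        self.len += 1;
        true
    }

    pub fn pop(&mut self) -> Option<u32> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // SAFETY: len was > 0 and is always <= buf.len(), so the index is in bounds.
        Some(unsafe { *self.buf.get_unchecked(self.len) })
    }
}

fn main() {
    let mut s = TinyStack::new();
    s.push(1);
    s.push(2);
    assert_eq!(s.pop(), Some(2));
    assert_eq!(s.pop(), Some(1));
    assert_eq!(s.pop(), None);
}

Callers can write as much code as they like against push and pop without ever thinking about the unsafe line inside.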
That got a little rambly but I hope I communicated my stance correctly.
An effect that large probably requires poor coding practices. Stuff like global variables, shared state between threads, lack of separation between effects and computation… When you make such a mess of your code base, OK, you need strong static guarantees, like “no arbitrary code execution is ever possible no matter how buggy my program is” — that is, memory safety. Now don’t get me wrong, I love static guarantees. I love when my compiler is disciplined so I don’t have to be. But even then you want a nice and maintainable program.
My hypothesis is that if you do the right thing, that is, properly decompose your program into deep modules and avoid cutting corners too often, then memory safety doesn’t boost your productivity nearly as much as a whopping 10x. Especially if you go for some safety, like just adding bounds checks.
I will concede that in my experience, we rarely do the right thing.
I don’t know; when Java introduced a wider audience to GC, software development definitely experienced a huge boost. Maybe 10x just in itself is not a correct figure. But you don’t only ship software, you also maintain it, fix bugs, etc. - and here, guaranteed memory safety can easily have an order of magnitude advantage, especially as the software grows.
Java also popularised huge standard libraries. It’s easier to be productive when a good portion of the work is already done.
The important point, to me, is reducing the potential for harm, and there are many ways to do that. For example, memory unsafety can be mitigated by running code in a sandbox of some kind (such as by compiling and running in a WASM environment), or by only running it against trusted input in simple scenarios (my local programs I use for experimenting), or by aggressively using fuzzers and tools like valgrind, ASAN, LSAN, and static analyzers like Coverity. All of these are valid approaches for mitigating memory unsafe code, and depending on my risk model, I can be OK with some level of memory unsafety.
Further, there are domains in which harm is more likely to come from too-loose type systems than it is from memory unsafety. I can implement my CRUD app in pure memory-safe Python, but that won’t protect me from a SQL injection… Something that enforces that the database only interacts with SanitizedStrings might. On the other hand, in exploratory development, too strong of a type system might slow you down too much to maintain flow.
Anyways, I generally don’t like maximalist stances. If multiple reasonably popular approaches exist in the wild, there are more than likely understandable motivations for each of them.
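To sketch the SanitizedString idea (hypothetical names, in Rust rather than Python; real code should prefer bound parameters over string escaping):

// A string that has passed through our (hypothetical) sanitizer.
struct SanitizedString(String);

impl SanitizedString {
    fn sanitize(raw: &str) -> SanitizedString {
        // Stand-in for real escaping / validation logic.
        SanitizedString(raw.replace('\'', "''"))
    }
    fn as_str(&self) -> &str {
        &self.0
    }
}

// The only query function in the codebase accepts SanitizedString, never &str,
// so un-sanitized user input can't reach the database by construction.
fn run_query(user_input: SanitizedString) {
    println!("SELECT * FROM users WHERE name = '{}'", user_input.as_str());
}

fn main() {
    let raw = "Robert'); DROP TABLE students;--";
    run_query(SanitizedString::sanitize(raw));
}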
Can you explain what exactly memory safety means to you?
I’m writing a lot of code that is 100% “memory safe” (in my view), but would not be possible to be written in memory safe language conveniently (like Rust).
the things we can universally agree on:
What would that baseline contain for you?
They probably mean the definition you get if you stop at the summary of the Wikipedia page, which is easy to misinterpret as saying that memory safety has to be enforced by the toolchain (in which I include things like linters and theorem provers) or the VM, via compile-time or runtime guard rails that cannot suffer from false negatives.
That’s why I’m asking. The “summary definition” of Wikipedia isn’t really helpful at all.
The chapter “Classification of memory safety errors” is much better in listing potential problems, but those are language dependent.
Zig (in safe modes!) will have checks against Buffer overflow and Buffer over-read, and has increased awareness for Uninitialized variables, as undefined is an explicit decision by the programmer. This doesn’t mean Undefined variables are a non-problem, but it’s less likely as you have to actively decide not to initialize.
<pedantic>Use-after-free, Double-free, Mismatched-free are not possible in Zig either, as the language has no concept of memory allocation.</pedantic> This doesn’t mean a userland allocator won’t suffer from these problems, but the GeneralPurposeAllocator has mitigations to detect and prevent them.
In C, you actually have a problem with the built-in support for malloc and free. Clang, for example, knows that malloc is special, and can be considered “pure” and returning a unique object. This is how the function is defined in the C standard. This yields interesting problems with the assumptions of the programmers and what the compiler actually does.
Consider the following program:
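Something along these lines (a sketch reconstructed from the description that follows; names and details are guesses, the original snippet may have differed):

#include <stdbool.h>
#include <stdlib.h>

int *glob;

bool f(void) {
    int *p1 = malloc(sizeof(int));
    int *p2 = malloc(sizeof(int));
    glob = p1;          /* p1 escapes, so its allocation has to stay      */
    return p1 != p2;    /* two distinct live allocations can never alias  */
}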
The compiler will optimize it to this:
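Roughly (again a reconstruction, not actual compiler output):

bool f(void) {
    glob = malloc(sizeof(int)); /* p2's malloc is elided: its result is never used */
    return true;                /* p1 != p2 is folded to true                      */
}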
This is because malloc will always return a pointer to a new object, so it can never alias with another result of malloc ever. Also malloc won’t have side effects besides yielding a new object. So eliding the invocation of malloc is fine, as we don’t use the result.
If we do remove the glob = p1; line, the compiler will just make the function return true; and won’t emit any calls to malloc at all!
As Zig has no builtin allocator in the language (but it’s a userland concept inside the standard library), the compiler can’t make the assumptions above and thus actually integrate the checks.
Why not https://bazel.build/?
Didn’t find it mentioned in the post.
Because I don’t work at Google.
Bazel is mentioned in Why not $toolname:
Buck2 is not… I found Bazel fairly impenetrable when I tried using it for something with a fairly complicated build, possibly because so much of what I needed to understand was built into the system itself. If I were evaluating something similar today, I’d look a lot closer at buck2. I don’t know if it would work for the author, though.
In an old team, I argued that adding a dependency is, in large part, a political decision: in addition to the technical implications, one should ask whether the incentives of the dependency’s stewards are aligned with our own. If not, while things may look good now, there may be trouble down the line. (See also Platform as a Reflection of Values.) If incentives and tech are well aligned, then it’s often a mistake not to take on the dependency.
I was reminded of a data structure @trishume described here; I’m honestly still not completely clear on the difference between what Tristan described and Fenwick trees… That is, by following the breadcrumbs from Tristan’s post, I found this alternative write-up. Figure 1 in that paper looks very similar to Tristan’s graphic…
This article is making it sound more profound than it really is.
Yes, as the performance of IO devices increased, the latency of IO became a problem, making it hard to saturate the bandwidth. The simplistic (but convenient) abstraction of blocking IO became increasingly insufficient.
Almost everything in computing eventually gains async interfaces: CPU instruction execution pipelining, CPUs communicating asynchronously over a coherence bus with each other or memory, IO device interfaces, remote systems, threads, processes. Asynchronicity due to latency is just a fact of the physical world.
But notably this model has a downside - reasoning about the state (especially atomicity) of things is much harder, and the programming model and architecture need to be redone in a new paradigm.
The idea of sending all your systems calls in the form of messages to what feels like a little mini server feels completely different than just making an async function calls.
You’re right about the change in programming model - though of course it’s a very common and successful one to use between machines. I was surprised it was this universal.
The other thing that pushes towards synchronous interfaces (besides making reasoning easier) is latency sensitivity. The things you’re describing as gaining async interfaces are (generally) making a trade-off to sacrifice some latency in order to gain higher bandwidth, but the opposite trade-off may make sense, sometimes. For example, persistent storage with very low latency (e.g. optane memory, if it weren’t cancelled) likely prefers to be used synchronously.
Following up for posterity – another really interesting thing happening in high-performance computer architectures that do optimize for latency is attempting to align execution with data location. More examples may exist, but consider that eBPF moves execution into the kernel, enabling a new class of synchronous execution primitives, and that SmartNICs make PCI device virtualization much more efficient by moving the device virtualization onto a dedicated PCI card. CUDA optimization is highly concerned with unblocking the speed of synchronous execution by aligning the data location with the execution structure, and the Chapel programming language makes code and data location a central concern for unlocking synchronous code performance.
Minix is designed to be readable, but it probably isn’t a good reflection of modern hardware.
For simple devices, I’d recommend looking at an embedded OS. Embedded systems have many of the same concepts but the hardware that they support is much simpler.
At the core, there are really only two ways that the OS-hardware interface works. The first is memory-mapped I/O (MMIO)[1]. The device has a bunch of registers and they are exposed into the CPU’s physical address space. You interact with them as if they were memory locations. If you look at an RTOS, you’ll often see these accessed directly. On a big kernel, there’s often an abstraction layer where you say ‘write a 32-bit value at offset X in device Y’s mapping’. These can often be exposed directly to userspace by just mapping the physical page containing the device registers into userspace.
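For a concrete flavour of the RTOS-style direct access (hypothetical device and register layout, loosely modelled on a simple memory-mapped UART):

#include <stdint.h>

/* Hypothetical UART with its registers mapped at this physical address. */
#define UART0_BASE    0x4000C000u
#define UART0_DR      0x00u        /* data register offset      */
#define UART0_FR      0x18u        /* flag register offset      */
#define UART0_FR_TXFF (1u << 5)    /* transmit FIFO full flag   */

static inline void mmio_write32(uintptr_t addr, uint32_t value) {
    *(volatile uint32_t *)addr = value;
}

static inline uint32_t mmio_read32(uintptr_t addr) {
    return *(volatile uint32_t *)addr;
}

void uart_putc(char c) {
    /* Spin until the transmit FIFO has room, then write the byte. */
    while (mmio_read32(UART0_BASE + UART0_FR) & UART0_FR_TXFF) { }
    mmio_write32(UART0_BASE + UART0_DR, (uint32_t)c);
}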
The next step up from that is direct memory access (DMA). This is one of the places you see a lot of variation across different kinds of systems. The big split in philosophies is whether DMA is a thing devices do or a thing that some dedicated hardware does. Most modern busses (including things like AXI in small SoCs) support multiple masters. Rather than just passively exporting MMIO registers, a device can initiate transactions to read or write memory. This is usually driven by commands from the CPU: you write a command to an MMIO register with an address and the device will do some processing there. The best place to look at for this is probably VirtIO, which is a simplified model of how real devices work. It has a table of descriptors and a ring buffer, which is how a lot of higher-performance devices work. You write commands into the ring and then do an MMIO write to tell the device that more commands are ready. It will then read data from and write it to the descriptors for the devices. I think the NetBSD VirtIO drivers are probably the easiest to read. You might also look at DPDK, which has userspace drivers for a bunch of network interfaces, which remove a lot of the kernel-specific complexity.
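Very roughly, the shared structures look something like this (simplified from the VirtIO split-virtqueue layout; the real spec adds a separate ‘used’ ring and more flags):

#include <stdint.h>

/* One descriptor: where a buffer is and how big it is. */
struct virtq_desc {
    uint64_t addr;   /* guest-physical address of the buffer */
    uint32_t len;    /* length of the buffer in bytes        */
    uint16_t flags;  /* e.g. device-writable, chained        */
    uint16_t next;   /* index of the next descriptor in a chain */
};

/* The 'available' ring: indices of descriptors the driver has handed to the device. */
struct virtq_avail {
    uint16_t flags;
    uint16_t idx;    /* incremented by the driver as entries are added */
    uint16_t ring[]; /* descriptor indices                              */
};

The driver fills descriptors, bumps idx, and then does one MMIO write (the “doorbell”) to tell the device to go look.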
The other model for DMA is similar but the DMA controller is a separate device. This is quite common on embedded systems and mainframes, less common on things in between. The DMA controller is basically a simplified processor that has a small instruction set that’s optimised for loads and stores. If a device exposes a MMIO FIFO, you can use a programmable DMA engine to read from that FIFO and write into memory, without the device needing to do DMA itself. More complex ones let you set up pipelines between devices. More recent Intel chips have something like this now.
Once you get DMA working, you realise that you can’t just expose devices to userspace (or guest VMs) because they can write anywhere in memory. This is when you start to need a memory management unit for IO (IOMMU)[2], which Arm calls a System MMU (SMMU). These do for devices what the MMU does for userspace: let you set up mappings that expose physical pages into the device’s address space for DMA. If you have one of these, you can map some pages into both userspace and a device’s address space and then you can do userspace command submission. Modern GPUs and NICs support this, so the kernel is completely off the fast path. The busdma framework in NetBSD and FreeBSD is a good place to look for this. It’s designed to support both DMA via the direct map and DMA via IOMMU regions.
For this kind of kernel-bypass abstraction to be useful, you need the device to pretend to be multiple devices. Today, that’s typically done with Single Root I/O Virtualisation (SR-IOV). There’s a lot here that you don’t need to care about unless you’re building a PCIe device. From a software perspective you basically have a control-plane interface to a device that lets you manage a set of virtual contexts. You can expose one to userspace or to a guest VM and treat it as a separate device with independent IOMMU mappings.
To do any of this, you need to have some kind of device enumeration. On simple SoCs, this can be entirely static. You get something like a flattened device tree and it tells you where all of the devices are, which you either compile into the kernel or get from a bootloader. Systems with expansion need to do this via ACPI, PCIe device enumeration, and things like USB. This is universally horrible and I’d recommend that you never look at how any of it works unless you really need to because it will cause lasting trauma. OpenFirmware was the least bad way of doing this, and so naturally was replaced by something much worse.
Beyond that, the way devices work is in the process of changing with a few related technologies. In PCIe, IDE and TDISP let you establish an end-to-end encrypted and authenticated connection between some software abstraction (for example a confidential VM, or Realm in Arm-speak) and a device context. This lets you communicate with a device and know that the hypervisor and physical attackers on the bus can’t tamper with or intercept your communication. This is probably going to cause a bunch of things to move onto the SoC (there’s no point doing TLS offload on a NIC if you need to talk AES to the NIC).
The much more interesting thing is what Intel calls Scalable I/O Virtualisation (S-IOV, not to be confused with SR-IOV), and Arm calls Revere. This tags each PCIe message generated from an MMIO read or write with an address-space identifier. This makes it possible to create devices with very large numbers of virtual functions because the amount of on-device state is small (and can be DMA’d out to host memory when a context is not actively being used). This is the thing that will make it possible for every process in every VM to have its own context on a NIC and talk to the network without the kernel doing anything.
The end of this trend is that the kernel-hardware interface becomes entirely control plane (much like mainframes and supercomputers 30 years ago) and anything that involves actually using the device moves entirely into userspace. The Nouveau drivers are a good place to look for an example of how this can work, though they’re not very readable. They have some documentation, which is a bit better.
[1] x86 also has IO ports but they’re really just MMIO in a different namespace done using extra stupidity and it’s safe to pretend it isn’t real.
[2] These weren’t originally created for security. I believe the first ones were in Sun workstations, where Sun wanted to ship cheap 32-bit NICs in a machine with 8 GiB of RAM and wanted the device to be able to DMA anywhere into physical memory.
To give some background on this, I/O mapped I/O (which is what this method is called) was used to make computers cheaper by keeping I/O decoding to a minimum (using fewer logic gates) while at the same time allowing as much memory as possible. CPUs with I/O mapped I/O have dedicated instructions (on the x86, these are IN and OUT) with restrictions that other movement instructions don’t have.
It really makes sense only when memory is tightly coupled. As soon as you start having something that looks like a memory bus, it’s much easier for the CPU to just send a bus message and let a bridge chip (or block on an SoC) deal with it for regions that are mapped to non-memory devices. This gives a cleaner separation of concerns.
The other benefit is that, on systems where you control the memory, you know loads respond in a bounded number of cycles, whereas I/O might block and you can handle interrupts in the middle differently. If you have external DRAM and moderately high clocks, this is much less of a benefit but it can be useful on small microcontrollers with tightly-coupled SRAM.
For another esoteric benefit, I think I’ve seen some QEMU docs that say that PMIO is more performant to dispatch. I think that’s because you end up doing less software address decoding in the virtual machine monitor. I don’t know how relevant these concerns still are, but I thought that was worth mentioning.
So maybe a para-virtualized device might want its doorbell to the host use PMIO instead of MMIO when possible.
MMIO is fairly slow on QEMU because it has an emulated memory map and needs to make normal memory accesses as fast as possible so anything that leaves that path ends up being very slow. Some other emulators just map I/O pages as no access and catch the fault. This is fairly fast with most hypervisors but most PV devices tend to favour hypercalls for prodding their doorbells because hypercalls can avoid saving and restoring most registers (just zero them on return) and so can be faster.
I’ve seen it called PMIO (Port-Mapped I/O), don’t know if that’s an Intel-ism or not.
A third one: some old CPUs — I only know about the 8080 & Z80 — had dedicated instructions to read and write a numbered I/O port. I think they were called IN and OUT.
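They still exist on x86; from C they are usually wrapped in something like this (the conventional GCC/Clang inline-assembly helpers, x86-only):

#include <stdint.h>

/* Write and read a byte on the x86 I/O port bus. */
static inline void outb(uint16_t port, uint8_t value) {
    __asm__ volatile("outb %0, %1" : : "a"(value), "Nd"(port));
}

static inline uint8_t inb(uint16_t port) {
    uint8_t value;
    __asm__ volatile("inb %1, %0" : "=a"(value) : "Nd"(port));
    return value;
}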
I mentioned I/O ports, but you’re much better off pretending that they never existed.
This is the truth. One thing to note for others (since I’m sure David already knows)… all of the current ‘device tree’ stuff we have today is originally from OpenFirmware. OpenFirmware was just a standardized ‘forth’ used for booting systems. (A system monitor)
There were some other nice things in OpenFirmware. My favourite was that it provided simple device drivers in Forth that were portable across operating systems so you just needed to OS to provide a generic driver to get basic functionality. These were not as fast as a native driver (though, for something like a UART, this might not matter) but they were enough to get the device basically working.
The down side of this was that the FORTH drivers were larger than the tiny amount of BIOS firmware that most devices needed on PCs and so needed larger ROM or flash chips to hold, which pushed up the price a lot for simpler devices and required a different board run (with much lower volumes) for other things.
Vendors then stuck a bigger markup on the OpenFirmware version knowing that you couldn’t just use the PC version. It was quite entertaining that, while PC users complained about Apple’s markup on ATi video cards, Sun users were buying the Apple versions because they were a third the price of the Sun packaging of the identical hardware.
The only firmware standard in existence with its own song! (Continuing the theme of OpenFirmware being replaced with worse tech, I can no longer find the .au file download, but https://www.youtube.com/watch?v=b8Wyvb9GotM exists.)
IA to the rescue, in all its “8-bit ISDN mu-law, mono, 8000 Hz” glory.
Lol - I never knew that. This elevates my respect for OpenFirmware even further.
This seems an excellent guide to the history and near future of HW/SW interfaces. Going forward, I’m personally interested in how computation is being distributed through the system. SmartNICs, as I understand the term, include general-purpose CPUs to virtually implement PCI interfaces in physical hardware. GPUs also have heavy compute (obviously) and are becoming more tightly integrated with CPUs through things like Linux’s Heterogeneous Memory Management (I was motivated to post this largely because I’m not aware of other HMM implementations besides Linux’s, though the idea feels generally important – are there other implementations of the idea?), and through CXL. Compute-In-Memory may be interesting, and I posted this here a couple of years ago (!).
There are a bunch of interesting research projects looking at how you properly program heterogeneous distributed systems. The first one I knew about (almost certainly not the first) was BarrelFish, though it didn’t really do much hardware-software co-design.
Because software is possibly the most consequential field affecting humanity today. Anthropogenic climate change is accelerated and meliorated through software. The human social experience today is highly interactive with software. Whole industries are created and destroyed through software. Attacks against and defenses of our society writ large are made and mediated through software. I feel if I have an ability to be part of these really existential struggles, if I have the capacity to develop and defend an opinion about how I hope these struggles will resolve, and if I’m given the opportunity to help “my team” with my skills, with (honestly) not very much sacrifice on my part to do so, then how could I not participate?
I left my previous job when my company was taken over by (what I think of as) rapacious private equity. I took my current job to help make charitable giving more effective. I’m trying to make the right software, and whether I’m remembered for it isn’t as important as trying to do the part I can.
As someone who has only done some Zig, I didn’t find anything particularly revolutionary in here. Though, the section on asserts feels like it’s (mostly) solving the problem at the wrong level. If you parse data correctly at the boundaries, then asserts shouldn’t be needed later without coding errors or intentional manipulation of memory. Some of the sections would benefit from some concrete code examples. I think the asserts section would be one of those, to demonstrate where asserts catch things that code review / tests / parsing wouldn’t.
The best example for “just use types” is perhaps
https://github.com/tigerbeetle/tigerbeetle/blob/f66289bd9c7357546b379dadbffb38e91a8eb11c/src/vsr/replica.zig#L6726
This is a replicated state machine, which maintains a cluster-consistent log of events. Events in the log are hash-chained — next event contains a content hash sum of the previous event.
Here, we check that the latest event we have hash-chains to our known-good checkpoint.
Crucially, this invariant does not always hold — your log might be incomplete, or there might be a corruption in your log, or you might actually be truncating erroneous events as a part of consensus protocol.
This thing is relatively easy to assert, but relatively painful to type. This is not a simple structural thing like “this string is a valid email”, which you check once at the boundary and are done with it. This is a complex invariant that spans multiple objects and evolves with time. You could still encode it in the types using a phantom type parameter, but, because it is non-structural phantom thing, you’d still have to audit a lot of code to double check that the types are not lying (a lot of things manipulate the log), and you’d probably assert there for a good measure anyway.
We absolutely do want to have extra safety nets, to make sure that, even if we mess up our code, the thing safely crashes instead of returning a wrong result. As I like to say, it’s impossible to have a bug in TigerBeetle, any issue is at least three bugs:
And, while we do recommend running with ECC RAM, we’d also love to opportunistically catch random bit flips, if possible and not too costly.
Another important thing here is thorough randomized testing. A failing assert, if you can reproduce it, and a type error are comparable in terms of developer experience. Easy type errors are better than easy crashes, but complex type errors are worse than easy crashes.
The catch of course is that typically you don’t have an easy crash. An assert is just a programmer’s guess which sits there until once in a blue moon a specific sequence of events trips it. Types have that superpower that they work even if you never ever run the code. And actually running the code is very tricky — everything off the happy path is typically uncovered.
But if you actually do have thorough explore-all-the-code-paths testing, that significantly narrows the gap here between assertions and types, and you can lean on asserts more.
Combined with the fact that we care a lot about data representation, we ended up mostly using types to express just the physical representation of data, and using mostly assertions for logical invariants.
I guess, my overall thrust is that this all is very context specific and nuanced. These things are all exceptionally good advice, and tremendously improved my own personal style, but they might be rather dangerous if not properly digested :P
I’d guess this flips around once more: I think I’d take a complex TLA+ error over a complex crash, but I am yet to write that blog post…
I wrote at one point about using the type-system to enforce invariants. In that post, I described a time where I had to change the granularity of times in the TSDB I worked on from seconds to milliseconds. This effort would have been highly error-prone without newtypes. I didn’t talk about this in the post, but one of the things I found interesting from this effort was that the cost/benefit of using the type-system to enforce invariants changed over time: While the system was in transition from seconds-based timings to milliseconds-based timings, getting the units wrong would have been really bad. But once the system fully transitioned to milliseconds, using the type system no longer provided a clear benefit, while the costs (increased friction whenever needing to work with time) remained obvious. The types only paid for themselves in the transitional phase, and once that was past, if I recall correctly, they were removed.
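A minimal sketch of the kind of newtype I mean (hypothetical names; the real code had more units and conversions):

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Seconds(u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Millis(u64);

impl From<Seconds> for Millis {
    fn from(s: Seconds) -> Millis {
        Millis(s.0 * 1000)
    }
}

// During the transition, old call sites hand us Seconds and new ones Millis;
// the compiler refuses to let a bare u64 cross the boundary in either unit.
fn record_sample(timestamp: Millis, value: f64) {
    // hypothetical storage call
    let _ = (timestamp, value);
}

fn main() {
    let legacy = Seconds(1_700_000_000);
    record_sample(legacy.into(), 42.0);
    record_sample(Millis(1_700_000_000_000), 43.0);
}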
I definitely agree on the last part – if you can approach 100% state space test coverage, then assertions become like the most powerful and expressive system, in terms of the experience you have writing the code, and the confidence it gives you
It’s hard to measure when you have such coverage (it’s not line or branch coverage)
And IME you need to constantly and drastically simplify your code to achieve small state spaces … and I don’t even know how to get everybody on the team on board with it
Another thing I do is write BESPOKE harnesses for specific pieces of code / types of correctness – IME this makes it more ergonomic to get to this happy place
But if you get to the point where you can randomly change if statement conditions in your code, and tests fail within ~1 minute or ~5 minutes, then that’s like a superpower
It’s not too hard to get there with a parser, but it’s indeed much harder to get there with a stateful and concurrent runtime
IME most large projects never achieve this, and small projects that achieve it have to fight to keep it that way
So in large projects people only know how to add features in a suboptimal way. i.e. there starts to be “superstition” and “dark corners” of the code
It’s also why I am continually puzzled when people put seg faults in the same category as memory safety. Seg faults are what PREVENT unsafety. You HOPE for a seg fault. If you don’t get it, now you have a big problem.
The doc has a good way of putting it:
Other way round. A segfault is caused by a memory safety bug. When a segfault occurs you don’t know what other havoc the bug has wrought, like how much memory was corrupted before the crash.
But when a seg fault DOESN’T occur, it doesn’t mean the code was safe! You still have no idea how many memory safety bugs there are.
Unsafe code often runs forever. Heartbleed was one such example - https://en.wikipedia.org/wiki/Heartbleed#Behavior
So in that case, you hoped for a seg fault, but you didn’t get one!!! You hoped for it for 10 or 20 years I think! Instead you got silent data exfiltration, for years, in one of the most common network-facing services in the world.
In my case, I have scars from an intermittent audio bug that I tried to debug for weeks very early in my career, which was ultimately caused by memory unsafety. That’s why I draw a HUGE distinction between segfault and no segfault. If there was a segfault, I would have had a clue, but I didn’t
Related - What ASAN does is precisely what the Tiger Beetle doc says:
It gives you a crash / seg fault / stack trace, e.g. on a small heap buffer overflow or a use-after-free –
none of which cause seg faults in normal C implementations! The segfault is great!
e.g. ASAN was a game-changer when developing a garbage collector, because the state space explodes at every malloc – GC or no GC. It’s a 2**N state space, where N is the number of allocations in a program.
The only way to make it correct is to test exhaustively, with the guarantee that memory unsafety will cause hard crashes – there is no type system that will help you
Any nascent garbage collector is basically a big pile of memory safety bugs :-)
Previous comments on Cardelli’s definition of “Safe language”, which is IMO the most correct and useful one:
https://news.ycombinator.com/item?id=35836307
https://news.ycombinator.com/item?id=35834513
https://news.ycombinator.com/item?id=21832009
…
Type Systems, Luca Cardelli
So seg faults are trapped errors. Heartbleed is an untrapped error. Languages where all errors are (immediately) trapped are safe.
But I take your point that it sometimes does NOT happen immediately, and you can have unsafe behavior first, and then a segfault. That is indeed bad – the seg fault did not protect you.
But I’d say it’s just as common that it never causes a seg fault, which is the absolute worst.
The other example I gave in one of those threads is that an MMU makes stack overflow safe, by giving you a segfault
Without an MMU, the stack could overflow into heap memory or some other memory, which would be unsafe
There is no possibility of catching it statically – there is no type system that can help you with stack overflow / infinite recursion. Whether it exceeds a bound is an inherently dynamic property of a program
That isn’t true, a wild offset can easily jump past any stack guard pages.
You also need stack probing when anything large is allocated on the stack. If you don’t have stack probes then a wild alloca() or variable-length array can move the stack pointer to an arbitrary address.
I have used this to implement stackful coroutines without any assembly: I allocated space for a stack on the heap, calculated the difference between the heap space and address of a local variable (as a proxy for the current stack pointer), created a VLA of the appropriate size with a bit of slop, and the next function call’s stack frame was on the heap. Criminal.
True, but when I say “stack overflow” I mean pushing a new frame onto the call stack, which the guard page should catch
bounds safety bugs can occur either on the stack or heap
missed a word
I think, much like perimeter-only firewalls are eschewed for defense in depth, that asserts serve as a seatbelt/guardian against future changes in expectations for code/state logic.
Assertions also double as inline descriptions of invariants (compared to a comment, they are more concise and at less risk of becoming stale). Even if the assertion is never tripped, it’s valuable for readers of the code so they know which assumptions they can make.
This property is perhaps less valuable in languages with more sophisticated type systems.
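For instance, a trivial, made-up Rust sketch of an assertion pulling documentation duty:

use std::ops::Range;

/// Merge two ranges that the caller promises are adjacent.
fn merge(a: Range<usize>, b: Range<usize>) -> Range<usize> {
    // Documents (and, in debug builds, enforces) the adjacency invariant
    // that callers are expected to uphold.
    debug_assert_eq!(a.end, b.start, "ranges must be adjacent");
    a.start..b.end
}

fn main() {
    assert_eq!(merge(0..4, 4..9), 0..9);
}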
Asserts can also catch bad usage of functions. I know of at least one coworker thanking me for using assert() in my DNS library, which caught an incorrect usage for him. In the same library, I have several functions, used for assert(), that ensure invariants in structures used to parse DNS packets, to help ensure that data is parsed correctly at the boundaries.
It struck me reading this that a big theme is that the difficulty of round-tripping is a significant part of what doomed 4GLs… But round-tripping, if actually practiced, would be accidental complexity – and probably accidental complexity of the sort that AI can help reduce? For example, one could ask an LLM something like, “How does this code differ from this UML diagram?”, or, “Please create a new diagram that reflects the code as written. Keep it as similar to this old version of the diagram as possible, highlighting the changes you make.” The output of such prompts might be useful to Subject Matter Experts without needing to involve Software Engineers – though one might want to pick a different 4GL than UML to better appeal to SMEs. Would this idea (making 4GLs actually productive) yield the order-of-magnitude improvement that Brooks would have called a “silver bullet”? I don’t think I buy it, but it’s food for thought…
I find this interesting to contrast with another entry on the front-page today, Ratchets in software development: The other article provides what seems like a counter-example to the strong version of Goodhart’s Law: The count of locally-deprecated uses of an API is successfully used as both a measure and a target. To be a bit more explicit:
100% honest use of a metric, taken far enough, is probably contrary to the org’s goals (acknowledged in the other article, as the author allows that circumstances can allow the target to increase sometimes)… But the “far enough” clause is doing a lot of work. In this case at least, for a fair-minded reader, Goodhart’s Law doesn’t seem to have held.
Edit to add: I think the important variable in whether Goodhart’s Law manifests or not will be how well aligned incentives are. If the folks who use the measure have the same incentives as those who affect the measure, then Goodhart’s Law will not apply. OTOH, if these incentives go out of alignment (e.g. a PHB observes “the number of deprecated uses hasn’t gone down in 6 months” with some form of implicit or explicit threat), then expect the negative implications of the law to arise.
That article is interesting in conjunction with ~lcapaldo’s comment on using metrics… That ratchet mechanism is a very dumb metric, and a good part of that article is taken up by explaining exactly how dumb it is. But that’s fine because the goal of the metric isn’t to reduce use of that sort of code to zero. That would have costs that are basically unrelated to improving the operation of the program. The goal is to not let it increase, which is much cheaper (in terms of code not needing to be rewritten/modified/re-tested/etc) and still a good proxy for the real goal, which is something like “make future development easier, in this case by avoiding particular antipattern-prone constructs”.
He holds std::sync::Mutex across block_on. Diving into the tokio source code, this seems to poll on the Future until it is ready. This is essentially the same thing as .await, which desugars into poll upon compilation. Then, the Mutex exhibits broken behavior as documented:
As I understand it, block_on is a synchronous call, and shouldn’t be called from an async task in general: block_on is used to run async code from a synchronous context, and the author is not in a synchronous context. The named fix (to use tokio::sync::Mutex) may address this specific issue, but they’re almost certain to run into another issue later.
It’s not called from an async task, it’s called from a spawn_blocking which runs on a dedicated “blocking” threadpool.
Right you are, I misread. My overall impression that the author is confused about sync and async contexts still seems correct, to me. Blindly replacing std::sync::Mutex with tokio::sync::Mutex still won’t work correctly (I haven’t checked, sorry) without also switching the synchronous caller to use blocking_lock.
Yes, the pattern they are using can deadlock for other reasons as explained here.
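To make the blocking_lock point concrete, a minimal sketch (hypothetical shared counter, assuming tokio with the usual rt/sync/macros features; not the code from the article):

use std::sync::Arc;
use tokio::sync::Mutex;

#[tokio::main]
async fn main() {
    let shared = Arc::new(Mutex::new(0u64));

    // In async code, lock with .await as usual.
    {
        let mut guard = shared.lock().await;
        *guard += 1;
    }

    // In synchronous code on the blocking pool, use blocking_lock()
    // rather than block_on(shared.lock()).
    let for_blocking = Arc::clone(&shared);
    tokio::task::spawn_blocking(move || {
        let mut guard = for_blocking.blocking_lock();
        *guard += 1;
    })
    .await
    .unwrap();

    assert_eq!(*shared.lock().await, 2);
}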
This was a fun article! The dynamic linker ended up being a real focus of mine after working on Project Radium at VMware, I’m happy to learn something new in the area. That said, I’m surprised they aren’t using OCI images for CI? That’s what we did, and unless I misunderstand something, it totally addressed the concern raised in the article, without needing to build any custom tooling. Just run the same image that CI uses locally, and you can trivially debug your crashes.
Yeah that would have worked too. I consider this more a learning experience with something (potentially) useful at the end.
If I have to argue for myself, I’d say this tool probably would work in more cases, e.g. when the source environment isn’t a container. And also debugging inside a container image can be limiting, maybe?
I’ll try to distill this: there’s a tension between mathematical purity and mechanical sympathy, in which functional languages have tended to value the former significantly more than the latter, which I guess is frustrating to the author. I happen to think that the actor model can fit in this space neatly, and personally wish the actor-based approach was more broadly used.
Why do you think actors can fix the problem? Also, do you mean simple state machines or actors with separate stacks?
I view the underlying issue as one of bringing more rigor to mutation. Actors are, in essence, mutable state (previously received messages affect the behavior of later messages), but they’re mutable state that can be rigorously analyzed more easily than Arc<Mutex<T>>. Further, they’re mutable state with (IMO) good mechanical sympathy: in an actor system, you are much less vulnerable to memory-model foot-guns than you are even in traditional imperative languages, because the main interface for interactions is serial (the actor isn’t simultaneously doing multiple things), and comes with clear data ownership. For example, I suspect false sharing can be completely avoided. This comes at the cost of making some forms of parallelism more difficult, true, and actors aren’t ideal for many application domains (I wouldn’t try to rewrite a GPU shader to use actors for parallelism). I’m not trying to say that actors fit everything well. But I do think they can occupy a space closer to “mechanical sympathy” than most FP languages, and closer to “mathematical purity” than most imperative languages.
For what it’s worth, here’s a raytracer in Monte with per-pixel parallelism via subprocesses. Each subprocess only knows the geometry relevant to its current pixel; if you were rendering multiple frames at a time, each frame would be isolated. The application domain isn’t the problem; the issue is that GPUs are programmed like straight-line DSPs and can’t support the pointer-juggling implied by fat pointers. (If you have something more efficient than fat pointers for implementing actors, let me know; they are the current state of the art.)
Could you expand this, perhaps? When I implemented an actor language, I found myself using techniques from Scheme and Smalltalk implementations; actors are like lambdas and also like objects. I think that mutation is largely orthogonal, in the sense that there are actor languages with no mutation, local mutation, tuple spaces, or global mutation.
When I implemented an actor language, I eventually realized that even if I used a mutation-free functional language for actor behaviour, I ended up getting a system that behaved like one with mutation, though with a somewhat more Smalltalkish flavour than Cish. That is to say: It was very easy to make an actor that behaves like an assignable variable (just have it implement a protocol that implements set! and get, the latter simply returning the last value that was set! into it), and I found that most of the actors I made for practical tasks essentially just used their call stacks to store state.
So I ended up with a stateful system, even though all the actors themselves had their behaviour defined in a purely functional manner. In practice message-send was a side-effecting operation, that often observably behaved like mutation with extra steps, meaning that reasoning about programs had to account for all the annoyances of just using mutation - but without the raw performance thereof. It was pretty fun to use this to play with the kinds of state management that could be built using nothing but send, receive and various patterns of recursion and continuations that are normally considered “functional” - but you also couldn’t reason about it like you would a pure functional language.
(This is not to say that actors are a bad idea; there are lots of other benefits to them. And something new and exciting may have happened in the actor space in the nearly 20 years since then.)
First of all, you’re definitely a step ahead of me in actually implementing a language – I formed most of my ideas by implementing an actor system in C++. But if one is implementing a language (rather than bolting a system onto an existing language), then I don’t think that this:
needs to be true. For example, if your actor implements “set” and “get” of a 32-bit integer (using a modified Pony syntax):
I don’t see why, in principle, IntActor can’t optimize to what in Rust would be an AtomicU32, so that the performance of mutation could be recovered: all the information needed to perform this optimization is available at compile time. Or, if you wanted to keep the interface and throttle the speed of change for some reason, you could do that too, demonstrating that the mutation is easier to work with (though, admittedly, eliminating the opportunity for optimization described above). I’m skipping a lot since this is too long, but we did build something reasonably effective in C++, so I suggest that many possible objections can be overcome.
I have found it useful in the context of Erlang-like systems to distinguish between mutability and effects. It is indeed possible to interact immutably with other actors via message passing, but message send/receive is effectful, i.e. non-idempotent. Thus we give up one of the benefits of pure functional computation when we introduce actors. That said, IMO an actor-based system with explicitly tracked mutability and effects is an excellent place to land in the design space, and it’s precisely what I’m exploring with the Hemlock programming language.
That was a language name that I was keeping in my back pocket! How did you pick it?
It helps that the name “Hemlock” is remarkably underutilized for programming projects. Perhaps that’s due to people associating it with poison, but where I live it’s a native tree. Also, at the time I was trying hard to have “ML” in the name. It’s a happy accident that those things lined up, and there was the bonus double meaning added to BranchTaken – the root of the domain name I was already hosting the project at.
This is how I stumbled across it! It was by far the most attractive option. Great minds think alike!
There are a couple of references I’d bring to the conversation. I’m going to get details of these recommendations wrong, since they’re larger reads than I have time to check at the moment, but I hope they’re helpful in broad strokes.
First and foremost, Fred Brooks’ No Silver Bullet. One of the things I took away from Brooks’s essay is that resolving some part of the “software crisis” does not actually make engineering easier. Rather, it increases the scope and complexity of the domains to which we apply engineering. Whatever else you’d say, software is far more important to a functioning society than it was in the ’60s, when the Software Crisis was described.
Secondly, and I’m not as confident here, Peter Drucker’s The Effective Executive. I have the idea from Drucker (I believe) that “executives” (the way he used the term) are those whose decisions (as opposed to more straightforward execution of stereotyped activities) make a material difference to the success of an organization. This is, I think, most of us, though not all in the same way. I bring this up because in the software industry we’re working largely with thought-stuff, rather than physical or chemical processes. When the medium is thought, the bottleneck of forward progress is the speed of thought, and of communication. Being a bottleneck is difficult, and (I seem to recall) Drucker’s book had much to say about managing priorities in such circumstances. When forward progress is (more or less) always possible, time spent on business needs to be explicitly balanced with other desires and obligations. Feeling that the set of technical demands on practitioners constitutes a crisis feels (to me) like a failure of prioritization or of realistic planning.
I agree that unstructured plaintext is not ideal for logs – it’s painful to do automated log analysis when there’s no schema at all – but I don’t see why structured logging is so quickly dismissed. JSON-per-line is not that hard to read, and even if it is more painful than one would like, it’s trivial to translate to something more readable.
Json is brittle, poorly understood and hostile to the human reader.
Json per line is less brittle than text+regex if you want a modicum of structure, imho. With text, all the edge cases are terrible: can you emit a log message that spans multiple lines? Can you easily retrieve entries’ timestamps?
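As a small illustration of that point, here is a Rust sketch, assuming the serde_json crate and made-up field names (ts/level/msg): a message containing a newline and the entry’s timestamp are both trivially recoverable from a JSON-per-line record, because the structure is explicit.

use serde_json::Value;

fn main() {
    // One event per line: the newline inside `msg` is escaped, so the record
    // still occupies exactly one line and never confuses line-oriented tools.
    let line = r#"{"ts":"2024-05-01T12:00:00Z","level":"error","msg":"boom\nsecond line of detail"}"#;

    let event: Value = serde_json::from_str(line).expect("valid JSON log line");

    // Retrieving the timestamp is a field lookup, not a regex guess.
    println!("at {}", event["ts"].as_str().unwrap_or("?"));
    // The multi-line message round-trips intact.
    println!("{}", event["msg"].as_str().unwrap_or("?"));
}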
It’s, as usual, all about expectations, often unspoken. Do I want reporting? Then logfile analysis may be inferior to a dedicated data outlet. Do I want bug or incident forensics? Then flexibility and readability may be more important than structure. I expect logging to log the unforeseen, the surprises. The fewer assumptions made up front, the more likely it is to succeed.
I have yet to see the cases where complexity in logging is beneficial (indirectly, naturally) to the end user.
grep "something" | cut -d -f 3 "some_field_that_might_exist" | grep "something else" | cut -d "not everybody writes good terminal output" | echo "whoops I forgot some filenames have special characters"
vs --output json | jq ".thing[]|as_array()|that_i_want|{.CPU,.RAM}"
doesn’t convince me. Why not transform the plaintext logs to the data format of choice?
What about upward and downward compatibility over decades and multiple updates? IMO brittle.
How often do you need to investigate decade-old logs?
oh, you’re right. One usually deletes them after a few weeks. Still I uphold the rest of the argument.
This may be too self-promoting, but you’re touching on a theme that I spent a lot of my on-line writing time talking about. I touched on novice vs. expert understanding of code in a Stack Exchange answer. I named my blog “Communicating with Code”, and my favorite posts are on that theme: “Design Is Learning”, “Elegance Is Teaching”, “When Writing Code, Know Your Audience”, and “Sources of Accidental Complexity”.
I hope these may be of interest here…
In the future, you can find out information like this in a quicker way by grepping through /usr/include and reading the definition of errno and/or reading the glibc source.

If you want to understand libc, I’d recommend you steer away from glibc (it’s a torturous mess of indirection, macros, and incomprehensibility) and instead read musl or a *BSD libc, which are much easier to grok.
I agree that glibc is really tough to follow… But if you want to know how this behaves for your system, then you have to read glibc, not musl. And it may even tell you interesting things. For errno, for example, even if we restrict to just Linux on x86_64, it works differently in different places. Follow the breadcrumbs, and you’ll eventually find the SYSCALL_SET_ERRNO macro. And we see that there’s a different errno in different contexts: the dynamic linker uses its own copy, which does not appear to be thread-local; the C library uses the __libc_errno symbol, and other parts of the distribution (such as libpthread) use errno (though my guess is that these resolve to the same address most of the time), which are at known offsets from the thread-local-storage base register. This suggests that dlopen (which is largely implemented in dynamic linker code) doesn’t set errno if it fails? Now I feel like testing this… I wouldn’t have wondered if I hadn’t actually gone through my own system’s code.

It’s not necessarily clear from header files alone. For example, stuff gets weird with vDSO and address space mapping. Also, the thread-local variable stuff gets confusing if you’re not familiar with the details. But yes, you are right in theory.
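As an aside, the per-thread behaviour is easy to observe from user code. A small Rust sketch, assuming the libc crate on Linux/glibc (the path is arbitrary): a failed open sets the worker thread’s errno without disturbing the main thread’s.

use std::ffi::CString;
use std::thread;

fn main() {
    // Fail a system call on a worker thread and read that thread's errno.
    let worker_errno = thread::spawn(|| {
        let path = CString::new("/definitely/does/not/exist").unwrap();
        let fd = unsafe { libc::open(path.as_ptr(), libc::O_RDONLY) };
        assert_eq!(fd, -1);
        // last_os_error() reads errno for the calling thread via libc.
        std::io::Error::last_os_error().raw_os_error()
    })
    .join()
    .unwrap();

    println!("worker thread errno: {:?}", worker_errno); // Some(2), i.e. ENOENT, on Linux
    // The main thread made no failing call, so its errno was not disturbed.
    println!("main thread errno:   {:?}", std::io::Error::last_os_error().raw_os_error());
}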
What I don’t understand is why everyone should have to go through this trouble (which isn’t all that complicated in the end, I realise), instead of this being upfront in documentation/man pages?
cppreference.com is your friend here. It’s the best resource for reading stuff from the C and C++ standards. The actual standards documents are a tough slog.
As for Linux man pages, it seems to be pretty clear about it (although this one is for C99, not C11).
That doesn’t tell you how it’s implemented. There are at least three plausible ways of implementing it given that description:
It happens that most (all?) *NIX systems, including Linux, pick the first option from this list. If I were designing a POSIX system today, I’d be somewhat tempted by option 2 so that the system calls could implement the POSIX semantics directly even without libc, at the cost of one extra copyout per failed system call. The main downside is that system calls would then have no mechanism for reporting failure as a result of the thread pointer being invalid, but signals can handle that kind of everything-is-broken failure.

True, the documentation doesn’t say anything about implementation (thankfully, at least in the case of the C standard), but as I understood the OP, the question was about whether errno is kernel-based or libc-based in general. Given that it is documented as part of the C standard, that should be a big clue that it is libc-based. On the systems I support it can only be libc-based, because there is no operating system. If the OP’s question was really about whether errno is libc- or kernel-based on Linux, then there is some room for ambiguity. Perhaps the article should have phrased the question better.

Why? Signals are part of the C standard, but are implemented in the kernel on most *NIX systems, for example. The POSIX standard doesn’t differentiate between kernel and libc functionality at all: it is defined in terms of C interfaces, but some things are implemented in the kernel and some in libc. It’s entirely reasonable to ask what the division of responsibilities between kernel and libc is for any part of the C or POSIX standard, particularly a part that is set on system call returns.
File I/O is also specified in the C standard, yet that doesn’t mean it is a purely libc service in a hosted environment.
When I was working on a toy kernel, my idea was that syscalls would return carry clear for success, and an opaque handle on error with the carry bit set.
You could interrogate the kernel and vDSO to learn more: finding out whether you can retry would be relatively simple and fast (that information would be stored in the vDSO), but you could also get stack traces over the various nanokernel services that were touched and tell the user what went wrong; roughly, in pseudocode, the caller side would look like the sketch below.
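Here is a rough Rust rendering of that idea (ErrorHandle, is_retryable, trace, and sys_read are all names I’m inventing to stand in for the scheme described above, not anything from the actual toy kernel):

// Hypothetical caller-side sketch: a syscall either succeeds or hands back an
// opaque error handle that can be interrogated for retryability and a
// cross-service trace.
struct ErrorHandle(u64); // opaque token returned with the carry bit set

impl ErrorHandle {
    // Cheap query answered from vDSO-resident data: is this worth retrying?
    fn is_retryable(&self) -> bool {
        self.0 & 1 == 1 // placeholder decoding for the sketch
    }
    // More expensive query: ask the kernel for the trace of nanokernel
    // services the failed request passed through.
    fn trace(&self) -> Vec<String> {
        vec![format!("service trace for handle {:#x}", self.0)]
    }
}

// Stand-in for "carry clear = success, carry set = error handle".
fn sys_read(_fd: u32, _buf: &mut [u8]) -> Result<usize, ErrorHandle> {
    Err(ErrorHandle(0x2a)) // pretend the call failed
}

fn main() {
    let mut buf = [0u8; 16];
    match sys_read(3, &mut buf) {
        Ok(n) => println!("read {n} bytes"),
        Err(e) if e.is_retryable() => println!("transient failure, retrying"),
        Err(e) => {
            for frame in e.trace() {
                eprintln!("{frame}");
            }
        }
    }
}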
I keep pondering teaching LLVM about the carry-bit-on-failure calling convention. I think it would be a nice way of implementing lightweight exceptions: set carry on exception return and implement exceptions in the caller as branch-on-carry to the unwind handler. You’d get one extra branch per call, but in exchange you don’t need an unwind library.
This calling convention for exceptions was proposed for C++ by Herb Sutter.
The extra branch per call is virtually free if you branch to the error case and the error is rare (and it should be), both on big out-of-order superscalar cores and on small in-order microarchitectures.
Also you shouldn’t place a subroutine call in your hot loop 😇.
I don’t think Herb proposed a calling convention in that document (it’s purely C++, which regards the ABI as a separable concern). I did discuss this as a possibility with him around the time that he wrote that though.
See top of page 17.
Some manual pages do in fact talk about it in more detail; e.g., errno(3C).