… excuse you?
Did somebody put s/mad/neurodivergent/ in their webserver config? I know euphemisms are a treadmill, but this is unwarranted, no?
PS: also, what the fuck C23.
It… yeah. I had to stop reading after that.
“More inclusive language” isn’t worth the paper it’s written on if you’re just going to use it as a new flavor of slur.
In my opinion this is just the author’s style. Sarcastic usage of euphemistic language.
Good to know, so I don’t have to bother reading their crap.
As an ACM member I was disappointed to see that in an ACM article, so I called it out, and they have now updated the article. Thanks @Gaelan and @FeepingCreature for highlighting this unacceptable language; hopefully the ACM will learn and improve.
That’s how every slur was originally born
As an autistic person, I can confirm that zero-length objects are always on my mind. /s
In snmalloc, we returned null for a 0-byte realloc call. We changed it because it broke a load of real-world code. There is no other way for realloc to signal failure, and a load of things wrap realloc in a call that aborts on a null return.
Given that WG14 has historically refused to make the language better for fear of breaking existing code, I’m surprised that they went in this direction.
Wait what? It has to be an April 1st joke, no?
Helpful context: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf
In particular, it says:
This clashes with my understanding of what UB is. Shouldn’t it be implementation-defined behavior in this case?
No, UB means that there does not have to exist a definition. Any implementation is free to do whatever it wants. If C says it’s UB but POSIX says that it’s well-defined, then a non-POSIX C implementation can do anything, but an implementation that complies with both the C and POSIX specs has to do whatever POSIX wants. This is easy to reason about because the only constraints that apply come from POSIX; you don’t need to reconcile different C and POSIX rules.
What’s the difference between “undefined” and “implementation defined” behavior then?
My understanding is that it’s two different answers to the question “does an implementation define this?” – “it MAY” vs “it MUST”. In other words, “undefined” == C imposes no restrictions; “unspecified” == not required to yield the same behavior every time, but there’s some set of allowable behaviors; “implementation-defined” == the implementation must pick, document and deterministically yield a specific behavior.
If something is implementation-defined, then any given implementation of the C language must define it in a specific way and must be consistent. If it is undefined, an implementation may nondeterministically do different things for the same behaviour in the source. If something is UB, an implementation is not required to nondeterministically do unexpected things; it may choose to consistently do something. For example, C says that out of bounds accesses are UB but in CHERI C we deterministically trap. On *NIX systems, we deliver a signal. C says that use after free is UB, but on CHERIoT we deterministically load an invalid pointer and if you use it then we trap and invoke your compartment’s error handler.
It’s fine for a particular implementation to define some classes of UB; it’s just not a requirement. In contrast, an implementation must define cases of IB.
There is also the license to delete your code aspect. If I have realloc(ptr, size) in my code, is the compiler allowed to infer that the size == 0 check I have over there is never taken, and delete it?
That to me is the defining difference between UB and IB. I don’t get why IB was not enough here. Or why they had to care that BSD’s implementation was non-compliant.
Nope (7.24.3.7p3).
An interesting note, however, is that this decision stems from the observed non-portability of realloc(ptr, 0) and was taken specifically to:
“allow POSIX to define the otherwise undefined behavior however they please.”
Essentially the WG kept lowering expectations for realloc(ptr, 0), and ultimately just gave up on it entirely because there was no way to reconcile the spec and (non-POSIX) implementations.
The essay’s inflammatory take on the subject does match its dubious judgement on unreachable (whose entire point is to be a user-controlled UB, otherwise behaving exactly like other UBs), as well as their inane assertion on ZSTs:
“Scour the annals of computing and you’ll find few things more perfectly useless than a zero-length object and few things more hazardous than a pointer thereto.”
(with a bonus side-swipe insult thrown in for good measure)
I am curious – why would someone realloc something to size 0? I’ve never written large systems in C and I can’t imagine a use case.
There are two answers here:
This is something which was defined to free on some major implementations, and a lot of code relies on that. So this just happens, and, even if the behavior was completely unreasonable, changing that is a major silent backwards compatibility break, and backwards compatibility is one of the only two benefits C has over other languages. It is a bit as if Rust added a new unsafe API without the unsafe marker, motivating it as “it’s hard to misuse, so it should be fine”.
The second answer is that it’s a genuinely useful API. If you do something like a growing array, then realloc handles special cases for you. If you build a generic library and want your users to be able to supply the allocator, realloc is the only parameter you need, because it’s the whole allocator: it contains both alloc and free within it.
I have debugged code that did precisely this. In snmalloc (as I mentioned earlier), we returned NULL for malloc / realloc with zero length. This is permitted by the spec and is fine. Unfortunately, some code crashed hard with this. It turned out that they used a realloc wrapper that checked for NULL and called abort on NULL values, because that is the only way that realloc has of reporting failure (yay, in-band signalling). The code that I looked at was a generic header-only tree implementation, where each tree node owned an array of something. When you resized the arrays, it called realloc. It avoided special casing zero-length nodes, because they were rare, and just always stored a zero value in the length field and a pointer to whatever malloc(0) returned. Because free(NULL) is well defined (does nothing), this simplified the code. Unfortunately, this ended up failing with an allocator that returned NULL on realloc(ptr, 0), because of their extra check.
I could have fixed the code, but after the second time I saw exactly the same pattern, we decided to just make snmalloc return the equivalent of malloc(1) for these cases.
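To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not the actual code from that tree library or from snmalloc. All names (realloc_fn, null_on_zero, min_object_on_zero, checked_realloc) are hypothetical, and the wrapper sets a flag where a real one would call abort(), so that the effect can be observed.

```c
#include <stdlib.h>

/* Hypothetical: any function with realloc's signature can act as the
   whole allocator (alloc, resize and free in one entry point). */
typedef void *(*realloc_fn)(void *ptr, size_t size);

/* Hypothetical allocator that frees and returns NULL for size 0,
   the behaviour the comment above says snmalloc originally had. */
static void *null_on_zero(void *ptr, size_t size) {
    if (size == 0) {
        free(ptr);
        return NULL;
    }
    return realloc(ptr, size);
}

/* Hypothetical allocator that returns a minimal live object instead,
   the equivalent of malloc(1). */
static void *min_object_on_zero(void *ptr, size_t size) {
    return realloc(ptr, size == 0 ? 1 : size);
}

/* The wrapper pattern: a NULL return is read as out-of-memory.
   Real wrappers call abort() here; this sketch only records it. */
static void *checked_realloc(realloc_fn r, void *ptr, size_t size,
                             int *would_abort) {
    void *p = r(ptr, size);
    *would_abort = (p == NULL);
    return p;
}
```

With null_on_zero, shrinking a node's array to zero length makes the wrapper report a failure even though nothing went wrong; with min_object_on_zero the same call succeeds and the result can later be passed to free as usual.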
I no longer recall the exact details but on some ancient systems reallocing something with size 0 was equivalent to free() + some quirks. I think, for example, that on some MS-DOS implementations it would not trigger a reordering of memory blocks, so you got a slight performance gain if you had to frequently alloc and free stuff in a loop. I may be misremembering the particular quirk here, I haven’t seen code doing realloc(p, 0) in a very, very long time.
I think it’s 100% ancient history, this has not been a valid idiom since at least C11.
Edit: I suspect this was never intended by the standard per se, and it was just a historical accident, which further led to weird behaviour like the one @david_chisnall mentions here.
realloc on macOS 13.3 (definitely not ancient) is documented to be “free() + new alloc” (although that man page is also dated 2008, which may count as “ancient system”):
“If size is zero and ptr is not NULL, a new, minimum sized object is allocated and the original object is freed.”
Oh, there’s even more history there :-D. The “ancient history” bit I’m referring to is pre-C99. After C99, which made some additional provisions related to realloc, POSIX was also amended (see the note on application usage here) to forbid an implementation from freeing and returning NULL on realloc(p, 0) with a non-null ptr. macOS did the straightforward thing there, freeing p and returning the size 0 chunk it allocates.
The ancient history that had useful quirks is from the C89 era. I could swear I had some useful info on it but it was in a dead tree book that I can’t find anymore, some early 90s book on DOS programming.
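Given how much the behaviour of realloc(p, 0) has varied across standards and platforms, one way for calling code to sidestep the question is to never pass a zero size to realloc at all. The following is only a sketch under the assumption that an empty buffer can be represented as NULL; resize_buffer is a hypothetical helper name, not something from the standard or from any library mentioned in this thread.

```c
#include <stdlib.h>

/* Hypothetical helper: resize *buf to new_size bytes without ever
   calling realloc with a size of 0.
   Returns 0 on success, -1 on allocation failure (in which case the
   original buffer is left untouched and still owned by the caller). */
static int resize_buffer(void **buf, size_t new_size) {
    if (new_size == 0) {
        free(*buf);          /* free(NULL) is a no-op */
        *buf = NULL;         /* assumption: empty buffer == NULL */
        return 0;
    }
    void *p = realloc(*buf, new_size);  /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL)
        return -1;
    *buf = p;
    return 0;
}
```

Because the zero case is handled before the call, a NULL return from realloc can only mean allocation failure here, which removes the in-band ambiguity discussed above.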