They will really confuse tools like valgrind, heap checkers or the Address Sanitizer unless you add some kind of special instrumentation.
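For what it’s worth, the “special instrumentation” is not much code under AddressSanitizer: poison the whole buffer up front and unpoison only what you hand out. A minimal sketch, assuming a hypothetical bump arena (the Arena layout and names here are mine, not the article’s); Valgrind has analogous client requests (VALGRIND_MAKE_MEM_NOACCESS and friends in valgrind/memcheck.h):

    #include <cstddef>

    // The poisoning calls only do anything when building with -fsanitize=address.
    #ifndef __has_feature
    #  define __has_feature(x) 0
    #endif
    #if defined(__SANITIZE_ADDRESS__) || __has_feature(address_sanitizer)
    #  include <sanitizer/asan_interface.h>
    #  define POISON(p, n)   __asan_poison_memory_region((p), (n))
    #  define UNPOISON(p, n) __asan_unpoison_memory_region((p), (n))
    #else
    #  define POISON(p, n)   ((void)0)
    #  define UNPOISON(p, n) ((void)0)
    #endif

    struct Arena { char *base; std::size_t used, cap; };

    void arena_init(Arena &a, char *buf, std::size_t cap) {
        a = Arena{buf, 0, cap};
        POISON(buf, cap);                 // nothing in the arena is addressable yet
    }

    void *arena_alloc(Arena &a, std::size_t n) {
        if (a.cap - a.used < n) return nullptr;
        void *p = a.base + a.used;
        a.used += n;
        UNPOISON(p, n);                   // only live allocations are addressable
        return p;
    }

    void arena_reset(Arena &a) {
        POISON(a.base, a.cap);            // use-after-reset now trips ASan
        a.used = 0;
    }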
“[Only] a minority of programs inherently require general purpose allocation” got a WTF from me. We must not write the same kind of programs! Though I suppose if you don’t care about memory usage you can just make a big enough arena and anything will run…
Arena allocators are a useful tool, but naming your allocation macro new is a terrible idea IMHO. So is redefining sizeof to return a different type.
I agree it’s nice for code not to have to check allocation failures, but abort as a failure handler is a bad idea, except possibly in some ephemeral CLI tool. At the very least raise a different signal; for instance, macOS doesn’t create a crash report or display any “unexpectedly quit” alert when a process aborts with SIGABRT.
Using setjmp/longjmp is a slippery slope towards reimplementing C++ exceptions. I know because I’ve done it in the past. You end up needing “catch blocks” to clean up resources.
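For context, the pattern in question looks roughly like the sketch below (made-up names, not the article’s code). Note that in C++ the longjmp is only defensible if it skips nothing with a non-trivial destructor, which is exactly where the reimplemented-exceptions problem starts:

    #include <csetjmp>
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>

    struct Arena {
        char *base;
        std::size_t used, cap;
        std::jmp_buf *oom;               // recovery point used on exhaustion
    };

    void *xalloc(Arena *a, std::size_t n) {
        if (a->cap - a->used < n) {
            if (a->oom) std::longjmp(*a->oom, 1);   // the "throw"
            std::abort();                           // no handler installed
        }
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    void work(Arena *a) {
        for (;;) (void)xalloc(a, 4096);  // eventually exhausts the arena
    }

    int main() {
        static char buf[1 << 20];
        Arena a{buf, 0, sizeof buf, nullptr};
        std::jmp_buf recover;
        if (setjmp(recover) == 0) {      // the "try"
            a.oom = &recover;
            work(&a);
        } else {                         // the "catch": clean up, degrade, retry smaller
            std::puts("out of memory");
        }
    }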
The article is about C, but Zig has Valgrind integration in the compiler (it integrates with the undefined value) so arena allocators do not suffer from this problem. Also they play well with Zig’s debug allocator which is the equivalent of Address Sanitizer.
“[Only] a minority of programs inherently require general purpose allocation” got a WTF from me.
I think it’s true. Or, more accurately, that different parts of a program benefit from allocators tuned for different things and few require the full generality of malloc. This is one of the reasons that I like the region model in Verona: it’s an extension point where you can easily define the allocator and deallocation strategy (arena, reference counting, tracing) to use for a particular data structure. That said, there’s always Hans Boehm’s early ’90s paper in my mind, where he showed that having a lot of different allocators can hurt more in fragmentation and poor TLB / cache usage than it saves in fast allocation and deallocation (mostly true for type pooling).
You probably remember the NSZone infrastructure that NeXT used to great effect and Apple ripped out when they went from targeting machines where 8 MiB of RAM was a lot to ones where 64 MiB was a small amount.
Why is pointer laundering required here? I thought there was an explicit carve-out to let char pointers alias anything.
This isn’t an aliasing concern, unless you were to access the elements of the underlying array while the objects allocated “inside” it were still live (edit: and then there’s a carve-out to the aliasing rule anyway). There is indeed (also) a carve-out to let objects be allocated inside an unsigned char array providing storage (in C++, but not in C). I agree: the “laundering” isn’t necessary (if the array type in the example is changed to unsigned char).
You do, however, have to use placement new to create the objects in the storage provided by the array. That’s required regardless.
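Right; in code, the combination being described looks roughly like this (Widget is a made-up type):

    #include <cstddef>
    #include <new>

    struct Widget { int x, y; };

    // The unsigned char array provides storage; placement new actually creates
    // the Widget objects inside it, and you use the pointer it returns.
    alignas(Widget) unsigned char storage[16 * sizeof(Widget)];

    Widget *make_widget(std::size_t i) {
        return new (storage + i * sizeof(Widget)) Widget{1, 2};
    }

    void destroy_widget(Widget *w) {
        w->~Widget();   // trivial here, but the call is the general pattern
    }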
The problem is something I’ve discussed with a couple of WG21 members in the past. Some allocators, in particular the .bss allocator (but also memory coming directly from mmap), guarantee that the memory is full of zeroes. But some C++ types (including arrays in structures) will force the object’s constructors to zero the memory. This first bit me when I created a type for a slab in a memory allocator and ended up with a memset that wrote 8 MiB of zeroes into zero-initialised memory (causing every single page to be faulted in as read-write and completely killing performance). There’s no way of avoiding this in C++.
The thing that I want is something like a constructor overload that takes a tag type that guarantees that the memory is zeroed, so that allocators that make this guarantee can have simple constructors (the default constructor for many objects will be empty if the space is guaranteed to be zeroed first). If you’re lucky, the compiler will optimise out stores of zeroes if it inlines the constructor. LLVM is pretty good at doing this for globals, but you’re relying on a particular set of optimisations working, not something that’s part of the language.
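You can approximate the wished-for overload by hand today; the guarantee itself still can’t be expressed in the language, so this is only a sketch with invented names:

    #include <cstring>
    #include <new>

    struct zeroed_t {};                        // tag: "this storage is already zero"
    inline constexpr zeroed_t zeroed{};

    struct Slab {
        unsigned char bitmap[1 << 20];
        Slab() { std::memset(bitmap, 0, sizeof bitmap); }  // touches every page
        explicit Slab(zeroed_t) {}             // trust .bss / fresh mmap pages
    };

    // An allocator that knows its pages came zeroed (MAP_ANONYMOUS, .bss) can do:
    Slab *make_slab(void *zeroed_pages) {
        return new (zeroed_pages) Slab(zeroed);
    }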
Note that (as far as I am aware) the only way of actually changing the type of memory in C++ is to do a placement new into it, so this kind of laundering is probably UB.
That surprised me too, but maybe that carve-out is only for things like memcpy, where you view some object using a char pointer. The static array is kind of the opposite situation: it really contains char objects, and you’re trying to view it with some other pointer type.
[intro.object]
“If a complete object is created (8.5.2.4) in storage associated with another object e of type “array of N unsigned char” or of type “array of N std::byte” (21.2.1), that array provides storage for the created object”
Note that, oddly enough, it has to be unsigned char (or byte), regular char is not allowed.
Well, kind of. The char type is equivalent to either signed char or unsigned char, but that choice is implementation defined. This means that regular char might be allowed, depending on the target platform or compiler flags. I’ve seen this lead to some fun performance anomalies when code moved from char-is-signed targets to char-is-unsigned ones and suddenly the compiler became a lot more conservative about aliasing assumptions.
No, in C++ it’s a distinct type. It is either signed or unsigned, but it is not the same type as either signed char or unsigned char. See https://godbolt.org/z/9qM7T5K8b
Huh, interesting. I wonder if that causes strange behaviour in C interop. How does this interact with the promotion rules (is char always of equivalent rank to one of the other two)?
I believe char, unsigned char and signed char all have the same rank. None of them will be promoted to any of the other. I’ve never known it to be an issue with C interop but there could be some edge cases I’ve not seen. Even C doesn’t specify (AFAIK) that char is actually the same type as either signed char or unsigned char, but in C I don’t think there’s any context where it matters, unlike C++ which has template specialisation and function overloads and so the distinct third type is visible.
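A small demonstration of the distinct-third-type point (presumably along the lines of the godbolt link above):

    #include <type_traits>

    static_assert(!std::is_same_v<char, signed char>, "char is its own type");
    static_assert(!std::is_same_v<char, unsigned char>, "char is its own type");

    // All three overloads may coexist, which is how the distinction becomes
    // visible to overload resolution and template specialisation:
    constexpr int which(char)          { return 0; }
    constexpr int which(signed char)   { return 1; }
    constexpr int which(unsigned char) { return 2; }

    static_assert(which('a') == 0, "a char literal picks the char overload");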
However, reading the linked post again: I think it’s talking about C and not C++ at all (despite the C++ tag on this post). In that case, there’s no such allowance.
Eh, in their example, I think stack allocation is preferable to allocating out of arena scratch, and calling a callback with each line is preferable to storing the lines as a list in arena perm. So this is less than persuasive about the goodness of what the author calls arenas.
Stack allocation is dangerous: if you are allocating a variable-sized object, you are creating a gadget that might allow an attacker to move the stack to an arbitrary address. Or, if your system has stack clash protection, the compiler will insert a loop to touch each page and cause a crash instead of a handy gadget.
With arenas, it is possible to have an allocation limit that is very large and handled gracefully. You can’t get both with stack allocation, and in many cases the safe maximum stack size is really hard to discover.
And, if your system is not Windows, you can’t detect stack allocation failure in a usefully recoverable way. On Windows, you’ll get an SEH exception from stack overflow but on *NIX you’ll just get a pointer that (hopefully) crashes if you use it (or overlaps some other memory object and causes memory corruption). On CHERI systems you’ll get an untagged pointer.
Oh, and doing variable-sized allocations on the stack means you need a frame pointer and possibly a base pointer, so burns an extra register or two for the duration of the call. And means that stack spills now can’t use the fast sp-relative instructions.
Does anyone know why arena allocators are not part of standard lib?
Having a standard way of allocating memory all at once would encourage developers towards less brittle strategies, I imagine, and thus lead to more robust, less crashy C applications.
glibc has obstacks: https://www.gnu.org/software/libc/manual/html_node/Obstacks.html
C++ 17 added polymorphic allocators in the memory_resource header; the monotonic_buffer_resource class is an arena allocator. There’s limited support for debugging and usage tracking, though, so I would probably start with Bloomberg’s BDE Allocators.
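For reference, the std::pmr version looks roughly like this; monotonic_buffer_resource never frees individual allocations, it just bumps through the buffer (falling back to its upstream resource) and releases everything when destroyed:

    #include <memory_resource>
    #include <string>
    #include <vector>

    int main() {
        char buf[64 * 1024];
        std::pmr::monotonic_buffer_resource pool(buf, sizeof buf);

        // The vector and its strings all allocate from the buffer above;
        // nothing is freed element-by-element, the arena goes away with pool.
        std::pmr::vector<std::pmr::string> lines(&pool);
        lines.emplace_back("first line");
        lines.emplace_back("second line");
    }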
I kinda like the perm/scratch pattern, though I’m a bit skeptical of the claims of how general this technique is
e.g. it still relies on a stack discipline, so if you’re doing async / state machine style code, that’s completely out the window
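For anyone who hasn’t seen it, the pattern is roughly the following sketch (my names, and it assumes perm and scratch are distinct buffers): the scratch arena is passed by value, so its bump pointer rewinds automatically on return, which is precisely the stack discipline mentioned above.

    #include <cstddef>
    #include <cstring>

    struct Arena { char *base; std::size_t used, cap; };

    void *alloc(Arena *a, std::size_t n) {
        if (a->cap - a->used < n) return nullptr;
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    // perm: allocations that outlive the call.
    // scratch: passed BY VALUE, so anything allocated through it is implicitly
    // discarded when the function returns (the caller's copy never moved).
    char *canonicalise_line(Arena *perm, Arena scratch, const char *line, std::size_t len) {
        char *tmp = static_cast<char *>(alloc(&scratch, len + 1));   // temporary
        if (!tmp) return nullptr;
        std::memcpy(tmp, line, len);
        tmp[len] = '\0';
        // ... rewrite tmp in place ...
        char *out = static_cast<char *>(alloc(perm, len + 1));       // survives
        if (!out) return nullptr;
        std::memcpy(out, tmp, len + 1);
        return out;
    }   // scratch.used rewinds here; perm keeps what it allocated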