Inlining really is a mess. #[inline(always)] has occasionally proven useful, but I still find it too coarse-grained. I’m surprised that Rust does not have caller-based inlining at this point. But to add to the point of the article: I have observed similar gains from forced inlining in my single-pass compiler, on the order of a 15-25% increase in compilation throughput, for a codebase that was already written in a mostly data-oriented way.
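For concreteness, here is a minimal sketch of the kind of forced inlining I mean; the types and function names are invented, not code from the article or from my compiler:

```rust
// A tiny, hot helper that the optimizer might otherwise leave as an
// out-of-line call. #[inline(always)] forces it into every call site.
#[derive(Copy, Clone)]
struct Token {
    kind: u8,
    len: u32,
}

#[inline(always)]
fn is_trivia(tok: Token) -> bool {
    // e.g. whitespace or comment token kinds
    tok.kind == 0 || tok.kind == 1
}

// By contrast, #[inline] is only a hint, and there is no way to ask for
// inlining into one specific hot caller only, which is the caller-based
// control mentioned above.
#[inline]
fn token_end(start: u32, tok: Token) -> u32 {
    start + tok.len
}
```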
Can confirm mixed results with panic = abort. It didn’t seem to improve things for me, so I’ve since settled on returning Result<T, ()> instead of Result<T, Box<MyError>> for better stack efficiency. Diagnostics are pushed into the state that is passed around in every method.
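Roughly this shape, with made-up names rather than my actual compiler code:

```rust
// Errors carry no payload (Result<T, ()>), so the Err variant stays
// zero-sized; human-readable diagnostics accumulate in the state that
// is already threaded through every method.
struct Diagnostic {
    message: String,
}

struct CompilerState {
    diagnostics: Vec<Diagnostic>,
    // ...the rest of the per-compilation state
}

impl CompilerState {
    fn error(&mut self, message: impl Into<String>) {
        self.diagnostics.push(Diagnostic { message: message.into() });
    }

    fn parse_int(&mut self, token: &str) -> Result<i64, ()> {
        token.parse().map_err(|_| {
            // The diagnostic goes into the state; the error path itself
            // carries no data.
            self.error(format!("invalid integer literal: {token}"));
        })
    }
}
```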
I’ve had the same experience with allocators: memory allocation cost was negligible in most of the workloads I measured. Most data structures were memory pools that are cleared and reused after each function’s codegen step. A few places benefited from a slab arena allocator or memory tiling, but that’s about it.
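A minimal sketch of that per-function pool reuse, with invented types (not from the article):

```rust
// Pools live for the whole compilation and are cleared, not dropped,
// between functions, so their capacity is reused and steady-state
// codegen does almost no allocation.
struct Instruction {
    opcode: u16,
    operands: [u32; 2],
}

struct CodegenPools {
    instructions: Vec<Instruction>,
    block_offsets: Vec<usize>,
}

impl CodegenPools {
    fn new() -> Self {
        Self {
            instructions: Vec::new(),
            block_offsets: Vec::new(),
        }
    }

    /// Call at the start of each function's codegen step.
    fn reset(&mut self) {
        self.instructions.clear();
        self.block_offsets.clear();
    }
}
```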
This was an amazing read, great job on both detail and design! One thing I do wonder: for the sorting-and-cloning portion, could moving the memory itself into a sorted order that matches the sorted root data structures, instead of just naively cloning, reduce memory pressure/usage and therefore execution time? I don’t know if this is possible in safe Rust, though, so it may not be an option.
As mentioned in the article, a more principled approach than naive cloning would be to use an arena to precisely control the order in which allocated leaves are laid out in memory. That is possible in Rust, although usually clunky due to the self-referential nature of arenas (and it’s definitely more code and refactoring than adding a simple .clone() call).
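To illustrate one way this could look (purely a sketch; the types and the index-based scheme are my invention, not the article’s code): the roots hold indices into a Vec that is filled in the sorted traversal order, which sidesteps the usual self-referential borrow problems of arenas.

```rust
#[derive(Clone)]
struct Leaf {
    payload: Vec<u8>, // variable-sized data
}

struct Root {
    leaf: usize, // index into LeafArena::leaves
}

struct LeafArena {
    leaves: Vec<Leaf>,
}

impl LeafArena {
    fn new() -> Self {
        Self { leaves: Vec::new() }
    }

    /// Clone `leaf` into the arena and return its index. The Leaf structs
    /// end up contiguous, in exactly the order they are pushed (their
    /// payload Vecs are still separate allocations), so walking the sorted
    /// roots while pushing yields a layout that matches the sorted order.
    fn push(&mut self, leaf: &Leaf) -> usize {
        self.leaves.push(leaf.clone());
        self.leaves.len() - 1
    }

    fn get(&self, idx: usize) -> &Leaf {
        &self.leaves[idx]
    }
}
```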
I’m not sure I understood your question, though; perhaps you meant whether it would be possible to swap the leaf objects in place, rather than cloning them all and then deleting them all? If that could be done, the main benefit would be limiting peak memory usage. However, it would be difficult in practice: the main reason these objects are heap-allocated is that they have variable sizes, so swapping them in place would be like trying to fit a square peg into a round hole. In theory it should be possible to write an in-place defragmentation algorithm, but in practice that would be a lot of unsafe Rust and would likely end up slower than a clone (for those who can afford the peak memory of a clone).
Overall cool article, thanks for posting!
I’d love to know what tool he’s using for the diagrams. Is there a text-based tool that can generate them?
Presumably Excalidraw.
You can read the source of the SVG files and find <!-- svg-source:excalidraw --> on the second line.
Still mouse-based and therefore sure to give me RSI, but looks nice!