1. 55
  1. 4

    Cool article!

    Nit: it’s spelled kosher with a k.

    1. 2

      Thanks! I was wondering why spellchecker didn’t like that…

    2. 3

      This is interesting, thanks for writing it!

      Some thoughts.

      (1) This is essentially fallible dynamic memory allocation (from an arena), right? You fail the function app::run() when said function requests more memory than was passed in.

      This seems to result in a system which reliably does not run out of memory, but for any particular input it is hard to predict (though consistent) whether the system will render it. To me, “static allocation of [memory] at startup” suggests the opposite of the choice you made; instead of

      “The scenes we are going to render are fundamentally dynamically sized. They can contain arbitrary number of objects. So we can’t just statically allocate all the memory up-front.”

      I’d expect the system to limit the number of objects, and guarantee success; or maybe app::run() can require the caller to pass in e.g. 12 bytes per triangle. (I think the only allocation in your program that isn’t just N times the number of triangles or whatever is the bounding volume hierarchy?)
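      For concreteness, the “fallible dynamic allocation from an arena” idea might look like the following sketch. The names (`Arena`, `try_alloc`, the byte budget) are mine, not the article’s:

      ```rust
      // Hypothetical sketch: an arena with a fixed byte budget whose
      // allocation is fallible, so exhausting the budget fails the render
      // instead of aborting the process.
      struct Arena {
          buf: Vec<u8>, // the memory handed to the app at startup
          used: usize,
      }

      impl Arena {
          fn new(budget: usize) -> Arena {
              Arena { buf: vec![0; budget], used: 0 }
          }

          // Hands out `size` bytes, or None once the budget is exhausted.
          fn try_alloc(&mut self, size: usize) -> Option<&mut [u8]> {
              if self.buf.len() - self.used < size {
                  return None; // out of budget: the caller reports failure
              }
              let start = self.used;
              self.used += size;
              Some(&mut self.buf[start..start + size])
          }
      }

      fn main() {
          let mut arena = Arena::new(16);
          assert!(arena.try_alloc(10).is_some());
          assert!(arena.try_alloc(10).is_none()); // over budget: fail, don't abort
          assert!(arena.try_alloc(6).is_some());  // the rest is still usable
      }
      ```

      Whether a given input fits such a budget is exactly the hard-to-predict part above.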

      (2) You allow the user to configure the amount of memory at runtime. If you’d drop that requirement, you could just stack-allocate (a const generic amount of?) scratch space and bypass quite a bit of complexity. Indeed, stack-allocation seems fairly common in memory-limited (embedded) C?

      (3) I may be mistaken, but the bounding volume hierarchy construction does not appear to have an obvious maximum number of recursive calls (i.e. an obvious limit to the amount of stack space consumed)? Using an explicit stack may fit best with the “fallible dynamic memory allocation” strategy of the rest of this code?

      Again, thanks for writing this! And please do excuse me if I misunderstood something; I’m a little worried that there isn’t already a comment along the above lines here or in r/rust.

      1. 2

        (1)

        I think this is fairly close, but not quite; see this comment.

        (2)

        Yeah, I’d say I used the word “static” completely wrong – this is definitely not static, as you can render bigger scenes on bigger machines and smaller scenes on smaller machines. Just allocating everything in statics, if you can, is of course the best solution, but that does limit you to working with an essentially constant amount of data, without the ability to scale down or up without a rebuild. Which is often fine!

        (3)

        The depth is guaranteed to be logarithmic in the input size, as at each step we divide the triangles into equal halves. So, by the time you run out of stack, you have run out of heap as well (but yeah, the stack is another thing which prevents truly reliable programs: in user space, there are usually few guarantees about how much stack you have, and how much stack is required by various outside-of-your-control library functions).
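        To make that bound concrete (my arithmetic, not anything from the article): halving at each step gives a depth of at most ⌈log₂ n⌉, so even huge scenes stay shallow.

        ```rust
        // Splitting triangles into equal halves at every step bounds the
        // recursion depth by ceil(log2(n)).
        fn bvh_depth(mut n: u64) -> u32 {
            let mut depth = 0;
            while n > 1 {
                n = (n + 1) / 2; // each level halves the triangle count
                depth += 1;
            }
            depth
        }

        fn main() {
            assert_eq!(bvh_depth(8), 3);
            // Even a billion triangles needs only ~30 levels of recursion.
            assert_eq!(bvh_depth(1_000_000_000), 30);
        }
        ```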

        Traversal can be worst-case linear, and that’s one of the reasons it’s written in a non-recursive fashion. It also uses the “constant amount of data” approach for the workqueue of items.

        Ideally, you’d allocate that from a scratch space and pass that in, but that requires extending the in_parallel API to allow some per-thread setup. This is not too hard, but I ran out of steam by that point :-)
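        A minimal sketch of that “constant amount of data” workqueue idea, with a node layout I invented for illustration (not the article’s actual types): a non-recursive traversal over a binary BVH using a fixed-size stack.

        ```rust
        // For a tree built by halving, the depth (and thus `top`) stays well
        // within MAX_DEPTH, so no allocation is needed during traversal.
        const MAX_DEPTH: usize = 64;

        enum Node {
            Leaf(u32),           // e.g. a triangle index
            Inner(usize, usize), // indices of the two children in `nodes`
        }

        fn visit_leaves(nodes: &[Node], root: usize, out: &mut Vec<u32>) {
            let mut stack = [0usize; MAX_DEPTH]; // fixed storage, no heap
            let mut top = 0;
            stack[top] = root;
            top += 1;
            while top > 0 {
                top -= 1;
                match &nodes[stack[top]] {
                    Node::Leaf(tri) => out.push(*tri),
                    Node::Inner(l, r) => {
                        stack[top] = *l;
                        top += 1;
                        stack[top] = *r;
                        top += 1;
                    }
                }
            }
        }

        fn main() {
            // Root with two leaves; the right child is visited first.
            let nodes = [Node::Inner(1, 2), Node::Leaf(7), Node::Leaf(9)];
            let mut out = Vec::new();
            visit_leaves(&nodes, 0, &mut out);
            assert_eq!(out, vec![9, 7]);
        }
        ```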

        1. 1

          Thanks for your response!

          (1) Yes, I think we mean the same thing there - the C developer in me doesn’t expect an arena allocator to be type-safe, but just to hand out void *. ;-)

          (Sorry, I didn’t check for new comments posted after I started writing. Thanks for the reference!)

          (3) I see, thanks for the clarification and extra information! I agree that being logarithmic in the input is probably good enough in practice.

      2. 2

        I am pretty new to Rust, so excuse the ignorant question(s)… Could you do some of this by changing memory allocators? Or is there only a single global allocator, whereas you need (at least) two to make this work? I would think the Rust compiler would need an arena allocator for efficiency, which would seem to present similar difficulties. If so, how does that work?

        1. 4

          Rust only lets you set the global allocator for now. There is work on standardizing generic allocators in std containers, like C++ does in the STL. There’s also a proposal for generic storages, which would allow implementing things like tinyvec as a specialization of Vec from std rather than a completely separate type.

          1. 1

            I found a good article on Rust arena allocators that seems to come out a lot simpler than the “hard mode” code. Now it’s on my todo list to understand why the same techniques wouldn’t work for the ray tracer.

            1. 4

              Arenas definitely would work. The article is essentially about using a kind of arena allocator exclusively, the hard way.

              1. 4

                I think that’s a bit more nuanced. With arenas, you’d either have to make them dynamic (so, backed by a real allocator with realloc & free), or you’d need to allocate arenas for each type up-front (so, instead of learning the number of, e.g., triangles from the input, you’d have a hard guesstimated limit based on the total available memory at the start).

                I’d love to see this re-implemented using a more standard arena! My guess would be that the result would be substantially different.
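                For what it’s worth, the “one arena per type, sized up-front” option might look something like this sketch (names invented; the count limit is the guesstimate mentioned above):

                ```rust
                // A fixed-capacity, per-type pool: the limit is on object
                // counts (guessed at startup), not on total bytes.
                struct Pool<T> {
                    items: Vec<T>,
                    cap: usize,
                }

                impl<T> Pool<T> {
                    fn with_capacity(cap: usize) -> Pool<T> {
                        Pool { items: Vec::with_capacity(cap), cap }
                    }

                    // Gives the item back on failure so the caller can
                    // report that the hard limit was reached.
                    fn try_push(&mut self, item: T) -> Result<(), T> {
                        if self.items.len() == self.cap {
                            return Err(item);
                        }
                        self.items.push(item);
                        Ok(())
                    }
                }

                fn main() {
                    let mut triangles: Pool<[f32; 9]> = Pool::with_capacity(2);
                    assert!(triangles.try_push([0.0; 9]).is_ok());
                    assert!(triangles.try_push([0.0; 9]).is_ok());
                    assert!(triangles.try_push([0.0; 9]).is_err()); // limit hit
                }
                ```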

            2. 1

              Is allocation really so special that it needs language support (versus defining an allocator trait and passing impls)? Is the idea that allocation is just so common that you don’t want to have to pass an allocator to every function? If so, why not solve this by allowing generics to specify a default implementation so you could use it for loggers and other things?

              1. 3

                We don’t want to pass around an allocator everywhere, but we do want to be able to redefine the allocator that std uses. Rust doesn’t have a great way to handle this other than a magic language feature.

                Maybe don’t quote me on this, but I believe the optimizer also wants to be aware of allocations, so that it can remove unnecessary alloc/free pairs, even though that has observable side effects, like changing which addresses future allocations receive.

                1. 1

                  The only necessary language support for allocators in Rust is, I believe, the #[global_allocator] attribute. Rust doesn’t have a new operator like C++, it has the Box type. There are other attributes and compiler builtins for core types like Box, but I think those just exist for ergonomics and optimization. Rust has never made a huge effort to decouple std from the compiler, and I see no practical reason it should. I like having compiler builtins explicitly supporting certain use cases, with std providing abstractions for many of those use cases, rather than stupid bullshit exploiting edge case properties of the language like SFINAE.
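                  For illustration, here is the #[global_allocator] hook on stable Rust; the counting wrapper around the system allocator is my own example, not anything std provides:

                  ```rust
                  use std::alloc::{GlobalAlloc, Layout, System};
                  use std::sync::atomic::{AtomicUsize, Ordering};

                  // A custom global allocator: wraps the system allocator
                  // and counts how many times alloc() is called.
                  struct Counting;

                  static ALLOCS: AtomicUsize = AtomicUsize::new(0);

                  unsafe impl GlobalAlloc for Counting {
                      unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
                          ALLOCS.fetch_add(1, Ordering::Relaxed);
                          System.alloc(layout)
                      }
                      unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
                          System.dealloc(ptr, layout)
                      }
                  }

                  #[global_allocator]
                  static GLOBAL: Counting = Counting;

                  fn main() {
                      let before = ALLOCS.load(Ordering::Relaxed);
                      // black_box keeps the optimizer from eliding the heap
                      // allocation (the very optimization mentioned above).
                      let v = std::hint::black_box(vec![1u8, 2, 3]);
                      assert!(ALLOCS.load(Ordering::Relaxed) > before);
                      drop(v);
                  }
                  ```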

                  Also you’re describing the generic allocator proposal, the one I mentioned that already exists in C++. This is the type signature for std::vector in C++:

                  template<
                      class T,
                      class Allocator = std::allocator<T>
                  > class vector;
                  
                  1. 1

                    Is the “generic allocator proposal” specific to allocators? If so, I don’t think it’s what I’m describing. I don’t think allocators should be special, I think a default impl should be specifiable for any generic parameter.

                    1. 2

                      Rust does have default generic type parameters already. So does C++, demonstrated in the vector example above.
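                      For example, a default can already be given for any generic type parameter, allocator or not; the `Logger`/`Stdout` names below are invented stand-ins, not std APIs:

                      ```rust
                      // A default for any generic parameter, applied at the
                      // type level, mirroring C++'s
                      // `class Allocator = std::allocator<T>`.
                      trait Logger {
                          fn log(&self, msg: &str) -> String;
                      }

                      struct Stdout;

                      impl Logger for Stdout {
                          fn log(&self, msg: &str) -> String {
                              format!("[stdout] {msg}")
                          }
                      }

                      // Callers who don't care get `Stdout`; others can plug
                      // their own implementation in.
                      struct Service<L: Logger = Stdout> {
                          logger: L,
                      }

                      impl Service {
                          fn new() -> Service {
                              // Bare `Service` means `Service<Stdout>`.
                              Service { logger: Stdout }
                          }
                      }

                      fn main() {
                          let s = Service::new();
                          assert_eq!(s.logger.log("hello"), "[stdout] hello");
                      }
                      ```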

                      1. 1

                        So the “generic allocator proposal” is about leveraging default generic type parameters to make allocators pluggable? Just rewriting std to accept an allocator argument in every allocating type?

                        1. 2

                          Pretty much, yes. And the storages proposal is about going a step further than that, for implementing things like Vec or HashMap with hybrid inline storage, or statically allocated external storage, or whatever else. Storages with no associated allocator could even be used in no_std environments. Currently such specializations of std types are implemented in crates and re-implement all the usual methods themselves, duplicating a lot of code.

                          I am not sure how far the storages proposal goes, but it would be cool if you could use it to implement things like smartstring or Java’s singletonList, which reuse or optimize out the heap pointer, length, and capacity fields.
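                          Concretely, this is the kind of layout smartstring specializes by hand, which storages would let a generic type express (a hand-rolled sketch, not the crate’s actual code):

                          ```rust
                          // A string that keeps short contents inline and only
                          // heap-allocates past 15 bytes.
                          enum SmallString {
                              Inline { buf: [u8; 15], len: u8 },
                              Heap(String),
                          }

                          impl SmallString {
                              fn new(s: &str) -> SmallString {
                                  if s.len() <= 15 {
                                      let mut buf = [0u8; 15];
                                      buf[..s.len()].copy_from_slice(s.as_bytes());
                                      SmallString::Inline { buf, len: s.len() as u8 }
                                  } else {
                                      // Falls back to the heap for long strings.
                                      SmallString::Heap(s.to_string())
                                  }
                              }

                              fn as_str(&self) -> &str {
                                  match self {
                                      SmallString::Inline { buf, len } => {
                                          std::str::from_utf8(&buf[..*len as usize]).unwrap()
                                      }
                                      SmallString::Heap(s) => s,
                                  }
                              }
                          }

                          fn main() {
                              assert!(matches!(SmallString::new("short"), SmallString::Inline { .. }));
                              assert!(matches!(
                                  SmallString::new("definitely longer than fifteen bytes"),
                                  SmallString::Heap(_)
                              ));
                              assert_eq!(SmallString::new("short").as_str(), "short");
                          }
                          ```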

            3. 1

              Interesting article!

              I’m toying with the idea of using this technique combined with the Sans-IO style (as seen in Quinn: https://docs.rs/quinn-proto/0.8.4/quinn_proto/struct.Connection.html)