1. 7

    How does it obtain safety while allowing multiple mutable references? Is it by enforcing single threaded only code?

    1. 6

      Yeah, they don’t attempt to guarantee thread safety:

      Rust permits only one mutable reference to a value at a time. For greater flexibility, (standard) Rust supports interior mutation of an object through an immutable reference to it [19]. The programmer may borrow a special reference to the value that permits mutation, and the runtime ensures that only one such reference can exist at a time, enabling a safe relaxation of compile-time checks. With Bronze, mutation is permitted through all references to each garbage-collected object, with no extra effort. A key tradeoff is that Bronze does not guarantee thread safety; as in other garbage-collected languages, it is the programmer’s responsibility to ensure safety. For example, Figure 1 shows how GcRef simplifies code when there are multiple mutable aliases.

      1. 1

        Bad idea. They should just make the references !Send and !Sync and only use it in single threaded contexts.

        1. 2

          That would work, but multiple mutable references are still unsound in a single-threaded context. See: https://manishearth.github.io/blog/2015/05/17/the-problem-with-shared-mutability/

          1. 1

            True. They would also need to add a runtime check when getting a mutable reference, like RefCell.

      2. 5

        Yeah, Manish Goregaokar raised this and a couple of related issues it on Twitter: Bronze doesn’t provide its own sound way to have multiple mutable references (which can be a problem in single-threaded contexts too, e.g. with iterator invalidation), and he thought that for the particular exercise they gave students here, RC might have worked fine and GC might not have been why students found the exercise easier to complete. (I haven’t read the paper, just figured the reference might be useful to folks.)

        1. 4

          It sounds like this paper says more about the usability of current interior mutability APIs in Rust than about GCs. I can attest to the fact that it’s very hard to pick up and use the interior mutability APIs in Rust. I’d be interested in seeing more work on how those APIs could be changed to improve their usability.

      1. 2

        Great post, it’s nice to see someone took the time to write this up! Keeping the overhead low when deciding when to garbage collect is a delicate balance. A classic example I can think of is HotSpot’s approach to fast safepoint polling. Mutator threads poll the VM every so often to see if they need to hand control to the collector (among other things). Polling happens often, and in the common case the mutator won’t need to enter GC code. To make this fast, they try and read a particular “polling” address in memory; its contents are irrelevant, but if the read was possible, the mutator continues on as if nothing had happened. If the VM wants the safepoint poll to trigger a collection, it will mprotect (or equivalent) the address, and the custom SIGSEGV handler will jump into GC land. The poll boils down to a single read along the fast path. [0]

        V8 (Chrome’s JS engine) also has an interesting approach on deciding when to collect. They work out how much time is needed on the main-thread to render animations at 60fps and schedule GC work to hide in the idle time between each frame render. They can also do clever things like compaction when a tab is inactive etc. The PLDI paper is well worth a read [1].

        [0] https://shipilev.net/jvm/anatomy-quarks/22-safepoint-polls/ [1] https://ai.google/research/pubs/pub45361

        1. 1

          Using mprotect is an interesting approach, but I wonder how expensive it would be when a SIGSEGV gets triggered. If my memory serves me right, that’s quite expensive; then again it would happen only rarely.

          1. 1

            Indeed. Safepoint polls happen regularly inside tight loops in Java, but a collection is less common, so handling a SIGSEGV is fine when the poll consists of a single test instruction in x86 in most cases.

        1. 6

          It is true that memory unsafety is real security threat for performance sensitive software written in C and C++, it’s partly why you see browsers like Firefox and Chrome switching to garbage collection for things that were previously managed manually, such as the DOM.

          I don’t think that languages like Rust and Swift as they stand are the answer. While it’s true that Firefox have switched to Rust for some of their internal components, if I remember correctly they have forms of tracing garbage collection tacked on where they enforce correct usage with a set of additional programming rules, which are checked with linting plugins.

          If you want complex graph-like data structures in standard Rust, you will soon find the single ownership model unsuitable. At this point, your choices are: i) try very hard to get things right with unsafe blocks, at which point you’re back in memory unsafety land or ii) make heavy use of reference counting containers, which have a significant performance overhead. In Swift the story is similar - reference counting is not cheap. In addition, to prevent leaks, care must be taken to ensure cycles are broken with weak reference placement, which in complex graphs is also non-trivial.

          It feels to me like an acceptable trade-off here would be some form of fast, opt-in tracing GC for managing complex data structures, where reasoning about liveness is hard and UAFs are more likely; and C++ style RAII for everything else. There is no silver bullet when it comes to memory, and as much as I’m sure lots of these large C / C++ projects would be relieved to reduce memory bugs, convincing them to switch to Rust or Swift as they currently stand may be impractical from a performance perspective.

          1. 16

            try very hard to get things right with unsafe blocks, at which point you’re back in memory unsafety land

            The point of Rust isn’t just about not using unsafe. The point of Rust is that if you need to use unsafe, then you should be able to button it up behind a safe API that can never be used in a way that results in memory unsafety. That is a striking improvement over languages that are “unsafe by default.”

            It feels to me like an acceptable trade-off here would be some form of fast, opt-in tracing GC for managing complex data structures, where reasoning about liveness is hard and UAFs are more likely; and C++ style RAII for everything else.

            This would be an interesting project, yes. Rust used to have garbage collected pointers (via the @ sigil), but dropped them in the move to 1.0. I wouldn’t be surprised if something like this came back in library form. I know some folks have worked on this in the past, but it’s hard.

            There is no silver bullet when it comes to memory, and as much as I’m sure lots of these large C / C++ projects would be relieved to reduce memory bugs, convincing them to switch to Rust or Swift as they currently stand may be impractical from a performance perspective.

            It’s a game of inches. Getting folks to recognize that memory safety is a problem in unsafe-by-default languages is step 1, and I don’t see any problem with that. There are still tons of folks that don’t think it is a problem.

            1. 3

              While it’s true that Firefox have switched to Rust for some of their internal components, if I remember correctly they have forms of tracing garbage collection tacked on where they enforce correct usage with a set of additional programming rules, which are checked with linting plugins.

              The only reason why that’s there is because, by specification fiat, they have to share the DOM with a JavaScript program.

              1. 3

                The only reason why that’s there is because, by specification fiat, they have to share the DOM with a JavaScript program.

                To be fair, some Rust libraries have collection strategies. They are not traditional GC, but e.g. epoch based reclamation is sometimes used.

                But that argument isn’t very interesting: not having a garbage collector in the language enables such techniques can be opted in or even kept local.

            1. 12

              I don’t think this comes across as well as you think it does. The incompetency comments are uncalled for.

              1. 4

                I have yet to see a GC’d language written in rust that doesn’t cop out and use Rc .

                1. 3

                  What’s wrong with using Rust’s Rc standard library feature to implement garbage collection? You gotta write the garbage collector in some non-garbage-collected language after all, and while I’m not an expert in GC implementation, I would think that any GC is going to need some kind of reference counting semantics. So why not use the one that’s built into the Rust standard library?

                  1. 2

                    What’s wrong with using Rust’s Rc standard library feature to implement garbage collection?

                    Most reference counting implementations are slower than a moving, tracing GC alternative for a couple of reasons: they have to increment/decrement the count on every allocation/deallocation; and, with Rc in particular, they require two extra machine words (for the strong and weak count), which results in much poorer cache behaviour.

                    The other traditional problem with reference counting is that it can’t deal with cycles without the user annotating weak pointers. It’s easy to get this wrong, so memory leaks can happen if you’re not careful.

                    I would think that any GC is going to need some kind of reference counting semantics

                    A tracing GC (like mark-sweep; or stop-and-copy) doesn’t need a reference count. They trace from a root-set (usually on the stack or in registers) during collection looking for live objects. Unreachable objects can then be considered dead and thus their memory is freed. (See https://en.wikipedia.org/wiki/Tracing_garbage_collection)

                  2. 1

                    To be fair, CPython uses reference counting (with a tracing GC just to collect cycles).

                    1. 2

                      I think the point is, that it’s really hard to write proper GC in Rust.

                      1. 2

                        No harder than in C, if your willing to use raw pointers. There are a couple of Rust GCs that you could probably use off the shelf, though.

                        1. 1

                          Why is that?

                          1. 2

                            Existing approaches to GC in Rust which don’t have language support suffer from at least one of two problems: i) their API makes them too impractical to use; ii) their performance is worse than Rc. In some cases, it’s a bit of both :).

                            At this stage, I think fully tracing GC in Rust with acceptable performance for writing language interpreters would require some form of Rust language support. This has its own interesting challenges. You’ll encounter fairly non-language-specific problems when trying to retro-fit GC support: how to identify roots and interior pointers; whether to use a separate heap for GC objects; how the API will look; etc.

                            But what makes Rust difficult in particular are its additional borrow rules and safety guarantees. How safe should the GC be? I’m guessing the answer is “well if I use safe Rust code, I don’t expect to segfault”. Then the GC needs to be deal with things like this:

                            let s = S{}; // Assume S is some GC object
                            let s_obfuscated_ptr = &s as *const S as *const usize;
                            

                            Casting a raw pointer (to a potential GC item) to some other type can be written in safe Rust code. Dereferencing it cannot. There’s a problem here though, we’ve potentially hidden a GC root from our collector, and this could introduce unsoundness.

                            Another problem is the borrow rules. Multiple mutable references in Rust is UB. No exceptions. So you would still need something like RefCell inside your GC managed container to enforce those rules at runtime. This does some ref counting under the hood so you’d still, in some sense, have to deal with the reference count penalty. It’s important that this invariant in Rust isn’t broken - at present, I’m not aware of any optimisations rustc performs based on the mutability of references - but in the future, who knows? Breaking this invariant could lead to unsound behaviour based on assumptions made by the optimiser later on down the line.

                            The Deref trait also adds complexity because a GC needs to maintain the frozen reference property: any active reference &'a T to an object, must have a constant address for the whole of its lifetime a. If a collection takes place while an object is being dereferenced, the collector can’t move the object without breaking the FRP. From what I can remember, a lot of rustc’s safety relies on this invariant, and a GC would need to ensure that it pins objects across a deref to ensure soundness.

                            There are lots of other creases that would need ironing out for a sound Rust GC API. There’s a few resources on this from back when some of the core team were actively working on it but I think the work has now stagnated. This blog is a good start if you wanted to find out more: https://manishearth.github.io/blog/2016/08/18/gc-support-in-rust-api-design/

                        2. 1

                          Sure, that is fair, but from what I have seen is that rust ownership semantics make implementing mutability + sweeping part very hard. I have seen a rust GC ref type before, just never seen it used in a rust interpreter.

                          edit: jake hughes comment above is an excellent summary.