1. 67
  1.  

  2. 16

    This is a pretty good list of the warts in unsafe Rust.

    • there isn’t any nice pointer dereference syntax like C’s ptr->field. The code is littered with ugly (*ptr).field everywhere. – Yeah it would have been a lot nicer if dereferencing had been a postfix operator, so you could write ptr*.field or whatever. C’s syntax legacy still haunts us to this day. Instead Rust kinda shuffles around this problem by auto-dereferencing references for you when it needs to. It refuses to do this for raw pointers though, which is probably a good idea when you treat every raw pointer as intrinsically toxic.
    • Explicit allocators – people have been sloooooowly edging more of this into Rust’s stdlib and I think in the end it’s a good idea. But Rust’s stdlib API can’t be broken and people don’t want to duplicate literally every function in it, either. The ergonomics of it are still a little unsolved, people (used to) worry a lot about it being such a gigantic pain in the ass that nobody wants to use it. I need to learn more Zig and see how well it works out in practice.
    • Non-null pointers by default – yeahhhhhh. std::ptr::NonNull wasn’t introduced until Rust 1.25, and I tried to use it a grand total of once when writing an allocator for my toy OS. It was such a pain in the ass that I haven’t bothered since.
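    For the first and last bullets, a minimal sketch of what the three flavours look like side by side. The Node type and the function names are made up for illustration; only NonNull::new and NonNull::as_ref are real std APIs.

```rust
use std::ptr::NonNull;

// Hypothetical type, just something with a field to read.
struct Node {
    value: i32,
}

// Raw pointer: no auto-deref, so field access is `(*ptr).field` inside `unsafe`.
fn read_raw(ptr: *const Node) -> i32 {
    unsafe { (*ptr).value }
}

// Reference: the compiler auto-dereferences, so `.value` just works.
fn read_ref(node: &Node) -> i32 {
    node.value
}

// NonNull: the null check happens once, at construction, but reading
// through it is still an unsafe operation.
fn read_non_null(ptr: NonNull<Node>) -> i32 {
    unsafe { ptr.as_ref().value }
}

fn main() {
    let mut node = Node { value: 7 };
    let raw: *mut Node = &mut node;

    assert_eq!(read_raw(raw), 7);

    let non_null = NonNull::new(raw).expect("came from a reference, so not null");
    assert_eq!(read_non_null(non_null), 7);

    assert_eq!(read_ref(&node), 7);

    // A null pointer is rejected when the NonNull is built.
    assert!(NonNull::<Node>::new(std::ptr::null_mut()).is_none());
}
```

    The NonNull version moves the null check to one place, but every use still needs unsafe, which is probably where the pain comes from.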

    I’m rather surprised that Zig is faster though, let alone significantly faster! Some of those nuisances like not allowing pointer aliasing and having a single global allocator are things I would expect to make life easier for the optimizer.

    1. 4

      Explicit allocators

      That’s one thing that I hope will land in Rust at some point. Possibly behind an edition flag to allow adding this to all crates, because HashMap::with_allocator() isn’t helping much when the allocation (HashMap) is done inside a 3rd party crate I want to use. IIRC there were some discussions about using effects for this?

      Tangentially I did implement a stack based VM in Rust (or rather ported from C to Rust), which uses a mark ‘n sweep GC - so I’m throwing all allocations inside this one big region and copying over the other half per cycle. After directly declaring my GC-element-structs* as repr(C) (C-ABI) and allocating in a libc::malloc generated area, it was actually very easy to do all the pointer stuff in Rust. My surprise was especially that I finished this implementation way faster than my initial C one. And it worked from the start.

      What did hurt my performance was interacting with a C library that basically requires access to the VM state via globals. And rust doesn’t like mutable globals (there is no “no multithreading and I don’t care about interrupts” mode sadly). So either you bite the bullet and declare everything unsafe, or you have to add some kind of “register-callback” functionality and throw all your state on the heap for that.
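      A rough sketch of the two options from the paragraph above. VmState, fake_c_library_run and on_event are hypothetical stand-ins (the real C library and its callback signature are not shown in this thread); the pattern of passing heap state through a user-data pointer is the assumption here.

```rust
use std::os::raw::c_void;

// Hypothetical VM state that the C side needs to reach.
struct VmState {
    stack_depth: usize,
}

// Option 1: a mutable global. Every access has to be wrapped in `unsafe`.
static mut GLOBAL_VM: VmState = VmState { stack_depth: 0 };

fn push_global() -> usize {
    unsafe {
        GLOBAL_VM.stack_depth += 1;
        GLOBAL_VM.stack_depth
    }
}

// Option 2: register a callback and hand the library a pointer to heap state.
type Callback = extern "C" fn(user_data: *mut c_void);

// Stand-in for the C library: it only calls the callback with the pointer it got.
fn fake_c_library_run(callback: Callback, user_data: *mut c_void) {
    callback(user_data);
}

extern "C" fn on_event(user_data: *mut c_void) {
    // SAFETY: `user_data` is the `Box<VmState>` pointer made in `run_with_heap_state`.
    let vm = unsafe { &mut *(user_data as *mut VmState) };
    vm.stack_depth += 1;
}

fn run_with_heap_state() -> usize {
    let raw = Box::into_raw(Box::new(VmState { stack_depth: 0 }));
    fake_c_library_run(on_event, raw as *mut c_void);
    // Take ownership back so the state gets freed.
    let vm = unsafe { Box::from_raw(raw) };
    vm.stack_depth
}

fn main() {
    assert_eq!(push_global(), 1);
    assert_eq!(push_global(), 2);
    assert_eq!(run_with_heap_state(), 1);
}
```

      Both variants still contain unsafe; the second one just confines it to the boundary, at the price of the extra indirection.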

      * Something like this

      #[repr(C)]
      struct GcObject { // name is illustrative
        region: bool, // gc cycle
        size: usize, // size, unit depending on object flag
        object: bool, // kind
        data: [u8] // <- actual data used by the VM (unsized tail)
      }
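      Roughly how such a header-plus-payload object could be carved out of one big region. This is only a sketch under assumptions: it uses std::alloc instead of libc::malloc to stay dependency-free, a fixed-size Header with the payload bytes placed right behind it, and made-up names (Region, alloc_object, payload). There is no marking or copying here, just the allocation side.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::{align_of, size_of};

// Hypothetical fixed-size header; the payload bytes follow it in memory.
#[repr(C)]
struct Header {
    region: bool, // gc cycle
    object: bool, // kind
    size: usize,  // payload size in bytes
}

// One big bump-allocated region.
struct Region {
    base: *mut u8,
    layout: Layout,
    used: usize,
}

impl Region {
    fn new(capacity: usize) -> Region {
        assert!(capacity > 0, "zero-sized allocations are not allowed");
        let layout = Layout::from_size_align(capacity, align_of::<Header>()).unwrap();
        let base = unsafe { alloc(layout) };
        assert!(!base.is_null(), "allocation failed");
        Region { base, layout, used: 0 }
    }

    // Reserve header + payload; returns None when the region is full.
    fn alloc_object(&mut self, payload: &[u8]) -> Option<*mut Header> {
        let align = align_of::<Header>();
        // Round up so the next header stays aligned.
        let total = (size_of::<Header>() + payload.len() + align - 1) / align * align;
        if self.used + total > self.layout.size() {
            return None;
        }
        unsafe {
            let header = self.base.add(self.used) as *mut Header;
            header.write(Header { region: false, object: true, size: payload.len() });
            let data = (header as *mut u8).add(size_of::<Header>());
            std::ptr::copy_nonoverlapping(payload.as_ptr(), data, payload.len());
            self.used += total;
            Some(header)
        }
    }
}

impl Drop for Region {
    fn drop(&mut self) {
        unsafe { dealloc(self.base, self.layout) };
    }
}

// Read the payload back through a header pointer.
// The caller must pass a pointer returned by `alloc_object` on a live region.
unsafe fn payload<'a>(header: *const Header) -> &'a [u8] {
    unsafe {
        let data = (header as *const u8).add(size_of::<Header>());
        std::slice::from_raw_parts(data, (*header).size)
    }
}

fn main() {
    let mut region = Region::new(1024);
    let obj = region.alloc_object(b"hello").expect("region has room");
    let bytes = unsafe { payload(obj) };
    assert_eq!(bytes, &b"hello"[..]);
}
```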
      
      1. 2

        Last I heard, Rust does not take advantage of mutable references not aliasing as much as it could, due to LLVM miscompilations. C/C++ code doesn’t use restrict nearly as often as Rust uses &mut, so Rust surfaces a lot more of those bugs than C/C++.

        1. 2

          Rust went back and forth on this several times - enabling mutable noalias for one release and then being forced to roll it back again due to miscompilations.

          However, the last time this changed was pretty much exactly 2 years ago, when the last known LLVM bugs had been fixed and mutable noalias was enabled by default again.

          You can check this for yourself in this playground. Select “Show LLVM IR” instead of “Build” and you will see:

          define void @takes_mut(ptr noalias nocapture noundef align 4 dereferenceable(4) %a)
          

          Note the noalias!
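          For context, a small sketch of why that attribute matters. The takes_mut body is a guess at the kind of function behind the IR line above (the playground's actual code is not reproduced here), and add_twice / add_twice_raw are made-up names.

```rust
// A guess at the shape of the playground function: one `&mut i32` parameter.
pub fn takes_mut(a: &mut i32) {
    *a += 1;
}

// Because `a` is `noalias`, the optimizer may assume `b` does not point at `*a`,
// so it is allowed to load `*b` once and reuse it for both additions.
pub fn add_twice(a: &mut i32, b: &i32) {
    *a += *b;
    *a += *b;
}

// Raw pointers carry no such promise: `a` and `b` may alias, so `*b` has to be
// reloaded after the first store.
pub unsafe fn add_twice_raw(a: *mut i32, b: *const i32) {
    unsafe {
        *a += *b;
        *a += *b;
    }
}

fn main() {
    let mut x = 1;
    takes_mut(&mut x);
    assert_eq!(x, 2);

    let y = 10;
    add_twice(&mut x, &y);
    assert_eq!(x, 22);

    // Aliasing is expressible with raw pointers: both arguments point at `z`.
    let mut z = 1;
    let p: *mut i32 = &mut z;
    unsafe { add_twice_raw(p, p) };
    assert_eq!(z, 4); // 1 + 1 = 2, then 2 + 2 = 4
}
```

          The borrow checker would reject add_twice(&mut z, &z), which is exactly the guarantee that lets the compiler emit noalias in the first place.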

          1. 1

            Yeah, I’d just hoped that the situation had gotten better than the last time I’d checked on it (2019 or so I think).

            1. 1

              It’s better than 2019: https://github.com/rust-lang/rust/pull/82834

              It’s still not as great as it could be ;)

        2. 7

          Nice domain name.

          1. 4

            The way I would frame this is that Rust has static (compile-time) memory management, and that conflicts with dynamic memory management (garbage collection).

            The boundary is awkward and creates complexity.

            I wrote a post about problems writing a garbage collector in C++, e.g. annotating the root set, and having precise metadata for tracing.

            https://www.oilshell.org/blog/2023/01/garbage-collector.html

            https://news.ycombinator.com/item?id=34350260

            I linked to this 2016 post about Rust, which makes me think the problem could be worse in Rust, although I haven’t tried it:

            GC and Rust Part 2: The Roots of the Problem

            I didn’t write as much about bindings to native C++ code, but that’s also an issue that you have to think about carefully. CPython has kind of been “stuck” with their API for decades, which exposes reference counting. So it’s extraordinarily difficult to move to tracing GC, let alone moving GC, etc.


            On the other hand, there was also a paper that said Rust can be good for writing GCs.

            Rust as a Language for High Performance GC Implementation

            https://dl.acm.org/doi/pdf/10.1145/2926697.2926707

            However, I’m not sure it addresses the interface issue. One lesson I learned is that GCs are NOT modular pieces of code – they have “tentacles” that touch the entire program!

            That said, C++ is pretty good at “typed memory” as well, and I think it’s more pleasant than C. That is, you get more than void* and macros. So I can believe that Rust has benefits for writing GC.

            Not sure about Zig – I can believe it’s a nice middle ground.

            (copy of HN comment)

            1. 3

              “Dynamic memory management” doesn’t only arise in a tracing garbage collector. The article cites Roc, which rewrote its runtime from Rust to Zig, and Roc uses reference counting with the Perceus algorithm instead of a GC. If you are implementing an interpreter (which also arises in a statically typed language with compile time evaluation), then you are writing code that constructs new data types and the memory layouts for these types at runtime, and then creates instances of these new types and performs operations on them. Your code is reasoning about type and memory safety at runtime. Rust’s static type system and bad ergonomics for unsafe code just get in the way.
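              To illustrate "constructing memory layouts at runtime", here is a minimal sketch using std::alloc::Layout. FieldKind, field_layout and record_layout are hypothetical names, and the C-style field ordering is an assumption, not something Roc or any particular interpreter is known to do.

```rust
use std::alloc::{Layout, LayoutError};

// Hypothetical field kinds of the language being interpreted.
#[derive(Clone, Copy)]
enum FieldKind {
    Bool,
    I32,
    F64,
}

fn field_layout(kind: FieldKind) -> Layout {
    match kind {
        FieldKind::Bool => Layout::new::<bool>(),
        FieldKind::I32 => Layout::new::<i32>(),
        FieldKind::F64 => Layout::new::<f64>(),
    }
}

// Compute a C-style record layout for a type that only exists at runtime:
// fields in declaration order, each padded up to its own alignment.
fn record_layout(fields: &[FieldKind]) -> Result<(Layout, Vec<usize>), LayoutError> {
    let mut layout = Layout::from_size_align(0, 1)?;
    let mut offsets = Vec::with_capacity(fields.len());
    for &kind in fields {
        let (next, offset) = layout.extend(field_layout(kind))?;
        offsets.push(offset);
        layout = next;
    }
    // Trailing padding so arrays of the record stay aligned.
    Ok((layout.pad_to_align(), offsets))
}

fn main() {
    let fields = [FieldKind::Bool, FieldKind::I32, FieldKind::F64];
    let (layout, offsets) = record_layout(&fields).expect("layout fits in memory");
    println!("size={} align={} offsets={:?}", layout.size(), layout.align(), offsets);
    assert_eq!(offsets.len(), fields.len());
    assert_eq!(layout.size() % layout.align(), 0);
}
```

              Computing the layout is the safe part. Actually reading and writing fields at those offsets is all raw pointer arithmetic, where the type system can no longer check anything for you.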

              1. 5

                Yes for sure, it’s the same issues with ref counting (which is a form of GC, or automatic memory management).

                After writing the Oil runtime, it’s not surprising to me at all that Roc would want their runtime in Zig (or less strict systems languages like C++ or C), while retaining the compiler in Rust. The runtime deals with OS functions and a homogeneously typed view of objects, which are exactly the “boundaries” where Rust doesn’t help you.

                I’ve written a lot about this interface / interoperability problem with static typing, e.g.

                https://www.oilshell.org/blog/2022/03/backlog-arch.html#tradeoffs-between-dynamic-and-static-types-faq

                Serialization is another big one, that Roc’s predecessor Elm famously had issues with. They don’t want you to use JSON, because it inherently conflicts with static types. Elm also has a very consistent experience in its world, but not being able to use JSON easily was a dealbreaker for many people.

                More generally static types are a “model”, but the external world / reality doesn’t always fit that model – serialization of data from other processes, dealing with raw memory, and the OS.

                Another big one was dealing with hardware: https://lobste.rs/s/eppfav/why_i_rewrote_my_rust_keyboard_firmware

                Rust’s type system does not extend to the interface to integrated circuits :-) Reality is complicated and dynamic! Models are not reality.

                1. 7

                  They don’t want you to use JSON, because it inherently conflicts with static types.

                  That’s strange, because as far as I know JSON maps naturally to algebraic data types as seen in Haskell, OCaml, or F#. It would look something like this:

                  (* OCaml; an association list stands in for the object map *)
                  type json =
                    | Nil
                    | Bool   of bool
                    | Num    of float
                    | String of string
                    | List   of json list
                    | Object of (string * json) list

                  (* `rec` is needed because lists are processed recursively *)
                  let rec process_json input =
                    match input with
                    | Nil      -> print_endline "Nope"
                    | Bool   b -> if b then print_endline "yay!" else print_endline "nay!"
                    | Num    n -> print_endline ("Num " ^ string_of_float n)
                    | String s -> print_endline ("String " ^ s)
                    | List   l -> List.iter process_json l
                    | Object _ -> failwith "Not implemented"
                  

                  One could argue that this is emulating dynamic typing in a static type system. And it kinda is. That’s a big part of why sum types are so damn useful. Anyway, I understand the claim that JSON conflicts with static type systems that don’t have sum types (a class hierarchy could work, but it’s so heavy). But for those that do, JSON maps beautifully.
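                  Since the thread is mostly about Rust, the same idea as a Rust enum, as a rough sketch. The Json type and describe function are written from scratch for illustration and are not taken from any JSON library.

```rust
use std::collections::BTreeMap;

// JSON as a sum type: one variant per kind of value.
#[derive(Debug)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    List(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

// Exhaustive match: the compiler rejects this function if a variant is forgotten.
fn describe(value: &Json) -> String {
    match value {
        Json::Null => "null".to_string(),
        Json::Bool(b) => format!("bool {b}"),
        Json::Num(n) => format!("num {n}"),
        Json::Str(s) => format!("string {s}"),
        Json::List(items) => {
            let parts: Vec<String> = items.iter().map(describe).collect();
            format!("list [{}]", parts.join(", "))
        }
        Json::Object(fields) => format!("object with {} field(s)", fields.len()),
    }
}

fn main() {
    let mut fields = BTreeMap::new();
    fields.insert("ok".to_string(), Json::Null);

    let doc = Json::List(vec![
        Json::Bool(true),
        Json::Num(1.5),
        Json::Str("hi".to_string()),
        Json::Object(fields),
    ]);
    println!("{}", describe(&doc));
}
```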

                  Separately, I would also claim that static type systems that don’t have sum types are lacking something, the same way they would if they had no generics. Outside specific niches, the rationale for omitting either of the two had better be really good.

                  1. 1

                    More generally static types are a “model”, but the external world / reality doesn’t always fit that model – serialization of data from other processes, dealing with raw memory, and the OS.

                    Another big one was dealing with hardware: https://lobste.rs/s/eppfav/why_i_rewrote_my_rust_keyboard_firmware

                    Rust’s type system does not extend to the interface to integrated circuits :-) Reality is complicated and dynamic! Models are not reality.

                    Should I be finding an example of types misrepresenting hardware in the keyboard firmware post? I see that the author struggled with operating generically over different types representing different bits of hardware, but I don’t see that those types misrepresented the hardware.

                    1. 1

                      Serialization is another big one, that Roc’s predecessor Elm famously had issues with. They don’t want you to use JSON, because it inherently conflicts with static types. Elm also has a very consistent experience in its world, but not being able to use JSON easily was a dealbreaker for many people.

                      What? JSON is simple to both encode and decode in Elm (and Rust): https://package.elm-lang.org/packages/elm/json/latest/

                      Yes, you have to map it to a static type of some sort, but it’s definitely not true that “they don’t want you to use JSON”.

                      1. 2

                        My statement was probably too aggressive, but the source is stuff like this from the creator:

                        https://gist.github.com/evancz/1c5f2cf34939336ecb79b97bb89d9da6

                        For some reason we think JSON is a thing that is fine. It is not a great choice on basically every metric that matters to building reliable, efficient, and flexible applications.

                        from https://discourse.elm-lang.org/t/status-update-3-nov-2021/7870/1

                        Also feedback like this from users: https://news.ycombinator.com/item?id=30021229

                        I programmed with Elm for 3 years and love it - best programming experience I’ve ever had. I had to stop because I started working on some educational game projects that had incredibly short timelines and frequent, large change requests and depended on some WebGL 3D and game libraries. Nothing beats the super productivity of JS for hacking something together fast, and despite JS being far less reliable, the productivity and flexibility made up for that. Yes, there were run-time errors, but they were really a minor problem. JS, whatever it’s many faults may be, is fast and fun.

                        What slowed me down with Elm was JSON parsing, which needs to be done manually and can take hours or days whereas the same thing in JS takes about 2 seconds to type an import statement.


                        So again I would just emphasize the tradeoff and dependence on the domain. I can see that if you are writing an app that doesn’t talk to the outside world a lot, Elm’s philosophy and design choices could be fantastic.

                        You just write a little bit of glue, and the glue will do some essential runtime checks. The rest of your code is pleasantly statically typed.
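                        A minimal sketch of what such glue could look like, in Rust rather than Elm. Everything here is hypothetical: the cut-down Json value, the User type and decode_user are invented for the example, and a real program would get the dynamic value from a parser library.

```rust
use std::collections::BTreeMap;

// Cut-down dynamic JSON value (hypothetical, just enough for the example).
enum Json {
    Num(f64),
    Str(String),
    Object(BTreeMap<String, Json>),
}

// The statically typed shape the rest of the program works with.
#[derive(Debug, PartialEq)]
struct User {
    name: String,
    age: u32,
}

// The glue: all runtime checks live here, and nowhere else.
fn decode_user(value: &Json) -> Result<User, String> {
    let fields = match value {
        Json::Object(fields) => fields,
        _ => return Err("expected an object".to_string()),
    };
    let name = match fields.get("name") {
        Some(Json::Str(s)) => s.clone(),
        _ => return Err("missing or non-string `name`".to_string()),
    };
    let age = match fields.get("age") {
        // Accept only non-negative whole numbers that fit in a u32.
        Some(Json::Num(n)) if *n >= 0.0 && n.fract() == 0.0 && *n <= u32::MAX as f64 => *n as u32,
        _ => return Err("missing or invalid `age`".to_string()),
    };
    Ok(User { name, age })
}

fn main() {
    let mut fields = BTreeMap::new();
    fields.insert("name".to_string(), Json::Str("Ada".to_string()));
    fields.insert("age".to_string(), Json::Num(36.0));

    let user = decode_user(&Json::Object(fields)).expect("well-formed input");
    assert_eq!(user, User { name: "Ada".to_string(), age: 36 });

    // Anything that does not fit the model is rejected at the boundary.
    assert!(decode_user(&Json::Num(1.0)).is_err());
}
```

                        This is the manual-decoder cost the quoted Elm user is complaining about: it is cheap for one type and adds up quickly when the payloads keep changing.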

                        (I just advocated static types for writing language processors here (chubot) – https://news.ycombinator.com/item?id=35045520 – you need static typing for predictable performance, and type-driven refactorings are pleasant. Oil is the most statically typed shell implementation.)

                        But there are lots of other kinds of code where the glue is the whole program, and then a language like Elm will just get in the way. Actually one of the claims from the posts I linked above is that MOST SOFTWARE is glue in distributed systems – due to poor factoring and poor architecture.

                        And most controversially – that fine-grained static types CAUSE glue at the large scale, because it creates incommensurable, opinionated models that must be bridged for the whole system to work.