1. 32
  1.  

  2. 9

    By the way: the RustBelt project itself, which this paper comes out of, is almost 2 years old: http://plv.mpi-sws.org/rustbelt/

    1. 9

      This is a relief. I often thought Rust is too good to be true, and I think the reason is it actually kind of is. To quote:

      Unlike shortening, reborrowing provides an inheritance to regain the initial full borrow after the shorter lifetime has ended. This may sound intuitively plausible, but turns out to be extremely subtle. In fact, most of the complexity in the model of the lifetime logic arises from reborrowing.

      Intuitively plausible, but extremely subtle! I think it’s the best summary of how I feel about Rust’s safety guarantee.
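
      To make the quoted passage concrete, here is a minimal sketch of reborrowing (my own example, not taken from the paper):

      ```rust
      // `r` holds the full mutable borrow of `x`. Inside the inner scope it is
      // reborrowed as `s` for a shorter lifetime; once `s` is gone, the
      // original borrow `r` becomes usable again.
      fn main() {
          let mut x = 1;
          {
              let r = &mut x;
              {
                  let s = &mut *r; // reborrow: `r` is frozen while `s` lives
                  *s += 1;
              }                    // shorter lifetime ends here
              *r += 1;             // the initial full borrow is regained
          }
          assert_eq!(x, 3);
      }
      ```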

      1. 1

        We need more close analysis of bugs, especially security failures, in order to get a better sense of software tradeoffs. For example, the recent CVE involving sudo was on the surface a memory corruption bug, but looked at more closely it involves a chain of design errors in the OS and the application. There is no mechanical method of preventing stupidity, but it seems like a checklist of elementary security issues should be required for, e.g., any privilege-escalating software.

        1. -1

          Consequently, to overcome this restriction, the implementations of Rust’s standard libraries make widespread use of unsafe operations, such as “raw pointer” manipulations for which aliasing is not tracked. The developers of these libraries claim that their uses of unsafe code have been properly “encapsulated”, meaning that if programmers make use of the APIs exported by these libraries but otherwise avoid the use of unsafe operations themselves, then their programs should never exhibit any unsafe/undefined behaviors. In effect, these libraries extend the expressive power of Rust’s type system by loosening its ownership discipline on aliased mutable state in a modular, controlled fashion: Even though a shared reference of type &T may not be used to directly mutate the contents of the reference, it may nonetheless be used to indirectly mutate them by passing it to one of the observably “safe” (but internally unsafe) methods exported by the object’s API

          C has too many unsafe operations. To solve this problem, our new super language rules out all unsafe operations except those which one precedes with the keyword “unsafe”. Ta da!

          1. 16

            There’s always an unsafe part. It’s like the trusted part in secure systems: it’s the TCB. You can’t get rid of it entirely. So, you make it as small and simple as possible. Then interface with it carefully. In the process, you avoid the severe damage (esp. code injection) of common defects in the vast majority of your code.

            That they can do this up to temporal safety and race-free concurrency is a huge improvement over the status quo.

            1. 1

              I don’t even know if that’s a good method of design for security. DJB’s comments on design for security seem pretty insightful. To be fair, I don’t have a better solution, just some skepticism.

            2. 11

              What evidence would convince you that unsafe markings as manifest in Rust are an effective tool?

              1. -2

                I think the requirement for unsafe indicates that the basic system is not adequate. Either you solved the problem or you didn’t. I think that e.g. Java or Go or Lua using C libraries is a more coherent response than a system programming language with an elaborate safety mechanism that needs to be defeated in order to implement its own libraries. This is the same problem I have with the stupid C standard type aliasing rules: to impose “safety” restrictions that have to be escaped in order to implement basic functions seems like putting one’s hands over one’s eyes.

                1. 14

                  Okay? I’ll note that you didn’t actually answer my question. Skepticism is good, but it’s a lot more productive when you can state more precisely the level at which your belief is falsifiable. Like, I don’t know what “you either solved the problem or you didn’t” actually means in this context.

                  elaborate safety mechanism

                  How is unsafe simultaneously just a keyword and also an elaborate safety mechanism? I found your initial comment overly reductive, but you jumped from that to “elaborate safety mechanism” in the blink of an eye! What gives?

                  seems like putting one’s hands over one’s eyes

                  How so? What examples of “safety” restrictions are you referring to? How are they like Rust’s unsafe keyword?

                  1. -2

                    The Rust system of memory management and pointer aliasing is elaborate. But to create necessary libraries, the pointer safety system needs to be escaped. To me, that’s a design failure. It’s the classic failure in security too. It’s not like you can average safety together: 1000000 lines of totally safe code and 10000 lines of unsafe does not make it 99% safe.

                    1. 13

                      … You still haven’t answered my question! Could you please address it?

                      The Rust system of memory management and pointer aliasing is elaborate.

                      This seems inconsistent with your initial comment.

                      But to create necessary libraries, the pointer safety system needs to be escaped. To me, that’s a design failure. It’s the classic failure in security too.

                      If it’s a design failure, then that implies there is either a better design that isn’t a failure or that there is no possible design that wouldn’t be a failure in your view. If it’s the former, could you elaborate on what the design is? (Or if that’s not possible to do in a comment, could you at least describe the properties of said design and what you think it would take to achieve them?) If it’s the latter, then we are back to square 1 and I’m forced to ask: are some design failures better than others? How would you measure such things?

                      It’s not like you can average safety together: 1000000 lines of totally safe code and 10000 lines of unsafe does not make it 99% safe.

                      Who is doing this, exactly? Do you think such a simplistic reduction is an appropriate way to judge the merit of a safety system? Can you think of a better way?

                      1. 0

                        … You still haven’t answered my question! Could you please address it?

                        I’m not sure why you are still confused by this but the “elaborate” in my initial comment was not referring to the escape.

                        This is like: I’ve invented a perpetual motion machine, you just need to push it every now and then to keep it moving. I’ve invented a safe programming language, it just needs an unsafe escape mechanism or an FFI for implementing real applications.

                        If it’s a design failure, then that implies there is either a better design that isn’t a failure or that there is no possible design that wouldn’t be a failure in your view

                        I think there should be a better design, but don’t know what it is.

                        This seems inconsistent with your initial comment.

                        It is not even remotely inconsistent.

                        1. 7

                          This is the question I was referring to that I haven’t seen answered:

                          What evidence would convince you that unsafe markings as manifest in Rust are an effective tool?

                          1. 1

                            My complaint is that the language requires an escape mechanism. So what would convince me is if it did not need to turn off its own safety mechanisms.

                            1. 8

                              If it didn’t need to turn off its own safety mechanisms, then the unsafe markings themselves would cease to exist. So, that doesn’t answer my question unfortunately. If you’d like some clarification on my question, then I’d be happy to give it, but I’m not sure where you’re confused.

                              Here’s another way to think about this:

                              I think there should be a better design, but don’t know what it is.

                              What would it take to convince you that there is no better design? If you were convinced of such an outcome, would you still consider Rust’s memory safety mechanisms a design failure?

                              Here’s yet another way: if a better design does exist, do you think it’s possible to improve our tools until the better design is known, even if you would consider said tools to be a design failure? Or are all design failures equal in your eyes?

                      2. 1

                        If C had been defined with keywords to partition blocks with unsafe operations from safe ones, wouldn’t leveraging those be a best practice now? Or do you feel like we would see it now as a design failure of C?

                        This concept seems very similar to inline assembly and/or linking against handwritten assembly implementations of popular functions. C libraries generally have some critical sections implemented in assembly, whereas it’s much less common for normal applications to use this feature.

                    2. 13

                      I think the requirement for unsafe indicates that the basic system is not adequate.

                      I would like to write an operating system. I need to write a VGA driver. My platform is x86. To do this, I have to deal with the VGA device, which is memory mapped at the physical address 0xB8000. I have to write two bytes to this address: one with the colors, and one with the ASCII of the character I’d like to print.
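
                      Roughly, the write looks like this (a sketch of mine; the cell offset and color values are left as parameters):

                      ```rust
                      // Print one character to the x86 VGA text buffer. The raw pointer
                      // write is exactly the part the compiler cannot verify, hence `unsafe`.
                      fn put_char(cell: isize, ascii: u8, color: u8) {
                          let vga = 0xB8000 as *mut u8; // memory-mapped VGA text buffer
                          unsafe {
                              // Each text cell is two bytes: the character, then its attribute.
                              *vga.offset(cell * 2) = ascii;
                              *vga.offset(cell * 2 + 1) = color;
                          }
                      }
                      ```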

                      How do I convince my language that writing to 0xB8000 is safe? In order to know that it’s a VGA driver, I’d need to encode the entire spec of that hardware into my programming language’s spec. What about other drivers on other platforms? Furthermore, I’d need to know that I was in ring0, not ring3. Is that aspect of the hardware encoded into my language?

                      How would you propose getting around this?

                      I think that e.g. Java or Go or Lua using C libraries is a more coherent response

                      This is interesting, since many people refer to unsafe as a kind of FFI :).

                      Fundamentally, the difference here is “when you use FFI you don’t know because it’s not marked”, and unsafe is marked. Why is not marking it more coherent? They’re isomorphic otherwise.
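
                      For comparison, calling actual foreign code from Rust goes through the very same keyword (a sketch; `strlen` is declared by hand here purely as a stand-in for any C function):

                      ```rust
                      use std::ffi::CStr;
                      use std::os::raw::c_char;

                      extern "C" {
                          // Declared for illustration; any foreign function works the same way.
                          fn strlen(s: *const c_char) -> usize;
                      }

                      fn c_len(s: &CStr) -> usize {
                          // The compiler cannot see into the foreign code, so the call site is
                          // `unsafe` -- the same marker used for raw pointer manipulation.
                          unsafe { strlen(s.as_ptr()) }
                      }
                      ```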

                      1. 3

                        How would you propose getting around this?

                        I think the point is somewhat that you can’t without losing your ability to claim truly safe and secure status.

                        A totally-safe systems language has to include the semantics of the hardware systems it runs on, otherwise it’s just wishful thinking.

                        Now, it’s clearly a hard problem on how to do this, right, but maybe that’s informative in and of itself.

                        1. 6

                          Or it just needs a certifying compiler or translation validation of generated code. Certifying compilers exist for quite a bit of C, LISP 1.5, and Standard ML so far. They ensure the resulting assembly will do exactly what the source says. They also have intermediate languages in them that can themselves be useful.

                          As TALC and CoqASM show, one can also add types and memory safety to assembly code to prove properties directly. One could replace the unsafe HLL code with provably safe assembly code. Then you just need to take the interface specification of one and plug it into the other’s tooling. It’s one of the things Microsoft did for VerveOS: an OS written in C# compiling to typed assembly, interfacing with a separately verified “Nucleus”.

                          https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/pldi117-yang.pdf

                          1. 4

                            Actually, I think this misses the point. The point, IMO, is the tradeoff between two choices:

                            1. Encode the safety in the language itself by encoding the semantics of the hardware systems it runs on.
                            2. Give users the tools to encode safety themselves.

                            The process for proving safety is the same, and neither is “more safe” than the other. The only difference between them is that one is practical while the other is not (in a general programming language). That is, neither choice is “totally safe” (by your definition). The choices just push safety around different levels of abstraction. The abstraction of safety in the first place is the important bit.

                            1. 1

                              Give users the tools to encode safety themselves.

                              That’s what we did with C, and look how that turned out. :)

                              The only way to prevent people from doing stupid things is to forbid them by construction–and sadly, this often limits the clever things too.

                              1. 6

                                That’s what we did with C, and look how that turned out.

                                Uh, no? C has no way to encapsulate memory safety.

                                The only way to prevent people from doing stupid things is to forbid them by construction–and sadly, this often limits the clever things too.

                                This seems overly reductive to me. We don’t actually have to prevent people from doing stupid things. A measurable reduction would be an improvement.

                                1. 0

                                  C has no way to encapsulate memory safety.

                                  Eh? Users can “embed” memory safety by using the correct library calls and compiler warnings and whatnot to catch mistakes (like, incorrect number of args to printf, using uninitialized pointers, etc. and so forth)–and can use libraries that provide APIs that do things like prevent incorrect allocation of memory and unsafe arithmetic.

                                  The problem with saying “Users can embed their own safety!” is that you then have to consider all of the legacy ways that users did that (e.g., in C, as explained above) and how that didn’t always work, because of user failings.

                                  And a “measurable reduction” instead of complete prevention makes Rust a lot less compelling than just using people’s existing knowledge of C and competent analysis tools and practices.

                                  Given the amazing marketing efforts by the Rust Evangelion Strike Force and friends, I’d rather hope you all would look to see just how good you could make it.

                                  1. 10

                                    Eh? Users can “embed” memory safety by using the correct library calls and compiler warnings and whatnot to catch mistakes (like, incorrect number of args to printf, using uninitialized pointers, etc. and so forth)–and can use libraries that provide APIs that do things like prevent incorrect allocation of memory and unsafe arithmetic.

                                    This equivalence you’re trying to establish seems incorrect to me. In Rust, I can provide an API and make the following guarantee that is enforced by the compiler: if my library is memory safe for all inputs to all public API items, then all uses of said library in safe Rust code are also memory safe. You can’t get that guarantee in C because it’s unsafe-everywhere. The obvious benefit of this implication is that unsafe becomes a marker for where to look when you find a memory safety bug in your program. Equivalently, unsafe becomes a marker for flagging certain aspects of your code for extra scrutiny. This implication is what makes encapsulation of safety possible at all.
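
                                    For a concrete (hypothetical; the names here are made up) sketch of what that encapsulation looks like:

                                    ```rust
                                    pub struct Buf {
                                        data: Vec<u8>,
                                    }

                                    impl Buf {
                                        pub fn new(len: usize) -> Buf {
                                            Buf { data: vec![0; len] }
                                        }

                                        /// The only public accessor. The bounds check is the invariant
                                        /// that makes the unchecked read below sound, so no safe caller
                                        /// can trigger an out-of-bounds access through this API.
                                        pub fn get(&self, i: usize) -> Option<u8> {
                                            if i < self.data.len() {
                                                Some(unsafe { *self.data.get_unchecked(i) })
                                            } else {
                                                None
                                            }
                                        }
                                    }
                                    ```

                                    If a memory safety bug ever shows up in a program built from safe code plus this library, the unsafe block above is the only place it can originate.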

                                    There are plenty of downsides with this scheme that make it imperfect:

                                    1. People could misuse unsafe. (e.g., By not marking something as unsafe that should be unsafe.)
                                    2. The safe subset of Rust could be so useless that unsafe winds up being a large proportion of one’s code.
                                    3. If unsafe is used infrequently, then folks will have less experience with it, and therefore might be more inclined to screw it up when they do need to use it.

                                    But they are tradeoffs. That’s my point. They must be evaluated in a context that compares your available choices. Not some idealized scheme of perfection. Everyone who has used Rust has formed their own opinions about how well it guards against memory safety bugs. If we’re lucky enough, we might even get to collect real data that supports a conclusion that using Rust leads to fewer memory safety bugs than C or C++ in the aggregate. (The answer seems obvious enough to me, and I have my own data to support it, but it’s just anecdotal.)

                                    And a “measurable reduction” instead of complete prevention makes Rust a lot less compelling than just using people’s existing knowledge of C and competent analysis tools and practices.

                                    “you aren’t perfect, so you aren’t worth my time” — That’s an amazing ideal to have. I don’t know how you possibly maintain it. Seems like a surefire way to never actually improve anything! Surely I must be mis-interpreting your standards here?

                                    For me personally, I’m more inclined to not let perfect be the enemy of good.

                                    Given the amazing marketing efforts by the Rust Evangelion Strike Force and friends, I’d rather hope you all would look to see just how good you could make it.

                                    <rolleyes> Go troll someone else.

                                    1. 2

                                      “you aren’t perfect, so you aren’t worth my time” — That’s an amazing ideal to have. I don’t know how you possibly maintain it. Seems like a surefire way to never actually improve anything!

                                      As you said, there are plenty of tradeoffs with Rust’s memory safety scheme, and established industry knowledge of C vs. Rust is just another kind of tradeoff. That seems to be his point.

                                      1. 3

                                        If that was the point, then I’d be happy, but that’s not my interpretation of friendlysock’s comments at all. (They contain zero acknowledgment of tradeoffs, and instead attempt to judge Rust against a model of perfection.)

                                        1. 1

                                          And a “measurable reduction” instead of complete prevention makes Rust a lot less compelling than just using people’s existing knowledge of C and competent analysis tools and practices.

                                          I interpreted this as “use existing knowledge and tooling to make C programs better, or switch to a new language with some embedded memory safety guarantees.” Sounds like there are tradeoffs in there to me, especially if you rely on unsafe code.

                                        2. 1

                                          You’ve grokked the essence of it–it’s not just industry knowledge, it’s also things like the vast amount of code which, while unsafe, has been tested and patched, and the operating system protections put into place to mitigate compromised programs.

                                          Who cares if somebody can escape to a shell via an overflow if they end up as a neutered user account?

                                          1. 5

                                            Who cares if somebody can escape to a shell via an overflow if they end up as a neutered user account?

                                            Step 1: Get barely-privileged account.

                                            Step 2: Privilege escalation with another bug.

                                            This works so often that it’s standard in hacking guides. So, preventing the step 1 vulnerability is quite worthwhile. And if you want to prevent step 2, then whatever they’re interacting with that’s privileged has to have no or few bugs, which hasn’t been true in mainstream software. So, both steps are worth putting extra effort into, given the countless vulnerabilities that have happened with each.

                                            1. 2

                                              Fair enough!

                                      2. 4

                                        It’s impossible to have a programming language that is simultaneously 1) good for systems-level programming and 2) has no mechanism for bypassing memory safety. On Linux you can simply read and write from /proc/self/mem. Windows and Mac OS X have similar mechanisms.
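
                                        For instance, on Linux (a sketch; no unsafe block anywhere, the bypass lives entirely in the syscall interface):

                                        ```rust
                                        use std::fs::OpenOptions;
                                        use std::io::{Seek, SeekFrom, Write};

                                        // Overwrite arbitrary bytes of this process's own address space
                                        // via the kernel, from otherwise "safe" code.
                                        fn clobber(addr: u64, bytes: &[u8]) -> std::io::Result<()> {
                                            let mut mem = OpenOptions::new().write(true).open("/proc/self/mem")?;
                                            mem.seek(SeekFrom::Start(addr))?;
                                            mem.write_all(bytes)
                                        }
                                        ```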

                                        1. 1

                                          Is that true?

                                          Java ME Embedded, for example, was successfully used as a systems programming language without that escape hatch. If one reads the Oberon documentation, it looks like this language (used for real systems!) managed to support pointers without a lot of the pitfalls.

                                          1. 3

                                            So your definition of “good for systems-level programming” excludes being able to read and write from the filesystem on Linux?

                                            1. 0

                                              That’s not a reasonable criticism. The programming language is not asked to enforce safe use of the memory subsystem or the file system or to keep someone from jamming a paperclip in the processor fan. The question is purely a design decision for the language. Obviously there are many ways to split the difference and, right now, none of them are totally satisfactory.

                                              1. 6

                                                I believe my criticism is quite reasonable.

                                                • friendlysock wants “complete prevention,” and is against “measurable reduction.”

                                                • Rust is a systems programming language, and for a systems programming language, security is relative to the set of syscalls available.

                                                • Popular operating systems have syscalls that violate memory safety.

                                                • Therefore, “complete prevention” is out of the picture, and we must talk in terms of “measurable reduction.” This completes my argument.

                                                It often isn’t worthwhile to model “paperclip in the processor fan”, although, sometimes it is. If you’re designing a programming language that can deliver correct results in the face of faulty hardware, then random bit flips need to be part of the model. If you’re designing a systems programming language, then the syscall interface needs to be part of the model. Once you acknowledge that, you find that fretting about the programmer using unsafe when they shouldn’t is pointless.

                                                1. 1

                                                  friendlysock wants “complete prevention,” and is against “measurable reduction.”

                                                  Not quite–I’m not “against” it, it’s just that the benefits of using something that only provides “measurable reduction” instead of “complete prevention” are not sufficiently large when I also take into account retraining and retooling.

                                                  Popular operating systems have syscalls that violate memory safety.

                                                  Ah, I think I see the angle you’re taking. I kind of assume that since we’re talking a systems programming language, we ignore ill-conceived syscalls since we could be writing an OS that doesn’t contain them.

                                                  1. 1

                                                    If we’re talking about a language for writing applications for an operating system that uses type safety to enforce process isolation, then I agree, Rust as-is is not suitable for the task. Maybe Rust combined with a different language for unsafe segments, as mentioned in this thread.

                                                  2. 1

                                                    If you’re designing a systems programming language, then the syscall interface needs to be part of the model.

                                                    Since it’s highly relevant to this topic, see Galois Group’s presentation on Ivory, a synthesis language for systems. Notably, they assume that an underlying system task scheduler exists.

                                                    1. 1

                                                      You might find immunity-aware programming interesting:

                                                      https://en.m.wikipedia.org/wiki/Immunity-aware_programming

                                  2. 2

                                    “I’d need to encode the entire spec of that hardware into my programming language’s spec.”

                                    That’s pretty much what you do. You can do abstract state machines with their key properties to avoid doing the whole thing in detail. You do one each for the program and the hardware function. Then you basically do an equivalence check on the likely inputs.

                                    Alternatively, you assume the hardware works, then specify and implement the unsafe stuff in something like SPARK Ada. Prove it safe. Then wrap the result in Rust with interface checks that ensure safe Rust code uses the “unsafe,” but verified, code it’s calling in a safe way. I think Rust would do well combined with tech like Frama-C or SPARK for the unsafe parts.

                                    The end result of either method is that only the lowest-level or least-safe stuff needs extra verification. That still reduces the burden on developers a lot versus looking for all kinds of undefined behavior.

                                    1. 4

                                      What does this have to do with whether Rust’s safety guarantees are completely encoded in the language vs permitting users of the language to use unsafe markers?

                                      1. 2

                                        I’m only addressing how to handle code that’s actually unsafe, that’s included in Rust apps, and that can maybe break the safety. I don’t know the implications or uses of Rust’s unsafe keyword well enough to go any further. Just defaults, I say, for any safe systems language with unsafe parts. :)

                                        1. 2

                                          The point is that encoding the hardware semantics is not sufficient to remove unsafe while retaining Rust’s “zero-overhead” guarantee. You would also need to add a logic that subsumes quantifier-free Peano arithmetic to your type system, which would seriously gum up the little type inference that Rust does already. One alternative is to use a different (more proof-oriented) language and type system for the unsafe bits. The boundary between unsafe and safe is then well defined as the point where the invariants of the unsafe code can be expressed in terms of Rust’s type system.

                                      2. -1

                                        That’s my point. I want to write an OS. You propose a programming language that includes elaborate type safety mechanisms, particularly strong control of pointers, and explain that it is much better than C/C++ or Ada or assembler or some other unsafe language because of this mechanism. And yet, when I need to write a VGA driver or parse a command string or do any of the other things where raw pointers are most problematic, I can escape the control! So, to me, you haven’t solved the actual problem, you just made coding more inconvenient.

                                        The basic problem is very difficult, I agree.

                                        Fundamentally, the difference here is “when you use FFI you don’t know because it’s not marked”, and unsafe is marked. Why is not marking it more coherent? They’re isomorphic otherwise.

                                        How do I know, when I call a library function, that 7 layers down some knucklehead decided the code would look “more clean” using the unsafe escape? You are right: it’s essentially an FFI. So what have you gained? It may well be that the general design of the language is great and has other virtues, but you have not really solved the problem of unsafe pointers; you’ve just swept them under the FFI. The FFI at least gives me a clean separation.

                                        1. 5

                                          you just made coding more inconvenient

                                          In my experience, coding becomes more convenient. Is your experience to the contrary? Could you elaborate?

                                          The basic problem is very difficult, I agree.

                                          What do you think the basic problem is?

                                          So what have you gained?

                                          A tool to encapsulate unsafety.

                                          1. 1

                                            A tool to encapsulate unsafety.

                                            Textually. But e.g. a buffer overflow in this “encapsulated” code can spill into your safe zone - no?
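
                                            For example (a contrived sketch, mine):

                                            ```rust
                                            // The bug lives inside the `unsafe` block, but its effect
                                            // (undefined behavior) is not confined to it and may corrupt
                                            // data the "safe" code relies on.
                                            fn main() {
                                                let mut buf = [0u8; 4];
                                                let important: u8 = 42;
                                                unsafe {
                                                    let p = buf.as_mut_ptr();
                                                    *p.offset(4) = 7; // one byte past `buf`: undefined behavior
                                                }
                                                println!("{} {:?}", important, buf);
                                            }
                                            ```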

                                            1. 6

                                              Yes, but then you know precisely from where it spilled, unlike in a language that does not demarcate unsafe code. That’s what you gain. When things go wrong, it’s quite quick to hone in on the problem area. The surface area to explore is reduced potentially by orders of magnitude.

                                              1. 3

                                                Yes, if there’s a bug. But what form of encapsulation doesn’t have the problem that it might contain bugs?

                                              2. 1

                                                A tool to encapsulate unsafety.

                                                Textually. But e.g. a buffer overflow in this “encapsulated” code can spill into your safe zone - no?