1. 26
    1. 40

      I agree that Rust would benefit greatly from an official specification. However, I am saddened to see that a lot of the discourse around this is oriented around the examples of the C and C++ specifications, because I think these are terrible specifications, for mostly two reasons:

      1. ISO is not only useless, but actively harmful in this scenario. Access to the spec itself costs 200 CHF. Access to the standardization process requires you to go through your national standards body which has to be an ISO member. Certain countries are entirely eliminated from participation this way. The discussions are held behind closed doors, there are no public recordings or minutes. With some luck you can get “trip reports” written from memory by less secretive committee members.

      2. The standardese these specs are written in is simultaneously too obtuse for mortal humans to understand and too hand-wavy to allow formal reasoning. It sits in an unfortunate twilight zone between practically useful everyday documentation like the Rust reference and mechanized formal semantics like Clight. The way the C and C++ specifications are written leaves open not just tricky edge cases, but gaping semantic holes such as “what the hell is provenance?” or “do relaxed atomic operations respect causality?”. The prose is also just not useful for formal verification practitioners, which is why efforts like CompCert completely ignore the ISO standards and come up with their own semantics.

      So in my opinion, yes Rust should have a specification but definitely not under ISO and not in ISO-style standardese either.

      1. 13

        So, small fun fact: you can force ISO to publish your specification for free. No one does this because… who knows. But there are free ISO specifications; they just happen to come from other standards bodies.

        1. 4

          Agreed.

          The C and C++ committees have to a limited extent worked around ISO’s restrictions, so you can see the proposed changes and meeting minutes and I suppose if you care more than me you can send personal email to the relevant committee members.

        2. 15

          I agree that a specification would be good to have, but I think the example code in this project fails to illustrate any sort of point. All the information that the author got from the Ferrocene spec is also in the official reference.

          The page on patterns that he linked to states of the underscore pattern:

          Unlike identifier patterns, it does not copy, move or borrow the value it matches.

          So the value of the expression is not moved into a variable, and thus does not have the drop scope of a variable. So what drop scope does it have? The author also linked to the page on destructors, which has a list of drop scopes. One of those is “each expression”. So the value of the expression should be dropped when the expression is completed.
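
          To make that concrete, here is a small sketch (my own example, not from the article) showing the two drop scopes side by side: a value matched by `_` is dropped when the `let` statement completes, while a value bound by an identifier pattern lives to the end of the enclosing scope.

```rust
use std::cell::Cell;

thread_local! {
    // Flag flipped by Guard's destructor so drop timing is observable.
    static DROPPED: Cell<bool> = Cell::new(false);
}

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPPED.with(|d| d.set(true));
    }
}

// Wildcard pattern: the value has the drop scope of the expression,
// so it is already dropped on the line after the `let`.
fn wildcard_drops_immediately() -> bool {
    DROPPED.with(|d| d.set(false));
    let _ = Guard;
    DROPPED.with(|d| d.get()) // true: destructor already ran
}

// Identifier pattern: the value is moved into `_g` and lives until
// the end of the function, so the flag is still unset here.
fn binding_drops_at_scope_end() -> bool {
    DROPPED.with(|d| d.set(false));
    let _g = Guard;
    DROPPED.with(|d| d.get()) // false: destructor runs after this returns
}

fn main() {
    assert!(wildcard_drops_immediately());
    assert!(!binding_drops_at_scope_end());
}
```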

          1. 7

            I think the example the author picked happened to be particularly favourable, ironically enough. There’s a whole class of real-world situations that the Rust documentation currently handles rather poorly.

            If an intern – someone with excellent formal knowledge, but not specifically a fan of the language, or someone who’s followed its development for a long time – comes to me with a C++ question, I can always refer them to the standard.

            If the same intern comes to me with a question about, say, interfacing with unsafe code, I have to refer them to the Rust Reference, the Rustonomicon, and possibly a series of RFCs. Both the Rust Reference and the Rustonomicon have big disclaimers that they’re out of date, so any tentative answer you reach from either needs to be checked against the issue tracker.

            Both the issue tracker and the RFCs are excellent material if you’re someone who enjoys writing Rust and wants it to succeed. That’s great, I enjoy writing Rust and want it to succeed, but most professional programmers don’t, and shouldn’t be expected to, give a rat’s ass about it. That’s twice as true for my poor intern, who’d have to consult two year-old RFCs when they’ve been programming professionally for all of two months.

            More importantly, this is all but useless in any regulated setting. If I’m conducting a formal review and someone points out an edge case that we’re not certain about, the best-case scenario, if it’s not in the stable reference for our stable toolchain, is that I dig through newer versions of the reference and/or the nomicon, and see if it says anything useful that happens to apply all the way back to our stable toolchain. The worst-case scenario is that I need to figure out if the train’s brakes engage or not based on a comment that DildoBaggins69 left on an open issue in the rust-lang reference issue tracker four months ago. At that point I might as well give up; memory safety is nice and all, but good luck explaining in court that DildoBaggins69 had a point when the insurance claims start rolling in.

            (Edit: obviously, my recommendation in that case would be to rewrite that code so it doesn’t depend on Mr. Baggins’ explanation. That’s still entirely undesirable: at that point you’re actively avoiding the language, not using it.)

            1. 14

              To clarify, this is not so much “rust needs the spec” as “rust needs to actually decide what the correct behavior is”. Certain aspects of the language, especially around unsafe, are unspecified not in the sense of merely missing documentation, but in the sense that there’s an explicit waiver for any specific behavior.

              The good news is, I think 95% of it is around unsafe. Safe Rust generally does have well-defined behavior, and that behavior is actually documented these days, although the docs do tend to be unevenly distributed.

              And the remaining 5% of undocumented safe stuff are of the form “you might not necessarily understand why the code doesn’t compile, but if it does compile, the code works as you want”. That is, I don’t think all intricacies of type inference and name resolution are documented, but the failure mode there is the code not compiling, not nasal-demon summoning ODR violations.
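
              As a sketch of that failure mode (my own toy example): the same expression either compiles with one unambiguous meaning or doesn’t compile at all, because inference refuses to guess.

```rust
// Inference flows from the annotated return type into `parse` and `collect`.
// If you instead wrote `let v = input.split_whitespace()...collect();` with
// no annotation anywhere, rustc would reject it with "type annotations
// needed" rather than silently picking some type for you.
fn parse_all(input: &str) -> Vec<i32> {
    input
        .split_whitespace()
        .map(|s| s.parse().unwrap()) // parse::<i32>() inferred from return type
        .collect()                   // collect::<Vec<i32>>() inferred likewise
}

fn main() {
    assert_eq!(parse_all("1 2 3"), vec![1, 2, 3]);
}
```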

              1. 6

                To clarify, this is not so much “rust needs the spec”, as “rust needs to actually decide what the correct behavior is”

                Both are important IMHO. Certainly, a decision on the correct behaviour is the fundamental condition. But unless that is also traceably documented for the toolchain I’m using, so that if I see my toolchain exhibiting a certain behaviour I can trace it to the relevant requirement, it’s simply not usable.

                The remaining 5% of undocumented safe stuff are of the form “you might not necessarily understand why the code doesn’t compile, but if it does compile, the code works as you want”. That is, I don’t think all intricacies of type inference and name resolution are documented, but the failure mode there is the code not compiling, not nasal-demon summoning ODR violations.

                Case in point: my last high stakes project involved charging batteries that could set an apartment building on fire. I want to understand why the code does or doesn’t compile, so that when it does compile, I can tell my colleagues from Sales that we did our best to make sure our customers can put these things in their basements without lying by omission.

                First of all because it’s minimal engineering ethics and integrity. The amount of omission a company can tolerate when the worst that can happen is an angry tech support call is way larger than when the worst that can happen is someone gets called to hopefully identify half-charred bodies. But also, selfishly, because should the bloody thing catch fire after all, I can’t lie to auditors, and Legal is going to have my ass on a plate if I say “well, we didn’t really know what all of our code was doing, since not all of our language is documented, and even if we had time to figure it out, we’re not compiler engineers. But we asked someone in the community about the parts we weren’t sure about and they said it was fine.”

                My main concern isn’t that I might summon nasal demons (ideally, if my code looks like it might summon them, I’d rewrite it). My main concern is that my understanding of specified behaviour might be out of date, or incomplete. If someone asks what this or that line does during a code review, I want to be able to trace my answer to formally specified data which, in turn, can be traced to whatever toolchain we’re using, so I know both a) that my understanding is correct and b) that it applies to our toolchain, not the next version.

                1. 11

                  My hot take would be that, if you need 100% understanding of the language, then Rust just won’t work for you. I am extremely skeptical about the feasibility of usefully specifying 100% of Rust. There are too many small quirks. Like the fact that, during type inference, arguments which are syntactically closures are visited first for the purpose of gathering constraints. Or how str::len would resolve to the type’s method even if there’s a str module in scope (this is specific to builtins; YourStr::len won’t work like this). In this case, this isn’t even “we are figuring stuff out” – specifics of type inference are explicitly allowed to change between releases.

                  Some options here:

                  • take a more axiomatic, rather than constructive approach. Don’t pin behavior precisely, but aim for “any behavior allowed works for me”. That works nicely for type inference — you don’t specify a particular type inference algorithm. Instead, you require that the types inferred by any oracle match up.
                  • ignore the source language and do proofs about the generated machine code
                  • shop for a language that’s simpler to exhaustively specify? Zig feels like it could be one, though at the moment I remain skeptical about the layer where you glue the high-level language to the abstract machine. I have a hunch that to do that reasonably you do have to track aliasing in the type system, which Zig doesn’t.
                  1. 9

                    My hot take would be that, if you need 100% understanding of the language, then Rust just won’t work for you

                    My equally hot take is that a language can’t usefully and simultaneously:

                    1. Make it a point to not summon nasal daemons, and
                    2. Have so many quirks that you’ll never understand why your code compiles or not.

                    If #2 is true, #1 is moot. The fact that someone can say “ah, but that’s no nasal daemon, my ignorant apprentice; that is none other than Belphegor, the demon of that one exception to provenance monotonicity. Did you not know that, as of two weeks ago, he may be summoned in the stable toolchain, which supersedes that from six weeks ago?” has very little practical relevance.

                    A standard which does not specify some behaviour, but leaves it up to the implementation, offers exactly as much certainty as an implementation that does some very particular, but otherwise unspecified thing. Because it’s not specified. The fact that someone can point me at the compiler and say “ah, but of course it’s specified, can’t you read, it’s right here in const_ptr.rs” doesn’t count as specified behaviour (edit: or rather, doesn’t practically count as a definition of behaviour). For one thing, I’m not a programming language enthusiast and I don’t read compilers for fun. For another, I can already do that with any compiler, for any language. By that measure, all languages are exhaustively specified and free of undefined behaviour.

                    1. 10

                      My equally hot take is that a language can’t usefully and simultaneously:

                      Both are true:

                      • C++ is vastly more specified than Rust
                      • in practice by and large programs written in Rust are significantly more well-behaved than programs written in C++

                      So, empirically, you definitely can have both 1 & 2, at least for the class of problems “solved by the average Google/Mozilla/Microsoft employee”. At this point, this particular empirical claim seems proven beyond any doubt.

                      This of course doesn’t mean that Rust is free of UB in some platonic sense — there are open soundness bugs. But, again, this doesn’t matter in the least for a wide range of programs.

                      There perhaps are domains where Rust level of practical safety is not enough, and where you need stricter mathematical properties, but I am not familiar with them and not making claims about them.

                      1. 3

                        Those are, indeed, the domains I was referring to, but I am not talking only about safety guarantees per se. Regulated fields don’t run only on technical correctness; they also run on risk mitigation, insurance money and traceability.

                        Case in point: compiler bugs are a thing, and standards in some fields require formal mitigation against them (e.g. EN 50128). So many companies in these fields have a standard procedure for dealing with compiler bugs, which covers everything starting from when you ask the vendor about their toolkit’s behaviour and/or they notify you they have a bug – you check your entire codebase (ideally automatically, not by hand), document your findings and any fixes that you’ve made, explain how you came up with them (vendor-supplied or your own), how you tested them etc.

                        Part of it is just engineering 101, but this is also meant to leave a paper trail which, if something happens, you can point the insurance company at to show that this wasn’t incompetence or malice, it was, literally, an unfortunate accident, the kind of thing you’re insured for.

                        There is a real (as in: I was there) quasi-legal question about whether you can even implement such a process without an authoritative source of what constitutes correct behaviour from the compiler. E.g. if there’s no authoritative source that says whether something is correct or not, and the next stable version of a toolkit introduces a breaking change that wasn’t deliberate – so the vendor doesn’t know about it when it’s launched – can they “bless” it as correct and never notify you of it?

                        (Edit: the question isn’t specifically one of correctness, but of accountability. Some proprietary compiler vendor can do that with C++, too, but if you can point at a formal spec, with a formal test suite that the vendor erroneously claimed they passed, then you can push liability to the vendor and their insurer)

                        That’s just an example, I heard tons of questions in this vein. Ultimately, it is a major problem for adoption, because the kind of companies with the resources to swallow this kind of risk are usually large enough, and with sufficiently entrenched insurance and legal counsel, that they’ll treat anything that raises that many questions as radioactive.

                        The “development abstraction layer” is a real thing. There are a million things that happen between when the code is compiled and when you’ve shipped the thing to someone, and those are just as real, and just as important, as the part where the code is written and compiled.

              2. 7

                I think the example the author picked happened to be particularly favourable, ironically enough. There’s a whole class of real-world situations that the Rust documentation currently handles rather poorly.

                Perhaps it could be improved, but I don’t see how a “specification” would help with that. I haven’t read the C++ spec, but my experience with the C spec is that it sucks to figure stuff out from it. Fortunately the Rust spec will simply be an improved version of the reference, so it presumably won’t be as annoying as a typical ISO standard.

                More importantly, this is all but useless in any regulated setting. If I’m conducting a formal review and someone points out an edge case that we’re not certain about, the best-case scenario, if it’s not in the stable reference for our stable toolchain, is that I dig through newer versions of the reference and/or the nomicon, and see if it says anything useful that happens to apply all the way back to our stable toolchain.

                If you’re in a regulated setting, isn’t it very likely that you’re using a certified toolchain like Ferrocene or AdaCore Rust, which have their own specs? Those would take precedence over whatever any official Rust spec might say.

                1. 2

                  Perhaps it could be improved, but I don’t see how a “specification” would help with that. I haven’t read the C++ spec, but my experience with the C spec is that it sucks to figure stuff out from it.

                  Both are written in that peculiar ISO dialect but:

                  1. At least each one is a single document
                  2. If you have a toolchain, you know which one of those single documents to refer to

                  For some things, this simply isn’t the case with Rust. Sometimes the answer you’re looking for isn’t (yet) in the reference, it’s somewhere in an issue tracker, and figuring out if your stable toolchain should behave one way or another requires going through changelogs and diffs. That’s not okay.

                  Just an example – maybe not the most relevant, but it’s the most recent I can think of, so I don’t have to sift through old changelogs: pre-1.82.0, some NaN bit pattern guarantees weren’t documented. For someone who’s not a Rust fan, that was practically the same as them being undefined. Even if they’d wanted to go out of their way to sift through the source code, there was no way to know how “final” what they saw was (and it was final: some of the now-documented behaviour is old), so they weren’t going to rely on it.
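
                  For illustration (my own sketch, leaning on the float semantics documented around that release): the kind of guarantee in question is, e.g., that a NaN produced by arithmetic is a quiet NaN. The behaviour itself is old; what changed is that you can now cite it rather than infer it from the compiler.

```rust
// Sketch: inspect the quiet bit of a NaN produced by arithmetic.
// Per the documented float semantics, arithmetic operations produce
// quiet NaNs (the sign and remaining payload bits stay unspecified).
fn arithmetic_nan_is_quiet() -> bool {
    let nan = 0.0f64 / 0.0;
    // For f64, the quiet bit is the top bit of the 52-bit mantissa (bit 51).
    nan.is_nan() && (nan.to_bits() >> 51) & 1 == 1
}

fn main() {
    assert!(arithmetic_nan_is_quiet());
}
```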

                  If you’re in a regulated setting, isn’t it very likely that you’re using a certified toolchain like Ferrocene or AdaCore Rust, which have their own specs? Those would take precedence over whatever any official Rust spec might say.

                  Absolutely, I was specifically referring to your remark that a specification would be good to have. Ferrocene fills that exact vacuum :).

                  1. 6

                    At least each one is a single document

                    … apart from the implementation-defined parts, the POSIX refinements, the GNU extensions, the CPU intrinsics …

                    I don’t understand your point about NaN because it’s much less well defined in IEEE 754 or C or C++ than in Rust. Why is it bad that Rust is improving the precision of its spec?

                    1. 3

                      … apart from the implementation-defined parts, the POSIX refinements, the GNU extensions, the CPU intrinsics …

                      I’m… not sure what to make of this. POSIX refinements and GNU extensions are platform-specific. Lots of codebases target neither, and if you do target them, both (or at least POSIX?) are at least proper, traceable standards. Both C and C++ are under-specified, sure, and that under-specification proliferates through implementation-specific details like CPU intrinsics and various implementation-defined issues. If those are bad in C and C++, they’re bad in Rust, too.

                      I don’t understand your point about NaN because it’s much less well defined in IEEE 754 or C or C++ than in Rust. Why is it bad that Rust is improving the precision of its spec?

                      It’s not bad that Rust is improving the precision of its spec. As I mentioned in my comment, it’s an example of guarantees that only become useful once they’re (traceably) documented. The documented behaviour isn’t new, but the thing that distinguishes it from an implementation quirk for people who merely use, rather than work on or advocate for Rust, is the fact that it’s reliably documented.

                      Edit: to clarify, the first part, about the single document, was in the context of someone asking how a spec like C++‘s helps with resolving questions about a language’s behaviour, given that it’s written in that annoying ISO language. One way in which it helps is that it’s a spec, a single authoritative document (yes, extended by various platforms, especially the standard library; Rust is going to have some of those, too, by the time it gets popular enough to be a good cross-platform environment), as opposed to a non-authoritative reference and a few dozen open RFCs.

                      IMHO having a spec, even in that annoying ISO dialect, and even if it’s under-specified, is more useful than having no spec, and only a reference that starts with a disclaimer effectively amounting to it not being definitive. If an ISO standard prescribes a specific behaviour and an implementation doesn’t follow it, at least I know it’s a bug, I don’t need to start looking for relevant RFCs or poke around the issue tracker. And I can work around it if I need to (or avoid it), I can track the workaround, I can ask the vendor etc.. If the Rust reference prescribes a specific behaviour and my toolkit doesn’t follow it, I don’t even know which one’s wrong – the reference or the toolkit.

                      1. 6

                        IMHO having a spec, even in that annoying ISO dialect, and even if it’s under-specified, is more useful than having no spec, and only a reference that starts with a disclaimer effectively amounting to it not being definitive.

                        The problem is the “fearless upgrades” part of the Rust philosophy. I can easily foresee “Yeah, the spec says that, but ten times as many crates as yours depend on the implemented behaviour, so the spec is wrong”.

                2. 5

                  nitpick: Nobody has ever referred me to the C++ standard, nor would I do that to anybody :) Even at Google, where multiple C++ spec authors work, I didn’t see people referring each other to the spec

                  For users, it’s generally enough to refer to many excellent books about C++. The best C++ programmers I’ve seen write simple code that doesn’t require any reading of specs.

                  The authors of those books, and compiler writers, may refer to the spec.


                  But yes I think it’s better if Rust has a spec, so that compiler writers and authors can refer to it. And a separate test suite that goes along with the spec, and maybe even a reference implementation.

                  As I mentioned the other day, I don’t think it’s a good situation when the authors of gccrs feel the need to copy rustc exactly:

                  https://lobste.rs/s/7ixd88/c_complexity_compiler_bugs#c_5kdy2f

                  https://rust-gcc.github.io/2024/09/20/reusing-rustc-components.html

                  Borrow-checking is an extremely complex subject, and a core part of the Rust programming language. It is important that gccrs gets it right, and it is important for us not to introduce subtle differences with rustc for such a crucial error pass. Instead of rolling out our own borrow-checking algorithm, reusing one which will be used by rustc allows us to at least reduce the amount of differences we will introduce

                  1. 3

                    nitpick: Nobody has ever referred me to the C++ standard, nor would I do that to anybody :) Even at Google, where multiple C++ spec authors work, I didn’t see people referring each other to the spec. For users, it’s generally enough to refer to many excellent books about C++. The best C++ programmers I’ve seen write simple code that doesn’t require any reading of specs.

                    I don’t doubt this is both an accurate description and entirely sufficient for how a lot of teams at Google work. My preference for referring to the standard is driven by the opposite experience. If someone raises a non-trivial question during a review, I prefer to refer to the spec because that’s both authoritative and, if push comes to shove, more easily traceable to the relevant ISO/IEC 14882 (or whatever) text. As in, even if I know the answer from a book, I still try to trace it to the standard. That’s a personal preference, but I’ve worked in two places where it was a requirement in certain cases (e.g. during formal reviews, or when evaluating new toolchains).

                    Ideally, sure, code should be simple enough not to require a spec. But even leaving aside cases when you don’t really have a choice (i.e. it’s old code that you have to integrate, or that you have to use with a new toolchain) people will sometimes raise pertinent questions about the simplest code.

                    1. 6

                      In C or C++ it’s not that difficult to write code that compiles successfully, but isn’t spec-compliant. If you have code that seems to work, you still need to check the spec to ensure that it’s actually supposed to work. Unsafe Rust is in that boat.

                      However, safe Rust is in a significantly different situation. It can’t have UB, so if your code compiles, it has already been checked for “spec-compliance” by the compiler.

                      For safe Rust, the grey area can only be in what a spec would call implementation-specific behavior, but even this in Rust is minimized. More things are outright defined (bytes have 8 bits, unsigned numbers overflow). Unit tests of packages on crates.io are almost the real Rust spec: each compiler release builds and tests nearly all public Rust code, so even when packages hit what happens to be implementation-specific behavior, Rust can keep this behavior, or create warnings for it, or devise some migration path (like editions).
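
                      For instance (my own examples, not the parent’s), behaviours that C leaves implementation-defined or undefined are simply pinned down in Rust:

```rust
fn main() {
    // A byte is 8 bits by definition, not implementation-defined as in C.
    assert_eq!(u8::BITS, 8);

    // Integer overflow is defined: a panic when debug assertions are on,
    // two's-complement wrapping otherwise. And the explicit operations
    // below have a single defined meaning on every target.
    assert_eq!(u8::MAX.wrapping_add(1), 0);
    assert_eq!(i32::MAX.checked_add(1), None); // no UB, just `None`
    assert_eq!(i32::MAX.saturating_add(1), i32::MAX);
}
```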

                      Rust having a single implementation has the luxury of blessing implementation-specific behavior as the correct implementation. In C/C++ there are too many vendors and legacy implementations to do that (e.g. recently realloc(_, 0) became UB instead of getting vendors to agree on anything).

                      1. 2

                        Right, I think this is where I’m talking past other posters to some degree, and that’s totally on me because I’m referring to codebases that seem to be somewhat out of scope for Rust’s design process:

                        Rust having a single implementation has the luxury of blessing implementation-specific behavior as the correct implementation. In C/C++ there are too many vendors and legacy implementations to do that (e.g. recently realloc(_, 0) became UB instead of getting vendors to agree on anything).

                        I’m sure that this is a problem for e.g. Mozilla, they have to support everything from Windows 7 to Debian Sid. But maybe five out of ten C or C++ codebases I’ve worked on were exactly the opposite of that: specialised, safety-critical code, targeting a single vendor’s toolkit (more often than not a particular version of it), running on a single designated platform.

                        I’m sure the language’s UB caused trouble even there – I bet some of the standard library code for these platforms was lifted off FreeBSD or whatever, who knows, and had some bugs in the woodwork. But, in general, for new code that I wrote, I could rely on it being compiled with just one toolkit, and not necessarily because we had some formal procedures for it, most of the time there was literally just one vendor supplying a toolkit for that platform.

                        So a single implementation blessing implementation-specific behaviour as correct is already where I’m at half the time. Just like whatever designated C++ compiler I’m using, I’m sure that, if the Rust compiler accepts a statement, it’s checked it for validity and it does something. Like the author of this article, what I would really like for Rust, too, is some authoritative document that tells me what that something is supposed to be.

                        I would like that for two reasons:

                        1. So that I have a formal, authoritative source against which I can check my understanding of a language construct, and
                        2. So that, when we upgrade the toolkit, I know what behaviour I can rely on to remain unchanged from the previous toolkit, and what behaviour has changed (not just because the standard says so, but also because the IEC 9899/18422/whatever test suite says it’s unchanged, with all that entails – i.e. the vendor also gives me a reliable changelog, bug tracking etc.).

                        Unit tests on crates.io and GitHub issues and whatnot are not a substitute for that. I need something that I can match the generated code against. Something that, if a question ever arises in a formal setting, I can trace our understanding of the code to. The issue isn’t that I doubt what the compiler is doing is correct vs. the language spec; the issue is that I have to doubt whether what I think it does, i.e. what code it emits, is correct vs. what it actually does.

                        I get that the former is a major concern if you’re at the compiler end, and that it makes Rust infinitely better and easier to target. And it obviously trickles down to me, too (hell that’s one reason why I use Rust!). But I’m not writing compilers, I’m writing device drivers and bit-twiddling firmware. The fact that there’s no doubt that what the compiler does is valid Rust is largely besides the point, what I’m worried about is whether me and the Rust compiler agree on what my code means.

              3. 12

                I think _ is a poor example for arguing for a spec since “The compiler does something surprising” relies so much on how the Rust Reference is structured that I remain unconvinced that a formal specification would connect those dots any better.

                (To be perfectly honest, I’m surprised at “I asked someone with over 8 years of Rust experience, who was equally surprised but gave me an “it depends” rationalization (paraphrased):” because, despite having learned Rust in the wild west days of “v1.0 just dropped. I can now rush to learn something that won’t change under me”, I picked that detail up within my first year or two.)

                In effect, “The compiler does something surprising.” boils down to “C or C++ don’t have the concept of assignment being an irrefutable pattern match and this makes assignment in Rust surprising”.

                A let statement introduces a new set of variables, given by a pattern. […] Any variables introduced by a variable declaration are visible from the point of declaration until the end of the enclosing block scope, except when they are shadowed by another variable declaration.

                https://doc.rust-lang.org/reference/statements.html#let-statements

                (This is actually a human-friendly simplification of the more detailed information on the Destructors page.)

                Patterns are used to match values against structures and to, optionally, bind variables to values inside these structures. […]

                https://doc.rust-lang.org/reference/patterns.html

                Identifier patterns bind the value they match to a variable in the value namespace. The identifier must be unique within the pattern.

                https://doc.rust-lang.org/reference/patterns.html#identifier-patterns

                The wildcard pattern (an underscore symbol) matches any value. It is used to ignore values when they don’t matter. Inside other patterns it matches a single data field (as opposed to the .. which matches the remaining fields). Unlike identifier patterns, it does not copy, move or borrow the value it matches.

                https://doc.rust-lang.org/reference/patterns.html#wildcard-pattern

                (I will admit they could have added a “REMINDER: Because the wildcard pattern is not the identifier pattern, it does not bind values it matches to a scope and thus does not extend their lifetime” to that entry… but, if anything, “standardese” would have cut against doing that.)

                When an initialized variable or temporary goes out of scope, its destructor is run, or it is dropped.

                https://doc.rust-lang.org/reference/destructors.html

                There are even some related examples under Temporary scopes, though I admit that it’d probably be best if you submitted a PR to add an example of let _ = ... in there.

                But often you want a stronger guarantee, which is something an official language specification can give us:

                Now you’re talking, not about clearer documentation, but about pinning down language behaviour.

                Every language specification I’ve seen has come into existence for one of two reasons: To unify competing implementations (C, C++, ECMAScript, etc.) or to make a monoculture language seem more “serious” to government/corporate middle-managers (Java, C#, etc.)

                Of course there also exist situations where it is not entirely clear what a C++ program will do. Often that is deliberate (compilers need to have wiggle room to allow for optimizations). Every so often, it is caused by an omission in the Standard, and then these can be reported to the authorities who maintain it: the C++ Committee. If they agree it is an omission, it will get labelled as a Defect Report, and they will then discuss and debate it, and ultimately come up with a clarification of the text.

                When you don’t control (or even know about) the target platform (or compiler) which will be used to run your code.

                Specs for “unify competing implementations” languages like C++ (funny you should use that example) are actually quite bad at pinning down behaviour no thanks to the compromises needed to get the parties involved in the spec-making process to consent to actually care about the spec.

                See, for example, this post:

                For a stupid example, I used to work at a company with an IBM mainframe. The C & C++ compiler was limited to lines of 72 characters – any character after 72 characters was implicitly treated as a comment, no diagnostic. I am not sure whether this is a violation of the C89 standard – I think that the only requirement is that logical source-lines of up to 4095 characters be accepted – but it’s certainly an unexpected limitation.

                I don’t have any direct experience of embedded vendor compilers; only testimonies that deviations from the standard – or outright missing parts – were the norm, rather than the exception.

                …so I suppose, by that metric, it’s a good thing that the rustc toolchain test suite is probably already much more a “standard” for Rust than the C and C++ standards are for C and C++… especially when things like Crater cut so strongly against fixing “mistakes” in the test suite.

                To allow for competing, alternative compilers such as gccrs, avoiding vendor lock-in, without the risk of fragmenting the language into dialects.

                Even for “poster children for having a spec” like C and C++, you wind up with the language fragmenting into dialects and a bunch of little de facto compiler monocultures.

                It’s basically standard practice for C project authors to pick one (or, if you’re lucky, two) compiler(s) for each platform they want to support and treat compatibility with any other compilers as “WONTFIX: That’s a compiler bug”. (It’s been a while, but I remember there being an announcement that one of the major browsers was switching its supported toolchains for Windows from “MSVC or gcc” to “MSVC or LLVM”.)

                Beyond that, C may be standardized… but Linux isn’t written in ANSI C, it’s written in GNU C and, aside from the parts where it helped clear out use of features like GNU-style VLAs that Linux devs wanted to get rid of anyway, the LLVMLinux project was basically all about teaching LLVM to understand the parts of the GNU C dialect that the Linux kernel uses. Nothing prevents a project from using whatever superset of the standard a specific compiler implements.

                My current DOS retro-hobby project depends on Open Watcom C/C++’s #pragma aux syntax for inline assembly language. Linux uses multiple GNU C extensions.

                We’ve already seen this happen within the async ecosystem, with a “no one ever got fired for choosing ~~IBM~~ tokio” attitude where network effects force everyone who wants to be taken seriously to support tokio for async, and the ~10% who care enough about async-std to also maintain support for that with support for other runtimes quickly becoming a rounding error.

                I don’t know about you, but I’m certainly not getting paid to futz around with a compiler I have no intent to dogfood. I write my creations to scratch my own itches and it’s bother enough that, once I’m ready to start publishing the non-“--bin” ones, I’m going to need to come up with an MSRV policy instead of just using whichever stable-channel language constructs minimize the mental effort I need to spend on maintaining my creations.

                A specification isn’t a magical fairy that stops humans from acting like humans and there’s no free lunch.

                In a sense, that is also the purpose of having a standard: creating a stable and reliable platform for building software where new features don’t appear every three fortnights.

                They’re already doing that, more reliably than equivalent prose would, using the test suite, so… are you saying you’d prefer we didn’t rush to implement bedrock features like async fn in traits that standardizing the async ecosystem is blocked on?

                C was standardized after it was more or less “done” in the eyes of the people steering it. Rust’s v1.0 followed a “Minimum Viable Product” model or we still wouldn’t have a stable v1.0.

                If we standardize now, we’ll probably wind up with that same ANSI C vs. vendor C dynamic, where everyone is using features that aren’t in the standard and “gravitas”, as you put it, is the main purpose of the standard, despite no major codebase that I’m aware of restricting itself to the language ANSI C specifies.

                If Rust were to work this way, Rust Editions would be the way new features get added to the language.

                Editions exist for breaking changes… especially in the context of Crater runs. You’re effectively proposing that we either switch the nomenclature from “Java 1.5” to “Java 5” for PR reasons or bog down delivery of features required to achieve standardization on things that got pushed out into “whatever wins the rat race in the crate ecosystem is the standard” de facto standardization.

                In my view, having specifications/standards is great for systems programming

                You’re going to have to clarify what you mean by systems programming… but I’ll agree to that on both definitions of the term so long as you agree that the kind of specification/standard that’s great for either one actually works better as something a machine can evaluate, like a conformance suite.

                Otherwise, you get GCC and LLVM disagreeing on the representation of __int128 when there exists a perfectly good PDF saying what it should be under the AMD64 SysV ABI.

                Again, having a standard isn’t a magic bullet.

                POSIX and IEEE-754 aren’t well-adhered-to because they’re standards, they’re well-adhered-to because of the point in the problem’s lifecycle at which they were standardized.

                I like to point to Eric S. Raymond’s The Art of UNIX Programming from 2003 (Readable free online) as an example of this. Most of the book is still extremely relevant because it covers parts of the UNIX design which were Solved Problems™ in 2003 …but it also recommends:

                • …XML as a standard for hierarchical structured data when the world has now moved on to more lightweight markups like JSON, YAML, and TOML.
                • …DocBook XML when even the lightweight markup designed to have a 1-to-1 mapping to it, AsciiDoc, is struggling to keep up with Sphinx (reStructuredText) and mdBook (Markdown) and a world of other Markdown-consuming tools.
                • …a now-forgotten protocol named BEEP as “a universal underlayer, like HTTP, but for truly peer-to-peer applications”. (Something we still don’t fully have. I’d have suggested XMPP five years ago.)

                …plus, in a lot of cases, POSIX isn’t the standard anymore. “Whatever Linux does” is, because POSIX couldn’t keep up with what people wanted their platforms to offer.

                I’m reminded of how… Bryan Cantrill, I believe… talked about how achieving compatibility with the subtle nuances of the intersection of Linux’s… vfork and SIGCHLD semantics, if I remember correctly, for the purposes of being able to run Linux binaries was a terrible experience.

                – ssokolow @ https://lobste.rs/s/jr48n1/threads_goroutines#c_hbwbvv

                You do remember correctly! Details are described in: lx vfork and signal handling still racey. While this was a bug in our emulation, it was surprising the degree that Go relied upon the murky semantics at the intersection of these two leaky abstractions…

                – bcantrill @ https://lobste.rs/s/jr48n1/threads_goroutines#c_4ulc7w

                In other words, without a clear and authoritative specification, Rust cannot be used to achieve EAL5.

                Honestly, if you want EAL5 when Rust is at this point in its lifecycle, I think Ferrocene is a much better candidate for using MSRV-esque pinned snapshots for the purposes of vetting things for that. A “language standard for Rust itself” is going to be expected to be so many things to so many people that I suspect that whatever we get will disappoint you at this point in Rust’s evolution.

                The vast majority of people I’ve seen calling for a Rust spec this early in its lifecycle are C or C++ programmers who feel threatened by Rust and are acting in bad faith.

                Or maybe Rust even needs an extended standard library.

                I’m going to assume you don’t just mean an official version of https://blessed.rs/

                I see this every few years.

                Rust’s design was informed by things like how Python 2.x’s standard library contains urllib and urllib2 and everyone tells you to use Requests, which depends on a urllib3 which is promised to never be part of the standard library. …or how, until they recently decided to API-break without a major version number, Python had constructs like asyncore when you’re supposed to use Twisted instead.

                There’s a reason that the general sentiment in the Python community is that the standard library is where modules go to die.

                I’m not a Java programmer, but I’m told there’s a similar dynamic at play between its standard library constructs and its ecosystem.

                It’s so bad in C++ that it’s faster to shell out to PHP to do regexes.

                If Rust had followed this principle from the beginning, we’d have situations such as everyone using Serde while rustc_serialize was baked into the standard library. We already have two deprecated methods forever stuck on the Error trait, a linked list implementation of questionable utility, and an MPSC queue that generally gets ignored in favour of crossbeam-channel or Flume since those are MPMC and Flume supports async too. (In fact, IIRC, std::sync::mpsc is now a vendored copy of crossbeam-channel wrapped in a legacy API that can’t do MPMC, thanks to Rust’s unstable ABI allowing std::sync::mpsc, std::sync::Mutex, and std::collections::HashMap to have their internals replaced with vendored copies of superior third-party crates.)

                Likewise, I imagine we’d probably all be using lazy_static, but not be as comfortable with it as the once_cell adaptation that just finished getting merged into the standard library.

                You can’t rush these sorts of things and stuff like rand and regex aren’t part of the standard library because evolving APIs is part of the same “long lifecycle with shifting requirements” that defines systems programming in its original meaning.

                (From what I’ve read, what actually defines what gets exposed by std rather than getting used internally is a mixture of “What interfaces do language built-ins like ? and async need?” and “What crates wind up in every project of non-trivial size?” (this is how once_cell got into std once the ecosystem had been given time to settle on a design))

                Writing this took longer than I expected and I have an appointment now, so I’ll do my usual extra proofreading passes in a couple of hours.

                1. 2

                  This would be better suited to a blog post than a comment.

                  1. 3

                    Good point. It’s easy to overlook that when the process that produces this sort of thing is just “Oops. I had more to say in a point-by-point response than I expected.”

                    I’ll have to see if I can make time to tweak the style and focus and make such a post. Maybe make the thesis something along the lines of “Why standardization should not be rushed and why what you believe standardization is can not be rushed”.

                2. 11

                  Carefully threading a needle here: I’m one of the authors of the Ferrocene spec and am happy to answer any questions relevant to this forum :).

                  I think the post sums up the situation very well.

                  1. 3

                    Which is correct “underscore pattern” (ferrocene), or “wildcard pattern” (rust ref)?

                    1. 7

                      Neither is normative, but I lean toward the Rust project’s lingo; the Book uses it as well. I’ll track that one.

                      1. [Comment removed by author]

                  2. 8

                    Incidentally, the reference doc contains the necessary info to evaluate how let _ = Foo and let _tmp = Foo differ (unambiguously at least as far as I understand this)

                    https://doc.rust-lang.org/reference/destructors.html

                    When an initialized variable or temporary goes out of scope, its destructor is run, or it is dropped.

                    https://doc.rust-lang.org/reference/expressions.html#temporaries

                    The drop scope of the temporary is usually the end of the enclosing statement.

                    https://doc.rust-lang.org/reference/destructors.html#scopes-of-local-variables

                    Local variables declared in a let statement are associated to the scope of the block that contains the let statement.

                    https://doc.rust-lang.org/reference/statements.html#let-statements

                    A let statement introduces a new set of variables,

                    (This seems like a technically incorrect statement, as _ doesn’t introduce a new set of variables, and neither do many of the other patterns)

                    https://doc.rust-lang.org/reference/patterns.html#identifier-patterns

                    Identifier patterns bind the value they match to a variable in the value namespace.

                    _ isn’t an identifier, so it doesn’t have that effect. It would be nice to include the spec for identifiers in the doc rather than pointing at the basically unintelligible unicode identifier standard to make this more clear.

                    https://doc.rust-lang.org/reference/patterns.html#wildcard-pattern

                    The wildcard pattern (an underscore symbol) matches any value. It is used to ignore values when they don’t matter. Inside other patterns it matches a single data field (as opposed to the .. which matches the remaining fields). Unlike identifier patterns, it does not copy, move or borrow the value it matches.

                    So given that info:

                    • let _tmp = Foo binds _tmp to Foo and extends the drop scope to the end of the block.
                    • let _ = Foo does not bind Foo, so the drop scope is the end of the let statement.

                    Incidentally, the Ferrocene spec calls the “wildcard pattern” the “underscore pattern”, which might be easily confused with the underscore expression which is used in assignment statements (not let statements).

                    None of this refutes the rest of the article.

                    1. 5

                      These are officially standardized, which means they have a language specification that is codified in an international standard

                      Does this result in or mean anything? Other than having people shelling out money to people uninvolved in the standard development.


                      Should Rust have a specification for implementers and users, at all times, readable by all? Or a specification only for implementers who have companies able to pay for them to read and work on the specification?

                      C is already a no-go for me, because I’m poor. If I want to raise an issue or contribute, that is not even something I’m allowed to think of. It feels absurd to even consider that style of management for Rust’s future. Especially when there are alternatives. (as noted in article)

                      1. 5

                        Does this result in or mean anything? Other than having people shelling out money to people uninvolved in the standard development.

                        Separating the idea of a standard from the specific implementation of ISO, having a single written document that has broad consensus among technical experts of different backgrounds provides confidence that the documented program’s expected behavior is well-understood even in unexpected conditions.

                        Whether a standard is “international” or not IMO doesn’t matter that much, though it can be nice to have a deeper well of implementations available to draw on.

                        Should Rust have a specification for implementers and users, at all times, readable by all? Or a specification only for implementers who have companies able to pay for them to read and work on the specification?

                        It’s difficult to imagine a world in which a Rust specification is paywalled in the same way as ISO C. Given the backgrounds and motivations of the people involved, a standard Rust would probably be via ECMA, which publishes its standards freely.

                        C is already a no-go for me, because I’m poor. If I want to raise an issue or contribute, that is not even something I’m allowed to think of. It feels absurd to even consider that style of management for Rust’s future.

                        The challenges with contributing to C seem to revolve more around extreme bureaucracy and resistance to change, rather than access to the standard – anyone motivated enough to fight for years to get a small feature included in the standard[0] will probably not be deterred by a few hundred dollars in licensing fees.

                        Unfortunately Rust is not immune to similar pressures caused by a feedback loop between lack of core developers -> insufficient time to review proposals -> ever-growing queue of RFCs and ACPs -> loss of core developers from burnout. As someone interested in contributing to Rust, there’s no path by which I could obtain a commit bit within a 5-year timeline and it takes >6 months to get feedback on even minor proposals, so I’ve given up on trying to contribute to rustc or the standard library (except minor checkbox-wrangling exercises such as feature stabilization).

                        [0] https://thephd.dev/finally-embed-in-c23

                      2. 5

                        Any specification effort should go into improving the existing specifications, the Rust Reference and the Rustonomicon. Maybe just rename the Reference, call it the Specification, and declare success?

                        If it’s hard to understand corner cases from the text in the spec, a different spec will fix this how? C standards lawyering has been a sport for 35 years and C is a smaller language than Rust.

                        What’s more, in some areas, such as what the precise aliasing rules in unsafe Rust are, what rules work best is still being figured out.

                        How will a Grand Specification Effort make it easier to clear up these murky areas? Isn’t it easier to clear them up in the context of the existing spec?

                        1. 5

                          Any specification effort should go into improving the existing specifications, the Rust Reference and the Rustonomicon.

                          This is what they are doing per the note here: https://github.com/rust-lang/spec

                        2. 4

                          Why a human-readable specification? It’s going to have ambiguities like C++ has. What about using all the other formal methods, that are checkable by machines? For example, you could make a kernel language like Metamath0 and then build Rust on top of that using a verified compiler.

                          1. 5

                            Beyond the fact that it’s comprehensible by mere humans?

                            Jest aside, I’ve looked at quite a few approaches and my first one would have been to stabilise some form of MIR and then specify Rust as a set of transforms to MIR. Uff, hard to convince a moving project of that for a fringe project.

                            An accurate machine checkable spec is tons of work to create, scale and actually agree that those are the right semantics.

                            But one doesn’t just walk into a room and make a machine-checkable spec for something as vast as Rust.

                          2. 3

                            As a user of Rust, I’m all for an official spec. But IMHO, this is not the core the of the problem… For me the Rust ecosystem is like the Haskell ecosystem, and I mean that in a negative way.

                            In Haskell, there is GHC and second-class citizen compilers. There is always that one person going “nah, there are other compilers” and “we have a spec”, but very few people use them. Haskell is basically a GHC monoculture.

                            I have the same issue with Rust. There is rustc and that’s it… The others are either niche toys, or commendable herculean efforts to play catch-up with rustc.

                            As long as we have a compiler monoculture, we’ll have a crate ecosystem that mostly only compiles with rustc.

                            1. 5

                              Network effects make it so that there’s almost always a natural monopoly like that anyway. Even in basic languages like C, you immediately start hitting issues when you’re using a different toolchain than the one best supported for a given platform/use-case. E.g. gcc on MacOS, etc., or any cross-compiling. With C++ it is really severe. Sure, it can usually be bandaided and got to work, but it’s clear that specification does not help as much as people believe.

                              1. 2

                                ?? In practice, the situation with C and C++ is very different than the situation with Rust

                                There are multiple independent compilers, and it does improve code when you compile with 2 or more compilers. And many projects do.

                                1. 10

                                  In practice, when I need to compile a Rust project (or especially cross-compile one) it always works (except if it has C or C++ deps), while with C and C++ over and over I’m hitting issues due to compiler/stdlib bugs, some -Werror somewhere, etc.

                                  I don’t want multiple compilers. It gives me literally nothing. I want one that actually works well.

                                  There’s probably nothing that slows down a software project more than working on specs and multiple implementations of the same thing.

                                  1. 2

                                    I mean that’s certainly a valid opinion, just like the opinion that there should be only one browser, like either Internet Explorer, Chrome, or Firefox

                                    i.e. Some people think that writing code for multiple browsers is a waste of time too

                                    I don’t though – I think it improves the code, because you depend on an interface, not an implementation. IMO, tying yourself to one platform saves time in the short term, but limits you in the long term.

                                    1. 6

                                      Do you have a concrete example of how the current rustc monoculture negatively impacts the implementation/ecosystem/users?

                                      1. 3

                                        As mentioned, I think it’s more of a long term thing rather than a short term

                                        Languages basically have to start with one implementation – C++ did, Python did, etc. It’s too hard to do anything else!

                                        But I think eventually it’s a good idea to move toward more spec-driven development. Like the other reply, I would frame it more as “lack of positive aspects of diversity” rather than “negative ones”.

                                        I see very few people arguing in this thread that Rust should have just one implementation. I have seen people argue that for browsers, even though it is a minority opinion. Those people are kinda focused on their own work, not on the ecosystem – i.e. “I don’t like testing for multiple browsers”


                                        But yeah most languages have one implementation, and people use them and get tremendous value out of them

                                        But I’m saying that, in practice, C and C++ are different than Rust, for this reason. Apple and Google funded Clang/LLVM for decades, because GCC didn’t do what they wanted, and that turned out to be a good thing. Clang/LLVM completely changed the C++ ecosystem, with a modular design and better tools.

                                        And there are definitely negative sides to C++ standardization, which I pointed out recently - https://lobste.rs/s/7ixd88/c_complexity_compiler_bugs#c_5kdy2f

                                        But I think the answer is to do better (testing, executable specs) – not to have no spec!

                                        1. 2

                                          An alternate reality can’t really be talked about with concrete examples.

                                          So I’m going to use Python as an example instead.

                                          Recently, some stuff left up to the implementation’s choice was brought back into the Python specification itself, and (while I haven’t checked any of the discussions because I’m lazy) this was likely made possible by the fact that there are other implementations who have made known what specifics they needed to be choosy about, and what they didn’t mind leaving to be codified.


                                          JIT in Python has had probably 10+ implementations before being accepted into CPython, all with different trade-offs: PyPy, Pyjion, Numba, Cinder, IronPython, Jython, and so on and so forth. All of those had lessons for later implementations, and some of them are completely diverse in their origins. Rust doesn’t have access to such extensive methods of incubation as far as I know.


                                          I don’t think it can be said there is a negative impact. I do think it can be said that there is possibly a lack of positive impact from potentially not being able to pull and codify from a diverse set of implementations for almost every idea possible.

                                        2. 5

                                          An open source toolchain is not a platform. There is no “vendor lock-in” into Rust open source compiler. Anyone can click a fork button on github and “break the lock-in”.

                                          And our resources are very scarce, given how ambitious and internally complex the language is.

                                          1. 2

                                            It’s still lock-in if you can’t maintain a divergent fork. If all you can practically do with your fork is repeatedly pull from upstream and apply some minor patches, you still end up in the same situation. (As can be seen in the unrelated WordPress drama, with Matt insisting a person should capture as much profit as they can from projects they have control over, while claiming he’s okay with WordPress being forked — the implication being that he doesn’t think forks could ever be practically viable.)


                                            The GPL et al. also acknowledge this: having the ability to look at and modify source code doesn’t mean much if you personally can’t do it, hence why sharing is considered a basic freedom — you can enlist the help of others more knowledgeable.

                                            The question isn’t whether the code can be forked; it’s whether the code can be forked and the community can be forked to the point where the divergent fork is viable. Not a small undertaking for a complex behemoth at all.

                                            1. 3

                                              If you can’t gather enough community to maintain a fork, how would you have enough community to maintain a whole separate implementation? Does the universe magically award extra developers to people who want to do extra useless effort?

                                              There’s a reason C and C++ have been inconsistent garbage since forever and always will be, despite decades of being worked on. And it’s in large part a culture of spec design by committee and flushing human effort down the drain maintaining so many implementations.

                                              If there’s ever a good reason to fork Rust, someone will do it and there will be enough community around it. And it will be a new language, not the wasted effort of maintaining a redundant implementation of the same language.

                                              1. 1

                                                There are languages which do have practical single-person or near-single-person dev teams for forks and implementations, the obvious ones being C, JavaScript, and Python.

                                                While there are objections to this, I do think instead that having a language that is either well defined enough (like JS) or understandable enough when being pulled apart (Python’s reference CPython), is a good thing, both in allowing people a simple escape hatch, and in more easily allowing future contributors to implementations and ecosystems (including to the reference implementations too).


                                                I’m not advocating for ISO style specifications, I’m against anything ISO or anything similar to ISO.

                                                I do feel that is somewhat unrelated to whether one should consider it beneficial that a language and its implementation are complex to the point of impeding forks and alternatives.

                                      2. 9

                                        In practice, a lot of code in the open source Unix world relies on GCCisms, and Clang has to support most GCC extensions. That a specification exists is nearly irrelevant.

                                        Edit: Hell, a lot of code doesn’t even work unless you’re using whatever libc they’ve tested with! I’ve had to patch a decent number of libraries and programs to make them compile with Musl libc. Sometimes they intentionally use a glibc extension, sometimes they accidentally use a glibc extension, sometimes they use something standard like intN_t without including the appropriate header, because whatever libc they use happened to indirectly include those symbols via some other header file.
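                                        A minimal sketch of that last failure mode (not taken from any particular project): code that uses a fixed-width type like uint32_t but forgets the include. On some libcs another header happens to drag the type in indirectly, so the bug stays hidden until the code is built against a stricter libc such as musl. The explicit include below is the fix.

                                        ```c
                                        #include <stdio.h>
                                        #include <stdint.h>  /* the line the broken code is missing; without it,
                                                                uint32_t may be undeclared depending on which libc's
                                                                <stdio.h> happens to pull it in indirectly */

                                        int main(void) {
                                            uint32_t n = 42;  /* fixed-width type declared in <stdint.h> */
                                            printf("%u\n", (unsigned)n);
                                            return 0;
                                        }
                                        ```

                                        The standard only guarantees uint32_t after including <stdint.h> (or <inttypes.h>), so relying on an indirect include is exactly the kind of accidental glibc-ism described above.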

                                        And yeah, they’d catch those problems if they tested with more toolchains, but that doesn’t really have anything to do with having a spec. If you test your code with both rustc and gccrs, your code will work with both of them. That testing with more toolchains makes your code more portable is tautological. If having a specification were what ensured portability between compilers, you wouldn’t have to test with multiple compilers!

                                        In practice, once gccrs is ready, they’ll probably set up something like crater for it, compiling and testing every single known bit of open source Rust code every day. Should help keep rustc and gccrs working the same for all code much more than a spec would.

                                        1. 2

                                          There is a lot of code like that, like Linux, probably because they care a lot about the exact assembly code generated.

                                          But there is also a lot of code like Lua and sqlite, which are completely portable C. And thousands of portable libraries for images, audio, video, networking, etc.

                                          IME C/C++ applications are the things that tend to be unportable, because they deal with the “edges”. But libraries can be VERY portable.

                                          Oils is completely portable C++ and POSIX, except for 2 or 3 extensions/assumptions we document - https://www.oilshell.org/release/0.24.0/doc/portability.html (in progress, but that’s roughly it)

                                          1. 3

                                            Would it compile on Windows MSVC assuming sufficient POSIX shims?

                                            1. 1

                                              A different / more concrete answer - we have an idea to make a “pure” version of Oils, without I/O, for config file evaluation

                                              e.g. https://github.com/oils-for-unix/oils/wiki/Survey-of-Config-Languages

                                              Basically we can factor out fork() and all that, and make “liboils”.

                                              That would definitely compile under MSVC. It’s plain C++.

                                              I’m not sure if that will happen, but the codebase is still small and flexible, so we’re not too far from it.

                                              (This is unlike CPython, where it takes a decade+ to make something like “subinterpreters”, because the codebase is just so big.)

                                              1. 1

                                                In theory yes … I’m not sure what environments are like that though. The shell uses fork(), so you can make it compile by providing fork(), but the kernel has to support it at runtime too.

                                                It will definitely compile and run with WSL – people have done that!


                                                I am interested in how bash works on Windows, and git for Windows, etc. As far as I know there have been several strategies over the years. There was Cygwin, and MSYS, and WSL, etc.

                                                (For anyone who has knowledge/interest, it might be a little early, but we have grant money. I think Windows support is something that naturally falls in that category, since it’s well defined and requires specialized knowledge.)