1. 67
  1. 19

    Huh, interesting article :)

    Not to take away from the interesting content, but the headline isn’t a surprise, like many (most?) typed languages

    • Rust’s type system is turing complete

    • You must typecheck rust to compile it (otherwise you can’t do things like #[no_mangle]pub static X: usize = mem::size_of::<T>())

    • Therefore compiling rust isn’t even computable.

    • Therefore rust is trivially NP-hard, and exptime-hard, and expspace-hard, and …

    1. 3

      I’d argue that the type system, as actually implemented, is barely not Turing-complete, as you have to set a fixed recursion_limit in advance.

      1. 21

        That’s still “morally” Turing compete. You still can’t determine whether or not a program will halt natural or will only halt by hitting the recursion limit. From there you get Rice’s theorem and all that other fun nonsense.

        1. 3

          Rust is a Turing completionist language.

        2. 3

          Fair point.

          I’d have to check, but I think that with this stabilized it should be possible to make a procedural macro that expands to a string of usize::max() and then set #![recursion_limit = usize_max_macro!()].

          At that point I’d argue we’re as turing complete as a full rust program is. On any given system there is a recursion limit equal to 2^(machine pointer size) but that’s not hugely different from the fact that on any given system there is a memory limit of 2^(machine pointer size).

          Incidentally with const generics there also seem to be plans to give users the ability to disable the current instruction limit that prevents using loops to achieve turing completeness, but that’s not stable yet (not sure if it’s there in nightly).

          1. 3

            Any real Turing machine will have a finite tape length too though. I think this has been recently discussed here.

            1. 3

              Well, there are languages where the spec imposes no limit on memory usage. For example, you can write code to generate an infinitely long linked list in JavaScript, and, to my knowledge, there is nothing in the spec which requires that you would ever run out of memory. Compare that to C, where all object references are on the form of a fixed-size pointer. The spec doesn’t specify what the size of a pointer should be, but it does require that a pointer is some finite number of bits long, so you fundamentally can’t express an infinitely long linked list in C.

              Obviously an implementation of JavaScript wouldn’t actually be able to store an infinitely long linked list. But the difference is in whether the language even lets you express it or not. A machine with an infinite amount of memory could store an arbitrarily long linked list in javascript, but not in C.

            2. 2

              Well, by that same logic, C isn’t Turing complete because you have to set a fixed pointer size in advance which sets a hard limit on the amount of memory you can have allocated at once.

              And that’s true, but we usually consider C “fairly Turing complete” even though it technically isn’t.

              1. 3

                C can do I/O. This effectively mirrors the infinite tape: you can always read and write data to some external infinite storage medium. I don’t believe the Rust compiler can (Rust has a sane module system, rather than a small extension to cat). The C preprocessor can perform input (and can include files whose names are specified by macros from other files, thought there’s normally a fixed recursion limit) but it can’t write some output to a file and then read it back with a subsequent #include, so the total state is always bounded.

                In C++, parsing is not distinct from type checking. It is in Rust, I believe, because Rust has generics that must type check for any possible instantiation, whereas C++ has templates that may or may not type check for any given instantiation and so producing the parse tree requires some type checking. This means that even parsing C++ is undecidable.

                1. 2

                  Since the title is “compiling rust”, beyond the type checker there are mechanisms in Rust to execute arbitrary Rust code and read random files at compile time

                  1. 1

                    C can do I/O. This effectively mirrors the infinite tape: you can always read and write data to some external infinite storage medium.

                    Hmm – is the amount of storage addressable via C’s I/O capabilities truly infinite? I’m no theoretician, but it’s not immediately obvious to me that it is…within a file your random-access ability is limited by the size of loff_t or whatever, though you can of course extend that by spanning multiple files and adding more “address bits” in the names of your files. But even setting aside limitations like PATH_MAX and such, it seems like whatever storage you’re accessing via I/O operations (even if you expand beyond the filesystem and start using network URLs or something), wouldn’t your “addresses” still need to fit in memory and hence still be finite? (Even if it grows you from 2^64 to 2^(2^64), say.) I suppose maybe you could cook up some scheme for storing secondary addresses in primary storage, tertiary addresses in secondary storage, etc…but then it seems like you’d need to have magic I/O primitives that could innately handle the N-deep indirection. But even given an I/O operation that takes that indirection-count as a parameter, wouldn’t the representation of that number still need to be finite? [End ramble – I’m probably out of my depth here, but it’s interesting to think about…]

                    1. 1

                      It depends on the hardware interface, and (unlike david) I take objection to calling any access to storage ability part of C (C the language doesn’t guarantee access to any hardware), but for the right interface it’s truly infinite.

                      You just need a hardware interface where you never have to send an absolute offset, that hardware is the tape, and the rest of the machine is your finite state/head. E.g. if you’re given access to storage device where you just say “give the the previous byte” or “give me the next byte” (the classic turing machine move left/right ability). More realistically you might make a cloud API with that sort of interface for your tape.

                      1. 1

                        For random access, you are limited to a fixed displacement, but C FILEs are streams: they can model an infinite tape (within the real universe, you obviously can’t actually implement one, but C can read from an arbitrarily long one and advance the read pointer one step in either direction for every read).

                    2. 1

                      Technically you don’t set a fixed pointer size, the compiler chooses one.

                      As long as you never do anything that assumes a maximum pointer size your program should be able to use as much memory as is made available to it by the compiler (by picking pointer size) + hardware (by actually giving it memory).

                      I think that’s a step up from the user defined limit being discussed. With a constant user defined limit the program is limited in it’s number of steps regardless of what hardware you run it on. Which is why I suggested the “proc macro usize::max()” workaround in my own reply (which mimics the C behavior in that it then scales the complexity of the problems it can solve by the complexity of the hardware).

                2. 16

                  Compiling a lot of languages is actually (at least) NP-hard, simply because of the type systems involved. I’ve created a little overview a few months back, when it turned out that Swift’s type checking was undecidable.

                  1. 5

                    That’s a wonderful overview, thank you! It’s easy to read a lot of type theory papers and then say “okay but how does this apply to the real world”, and a reference like that really makes it easier to connect theory to more concrete ideas.

                    1. 1

                      Lovely overview. Thanks!

                    2. 4

                      I naively assume that in practice nobody goes for n>4. It would be mildly interesting to search all of crates.io to find the most complicated pattern match in use in the wild.

                      1. 6
                        1. 4

                          I do love the random // TODO: shouldn't be necessary comments on the ends of some but not all of those lines. I agree, yes, in a civilised world types this long should indeed not be necessary! ;)

                      2. 2

                        Based on my experiences with Typescript, I’m guessing compiling it is NP-extrahard.

                        1. 1

                          It’s undecidable (and its type system is actually unsound).

                          1. 1

                            Yep it was a bad joke, I saw your very useful link. Anecdotally, I see this occur most frequently when a function has one or more type parameters and is overloaded. I actually found a case yesterday when I couldn’t actually get it to select the right overload, even with explicit type parameters and argument types.

                            1. 1

                              and its type system is actually unsound

                              Doesn’t this make it very decidable, like O(1) decidable.

                              1. 1

                                In what way?

                                1. 1

                                  If it’s unsound, you should be able to prove anything, so just always return “this doesn’t typecheck” or always return “this is an int” and you’re not wrong.

                                  (I may be thinking of types as a proof system slightly more than they actually are)