1. 15

  2. 2

    Again, the first two lines in the Rust code seem excessive just to get faster output.

    The code in question is:

    let stdout = io::stdout();
    let mut sink = io::BufWriter::new(stdout.lock());

    I wonder why this is not let mut sink = io::BufWriter::new(io::stdout().lock()); or even just io::stdout().lock() at the use site?

    1. 8

      let mut sink = io::BufWriter::new(io::stdout().lock());

      This is a particularity of the lifetime system, as it tracks the lifetime of bindings in relationship to the scopes they are in.

      Let’s see what happens here:

      1) you get a handle to stdout

      2) you retrieve a lock handle from stdout

      3) you pass the lock to the buffered writer

      What Rust further deduces:

      • For the lock handle to work, the stdout handle needs to be present as well. For the program to be sound at all times, the lock handle must be destroyed before the stdout handle. Or, in lingo: “stdout has to outlive the lock”. (This is apparent from the definition of StdoutLock, which has a lifetime bound to its inner reference to Stdout: https://doc.rust-lang.org/src/std/up/src/libstd/io/stdio.rs.html#339-341)

      Now, remember that Rust mostly tracks the lifetimes of bindings. As you don’t bind stdout in your example, it’s unclear which scope it is active in. It will be scoped to the statement only, so the result - which depends on it - would live longer than the handle to stdout. Now, how are you going to figure all that out?

      Let’s have a look at the error message, especially the upcoming one in Rust nightly:


      error: borrowed value does not live long enough

      help: consider using a let binding to increase its lifetime

      This might seem cumbersome at first, but I enjoy it, as the compiler forces you to bind and name everything that has significant interplay with lifetimes. For example, given the rules for how Rust drops these values, it gives a nice symmetry:

      let stdout = io::stdout();
      let mut sink = io::BufWriter::new(stdout.lock());
      drop(sink); // compiler-inserted
      drop(stdout); // compiler-inserted

      Continuing there, why don’t we need to bind the lock? Simple: BufWriter consumes the lock immediately and takes ownership of it, so BufWriter is responsible for dropping the lock from then on. In this case, the lock has no significant interplay with the lifetimes in that scope; that role is taken over by the sink binding. It doesn’t need to outlive the statement.
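      Put together as a runnable sketch of that pattern (nothing here beyond the standard library):

      ```rust
      use std::io::{self, BufWriter, Write};

      fn main() -> io::Result<()> {
          // Binding stdout keeps the handle alive for the whole scope,
          // so the lock held inside the BufWriter stays valid.
          let stdout = io::stdout();
          // BufWriter takes ownership of the lock, so the lock needs no
          // binding of its own.
          let mut sink = BufWriter::new(stdout.lock());
          writeln!(sink, "hello from the buffered writer")?;
          Ok(())
          // Drop order at scope end: first `sink` (which flushes its buffer
          // and drops the lock it owns), then `stdout`.
      }
      ```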

      Finally: borrow checking in Rust is a constant work in progress, so with upcoming compiler changes, Rust might detect more of those cases as valid. I’m not sure about this specific one.

      For further reading on current work being done, especially on making lifetimes non-lexical: http://smallcultfollowing.com/babysteps/blog/2016/04/27/non-lexical-lifetimes-introduction/

      Interestingly, I think you accidentally found a very nice example for Rust lifetimes in action :).

      1. 1

        If we were to call let l = io::stdout().lock(), that lock() really ought to be able to consume and take ownership of the stdout(), no?

        1. 3

          If Stdout’s .lock() method consumed self, you’d never be able to do anything with it after the lock went out of scope.

          1. 1

            Exactly that.

            Though you could write a lock that you could consume to get stdout back out again. But that’s terribly inconvenient and definitely worse than binding the lock to stdout.

            1. 1

              Sure - but in that snippet we can’t do anything with it after the lock goes out of scope anyway, because we don’t even have the stdout. Shouldn’t there be a syntax for calling lock() as a move?

              1. 3

                Maybe the confusion here is what ownership means in Rust? It’s really just shorthand for “is the exclusive owner that is responsible for deallocating and from which any other reference must be borrowed”. Stdout’s .lock() is defined as impl Stdout{ fn lock(&self) -> StdoutLock }, meaning it takes as its sole argument a reference to some piece of memory owned elsewhere containing a Stdout object. Rust’s semantics provide the memory address of the object to the left of the method invocation as the self argument when this method is called.

                If it were instead defined as impl Stdout{ fn lock(self) -> StdoutLock } then the target of the invocation would instead be said to move into lock(), which would take ownership of it, and this would only be callable if there were no outstanding references to the object at call time.

                The point being: Authors of methods determine the ownership semantics of their code when they write it. Having a syntax that forced any given method to take ownership of a piece of memory when it expected only a reference would significantly complicate the ability of authors to reason about their own code, particularly w.r.t. its performance and visibility. Assuming you could even maintain Rust’s safety guarantees under such an imagined syntax, since it would wildly alter the meaning of a callee based on the whim of the caller, and I’m not sure that you could do that safely in every case.
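                As a toy illustration of the two signatures (the types and the consuming variant here are made up, not the real std API):

                ```rust
                struct Stdout;
                struct StdoutLock<'a>(&'a Stdout);
                struct OwnedLock(Stdout);

                impl Stdout {
                    // Borrowing lock, like the real std method:
                    // `Stdout` stays usable after the lock is gone.
                    fn lock(&self) -> StdoutLock<'_> {
                        StdoutLock(self)
                    }
                    // Hypothetical consuming variant: `Stdout` moves into
                    // the lock and can never be used again by the caller.
                    fn consuming_lock(self) -> OwnedLock {
                        OwnedLock(self)
                    }
                }

                fn main() {
                    let out = Stdout;
                    {
                        let _guard = out.lock(); // only borrows `out`
                    } // borrow ends here
                    {
                        let _guard = out.lock(); // fine: the first lock only borrowed
                    }
                    let _owned = out.consuming_lock(); // moves `out` for good
                    // out.lock(); // error: `out` was moved
                    println!("both lock styles demonstrated");
                }
                ```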

                So you’d have to provide a second method in the standard library to cover that case, and I don’t know that it would really be worth it just to satisfy some kind of dissatisfaction with needing to name an extra variable in the rare case you want the lifetime of an object to exactly equal the lifetime of the lock being held on it. Really, how often in practice does a mutable shared resource get locked precisely once and disposed of rather than unlocked?

                1. 1

                  Let’s put aside the self-ness for a moment and imagine we had a static lock function that borrows its argument. That function is perfectly happy to be called via:

                  let result = {
                    let s = stdin();
                    lock(&s)
                  };

                  right? The lifetime of s is this inline block, the borrow is valid. So surely it would be possible to provide some syntax sugar for this - as a strawman proposal, something like let result = lock(<- stdin()). That’s not a special-case thing for locking; it would be usable when invoking any function that takes an argument by borrowed reference.

                  (Making it work for self as well requires a little more syntax maybe, but nothing fundamental, no?)

                  1. 1

                    At the very least, it would imply that any function called with “stabby syntax” would have to appear twice in the binary, once in its original borrowed reference form, and once in owned form. It’s that or you bloat the binary at every “stabby syntax” callsite.

                    Either way, the compiler has to insert the call to free stdin at the end of lock(), and it didn’t know it had to do that when it compiled lock() based on lock’s definition, so this is somewhat like the C++-template situation, where you’re overcompiling the same thing several times and pushing responsibility for unification down to the linker.

                    I don’t know, there’s no universe in which a syntax that lets a function’s semantics vary wildly based on what happened at the callsite doesn’t strike me as icky.

                2. 2

                  Where would you move it to? (Rust-semantics wise)

                  StdoutLock is very clear that it borrows stdout, takes no ownership, and depends on the presence of a Stdout.

                  You could provide a second method that does just that (e.g. consuming_lock()), but given the rarity of cases where you want to acquire the stdout lock and never release it, I don’t see why?

                  By the way, I’m pretty sure that the buffering and locking of stdout isn’t necessary anyway, as stdout is already buffered and implicitly locked and there’s no second party writing to stdout. lock() is just for controlled locking. I found no noticeable performance difference by locking stdout.
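                  If one wanted to check that claim, a rough sketch might look like the following (the line count and setup are made up; swap the BufWriter for a bare io::stdout() handle to compare the two runs):

                  ```rust
                  use std::io::{self, BufWriter, Write};
                  use std::time::Instant;

                  fn main() -> io::Result<()> {
                      let start = Instant::now();
                      let stdout = io::stdout();
                      // Variant under test: explicit lock + explicit buffer.
                      let mut sink = BufWriter::new(stdout.lock());
                      for i in 0..1000 {
                          writeln!(sink, "{}", i)?;
                      }
                      sink.flush()?;
                      // Timing goes to stderr so it doesn't mix with the output.
                      eprintln!("buffered run took {:?}", start.elapsed());
                      Ok(())
                  }
                  ```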


                  Buffering stdout is by the way the reason for this behaviour: https://lobste.rs/s/oxuwzf/rust_vs_c_fine_grained_performance/comments/ygxpuc#c_ygxpuc

        2. 2

          Has anyone noticed that ./cc.bench and ./rs.bench have very different output behaviour?

          While cc.bench has very smooth output, the Rust output seems to come in blocks. It’s more noticeable when running under valgrind memcheck (yes, I had to).

          1. 1

            A few minor notes:

            • “return is(c) { Some(f(c)) } else { None }” (right above footnote 4) should be “return if is(c) { Some(f(c)) } else { None }”.
            • With respect to compilation times and redundant parentheses, it’s always been my experience that parsing and typechecking are very fast (adequately fast, at any rate); it is, as frequently claimed, just the LLVM optimization step that takes a long time, and failures at that point are rare and indicate a compiler bug. If the compiler spits out warnings you want to fix, it will do so almost immediately, and you can cancel compilation before it gets into the time-consuming bits.