1. 23
  1.  

  2. 15

    Vec::with_capacity is surely a wrong way to create a buffer with specified size. Capacity is only an optimization hint. Vector should have required number of elements (with any value, probably mem::uninitialized can be used here) at the point of passing to foreign function, not a capacity.

    Problem is exactly inside unsafe, not in safe code before it: buffer.into_raw() call on vector with 0 elements, expecting that it would have BUF_SIZE elements.

    Fix in the end of article is exactly that: resize causes vector to actually have BUF_SIZE - 1 elements, not just to have preallocated memory after 0 elements.

    “Avoid FFI” FFI is unavoidable in any language except C, and in runtime-heavy languages it’s even trickier and more dangerous.

    1. 12

      Small nit: I don’t think it’s accurate to say that capacity for Vec is merely a hint. There are a lot of interesting guarantees on Vec with respect to allocation and capacity: https://doc.rust-lang.org/std/vec/struct.Vec.html#guarantees

      1. 1

        I assure you I do understand the difference between size and capacity. I just dind’t expect CString::new to reallocate the buffer it was given.

        I agree that FFI ultimately can’t be avoided. What I was trying to suggest is: consider other options before reaching for FFI. The blog post looks at a very specific bug in a contrived piece of code, but in reality, I did take a step back and consider other ways to solve my problem.

      2. 10

        Yeah, any time I still use a debugger with Rust is when I integrate with C. It’s a tough spot where Rust safety doesn’t apply, and you need to be mindful of what Rust does and what C expects, at the same time.

        For the OP I suggest just not using CString at all here, since the reallocation it may perform and its own lifetime just add more things to be careful about. Adding of a NUL byte is not magic: you can do vec.push(0).

        vec![0u8; len] is well-optimized in Rust, and that’s a safer way to get a buffer of a guaranteed size, without risk of UB from uninitialized memory, etc.

        1. 1

          Adding of a NUL byte is not magic: you can do vec.push(0).

          I don’t program using Rust, but this solution seems like it could be fragile. It depends on what the string handling functions expect.

          Like, if you’re printing the resulting string to the terminal via Rust, and just using C as an intermediary, you’d have to wrap all of the C string handling functions with a push / pop function, or the Rust functions have to be intelligent enough not to try to write the zero byte to stdout.

          Probably the second reason I’ve seen to date for using Pascal strings is that NULL bytes can do weird things. That being said, I do deeply appreciate the ability of C to split one string into multiple strings in-place without having to ask for more memory and shuffle the entire string rightwards.

          1. 3

            The CString type does exactly two things: returns error if the Rust string has an internal NUL, and appends a NUL terminator. It’s for sending a Rust string to C. In this case OP was confused, and tried to use it to take a string from C and use it in Rust (the opposite direction). It just doesn’t work that way, which is why the code just did wrong thing and crashed.

            As for forgetting to “pop” a NUL terminator from a C string: it’s not practically possible. When you receive a string from C, you don’t know its length, so you’re forced to search for the terminator. Rust strings always have a length, so you just can’t use a C string by accident. The standard library has conversion functions for this, so you don’t even need to worry you’ll write one with an off-by-one error.

            I do deeply appreciate the ability of C to split one string into multiple strings in-place

            Oh, but it can’t. It needs to modify the string. It can’t split “AABB” into “AA” and “BB”. Rust can, because there’s no terminator, and the length is stored in the pointer.

        2. 3

          This program prints a uint64_t value using the %llu printf format. That’s not portable code, because on a 128 bit platform, you might expect ‘long long unsigned int’ to have 128 bits. I don’t know Rust, but in C, you would either use the PRIu64 format string for printing a uint64_t, or you would use ULONG_LONG_MAX (a non-standard GNU macro) or (unsigned long long)(-1) to portably construct the largest unsigned long long value and then print it using %llu.

          1. 1

            Thanks! I felt a bit uneasy assuming the sizes of types in the “production version” of the code from the post, and now I know why. I’ll try to finish the port to Rust before 128-bit machines become popular :)

          2. 3

            I feel like there has to be a better solution than this.

            CString::new(buffer).ok().and_then(|buffer| {

            Somewhere around there I think we’ve started causing problems.

            1. 1

              I’d probably do let buffer = CString::new(buffer)?;

              1. 5

                From my understanding, the whole bug is caused by using CString. The first argument to snprintf is not a string. It’s incorrect to create a string to pass it.

                I went back and found another mistake in the write up.

                we convert the CString into a pointer to its one-byte buffer using as_ptr;

                But only the format string uses as_ptr. There’s no such call for the the buffer that I can see. So I’m not sure whether I should be studying the code or the explanation.

                1. 12

                  as_ptr gets a *const libc::c_char from the CString, and doesn’t semantically transfer ownership. The OP is using CString::into_raw to get a *mut libc::c_char to pass as the buffer to snprintf, which does semantically transfer ownership to the raw pointer. Once the call is done, ownership is passed back to the CString via from_raw (and the length of the string is recomputed).

                  While I think foo.ok().and_then(...) is generally perfectly fine in Rust code (“convert a result to an option, and then transform the value inside the option into another option”), the code is otherwise overcomplicated. Here’s how I’d write it, without changing the function’s type signature:

                  fn obtain_answer() -> Option<String> {
                      const BUF_SIZE: usize = 1024;
                      
                      let format = "The answer ... is: %llu\0";
                      let mut buffer = [0u8; BUF_SIZE];
                      let n = unsafe {
                          libc::snprintf(
                              buffer.as_mut_ptr() as *mut libc::c_char,
                              BUF_SIZE,
                              format.as_ptr() as *const libc::c_char,
                              std::u64::MAX,
                          ) as usize
                      };
                      if n >= BUF_SIZE {
                          None
                      } else {
                          String::from_utf8(buffer[..n].to_vec()).ok()
                      }
                  }
                  

                  (The OP doesn’t check the return value of snprintf. Probably in this case for the given format string there is an upper bound on how big it is such that it will never exceed BUF_SIZE, so it might be okay.)

                  CString has been known to trip people up. The other common problem is CString::new(...).as_ptr(). Since as_ptr() isn’t semantically transferring ownership, the CString::new gets dropped, and you’re left with a dangling pointer. We added a warning to the docs specifically for this.

                  1. 5

                    CString has been known to trip people up. The other common problem is CString::new(…).as_ptr(). Since as_ptr() isn’t semantically transferring ownership, the CString::new gets dropped, and you’re left with a dangling pointer. We added a warning to the docs specifically for this.

                    Flashbacks to c++ string::c_str. :(

                    1. 1

                      OP here. Honoured to have my code rewritten by Burntsushi!

                      I didn’t realize I can convert slices to pointers this easily; I’ll go rewrite my code again to use that. That’s what I had in the original C++ version, anyway.

                      I don’t check snprintf‘s return value because this is just a blog post. The “production version” does the check, and retries with an appropriately sized buffer if 1024 bytes weren’t enough.

                      1. 3

                        Awesome, good stuff. :-) Good luck!

                        But yeah, ffi can indeed be tricky. It takes some getting used to. I’m afraid these are probably the easy cases. The tricky parts come when you try to bundle an ffi API into a higher level safe Rust API. Paying attention to Send and Sync is particularly important! It can be quite easy to get stuff wrong, and getting it right usually requires 1) a solid understanding of Rust’s safety guarantees, 2) a paranoid reading of the C library’s docs or 3) if the C library isn’t well documented, a careful read of its source code.

                    2. 2

                      The first argument to snprintf is not a string.

                      Yeah, it’s only now that I finally understand this. C let me pretend like the buffer is a string, but Rust requires a bit clearer thinking :)

                      I went back and found another mistake in the write up.

                      Will be fixed shortly—it should say into_raw, not as_ptr. Thanks for pointing this out!