1. 32

This explains the error handling philosophy that the sled database project has arrived at after years of introducing and fixing error handling-related bugs in a large correctness-critical Rust project.

It has become quite popular in the Rust community to use a single global enum that stores all possible error types that may be encountered in a codebase. This approach is quite error prone, and an alternative approach is presented which has significantly reduced the frequency of issues related to error handling for sled.

  1.  

  2. 2

    Nicely written.

    I’m not clear on why CAS would return an error when another thread wins the race to upfate rather than the traditional Boolean true (current thread won) or false (another thread won).

    Another thing I’m not clear on is whether you can enforce that the inner Result with the local error is not accidentally ignored? With a linting rule?

    1. 2

      sled’s CAS operation works on a higher level vector-like unbounded sequence of bytes, rather than a single 8-byte value. We often want to have access to 2 things when a failure happens:

      1. the owned proposed value that we failed to set
      2. the current actual value, which we failed to correctly guess

      A failed CAS often results in a loop where we try again, so by giving us back our failed proposed value and allowing us to properly know the current value, it makes it cheaper for us to loop around and try again with less effort.

      It might be expensive to reconstruct the attempted value, but by giving it back, we prevent workloads that need to retry from paying for another allocation, as well as the deallocation of the failed attempted value. By giving it back, we are able to significantly improve performance during retries, which reduces the chance that additional contention will be encountered.

      1. 1

        Thanks! It hadn’t occured to me that holding onto the attempted value would be interestingly expensive.

        1. 2

          I missed your second question, but the inner result also throws a warning if it is not used:

          https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=17036ea79b2ba7d73330f8f4bcfd2ba2

          This is due to the must_use attribute on the Result enum definition:

          #[derive(Copy, PartialEq, PartialOrd, Eq, Ord, Debug, Hash)]
          #[must_use = "this `Result` may be an `Err` variant, which should be handled"]
          #[rustc_diagnostic_item = "result_type"]
          #[stable(feature = "rust1", since = "1.0.0")]
          pub enum Result<T, E> {
              /// Contains the success value
              #[stable(feature = "rust1", since = "1.0.0")]
              Ok(#[stable(feature = "rust1", since = "1.0.0")] T),
          
              /// Contains the error value
              #[stable(feature = "rust1", since = "1.0.0")]
              Err(#[stable(feature = "rust1", since = "1.0.0")] E),
          }
          
          1. 1

            Nice! Thank you. <3

    2. 1

      This is an insightful post, thanks. I’ve recently toyed with Rust again, and how to do error handling seems to be something that’s not entirely clear – are we supposed to write From instances/implementations for external-crate errors that we have to use/surface? Is it sensible to add one of the heavyweight error handling crates like fehler?

      I’ve stuck to doing simple/explicit error handling and converting between types by hand when necessary, to enrich errors.

      1. 2

        The main issue with From impls on error types from other crates is that it creates a public dependency on that other crate, because now you have a type from the other crate in your public API. A public dependency means that if the dependency makes a semver release (0.0.x -> 0.0.(x+1), 0.x.y -> 0.(x+1).y or x.y.z -> (x+1).y.z) and you upgrade, then you must also make a semver release. If your dependency is already a public dependency, then maybe a From impl is fine. Otherwise, yes, use map_err or other helper functions to explicitly convert your errors.

        My advice:

        1. If you’re building an application, and like most applications you just need to quit and/or log an error if one occurs, then using Box<dyn Error> or anyhow is just fine.
        2. If you’re building a library that you want other people to use, then write your own error types. Don’t bring in a helper crate because it increases compile times, increases your dependency tree and just generally isn’t worth it.
        3. If you’re building “internal” libraries, then do (1) or (2). If writing out Error and Display impls is deeply offensive, then use thiserror.
        1. 1

          For reasons I go into in the post, From should never be implemented for concerns that need to be handled in different ways. A lot of these error handling crates encourage people to convert all of their errors into one giant enum, and this is absolutely a mistake if these errors require separate handling strategies. I have never found an error handling crate that actually improves my ability to handle errors, and that’s the whole point of error handling. The error-handling crates in the ecosystem tend to be amazing for not thinking about errors at all, but this is a mistake for real software, as the quoted paper that the article starts with demonstrates.

          The good thing about most of these libraries is that they attach a backtrace to the error in the From impl on the way to the misuse-prone big-ball-of-mud enum that people tend to create. You can pretty easily write a generic struct Bt<RealError>(RealError, BackTrace); with a simple From<T> for Bt<T> that gives you an ergonomic way of making those where you’re relying on try in some places.