1. 8
  1. 5

    A real problem with string_view is that by design it isn’t memory safe: you can return one from a function, store one in a strict, or whatever and it will silently carry a pointer to its backing store, whether or not the backing store is kept alive.

    Personally I think it’s a foot gun of an API given how easy it is to use it in a way that results in UAF bugs.

    1. 4

      This feels slightly unfair to call out string_view, iterators have all the same problems, do you consider them a foot gun of an api too? It is a foot gun, but it’s not any more of a foot gun than char * and maybe only slightly more of a foot gun than string&.

      And it at least has the advantage of consistently being a non-owning reference to the backing store, something that could go either way with char *.

      1. 3

        In the context of C and C++, string_view is an improvement over alternatives. However, Rust has shown the value of statically guaranteeing no dangling pointers. Rust’s &str (the equivalent of C++‘s string_view) can’t point to a string which is no longer valid to reference. That’s pretty great!

        1. 2

          Sure. And if the original post had said something like “it’s too bad C++ doesn’t allow string_view to be memory safe by design.” I probably wouldn’t have made an objection. This is a “real problem” with vast swaths of C++ apis though, it is not notable or exceptional that string_view suffers from it.

          1. 2

            Sure, in the context of C++, it’s not notable that string_view has this problem, but in the broader context of this set of competing languages (C, C++, Rust), it’s worth noting when a new API is offering less than state of the art guarantees.

            1. 1

              But by focusing on the api the implication is the specific api is flawed as opposed to the language. No new C++ api will ever be “state of the art” without a language change. We’re not talking about adding C++’s string_view to Rust where it would make sense to criticize the api for not being memory safe, where the option to do better exists.

          2. 2

            I’m curious about how this works. If in Rust, I have a reference counted object A, that points to a reference-counted object B, that has a string field, and I use &str to take a reference to a substring of that string field. I then pass A and the substring to a function, which follows the reference from A to B and truncates the string, how does Rust enforce the bounds checks? If I completely invalidate the range that the substring covers, what happens when I try to use it?

            I believe the answer is that the Rust borrow checker would prevent objects reachable from A from being mutated during the function that has the substring reference and so I can’t express the above idiom at all. In C++, there is no such restriction and you get all of the power, flexibility, and the footguns that accompany this.

            1. 2

              I think (and could local Rust experts please correct me if I am wrong; @matklad, @kornel) Rust will not let you both have a mut reference from A to B and an &str to substring in B since that would mean you have two references and one of them is mut. You could have a non-mut reference from A to B and an &str substring to B but that means you wouldn’t be able to modify B in your function. But I could be complete wrong.

              1. 3

                Yeah, this is mostly correct, with the asterisk that a) what Rust really tracks is aliasing, rather than mutability, and it is possible to use interior mutability to mutate aliased data and b) you can use unsafe and raw pointers to gain all power, flexibility, and even more footguns.

                Couple of examples here:

                https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=88c9e646c242c510d4fc8a52871cc323

          3. 3

            I consider atring_view to be a new API introduced well into the period of knowing that dangling pointers are bad. I couple that with it behaving as a value object rather than iterators that have pointer semantics in my belief that string_view is bad.

            Iterators date to the earliest days of c++ and have a well understood pile of issues that make them not memory safe (the for(:) using iterators removes a lot of safety checks and introduce vulnerabilities from previously safe code with indices)

            1. 2

              Modern C++ APIs are not designed to eliminate memory safety bugs, they are designed to permit coding styles that eliminate pointer safety bugs. Combined with the lifetime annotation on accessors, a compiler can warn if you have locally created a string view that outlives the object from which it was created. A coding convention that says not to capture string views (pass them down the stack only) then gives you most of the lifetime safety that you need.

              The corner case is if the object that the string view references is reachable from another object and the called code mutates it in such a way that it invalidates the string view. For string views derived from something like std::string, this could be avoided by having the string view reference the string object and its bounds but that adds performance overhead and would reduce adoption and it cannot work with string views derived from C strings: One of the big benefits that you get from string views is the ability to provide safe implementations of APIs that are exposed to C and take C string arguments and can also be used from C++ with string objects.

              1. 1

                But new APIs should be designed such that they make it very easy - to the extent I’ve seen multiple blogposts and mailing list comments on the memory unsafely of string_view when used in a way that looks reasonable. If a new object is added that is trivially unsafe if moved, copied, or otherwise stored is introduced, then that object should not support copy or move.

                One of the more common errors in llvm is storing the semantically equivalent StringRef, which often works fine due to other objects keeping buffers live long enough. Except when they don’t, and then you get crashes or worse not crashes but whatever happens to be present at the time.

                It is not reasonable to add new APIs to C++ where code that looks correct is not memory safe.

                1. 2

                  But new APIs should be designed such that they make it very easy - to the extent I’ve seen multiple blogposts and mailing list comments on the memory unsafely of string_view when used in a way that looks reasonable. If a new object is added that is trivially unsafe if moved, copied, or otherwise stored is introduced, then that object should not support copy or move.

                  If you remove the copy constructor in string_view then you eliminate its only valid use: passing down value type down the stack to other functions for temporary delegation. The problem is that C++ doesn’t differentiate between copy and capture, so there is no way of implementing a thing that can be passed down but cannot be captured. You could force it to be passed by reference, but then you’re adding an extra indirection for a two-word object and that will hurt performance.

                  The main reason that string_view exists is to have an explicitly non-owning type that replaces passing a char* and a size_t in a way that makes it easy for one of them to be lost (for other code to assume that the char* is null terminated, for example).

                  As you say, llvm’s StringRef is analogous and any code that captures a StringRef is a bad code smell and gets caught in auditing. Prior to StringRef, code either defensively allocated std::strings all over the place and hurt performance or decided that performance was important and captured const char*s. StringRef has been a huge improvement over the prior state.

                  If you have a proposal for a performant, memory-safe, non-owning string reference type that could be implemented in C++ (even with some modest language additions, but without completely redesigning the language) then I’d be very interested to hear it because it’s something that I’d happily adopt in a bunch of codebases.

                  1. 1

                    No, if you remove copy you force pass by reference.

                    That has the benefit of meaning you can’t make a string_view outlast its lexical context, and you restrict it to what it ostensibly is: an abstraction API over the many variations on what a string is. In exchange for having to put an ampersand in you remove - or at least significantly reduce the footgun-ness of the API and potentially improve perf+code size as you can pass a pointer/reference in a single register :D

                    1. 1

                      No, if you remove copy you force pass by reference.

                      But you also allow capture by reference for lambdas, for example, so you’re still not memory safe.

                      That has the benefit of meaning you can’t make a string_view outlast its lexical context

                      True, but that just moves the problem. You can trivially make the reference outlive its lexical context, for example by lambda capture, by using std::forward_as_tuple to capture a parameter pack, and so on. Worse, the things that capture now look a lot more like things that are safe to capture and so it’s harder to spot in code review.

                      You also introduce a load of more subtle lifetime issues. By passing std::string_view by value, you can construct them in arguments from things like const char * or std::string references. This makes it very easy to gradually transition an API from one that takes const std::string& and const char * to something uniform. If std::string_view had to be passed by reference then creating a temporary std::string_view and passing it would be very dangerous because the lifetime of the temporary could very easily be shorter than the use, even in the common case where the underlying string had a longer lifetime. Auditing code for this would be much harder than spotting misuse of std::string_view.

                      If you want a safe string view, then it needs to be integrated with string memory management and prevent any mutation of a string that has outstanding string views. That would be impossible to integrate with APIs that exposed const char* parameters and so would not have the incremental migration benefits of the current design.

              2. 1

                So is your argument string_view just shouldn’t exist? “by design” implies to me you think there is an alternative design for string_view that would be memory safe.

                I couple that with it behaving as a value object rather than iterators that have pointer semantics in my belief that string_view is bad.

                string_view is tantamount to a pair of iterators, I’m not sure how that makes it a value object and iterators have pointer semantics.

                1. 1

                  My argument is that the existent of bad APIs in the past should not be used as a justification to add new unsafe APIs.

                  The fact that the implementation is/may be two pointers is irrelevant, it could also be a copy of the data, it could be a pointer and a length, it could be a unordered_map from size_t char, the implementation is entirely moot.

                  The string_view API behaves like a value type - it looks (in use) to be a memory safe value type like std::string. The iterator API is via the standard pointer operators *, ->, etc. All things that display very clearly that they are semantically only pointers to data, nothing else.

                  What matters is how a type behaves, not your thoughts about how it should be implemented.

                  Your argument is “C++ is filled with things that are unnecessarily unsafe because of backwards compatibility, therefore it is ok to add new unsafe things”, which is not compelling to me.

                  1. 2

                    Your argument is “C++ is filled with things that are unnecessarily unsafe because of backwards compatibility, therefore it is ok to add new unsafe things”, which is not compelling to me.

                    My argument is it is impossible to add a memory safe string_view to C++.

                    These things aren’t “unnecessarily” unsafe, they’re intrinsically unsafe.

                    The argument from a memory safety standpoint is “just don’t use C++”, not “don’t add things to C++”. Adding string_view does not make C++ less safe.

                    string_view can’t be a copy of the data because then it ceases to be a view, you would just use string then.

                    1. 1

                      You could make string_view not movable or copyable, and the problem is solved.

                      1. 1

                        It is? How do you implement string_view::substr? I suppose it could be substr(pos, n, LambdaTakingConstRefStrinView f) -> decltype(f(declval<const string_view&>()))

                        And

                        auto s = strdup(“ok”);
                        string_view v(s);
                        free(s);
                        f(v);
                        

                        is still obviously a UAF even with a non-copyable non-movable string_view.

                        Though I suppose to solve that problem we could again use the same continuation trick, and not make the char* ctor public:

                        string_view::with_view_of(“abc”,
                        [](const string_view& v) 
                        { f(v) });
                        
                        1. 1

                          Yes you can explicitly free the backing store but there is a world of difference between safety in the face of explicitly releasing backing buffer - what you’re suggesting is not that far from shared_ptr<std::yolo> ptr = …; ptr->~yolo();

                          The specific issue I have with string_view is the ease with which it makes UAFs possible, without apparently doing anything sus.

                          I think I said elsewhere, but LLVM has llvm::StringRef which has the same API semantics, and it is a frequent source of memory bugs - only sometimes caught during review. It has the same issues: it often appears to work due to incidental lifetime of the backing store, or the general “releasing memory doesn’t magically make the memory cease to exist” behavior that makes manual memory management so fun :)

                          1. 1

                            Sigh, why does lobsters occasionally hold off multiple days before it says “there are replies to your compelling commentary!” and make me seem obsessive.

            2. 2

              isn’t memory safe

              This criticism totally depends on what you use instead! If the alternative is a pointer or reference, they obviously have exactly the same problem – they can outlive the memory:

              use_after_free(std::to_string(42).c_str());
              

              I’ve heard this criticism before, but I don’t think it’s relevant with the right mindset: It isn’t a string replacement! As a slice data type, it’s just a fancy reference. It exists to replace references and pointers, not owned strings.

              As the article mentions, a good use is function arguments. The reason is that the caller only needs to make the memory outlive the call. This may seem like a problem if the function needs to store the string, but that’s not a (valid) use case, as then, it needs an owned string, and should therefore take an owned string.

              It is as a unifying replacement for other pointer-ish interfaces that it really shines:

              C++14:

              void foo(const std::string&);
              void foo(const char* c_str);
              void foo(const char* ptr, size_t len);
              
              void multifoo( // clang-format force line break
                  const char* a_ptr, size_t a_len, // clang-format force line break
                  const char* b_ptr, size_t b_len, // clang-format force line break
                  const char* c_ptr, size_t c_len);
              

              C++17:

              void foo(std::string_view);
              
              void multifoo(std::string_view a, std::string_view b, std::string_view c);
              
            3. 1

              Memory safety aside (pro tip: C++ is not a memory-safe language), I do love string_view. But the two annoying things about using it are:

              1. It’s not compatible with C (char*) strings, since of course it’s not null-terminated. This leads to impedance mismatch whenever you find yourself needing to pass one to a C API, or the all-too-many C++ libraries that still take char* parameters.
              2. It’s not even compatible with all of std! There are a number of functions in the standard library, even as of C++20, that only take std::string const& and not string_view.