1. 11
  1. 3

    Traditional C++ exceptions have two main problems:

    1. the exceptions are allocated in dynamic memory

    I think this is an implementation choice. I’m pretty sure the MS ABI allocates exceptions on the stack. This comes with its own problems (your handler runs with an extended stack, and can’t reduce it until the exception goes out of scope), but carefully-written code which is aware of the mechanism shouldn’t run into problems.

    In fact, in my opinion, that’s how exceptions should be allocated. It’s a shame that Linux uses heap allocation instead.

    1. exception unwinding is effectively single-threaded, because the table driven unwinder logic used by modern C++ compilers

    Again, not a facet of the language but of the implementations. This proposal seems to be about suggesting language changes for implementation problems…

    I do agree though that std::current_exception() is problematic (and I think it probably should never have existed). But even this is really only a problem for optimisation - of a mechanism that is already supposed to be used only for exceptional situations.

    1. 7

      I think this is an implementation choice. I’m pretty sure the MS ABI allocates exceptions on the stack.

      Yes and no. It depends on what you throw, if I remember correctly. If you’re throwing a value then it will be allocated on the stack and then copied to the called stack frame. All of the unwinder state lives on the stack, including the state required for std::current_exception. If you throw by pointer then the object needs to be allocated on the heap because the address needs to be stable. One of the problems with the current C++ unwind mechanism is that it needs to handle both cases.

      I think the article is mostly talking about the unwind state (specifically, the __cxa_exception structure, in the Itanium ABI). C++11 introduced some very painful things here to allow you to partially catch an exception and then re-raise it in another thread. This requires heap allocation but, in theory at least, could be deferred until someone actually used this functionality (I presume there are users in the wild - I’ve only ever used it in a test when I implemented it).
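For anyone who hasn't used the C++11 feature being discussed, here is a minimal sketch of catching on one thread and re-raising on another. The function name `rethrow_on_caller_thread` is made up for illustration; the standard calls are `std::current_exception` and `std::rethrow_exception`. Whether the runtime copies the exception object or just bumps a reference count is up to the implementation.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: runs a failing task on a worker thread, then re-raises
// the captured exception on the calling thread and returns its message.
std::string rethrow_on_caller_thread() {
    std::exception_ptr captured;  // null until the worker stores something

    std::thread worker([&captured] {
        try {
            throw std::runtime_error("worker failed");
        } catch (...) {
            // Keeps the exception alive after this catch block ends.
            captured = std::current_exception();
        }
    });
    worker.join();  // join() synchronises, so reading `captured` is safe

    assert(captured != nullptr);  // the worker always throws in this sketch
    try {
        std::rethrow_exception(captured);
    } catch (const std::runtime_error& e) {
        return e.what();
    }
}
```

The `exception_ptr` outliving the original catch block is the part that forces the unwind state to live somewhere other than the throwing thread's stack.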

      In fact, in my opinion, that’s how exceptions should be allocated. It’s a shame that Linux uses heap allocation instead.

      Linux adopted the Itanium ABI, which exists mostly because Itanium made it very difficult to implement a conventional setjmp / longjmp. A lot of the design decisions there were influenced by the fact that Borland had a load of patents on the Windows SEH model. I think the last of these expired ten years ago, so it should be completely safe to implement SEH-like unwinding on other platforms now.

      The Windows model is a bit more painful for the compiler. In the Itanium model, the catch blocks are just regions in the function and they run after the unwinder has transferred control back to the function containing the code, which then runs on the top of the stack. In the Windows model, these must be outlined as funclets, functions that run with a pointer to the original stack frame, on top of the stack but with access to a frame somewhere else.

      LLVM already has logic for generating funclets (I think GCC does too?) for the Windows ABI, so it might be quite easy to add a funclet-based ABI for *NIX. Statically linked things could adopt it with a compile flag, dynamically linked things would need a feature flag unless you added a fallback mechanism that used the Itanium unwinder when it found a frame with Itanium ABI things.

      Even then, that wouldn’t help with the second problem. Both Windows and Itanium unwind ABIs now use tables and so need to have some map from return address to the table for the function that you’re trying to unwind through. This needs to be safe with respect to loading and unloading libraries. The dl_iterate_phdr API on *NIX doesn’t need to acquire a single lock but it does need to guarantee that it iterates over all loaded ELF objects even if others are concurrently loaded.
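To make the "map from an address to the object that contains it" step concrete, here is a rough sketch using `dl_iterate_phdr`. It assumes Linux with glibc (`<link.h>` and the `ElfW` macro are not ISO C++), and the names `Lookup`, `contains_address` and `address_is_in_loaded_object` are mine. A real unwinder would go on to locate the unwind tables for the matching object; this only does the lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <link.h>  // dl_iterate_phdr, dl_phdr_info, ElfW (glibc assumption)

struct Lookup {
    std::uintptr_t address;  // address to attribute to a loaded object
    bool found;
};

// Called once per loaded ELF object; checks its PT_LOAD segments.
static int contains_address(dl_phdr_info* info, std::size_t, void* data) {
    auto* query = static_cast<Lookup*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD) continue;
        std::uintptr_t begin = info->dlpi_addr + ph.p_vaddr;
        if (query->address >= begin && query->address < begin + ph.p_memsz) {
            query->found = true;
            return 1;  // a non-zero return stops the iteration
        }
    }
    return 0;
}

bool address_is_in_loaded_object(const void* p) {
    Lookup query{reinterpret_cast<std::uintptr_t>(p), false};
    dl_iterate_phdr(contains_address, &query);
    return query.found;
}

// Usage: a static lives inside the executable's image, a local does not.
void lookup_example() {
    static int in_image = 0;
    int on_stack = 0;
    assert(address_is_in_loaded_object(&in_image));
    assert(!address_is_in_loaded_object(&on_stack));
}
```

Every throw that walks the stack performs a lookup of this kind per frame, which is why it has to stay consistent while other threads load and unload libraries.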

      Again, not a facet of the language but of the implementations. This proposal seems to be about suggesting language changes for implementation problems…

      The language defines the space of possible implementations. One of the things I’ve been playing with recently is using the FreeBSD system call calling convention for exceptions. This uses the carry flag to differentiate between error and non-error returns. You can follow each call with a branch-on-carry, which will be statically predicted as not taken by most systems and have this fall to the error handling path. This works only if exceptions fit in the return register[s] on all platforms. Returning a single word is fine here, so if you require exceptions to be globally allocated objects or error codes, then it’s fine. You can’t do this for C++ in the general case but you could in a language that was more willing to restrict what it permits to be thrown. This is what we’re planning on doing for Verona. In LLVM IR we can model every call returning an extra i1 that tells you if the return value is the real return or an exception, use normal control flow and inlining, and then lower it to one extra instruction in the back end.
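As a source-level approximation of that model (not the actual Verona or LLVM lowering), you can picture every call returning one word plus a flag. In the sketch below `Outcome`, `kDivideByZero`, `checked_divide` and `average_rate` are hypothetical names; the `failed` member stands in for the carry flag or the extra `i1`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for "return register plus carry flag": `word` holds
// either the result or an error code, and `failed` says which one it is.
struct Outcome {
    std::intptr_t word;
    bool failed;
};

constexpr std::intptr_t kDivideByZero = 1;  // hypothetical error code

Outcome checked_divide(std::intptr_t a, std::intptr_t b) {
    if (b == 0) return {kDivideByZero, true};  // the "throw"
    return {a / b, false};
}

// A caller with no handler forwards the error with a single branch,
// which plays the role of the branch-on-carry after the call.
Outcome average_rate(std::intptr_t total, std::intptr_t count) {
    Outcome q = checked_divide(total, count);
    if (q.failed) return q;        // propagate to our own caller
    return {q.word * 100, false};  // normal path continues
}

// Usage: both outcomes are ordinary returns, so no unwinder is involved.
void outcome_example() {
    assert(!average_rate(10, 2).failed);
    assert(average_rate(1, 0).failed);
}
```

The restriction that makes this work is visible in the struct: the error has to fit in a single word, so it must be an error code or a pointer to a globally allocated object.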

      1. 2

        If you throw by pointer then the object needs to be allocated on the heap because the address needs to be stable

        I’m not sure if I understand what you mean by “throw by pointer”. If you are referring to using std::rethrow_exception to throw via std::exception_ptr, then the implementation is allowed to throw a copy of the object instead; there’s no need to preserve the address. Perhaps I’m not understanding you correctly.

        C++11 introduced some very painful things here to allow you to partially catch an exception and then re-raise it in another thread. This requires heap allocation but

        I assume again you mean std::current_exception, std::rethrow_exception. current_exception potentially requires a heap allocation if the original exception object is stack-allocated (because if it makes a copy of the exception, it needs to store the copy somewhere), but that by itself shouldn’t make it impossible to allocate thrown exception objects on the stack. Since rethrow_exception can also make a copy, it can even move [edit: not literally move, but copy] the heap-allocated copy “back to the stack”, unless I’m missing something.

        Linux adopted the Itanium ABI

        That is, indeed, what I am complaining about :)

        I’m sure it seemed like a pragmatic choice, but in my view, it’s an unfortunate one. The fact that throwing an exception requires heap allocation (which can of course fail) is awful. Current implementations generally have a fixed-size “emergency pool” per thread which is used in case regular heap allocation fails; but of course the emergency pool can be insufficient, or can become exhausted in the presence of nested exceptions.

        The language defines the space of possible implementations

        But the complaints in this proposal don’t seem to be about the space of possible implementations, but the particulars of certain implementations. Again, unless I’m missing something, it should be possible to allocate exception objects on the stack, and the proposal itself discusses solutions to other problems (inefficient parallelism etc) which don’t require language changes.

        1. 1

          I’m not sure if I understand what you mean by “throw by pointer”.

          throw new Foo();
          

          The heap allocation isn’t part of the throw mechanism, but it still exists. In C++, catch (T) and catch (T*) are completely unrelated things. If you throw a T then it may be copied, if you throw a T* then the pointee may not be copied.
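A small sketch of the distinction, with a made-up `Foo` and a hypothetical `which_handler` function: the pointer throw only matches the pointer handler, and whoever catches it is responsible for freeing the pointee.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Foo { int id; };

// Hypothetical demo: reports which handler a thrown value or pointer selects.
std::string which_handler(bool by_pointer) {
    try {
        if (by_pointer) throw new Foo{1};  // throws a Foo*; the Foo is on the heap
        throw Foo{2};                      // throws a Foo; the runtime owns the object
    } catch (Foo* p) {
        std::unique_ptr<Foo> owner(p);     // the catcher has to free the pointee
        return "Foo*";
    } catch (const Foo&) {
        return "Foo";
    }
}

// Usage: the two throws never reach each other's handler.
void handler_example() {
    assert(which_handler(true) == "Foo*");
    assert(which_handler(false) == "Foo");
}
```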

          I assume again you mean std::current_exception, std::rethrow_exception. current_exception potentially requires a heap allocation if the original exception object is stack-allocated (because if it makes a copy of the exception, it needs to store the copy somewhere), but that by itself shouldn’t make it impossible to allocate thrown exception objects on the stack. Since rethrow_exception can also make a copy, it can even move [edit: not literally move, but copy] the heap-allocated copy “back to the stack”, unless I’m missing something.

          Kind of. Thrown objects in C++ don’t need to be copyable, but they do need to be movable. This leaks into the language though in two ways. First, move constructors can have side effects and so the number of moves is observable. Second, with the Itanium ABI, there is a single copy (which is typically elided) and code is written on that assumption. The object is copied (or directly allocated) into the space returned when the ABI library is asked to allocate space for the exception (the thrown object is stored directly after the exception object). The begin-catch function returns a pointer to this, so no copy is needed for the catch. The Windows ABI works in a similar way: the thrown object is allocated on the stack and remains there as the funclets that implement the catch access it directly.

          I’m sure it seemed like a pragmatic choice, but in my view, it’s an unfortunate one. The fact that throwing an exception requires heap allocation (which can of course fail) is awful. Current implementations generally have a fixed-size “emergency pool” per thread which is used in case regular heap allocation fails; but of course the emergency pool can be insufficient, or can become exhausted in the presence of nested exceptions.

          The emergency pool is mandated by the spec. It’s a complete waste of time because if malloc fails then you may discover that the emergency buffers are overcommitted and fail. I recently added an option to libcxxrt to disable them.

          But the complaints in this proposal don’t seem to be about the space of possible implementations, but the particulars of certain implementations. Again, unless I’m missing something, it should be possible to allocate exception objects on the stack, and the proposal itself discusses solutions to other problems (inefficient parallelism etc) which don’t require language changes.

          I think part of the problem is that the C++ standard is in denial about dynamic code [un]loading. This was apparent in thread-local variables, where there’s no good way of implementing destructors for thread-local variables. Exceptions require walking the stack and finding the associated cleanup. There are two ways of doing this:

          • Maintain a stack of cleanup functions. This is what the Win32 ABI did. It incurs a (small) performance penalty on entry and exit of every try block and so means that it isn’t a ‘zero-cost’ abstraction.
          • Walk a linker data structure to find the table (or cleanup function) corresponding to the function on the stack. This requires global synchronisation with respect to library loading and unloading.

          You can somewhat mitigate the latter case with different locking policies or with lock-free data structures for loaded objects but it’s not clear whether this just reduces the problem.

          Note that extrapolating from increasing core counts is probably not a good idea. Existing cache coherency protocols struggle a lot above about 128 cores, so I’d expect cache-coherent systems with >128 cores to be rare for quite a while.

          Avoiding exceptions, or limiting what exceptions can do, would allow exception throwing to become a local problem: no stack walking and unwind, just a lightweight conditional branch on a flag.

          1. 1

            The heap allocation isn’t part of the throw mechanism, but it still exists.

            Well, ok, yes, if you explicitly perform a heap allocation then there will be a heap allocation, that’s self-evident. The thrown object in this case is really the pointer, however, and doesn’t need to be on the heap. I guess I should have said that the MS ABI allocates thrown objects on the stack rather than exceptions.

            Kind of. […]

            What you said at this point doesn’t make clear to me what part of what I wrote was only “kind of” correct, other than the move-vs-copy thing which is incidental. (Speaking of incidentals, GCC does appear to allow throwing a non-moveable-but-copyable object, although Clang doesn’t).

            All the detail about the Itanium C++ ABI: I know all this, but it has no bearing. My point is that the ABI is a bad ABI because it mandates heap allocation for thrown objects. The alternative that I’m suggesting would’ve been a better choice doesn’t require more or fewer copies (or moves) to be performed, except in the case where std::current_exception gets used. If code exists which expects that std::current_exception and/or std::rethrow_exception don’t make copies, then I’d personally be happy to just call it bad code and be done with it. And anyway, if we’re concerned about preserving existing code that cares about whether copies are made by current_exception / rethrow_exception, then we probably can’t make the sort of changes to what exceptions can do that are being suggested in the article.

            The emergency pool is mandated by the spec. It’s a complete waste of time because if malloc fails then you may discover that the emergency buffers are overcommitted and fail.

            This is more-or-less what I had already said (though I don’t agree with your following assertion that the emergency pool is completely useless, because I’d much rather it be possible to throw exceptions in an out-of-memory situation than not. I guess bad_alloc could theoretically be handled specially so as not to require allocation but I’d still want other exception types to be able to be thrown).

            Exceptions require walking the stack and finding the associated cleanup

            […] Walk a linker data structure to find the table (or cleanup function) corresponding to the function on the stack. This requires global synchronisation with respect to library loading and unloading.

            You can somewhat mitigate the latter case with different locking policies or with lock-free data structures for loaded objects but it’s not clear whether this just reduces the problem.

            […]

            Avoiding exceptions, or limiting what exceptions can do, would allow exception throwing to become a local problem: no stack walking and unwind, just a lightweight conditional branch on a flag.

            I’m going to just sum up my thoughts:

            • “not clear whether this just reduces the problem” implies that this could be investigated, and I’d suggest it would be worth doing so before making language changes to accommodate an imagined problem
            • “the problem” already suggests there is a problem, whereas I’m inclined to feel that lock contention in the presence of exceptions being thrown just implies that exceptions are being used too heavily in the application code
            • limiting what exceptions can do in order to make it possible to optimise exception throwing/handling in the way you suggest is all well and good if you are willing to throw away a lot of existing code, though it will introduce a small run-time overhead, and probably a code-size overhead (even counting unwinding tables), for a case that current code is written to avoid (because exceptions are known to have overhead, and are generally understood to be intended for use only in exceptional circumstances). Overall, I doubt it’s worth it. I’d certainly rather see the implementations fixed first.

            What I suspect is driving the desire to optimise exception throw/catch is some people wanting to use exceptions for general control flow. I’d rather just not see them used for that.

    2. 2

      For me, an exception is used when something has happened that requires you to drop everything, salvage as much work as possible, and get out of there. It should also be happening rarely. If it is happening often, you want users and developers to take notice. In both cases an exception that is noisy and slow is a feature for me.

      If what is happening is a bad value or some other event that I should skip over and keep going, I would rather use a sentinel or flag or some other mechanism as part of the regular flow of events. In all the examples given, for example, I would be using regular control flow to handle them.

      1. 4

        That’s fine in theory, but there are two related problems in C++:

        • The standard library has no mechanism other than exceptions for handling some errors.
        • The standards committee is religiously opposed to subsetting and refuses to standardise the -fno-exceptions mode and define behaviour.

        This means that you need exceptions to have a compliant implementation of the standard library and as soon as you step out of this then you lose consistent behaviour between standard library implementations.

        1. 5

          I don’t think @kghose’s comment is advocating for turning off exceptions, just for using some other error reporting mechanism in specific situations that often fail. And I agree with that.

          If your sqrt function is somehow being passed invalid parameters up to 10% of the time, and that can’t be fixed, then it shouldn’t respond to them by throwing! Instead return an NaN or optional or something.
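Something along these lines, as a sketch: `checked_sqrt` is a hypothetical wrapper, and treating NaN input as invalid is my assumption rather than anything from the article.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical wrapper: invalid input is an expected outcome here, so it is
// reported through the return type instead of an exception.
std::optional<double> checked_sqrt(double x) {
    if (std::isnan(x) || x < 0.0) return std::nullopt;
    return std::sqrt(x);
}

// Usage: the caller handles the failure with an ordinary branch.
void sqrt_example() {
    assert(checked_sqrt(9.0).has_value());
    assert(!checked_sqrt(-1.0).has_value());
    assert(checked_sqrt(-1.0).value_or(0.0) == 0.0);
}
```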

          To respond directly to the article: if you’re throwing exceptions on all threads so much that the stack-unwind mutex becomes a bottleneck, there’s something really wrong with your code.

          1. 1

            The problem is the standard library. Exceptions are the only error handling mechanism for most of the standard library and so you either make them fast and scalable, or you give up on the standard library.

            1. 2

              Aside from out-of-memory errors, there are usually workarounds. iostream lets you turn off exceptions, optional and variant let you preflight, etc.
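A sketch of two of those workarounds, with hypothetical helper names `parse_int` and `int_or`: streams report failure through state bits unless you opt in with `exceptions()`, and `std::get_if` lets you check a variant before touching it.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <variant>

// Streams set failbit instead of throwing unless exceptions() was enabled.
std::optional<int> parse_int(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    if (!(in >> value)) return std::nullopt;  // extraction failed, nothing thrown
    return value;
}

// get_if returns nullptr on a type mismatch, where std::get would throw
// std::bad_variant_access.
int int_or(const std::variant<int, std::string>& v, int fallback) {
    if (const int* p = std::get_if<int>(&v)) return *p;
    return fallback;
}

// Usage: every failure below is handled without a throw.
void preflight_example() {
    assert(parse_int("42").has_value());
    assert(!parse_int("abc").has_value());
    assert(int_or(std::variant<int, std::string>{7}, 0) == 7);
    assert(int_or(std::variant<int, std::string>{std::string("x")}, -1) == -1);
}
```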

              And again, throwing exceptions has been slow ever since ZRO was invented. If the cost of throwing exceptions is a performance problem for your code, you’re doing something wrong.

              I do strongly think C++ exceptions need to be fixed! I just disagree that massively parallel CPUs are making the problem worse.