Then it’s easy to just run the program in gdb, let it hit the assertion, and run backtrace to get:
…
#5 0x000055555574d9a7 in std::vector<int, std::allocator<int> >::pop_back (this=this@entry=0x6070000000c0)
    at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.1/../../../../include/c++/12.2.1/bits/stl_vector.h:1319
#6 0x0000555555749a1e in (anonymous namespace)::Scanner::handle_qed_keyword_token (this=<optimized out>,
    lexer=lexer@entry=0x61b000000080, valid_symbols=<optimized out>) at src/scanner.cc:1540
Another useful tool for C++ is `-D_GLIBCXX_ASSERTIONS` if using libstdc++ and `-D_LIBCPP_ENABLE_ASSERTIONS=1` for libcxx. I suspect it may have caught the `std::vector` UB.
We enable both of those for our hardened profiles in Gentoo now.
You can also use the debug variants for both of those which are stricter but break ABI.
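To make the effect of `-D_GLIBCXX_ASSERTIONS` concrete, here is a minimal sketch. The function names `pop_last` and `try_pop_last` are hypothetical stand-ins, not code from the post. With libstdc++, `back()` and `pop_back()` carry a non-empty precondition that is only checked when the macro is defined, so the unguarded variant aborts on an empty vector under that flag rather than continuing silently.

```cpp
#include <cassert>
#include <vector>

// Build sketch (libstdc++, assumed file name):
//   g++ -g -D_GLIBCXX_ASSERTIONS example.cc

// Hypothetical stand-in for code that pops without checking for emptiness.
// Calling this on an empty vector is undefined behaviour; with
// _GLIBCXX_ASSERTIONS defined, libstdc++ aborts with an assertion instead.
int pop_last(std::vector<int>& stack) {
    int top = stack.back();  // precondition: !stack.empty()
    stack.pop_back();        // precondition: !stack.empty()
    return top;
}

// Guarded variant (also hypothetical) that makes the precondition explicit
// and reports failure instead of invoking undefined behaviour.
bool try_pop_last(std::vector<int>& stack, int& out) {
    if (stack.empty()) {
        return false;
    }
    out = stack.back();
    stack.pop_back();
    return true;
}
```

The guarded variant is the fix one would typically apply once the assertion has pointed at the offending call site.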
Ah, good old Gentoo Hardened! What kind of performance cost does that have? (Googling “_GLIBCXX_ASSERTIONS” and “Gentoo” returns surprisingly little.)
+1 : Pretty sure the second of those is what I enable in my Clang builds. (Don’t have my laptop handy to check.) There is one annoyance on MacOS: it leads to an undefined-symbol link error because Apple’s libc++.dylib doesn’t export some global variable used by this mode. I just define it in my source code and all’s well. If I remember I’ll update this comment tomorrow with the details.
Hey, you’re right that did catch it! Running the program I get:
Then it’s easy to just run the program in `gdb`, let it hit the assertion, and run `backtrace` to get:
Thanks, I’ll update the post!
The second case is interesting, because it depends on how std::vector is implemented.
If std::vector was `{T* data, size_t size, size_t capacity}`, then pop_back would likely have no undefined behaviour provided T was a trivially destructible type (a type for which no code runs on destruction). Decrementing size from 0 is (unfortunately if you ask me) well defined to wrap around, so the sanitizers would not be allowed to complain about it; for all they know we might actually have expected that wraparound.
In practice, std::vector is usually implemented as `{T* begin, T* end, T* end_of_capacity}`, in which case forming a pointer to 1-before-begin (which would happen on an empty vector where begin == end) is by itself UB (just forming the pointer, no need to try to dereference it). I wonder why the sanitizers do not detect this.
I think because it would be extremely expensive to check every pointer-arithmetic operation - the sanitizers (address sanitizer, at least) instead check for validity on dereference. This means they can have false negatives, of course.
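The two layouts discussed above can be sketched roughly as follows. `CountedVec` and `PointerVec` are hypothetical names, fixed to `int` elements for brevity; neither is the actual definition used by any standard library. The sketch only exercises the well-defined part (unsigned wraparound) and leaves the pointer case as a comment, since running it on an empty vector would be the UB in question.

```cpp
#include <cassert>
#include <cstddef>

// Layout 1 (hypothetical): the size is stored as an unsigned counter.
struct CountedVec {
    int* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;

    // On an empty vector this wraps size around to the largest size_t value.
    // Unsigned wraparound is well defined, so a sanitizer has nothing to flag.
    void pop_back() { --size; }
};

// Layout 2 (hypothetical): three pointers, the size is derived from them.
struct PointerVec {
    int* begin = nullptr;
    int* end = nullptr;
    int* end_of_capacity = nullptr;

    bool empty() const { return begin == end; }
    std::size_t size() const { return static_cast<std::size_t>(end - begin); }

    // On an empty vector this would form the pointer begin - 1, which is
    // undefined behaviour even if it is never dereferenced.
    void pop_back() { --end; }
};
```

With the first layout, the bug shows up later as an enormous size; with the second, the program is already outside the language rules at the decrement.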
The difficulty lies around knowing the bounds of the array into which a pointer points. As far as I understand ASan currently just (more-or-less) tracks which memory is allocated, not what is allocated there; for example if the memory contained a structure containing an array, the structure size (and thus the allocation size) are potentially larger than the array size, so you can’t use the allocation bounds to decide whether a pointer has moved outside the array.
Keeping track of whether memory is allocated is as simple as having a shadow-space where a bit or set of bits keeps track of the state of each byte (or slightly larger unit) of memory; tracking type as well would require a much more complicated structure, and a lot of additional instrumentation.
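As a rough illustration of the shadow-space idea, here is a toy model. `ToyShadow` and `Record` are hypothetical, and the one-flag-per-byte encoding is an assumption made for clarity: real ASan packs the state of eight application bytes into one shadow byte. The point of the sketch is that an allocation-level check cannot see that an access has left an array while staying inside the enclosing struct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy shadow map over a pretend address space of `bytes` bytes.
// One flag per byte for clarity; this is not ASan's real encoding.
class ToyShadow {
public:
    explicit ToyShadow(std::size_t bytes) : addressable_(bytes, false) {}

    // Record that [offset, offset + len) belongs to a live allocation.
    void mark_allocated(std::size_t offset, std::size_t len) {
        set_range(offset, len, true);
    }

    // Record that the range was freed and must no longer be touched.
    void mark_freed(std::size_t offset, std::size_t len) {
        set_range(offset, len, false);
    }

    // The check an instrumented load or store would perform.
    bool access_ok(std::size_t offset) const {
        return offset < addressable_.size() && addressable_[offset];
    }

private:
    void set_range(std::size_t offset, std::size_t len, bool value) {
        for (std::size_t i = offset;
             i < offset + len && i < addressable_.size(); ++i) {
            addressable_[i] = value;
        }
    }

    std::vector<bool> addressable_;
};

// Hypothetical object: an array followed by another field.
struct Record {
    int values[4];
    int tag;
};
```

If a `Record` is marked allocated at some offset, an access at the position of `values[4]` lands on `tag`, which is still inside the allocation, so `access_ok` returns true even though the access is out of bounds for the array. Only the byte just past the whole `Record` is rejected.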
Yeah, that makes sense. Initially I thought you could just instrument after each pointer arithmetic operation but did not realise that the costly part is to resolve what range of memory is valid for this specific pointer.
It hasn’t crossed my mind that sanitisers wouldn’t have a special case for `vector`. I guess textual inclusion makes it a struct like any other…
Minor formatting nit, perhaps specific to my system: I’m reading in dark mode on a laptop, and the text boxes containing the error messages, terminal printouts, and TLA+ code were difficult to read since they rendered as light grey text on a white background. It didn’t look in sync with the rest of the blog’s theme and I’m guessing it might just be a CSS glitch. Wanted to let you know so you could have a look; it’s the sort of thing that’s difficult to find if nobody tells you about it :)
Here too.
Apologies, thanks for the report! Seems the pygments highlighter isn’t playing well with whatever dark mode setting exists for this theme I’m using. I sort-of fixed the problem by highlighting all those code blocks as `sh`, which isn’t correct, but at least gets them readable. Given the topic of the post I really should figure out how to highlight all the code blocks on my blog with tree-sitter.
I’d love to read a blog post about syntax highlighting the code blocks with tree-sitter :)
Bonus points if it’s tree-sitter running client side as WASM, just for the fun of it!
Note that for my system (W10,Firefox,Dark UI settings) your website is dark, but all console output has a white background with white-grey text, making it barely readable.
That’s interesting, I didn’t know dark UI settings were a thing in desktop browsers; thought the issue only happened on mobile. Is it some standardized parameter the browser sends that my hugo theme must be responding to?
Yes, it’s a CSS media query (the same family as the ones for screen width or pixel density) and its name is `prefers-color-scheme`: https://developer.mozilla.org/en-US/docs/Web/CSS/@media/prefers-color-scheme
If you open the dev tools, in Chrome’s Elements page, in the “styles” sidebar there’s a paint roller icon. Click it and it will let you override the dark mode / light mode setting in your browser so you can see what both versions look like.
Fun fact: you can use `<picture>` tags together with that CSS media query to display a different image in dark mode compared to light mode. You can see this in action on my blog, where in dark mode the graphics use off-white text over a transparent background, and in light mode they use dark grey text over a transparent background: https://predr.ag/blog/speeding-up-rust-semver-checking-by-over-2000x/
Oh that’s very interesting! I will certainly need to implement the picture switching on one of my posts, which has transparent vector art I put a lot of time into that becomes totally invisible in dark mode: https://ahelwer.ca/post/2018-12-07-chsh/
You can do it without picture switching if you embed SVGs in the HTML. For simple line drawings with black/white strokes:
A more complicated setup for colors:
You don’t have to use CSS variables here, but I like doing it this way so I can have my whole palette in one place and use it outside SVGs as well.
I use this technique for the drawing here. I’ve also partially automated it for another website.
Good tips, I just managed to get it working by inlining the SVGs! Might play around with choosing non-black colors depending on the colorscheme too.
Changes: https://gitlab.com/ahelwer/ahelwer.gitlab.io/-/commit/c7547624c6f11e69c0a914bd29b22e9632625596
Representative blog posts:
Haha I clicked on the link, and was like “oh, very interesting minimalistic graphics!”
Then I switched to light mode and realized how much I was missing :)
Good luck, and help spread this knowledge far and wide!