The pain in the author’s voice is disheartening.
It was all worth it. … Or. That’s what I keep telling myself. Because it doesn’t feel like it was worth it. It feels like I wasted a lot of my life achieving something so ridiculously basic that it’s almost laughable.
I wonder if their efforts would be better spent on a different language that is more welcoming to features many programmers consider basic in 2022. They also write:
It has only just occurred to me now, that every single paper I have merged into C or C++ has been exclusively at the behest of other people and I have merged exactly 0 papers that have served my own needs.
I need to be a lot more fucking selfish.
C++26 and C2y/C3a era is gonna be on Demon Time. No more of this happy nice let’s-get-along shit.
I hope “demon time” brings improvements to the C ABI, and I hope we can one day throw out the technical debt we’ve accumulated over the decades in exchange for new languages and new ABIs. Carbon looks really appealing, especially if they go the route of “your C++ is busted and you must change it to conform to our new practices/API/ABI”.
There are a lot of good changes in the most recent standard (I’m particularly pleased with the removal of K&R prototypes), and a lot of other good proposals on the table – but I don’t think that this is one of them.
The C ABI things pointed out here are a non-problem. C already doesn’t guarantee ABI, and the healthy variety of libc implementations out there means that people tend to not rely on the ABI that deeply anyways.
The C++ ABI issues involve a number of subtle properties that are effectively guaranteed by the standard, such as reference stability in the standard hash tables or std::string copying in regex. Adding new aliasing options doesn’t fix code that depends on this.
The proposed solution makes things worse – it doesn’t consider changes to types, the interactions between returned types and different libraries, function pointers, and so on, and it effectively bakes in old ABIs in a way that needs to be maintained forever.
It makes the ABI situation worse, not better.
What we need is for people to get used to rebuilding their code, and possibly some standardized package management solutions around that. Whole builds need to be segregated by ABI for this to work.
A cross ABI build is a cross compilation, and needs to be treated that way.
(There has been talk in the C++ committee around package managers, and this is, in my opinion, the place to fix ABI.)
I wonder if their efforts would be better spent on a different language that is more welcoming to features many programmers consider basic in 2022.
Probably not on demon time, there are now new kids ready to do happy nice let’s-get-along shit. The question is, in which language.
I know Go and Rust probably support this natively, but it’s not very common. The two languages I’ve used in production where I’ve wanted this are Haskell (which requires Template Haskell) and Java/Kotlin (which can’t even come close, short of build tools that generate special class files containing byte arrays).
Not sure I’d call it “basic in 2022” but I wish it was.
extremely well written. i understand hyperfixating on something and dragging it across the finish line - it’s something i’ve done at work before. what is born from passion and dragged long enough becomes a burden. thanks for writing this, and thanks for going through all of this.
Why make this a preprocessor feature? To me it sounds like a job for the linker — “please resolve this symbol to the contents of this binary file.”
Putting it in the preprocessor means the binary file is going to be converted to a huge ASCII list of comma-separated numbers, then parsed back down to a byte array, then written to the .o file. I’m sure that expansion can be cleverly optimized away with some work, but why is it even there?
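(For concreteness, a sketch of what that expansion looks like at the language level; the file name and bytes here are illustrative, not taken from the article:)

    /* C23: initialize a byte array straight from a file. */
    static const unsigned char logo[] = {
    #embed "logo.png"
    };

    /* The compiler must behave as if that expanded to an xxd-style list: */
    static const unsigned char logo_as_if[] = {
        0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, /* ... one literal per byte ... */
    };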
the whole point of this feature is so you don’t have to parse it - if you read the article, the author says he has to convince compiler authors that “a sufficiently clever compiler” is never going to be faster than copying a file. The intention is for implementors to turn this into some platform-specific linker directive.
Well, any linker could support it without the help of the C standard, which pretty much only covers compilation. But having it dealt with by the preprocessor allows the compiler to be more intelligent about optimization and the like. I imagine that if it were a feature of the linker, the C standard would be hesitant to require that the C code could obtain the object’s size, for example, fearing that some linkers wouldn’t be able to easily provide more than a bare pointer. If it’s in the preprocessor, the compiler already knows its size (if the programmer wants that).
To add to a good point, the compiler knows the size (very useful for some optimisations), and also the contents. The constexpr evaluations shown in the article aren’t possible otherwise.
Even beyond optimization, being able to do sizeof(embedded_thing) is very useful!
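(A small sketch of what that enables, assuming a C23 compiler with #embed; the file name is made up:)

    #include <stddef.h>

    static const unsigned char blob[] = {
    #embed "blob.bin"
    };

    /* The size is an ordinary compile-time constant... */
    static_assert(sizeof blob > 0, "blob.bin should not be empty");

    /* ...so there is no separate length symbol to keep in sync. */
    static const size_t blob_len = sizeof blob;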
I didn’t really follow the standardisation process but IMHO this is, if not the, at least a correct answer. The linker approach is available in some compilers (if not all – I’ve “just included binary data” via the linker script countless times) but all it gives you is the equivalent of a void * (or, optimistically, a char *). The compiler doesn’t know anything about it. Just as bad, linters and other static analysis tools don’t know anything about it. If you want to do anything with that data short of sending it to a dumb display or whatever, the only appropriate comment above the code that does something with it is /* yolo */.
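(For contrast, a sketch of the usual linker-side trick; the objcopy-style symbol names depend on the input file name and toolchain:)

    #include <stddef.h>

    /* Symbols typically emitted by `objcopy -I binary ... foo.dat foo.o`
       or by a linker script; to the compiler they are just opaque addresses. */
    extern const unsigned char _binary_foo_dat_start[];
    extern const unsigned char _binary_foo_dat_end[];

    /* No contents, no sizeof, no constant folding: the size is only
       known once the linker has placed the blob. */
    #define FOO_DAT_SIZE ((size_t)(_binary_foo_dat_end - _binary_foo_dat_start))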
The article mentions the “as if” rule in the standard. The compiler has to behave “as if” it did this, but it doesn’t actually have to do this.
Yes, that’s what I meant by “cleverly optimized away with some work”. But it’s architecturally ugly — it means the preprocessor is overstepping its bounds and passing stuff to the parser that isn’t source code.
The preprocessor already produces non-standard C. Just running echo hello | cpp gives you the following output:
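(Representative output from a GNU-style cpp; the exact line markers vary by version and platform.)

    # 1 "<stdin>"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 1 "<stdin>"
    hello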
Obviously, it does this to communicate line and file information to the user for the most part. But I don’t see why it would be a much more severe layering violation to make #embed "foo.svg" preprocess to something like # embed "/path/to/foo.svg", which the compiler can then interpret, if the preprocessor already produces non-standard C with the expectation that the compiler supports the necessary extensions.
One of the compilers uses __builtin_string_embed(“base64==”), which does parse as valid source code.
What an achievement!
I can only hope to someday get my code accepted into such a critical piece of the world!
Does it feel amazing personally?
If you want to start playing with this now, my C preprocessor Cedro (2021-08-12, 2022-04-27) will insert the byte literals for the compiler:
https://sentido-labs.com/en/library/cedro/202106171400/#binary-include
That’s the same that xxd does, for instance. The advantage is that since it uses the same syntax as C23, it is easier to switch to using the compiler: just remove the Cedro pragma #pragma Cedro 1.0 #embed and it will compile as C23.
The source code (Apache 2.0) and GitHub link are at: https://sentido-labs.com/en/library/
This seems pretty useful. For games, just being able to slap binary files in your app without odd tricks can be very helpful.
Just unfortunate that the author had to go through so many hurdles to get something so basic in the language.
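(As a rough sketch of that game use case; the asset path and the upload_texture() helper are hypothetical:)

    #include <stddef.h>

    /* Hypothetical engine call. */
    void upload_texture(const unsigned char *data, size_t size);

    static const unsigned char player_sprite[] = {
    #embed "assets/player_sprite.png"
    };

    void load_assets(void)
    {
        /* The asset ships inside the binary: no file I/O, no install-path lookups. */
        upload_texture(player_sprite, sizeof player_sprite);
    }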