I wouldn’t normally post video content featuring myself but this video was particularly well received.
Since it was published, people pointed out two mistakes I made:
Go has been able for a while now to export dynamic libraries. My knowledge was from before that time and I also got confused, thinking that you could not export C ABI functions at all, while in fact you can. That said, having a runtime still makes Go not a viable C replacement in the most direct sense of the expression.
Zig used to only support pointer arithmetic by converting the pointer to an int, applying the operation, and then converting it back to a pointer. Since a few months ago, [*]T (and related) started supporting arithmetic. That’s a pointer type that you don’t touch directly often, as you normally would use a slice (ptr + len).
What you mean is that Go has a garbage collector.
C has a runtime. It is called “the C runtime” and is traditionally abbreviated as “crt”. On Linux systems with GCC installed, there are files named “crt*.o” somewhere under /usr/lib that are part of the C runtime. This is distinct from and in addition to the standard C library (libc). If I compile the C program “int main() { return 0; }” using GCC, then I get about 2K of code and data, even though I’m not calling any library functions. This 2K of stuff comes from the C runtime. [However, note that I’m producing a dynamically linked executable. If I try using ‘gcc -static’ then I get an executable with 780K of code (it looks like glibc), and I don’t know how to make that smaller.]
Rust also has a runtime, even though the Rust-lang.org home page claims that it does not! If I compile the rust program “fn main() {}” (which references no library functions) then I get a static executable that is over 300K, and that’s due to the Rust runtime. Supposedly most of this is due to the standard panic handler. Here is some documentation about the Rust runtime: https://doc.rust-lang.org/reference/runtime.html, which says that the panic handler is part of the Rust runtime.
Zig seems like the best choice if you want to build static executables with a minimal runtime. I compiled “pub fn main() !void {}”, and got a static executable with 660K of code and data. Twice the size of the corresponding Rust executable. A lot of this runtime code seems to involve runtime safety checks and a panic handler. If I rebuild using ReleaseFast then I get 190K of code, which again includes a panic handler. If I rebuild with “zig build -Doptimize=ReleaseSmall” then I get a much smaller static executable with only 6K of code. I don’t know how to make C static executables this small (on Linux).
The greatest trick Unix ever pulled was convincing its programmers C doesn’t have a runtime.
I wasn’t sure you were THE Calvin, until now.
“The”?
There was a famous Calvin who spent a lot of time worrying about what tricks God was pulling.
I believe he was more interested in whatever tricks the Catholic Church was pulling to keep the Word of God from everyone. So quite apropos to the comment from our calvin.
Wait, does that mean Hobbes is here too?!
It also has green threads and a multiplexed IO library.
C only really lacks a runtime when it is used in freestanding mode, when there’s no stdio nor stdlib: no allocator, no signals, no main(), no exit().
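To make the hosted side of that distinction concrete, here is a minimal C sketch of two services a program gets without setting them up itself: a ready-to-use stdout and exit handlers that run after main returns. The function names install_exit_hook and say_hello are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int hook_installed = 0;

/* Invoked by the exit machinery after main returns (or exit is called),
   not by any call written in the program itself. */
static void exit_hook(void) {
    puts("exit hook: invoked after main finished");
}

/* Register exit_hook once. Returns 0 on success, like atexit itself. */
int install_exit_hook(void) {
    if (hook_installed) {
        return 0;
    }
    int rc = atexit(exit_hook);
    if (rc == 0) {
        hook_installed = 1;
    }
    return rc;
}

/* Print through stdout, which was opened before main started.
   Returns the number of characters written, as printf does. */
int say_hello(const char *name) {
    assert(name != NULL);
    return printf("hello, %s\n", name);
}
```

Calling install_exit_hook() and say_hello("crt") from main is enough to see both effects. In a freestanding build neither atexit nor stdout is guaranteed to exist, so equivalent code would have to supply its own.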
yeah, I really don’t understand people that think it makes sense to downplay the fact that a language like Go can pause execution, realloc an entire green stack somewhere else and fixup all pointers, while being really fixated on crt.
This answer is like “dry cleaning actually uses liquids”. You’re correct in the strict sense, but also ignoring everything people mean by “having a runtime” in the common-yet-imprecise sense.
Runtimes of C and Rust (and probably Zig’s too, although I’m unsure about their async) are relatively small, non-invasive, and play nicely with other runtimes in the same process. These languages can produce static libraries that are easily usable in programs written in other languages. That’s generally not the case in languages that are said to “have a runtime”, in the sense that the runtime is substantially larger, more involved in execution of the program, and may cause problems if it’s not the only runtime in the process (e.g. if it needs to control all I/O, or track every pointer).
That’s due to the std library, which is linked by default if you’re compiling for a hosted target. It’s not part of the Rust language, which is why people say Rust doesn’t have a runtime.
A Rust program that just prints hello world is about 9K:
The Rust Runtime Environment is entirely optional. You can in fact compile a Rust program that does not reference any of std, alloc or core. You will be stuck with a very restricted environment (similar to what happens if you do this in C).
It should also be noted that when you simply compile a Rust program, the stdlib isn’t LTO optimized or otherwise shrunk down (loadbearing * here). You can disable that and only bring what you need. You can also disable the startup wrapper which handles some early init stuff, you can remove the default panic handler entirely and even disable the OOM handler.
Additionally, running in no-core mode will require you to implement a few core constructs the compiler is looking for yourself, since you’ll be missing quite literally everything that holds rust together (such as operators).
tcc is pretty good at producing small executables from C code
It sounds like you are describing gcc, not C in general.
Windows also has a C runtime – and worse, it was not distributed with the operating system!
It was called msvcrt.dll as far as I remember (Microsoft Visual Studio C runtime). I remember you had to copy it around to get some programs to work.
This was over 15 years ago, so I’m not sure what the situation is like today.
edit: To clarify, C does have a runtime, but you don’t have to use it. Kernels and Windows user space don’t, but Linux user space does.
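The Windows kernel doesn’t use the C runtime, as far as I know.
Many/most Windows user space apps don’t use the C runtime. They use C APIs provided by Windows.
But portable ANSI C applications often use the C Runtime DLL I mentioned.
The Linux kernel doesn’t use the C runtime.
For example, printf() is part of the C runtime, but the kernel doesn’t use it. It has its own string formatting routines.
Linux user space uses the C runtime (that’s what GNU is): most apps and CLI tools on Linux link with GNU libc, or with musl libc.
Another consideration is that modern malloc()s have a lot in common with garbage collectors. They have some unpredictable performance characteristics, similar to a GC runtime.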
That’s a fairly typical way for C compilers to link in the startup and exit code.
Recognizing that this is a tangent, how many C compilers are in your dataset to judge what’s “typical”? Do clang, tcc, kenc, cproc, lacc, and scc do this too?
I don’t know about hobbyist C compilers, but what do you think calls main? A crt0.o that contains things like _start is pretty standard for mainstream Unix C compilers.
Well, there are only 2 mainstream Unix C compilers. So I guess by “pretty standard” you mean “universal”?
I’m not sure what you’re trying to say here. Are you implying that e.g. tcc, kenc, cproc don’t have any startup or exit code?
C programs expect to get argv, have stdout, and atexit working. These things are part of C and its standard library, and compilers need to insert code that makes them work.
I’m asking if @calvin is referring to the two mainstream Unix C compilers, gcc and clang, in which case his statement could be strengthened from “pretty standard” to “universal.”
Well, to be more precise, the crt (aka csu, C startup) usually belongs to the platform (libc) rather than the compiler.
So what were you saying is fairly typical?
I’m describing the situation when you use C to write programs that run under an operating system like DOS, Linux, Windows, MacOS, etc. If your C program has a function called int main(int argc, char **argv), then there is a C runtime that provides the operating system entry point and calls main. The ISO C standard calls this a “hosted execution environment”. The situation where C might not have a runtime is called a “freestanding environment”.
Thanks for clarifying the meaning of “runtime;” I was not aware it included things like startup and malloc.
Does Zig not model pointer provenance then? There has been a lot of discussion in the Rust community about how pointer/int casts break the compiler’s ability to reason about provenance. As I understand it, you can have either pointer/int casts or pointer provenance, but not both.
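For readers who have not seen the pattern, here is a minimal C sketch of the two styles being compared: stepping a pointer through an integer round trip versus direct pointer arithmetic. It is a C analogy rather than Zig code, and both function names are made up for illustration. The integer version assumes a mainstream flat-address platform; the C standard leaves converting an adjusted integer back to a pointer implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: advance by `count` elements via
   pointer -> integer -> pointer. */
int *advance_via_int(int *p, size_t count) {
    assert(p != NULL);
    uintptr_t addr = (uintptr_t)p;   /* pointer to integer */
    addr += count * sizeof *p;       /* integer math, scaled by hand */
    return (int *)addr;              /* integer back to pointer */
}

/* Hypothetical helper: the same step with direct pointer arithmetic. */
int *advance_direct(int *p, size_t count) {
    assert(p != NULL);
    return p + count;                /* the compiler scales by sizeof(int) */
}
```

For an in-bounds step both return the same address. The difference is what the compiler can see: in the first version the result is an integer turned into a pointer, which is exactly the case a provenance model has to assign a meaning to.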
That is not quite the case. There are several proposed provenance models for Rust, C and C++, all of which have to have some way to deal with integer/pointer casts.
This paper gives a good overview: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
tl;dr there are two categories of approaches: “PNVI” (provenance not via integers) and “PVI” (provenance via integers).
In PVI, integers carry provenance. So if you cast a pointer to an int and then back, you retain the entire chain of custody and the resulting pointer works exactly as the original did. However you now run into some tricky questions like “What is the provenance of a + b? Or a ^ b?”. This is (from what I can tell, but I’m no expert on this) what CHERI C does: Their uintptr_t retains provenance, and they make some choices about what that means for integer math.
In PNVI, integers do not carry provenance, so you need another way to make pointer->int->pointer casts work. The current favorite seems to be “PNVI-ae”, for “address exposed”, where casting a pointer to an int leaks its provenance to “the environment”. Casting back to a pointer will look for an exposed provenance for that address and if there is one, you get that (and an invalid pointer otherwise).
This avoids the tricky questions of PVI and allows a bunch of patterns that PVI doesn’t, such as the infamous XOR-linked-list. However, it is also extremely painful and slow to implement on platforms that physically manifest provenance such as CHERI.
For regular optimizing compilers however, it’s not a big problem: their aliasing/provenance analysis already has to be able to cope with pointers that “escape” their analysis horizon, be it due to FFI, inline asm or (and yes, this is literally called out as allowed by the C standard) round-tripping pointers through the filesystem via fprintf("%p") and fscanf("%p"). PNVI-ae at worst inhibits their ability to optimize code that does a bunch of int/pointer casts.
Now for Zig, if they adopt these rules as-is, this might mean a more substantial pessimization if int/pointer casts are more idiomatic and used in many more places. If more or less every pointer’s address is escaped, you do lose most of the optimizations allowed by provenance.
There is no “Zig Memory Model” yet from what I can tell, but a bunch of discussion: https://github.com/ziglang/zig/issues/6396
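As an illustration of the XOR-linked-list pattern mentioned above, here is a small C sketch. Each node stores the XOR of its neighbours’ addresses as an integer, so every traversal step builds a pointer out of integer math. The struct and function names are hypothetical, and the code assumes a mainstream platform where uintptr_t round trips behave as plain addresses. Under a PNVI-ae reading, the casts to uintptr_t in xlist_link are what expose the nodes’ addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One node of an XOR-linked list: link = address(prev) XOR address(next),
   with 0 standing in for "no neighbour". */
struct xnode {
    int value;
    uintptr_t link;
};

/* Given the node we arrived from (or NULL) and the current node,
   recover the node on the other side. */
struct xnode *xnode_step(const struct xnode *from, const struct xnode *cur) {
    assert(cur != NULL);
    uintptr_t from_addr = from ? (uintptr_t)from : 0;
    uintptr_t next_addr = cur->link ^ from_addr;   /* integer math only */
    return next_addr ? (struct xnode *)next_addr : NULL;
}

/* Link an array of n nodes into an XOR list in array order. */
void xlist_link(struct xnode *nodes, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uintptr_t prev = i > 0 ? (uintptr_t)&nodes[i - 1] : 0;
        uintptr_t next = i + 1 < n ? (uintptr_t)&nodes[i + 1] : 0;
        nodes[i].link = prev ^ next;
    }
}

/* Walk forward from head and add up the values. */
int xlist_sum(struct xnode *head) {
    struct xnode *prev = NULL;
    struct xnode *cur = head;
    int sum = 0;
    while (cur != NULL) {
        sum += cur->value;
        struct xnode *next = xnode_step(prev, cur);
        prev = cur;
        cur = next;
    }
    return sum;
}
```

The pointer returned by xnode_step is never derived from another pointer by arithmetic, only from an XOR of two integers, which is why a model where integers carry provenance has trouble saying what it points to.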
Note for those unfamiliar with CHERI: this means uintptr_t is 128 bits large when holding a 64 bit address.
Also, for languages that don’t need to be backward-compatible with code that casts freely between ints and pointers, there is the option to disallow int->ptr casts completely, and instead require supplying provenance explicitly when you want to construct a pointer from an integer. E.g. new_ptr_with_derived_provenance = old_ptr.new_from_address(some_int).
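One way to picture such an operation is the following C sketch. It is an assumption about how the idea could look, not an existing API: new_from_address is a hypothetical free function that reaches the target address by offsetting old_ptr, so the result is derived from old_ptr and no integer is ever cast to a pointer. It assumes the address lies within (or one past the end of) the object old_ptr points into, and a two’s-complement platform for the signed conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring old_ptr.new_from_address(some_int):
   return a pointer to `addr` that is derived from old_ptr by pointer
   arithmetic instead of by an int -> ptr cast. */
unsigned char *new_from_address(unsigned char *old_ptr, uintptr_t addr) {
    assert(old_ptr != NULL);
    /* Pointer -> int is still allowed; only int -> pointer is avoided. */
    ptrdiff_t delta = (ptrdiff_t)(addr - (uintptr_t)old_ptr);
    return old_ptr + delta;
}
```

Because the result comes from old_ptr + delta, a compiler can treat it like any other pointer into the same object, and the integer only ever selects where inside that object it lands.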
in the video Loris notes that go programmers have used zig as their C compiler for cross compilation (which is great), I’ve also used cargo-zigbuild to do the same with rust and it’s excellent, so much easier than trying to get a cross toolchain set up correctly
edit: maybe I should be patient… Loris brought it up about 30 seconds after I hit send on this post
I’m a Rust programmer who’s been interested for a while now in trying out Zig, and this discussion definitely makes me more interested in doing so. The comptime stuff is really fascinating to me, and I also want to get a visceral sense of what design decisions a systems language might’ve made differently than Rust.
tl;dw?