In the future, you can find out information like this in a quicker way by grepping through /usr/include and reading the definition of errno and/or reading glibc source.
If you want to understand libc, I’d recommend you steer away from glibc (it’s a torturous mess of indirection, macros, and incomprehensibility) and instead read musl or a *BSD libc which are much easier to grok.
I agree that glibc is really tough to follow… But if you want to know how this behaves for your system, then you have to read glibc, not musl. And it may even tell you interesting things. For errno, for example, even if we restrict to just Linux on x86_64, it works differently in different places. Follow the breadcrumbs, and you’ll eventually find the SYSCALL_SET_ERRNO macro. And we see that there’s a different errno in different contexts: the dynamic linker uses its own copy, which does not appear to be thread-local; the C library uses the __libc_errno symbol, and other parts of the distribution (such as libpthread) use errno (though my guess is that these resolve to the same address most of the time), which are at known offsets from the thread-local-storage base register. This suggests that dlopen (which is largely implemented in dynamic linker code) doesn’t set errno if it fails? Now I feel like testing this… I wouldn’t have wondered if I hadn’t actually gone through my own system’s code.
It’s not necessarily clear from header files alone. For example stuff gets weird with vDSO and address space mapping. Also the thread local variable stuff gets confusing if you’re not familiar with the details. But yes, you are right in theory.
What I don’t understand is why everyone should have to go through this trouble (which isn’t all that complicated in the end, I realise), instead of this being upfront in documentation/man pages?
cppreference.com is your friend here. It’s the best resource for reading stuff from the C and C++ standards. The actual standards documents are a tough slog.
As for Linux man pages, it seems to be pretty clear about it (although this one is for C99, not C11).
errno is defined by the ISO C standard to be a modifiable lvalue of type int, and must not be explicitly declared; errno may be a macro. errno is thread-local; setting it in one thread does not affect its value in any other thread.
That doesn’t tell you how it’s implemented. There are at least three plausible ways of implementing it given that description:
1. Have the kernel return unambiguous success or errno values and have libc maintain errno.
2. Have the vDSO expose an initial-exec thread-local variable and have the kernel always write the errno value at that offset from the userspace thread pointer (this could also be done in a completely ad hoc way, without the vDSO).
3. Have a system call that allows a userspace thread to specify its errno location and have the kernel write into that.
It happens that most (all?) *NIX systems, including Linux, pick the first option from this list. If I were designing a POSIX system today, I’d be somewhat tempted by option 2 so that the system calls could implement the POSIX semantics directly even without libc, at the cost of one extra copyout per failed system call. The main downside is that system calls would then have no mechanism for reporting failure as a result of the thread pointer being invalid, but signals can handle that kind of everything-is-broken failure.
True, the documentation doesn’t say anything about implementation (thankfully, at least in the case of the C standard), but as I understood the OP the question was about whether errno is kernel-based or libc-based in general. Given that it is documented as part of the C standard, that should be a big clue that it is libc-based. On the systems I support it can only be libc-based because there is no operating system.
If the OP question was really about whether errno is libc or kernel based on Linux, then there is some room for ambiguity. Perhaps the article should have phrased the question better.
but as I understood the OP the question was about whether errno is kernel-based or libc-based in general. Given the fact that it is documented as part of the C standard that should be a big clue that it is libc-based
Why? Signals are part of the C standard, but are implemented in the kernel on most *NIX systems, for example. The POSIX standard doesn’t differentiate between kernel and libc functionality at all: it is defined in terms of C interfaces, but some things are implemented in the kernel and some in libc. It’s entirely reasonable to ask what the division of responsibilities between kernel and libc is for any part of the C or POSIX standard, particularly a part that is set on system call returns.
On the systems I support it can only be libc based because there is no operating system.
That doesn’t mean that file I/O is a purely libc service in a hosted environment, yet it is also specified in the C standard.
When I was working on a toy kernel, my idea was that syscalls would return carry-zero for success, and an opaque handle with the carry bit set on error.
You could interrogate the kernel and vDSO to learn more: finding out whether you can retry would be relatively simple and fast (that information lives in the vDSO), and you could get stack traces over the various nanokernel services that were touched and tell the user what went wrong. In pseudocode:
let result: SyscallResult = syscall_open_file("/etc/passwd");
if result.carry_bit() {
    if vdso_err_retryable(result) {
        goto retry;
    } else {
        panic("could not read file: {reason}\n{stacktrace}",
            reason = syscall_err_message(result),
            stacktrace = syscall_err_stacktrace(result)
        );
    }
}
let file_handle: FileHandle = result.cast();
goto do_stuff;
I keep pondering reaching out to the LLVM community about the carry-bit-on-failure calling convention. I think it would be a nice way of implementing lightweight exceptions: set carry on exception return and implement exceptions in the caller as branch-on-carry to the unwind handler. You’d get one extra branch per call, but in exchange for that you don’t need an unwind library.
This calling convention for exceptions was proposed for C++ by Herb Sutter.
The extra branch per call is virtually free if you branch to the error case and the error is rare (and it should be). That holds both on big out-of-order superscalar and small in-order microarchs.
Also you shouldn’t place a subroutine call in your hot loop 😇.
I don’t think Herb proposed a calling convention in that document (it’s purely C++, which regards the ABI as a separable concern). I did discuss this as a possibility with him around the time that he wrote that though.
See top of page 17.

Some manual pages do in fact talk about it in more detail; e.g., errno(3C).
I thought that one of the aspects of mailing list style code reviews is that the reviewers are trying the code locally, and aren’t just commenting, but also running it and participating on the changes.
But maybe someone else who has more experience with mailing list style code reviews can comment on that?
I think that’s doable even with GH PR reviews. gh pr checkout for example does this. FWIW, prr also supports this via prr apply ...

I was aware of gh pr checkout, I didn’t catch the detail that prr apply exists though. Thanks for pointing it out.

Hi, interesting tool! Minor suggestion: add links to the repository and other relevant pages in Introduction, Installation, Releases, Contributing.
Thanks for checking it out! I added some metadata to link to the repo and such. Opting not to directly link to the table of contents since mdbook already generates that in the sidebar.
I might have to try it out just to be able to search the whole diff. GitHub “helpfully” collapsing the file with all of the changes makes it easy to miss the most important parts of a change.
However reading the README it seems that whitespace separates comments? Or am I just not understanding their wording? I very frequently leave multi-paragraph comments.
Another thing I would miss is screenshots and videos for UI review. It would be cool if you could put a markdown link to a local file and it would be uploaded.
Multi-paragraph comments are ok. Whitespace in the quoted (diff) part of the file delineates scoped comments. You could ignore this feature altogether and still get by IMO.
prr is really cool! I tried it on a PR with many comments (https://github.com/llvm/llvm-project/pull/72714) but got an error from GitHub (sorry, not saved locally). Unfortunately the error message isn’t clear about which chunk is corrupted. I ended up copying the comments to the web UI…
Ouch, sorry. If you manage to figure out how to reproduce it please lmk.

I had a similar experience a while ago (frustrating, I know) which led to https://github.com/danobi/prr/commit/ba0a41ac5580ba18db2052ac971ed378949f9378. Sounds like it didn’t catch your error though.

The drop reasons are a good addition to the kernel, but I find them still to be too fine-grained / some missing when debugging packet drops.

A shameless plug: we’ve recently added the drop reasons to pwru - https://github.com/cilium/pwru/blob/main/demo.gif.

Nice that it’s using kprobe multi now :). Last time I tried, it took a few minutes to attach all the probes (and also to detach when I ctrl-C’d it).

Looking at vmtest, it appears to be a Rust program that parses a config file, has a little UI, spawns QEMU, and communicates with the QEMU Guest Agent:

https://github.com/danobi/vmtest/tree/master/src
https://wiki.qemu.org/Features/GuestAgent

I wonder if it could just be a shell script? Then you don’t have to worry about distributing binaries. The QGA stuff might be hard in a shell script, but there’s probably a way. In any case, it’s a nice example of how to run QEMU in CI! Which I’ve been very close to doing.
Yes, it could be a shell script! That’s how most of the “vmtest” predecessors in the kernel-y space work. But they suffer from fragility and maintenance issues, for example the triply escaped shell strings you need to pass to QEMU. A binary is more heavyweight for sure, but there are ways to make it smoother (e.g. cargo-dist).
Although note that use of QGA in this problem space is new AFAIK.
I do something similar for testing rust-fuse. The integration tests must run as root and I don’t want to worry about having test state stick around between runs, so those tests are executed under QEMU.
The implementation is a bit different though – the guest filesystem is minimal (basically just /sbin/init in an initrd) and there’s no filesystem sharing. The test runs are hermetic, deterministic, and reproducible.
From the post, it seems like sharing the host filesystem is an important design goal of vmtest, which doesn’t seem like a benefit to me. Keeping the environments separate allows the guest and host OSes to be decoupled. I can do development on my Linux workstation or macOS laptop, and can test both the Linux and FreeBSD implementations of FUSE in the same test run.
Being able to test different OSes (and different versions of those OSes) can discover bugs in unexpected places, for example in the FreeBSD kernel:
I’m also not really sure what the purpose is of allowing a test to access the terminal or test configuration file. That just seems like more opportunity for non-determinism to slip in.
Thank you for the perspective! Rootfs sharing is only done for kernel targets — vmtest also supports running standard qcow2 images for more deterministic / hermetic setups. In image targets, only the directory rooted at vmtest.toml is shared into the guest at /mnt/vmtest (to make it easier to move things in and out).
I have not tried non-Linux guest OSes for vmtest, but I suspect it might work with a little bit of effort. I’ve sent some patches to qemu-guest-agent before, and from what I can tell they take cross-platform support seriously (e.g. Windows is supported too). I suspect the same might apply to the host OS as well.
I used something similar to this for testing Linux kernel changes. It was great for rapid iteration. It took 8 seconds to boot the new kernel, with bash as init and my host filesystem mounted read-only as the guest’s root. I could build my tests outside and then run them in both host and guest and make sure that bugs were fixed and so on. I definitely wouldn’t want to use something like that for userspace development (including FUSE/CUSE drivers) where I’d want to avoid any host state leaking. Similarly, I wouldn’t want to use it for CI, where having a fully deterministic build environment is crucial.
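For reference, a setup along those lines can be assembled with stock QEMU, sharing the host root read-only over 9p (a sketch from memory, assuming 9p/virtio support in the guest kernel; the kernel path and mount tag are placeholders):

```shell
qemu-system-x86_64 \
    -enable-kvm -m 2G -nographic \
    -kernel arch/x86/boot/bzImage \
    -fsdev local,id=rootfs,path=/,security_model=none,readonly=on \
    -device virtio-9p-pci,fsdev=rootfs,mount_tag=/dev/root \
    -append "root=/dev/root rootfstype=9p rootflags=trans=virtio ro init=/bin/bash console=ttyS0"
```

The readonly=on on the fsdev plus ro on the kernel command line is what keeps host state from leaking back out of the guest.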
Out of curiosity, how would vmtest affect determinism in CI? For GitHub Actions at least, each new job already runs in a fresh VM. So anything that runs inside the top-level VM (vmtest being the nested VM) already possesses a degree of determinism, right? (Ignoring the usual sources of non-determinism like fetching packages.)
Out of curiosity, why use qemu instead of firecracker?
Firecracker only supports Linux hosts, and it requires KVM. Both of those make sense given its design goals, but for use in development and CI they can be limiting.
QEMU runs pretty much anywhere, and it doesn’t have a hard requirement on host CPU features that might be absent when the host itself is a VM.
No particular reason other than QEMU is mature and available everywhere. Firecracker would be a good backend to explore. Unclear if it has all the functionality we need from skimming the docs.