1. 85
  1.  

  2. 31

    TL;DR: The Interface Definition Language for Operating Systems is C… and it’s notoriously bad at being an IDL.

    Yup. Dead right.

    If you want something close to a real IDL, you probably should be looking at dbus.

    Yup. Dbus sucks, but as Poettering points out… it exists, it’s in real world practical use and has a number of other important security characteristics.

    https://archive.fosdem.org/2014/schedule/event/anatomy_of_kdbus/attachments/slides/460/export/events/attachments/anatomy_of_kdbus/slides/460/kdbus_fosdem.pdf

    1. 9

      On Linux, it’s not really C. It’s a bunch of hardcoded syscall numbers and carefully formatted structures passed in-memory, along with a bunch of prose documentation. The userspace interface to Linux is unfortunately not formally defined/described - but the interface is stable, which makes it less of a problem.

      glibc or musl or other libcs provide a C API to access Linux functionality, in the form of functions like “read” and “write”, but other languages can “use the Linux interface directly” - just make syscalls directly.
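
      A minimal sketch of what “using the Linux interface directly” can look like from a non-C language, here Python. Assumptions, not from the thread: the `SYS_GETPID` table and the `raw_getpid` helper are made up for illustration, the two numbers are the `getpid` entries in the x86_64 and aarch64 syscall tables, and the call still goes through libc’s generic `syscall()` function rather than a hand-written trap instruction, so it only approximates a truly libc-free path.

```python
import ctypes
import os
import platform
import sys

# Assumption: getpid's number on two 64-bit Linux architectures. The number
# is a per-architecture constant taken from the kernel's syscall tables.
SYS_GETPID = {"x86_64": 39, "aarch64": 172}


def raw_getpid():
    """Return the PID via a raw syscall number, or None if unsupported here."""
    if not sys.platform.startswith("linux"):
        return None
    if ctypes.sizeof(ctypes.c_void_p) != 8:
        return None  # 32-bit processes use a different syscall table
    number = SYS_GETPID.get(platform.machine())
    if number is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.argtypes = [ctypes.c_long]
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(number)
```

      On a supported machine `raw_getpid()` should agree with `os.getpid()`; everything the caller had to know was a number and a return width, which is roughly the “hardcoded syscall numbers” interface described above.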

      1. 5

        Unfortunately, POSIX mandates the existence of the C interfaces but nothing else. Some systems, such as macOS or Solaris, don’t provide a stable system call interface, only a stable C interface. Go was bitten by this when a minor update of macOS changed the ABI of the system call used to implement gettimeofday, which the Go runtime called on startup. Go, at the time, bypassed libSystem and issued system calls directly, and so every single Go program failed on startup until it was recompiled with an updated compiler.

        Even on Linux, the situation is quite painful because Linux makes the system call table part of the architecture-specific code and so the system call numbers differ between architectures and sometimes even the arguments for system calls change. This isn’t the case on *BSD (where the system call tables are all part of the architecture-agnostic code and the machine-dependent code just provides a way of getting the arguments out of the trap frame).

        1. 1

          I wrote about this: http://catern.com/linux_api.html

      2. 16

        This is a brilliant piece and answers a lot of common questions one may have when dipping their toes into the “FFI” problem!

        I’m really glad that CHICKEN compiles to C, for the reasons stated. I’ve always had a bit of an ill feeling in the pit of my stomach when seeing other FFI systems that dynamically call into libraries. There’s always this nagging feeling of “but what if the library decides to change types?” (especially for opaque typedefs, where the library explicitly reserves the right to change the type).

        In CHICKEN, you have to essentially “repeat” the types of functions when declaring them to make them known at the Scheme level (as there’s a translation that needs to be done from e.g. Scheme integers to C’s int types), but at least you can expect to get a proper warning from the C compiler when compiling the resulting code if the types are changed. In dynamic FFIs you’ll just get an error at runtime… or not… also, this depends on the platform… shudder
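
        A hedged sketch of the runtime-FFI hazard described above, using Python’s `ctypes` as a stand-in for a dynamic FFI (the comment names no specific one). The helper names are hypothetical. Both functions call libc’s `atof`; the only difference is the hand-written return type, and nothing checks that declaration against the C header.

```python
import ctypes

# Load the symbols already linked into this process (libc among them).
# Assumption: a Unix-like platform where CDLL(None) is supported.
libc = ctypes.CDLL(None)


def parse_float(text: bytes) -> float:
    """Call atof with the return type declared correctly by hand."""
    atof = libc["atof"]  # indexing yields a fresh function object each time
    atof.argtypes = [ctypes.c_char_p]
    atof.restype = ctypes.c_double
    return atof(text)


def parse_float_misdeclared(text: bytes) -> int:
    """Same call, but with the return type wrongly declared as int.

    No compiler sees this mismatch, so nothing warns. The call still
    succeeds and returns an unspecified integer instead of the double.
    """
    atof = libc["atof"]
    atof.argtypes = [ctypes.c_char_p]
    atof.restype = ctypes.c_int
    return atof(text)
```

        The misdeclared version neither fails to load nor raises at call time, which is the “or not…” case: the mistake only shows up as a wrong value somewhere downstream.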

        But it also makes me sad, because it means it will be very hard to get out of this mess, at least for the foreseeable future where we’re all running some unix-derivative (or Windows, which has similar problems and at the same time is starting to resemble unix more every day that passes) where the C interface is the only portable interface.

        1. 3

          Oh man! I did some serious stuff with Chicken back in 2002-2003, building it into telephone exchange software so that the logic for call processing and monitoring – which uses a “thread” for the entire duration of every active call – could be written as normal sequential logic instead of as a state machine. Pretty much the same thing as node did for web back ends six or seven years later.

          I haven’t used it for a long time. But very cool.

          1. 2

            That’s awesome!

            1. 2

              It was pretty awesome. Continuations and Cheney-on-the-MTA for hundreds of thousands of stackless threads and cheap calls into the C functionality that was being stitched together. I seem to recall submitting patches to Felix to make it cheaper to do something about calling into C and then a C callback running more Scheme. Initially it was on HP-UX with HUGE L1 cache, but it was transitioning to SPARC and even some x86.

            2. 1

              That is pretty much what Erlang was created to do.

          2. 13

            Fuchsia IDL attempts to tackle this problem:

            https://fuchsia.dev/fuchsia-src/concepts/fidl/overview

            1. 12

              It’s common for capability-oriented systems to need to not use C. seL4 has an ABI, Genode has an API, and Cap’n Proto has an IDL.

              1. 7

                Rather than capability-oriented, I’d say well designed.

                Both being capability-centric and designing ABIs and RPC interfaces carefully are symptoms of good design.

                While I do understand copying UNIX made sense at some point, these choices are obvious choices when designing a system today.

            2. 10

              Is there a reasonable machine-readable interchange format for describing ABI/FFI of “C” libraries? If not, there should be! (I’m thinking of replacing .h with something zero-cost that’s easier to consume by other languages, not a different ABI nor universal dynamic RPC interface).

              1. 11

                GObject Introspection uses GIR, which is an XML format. It is pretty reasonable.

                1. 13

                  No XML format has ever been reasonable.

                  1. 16

                    When the alternative is depending on implementation details of C, “reasonable” becomes relative.

                    1. 3

                      DocBook is pretty reasonable…

                  2. 7

                    OpenGL, Vulkan and Wayland all use some sort of XML-based IDL type specification or another, and there are generators that consume it and output bindings in various programming languages.

                    In practice, the OpenGL one absorbs a ton of assumptions about how C works and embeds that in its functions, types, etc. Vulkan does a much better job of actually being language-agnostic, Wayland idk about.

                    It’s certainly not a standard, but they’re real examples of such a thing that are widely used.

                    1. 4

                      Wayland is a very different beast in this regard; the protocol is defined in terms of messages sent over a local socket (possibly using SCM_RIGHTS to transfer file descriptors); the IDL doesn’t have to describe C ABI anything, just message formats.

                      Way back I messed around with hooking into the IDL with Go, and it wasn’t too bad: https://github.com/zenhack/go.wayland

                      (though note that I never really finished the project, and it’s quite possible somebody else has done this and actually made it usable; I wouldn’t actually try to use that library in its current state).
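
                      To make “just message formats” concrete, here is a rough sketch of packing one Wayland request by hand in Python. It assumes the wire layout given in the Wayland protocol documentation (a 32-bit object ID, then a 32-bit word with the message size in the upper 16 bits and the opcode in the lower 16, in host byte order) and that `wl_display.get_registry` is opcode 1 on object 1. The function names are hypothetical and nothing here talks to a real compositor.

```python
import struct

HEADER = struct.Struct("=II")  # two 32-bit words, host byte order


def encode_message(object_id: int, opcode: int, payload: bytes = b"") -> bytes:
    """Prefix an already-encoded argument payload with a Wayland-style header."""
    if len(payload) % 4:
        raise ValueError("arguments must be padded to 32-bit boundaries")
    size = HEADER.size + len(payload)  # the size field counts the header too
    return HEADER.pack(object_id, (size << 16) | opcode) + payload


def decode_header(data: bytes):
    """Return (object_id, opcode, size) from the first 8 bytes of a message."""
    object_id, word = HEADER.unpack_from(data)
    return object_id, word & 0xFFFF, word >> 16


# wl_display (object 1) request get_registry (assumed opcode 1), asking the
# server to bind the new registry object to the client-chosen ID 2.
get_registry = encode_message(1, 1, struct.pack("=I", 2))
```

                      Nothing in those bytes depends on C struct layout or calling conventions, which is why generating bindings for another language from the protocol XML is comparatively painless.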

                      1. 2

                        The same is true of X11 (at least for the last decade or so). The protocol is defined by some XML files and XCB is mostly machine-generated C that serialises and deserialises based on these definitions, as is the code for talking the protocol in the server. You can use the same XML to generate an X11 client (or server) in another language.

                        1. 1

                          Indeed – I was drawing a contrast with OpenGL/Vulkan, but Wayland is in some sense a more “normal” use of an IDL.

                        2. 1

                          …the protocol is defined in terms of messages sent over a local socket…

                          I am in a cheeky mood, so I will point out this could be considered a degenerate form of ABI. It’s all about programs being able to talk to other programs, right? All the rest is an implementation detail!

                    2. 14

                      Important context: Author is involved in the development of Rust.

                      1. 28

                        Gankra (she/they, by the way, for the other replies) has also been involved in the development of Swift. For example, they have written about the different approaches that Rust and Swift take towards ABI (e.g. how Swift makes different tradeoffs than Rust to make sure it can stay ABI stable in the face of generics).

                        https://gankra.github.io/blah/swift-abi/

                        1. 19

                          He does have a point… an OS shouldn’t be bound to a language, so its ABI should be well defined.

                          And if you look at the very very long history of security flaws at the OS syscall design level…. Poettering has a point. An ABI designed as a security barrier is a very different thing to a C function signature.

                          1. 4

                            He does have a point… an OS shouldn’t be bound to a language, so its ABI should be well defined.

                            Yes. But there’s no changing existing systems. This is a whole system re-architect.

                            I understand Fuchsia was designed with this in mind from the start. They’re not the first to attempt this either.

                            1. 4

                              Umm.

                              https://www.freedesktop.org/wiki/Software/systemd/kdbus/

                              ps: I’ll add dbus sucks. But it proves the point that something good could be retrofitted, and it has the one true excellent redeeming feature…

                              It exists, it works.

                              1. 4

                                While I find these quite unrelated to the topic, I do absolutely welcome efforts to make Linux IPC suck less.

                          2. 1

                            Wanna spend a few words explaining why that is important context?

                          3. 11

                            I ran out of patience to finish reading the article due to the exasperated tone throughout and the author’s steadfast insistence that they know better than decades of system programmers that came before them, but I’ll state what I think should be the obvious:

                            If you want to build something on a system with a legacy design, you shouldn’t be too surprised that you have to use legacy tools and interfaces to get the job done.

                            The author states several times that the people who wrote C and C-based OSes made bad choices and invented bad designs, which is simply not true. Those people were not idiots, they were designing things according to existing constraints, concerns, and goals all colored by the state of the art at the time. Where the state of the art was generally, “Oh, you want that program written for this computer to run on that other computer too? Have fun re-writing the whole thing from scratch!”

                            The person who wrote this article is missing entire decades of context, and without that context, it’s very easy to dismiss mistakes in design as obvious oversights or incompetence. I look forward to the day that someone looks at the author’s code a few years from now and says, “wow, what a flaming pile of yuck!”

                            1. 12

                              Granted, hindsight is 20/20, and we should not chastise past effort just because they lacked our hindsight.

                              But.

                              Hindsight is 20/20, so how about exploiting it? While it is normal for legacy systems to suck by current standards, they still suck by current standards. Insisting that we be nice to legacy designers turns our attention away from the fact that they lacked our hindsight. Insisting that legacy systems used to be good hides the fact that they are now bad.

                              If we want to have a chance of disentangling ourselves from legacy crap, we first need to recognise that it is legacy crap, and build up the emotional energy necessary to take action and try & make it better. I don’t care that past giants did the best they could. The best they could is no longer enough, and we should stand on their shoulders and do better.

                              1. 4

                                The best they could is no longer enough, and we should stand on their shoulders and do better.

                                Nobody is saying we shouldn’t! The way I see it, the problems here are obvious to those paying attention. Complaining about the problems, whether the tone is dispassionate or angry, doesn’t actually help form a solution, and becoming angry over the problems helps even less by emotionally exhausting everyone.

                                As @andyc alluded to in their sibling comment, this is a common pattern in computing. Much like the C ABI creates a form of crufty legacy glue between applications, HTTP and TCP have formed a similar bottleneck in networking. Every couple weeks another internet loud-person comes to the realization in anger that the reason why so much stuff gets piped over HTTP is because it’s the least common denominator let through by middleboxes. And as much as I’m sympathetic when another person comes to this well-known conclusion in anger, it doesn’t change the reality: I can’t use SCTP because of Middleboxes; I’m stuck on port 443 because of Middleboxes; Latency is really high on my video call because I’m NATed/Middleboxes. You can be angry at the middleboxes or accept/try to work with reality, it’s your choice.

                                1. 9

                                  The way I see it, the problems here are obvious to those paying attention. Complaining about the problems, whether the tone is dispassionate or angry, doesn’t actually help form a solution

                                  Not everyone is paying attention. Complaining raises awareness, which is a necessary step towards forming a solution. If no one complains, few will ever know. If no one knows, no one will care. If no one cares, the problem does not get fixed.

                                  Important problems need to be complained about.

                                  Much like the C ABI creates a form of crufty legacy glue between applications, HTTP and TCP have formed a similar bottleneck in networking.

                                  There’s a huge difference between C and HTTP. C is basically the only way for languages to talk to each other in the same process. It’s bad and crufty and legacy, but it’s also all we have.

                                  HTTP on the other hand is not the only thing we have. We have IP. We have UDP. We have TCP. And those middle boxes are forcing me to use complex HTTP tunnels where I could have sent UDP packets instead. In many cases this kills performance to such an extent that some programs that would have been possible with UDP, simply cannot be done with the tunnel. And bottleneck wise, IP, TCP, and UDP are much narrower than HTTP.

                                  You can be angry at the middleboxes or accept/try to work with reality, it’s your choice.

                                  I’m not sure you realise how politically charged this statement is. What you just wrote suspiciously sounds like “There is no alternative”. Middle boxes aren’t like gravity. Humans put them there, and humans can remove them. If they’re a problem, complaining about them can raise awareness, and hopefully cause people to make better middle boxes.

                                  On the other hand, if everyone thinks middle boxes are “reality”, and that the only choice is to work with them, that will make it so. I can’t have that, so I’ll keep complaining whenever I have to do some base64-JSON->deflate->HTTP insanity just to copy some Protobuf from one machine to another (real story).

                                  1. 2

                                    Not everyone is paying attention. Complaining raises awareness, which is a necessary step towards forming a solution. If no one complains, few will ever know. If no one knows, no one will care. If no one cares, the problem does not get fixed.

                                    The people with the knowledge and ability to fix the situation, or to provide workarounds, are often the people who are aware of the problem. I firmly believe that the endless complaining in technical circles on the internet doesn’t actually help raise awareness to folks unaware of or uninterested in the issue; once you become aware you understand the issue fairly quickly. I’ve always viewed it as a form of venting rather than an honest attempt to fix things, all about healing the self and not about fixing the problem.

                                    Reality has a surprising amount of detail and I can guarantee you I can find a domain expert in any domain who can breathlessly fire off a list of things broken about their domain. Yet you or I who are in no position to change those things nor really have much more than a surface-level interest in them don’t need to be aware of every one of those problems. If every problem was shouted from every rooftop, I’m pretty sure humanity would go deaf.

                                    I’m not sure you realise how politically charged this statement is. What you just wrote suspiciously sounds like “There is no alternative”. Middle boxes aren’t like gravity. Humans put them there, and humans can remove them. If they’re a problem, complaining about them can raise awareness, and hopefully cause people to make better middle boxes.

                                    I don’t mean to draw any parallel to world politics, though humans being human there will always be overlap. That being said, I think understanding why it’s not trivial to remove middleboxes from the equation is a very important part of understanding the problem here and exactly why I find so many rants unhelpful. The reality is that hardware manufacturers are trying to cut costs and hire cheap, understaffed development teams who make crappy middleboxes, which are then used by ISPs who will attempt to use a middlebox forever until it either breaks or someone threatens them with legal action, because margins are so low. This is exacerbated by the ecosystem of ISPs in an area. There’s more, it’s a complicated topic, but all of that gets lost if you’re angrily ranting. It’s helpful to understand the incentives/problems that created this broken state so we don’t inadvertently create another set of broken incentives when the time/opportunity comes to fix them. In networking that time is around the corner as QUIC/HTTP3 is increasingly being proposed as the way forward to allow the sorts of applications that the old Internet envisioned. Understanding the problem well here is key so we don’t run into yet more ossification.

                                    On the other hand, if everyone thinks middle boxes are “reality”, and that the only choice is to work with them, that will make it so. I can’t have that, so I’ll keep complaining whenever I have to do some base64-JSON->deflate->HTTP insanity just to copy some Protobuf from one machine to another (real story).

                                    Accepting that something is “reality” doesn’t stop folks from trying to improve the situation. You don’t constantly need to beat the drum of how broken something is just to fix it. Accepting reality is to also empathize with the past decisions that brought us into the current state. For example, I think IPv6 should be adopted by everyone and everywhere; NAT is a silly crutch stopping middleboxes from having to leave IPv4 addresses (and raising the value of existing blocks owned by certain entities, but I digress.) But writing a long, angry rant about how NAT sucks doesn’t help anyone; it doesn’t help my family overcome their NATs nor does it help me come up with a less complicated network topology. In the meantime Wireguard, Zerotier, and Yggdrasil are taking matters into their own hands and helping bring the full internet back despite middleboxes. That doesn’t mean I’ll ever stop pushing netops to support IPv6 nor will I stop pushing netops to let non-TCP and non-UDP traffic through their middleboxes. But there’s something to be said about actually solving a problem and not just complaining about it. In fact, I’d say that trying to solve a problem despite the broken state of the problem is perhaps the strongest statement on how broken things are. “Look at thing!” I say, “it sucks so much I had to route around it”.

                                    Having said that, I run up against my own screed. Programmers love to complain and rant, more so than any other domain that I’m familiar with. I need to accept the reality that this hasn’t changed in the past and will not change going forward. Still, I voice my opinion from time-to-time about the fact. Overall I’m happy that this site has a rant tag because I can filter out the rants from my headspace and only view them when I want to (like now.)

                                    1. 3

                                      The people with the knowledge and ability to fix the situation, or to provide workarounds, are often the people who are aware of the problem.

                                      Political leaders are often the only folks who can make short term decisions on foreign policy. Yet their decisions are often influenced by what they believe their people will think of their decision. If they anticipate that a given decision will be unpopular, they are more likely to not do it. Thus, discussing foreign policy in a bar or on online forums does influence foreign policy. The effect is very very diffuse, but it’s real.

                                      People with knowledge and ability to fix the situation, if they’re not psychopaths, are likely to empathise with whatever they believe the “normies” would feel about it, making it a similar situation to politicians.

                                      Accepting that something is “reality” doesn’t stop folks from trying to improve the situation.

                                      The choice of words is important. You didn’t just say “accept reality”, you also said “work with reality”, which generally implies not only accepting what reality is, but also accepting that you’re powerless to change it. Directed at someone else, it also tends to chastise them for being idealistic fools.

                                      1. 1

                                        Thus, discussing foreign policy in a bar or on online forums does influence foreign policy. The effect is very very diffuse, but it’s real.

                                        This is where we disagree. You think it’s real but I think it’s not. I think the world is full of people being unhappy about things, and without a concerted political front you’ll just be that person on their soapbox ranting at crowds; the silent majority ignores the soapbox ranter. Anyway this is straying out of technology into politics so I’ll stop here.

                                        Directed at someone else, it also tends to chastise them for being idealistic fools.

                                        Fools no, but idealistic, yes. I know that’s anathema here on Lobsters where everyone wants to resonate with their code and have their personal values reflected in and pushed by their work, but I’m comfortable with that not being the case for myself. I’m very happy not having opinions about most things and accepting that there’s a Chesterton’s Fence to most issues in reality.

                                  2. 2

                                    Yes definitely, I agree we should try to do better but not denigrate the work of the past …

                                    Although in thinking about this more, I think there is a pretty important difference between networking and software. The incentives are mixed in both cases, but I’d say:

                                    • In networking the goal is to interoperate … so people make it happen, even the companies trying to make money.
                                    • In software interoperability is often an anti-goal. There is a big incentive to create walled gardens

                                    But yeah overall I really hope everyone writing software thinks about the system as a whole, the ecosystem, and how to interoperate. Not just the concerns of their immediate work

                                2. 11

                                  The person who wrote this article is missing entire decades of context, and without that context, it’s very easy to dismiss mistakes in design as obvious oversights or incompetence. I look forward to the day that someone looks at the author’s code a few years from now and says, “wow, what a flaming pile of yuck!”

                                  I can’t express enough how much I agree with this. C was designed for a specific problem and solved it well. Now, 40 years later, people complain about its deficiencies, yet barely question the fact that we (as software developers) haven’t come up with any usable and widely accepted alternative to binary interfaces. Apparently this industry isn’t as innovative as it likes to perceive itself…

                                  1. 5

                                    Yes. Today we have byte-addressable two’s complement machines, but back when C was first designed? There were computers with addressable units from 9 to 66 bits in size, and the C compilers that K&R put out were retargeted (by others) for such machines. By the time 1989 rolled around, the standards committee didn’t want to break any existing C code, so we got the standard we got. It was a different time back then.

                                3. 3

                                  I’m trying to materially improve the conditions of using literally any language other than C.

                                  My problem is that C was elevated to a role of prestige and power, its reign so absolute and eternal that it has completely distorted the way we speak to each other.

                                  This is highly related to my recent post, A Sketch of the Biggest Idea in Software Architecture

                                  In those terms:

                                  • Various C ABIs are haphazardly defined and evolved. But they are implicit narrow waists because so much code is written in C. Practically speaking, to solve an interoperability problem between Rust and Python, or Swift and Ruby, you will need to speak some kind of C API or C ABI.
                                  • Type systems have limitations when your program is written in multiple languages!
                                    • Here’s the kicker: ALL programs that are not written in C depend on the semantics of multiple languages! Because the kernel interface is specified in C on both Unix and Windows as a bunch of header files. (If your program doesn’t use the kernel at all, then you’re exempted from this … it’s also likely not a very interesting program :-) )
                                    • If you want things to look like Swift or Rust functions, you will be sorely disappointed when doing runtime composition with C code.
                                    • runtime composition with ABIs gives you bad error messages
                                    • build time composition is also hard because you need a whole C compiler …

                                  So basically we need a precisely specified narrow waist, not a haphazardly evolved one … But we will still need a narrow waist. You inherently have an O(N x N) explosion of languages trying to talk to each other.
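
                                  A back-of-the-envelope version of that O(N x N) claim, as a small Python sketch. The function names are made up, and it counts one binding per ordered pair of languages (plus one adapter into and one out of the waist per language), which is an assumption about what a “binding” means rather than something stated above.

```python
def direct_bindings(n_languages: int) -> int:
    """Bindings needed if every language calls every other one directly."""
    return n_languages * (n_languages - 1)


def waist_bindings(n_languages: int) -> int:
    """Bindings needed if each language only targets one shared waist:
    one adapter into the waist and one out of it per language."""
    return 2 * n_languages


for n in (2, 5, 20):
    print(n, direct_bindings(n), waist_bindings(n))
```

                                  With 20 languages that is 380 pairwise bindings against 40 waist adapters, which is why some waist survives even when the current one is badly specified.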

                                  I find that programmers misunderstand this a lot… they think because Python functions have roughly similar syntax to Rust functions, or to C functions, it should be “easy” to interoperate. But the semantics are completely different … The lowest common denominator ends up being very low, although you can special case for instances where you’re only passing integers, etc. If you’re trying to pass big heap allocated structures then you have a big problem!

                                  i.e. basically the lowest common denominator ends up more like message passing and IPC. It doesn’t look like RPC very much

                                  (copy of Reddit comment)

                                  1. 3

                                    Because the kernel interface is specified in C on both Unix and Windows as a bunch of header files.

                                    Linux at least defines system calls in terms of numbers and buffers. I have no doubt that they were heavily influenced by C, but how we make a Linux system call is now independent from C.

                                    I’m not sure Windows even has a stable kernel ABI. Instead we’re supposed to link to system dlls.

                                    So basically we need a precisely specified narrow waist, not a haphazardly evolved one

                                    Oh yes we do. Also it would be so nice for this waist to be narrower still: while it’s nice to have one ABI per platform, it’s a bummer when we have over 170 platforms…

                                    1. 2

                                      Yes Linux maintains a stable kernel ABI, unlike the BSDs. Windows does too BTW – you can run 30 year old binaries on modern Windows machines.

                                      I didn’t fully explain it, but that is what I meant by these parts:

                                      http://www.oilshell.org/blog/2022/03/backlog-arch.html#characteristics-of-narrow-waists

                                      POSIX C APIs → Linux x86 ABI

                                      The narrow waist was supposed to be C, but that doesn’t help other languages. And C isn’t defined unless you add in the ABI emitted by the C compiler for a particular architecture, so we got the Linux x86 ABI as our de facto standard / narrow waist for a large class of apps.


                                      Justification:

                                      • Windows emulates Linux with WSL
                                      • FreeBSD and Illumos also emulate Linux:

                                      Another example is when Illumos borrowed FreeBSD’s Linux syscall ABI emulation in order to run user-uploaded Docker containers. This is dynamic, runtime composition with ABIs, not static composition by compiling code against kernel APIs expressed as C header files.

                                      On the other hand, Win32 is becoming a de facto narrow waist for GUI apps on Linux, also mentioned in the post.


                                      So I think I need to coin a term like “sloppy waist” or “messy waist” … Something we didn’t design, but just arose through “Emulation of Waists”.

                                      1. 5

                                        Narrow waist vs free as in beer belly.

                                        1. 2

                                          Ha, a beer belly could be a fitting image … although I think it is better to avoid any body connotations – the narrow waist is for an hourglass, not a person :-)

                                          A lot of the “accidental” waists tend to come from whatever companies happen to be successful, e.g. Microsoft and Intel.

                                          I think a funny thing is that sometimes those companies do have clean slate efforts. I think Intel’s Itanium was supposed to fix a lot of problems with the ISA … but they probably got killed by their more successful older brother.

                                        2. 2

                                          Nitpick: The Windows kernel ABI, as in syscalls, was not stable (historically at least), but the userland libraries were.

                                          1. 2

                                            Yes Linux maintains a stable kernel ABI, unlike the BSDs. Windows does too BTW – you can run 30 year old binaries on modern Windows machines.

                                            Windows does have a stable ABI for sure, I just heard it didn’t have a stable kernel ABI. That the stable layer was a little bit higher, and was provided by system DLLs instead of the kernel directly. But that was a nitpick anyway. What really matters is we have a stable ABI somewhere.

                                            1. 1

                                              “accidental waist”?