1. -1

    According to section 6.2.5.12, integers are arithmetic types. This, in combination with the second rule, means that i will now be 0.

    But the text you cite says: “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:”. I fail to see how i; is equivalent to static i;. As far as I can tell, i; being initialized to zero just so happened to be done by the compiler and/or resident memory, conveniently, but there’s no actual guarantee of that.

    Plus, i is implicitly (signed) int, so --i; is signed integer overflow, and also well into undefined behavior territory. I’d imagine a compiler would be well within its rights to just optimize the entire function to nothing because UB occurs first thing, since once you hit UB, all bets are off.

    1. 6

      i has external linkage, and static storage duration.

      C99 6.2.2p5

      If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.

      C99 6.2.4p3

      An object whose identifier is declared with external or internal linkage, or with the storage-class specifier static has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.

      The static storage-class specifier means that the linkage is internal or none, depending on whether the declaration is at file scope or block scope, and the storage duration is static. Objects with external linkage (for example extern int i;), or no linkage (for example static int i; at block scope), also have static storage duration.

      1. 3

        Thank you so much for taking the time to look up the relevant pieces in the standard!

        1. 5

          No problem :) I spent a long time studying these pieces of the standard when writing cproc, and I know how tricky they are.

        2. 2

          Drats, I actually got out-language-lawyered. Learned something new, cheers!

          1. 2

            Great link to an HTML version of the standard; I’ve been using the PDF, and it’s much harder to navigate. I’m very impressed by your compiler too; it’s much further along than mine: https://github.com/jyn514/rcc.

            I noticed your compiler is a little inconsistent about functions without prototypes:

            $ ./cproc-qbe
            int f() { return 0; }
            int main() { f(1); }
            export
            function w $f() {
            @start.1
            @body.2
            	ret 0
            }
            <stdin>:2:17: error: too many arguments for function call
            $ ./cproc-qbe
            int f();
            int main() { return f(1); }
            export
            function w $main() {
            @start.1
            @body.2
            	%.1 =w call $f(w 1)
            	ret %.1
            }
            
            1. 3

              The difference between

              int f();
              

              and

              int f() { return 0; }
              

              is that the first declaration specifies no information about the parameters, and the second specifies that the function has no parameters. When calling a function, the number of parameters must match the number of arguments, so I believe the error message is correct here.

              C99 6.7.5.3p14

              An identifier list declares only the identifiers of the parameters of the function. An empty list in a function declarator that is part of a definition of that function specifies that the function has no parameters. The empty list in a function declarator that is not part of a definition of that function specifies that no information about the number or types of the parameters is supplied.

              C99 6.5.2.2p6

              If the number of arguments does not equal the number of parameters, the behavior is undefined.

              I’m very glad C2X is removing function definitions with identifier lists (n2432). So int f() { return 0; } will actually be the same thing as int f(void) { return 0; }.
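
              For illustration, here is the explicitly prototyped form, which means the same thing in C99 and C2X and makes the mismatched call a constraint violation (a hypothetical translation unit, not code from cproc):

              int f(void) { return 0; }   /* prototype: f takes no parameters */

              int main(void)
              {
                  return f();             /* f(1) here would violate a constraint,
                                             so the compiler must diagnose it */
              }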

              1. 3

                I missed 6.7, thank you! That makes things easier for me to implement I think :)

          2. 4

            I am always afraid to answer to detailed questions like these, because I am still learning a lot about C and not always sure. However, I believe that i actually has static storage duration, because it is defined outside the scope of any function. This means that the variable will be persistent throughout the whole program.

            Edit: I said that i would have an automatic storage duration, but the arguments are for a static storage duration. This is what I actually meant to write. My apologies for the inconvenience.

            The second point is a good one, which I responded to earlier in a comment under my post. You do indeed depend on your compiler for that. However, this line was written specifically for gcc.

            1. 0

              I am always afraid to answer to detailed questions like these, because I am still learning a lot about C and not always sure.

              Very few people actually know C. Given that writing it is actually just an extremely elaborate exercise in language lawyering, it truly is a language only a lawyer could love. The worst that could happen is that you get corrected if you give a wrong response—meaning that you’d learn something from it.

              However, I believe that i actually has an automatic storage duration, because it is defined outside of the scope of a function or whatsoever. This means that the variable will be persistent throughout the whole program.

              Indeed so, that’s my understanding as well. But that also means that its initial value is indeterminate: automatic storage duration is mutually exclusive with static storage duration. Therefore, you cannot actually get the zero-initialization you’d get from static storage duration, and instead the value is indeterminate because it has automatic storage duration.

              1. 6

                Every object declared at file scope has static storage duration. The only objects that have automatic storage duration are those declared at block scope (inside a function) that don’t have the storage-class specifier static or extern.
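
                A small sketch of how those rules play out (the declarations here are just illustrative, not the code from the post):

                int i;             /* file scope, no storage-class specifier: external
                                      linkage, static storage duration, implicitly
                                      initialized to 0 */

                void f(void)
                {
                    int a;         /* block scope, no static or extern: automatic storage
                                      duration, value is indeterminate until assigned */
                    static int s;  /* block scope with static: no linkage, but static
                                      storage duration, so implicitly initialized to 0 */
                    (void)a; (void)s;
                }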

                1. 3

                  The worst that could happen is that you get corrected if you give a wrong response—meaning that you’d learn something from it.

                  I agree, which is why I always answer anyway. Thank you.

                  Indeed so, that’s my understanding as well. But that also means that its initial value is indeterminate: Automatic storage donation is mutually exclusive with static storage donation.

                  I changed my post above. I made a mistake while writing it, due to a lack of time. The fact that i is defined outside the scope of a function means that it has static storage duration, as mcf described.

            1. 8

              libc

              STATUS: NOT YET

              Uh musl?

              1. 9

                There are many items that can be added here. For example, for the base system there’s sbase from suckless; I have no idea why Drew is re-implementing his own.

                • For init system there’s runit, as well as some others like OpenRC (I have no experience with the latter though).
                • C compiler: tcc
                • GUI: fltk
                • Package manager: xbps, pacman
                • Shell: dash, mksh
                • High-level programming: lua is probably the best/simplest here.

                And probably some more … this is just off the top of my head.

                1. 6
                  • cproc is nicer than tcc in my opinion.
                  • janet is nicer than lua in my opinion.

                  But overall I agree.

                  1. 5

                    As the current maintainer of sbase, I’m also a bit confused about the re-implementation of UNIX tools. As far as I can tell, the main design goal differences are:

                    • To not reuse code between utilities, which I think is a mistake. Many tools use the same flags for symlink behavior for recursive operations, have to deal with mode strings in the same way, etc.
                    • To behave randomly when behavior is implementation defined. It is already hard enough to get the rest of the world to work with plain POSIX tools. I’ve spent quite a while making sure sbase works well with various scripts out there, for example building the Linux kernel, and I don’t see the point of making this harder than it needs to be.

                    However, I think the addition of a test suite is great. It’d be neat to see if it could be extracted into a standalone project, similar to how libc-test can be used for multiple libcs.

                    1. 2

                      The first goal to not reuse code is definitely a mistake, but I think that will become clear to them as they write more tools.

                1. 2

                  Overall, I think the answer is yes, at least for the types of projects I work on.

                  Which compiler does your Makefile support?

                  One that accepts the POSIX-specified options and also supports common warning flags like -Wall, -Wextra, and -pedantic. But at least I can choose whatever compiler I wish; meson only accepts a hard-coded list of compilers: https://github.com/mesonbuild/meson/issues/5406

                  Does your Makefile support out-of-source build?

                  No, this is difficult with POSIX make, and something I wish could be done easily.

                  Does your Makefile support Windows at all?

                  Usually I’m not concerned with this, since my projects don’t target Windows. Does nmake support POSIX makefiles?

                  Is your Makefile a GNU makefile, or BSD makefile?

                  Both; just make sure to follow the POSIX standard.

                  Does your Makefile support cleaning the project from all autogenerated artifacts?

                  Yes, with a phony clean target.

                  Do you support a situation when the compiler/SDK will be upgraded on the system?

                  Do you track the dependencies on the libraries installed in the system?

                  No, the user will have to clean and rebuild.

                  Do you support setting a Release/Debug build of your project?

                  Any build type is supported; the user just sets the CFLAGS and LDFLAGS they want.

                  Does your Makefile support passing custom CFLAGS or LDFLAGS?

                  Yes, they can set them in the environment.

                  Are you using thirdparty libraries in your project?

                  Preferably not, but if I do, they are detected with a small configure script using pkg-config, flags, and/or environment variables.

                  I don’t want to use CMake, because the project is small and it’s not worth it.

                  If I use CMake, that’s an additional build-time dependency, and one that requires a C++ compiler, even if my project is in C. make is available almost everywhere.

                  What if someone just wants to use Eclipse, Xcode, Visual Studio, CodeBlocks, etc?

                  I’m not familiar with those tools, but surely they have a way to invoke make and show the output somewhere?

                  Does your Makefile support showing the full command line used to compile a compilation unit?

                  Yes, make does this by default.

                  1. 1

                    Just two points:

                    1. Even if you get it right, it doesn’t mean that other people will also get it right.
                    2. For each ‘no’ in this list, there’s a CMake build script that makes you forget the issue exists; it just supports it.
                    1. 1

                      It’s just as easy to get a CMake build script wrong as it is to get a Makefile wrong. I’ve seen some atrocious CMake build scripts, and some brilliant Makefiles. I’ve also seen great CMake scripts and awful Makefiles!

                      Just because you get CMake right doesn’t mean others will.

                      1. 1

                        That’s true.

                        But it’s easier to write a good CMake script than a good Makefile, if only because CMake has a lot of things already implemented, so people don’t need to re-implement them badly.

                  1. 2

                    As far as I know, it’s made by the same person who created the musl libc, which is considered well written and clean.

                    nvm.

                    1. 2

                      No, musl libc was created by Rich Felker. Jens Gustedt is involved with WG14, the ISO C working group.

                    1. 1

                      These examples work on musl libc since it implements dlclose as a no-op: https://wiki.musl-libc.org/functional-differences-from-glibc.html#Unloading-libraries

                      1. 2

                        Nice work. If you get around to it, “restrict” is a great C feature that gcc/clang have ignored in favor of dangerous “optimizations”. Can while (n-- > 0) *x = *y be optimized by using vector registers or similar? Traditional C says no, because we don’t know whether the pointers overlap, for example if y = x + 1 or something. Modern GCC/Clang say “Oh boy, overlapping pointers is undefined, so we can just assume they do not” and compile to something that fails if they do. But there is no reason why a C compiler can’t force the programmer to make it explicit: assume no overlap only if the programmer told us *(restrict)x = *(restrict)y or similar.

                        1. 1

                          Thanks for the suggestion. I think that at the moment there is quite a bit of lower-hanging fruit in QBE, but it’s something to keep in mind.

                          I think we’d need to come up with some syntax in QBE IL to communicate the restrict information from the frontend to QBE, and I’m not quite sure how that would look. I’d have to spend some more time reading and understanding C11 6.7.3.1 (formal definition of restrict).

                          1. 1

                            Unfortunately, the “restrict” definition in the C11 standard is absolute murk. I think it actually makes no sense, but the examples and notes give a clear indication of what it is supposed to mean. For example:

                            EXAMPLE 2 The function parameter declarations in the following example

                            void f(int n, int * restrict p, int * restrict q)
                            {
                                while (n-- > 0)
                                    *p++ = *q++;
                            }

                            assert that, during each execution of the function, if an object is accessed through one of the pointer parameters, then it is not also accessed through the other.
                            
                            1. 1

                              While I am asking for features (sorry), there is a similar issue with “volatile”, which is a key feature but needs to be passed into the back end.

                              1. 2

                                Yep, I’m well aware of this one. Right now there is no way to tell QBE not to optimize loads and stores, and I’ve been talking to the author about adding this feature.
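
                                For context, a tiny sketch of the kind of code that needs this (the register address is made up for illustration): if the backend is free to hoist the load out of the loop, the program spins forever on a stale value.

                                /* hypothetical device register; address made up for illustration */
                                #define READY_FLAG ((volatile unsigned *)0x40001000)

                                void wait_ready(void)
                                {
                                    while (*READY_FLAG == 0)
                                        ;   /* every iteration must perform a fresh load */
                                }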

                          1. 1

                            How portable is your compiler, and how easy is it to target a new architecture?

                            I’m utterly unfamiliar with QBE. Looking at the QBE page, I see it says “Only x64 platforms are currently supported. ARM support is planned”, but I see you have AArch64 working, at least in your branch.

                            Without having to do a deep dive, does QBE or the compiler make non-portable assumptions? I’m thinking of things like big endianness, odd byte width (9-bit bytes), NULL != 0, 36-bit registers, etc.

                            I’m thinking specifically of the Honeywell 6000, but many of the same concerns would apply to other systems, such as the PDP-10.

                            1. 4

                              Unfortunately for your 36-bit dream, QBE strongly assumes 8-bit bytes, and I don’t think they want to change it. I also agree assuming 8-bit bytes is the right choice for QBE.

                              1. 1

                                Thanks for the response. We are currently exploring other options for our project, but we are always looking at all new compiler technology that comes around.

                              2. 3

                                The compiler itself (excluding the driver) should be completely standard C; no POSIX functions are used. I believe this is true of QBE as well.

                                I think the QBE page is out of date, since aarch64 is indeed supported. Judging by the size of the existing amd64 and arm64 directories in QBE, it looks like it takes ~1500 lines for a new architecture.

                                Regarding non-portable assumptions, QBE was designed for machines with 32/64 bit integer and floating point registers, and this is reflected in the basic data types of the IL (see the IL doc). So, I expect that it would not work so well with the type of machines you’re describing.

                                In the frontend, currently the primitive data types have hard-coded sizes, since these match between x86_64 and aarch64. There are also a few isolated places that assume little-endian in the bit-field implementation. But both of these are pretty easily fixable.

                                For NULL != 0, conversion from integer to pointer is implemented as a copy, so yes, I do make the assumption that null pointers are 0. However, I’m curious about how that works. Do you mean there are platforms where the C expression NULL != 0 evaluates to 1, or just where the underlying integer for a null pointer is non-zero? C defines a null pointer constant as an integer constant 0, possibly cast to void *. That means this expression is a comparison between two 0 values of integer type, or two null pointers which the standard says must compare equal. So I think the result is 0 in all cases.
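
                                A quick sketch of that reasoning (the interesting part on such platforms is the representation of the null pointer, not the result of the comparison):

                                #include <stddef.h>
                                #include <stdio.h>

                                int main(void)
                                {
                                    void *p = 0;                /* the integer constant 0 converts to a
                                                                   null pointer, whatever its bit pattern is */
                                    printf("%d\n", p == NULL);  /* 1: any two null pointers compare equal */
                                    printf("%d\n", NULL != 0);  /* 0: a comparison involving null pointer
                                                                   constants, so it is false everywhere */
                                    return 0;
                                }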

                                1. 2

                                  The H6000 is an odd beast.

                                  There are existing C implementations, but we are looking for (possible) new options.

                                  Reflecting the design of the hardware, the standard C implementation has char types that are 9 bits, ints and floats that are 36 bits, and longs that are 72 bits. There are no 18-bit (short) types. The standard signed char type is actually 8 bits plus a high-order sign bit, representative of a range of -256 to 254. The unsigned char type is 9 bits, representing a value of 0 to 511. Importantly, the pointer is 72 bits wide (so it does not fit in an integer type, a common cause of porting headaches), as it consists of (if my memory is correct) a 36-bit offset location and a 36-bit segment reference.

                                  However, I’m curious about how that works. Do you mean there are platforms where the C expression NULL != 0 evaluates to 1, or just where the underlying integer for a null pointer is non-zero? C defines a null pointer constant as an integer constant 0, possibly cast to void *. That means this expression is a comparison between two 0 values of integer type, or two null pointers which the standard says must compare equal. So I think the result is 0 in all cases.

                                  On the Honeywell, the NULL is a pointer to void, which would be “0|-1” (location 0 of segment -1, or vice versa, I’d have to look) and is not equal to integer 0, but is an “impossible” value.

                                  1. 2

                                    Not to mention that the frontend itself could probably quite easily be ported to emit assembly directly (like 8cc), to emit LLVM IR text, or to link against the LLVM C API.

                                1. 4

                                  It’s very impressive that it can build GCC; it’s about the same size as 8cc (if we just count the cc code, not the additional code from the QBE backend) but far more capable. It shows that the task of compiling C code can be implemented in a modular way (separating frontend and backend) in an approachable codebase. gcc and clang are roughly 100x bigger in scale (lines of code), but they do include very powerful linkers and extra programming languages/frontends.

                                  Can it build any libc?

                                  I’m not sure how to use it to compile my own stuff, though: I get an error that the “random” function is unknown, even though my code includes the header.

                                  1. 6

                                    Can it build any libc?

                                    Not yet. The main missing pieces for building musl are inline assembly, volatile, and long double, all of which require some new features in QBE. I’m hoping to work with the author to implement them. Building musl is definitely on the roadmap.

                                    I’m not sure how to use it to compile my own stuff, though: I get an error that the “random” function is unknown, even though my code includes the header.

                                    By default, on glibc targets we pass -D __STRICT_ANSI__ to cpp, since otherwise glibc headers will use statement expressions, which are not yet supported. This has the side-effect of disabling the default feature-test macros defined by the preprocessor. My guess is you haven’t defined the right feature-test macros to get your libc to expose random, and are relying on your compiler defaults. I think you’d hit the same error if you built with CC='gcc -std=c99'. Try adding -D _DEFAULT_SOURCE to your CFLAGS.
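
                                    For example, if you would rather not touch CFLAGS, defining the macro before the first include should have the same effect (a sketch, assuming glibc and that your code uses random from stdlib.h):

                                    #define _DEFAULT_SOURCE     /* must come before any glibc header */
                                    #include <stdlib.h>

                                    int main(void)
                                    {
                                        srandom(1);
                                        return (int)(random() % 128);
                                    }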

                                    However, now that I think about it, this was because we used to define __GNUC__=1, which ended up causing more problems than it solved and has since been reverted. I think it should be okay to also remove -std=c11 and -D __STRICT_ANSI__ from config.h to get the behavior you expect by default. I’ll look into it.

                                    Thanks for trying it out!