1. 24

  2. 12

    One thing he didn’t mention is that the return value of main() is also a bit of a construct. There’s no concept of returning a value from usermode to the kernel on function exit; the return value is given to the kernel as part of an _exit() or similar call. That only works because main() is not the real entrypoint of the program: something else has to call main() and then make that call with its result.
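
    A minimal sketch of that hand-off, assuming a POSIX system with fork() and waitpid(). The names program_main, fake_startup and observed_exit_status are made up for illustration: program_main stands in for main(), and fake_startup stands in for the startup code that calls it. A real libc would call exit() rather than _exit() so that cleanup handlers run first.

    ```c
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical stand-in for a program's main(). */
    static int program_main(int argc, char **argv)
    {
        (void)argv;
        return argc > 1 ? 3 : 0;
    }

    /* Hypothetical stand-in for the startup code: call "main", then pass
       its return value to the kernel through the exit system call. */
    static void fake_startup(int argc, char **argv)
    {
        _exit(program_main(argc, argv));
    }

    /* Run fake_startup in a child process and report the exit status the
       kernel recorded for it, or -1 on failure. */
    static int observed_exit_status(int argc, char **argv)
    {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            fake_startup(argc, argv);   /* does not return */
        if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
            return -1;
        return WEXITSTATUS(status);
    }
    ```

    The parent only ever sees what was passed to _exit(); the "return" from program_main never leaves the child's address space on its own.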

    Although the code he linked to kind of implies it, one other telltale sign that main() is a construct is the ability to create objects that invoke constructors prior to main() running. The kernel isn’t constructing those objects - it’s just a list of code to execute before (and possibly after) main.
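
    The objects-with-constructors case is C++, but the same mechanism can be seen from C. A rough sketch, assuming GCC or Clang: the constructor attribute is a compiler extension, not standard C, and the function and variable names here are made up.

    ```c
    /* Set by code that the startup routine runs before main(). */
    static int ran_before_main = 0;

    /* GCC/Clang extension: this function is added to the list of code
       that the startup routine executes before it calls main(). */
    __attribute__((constructor))
    static void before_main(void)
    {
        ran_before_main = 1;
    }

    /* Meant to be called from main(): returns 1 if before_main()
       has already run by then. */
    static int constructor_already_ran(void)
    {
        return ran_before_main;
    }
    ```

    Calling constructor_already_ran() as the first statement of main() should return 1 on those compilers, even though nothing in main() ever called before_main().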

    On Windows the difference with main() is even more pronounced. The real entrypoint is specified to the linker, but is commonly mainCRTStartup(). This receives no arguments and returns no value. It is responsible for getting the command line from GetCommandLine() and parsing it to form argc/argv, then invoking main.

    The other strange thing on Windows is that there are many potential entrypoints, including main, wmain, WinMain, etc., depending on the type of program. The linker has the rather awful task of first finding the entrypoint in the code you’re compiling, then working backwards from there to find the “real” startup code that will call it. The linker cannot just include all of the startup code, because if it did, it would hit unresolved externals trying to resolve the entrypoints that your code doesn’t use. The linker knows to discard entire modules if there is no way in and no way out of the module, so this process works, but it depends on each piece of startup code residing in its own module. Put them in the same module, and the result is unresolved externals.

    1. 3

      On Windows […] the real entrypoint is specified to the linker

      This is actually the case on unix, too. It’s just that linkers generally default to _start, and no one has a good reason to override this, so in practice it’s always _start. _start is a tiny assembly stub which calls a function like __libc_start_main after collecting the addresses of main, argc, argv, and envp (if applicable); the latter three need to be fetched in an arch-dependent way. __libc_start_main is usually written in C (though likely still dependent on the host OS; it’s not completely portable).

      You can also tell that main isn’t the real entry point of a program because of atexit().
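
      A small sketch of the atexit() point, assuming POSIX pipe(), fork() and waitpid(). The names program_main, on_exit_handler and run_and_collect are made up; program_main stands in for main(), and the child calls exit(program_main()) the way startup code would.

      ```c
      #include <stdlib.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      #include <unistd.h>

      static int report_fd = -1;   /* write end of a pipe, set in the child */

      /* Runs inside exit(), i.e. after "main" has already returned. */
      static void on_exit_handler(void)
      {
          if (write(report_fd, "B", 1) != 1) {
              /* nothing useful to do this late in the process */
          }
      }

      /* Hypothetical stand-in for main(): register the handler, do the
         program's work, and return. */
      static int program_main(void)
      {
          if (atexit(on_exit_handler) != 0)
              return 1;
          if (write(report_fd, "A", 1) != 1)
              return 1;
          return 0;
      }

      /* Run program_main in a child and collect the bytes it wrote, in
         order. Returns the number of bytes collected, or -1 on failure. */
      static int run_and_collect(char *out, size_t cap)
      {
          int fds[2];
          size_t n = 0;
          ssize_t r;
          pid_t pid;

          if (pipe(fds) != 0)
              return -1;
          pid = fork();
          if (pid < 0)
              return -1;
          if (pid == 0) {
              close(fds[0]);
              report_fd = fds[1];
              exit(program_main());   /* handlers run here, after "main" */
          }
          close(fds[1]);
          while (n < cap && (r = read(fds[0], out + n, cap - n)) > 0)
              n += (size_t)r;
          close(fds[0]);
          waitpid(pid, NULL, 0);
          return (int)n;
      }
      ```

      The handler’s byte arrives after the one written by program_main, so some code other than main is still running once main has returned.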

    2. 3

      The two argument version of main() goes back to at least Research Unix V4’s exec(2),

      That interface is 47 years old!

      1. 1

        Reserving two bytes at the start ensures that no variable or other thing in the data section can be located at address 0 and so C NULL is always distinct from valid pointers.

        Nice. Can you use address 0 as a 2B scratch memory then?