1. 15

Sometimes I have crazy ideas, and this is one of them. They don’t always work out. ;)

  2. 8

    Turning it into a static array in the data segment and then stripping the resulting binary gives you an executable only ~11k larger than the text file itself (going by the listing below) and a compile time of under a second:

    $ time c99 -Os foo.c
    
    real	0m0.539s
    user	0m0.515s
    sys	0m0.023s
    
    $ strip -s a.out
    
    $ ls -l a.out pg2162.txt 
    -rwxrwxr-x. 1 baron baron 443064 Dec 20 12:19 a.out
    -rw-rw-r--. 1 baron baron 431888 Dec 20 12:03 pg2162.txt
    

    I didn’t check the assembly but I’m assuming that when using the method in TFA, actual move instructions are generated for each assignment and those aren’t particularly small. Putting it in the data segment just plops the chars in there unmolested.

    EDIT

    Yeah, it’s generating move instructions. GCC (I didn’t test Clang) attempts to optimize for size by packing everything into 64-bit quantities and then performing the move all at once for each block of eight characters rather than moving one character at a time.
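
    For anyone curious what that packing amounts to, here is a minimal sketch in C. It is not the code GCC actually emits, and the function names and sample bytes are made up for illustration. The point is only that eight one-byte stores leave the same bytes in memory as a single 64-bit store, which is presumably what lets the compiler merge them:

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Eight separate one-byte stores, the way the source spells them out. */
    void fill_bytewise(char *dst)
    {
        dst[0] = 'A'; dst[1] = 'B'; dst[2] = 'C'; dst[3] = 'D';
        dst[4] = 'E'; dst[5] = 'F'; dst[6] = 'G'; dst[7] = 'H';
    }

    /* The same eight bytes moved as a single 64-bit quantity. */
    void fill_packed(char *dst)
    {
        uint64_t block;
        memcpy(&block, "ABCDEFGH", sizeof block); /* gather 8 chars into one 64-bit value */
        memcpy(dst, &block, sizeof block);        /* write them out as one 8-byte block */
    }

    /* Returns 1 if both approaches leave identical bytes in memory. */
    int fills_match(void)
    {
        char a[8], b[8];
        fill_bytewise(a);
        fill_packed(b);
        return memcmp(a, b, sizeof a) == 0;
    }
    ```

    Calling `fills_match()` from a `main` should return 1 regardless of byte order, since both copies go through `memcpy` in the same direction.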

    1. 2

      How did you move that data array into the data segment?

      Overall I find this quite interesting. I wonder what WebAssembly would look like for this. It is sort of like the challenge of building a prepackaged/compressed website.

      1. 3

        Essentially, I changed the declaration to static const and changed the way it was populated, from a bunch of assignments like

           foo[0] = 65;
        

        to a static array initializer like

           foo[39393] = {
              65,
              38,
              41,
              ...
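
        A scaled-down sketch of the two variants side by side, in case it helps. The names `text_assigned`, `fill_text`, `text_static`, `get_text` and `text_len` are hypothetical, the element type is assumed to be `char`, and only the first three byte values come from the snippets above:

        ```c
        #include <stddef.h>

        /* Variant A: populated at run time. Each assignment is code,
           so the compiler has to emit store instructions for it. */
        char text_assigned[4];

        void fill_text(void)
        {
            text_assigned[0] = 65;
            text_assigned[1] = 38;
            text_assigned[2] = 41;
            text_assigned[3] = 0;
        }

        /* Variant B: static const with an initializer. The bytes are
           emitted as data (typically a read-only data section), and no
           instructions are needed to put them there. */
        static const char text_static[] = { 65, 38, 41, 0 };

        /* Accessors so other code can reach the static array. */
        const char *get_text(void) { return text_static; }
        size_t text_len(void) { return sizeof text_static; }
        ```

        After `fill_text()` runs, both arrays hold the same bytes. The difference is only in how they got there, which is where the size savings would come from.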
        
        1. 2

          ah, got it. thanks.

      2. 2

        I was pretty sure compilers wouldn’t just zip up the contents of a static array. That’s why I put it in code, because that’s where I know compilers perform optimizations.

        1. 2

          I had assumed as much, sorry. Wasn’t trying to steal any thunder, I was just testing for my own curiosity.

          What’s neat, IMHO, is the massive difference between no optimization and -Os: almost 3 MB for the unoptimized version and about 900k for the optimized one.

          1. 2

            Heh, no thunder was stolen :) I agree, the difference between unoptimized and -Os is quite significant!

      3. 2

        I think there’s a whole lot of crap that gcc and clang dump into the output by default that can be removed. I don’t think you need libc, etc.