1. 8
  1.  

  2. 4

    This hurts, because all of the examples seem so…expensive. As in:

    strcat( strcat( strcpy( _alloca( strlen( FirstString) + strlen( SecondString) + strlen( ThirdString) + 1), FirstString), SecondString), ThirdString)
    

    So, walk through the first string to count its length; walk the second; walk the third; walk the first again to copy; walk the first to find the terminator, then walk the second to copy; walk the first and second again to find the terminator, then walk the third to copy. That’s four passes through the first string, three through the second, two through the third.

    Without realizing it, I think the author is making the argument that strcpy/strcat should return pointers to the end of the string they just acted on, allowing the caller to continue to construct more on the end of it.
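
    A minimal sketch of that idea, assuming a hypothetical helper `copy_to_end` (not a standard function, though POSIX `stpcpy` behaves similarly) and a caller-supplied buffer that is already large enough:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical helper: copy src (including its terminator) to dst and
     * return a pointer to the terminator just written, so the next copy
     * can start there without rescanning what is already in the buffer. */
    char *copy_to_end(char *dst, const char *src)
    {
        assert(dst != NULL && src != NULL);
        while ((*dst = *src++) != '\0')
            dst++;
        return dst;
    }

    /* Concatenate three strings into buf. Assumption: the caller has
     * already made buf large enough. Each input is walked once here. */
    char *concat3(char *buf, const char *a, const char *b, const char *c)
    {
        char *end = copy_to_end(buf, a);
        end = copy_to_end(end, b);
        copy_to_end(end, c);
        return buf;
    }
    ```

    With this shape the copy phase touches each input once. The only remaining extra passes are the strlen calls used to size the buffer.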

    Personally I think strcat is just a terrible API; it’s basically strlen + strcpy. It’s very rare to not have already calculated the length of a string via some mechanism before calling strcat; arguably it’s invalid to call without having calculated the length beforehand lest it overflow. But if you knew the length of the string, you shouldn’t use strcat, because you don’t need the strlen part of it.

    1. 1

      strlcat does a bit better: it returns the total length of the string it tried to create, so you don’t need to call strlen multiple times. https://linux.die.net/man/3/strlcpy

      Unfortunately it’s not in glibc since the maintainers have strong opinions about it… (https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00053.html)

      1. 4

        strcat() and strlcat() are both terrible. What’s so bad about:

        char filename[FILENAME_LEN];
        snprintf(filename,sizeof(filename),"%s%s%s",pNS,NAMESPACE_SEPARATOR_STR,pclsName);
        

        No buffer overflows, one library call, easier to see the intent. I replaced about 20 lines of convoluted logic with just one call to snprintf() at work. Sigh.
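
        One thing the one-liner leaves out is detecting truncation. A hedged sketch, with `join3` as a hypothetical wrapper name (not from the thread), that uses the return value of snprintf for that:

        ```c
        #include <assert.h>
        #include <stddef.h>
        #include <stdio.h>

        /* Hypothetical helper: join three strings into dst (capacity cap).
         * Returns 0 on success, -1 if the result was truncated or snprintf
         * reported an error. */
        int join3(char *dst, size_t cap, const char *a, const char *b, const char *c)
        {
            assert(dst != NULL && cap > 0);
            int n = snprintf(dst, cap, "%s%s%s", a, b, c);
            /* snprintf returns the length the full result would have had,
             * excluding the terminator, so n >= cap means it did not fit. */
            return (n < 0 || (size_t)n >= cap) ? -1 : 0;
        }
        ```

        Even when it reports truncation, the buffer is still NUL-terminated, so the caller can decide whether a shortened name is acceptable.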

        1. 1

          It’s OK, but you pay for the call and for parsing the format string. With only a few strings, you don’t need convoluted logic:

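          size_t written=0;
          /* Caveat: strlcpy returns strlen(src), so on truncation `written` can
             exceed sizeof(filename); check it before each further call. */
          written += strlcpy(filename+written, pNS, sizeof(filename)-written);
          written += strlcpy(filename+written, NAMESPACE_SEPARATOR_STR, sizeof(filename)-written);
          written += strlcpy(filename+written, pclsName, sizeof(filename)-written);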
          
          1. 4

            Depends on how you qualify ‘convoluted logic’. The snprintf call is clearest and most self-explanatory. The strlcpy calls are less clear and have more room for bugs, and though somewhat more performant, are not optimal (vs, say, a solution which knows the lengths of strings instead of looking for nulls).
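
            For comparison, a rough sketch of what a length-aware solution could look like. The `str_view` type and `join_views` function are hypothetical names introduced here for illustration, not something from the thread or a standard library:

            ```c
            #include <assert.h>
            #include <stddef.h>
            #include <stdint.h>
            #include <string.h>

            /* Hypothetical length-carrying string view. */
            struct str_view {
                const char *data;
                size_t len;
            };

            /* Join parts[0..count) into dst (capacity cap) using the stored
             * lengths, so no input is scanned for a terminator. Returns the
             * number of bytes written, excluding the terminator, or (size_t)-1
             * if the result would not fit; dst is left untouched in that case. */
            size_t join_views(char *dst, size_t cap, const struct str_view *parts, size_t count)
            {
                assert(dst != NULL && parts != NULL);
                size_t total = 0;
                for (size_t i = 0; i < count; i++) {
                    if (parts[i].len > SIZE_MAX - total)
                        return (size_t)-1;          /* length sum would overflow */
                    total += parts[i].len;
                }
                if (total >= cap)
                    return (size_t)-1;              /* no room for the terminator */

                char *p = dst;
                for (size_t i = 0; i < count; i++) {
                    memcpy(p, parts[i].data, parts[i].len);
                    p += parts[i].len;
                }
                *p = '\0';
                return total;
            }
            ```

            The size check happens once up front, and each input byte is read exactly once by memcpy.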

            And anyway, if you are constructing a path name, you are probably going to open it (or make some other syscall), the overhead of which far outweighs the overhead of calling printf. Meanwhile printf is impossible to beat, codesize-wise, so the codebase which uses printf may actually be faster due to second-order effects.

            1. 2

              If all you’re using is %s, then you’re probably safe, but don’t forget that printf is locale-aware. If anything in your program has set the locale (on any thread, because the locale is a per-process property, not a per-thread one), then using the number specifiers with anything in the printf family may end up with spaces, dots, or commas as separators, depending on the user’s locale.

    2. 3

      I think this justification is overthinking it. In old C calling conventions, parameters were passed on the stack and popped by the caller, but there was no requirement that they remain unmodified on the stack. With register calling conventions, argument registers are all caller-save[1], yet the argument to these functions is often used. This means that you end up with the caller saving the value on the stack, then pushing it into the argument frame. If it’s the return value, then you save a stack space for the spill because you get the value back at the end. This is even more apparent on register-based calling conventions, where the return address and the first argument are often the same register. Returning the first argument is free.

      With a dumb compiler and a small amount of source-level optimisation (use the result, not the source), you’ll get better code. Remember, this is a programming language designed to squeeze the last bit of performance and code density out of machines where 128 KiB of RAM was a lot.

      [1] I strongly suspect that this is the wrong decision but I’ve only done static instruction count analysis, not dynamic, and so can’t be 100% sure. I was hoping to persuade an undergrad to do this analysis as a final-year project. Any students reading this and wanting a compiler project, please ping me!

      1. 1

        Stack passed parameters were popular on CISC machines back in the 80s/90s (VAX, 68000, 80386) even if they weren’t register starved (like the VAX and 68000). RISC based machines (MIPS, SPARC) used register passing, spilling to memory only if they had to (probably more often on MIPS than SPARC due to the SPARC’s register windows, but then task switches on SPARC took longer because of the larger number of registers to save).

        > With register calling conventions, argument registers are all caller-save (I strongly suspect that this is the wrong decision but I’ve only done static instruction count analysis, not dynamic, and so can’t be 100% sure)

        Because the compiler might be using them for non-argument reasons? For x86-64, the registers RDI, RSI, RDX, RCX, R8, R9, and XMM0-7 are used as input parameters. If the compiler is using, say, RCX for something and calling a function with only three parameters, it will need to save RCX (because the called routine might call a function with four parameters). It makes sense to me.

        > This is even more apparent on register-based calling conventions, where the return address and the first argument are often the same register.

        How does that work? If register rx contains the return address, if you stuff an argument into rx, then the call will wipe out the argument with the return address. What you said there doesn’t make sense.

        > Remember, this is a programming language designed to squeeze the last bit of performance and code density out of machines where 128 KiB of RAM was a lot.

        I’ve never seen a C compiler use the ret N instruction (x86, return, remove N bytes of parameters from the stack; 68000 and VAX have similar instructions) but ret followed by an add sp,N (or add esp,N depending if 32 bit code). ret N would save at least two bytes per call.

        1. 1

          > Because the compiler might be using them for non-argument reasons? For x86-64, the registers RDI, RSI, RDX, RCX, R8, R9, and XMM0-7 are used as input parameters. If the compiler is using, say, RCX for something and calling a function with only three parameters, it will need to save RCX (because the called routine might call a function with four parameters). It makes sense to me.

          It makes sense in some cases. For C++, I strongly suspect that preserving the first argument register (which contains this) would be a net win. It resulted in a net code size reduction when I tried it but I didn’t investigate what that did to dynamic behaviour. Most functions call more than one method on the same object, and most methods need to keep their receiver live right up until the end, so it resulted in fewer spills.

          > How does that work? If register rx contains the return address, if you stuff an argument into rx, then the call will wipe out the argument with the return address. What you said there doesn’t make sense.

          Typo. I meant the return value register.

          > I’ve never seen a C compiler use the ret N instruction (x86, return, remove N bytes of parameters from the stack; 68000 and VAX have similar instructions) but ret followed by an add sp,N (or add esp,N depending if 32 bit code). ret N would save at least two bytes per call.

          C can’t do this because the existence of variadic functions in C (and the existence of K&R-style functions) requires that the caller cleans up the stack. If the caller and callee are guaranteed to agree on the number of parameters to a function then either can clean up the stack. Since the callee is going to be adjusting the stack pointer to clean up its own frame, it can clean up the argument area as well. Most Pascal ABIs did this. In C, you can pass more parameters than a function expects[1] and so the callee can’t clean them up, the caller must. This is also why tail-call optimisation is hard in C.

          [1] This is UB according to the spec but it’s sufficiently common that all ABIs that have tried to do anything different have hit big compatibility problems.

      2. 1

        Some alternative ideas to that: https://github.com/leahneukirchen/libste