1. 3
  1.  

  2. 16

    I’m not sure I understand the author’s point here, or rather, why it is a strlcpy problem. This code:

    char chararray[6];
    strncpy(chararray, "abcdef", sizeof(chararray));
    strlcpy(chararray, "abc", sizeof(chararray));
    

    On my machine, the final state of chararray is abc\0ef. In what way is this a gotcha? These functions are designed to work on C strings and this output is a valid C string. Every single time you copy data between buffers, you could observe this effect - memory you didn’t write to retains its original data.

    Reading uninitialized memory is UB. Don’t do that. This is one of a million different ways to footgun yourself.

    1. 2

      In what way is this a gotcha?

      As you point out, yeah the string naturally has remnants of its old state which isn’t such a big deal. The real gotcha of strncpy that the author didn’t point out is that

      strncpy(dst, src, BUFSZ);
      

      won’t add \0 to dst if BUFSZ is shorter than src. In OP’s case chararray will not be NUL terminated after execution of the strncpy line. (His strlcpy line fixes the problem, and thankfully the string was not used earlier.)

      These functions are designed to work on C strings

      strncpy has an unusual history, actually. The function was designed to copy filenames into fixed-width fields, where two fields were considered equal when each and every character of one matched the other. Filename comparisons in that particular system didn’t stop at \0 but blindly compared the full length of both memory regions. Strncpy is an outlier in string.h, and is very unlikely to be useful for most string tasks.

      For portable strncpy alternatives in C89 and C99 see https://begriffs.com/posts/2019-01-19-inside-c-standard-lib.html#string.h-strncpy

      I’m not sure I understand the author’s point here, or rather, why it is a strlcpy problem.

      Me neither, if you want zeroed memory in a sensitive place then memset it before use, and obtain it from calloc.

      Reading uninitialized memory is UB.

      Oh I didn’t know that but you’re right! Annex J of the standard lists that situation: “The value of an object allocated by the malloc function is used.”

      1. 1

        strncpy(dst, src, BUFSZ);

        won’t add \0 to dst if BUFSZ is shorter than src. In OP’s case chararray will not be NUL terminated after execution of the strncpy line.

        Hmm, maybe strncpy is poorly named then. It seems strange to have a function named str* that doesn’t always produce c strings considering there are mem* functions like memcpy that do the same thing.

        strncpy has an unusual history, actually. The function was designed to copy filenames into fixed-width fields, where two fields were considered equal when each and every character of one matched the other. Filename comparisons in that particular system didn’t stop at \0 but blindly compared the full length of both memory regions.

        Hmm, with the glaring obviousness of memcmp and memcpy being easily usable here, this case seems like a historical wart - a relic of some namespacing limitation, programmer opinion, or something else. Maybe a memory width relic. Who knows!

        Annex J of the standard lists that situation: “The value of an object allocated by the malloc function is used.”

        I assume this implies only allocated memory and not stack memory? This would make sense and still obviate the strncpy problem you mentioned.

    2. 3

      This needs a threat model; that is, in what reasonable scenario can this be used against you.

      Otherwise, it’s just paranoia.

      1. 2

        I used to sort of like strl*

        But I now prefer to let my code die noisily and as soon as possible when I do such stupid…

        I then know I’m doing stupid.

        I then fix my code.

        My code is then no longer stupid.

        1. 3

          What are you trying to say? Do you prefer to use strncpy instead of strlcpy because strncpy is somehow noisier? What?

          1. 2

            I’m saying both are a Bad Idea.

            If you hit the case in strl* where you need to truncate the result so you can null terminate AND/OR the case in strn* where you don’t null terminate…..

            You have a bug.

            So wrap ’em both (or replace them) and check for that case and die noisily as soon as possible and then fix your code.

            Or use a memory safe language such as D.

            1. 1

              String truncation is not necessarily a bug. You might have outside requirements. The correct thing to do is to check it properly, which needs to be done in all languages.

              strlcpy is cumbersome for truncation check. At least it leaves the memory in a better shape. strscpy is better to quickly check that you properly copied your string

              1. 2

                If we were all perfect… we wouldn’t even be talking about strn*

                We’re talking about it because the strn* api’s are a very very common source of defects. They are at the 2.5 to 3 out 10 on the Rusty Scale of API Goodness http://sweng.the-davies.net/Home/rustys-api-design-manifesto

                So you’re right. “String truncation is not necessarily a bug”, it just is around about 99 out of a 100 uses.

                My proposal bumps it up to 5/10 on the scale.

        2. 2

          This is not the job of strlcpy to initialize memory.

          memset or = {0};, don’t rely on quirks from badly designed functions. strncpy() will always write len bytes, but that’s going beyond the scope of a copy function.

          1. 2

            I was going to post something snarky about Rust’s CString making life easier, but I looked into it and it really doesn’t. C strings are actually just a pox and blight on humanity no matter the language.

            1. 2

              Honestly? There’s too much contention around the use of the str(cpy|ncpy|lcpy) functions, that’s not even getting into the mess of the ‘safe’ variants. Just use memcpy and memmove. Everyone understands them, there’s no room for error.

              1. 2

                Those functions work on “strings”, i.e. it must be \0 terminated. When you then use those “strings” like chararrays, without the proper precautions, yes you will leak memory and worse.

                1. 1
                  #define strscpy(dst, src, len)  \
                      do {                        \
                          memset(dst, 0, len);    \
                          strlcpy(dst, src, len); \
                      while (0);
                  

                  How’s this? I bet there’s still a one-off bug somewhere.

                  1. 3

                    not that it matters, but memset(3) will return dst, so you could (maybe not should) also do

                    #define strscpy(dst, src, len) \
                    	strlcpy(memset(dst, 0, len), src, len)
                    
                    1. 2

                      Still has the problem of evaluating len twice.

                      For clarity’s sake, a better approach here would be to implement strscpy as a (potentially inline) function rather than a macro. The types of all the arguments are known and there’s no preprocessor trickery going on.

                    2. 2

                      Probably just a typo, but drop the semicolon after while (0). Having it defeats the purpose of wrapping your code in a do {} while loop in the first place.

                      1. 1

                        You’re right that it’s a typo, but it doesn’t break anything, as far as I see. It would just equality valid to add or to omit a semicolon in the real code.

                        1. 10

                          The whole point of using do { ... } while (0) is to handle the case where adding a semicolon in the real code is not valid. Consider the calling code

                          if (a)
                              macro();
                          else
                              foo();
                          

                          If you define your macro as #define macro() do { ... } while (0) then this works fine. But if you define it as do { ... } while (0); then this expands to

                          if (a)
                              do { ... } while (0);;  /* note two semicolons here */
                          else
                              foo();
                          

                          That extra semicolon counts as an extra empty statement between the body of the if and the else. You can’t have two statements in the body of an if (without wrapping things with curly braces) so the compiler will refuse to compile this. Probably complaining that the else has no preceding if. This is the same reason why plain curly braces don’t work properly in a macro.

                      2. 2

                        How do you detect truncation?

                        strlcpy will also attempt to evaluate strlen(src), meaning that if src is malformed, you will read memory that should not be read, and you will waste time evaluating it in every case.

                        ssize_t strscpy(char *dst, const char *src, size_t len)
                        {
                        	size_t nleft = len;
                        	size_t res = 0;
                        
                        	/* Copy as many bytes as will fit. */
                        	while (nleft != 0) {
                        		dst[res] = src[res];
                        		if (src[res] == '\0')
                        			return res;
                        		res++;
                        		nleft--;
                        	}
                        
                        	/* Not enough room in dst, set NUL and return error. */
                        	if (res != 0)
                        		dst[res - 1] = '\0';
                        	return -E2BIG;
                        }
                        
                        1. 1
                          char *dir, pname[PATH_MAX];
                          if (strlcpy(pname, dir, sizeof(pname)) >= sizeof(pname))
                              goto toolong;