1.
  1.

  2.

    I feel like I need corroboration to know if this is really good advice or not. 😄

    1.

      I include an implementation of strlcpy if it's missing on the target; it's not a complex function to implement if you can't pull in a third-party implementation for some reason.
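      A minimal fallback along those lines might look like the sketch below. The name fallback_strlcpy is hypothetical (chosen so it cannot clash with a libc that already declares strlcpy), and the sketch assumes the commonly documented strlcpy contract: copy at most size - 1 bytes, always NUL-terminate when size is nonzero, and return strlen(src).

      ```c
      #include <string.h>

      /* Hypothetical fallback for targets without strlcpy.
         Copies at most size - 1 bytes, NUL-terminates if size > 0,
         and returns strlen(src) so callers can detect truncation. */
      size_t fallback_strlcpy(char *dst, const char *src, size_t size)
      {
          size_t src_len = strlen(src);
          if (size != 0) {
              /* Copy as much as fits, leaving room for the terminator. */
              size_t n = (src_len >= size) ? size - 1 : src_len;
              memcpy(dst, src, n);
              dst[n] = '\0';
          }
          return src_len;
      }
      ```

      A caller can detect truncation by comparing the return value with the buffer size, e.g. `if (fallback_strlcpy(buf, src, sizeof buf) >= sizeof buf)`.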

      If you can replace strcpy with memcpy, then it's true you should have been using memcpy in the first place. However, you cannot always replace strcpy with memcpy with the same efficiency, and strlcpy has the correct semantics.

      I do agree the *_s variants are pointless.

      1.

        Totally agree! As I read this post, I remembered this other post on the same topic.

        strcpy, like gets, is fundamentally unsafe and there’s no way to use it safely unless the source buffer is known at compile-time. I know multiple people who give out the advice to use strncpy instead of strcpy, but I’m not a believer. Using strncpy requires that you know the length of the destination buffer, and if you know that, then you could be using memcpy instead.

        If you want a safer strcpy, here you go:

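        #include <string.h>

        /* Copy src into dest only if it fits (including the NUL);
           otherwise leave dest untouched and return NULL. */
        char * strcpy_improved(char * dest, const char * src, size_t dest_size) {
            size_t length = strlen(src) + 1;  /* bytes needed, including the terminator */
            if (length > dest_size) {
                return NULL;                  /* copying would overflow dest */
            }
            memcpy(dest, src, length);
            return dest;
        }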

        This is basically how strcpy is implemented in glibc, with the length check added. This is what unaware people believe strncpy does.

        This still isn't totally foolproof: if src is not a valid string, or either pointer is NULL, that's undefined behaviour. Also, technically, C89 only guaranteed that external identifiers are significant in their first 6 characters, and identifiers beginning with str followed by a lowercase letter are reserved anyway, but that's C programming for you.

        I honestly don't know what the point of strncpy is. I understand the urge to use strcpy to copy a short string into a large buffer; it only copies as many bytes as necessary. But strncpy does not do this: it copies the string into the buffer, and then it fills the buffer with null bytes until it has written as many bytes as you told it to. Basically, it's worse than memcpy in every way unless this particular weird behaviour is what you really want. To call it "niche" is not only fair, it's kind.
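        A small sketch makes both behaviours visible. The helpers demo_strncpy and is_terminated are hypothetical, written only for illustration; the destination is pre-filled with 'X' so that every byte strncpy writes, including the padding NULs, can be told apart from untouched memory.

        ```c
        #include <stdbool.h>
        #include <string.h>

        /* Hypothetical helper: true if buf holds a NUL within its first size
           bytes, i.e. it can safely be read as a C string. */
        bool is_terminated(const char *buf, size_t size)
        {
            return memchr(buf, '\0', size) != NULL;
        }

        /* Hypothetical helper: pre-fill the n-byte buffer with 'X', then strncpy
           into it. strncpy always writes exactly n bytes: the string, then NUL
           padding; if src has n or more chars, no NUL is written at all. */
        void demo_strncpy(char *dst, size_t n, const char *src)
        {
            memset(dst, 'X', n);
            strncpy(dst, src, n);
        }
        ```

        Copying "hi" into an 8-byte buffer writes all 8 bytes (two characters and six NULs), while copying "overflowing" writes "overflow" with no terminator at all.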

        Two years ago, when gcc started issuing warnings about bad uses of strncpy, I submitted a pull request to a C library that changed a strncpy to a memcpy. I kinda wish gcc would warn about any use of the str*cpy functions, possibly with a link to some helpful advice on what to do instead.

        1.

          > Using strncpy requires that you know the length of the destination buffer, and if you know that, then you could be using memcpy instead

          The length of the destination buffer is the maximum number of characters you can copy. The length of the source string is the maximum number that you want to copy. In any case where the former is smaller than the latter, you want to detect an error.

          The strlcpy function is good for this case. It doesn't require you to scan the source string twice (once to find the null terminator, once to do the copy), and it lets you specify the maximum size. It always leaves a null-terminated string in the destination, provided the size is nonzero (unlike strncpy, which should never be used: if the destination is not long enough, it doesn't null-terminate, which is spectacularly dangerous).

          There are three cases:

          • You know the length of the source and the size of the destination. Use memcpy.
          • You know the size of the destination. Use strlcpy and check for truncation (or don't, if you don't care about truncation: the result is deterministic, and if you've asked for a string up to a certain size, strlcpy can enforce that for you).
          • You don’t want to think about the size of the destination. Use strdup and let it allocate a buffer that’s big enough for your string.

          In 99% of the cases I've encountered, strdup is the right thing to do. Don't worry about the string length; just let libc allocate a buffer for it. For most of the rest, strlcpy is the right solution. If memcpy looks like the right thing, you're probably dealing with some abstraction over C strings rather than raw C strings. If you're willing to do that, use C++'s std::string, let it worry about all of this for you, and spend your time on your application logic rather than on tedious bits of C memory management.

          1.

            strlcpy is better, and if truncation to the length of your dest buffer is what you want, then it’s the best solution. More commonly, I want to reallocate a larger buffer and try again, but you’re correct that strdup is a much simpler way to get that result most of the time.
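            The grow-instead-of-truncate approach can be sketched as below, assuming the buffer is heap-allocated and owned by the caller; copy_grow is a hypothetical name. Rather than calling strlcpy, detecting truncation and retrying, this version measures the source once up front, which reaches the same result without a wasted partial copy.

            ```c
            #include <stdlib.h>
            #include <string.h>

            /* Hypothetical helper: copy src into *buf (capacity *cap), growing
               the buffer with realloc when it is too small instead of truncating.
               Returns 0 on success, -1 if allocation fails (old buffer untouched). */
            int copy_grow(char **buf, size_t *cap, const char *src)
            {
                size_t needed = strlen(src) + 1;          /* include the terminator */
                if (needed > *cap) {
                    char *bigger = realloc(*buf, needed); /* realloc(NULL, n) acts like malloc(n) */
                    if (bigger == NULL) {
                        return -1;
                    }
                    *buf = bigger;
                    *cap = needed;
                }
                memcpy(*buf, src, needed);
                return 0;
            }
            ```

            Starting from `char *buf = NULL; size_t cap = 0;` works because realloc(NULL, n) behaves like malloc(n); the caller frees buf when done.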

            I decided to look up the Linux implementation of strlcpy, and it works the same way as my function above: a strlen and then a memcpy. So it does still traverse the array twice, but I don’t see why that’s a problem.

            1.

              > I decided to look up the Linux implementation of strlcpy, and it works the same way as my function above: a strlen and then a memcpy. So it does still traverse the array twice, but I don't see why that's a problem.

              I found that a bit surprising, but that's the in-kernel version, so who knows what the constraints were. The FreeBSD version (which was taken from OpenBSD, where the function originated) doesn't do that. The problem with traversing the string twice is threefold:

              • If the string is large, the first traversal will evict the beginning of it from L1 cache, so you'll hit L1 misses on both traversals.
              • You are far more likely to use the destination soon than the source, but reading the source twice in quick succession hints to the caches that you're likely to use it again, so they'll prioritise evicting data you actually want to keep.
              • (Far less important on modern CPUs) You're executing a lot more instructions, because all of the loop logic runs twice.

              The disadvantage of the single-pass version is that it's far less amenable to vectorisation than the strlen + memcpy version. Without running benchmarks, I don't know which is going to be slower. The cache effects won't show up in microbenchmarks, so I'd need to find a program that used strlcpy on a hot path for it to matter.
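              For comparison with the strlen + memcpy version, a single-pass strlcpy can be sketched as follows. It is modelled on the structure of the BSD implementation mentioned above, but it is an illustrative rewrite under the hypothetical name onepass_strlcpy, not the BSD source. Note that on truncation it still has to keep walking src to compute the return value, and the byte-at-a-time loop is the part that resists vectorisation.

              ```c
              #include <stddef.h>

              /* Hypothetical single-pass strlcpy: copy while there is room, then
                 keep scanning src only to compute its length for the return value. */
              size_t onepass_strlcpy(char *dst, const char *src, size_t size)
              {
                  const char *start = src;
                  size_t left = size;

                  /* Copy until the NUL has been copied or one byte of room remains. */
                  if (left != 0) {
                      while (--left != 0) {
                          if ((*dst++ = *src++) == '\0') {
                              break;
                          }
                      }
                  }

                  /* Out of room: terminate dst (if it has any space) and finish
                     walking src so the result is still strlen(src). */
                  if (left == 0) {
                      if (size != 0) {
                          *dst = '\0';
                      }
                      while (*src++ != '\0') {
                      }
                  }

                  /* src now points one past the NUL, so subtract 1. */
                  return (size_t)(src - start - 1);
              }
              ```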

              1.

                You raise some compelling points! And compiler optimizations will throw another wrench in there. Without doing rigorous benchmarking, this is all speculation, but it’s interesting speculation.

          2.

            > I honestly don't know what the point of strncpy is. I understand the urge to use strcpy to copy a short string into a large buffer; it only copies as many bytes as necessary. But strncpy does not do this: it copies the string into the buffer, and then it fills the buffer with null bytes until it has written as many bytes as you told it to. Basically, it's worse than memcpy in every way unless this particular weird behaviour is what you really want. To call it "niche" is not only fair, it's kind.

            strncpy was intended for fixed-length character fields such as those in utmp; it wasn't designed for null-terminated strings. It's error-prone, so I replace strnc(at|py) with strlc(at|py) or memmove.
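            To illustrate the fixed-length-field use case, here is a sketch with a hypothetical struct record whose name field is modelled loosely on utmp-style fields (the struct and helper names are illustrative, not the real utmp layout). The field is NUL-padded but not necessarily NUL-terminated, which is exactly what strncpy produces, so reading it back needs a bounded copy.

            ```c
            #include <string.h>

            /* Hypothetical record with a fixed-width field: NUL-padded,
               but NOT necessarily NUL-terminated. */
            struct record {
                char name[8];
            };

            /* strncpy fills the whole field, padding with NULs, so no stale
               bytes are left in it; an 8-char name leaves no terminator. */
            void record_set_name(struct record *r, const char *name)
            {
                strncpy(r->name, name, sizeof r->name);
            }

            /* Copy the field into a real C string. out must hold at least
               sizeof r->name + 1 bytes. Returns the number of chars copied. */
            size_t record_get_name(const struct record *r, char *out)
            {
                const char *nul = memchr(r->name, '\0', sizeof r->name);
                size_t len = nul ? (size_t)(nul - r->name) : sizeof r->name;
                memcpy(out, r->name, len);
                out[len] = '\0';
                return len;
            }
            ```

            When only printing the field, a precision bound such as `printf("%.*s", (int)sizeof r.name, r.name)` also avoids reading past the end, since printf stops at the precision even without a NUL.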

            1.

              > strcpy, like gets, is fundamentally unsafe and there's no way to use it safely unless the source buffer is known at compile-time.

              Huh? The danger of gets is completely different from that of strcpy (and the former is certainly worse): gets does I/O, taking in arbitrary, almost-certainly unknown input data; strcpy operates entirely on data already within your program's address space and (hopefully) already known to be a valid, NUL-terminated string of a known length. Yes, it is very possible (easy, even) to screw that up and end up with arbitrary badness, but it's a lot easier to get right than ensuring that whatever bytes gets pulls in are going to contain a linefeed within the expected number of bytes (the only way I can think of offhand to use gets safely would involve dup-ing a pipe or socketpair or something you created yourself to your own stdin and writing known data into it).

              (This is not to say that strcpy is great, nor to negate the point of the article that it can and quite arguably should be replaced by memcpy in most cases. But it’s not as grossly broken as gets.)

              1.

                Okay, I might have exaggerated there. :^)

                strcpy is suitable for some situations: copying static strings, or strings that are otherwise of a known length. In the latter case, memcpy is better. In the former case, I actually think strcpy is fine, even though this article argues against it. I would expect a modern compiler to optimize those copies to memcpy anyway.
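                For the static-string case, the bound can even be checked by the compiler. This is a sketch assuming C11 (for _Static_assert); struct config, NAME_SIZE and DEFAULT_NAME are hypothetical names for illustration.

                ```c
                #include <string.h>

                #define NAME_SIZE 16            /* hypothetical field size */
                #define DEFAULT_NAME "default"  /* hypothetical literal: 8 bytes with its NUL */

                struct config {
                    char name[NAME_SIZE];
                };

                /* Compilation fails if the literal, terminator included, outgrows the field. */
                _Static_assert(sizeof DEFAULT_NAME <= NAME_SIZE,
                               "DEFAULT_NAME does not fit in struct config's name field");

                void config_reset_name(struct config *c)
                {
                    /* The assertion above guarantees this copy cannot overflow. */
                    strcpy(c->name, DEFAULT_NAME);
                }
                ```

                Because the source length is a compile-time constant here, a compiler is free to lower this strcpy to a fixed-size memcpy.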

                gets is basically totally unusable in all situations. Somebody doing something bizarre like you mentioned should probably rethink their approach…