I include an implementation of `strlcpy` if that's missing on the target; it's not a complex function to implement if you cannot include a third-party implementation for some reason.
If you can replace `strcpy` with `memcpy`, then it's true you should have been using `memcpy` in the first place. However, you cannot always replace `strcpy` with `memcpy` with the same efficiency, and `strlcpy` has the correct semantics.
I do agree the `*_s` variants are pointless.
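For targets where `strlcpy` is missing, a fallback can be quite small. The sketch below assumes the BSD contract (copy at most `size - 1` bytes, terminate whenever `size` is nonzero, return the length of `src`) and walks `src` only once. `my_strlcpy` is a hypothetical name chosen to stay out of the reserved `str*` namespace; it is not any particular project's implementation.

```c
#include <stddef.h>

/* Hypothetical fallback with strlcpy semantics.
 * Copies at most size - 1 bytes of src into dst, NUL-terminates dst
 * whenever size > 0, and returns strlen(src). A return value >= size
 * means the result was truncated. Walks src once. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    const char *s = src;
    size_t left = size;

    if (left != 0) {
        /* Copy until the terminator is copied or one byte of room is left. */
        while (--left != 0) {
            if ((*dst++ = *s++) == '\0')
                return (size_t)(s - src - 1);
        }
        /* Out of room: terminate the truncated copy. */
        *dst = '\0';
    }

    /* Keep scanning src so the caller learns its full length. */
    while (*s++ != '\0')
        ;
    return (size_t)(s - src - 1);
}
```

Callers detect truncation by comparing the return value with the buffer size, e.g. `if (my_strlcpy(buf, name, sizeof buf) >= sizeof buf)`.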
Totally agree! As I read this post, I remembered this other post on the same topic.
`strcpy`, like `gets`, is fundamentally unsafe, and there's no way to use it safely unless the source buffer is known at compile time. I know multiple people who give out the advice to use `strncpy` instead of `strcpy`, but I'm not a believer. Using `strncpy` requires that you know the length of the destination buffer, and if you know that, then you could be using `memcpy` instead.
This is basically how `strcpy` is implemented in glibc, with the length check added. This is what unaware people believe `strncpy` does.
This still isn't totally foolproof: if `src` is not a valid string, or either pointer is `NULL`, that's undefined behaviour. Also, technically all external identifiers should be unique in their first 6 characters (the C89 minimum), and all identifiers beginning with `str` followed by a lowercase letter are reserved anyway, but that's C programming for you.
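As a rough illustration of a `strcpy`-style loop with a length check added, here is a hypothetical `checked_strcpy`. Its name and its error convention (return `-1` and leave an empty string when `src` does not fit) are assumptions made for this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: a strcpy-style copy loop that never writes more
 * than dst_size bytes. Returns 0 on success. Returns -1 if src, including
 * its terminator, does not fit; dst is then set to the empty string
 * (when dst_size > 0) so callers never see a half-copied value. */
int checked_strcpy(char *dst, size_t dst_size, const char *src)
{
    size_t i;

    for (i = 0; i < dst_size; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return 0;            /* terminator copied: done */
    }

    /* Ran out of room before reaching the terminator. */
    if (dst_size != 0)
        dst[0] = '\0';
    return -1;
}
```

It keeps the caveats above: an invalid `src` or a `NULL` pointer is still undefined behaviour.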
I honestly don't know what the point of `strncpy` is. I understand the urge to use `strcpy` to copy a short string into a large buffer: it only copies as many bytes as necessary. But `strncpy` does not do this. It copies the string into the buffer, and then it fills the buffer with null bytes until it has written as many bytes as you told it to. Basically, it's worse than `memcpy` in every way unless this particular weird behaviour is what you really want. To call it "niche" is not only fair, it's kind.
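The padding and the missing terminator are easy to see directly. The two hypothetical demo functions below fill a buffer with a marker byte and then call `strncpy`; the comments describe what standard `strncpy` writes.

```c
#include <string.h>

/* Short source: strncpy copies "hi" and then keeps writing NUL bytes
 * until all 8 bytes have been written. */
void demo_strncpy_padding(char dst[8])
{
    memset(dst, 'x', 8);        /* marker bytes, so the padding is visible */
    strncpy(dst, "hi", 8);      /* now "hi" followed by six NUL bytes */
}

/* Long source: strncpy stops after 4 bytes and adds no terminator,
 * so dst is not a valid C string afterwards. */
void demo_strncpy_truncation(char dst[4])
{
    memset(dst, 'x', 4);
    strncpy(dst, "hello", 4);   /* 'h', 'e', 'l', 'l' and no NUL byte */
}
```

Passing the truncated buffer to `strlen` or `printf("%s", ...)` would read past its end.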
Two years ago, I submitted a pull request to a C library that changed a `strncpy` to a `memcpy` when gcc started issuing warnings about bad uses of `strncpy`. I kinda wish gcc would issue a warning for any use of the `str*cpy` functions, possibly with a link to some helpful advice on what to do instead.
> Using `strncpy` requires that you know the length of the destination buffer, and if you know that, then you could be using `memcpy` instead
The length of the destination buffer is the maximum number of characters you can copy. The length of the source string is the maximum number that you want to copy. In any case where the former is smaller than the latter, you want to detect an error.
The `strlcpy` function is good for this case. It doesn't require you to scan the source string twice (once to find the null terminator, once to do the copy), and it lets you specify the maximum size. It always leaves a null-terminated buffer as long as the size is nonzero (unlike `strncpy`, which should never be used, because if the destination is not long enough then it doesn't null-terminate, and so is spectacularly dangerous).
There are three cases:
1. You know the length of the source and the size of the destination. Use `memcpy`.
2. You know the size of the destination. Use `strlcpy` and check for truncation (or don't, if you don't care about truncation: the result is deterministic, and if you've asked for a string up to a certain size, then `strlcpy` can enforce this for you).
3. You don't want to think about the size of the destination. Use `strdup` and let it allocate a buffer that's big enough for your string.
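A sketch of the three cases, with hypothetical helper names. Since `strlcpy` is not available everywhere, case 2 uses `snprintf(dst, size, "%s", src)` as a stand-in: it is a different function, but in standard C it likewise truncates, terminates the output when `size` is nonzero, and returns the full length of the source. `strdup` is POSIX (and standard since C23), hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup on pre-C23 toolchains */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Case 1: both lengths are known, so copy exactly len + 1 bytes. */
void copy_known(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len + 1);    /* includes the terminator */
}

/* Case 2: only the destination size is known.
 * Returns 0 if src fitted, -1 if the copy was truncated. */
int copy_bounded(char *dst, size_t size, const char *src)
{
    int n = snprintf(dst, size, "%s", src);   /* n = full length of src */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Case 3: let the library size the buffer. The caller must free() the
 * result; NULL means the allocation failed. */
char *copy_owned(const char *src)
{
    return strdup(src);
}
```

The truncation check in case 2 is the same comparison you would make against `strlcpy`'s return value.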
In 99% of the cases I've dealt with, `strdup` is the right thing to do. Don't worry about the string length; just let libc handle allocating a buffer for it. For most of the rest, `strlcpy` is the right solution. If `memcpy` looks like the right thing, you're probably dealing with some abstraction over C strings rather than raw C strings. If you're willing to do that, use C++'s `std::string`, let it worry about all of this for you, and spend your time on your application logic rather than on tedious bits of C memory management.
`strlcpy` is better, and if truncation to the length of your `dest` buffer is what you want, then it's the best solution. More commonly, I want to reallocate a larger buffer and try again, but you're correct that `strdup` is a much simpler way to get that result most of the time.
I decided to look up the Linux implementation of `strlcpy`, and it works the same way as my function above: a `strlen` and then a `memcpy`. So it does still traverse the array twice, but I don't see why that's a problem.
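The strlen-then-memcpy structure described above, combined with the grow-and-retry approach from the previous comment, might look like this. `two_pass_strlcpy` and `copy_growing` are hypothetical names; this is a sketch of the shape, not the kernel's actual source.

```c
#include <stdlib.h>
#include <string.h>

/* Two-pass variant with strlcpy's contract: returns strlen(src) and
 * NUL-terminates dst whenever size > 0. */
size_t two_pass_strlcpy(char *dst, const char *src, size_t size)
{
    size_t ret = strlen(src);             /* first traversal */

    if (size != 0) {
        size_t len = (ret >= size) ? size - 1 : ret;
        memcpy(dst, src, len);            /* second traversal */
        dst[len] = '\0';
    }
    return ret;
}

/* Grow-and-retry: try the existing heap buffer first; if the copy was
 * truncated, reallocate to the reported length + 1 and copy again.
 * *buf must be NULL or a malloc'd buffer of *cap bytes.
 * Returns 0 on success, -1 if realloc fails (the old buffer is kept). */
int copy_growing(char **buf, size_t *cap, const char *src)
{
    size_t need = two_pass_strlcpy(*buf, src, *cap);

    if (need < *cap)
        return 0;                         /* fitted the first time */

    char *bigger = realloc(*buf, need + 1);
    if (bigger == NULL)
        return -1;
    *buf = bigger;
    *cap = need + 1;
    memcpy(*buf, src, need + 1);          /* length known: plain memcpy */
    return 0;
}
```

The retry uses `memcpy` because, at that point, both lengths are known.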
> I decided to look up the Linux implementation of `strlcpy`, and it works the same way as my function above: a `strlen` and then a `memcpy`. So it does still traverse the array twice, but I don't see why that's a problem.
I found that a bit surprising, but that's the in-kernel version, so who knows what the constraints were. The FreeBSD version (which was taken from OpenBSD, which is where the function originated) doesn't. The problem with traversing the string twice is threefold:
1. If the string is large, the first traversal will evict parts of the beginning from L1 cache, so you'll hit L1 misses on both traversals.
2. You are far more likely to want to use the destination soon than the source, but having read the source twice in quick succession hints to the caches that you're likely to use it again, so they'll prioritise evicting things that you'd rather keep.
3. [Far less important on modern CPUs] You're running a load more instructions because you have all of the loop logic twice.
The disadvantage of the single-pass approach is that it's far less amenable to vectorisation than the `strlen` + `memcpy` version. Without running benchmarks, I don't know which is going to be slower. The cache effects won't show up in microbenchmarks, so I'd need to find a program that used `strlcpy` on a hot path to see whether it matters.
You raise some compelling points! And compiler optimizations will throw another wrench in there. Without doing rigorous benchmarking, this is all speculation, but itâs interesting speculation.
> I honestly don't know what the point of `strncpy` is. I understand the urge to use `strcpy` to copy a short string into a large buffer: it only copies as many bytes as necessary. But `strncpy` does not do this. It copies the string into the buffer, and then it fills the buffer with null bytes until it has written as many bytes as you told it to. Basically, it's worse than `memcpy` in every way unless this particular weird behaviour is what you really want. To call it "niche" is not only fair, it's kind.
`strncpy` was intended for fixed-length character fields such as utmp; it wasn't designed for null-terminated strings. It's error-prone, so I replace `strnc(at|py)` with `strlc(at|py)` or `memmove`.
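For comparison, here is the kind of fixed-width field `strncpy` was designed for. The `struct record` type and its 8-byte `user` field are hypothetical (loosely modeled on utmp's name fields, whose real sizes vary by system). The field is NUL-padded but not necessarily NUL-terminated, so every read must be bounded by the field width.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-width record; a name of exactly 8 bytes fills the
 * field with no terminator, which this format allows. */
struct record {
    char user[8];
};

/* The job strncpy was built for: copy up to sizeof r->user bytes and
 * zero-fill the rest, so stale bytes never leak through. Longer names
 * are silently cut to the field width. */
void set_user(struct record *r, const char *name)
{
    strncpy(r->user, name, sizeof r->user);
}

/* Length of the stored name: up to the first NUL, or the full width. */
size_t user_len(const struct record *r)
{
    const char *nul = memchr(r->user, '\0', sizeof r->user);
    return nul ? (size_t)(nul - r->user) : sizeof r->user;
}

/* Print with an explicit precision so printf never reads past the field. */
void print_user(const struct record *r)
{
    printf("%.*s\n", (int)user_len(r), r->user);
}
```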
> `strcpy`, like `gets`, is fundamentally unsafe and there's no way to use it safely unless the source buffer is known at compile-time.
Huh? The danger of `gets` is completely different from that of `strcpy` (and the former is certainly worse). `gets` does I/O, taking in arbitrary, almost-certainly unknown input data; `strcpy` operates entirely on data already within your program's address space and (hopefully) already known to be a valid, NUL-terminated string of a known length. Yes, it is very possible (easy, even) to screw that up and end up with arbitrary badness, but it's a lot easier to get right than ensuring that whatever bytes `gets` pulls in are going to contain a linefeed within the expected number of bytes. (The only way I can think of offhand for using `gets` safely would involve dup-ing a pipe or socketpair or something you created yourself to your own stdin and writing known data into it.)
(This is not to say that `strcpy` is great, nor to negate the point of the article that it can and quite arguably should be replaced by `memcpy` in most cases. But it's not as grossly broken as `gets`.)
Okay, I might have exaggerated there. :^)
`strcpy` is suitable for some situations: copying static strings, or strings that are otherwise of a known length. In the latter case, `memcpy` is better. In the former case, I actually think `strcpy` is fine, even though this article argues against it. I would expect a modern compiler to optimize those copies to `memcpy` anyway.
`gets` is basically unusable in all situations. Somebody doing something as bizarre as what you mentioned should probably rethink their approach…
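One way to make the static-string case checkable is to have the compiler verify the size. The `COPY_LITERAL` macro below is a hypothetical helper, not something from the article: it uses C11 `_Static_assert` to reject a literal that does not fit, and the `"" lit` concatenation makes non-literal arguments fail to compile. `dst` must be an actual array, not a pointer, or `sizeof` will measure the wrong thing.

```c
#include <string.h>

/* Copy a string literal into an array, failing at compile time if the
 * literal (including its terminator) is larger than the array. */
#define COPY_LITERAL(dst, lit)                                         \
    do {                                                               \
        _Static_assert(sizeof("" lit) <= sizeof(dst),                  \
                       "string literal does not fit in destination");  \
        strcpy((dst), "" lit);                                         \
    } while (0)

struct greeting {
    char text[16];
};

void greeting_init(struct greeting *g)
{
    COPY_LITERAL(g->text, "hello, world");   /* 13 bytes into 16: OK */
}
```

Whether the `strcpy` call then becomes a `memcpy` or a few stores depends on the compiler and optimisation level.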
I feel like I need corroboration to know if this is really good advice or not.