1. 1

    I’m curious as to why Base::setName is not pure virtual. That would mean that if you forgot to override it you’d get a compile-time error, instead of the current situation where, if you forget to override it (or attempt to override it but make a mistake), you don’t find out until you get an assertion failure at run-time.

    Oh, and GCC also supports -Woverloaded-virtual for anybody who would like to have this warning but isn’t using clang.
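    If you’re invoking GCC directly, enabling it might look like this (the file name is illustrative):

    g++ -Wall -Woverloaded-virtual -c base.cpp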

    1. 1

      I can’t go back far enough in our repository to see if there is a reason for it not being an abstract base class, and I cannot open the previous version control system (Visual SourceSafe) from here to look back even further.

      Thanks for the tip on GCC!

    1. 46

      One of my responsibilities at a previous job was running Coverity static analysis on a huge C codebase and following up on issues. It wouldn’t be uncommon to check a new library (not Curl) of 100k lines of code, and find 1000 memory issues. The vast majority would be pretty harmless – 1-byte buffer overflows and such – but then there were always some doozies that could easily lead to RCEs that we’d have to fix. At the point where the vast majority of people working in a language are messing up on a regular basis, it’s the language’s fault, not the people’s.

      1. 22

        For anyone wondering about setting up static analysis for your own codebase, some things to know!

        • Static analysis, unlike dynamic analysis, is analysis performed on source code, and encompasses numerous individual analysis techniques. Some of them are control-flow based, some of them are looking for the presence of concerning patterns or use of unsafe functions. As an example, most engines doing checks for null-pointer dereferences are basing their analysis on a control flow graph of the program, looking for places where a potential null assignment could flow to a dereference.
        • Static analysis is a conservative analysis, meaning it may have false positives. The false positive rate can also be impacted by coding style. That said, most static analyzers are configurable, and you can and should (especially early on) filter and prioritize findings which your current coding practices may flag disproportionately.
        • Many static analysis tools will give you severity ratings for the findings. These are not the same as CVSS scores (Common Vulnerability Scoring System). They are based on a general expectation for the category of weakness a finding falls into (for example, they might say “CWE-476 (null pointer dereference) is often critical, so we’ll call anything that matches this CWE a ‘critical’ finding”). These ratings say nothing about whether something is critical in your application, and you can have completely inconsequential findings rated critical, while actually critical vulnerabilities sit in the “low” category.
        • Additionally, understand that these categories, which are often given on a four-level scale in the popular tools, are defined along two axes: expected likelihood of exploitation and expected severity of exploitation. High likelihood + high severity = critical, low likelihood + high severity = high, high likelihood + low severity = medium, low likelihood + low severity = low.
        • The nature of the false positive rates means you’re much better off having a regular or continuous practice of static analysis during which you can tune configuration and burn down findings, to increase signal and value over time.
        • If you have too many findings to handle, you may be tempted to sample. If you do, use standard statistical techniques to make sure your sample size is large enough, and your sample is representative of the population. You may also consider whether the sample is representative of a subset of the population’s CWEs which you consider high priority (you may, for example, choose the regularly-updated CWE Top 25 list, or you may choose something like the OWASP top-10 mapped to CWE).

        Hope this helps someone who is interested in setting up a software assurance practice using static analysis!
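        If you want a concrete starting point with freely available tools, a minimal sketch might look like the following (paths and check selections are illustrative; tune them to your codebase):

        # Run two open-source C/C++ analyzers over a tree and keep the reports,
        # so findings can be triaged and burned down over time.
        cppcheck --enable=warning,style --inline-suppr src/ 2> cppcheck-report.txt
        gcc -fanalyzer -c src/*.c 2> gcc-analyzer-report.txt   # GCC 10 or newer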

        1. 12

          Conversations arguing over the counterfactual of whether using or not using C would have helped are less interesting than acknowledging that, whatever language you’re using, there are software assurance techniques you can start doing today to increase confidence in your code!

          The thing you do to increase confidence doesn’t have to be changing languages. Changing languages can help (and obviously for anyone who’s read my other stuff, I am a big fan of Rust), but it’s also often a big step (I recommend doing it incrementally, or in new components rather than rewrites). So do other stuff! Do static analysis! Strengthen your linting! Do dynamic analysis! More testing! Formal methods for the things that need it! Really, just start.

          1. 5

            Excellent point. Coverity is a really, really good tool; I do wish there were an open-source equivalent so more people could learn about static analysis.

        2. 5

          The vast majority would be pretty harmless – 1-byte buffer overflows and such

          Be careful with that. Overflowing a buffer even by a single byte can be exploitable.

        1. 7
          test -f FILE && source $_ || echo "FILE does not exist" >&2
          

          This doesn’t work if the filename has a space or an asterisk in it. You need to surround $_ with double-quotes. https://www.shellcheck.net/ is your friend.
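          Concretely, the fixed line would be:

          test -f FILE && source "$_" || echo "FILE does not exist" >&2

          (Note that a && b || c is itself a mild gotcha: the echo also runs if the file exists but source fails.)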

          1. 1

            Assuming it was added based on @Diti’s comment above, it would work for the intended shell, zsh, since zsh doesn’t expand unquoted variables the same way. I agree the author should probably update this.
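            The difference is easy to demonstrate:

            # zsh does not word-split unquoted parameters by default; sh does
            zsh -c 'x="a b"; printf "<%s>\n" $x'   # prints <a b>
            sh  -c 'x="a b"; printf "<%s>\n" $x'   # prints <a> and <b>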

          1. 19
            [ $USER != "root" ] && echo You must be root && exit 1
            

            I’ve always felt a bit uneasy about this one. I mean, what if echo fails? :-)

            So I usually do

            [ $USER != "root" ] && { echo You must be root; exit 1; }
            

            instead… just to be safe.

            1. 10

              Indeed, echo can fail. Redirecting stdout to /dev/full is probably the easiest way to make this happen but a named pipe can be used if more control is required. The sentence from the article “The echo command always exists with 0” is untrue (in addition to containing a typo).
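              For example, on Linux:

              echo hello > /dev/full   # writes to /dev/full always fail with ENOSPC
              echo $?                  # prints 1 (the error message varies by shell)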

              1. 3

                Don’t you need set +e; before echo, just to be extra safe?

                1. 3

                  I had to look that up. set +e disables the -e option:

                            -e      Exit immediately if a simple command (see SHELL  GRAMMAR
                                    above) exits with a non-zero status
                  

                  That’s not enabled by default, though, and I personally don’t use it.
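                  A quick way to see what -e changes:

                  bash -c  'false; echo still here'   # prints "still here"
                  bash -ec 'false; echo still here'   # exits 1 before reaching echo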

                  1. 1

                    Or && true at the end, if it’s okay for this command to fail. EDIT: see replies

                    It’s as much of a kludge as any other, and I’m not sure how to save the return value of a command here, but bash -ec 'false && true; echo $?' will return 0 and not exit from failure. EDIT: it echoes 1 (saving the return value), see replies for why.

                    1. 2

                      You probably mean || true. But yeah, that works!

                      1. 1

                        I did mean || true, but in the process of questioning what was going on I learned that && true appears to also prevent exit from -e and save the return value!

                        E.G.,

                        #!/bin/bash -e
                        f() {
                            return 3
                        }
                        f && true; echo $?
                        

                        Echoes 3. I used a function and return to prove it isn’t simply a generic 1 from failure (as false would provide). Adding -x will also show you more of what’s going on.

                  2. 2

                    I personally use the following formatting, which flips the logic, uses a builtin, and prints to stderr.

                    [ "${USER}" == "root" ] || {printf "%s\n" "User must be 'root'" 1>&2; exit 1; }

                    When I start doing a larger number of checks, I wrap the command group within a function, which turns into the following, and can optionally set the exit code.

                    die() { printf "%s\n" "${1}" 1>&2; exit ${2:-1}; }
                    ...
                    [ "${USER}" == "root" ] || die "User must be 'root'"
                    
                    1. 2

                      I also always print to standard error, but I’m pretty sure most shells have echo as a built-in. The form I usually use is

                      err() { echo "$1" 1>&2; exit 1; }
                      
                  1. 3

                    This line isn’t right:

                    this.statusRegister.negative = u8(result !== 0);
                    

                    It should be checking result & 0x80 or something.

                    1. 2

                      I’ve been running a Mozilla DXR instance for our internal code. Does anyone have experience with both? What are the advantages of sourcegraph over DXR?

                      1. 1

                        I’ve also been running a Mozilla DXR instance. I’ve been very happy with it. Disclaimer: I have been a contributor to DXR in the past.

                        I only have minimal experience with Sourcegraph. Sourcegraph does fairly well in my opinion. The only annoying thing that I notice missing is “Find declarations”. You can search for references and it looks like any declarations are in that list but there is no easy way to find the declaration(s) separately.

                        The main problem with DXR is that it has no future. Development has been abandoned. Any development effort has migrated to SearchFox. DXR was explicitly designed to be able to index arbitrary code but it appears that SearchFox may be designed only to index Firefox. I’ve never tried to use it so I don’t know how easy it would be to get your own custom code indexed by a SearchFox instance. With the recent layoffs at Mozilla I doubt even SearchFox is going to be getting much work done on it. DXR only works with ElasticSearch 1.7.x and not newer versions, which is becoming increasingly difficult to deal with.

                        Sourcegraph has two different ways to index your C++ code: lsif-cpp and lsif-clang, with the latter being the newer, recommended option. The lsif-cpp indexer is based on the DXR clang plugin. Compare https://github.com/sourcegraph/lsif-cpp/blob/master/clang/dxr-index.cpp with https://github.com/mozilla/dxr/blob/master/dxr/plugins/clang/dxr-index.cpp.

                        Sourcegraph has support for a lot more languages than DXR so if you’re using something other than Python, Javascript, Rust or C++ it will probably provide a better experience.

                        If you want to see what using Sourcegraph is like, they have a version at https://sourcegraph.com/search that indexes a bunch of public repos from GitHub. They have the DXR GitHub repo indexed so we can search within that.

                        For example, here are all the places where the string ->get appears in C++ files

                        And here are all the references to the function getFileInfo (look in the bottom frame)

                        1. 1

                          Thanks for the explanation! I had a closer look and it seems pretty good. If I ever have to set up a code searching tool again it will probably be Sourcegraph. Our current setup still runs on Ubuntu 16.04, which will lose support in 2021. I remember trying to get DXR running on Ubuntu 20.04 but it was too much of a pain due to dependencies on old software (like the old Elasticsearch). The only potential issue with Sourcegraph is that multi-branch indexing is still experimental and we will need that. At the moment I think Mozilla’s future is too uncertain to invest much time in SearchFox.

                      1. 3

                        You can restrict the commits searched to initial commits with array_length(parent). Playing with that column, apparently some commits manage to have dozens of parents. I don’t even know how you manage that.

                        select lower(trim(message)) as message, count(*)
                        from `bigquery-public-data.github_repos.commits`
                        where (array_length(parent)=0)
                        group by message
                        order by count(*) desc
                        limit 100;
                        

                        First few values:

                        message                                                            f0_
                        initial commit                                                     1957292
                        first commit                                                       151151
                        init                                                               39375
                        initial commit.                                                    36600
                        initial                                                            17894
                        initial import                                                     14737
                        create readme.md                                                   11510
                        init commit                                                        9692
                        update license.md                                                  6606
                        first                                                              6034
                        first commit.                                                      5688
                        initial version                                                    5325
                        create license.md                                                  3968
                                                                                           3908
                        inital commit                                                      3854
                        initial import.                                                    3459
                        create gh-pages branch via github                                  3371
                        initial release                                                    3348
                        initial checkin                                                    3194
                        initial commit to add default .gitignore and .gitattribute files.  2967
                        initial revision                                                   2676
                        :boom::camel: added .gitattributes & .gitignore files              2200
                        :neckbeard: added .gitattributes & .gitignore files                2198
                        first version                                                      2193
                        :octocat: added .gitattributes & .gitignore files                  2159
                        :space_invader: added .gitattributes & .gitignore files            2154
                        :confetti_ball: added .gitattributes & .gitignore files            2154
                        init project                                                       2150
                        :tada: added .gitattributes & .gitignore files                     2139
                        :circus_tent: added .gitattributes & .gitignore files              2134
                        :lollipop: added .gitattributes & .gitignore files                 2079
                        
                        1. 3

                          apparently some commits manage to have dozens of parents. I don’t even know how you manage that.

                          This is called an “octopus merge” if you want to search for more information. You may also be interested in this article about octopus merges in the Linux kernel

                          1. 2

                            FYI I just updated my article to include the array_length(parent)=0 filter. Thanks for the input!

                            1. 2

                              You’re welcome :)

                              Why did you do this: AND LENGTH(TRIM(LOWER(message))) > 0? Surely the empty commit message is still a valid commit message?

                              1. 1

                                Hehe, yes, it was to filter out the empty commits. You’re technically right, but I was more interested in the actual text of the initial commit messages, although as you mention it is worth noting that empty messages are up there. However, it’s less clear that the empty commit messages are actually initial commit messages, since they could come from detached HEAD states with no parents.

                                Also, in case you’re curious, I just wrote a post using a similar method to try and answer What % Of Git Commit Messages Use The Imperative Mood?.

                            2. 2

                              Haha, just reading this now. It’s funny you mention it because I arrived at the same way of identifying initial commits in a discussion on the Reddit thread yesterday. I think this way is certainly more accurate than the method I used (just looking through the top counts and picking ones that looked like initial commit messages). I will most likely update my post to reflect this method.

                            1. 11

                              Not everything is UTF-8, but it should be.

                              1. 3

                                 Do you think file paths on Unix systems should be UTF-8, instead of simply being encoding-agnostic byte sequences terminated by 0x00 and delimited by 0x2F? I can see it both ways personally. As it stands, file paths are not text, but they’re nearly always treated as text. All text definitely should be UTF-8, but are file paths text? Should they be text?
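                                 The bytes-not-text nature is easy to demonstrate (on a typical Linux filesystem):

                                 # create a file whose name contains the byte 0xFF,
                                 # which is not valid UTF-8 in any position
                                 touch "$(printf 'report-\377.txt')"
                                 ls | od -c | head   # the raw byte survives; it was never text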

                                1. 28

                                  Paths consist of segments of file names. File names should be names. Names should be text. Text represented as a sequence of bytes must have a specified encoding, otherwise it’s not text. Now the only question left is: which encoding should we use? Let’s just go with UTF-8 for compatibility with other software.

                                  I would actually put further restrictions on that:

                                  • file names should consist of printable characters — what good is a name if the characters it’s made of cannot be displayed?
                                  • file names shouldn’t be allowed to span multiple lines — multiline file names will only cause confusion and will often be harder to parse (not just for humans, but also for CLI programs)

                                  As it is now in Unix, file names aren’t for humans. And neither are they for scripts. They’re for… file systems.

                                  1. 6

                                    I agree with you about those restrictions. In some sense, Windows has succeeded in this area, where Unix has failed. In Unix:

                                    • File names can begin with a hyphen, which creates ambiguity over whether it is a command-line flag or an actual file (prompting the convention of -- separating flags from file arguments).
                                     • File names can contain newlines, which creates almost unsolvable problems with most Unix tools (see the demo at the end of this comment).

                                    In Windows, however (source):

                                    • File names cannot contain forward slashes, and thus cannot be confused with command-line flags (which begin with a slash).
                                    • File names cannot contain line feeds or carriage returns. All characters in the range 0-31 are forbidden.
                                    • File names cannot contain double quotation marks, which means you can very easily parse quoted file names.

                                    Of course, both allow spaces in file names, which creates problems on both systems. If only shell/DOS used commas or something to separate command-line arguments instead of spaces…
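                                     The newline problem mentioned above takes a few commands to reproduce (in an otherwise empty directory):

                                     touch "$(printf 'one\ntwo')"   # a single file with a newline in its name
                                     ls | wc -l                     # reports 2 "files"
                                     find . -print0 | od -c         # NUL-delimited output is the usual workaround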

                                    1. 1

                                      Windows also doesn’t allow files named NUL, PRN, or CON. :D

                                    2. 5

                                      There is a long essay by David A. Wheeler about problems that are caused by weird filenames and what we might do to fix them. Including suggestions on possible restrictions that operating systems might impose on valid filenames. Prohibiting control characters (including newline) is one of the items on the list. Scroll down to the very bottom to see it.

                                      1. 2

                                        Ideally you want to bring reasonable naming capabilities to folks from the world of non-latin character sets. That’s a really good driver to go beyond existing C-Strings and other “Os String” encodings.

                                        But when you say “UTF-8, but printable”, it’s not UTF-8 anymore. Also, what’s a “line”? 80 characters? 80 bytes? Everything that doesn’t contain a newline? Mh. Allowing UTF-8 will bring some issues with Right-To-Left override characters and files named “txt.lol.exe” on certain operating systems.

                                        It’s tough, isn’t it? :-)

                                        1. 7

                                          Also, what’s a “line”? 80 characters? 80 bytes?

                                          Anything that doesn’t contain a newline. The point is that filenames with newlines in them break shell tools, GUI tools don’t allow you to create filenames with newlines in them anyway, and very few tools other than GNU ls have a reasonable way to present them.

                                          Lots of stuff doesn’t allow you to include the ASCII control plane. Windows already bans the control plane from file names. DNS host names aren’t allowed to contain control characters (bare “numbers and letters and hyphen” names certainly don’t, and since domain registrars use a whitelist for extended characters, I doubt you could register a punycode domain with control characters either). The URL standard requires control plane characters to be percent encoded.

                                          1. 1

                                            \r\n or \n? ;) You get my point?

                                            1. 3
                                              1. If it includes \r\n, then it includes \n.

                                              2. If the goal is to avoid breaking your platform’s own default shell, then the answer should be “whatever that shell uses.”

                                    3. 3

                                      “Is” and “ought”, however, remain dangerous things to confuse.

                                    1. 8

                                      I feel like those who are young enough might not realize that these are puns based on celebrity names:

                                      https://en.wikipedia.org/wiki/Ernie_Kovacs

                                      https://en.wikipedia.org/wiki/Kim_Novak

                                      combined with

                                      https://en.wikipedia.org/wiki/VAX

                                      1. 4

                                        I found this article a bit disappointing. It has plenty of opinions disguised as facts, like “My point here is that constructors must not be used for fallible constructions.”. In fact the standard imposes no such requirement and you’re free to do this if you wish. It may or may not be a good idea. There is probably an argument to be made as to why this should be avoided but the author doesn’t really attempt it other than “Exceptions in C++ are heavy”. Which is not really that convincing here since any time this exception would be thrown indicates a logic error in your program and would most likely terminate the entire process anyway.

                                        For an article that starts off by talking about how to enforce a non-null std::unique_ptr, the complete lack of any mention of gsl::not_null is striking. And GSL’s not_null and Expects show us that there is another way to handle precondition failures other than the ones described in this article: call std::terminate. That may seem extreme at first but when you consider that any place where this may happen indicates a logic error in the program which usually cannot be reasonably recovered from, it makes some sense. And if you want to customize this you can always use std::set_terminate.

                                        There are some times when using a static member function to do construction makes sense. But there’s no mention of the downsides of such a choice. The type is now harder to use generically. If you have some generic function like template <typename T> T make(...) { return T(...); } you can’t use it with T=NonZero because NonZero has no public constructor. And it means that each person trying to use your type has to remember the name from_u32 in order to construct one. This is not a serious problem if you have one or two such types in your codebase but if you make every type behave like this it becomes a lot of extra names to remember.

                                        1. 2

                                          There are C compilers from Keil targeting 8051, and ARM.

                                          1. 9

                                            Oh cool, a new language from Microsoft that helps us write correct code.

                                            Throws Verona on the pile with P, P#, IVy, Dafny, F*, Midori, Koka, and Lean

                                            (I kid, I kid. Microsoft does a lot of great work in this field, but a lot of their experiments end up not working out.)

                                            1. 10

                                              But you have a point nevertheless…

                                              • PHP -> Asp
                                              • Java -> C#
                                              • JVM -> .net
                                              • Google -> Bing
                                              • /your/filesystem/path -> \your\filesystem\path (hehe, I know, I know… but it’s funny)
                                              • netscape -> explorer
                                              • ODF -> OOXML
                                              • {Your functional} -> F# …
                                              • Rust -> Ver[m]ona

                                              EDIT (I almost forgot)

                                              • GNU Linux -> WSL

                                              It’s just the Microsoft way

                                              1. 2

                                                Huh. Never thought about it before but does Microsoft (research) have a Lisp?

                                                1. 9

                                                  As a matter of fact, yes! Microsoft Lisp was a LISP interpreter in the 1980s for DOS.

                                                  As I recall, it was slower than BASIC, had a terrible editor, and was missing basic stuff like being able to parse 'x as (quote x)

                                                2. 1

                                                  I think ASP (1996) predates PHP (1997).

                                                  Both were ultimately a really bad idea though.

                                                  1. 12

                                                    PHP was a genius idea in its time, because it enabled extremely cheap shared hosting.

                                                    If you had a low-traffic site, your options around 2002 were either a PHP host for under $5/mo, or the cheapest available colocated host at around $200/mo.

                                                    It wasn’t a better language, but it was a much better setup experience with a smoother on-ramp that cost less to host (which echoes, to me, modern criticisms of golang).

                                                    1. 3

                                                      PHP/FI first appeared in 1994/1995, and although the “FI” part bears little resemblance to the “modern” PHP3 that showed up in 1997 (which, I think it’s fair to say, PHP7 has much more in common with than with PHP/FI), several critical ideas showed up here that undoubtedly influenced the design and development of ASP.

                                                      1. 1

                                                        ASP was a good idea (glue to stick together existing application code and render its output to the web) that everybody ignored and just tried to create entire applications with. It was never meant to do that, which is why it was so terrible at doing so for the first 5 or so years of its life

                                                        1. 1

                                                          Why were they bad ideas?

                                                      2. 5

                                                        What’s the definition of “working out” ? P looks like it’s still active:

                                                        https://github.com/p-org/P

                                                        I remember it had this success story but I don’t know the details very well. I’m not a Windows user so I haven’t been motivated to look.

                                                        https://blogs.msdn.microsoft.com/b8/2011/08/22/building-robust-usb-3-0-support/

                                                        It sounds like this Verona project is very early, but I think they framed the problem “right”. I would like something like Rust that works better with legacy code. Rewriting code only for security purposes (as opposed to functionality) is a waste and splits the ecosystem / developer effort. It’s the most expensive way to achieve that goal.

                                                        And empirically speaking, I don’t think it happens… People talk about it but don’t do it on major systems because of the effort involved (e.g. speaking as someone who has been “rewriting” bash for nearly 4 years, with Oil. Of course Oil isn’t motivated only by security.)

                                                        For example, I would be surprised if Firefox is >90% Rust within 10 years. Or even 20 years. It makes more sense to split the application up and rewrite certain parts of it in a safe language. That is, follow the Verona strategy (if it is feasible, which is a big “if”, and something that’s a worthy target of research).

                                                        1. 1

                                                          Re P. From Microsoft’s site:

                                                          “P got its start in Microsoft software development when it was used to ship the USB 3.0 drivers in Windows 8.1 and Windows Phone. These drivers handle one of the most important peripherals in the Windows ecosystem and run on hundreds of millions of devices today. P enabled the detection and debugging of hundreds of race conditions and Heisenbugs early on in the design of the drivers, and is now extensively used for driver development in Windows.”

                                                        2. 1

                                                          Don’t forget about Spec# and Sing#

                                                          1. 1

                                                            My understanding is that F* code (EverCrypt in particular) shipped in Windows kernel.

                                                          1. 2

                                                            OP here. About the workaround for the missing pipefail, I’d like to ask Lobsters for their opinion: do you see any drawback with this approach?

                                                            1. 2

                                                              I can see the following:

                                                              • If you run the script from a shell, this shell will print “Terminated” at the end. At least it does for me, using bash. This will depend on your shell. This might be unexpected to the person running your program.
                                                              • After the script exits, the other parts of the pipeline may still be running in the background. To see what I mean, change your sed -e s/3/5/ into something like perl -e "sleep 2; print <>". Run the script and it will print “Terminated” and return you to your prompt. Then 2 seconds later the output shows up on your terminal. You may be able to work around this and the previous point by sending SIGINT instead of SIGTERM.
                                                              • The value of $? in the shell that launched the script is going to be strange (130, 143, etc). The actual exit code from the failing subprocess is lost.
                                                              • A program can use WIFEXITED and WIFSIGNALED to determine whether a subprocess exited cleanly or was killed by a signal. Your script will show up as having been killed by a signal (because it was), not merely exiting with a non-zero exit status. If the program treats these two situations differently then you’ll end up with the wrong behavior.

                                                              See also https://groups.google.com/forum/#!topic/comp.unix.shell/UHX-bBndq7k which is a scary looking monster.
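                                                              For what it’s worth, there is also a signal-free way to emulate pipefail in plain POSIX sh: smuggle each stage’s status out through a temporary file and inspect it after the pipeline finishes. A sketch (cmd1 stands in for the stage whose failure you care about):

                                                              st=$(mktemp)
                                                              { cmd1; echo "$?" > "$st"; } | sed -e s/3/5/
                                                              read -r s1 < "$st"; rm -f "$st"
                                                              [ "$s1" -eq 0 ] || exit "$s1"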

                                                            1. 5

                                                              There is SWEET16 but that was used for actual serious work so maybe it doesn’t meet your criteria.

                                                              There are plenty of One Instruction Set Computers to choose from. Subleq seems to be the design most often discussed.

                                                              I wanted to link to some documentation of The Pinky Processor but all I could find was a post that briefly mentions it. As far as I remember, the interesting thing about the design was that its addressing granularity was not based on bytes like we’re used to but was instead done by bits. You could specify any bit in memory with a single address and there were no alignment requirements, so you could put a 4-bit field followed immediately by a 13-bit field, followed immediately by a 5-bit field, and so on. If anybody can find the original specification document I’d love to see it.

                                                              Edited to add: The Gray-1: a computer made from just memory chips.

                                                              1. 2

                                                                Each and every link has brought a smile; I especially like the Gray-1 computer.

                                                              1. 4

                                                                we keep the VM alive until you log out or until your normal build time limit has elapsed

                                                                Immediately on logout? There should be like a couple minute window after logout when you can log back in, because connections can drop, accidental Ctrl-D in the wrong window can happen, etc.

                                                                soon you’ll be just a few keystrokes away from an ARM or PowerPC shell, too

                                                                Let me guess, EC2 and IntegriCloud? :)

                                                                1. 16

                                                                  Aye, I’ll improve upon this over time.

                                                                  Regarding EC2 and IntegriCloud, no - sr.ht is run entirely on owned hardware in a private rack in a colocated datacenter. I don’t put your data in the hands of megacorps.

                                                                  1. 2

                                                                    I don’t put your data in the hands of megacorps.

                                                                    I’m impressed.

                                                                  2. 1

                                                                    There should be like a couple minute window after logout when you can log back in, because connections can drop, accidental Ctrl-D in the wrong window can happen, etc.

                                                                    Not just for this exact scenario, but you can use SSH Sockets for this purpose!

                                                                    1. 1

                                                                      I’ve had that enabled for years. Not sure how it would help when the TCP connection gets dropped because of network problems.

                                                                    2. 1

                                                                      accidental Ctrl-D in the wrong window can happen

                                                                      Set IGNOREEOF and you can (mostly) avoid that problem.

                                                                      1. 0

                                                                        You can always rebuild there, yeah?

                                                                      1. 1
                                                                        #define strscpy(dst, src, len)  \
                                                                            do {                        \
                                                                                memset(dst, 0, len);    \
                                                                                strlcpy(dst, src, len); \
                                                                            } while (0);
                                                                        

                                                                        How’s this? I bet there’s still an off-by-one bug somewhere.

                                                                        1. 3

                                                                          not that it matters, but memset(3) will return dst, so you could (maybe not should) also do

                                                                          #define strscpy(dst, src, len) \
                                                                          	strlcpy(memset(dst, 0, len), src, len)
                                                                          
                                                                          1. 2

                                                                            Still has the problem of evaluating len twice.

                                                                            For clarity’s sake, a better approach here would be to implement strscpy as a (potentially inline) function rather than a macro. The types of all the arguments are known and there’s no preprocessor trickery going on.

                                                                          2. 2

                                                                            Probably just a typo, but drop the semicolon after while (0). Having it defeats the purpose of wrapping your code in a do {} while loop in the first place.

                                                                            1. 1

                                                                              You’re right that it’s a typo, but it doesn’t break anything, as far as I can see. It would be equally valid to add or to omit the semicolon in the real code.

                                                                              1. 10

                                                                                The whole point of using do { ... } while (0) is to handle the case where adding a semicolon in the real code is not valid. Consider the calling code

                                                                                if (a)
                                                                                    macro();
                                                                                else
                                                                                    foo();
                                                                                

                                                                                If you define your macro as #define macro() do { ... } while (0) then this works fine. But if you define it as do { ... } while (0); then this expands to

                                                                                if (a)
                                                                                    do { ... } while (0);;  /* note two semicolons here */
                                                                                else
                                                                                    foo();
                                                                                

                                                                                That extra semicolon counts as an extra empty statement between the body of the if and the else. You can’t have two statements in the body of an if (without wrapping things with curly braces) so the compiler will refuse to compile this. Probably complaining that the else has no preceding if. This is the same reason why plain curly braces don’t work properly in a macro.

                                                                            2. 2

                                                                              How do you detect truncation?

                                                                              strlcpy will also attempt to evaluate strlen(src), meaning that if src is malformed, you will read memory that should not be read, and you will waste time evaluating it in every case.

                                                                              ssize_t strscpy(char *dst, const char *src, size_t len)
                                                                              {
                                                                              	size_t nleft = len;
                                                                              	size_t res = 0;
                                                                              
                                                                              	/* Copy as many bytes as will fit. */
                                                                              	while (nleft != 0) {
                                                                              		dst[res] = src[res];
                                                                              		if (src[res] == '\0')
                                                                              			return res;
                                                                              		res++;
                                                                              		nleft--;
                                                                              	}
                                                                              
                                                                              	/* Not enough room in dst, set NUL and return error. */
                                                                              	if (res != 0)
                                                                              		dst[res - 1] = '\0';
                                                                              	return -E2BIG;
                                                                              }
                                                                              
                                                                              1. 1
                                                                                char *dir, pname[PATH_MAX];
                                                                                if (strlcpy(pname, dir, sizeof(pname)) >= sizeof(pname))
                                                                                    goto toolong;
                                                                                
                                                                            1. 1

                                                                              I think you meant “Raymond Chen” as opposed to “Raymond Wong”.

                                                                              1. 3

                                                                                That is a very interesting bug. When reading the title I thought it would be yet another random quoting problem, but using the wrong comment chars is an interesting case.

                                                                                Also: shellcheck is awesome, use it more!

                                                                                1. 1

                                                                                  I wonder where the // for comments comes from?

                                                                                  Edit: apparently PHP, C#, and others - http://www.spiceforms.com/blog/list-comment-syntax-programming-languages/

                                                                                  1. 4

                                                                                    Another “C-ism” that many people use in shell scripts is if [ $foo == "foo" ]. == is not the comparison operator for [; = is.

                                                                                     Most (all?) modern [ implementations implement == as a (usually undocumented) feature, so it’ll work just fine. But it’s often a good sign that someone hasn’t actually bothered to learn shell scripting, and that scripts they write need an extra careful review.

                                                                                    I bet that’s basically what happened here: someone didn’t actually bother to learn shell scripting.
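                                                                                     For the record, the portable spelling is:

                                                                                     [ "$foo" = "foo" ]    # POSIX: = is the string comparison operator
                                                                                     [ "$foo" == "foo" ]   # accepted by bash's [, but e.g. dash's [ rejects it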

                                                                                    1. 1

                                                                                       The thing that most people also don’t seem to know is that [ is actually a program, namely the “test” program, and not shell syntax.
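                                                                                       You can check what you’re actually running:

                                                                                       type [             # bash: "[ is a shell builtin" (modern shells build it in)
                                                                                       ls -l /usr/bin/[   # ...but the standalone program still exists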

                                                                                      1. 3

                                                                                         The thing that most people also don’t seem to know is that [ is actually a program

                                                                                        To be fair, that is a really weird shell “feature”.

                                                                                    2. 3

                                                                                      Modern languages that use // for comments likely borrowed them from C++. And C++ took them from BCPL

                                                                                      1. 1

                                                                                        Thanks. I used to have a decent grasp of this kind of stuff but recently it’s slipped my mind.

                                                                                  1. 6

                                                                                     Something that seems fundamental to this, and that I never seem to see people talking about, is that C and C++ must use an external build system. In languages like Rust, C#, Go, etc. there are modules with a particular structure, so the compiler can investigate the program, construct dependencies between files without human help, and even find what external libraries to use. You can’t do this in C because there are no modules, just header files. And header files are subject to the whims of macros, ifdefs, and so on, plus everyone organizes them more or less as they see fit. So there’s no possible way to go from a compiler loading main.c to the compiler knowing what else to build and link.

                                                                                    You could try to dictate a module system via convention, something like “if you include libfoo.h anywhere the program should get linked with libfoo.a”. But the time to make that sort of decision was probably 1975 or so; now you couldn’t do it without breaking nearly every existing program.

                                                                                    This is basically why single-header libraries exist, why make/ninja is always necessary, and so on. Contrast with Rust, where even big and complex programs like Servo are built basically just with Cargo. There’s nothing special about Rust either, besides that it’s had the chance to start fresh and learn from the mistakes of others.
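                                                                                     The closest C compilers get is emitting dependency information for an external build tool to consume. With GCC or Clang plus GNU make, for example (file names illustrative):

                                                                                     gcc -MMD -MP -c main.c -o main.o   # also writes main.d, a makefile fragment
                                                                                     # a Makefile can then pull it in with: -include main.d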

                                                                                    1. 3

                                                                                      You could try to dictate a module system via convention, something like “if you include libfoo.h anywhere the program should get linked with libfoo.a”. But the time to make that sort of decision was probably 1975 or so; now you couldn’t do it without breaking nearly every existing program.

                                                                                       MSVC has something sorta-kinda like this. You can put #pragma comment(lib, "foo") in libfoo.h. Any file that includes this header will then tell the linker to try to pull in the “foo” library. Of course the linker still has to be able to find that library somehow. If that library ships with MSVC then it isn’t too much of a problem. But if it is a third-party library you might need to specify the directory where it is located, which means it no longer feels like it just works automatically.

                                                                                       I’m not sure why this would break every existing program. I’m sure there would be some problems. For instance, a program might include a header file from a library but only want some macros from that header, so the program doesn’t bother linking against the actual library. And maybe the actual library isn’t available in the linker’s search path. In that case, adding a pragma like the above to the header file would mean that the program would now fail to link. But this seems like it would be a rare case.

                                                                                      1. 2

                                                                                         Don’t forget that when C was invented, the goal was to provide a language more comfortable than assembly. Languages like Rust, Go, … had more time to look at all the alternatives, think about the problems existing elsewhere, and find good improvements. During that time, C tools could only be patched and upgraded to improve usability. I think that went well: tools like make were invented afterward but still integrate well with the build process, and can still compete with the new tools and features from the new languages.

                                                                                        1. 1

                                                                                          Oh, certainly. In context, C can’t be blamed for this… looking at languages made around similar times, such as Pascal or BCPL, they do basically the same thing. (I thought BLISS was on this list too, but apparently not…) It wasn’t until a decade later that it becomes common that you get things like Ada, Modula-2, Common Lisp, and other languages that make modules an integral part of the language.

                                                                                      1. 1

                                                                                        Alas, I’ve lost the link, but there’s also a cool hack that replaces a Nintendo cartridge with a “dynamic” rom to achieve graphics beyond the reach of the original hardware.

                                                                                            1. 1

                                                                                               Yes! I didn’t remember seeing it on YouTube, but now I realize why: I saw it in person at Deconstruct.