1. 21
  1. 9

    Under the section “Seconds since the epoch”, there is this trick:

    secs=$((`TZ=GMT0 date \
    +"((%Y-1600)*365+(%Y-1600)/4-(%Y-1600)/100+(%Y-1600)/400+1%j-1000-135140)\
    *86400+(1%H-100)*3600+(1%M-100)*60+(1%S-100)"`))
    

    This produces incorrect results when the current date falls in a leap year. For example, we are in a leap right now, so try this:

    $ secs=$((`TZ=GMT0 date +"((%Y-1600)*365+(%Y-1600)/4-(%Y-1600)/100+(%Y-1600)/400+1%j-1000-135140)*86400+(1%H-100)*3600+(1%M-100)*60+(1%S-100)"`)); echo "$secs"; python3 -c "import time; print(int(time.time()))"
    1588670315
    1588583915
    

    The result from the sh trick is off by 86400 seconds (24 hours). This issue occurs because the leap year calculation in the trick above is overcounting the number of days when we are in a leap year. For example, if the current date is 1972-01-01 00:00:00 UTC, the time that has elapsed since Unix epoch is exactly 730 days (63072000 seconds). This can be calculated in a straightforward manner: 365 x 2 = 730. However, the shell snippet above sees that 1972 is a leap year already and adds an additional day for it when it should not when we are still in 1972. That logic would work fine for 1973-01-01 00:00:00 UTC, i.e., when the year 1972 is completely over but not when we are still in 1972.

    Having understood the cause of the issue, here is how we can fix the trick, so that it does not overcount a day for the leap year when we are still within the leap year:

    secs=$((`TZ=GMT0 date \
    +"((%Y-1600)*365+(%Y-1601)/4-(%Y-1601)/100+(%Y-1601)/400+1%j-1000-135140)\
    *86400+(1%H-100)*3600+(1%M-100)*60+(1%S-100)"`))
    

    Here is a test that shows that the issue is now fixed:

    $ secs=$((`TZ=GMT0 date +"((%Y-1600)*365+(%Y-1601)/4-(%Y-1601)/100+(%Y-1601)/400+1%j-1000-135140)*86400+(1%H-100)*3600+(1%M-100)*60+(1%S-100)"`)); echo "$secs"; python3 -c "import time; print(int(time.time()))"
    1588583977
    1588583977
    

    Having fixed that, I wonder if it is worth playing with fire like that. I have always used the much simpler awk-based trick like this:

    secs=$(awk "BEGIN{srand(); print srand()}")
    

    This produces the correct result. For example, see this output:

    $ awk "BEGIN{srand(); print srand()}"; python3 -c "import time; print(int(time.time()))"
    1588584040
    1588584040
    

    From the awk specification in POSIX.1-2008:

    srand([expr])
    Set the seed value for rand to expr or use the time of day if expr is omitted. The previous seed value shall be returned.

    I don’t know if the above specification is sufficient enough to rely on awk’s srand() function to get the Unix epoch timestamp. I guess it depends on how an implementation interprets the phrase “the time of day”. On all Unix or Linux systems I have worked with so far, this is indeed interpreted as Unix epoch timestamp.

    1. 2

      The following is not safe:

      var=$(dirname "$f")

      I…I had no idea! I was pretty sure I was safe because I rely so heavily on shellcheck.

      1. 1

        Has anyone ever actually seen a file with a newline in the name?

        I mean this breaks with spaces at the end too but that also seems pretty oddball naming

        1. 4

          Has anyone ever actually seen a file with a newline in the name?

          Yes. It was in the home directory of an entirely non-technical user – I don’t think save dialogs and other such file-naming GUI bits would generally allow that to hapen, so I’m not really sure how it could have arisen (perhaps an email attachment or web download with an externally-supplied filename?), but one day there it was, breaking things that assumed it wouldn’t be there…

          1. 3

            I don’t write a ton of code, but when I must, I make a deliberate choice not to try to catch every conceivable corner case. Instead, I put my focus on explicitly catching and dealing with the most likely ways that things might go wrong, and sanitize data only where it absolutely must be sanitized. (And further, there is something to be said as well for maintaining a sane data set, but that’s a can of worms for another day.)

            Sometimes this irritates my co-workers. For example, when writing to a file in Python, there are a bunch of things that can go wrong. But I don’t need to check for all of them, because Python has exceptions built in and it will tell you when something unexpected happened by throwing an exception. Wtih an informative traceback and everything. It’s all built right into the language! All the programmer really has to do is make sure that data is not destroyed when Something Bad happens. And maybe post a “whoops, something went wrong message” if the program is being executed by a non-technical user, at the very worst.

            1. 2

              What you are noting is that there is a cost in readability, maintenance, and likelihood of mistakes in dotting every single i.

              Sometimes, that cost might be worth paying. Sometimes, the cure is worse than the disease.

            2. 3

              Has anyone ever actually seen a file with a newline in the name?

              On purpose? No. By accident? Yes.

              1. 1

                Probably, but this only breaks with a newline at the end, which seems way less likely to me.

                1. 1

                  well technically var=$(dirname "$f") will break with whitespace anywhere, because the call isn’t quoted. But var="$(dirname "$f")" would only break with it at the end, yes.

                  1. 3

                    It won’t, actually; word splitting and globbing don’t apply to variable assignments. It doesn’t apply to word in case word in either.

                    1. 1

                      Huh, learn something new every day.