1. 35
  1.  

  2. 9

    Yes, this is also a good page about the issue, and related issues:

    https://dwheeler.com/essays/filenames-in-shell.html

    As some might know, https://www.oilshell.org/ is very much about proper parsing and string safety.

    Code and data are kept separate. For example:

    • Shell arithmetic expansion conflates data and code, but Oil statically parses arithmetic expressions instead.
    • Shellshock was also about the conflation of data and code, with the export -f misfeature that serialized a function to a string!

    However I don’t really have an answer to this flag-vs.-arg issue (the flag being code, and the arg being data), other than “use -- everywhere”.

    Relatedly, in Oil, echo is no longer a special case because it supports -- (if you use bin/oil rather than bin/osh). So you can do echo -- $x, which is a safe version of echo $x.

    • In POSIX shell you have to use printf to echo an arbitrary string, although not many shell scripts follow that discipline!
    • You can’t use echo -- $x in POSIX shell because that would output 2 dashes also :)
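
    To make the two bullets above concrete, here’s a tiny sketch (the last line assumes bin/oil, per the paragraph above; the rest is plain POSIX sh):

        x='-n some text'
        echo $x              # unsafe: -n is treated as a flag (and $x word-splits)
        printf '%s\n' "$x"   # the POSIX-safe way to print an arbitrary string
        echo -- "$x"         # Oil only: -- stops flag parsing, so the string prints as-is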

    If anyone has a better idea than “use -- everywhere”, let me know :)

    I guess you write shell in two modes: the quick and dirty throwaway, and then Oil is supposed to allow you to upgrade that to a “real” program. So there could be a lint tool that warns about missing -- or just auto-refactors the script for you. Or maybe there is some option that breaks with shell more radically.

    1. 4

      If you ever write a script that operates on untrusted files, always make sure the command will do exactly the thing you wanted it to do.

      The problem is that when you ask yourself “does this do exactly what I want it to?”, you don’t have the imagination to come up with something like “filenames that look like flags will be interpreted as one.”

      Someone who would make a safer, saner shell to write programs with would be a hero.

      1. 3

        I started some work on a safer, more explicit shell, realizing that the fundamental offering of any shell is the ability to just type the name of a function and arguments unadorned. I called this “quotation”.

        However, after thinking about it more, I realized that no solution will solve the dissonance inherent to any language like that. You will always be tripping up and compromising over where something is or is not quoted. All templating languages have this problem.

        Instead, I’m currently of the opinion that an approach more like PowerShell, in which you call functions rather than writing the names of programs and arguments as text, is the right way forward. This removes the problem of quotation. The downside to this approach is that it requires work to produce the APIs. That’s fine if you have a large standard library, as PowerShell does, but pulling a binary off the shelf, e.g. one written in Python or C, should still feel natural.

        The missing part, therefore, in my opinion, is that programs (on any system, be it Linux, OS X, Windows, or BSD) ought to be accompanied by a schema (could be in JSON, doesn’t matter) – let’s say git and git.schema – which can be interpreted offline or “cold”, without running the program (very important), in order to know (1) the arguments/flags the program accepts (as commands or switches), (2) the types/formats of those inputs, and (3) possibly the side effects of those commands.

        This allows a shell or IDE to provide a very strong completion and type-checking story, and to provide it out of the box. A projectional editor would be satisfying here, too (even something as simple as CLIM’s listener).

        When downloading a random binary online, you could additionally download the schema for it. The schema file itself can contain a SHA256 of the binary that it’s talking about, to avoid accidental misuse. Currently if you want completion for an exe, you have to generate some bash. So it’s clear that the need is already there, it’s just implemented in a poorly done way.
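
        As a rough sketch of the “avoid accidental misuse” part: the schema could carry a hash of the binary it describes, and the shell could check it before trusting the schema. The file name git.schema and the field name binary_sha256 below are made up for illustration (this also assumes jq and sha256sum are installed); none of this is an existing standard:

            expected=$(jq -r '.binary_sha256' git.schema)
            actual=$(sha256sum "$(command -v git)" | awk '{print $1}')
            if [ "$expected" != "$actual" ]; then
              echo 'git.schema does not describe this git binary; ignoring it' >&2
            fi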

        The upside to this approach is that it’s additive; no one has to change their existing software. Additionally, it’s easy to produce for old software: you can make a parser for --help or man pages to generate a “best guess” schema for a program. The reason you wouldn’t put this into a shell by default is that some programs don’t accept --help and/or they run side effects and delete things. Existing option-parser libraries can generate such schemas, just like some of them can generate bash completions at the moment.

        Another upside is that it can simply be a standard/RFC published freely, just like JSON.

        I haven’t moved forward on this schema idea yet, because it’s hard to research existing solutions (hard to google for). It would be an FFI standard but instead of calling C functions you’re calling processes.

        1. 2

          Yeah I agree with your diagnosis of the issue. You need a schema and a parser to go along with every command.

          I say a parser because in general every command has its own syntax, in addition to semantics (the types of the arguments). Some commands recognize fully recursive languages in their args like find, test, and expr. (Although of course there are common conventions.)
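
          For example, find’s predicates form a small recursive expression language of their own, with grouping and boolean operators:

              find . \( -name '*.log' -o \( -name '*.tmp' -a -mtime +7 \) \) -print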

          It relates pretty strongly to this idea about shell-agnostic autocompletion we’ve been kicking around. Every command needs a parser and a schema there too.

          https://github.com/oilshell/oil/wiki/Shellac-Protocol-Proposal

          And yes it’s nice to have the property that it’s a pure addition. It’s basically like TypeScript or MyPy (which I’m using in Oil). They handle a lot of messiness to provide you with a smooth upgrade path.

          If you’d like to pursue these ideas I think Oil will help because it has an accurate and flexible shell parser :) I break shell completion into two parts: (1) completing the shell language, then (2) completing the language of each individual tool. That’s the only way to do it correctly AFAICT. bash conflates the two problems which leads to a lot of upstream hacks.

        2. 1

          That’s really interesting, especially thanks for the dwheeler page.

          I solve shell problems by using ShellCheck all the time; it catches most mistakes one can make and it integrates nicely into existing editors.

          Oil looks really interesting and is certainly better than the shell+shellcheck combo, but I don’t think I want to write everything in new syntax that is not universal and might mean I’ll have to rewrite it in bash later anyway.

          1. 3

            Well, Oil is very bash compatible – it runs your existing shell/bash scripts. You can use it as another correctness tool on top of ShellCheck. It catches problems at runtime in addition to parse time.

            Example: Oil’s Stricter Semantics Solve Real Problems

            There are a bunch of other examples I should write about too.

            If your shell scripts are used for years, and keep getting enhanced, chances are that you will run into the limitations of bash. And you won’t want to rewrite the whole thing in another language. That’s what Oil is for :) It’s fine to stick with bash now, but as you use shell more and more, you will run into a lot of limitations.

          2. 1

            Yes, this is also a good page about the issue, and related issues:

            I disagree with all items marked as “#WRONG” on this site. Clean globbing is too powerful and beautiful to complexify with these mundane problems. Filenames are variable names, not data, and you get to choose them. What is actually wrong is the existence of files with crazy names. This should be solved at the filesystem level by disallowing e.g. the space character in a filename (no need to bother the user; it could simply be translated into a non-breaking space character, and nobody would notice except shell scripts).

            1. 3

              Filenames are variable names, not data, and you get to choose them

              The problem is you don’t always; if I write a script and you run it, then the author of the script didn’t choose the filenames.

              What is actually wrong is the existence of files with crazy names. This should be solved at the filesystem level

              That is more or less the same conclusion the author makes: https://dwheeler.com/essays/fixing-unix-linux-filenames.html

            2. 1

              a better idea than “use -- everywhere”

              An environment variable indicating the provenance of each argument, and sending patches to getopt, argp, coreutils, bash, and your favorite unix programs.

            3. 4

              How about disallowing problematic filenames? I.e., filenames can’t start with dashes, and can’t contain asterisks. While we’re at it, skip newlines.

              Edit: Oh my, dwheeler has an essay on this, too: https://dwheeler.com/essays/fixing-unix-linux-filenames.html Any idea whether the Linux kernel has gained some mechanisms as suggested in “13. What to do?” in the meantime?

              1. 4

                Actually I thought of a dirt simple solution. We could have an option safeglob or no_glob_dash in shell, which prevents the result of a glob from starting with -. If it sees a file like -rf, then it’s a runtime error.
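
                For concreteness, this is the classic accident such an option would turn into an error (just an illustration; run it in a scratch directory):

                    mkdir demo && cd demo
                    touch -- -rf important.txt
                    rm *    # expands to: rm -rf important.txt ("-rf" is parsed as flags, not a file)
                    ls      # important.txt is gone; the file literally named "-rf" is still there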

                It’s not a foolproof thing, because you can still confuse code and data with tar xvf (no dash), but it would stop 99% of problems. And it would not impact working shell scripts too badly.

                I’m not sure if it should be on by default, though. Or I suppose it could be even smarter and check whether the glob is preceded by -- and turn the error off? So mv -- * dest/ would still work with files starting with -?

                1. 2

                  How about prefixing every expanded file with ./ (unless it starts with /)? Feels super blunt and probably not 100% foolproof either, but maybe better than worse? Or is it also mentioned on the dwheeler page already as wrong? To my surprise, dwheeler’s article seems to actually support such an approach! Other than that, I guess the only way to solve this fully would probably be if typing were supported by the shell and all CLI tools… (à la PowerShell?) Edit: maybe there could also be a helper “files(…)” (“paths”?) function, which would prefix each item of a list with ./, unless it already starts with / or ./ – kinda a poor man’s last-resort type declaration by the script writer… (sketched below)
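
                  A rough sketch of what that hypothetical files()/paths() helper might look like (the name and behavior are just the idea from the paragraph above, not a real builtin):

                      paths() {
                        for p in "$@"; do
                          case $p in
                            /*|./*) printf '%s\n' "$p" ;;
                            *)      printf './%s\n' "$p" ;;
                          esac
                        done
                      }
                      # paths -rf foo  ->  ./-rf and ./foo, one per line (newlines in names still break this)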

                  1. 2

                    Yeah, the Wheeler page mentioned that shell authors could use ./* instead of *.

                    So it would be possible to treat * like ./* in this safe mode – so filenames would be returned with a leading ./.

                    On second thought, it might not be safe because I think some programs like rsync treat slashes as significant.

                    It feels a little too magic … I think having a runtime error and having the user change their program is better.

                    Usually they will just have to add a --. Having the shell detect that is heuristic too – a program could easily be written to accept flags after --, but somehow I think it’s less “magic”.

                    I’ll make a note of it on the bug though. Thanks for the feedback!

                    1. 1

                      But what if you went the opt-out way? A * could just be documented as working this way by default; people would see it immediately in any command and, I’d imagine, internalize it quickly, still with some way to disable it if really needed… Or at least have an opt-in flag in an “extra safe” mode? It’s a bold move, so I understand you feel uneasy about it, esp. given the amount of bold moves you’re already making – I’m sure it adds up and can get nerve-wracking :) Also, please note that AFAIR in rsync it’s the trailing (suffix) slash that is important, not the leading (prefix) dot-slash (i.e. it’s foo vs. foo/)… so I’m kinda still interested in seeing actual counterexamples ;)

                      1. 1

                        What’s wrong with the runtime error? I wrote an example on Zulip:

                        https://oilshell.zulipchat.com/#narrow/stream/121540-oil-discuss/topic/Proposal.3A.20shopt.20-s.20no_glob_dash/near/182270245

                        When the user gets the error, they can fix their script by adding --. Then their script is fixed for all shells, not just Oil!

                        In general I dislike “DWIM” and “magic”. Oil is more like Python than Perl :)

                        https://en.wikipedia.org/wiki/DWIM

                        This has to be opt-in for bin/osh, no matter what solution we choose, because of the potential breakage. But it should be opt-out for Oil. That’s another incentive to turn Oil features on!

                        1. 1

                          Thanks for discussing! Replied on github, as that’s kinda easiest for me (while not cluttering lobste.rs anymore), and also that’s where the proposed feature is actually currently described IIUC…

                2. 2

                  Well, unless the shell has knowledge of every command, it doesn’t know what’s a filename and what’s not.

                  It doesn’t even know which are flags and which are args! It’s all a set of argv arrays to a shell (except for builtins like echo or read, which it does understand).

                  For autocompletion, we add some heuristics to tell those two things apart.

                  An approximate 90% solution works for completion. I was hoping to have something better than that for security sensitive issues.

                  But I guess that already exists: Just use --. Maybe the problem is more education / lint tools than anything that’s done in the shell proper.

                  Another slight problem is that the shell doesn’t know which external commands accept -- and which don’t. I guess the only way to solve that is with heuristics, or possibly with autoconf-like feature detection.

                  1. 2

                    Well, unless the shell has knowledge of every command, it doesn’t know what’s a filename and what’s not.

                    I was thinking blocking bad filenames kernel level, not shell level.

                  2. 2

                    The problem is the existing directories. One idea is to simply use a FUSE mount that translates problematic names to harmless characters, and to operate the shell script only on the FUSE-mounted directory. Safemount is my tiny project to do so, and so far it seems to work.

                    1. 2

                      But why should such directories exist? I find it quite likely that if I set up a new Debian machine with a kernel option to disallow weird filenames, it would work just fine.

                      1. 2

                        When I worked with people outside the traditional hacker world, I found directories and filenames often contained spaces. These were created using the GUI (Mac and Windows especially). Hence, the problem of existing directories.

                    2. 1

                      How about disallowing problematic filenames? I.e., filenames can’t start with dashes, and can’t contain asterisks. While we’re at it, skip newlines.

                      And spaces! Plain spaces in filenames are the root of all evil.

                    3. 2

                      I wonder how much code would break if the behavior were changed so that * expanded to the same strings as ./* instead.

                      “Never write *, always write ./*” seems like a good lint rule for shell.
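
                      In case it helps, here’s what the difference looks like in practice (illustrative only, in a throwaway directory):

                          cd "$(mktemp -d)"
                          touch -- -rf file.txt
                          echo *      # prints: -rf file.txt
                          echo ./*    # prints: ./-rf ./file.txt
                          rm ./*      # safe: "./-rf" can't be mistaken for flags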

                      1. 2

                        I learned that GNU tar behaves subtly differently with a leading ./. The ShellCheck page mentions this. Oil has a different solution:

                        http://www.oilshell.org/blog/2020/02/dashglob.html

                        http://www.oilshell.org/blog/2020/02/dashglob.html#appendix-b

                        1. 1

                          Oh lovely, that GNU tar behaviour difference does not at all spark joy.

                          I’m not 100% convinced about dashglob as a solution, because you can now be in the situation where you ran “rm *”, expected it to remove all the files in the current directory, but “./-rf” has not been removed, and subsequent steps which assume the directory was emptied will fail.

                          Shellcheck (or just avoiding shell) for the win, I guess. <3

                          1. 1

                            That’s also true for dotfiles though. If you do rm * you will be left with .foo, and you cannot rmdir after that.

                            dashglob was partly inspired by bash’s dotglob.
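
                            A tiny demonstration of the dotfile analogy (bash; per the blog post above, dashglob plays the analogous role for names starting with -):

                                mkdir d; touch d/.foo d/bar
                                ( cd d && rm * )   # removes bar only: plain globs skip dotfiles by default
                                rmdir d            # fails with "Directory not empty" because .foo remains
                                # with "shopt -s dotglob", * would have matched .foo too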

                            1. 1

                              Point

                        2. 1

                          “Never write *, always write ./*”

                          I prefixed each * with a \.

                          1. 1

                            Oh! Thanks. I didn’t notice the formatting was broken. I do know how to use markdown, I am just less likely to check the result when I’m on my phone and typing is already hard. :)

                        3. 1

                          Updated: I addressed this problem in Oil (and linked this thread)

                          http://www.oilshell.org/blog/2020/02/dashglob.html

                          1. 2

                            Thanks for letting us know. This is an awesome feature and I’m looking forward to the release. It’s really interesting that tar handles foo.txt and ./foo.txt differently; I didn’t know that. It’s great that you noticed it early, as it might cause a lot of confusion.