1. 35
  1. 7

    Bash can also employ a kind of function pointer, which can greatly simplify command-line argument parsing. Consider the following simplified example:

    ARG_ARCH="amd64"
    
    ARGUMENTS+=(
      "ARG_ARCH,a,arch,Target operating system architecture (${ARG_ARCH})"
      "log=utile_log,V,verbose,Log messages while processing"
    )
    
    utile_log() { echo "$1"; }
    
    noop()      { return 1; }
    
    log=noop
    

    By creating a comma-delimited list of command-line arguments, the parsing logic and control flow can be influenced from a single location in the code base. The trick is to eval-uate “log=utile_log” when the “-V” command-line argument is provided (or assign ARG_ARCH to the user-supplied value after “-a”). Using the $log variable invokes the function, such as in:

    preprocess() {
      $log "Preprocess"
    }
    

    If “-V” isn’t supplied, then every invocation of $log simply returns without printing a message. The upshot is a reduction in conditional statements. Function pointers FTW. I wrote about this technique in my Typesetting Markdown series.

    Here’s a rudimentary implementation and example usage to wet your whistle:
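
    Independently of the linked implementation, a minimal Bash sketch of how such a list could drive parsing might look like this. The parse_args function and its matching rules are assumptions made for illustration: an entry whose first field contains “=” is treated as a switch and eval-uated, and any other entry names a variable that receives the next word.

    ```shell
    ARG_ARCH="amd64"

    ARGUMENTS=(
      "ARG_ARCH,a,arch,Target operating system architecture (${ARG_ARCH})"
      "log=utile_log,V,verbose,Log messages while processing"
    )

    utile_log() { echo "$1"; }
    noop()      { return 1; }
    log=noop

    # Hypothetical parser: walks the arguments once and matches each word
    # against the short and long names stored in ARGUMENTS.
    parse_args() {
      local spec target short long rest
      while [ "$#" -gt 0 ]; do
        for spec in "${ARGUMENTS[@]}"; do
          # Split "target,short,long,description" on commas.
          IFS=',' read -r target short long rest <<< "$spec"
          if [ "$1" = "-$short" ] || [ "$1" = "--$long" ]; then
            case "$target" in
              *=*) eval "$target" ;;                            # switch: run the assignment
              *)   shift; printf -v "$target" '%s' "${1-}" ;;   # option: store the next word
            esac
            break
          fi
        done
        shift
      done
    }

    parse_args -V --arch arm64
    $log "Target architecture: ${ARG_ARCH}"   # prints "Target architecture: arm64"
    ```

    Because eval runs text taken from ARGUMENTS, this only makes sense when the list is written by the script author and never built from user input.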

    1. 7

      There’s already a no-op builtin: :

      test_var_fnref (){
        log=noop
        noop()      { return 0; }
        for x in {1..100000}; do
          $log "Preprocess"
        done
      }
      
      test_var_builtinref (){
        log=:
        noop()      { return 0; }
        for x in {1..100000}; do
          $log "Preprocess"
        done
      }
      

      It’ll save you the function call overhead when you aren’t printing the messages:

      $ test_var_fnref
      Sun Oct 31 2021 19:07:49 (1.15s)
      
      $ test_var_builtinref
      Sun Oct 31 2021 19:07:55 (553ms)
      

      If it’s a hot loop, it’s also worth knowing that aliases get ~inlined at definition time, so you can do something like:

      shopt -s expand_aliases
      alias log=:
      
      test_alias_ref(){
        for x in {1..100000}; do
          log "Preprocess"
        done
      }
      

      And wring a little more performance out:

      $ declare -f test_alias_ref
      test_alias_ref () 
      { 
          for x in {1..100000};
          do
              : "Preprocess";
          done
      }
      
      $ test_alias_ref
      Sun Oct 31 2021 19:08:04 (543ms)
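
      One consequence of the expansion happening at definition time can be sketched as follows, assuming Bash and the hypothetical function names quiet and loud: changing or removing the alias later has no effect on functions that were already defined.

      ```shell
      shopt -s expand_aliases   # aliases are off by default in non-interactive Bash

      alias log=:
      quiet() { log "Preprocess"; }   # stored as: : "Preprocess"

      alias log=echo
      loud() { log "Preprocess"; }    # stored as: echo "Preprocess"

      # Removing the alias does not touch the function bodies above.
      unalias log

      quiet   # prints nothing
      loud    # prints "Preprocess"
      ```

      The flip side is that the alias has to exist, on an earlier line, before any function that uses it is defined.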
      
    2. 4

      “Well, it means that: 1. Variables are lexically scoped.”

      No, it doesn’t mean that. Callees still see their caller’s environment when using subshells. If you replace all curlies with parentheses in the dynamic.sh example, it will still work exactly the same. But that should be no surprise, since dynamic.sh already has local a declarations and they don’t stop its func3 from working either.

      What using a subshell for a function does is make absolutely every variable local, automatically and implicitly. The visibility of variable names does not change (which it would have to, for scope to become lexical). Only the lifetime of assignments changes: at function exit, all assignments are unwound.
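
      A small sketch of both halves of that claim, using the made-up function names show and outer: the callee still sees its caller's variable, and the assignment disappears once the subshell function returns.

      ```shell
      show() { echo "a=$a"; }   # reads whatever "a" is visible when it is called

      outer() (                 # parentheses: the body runs in a subshell
        a="set in outer"
        show                    # the callee sees the caller's value
      )

      a="global"
      outer   # prints "a=set in outer"
      show    # prints "a=global"
      ```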

      “Subfunctions work just like you would expect, too”

      Well, yeah. That’s because guessing that local foo is how you localise a function named foo doesn’t make it so. After all, setting foo=1 after declaring foo () { ... } doesn’t make it impossible to call the function. In fact, functions live in a completely separate namespace from variables. Localising a variable by the same name as a function simply does nothing to the function.

      (Prior to Shellshock there may have been some mistaken basis for the guess, since declaring a foo function would set the foo variable. This is no longer the case, but even when it was, setting the variable only served to let subshells inherit functions. It was never how functions were looked up to run them; again, they live in a completely separate namespace.)
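
      The separate namespaces are easy to check. In this sketch (wrapper is a made-up name), a variable and a function share the name foo, and localising the variable leaves the function callable:

      ```shell
      foo() { echo "function foo"; }
      foo="variable foo"

      wrapper() {
        local foo="local variable foo"   # shadows the variable only
        foo                              # still calls the function
        echo "$foo"
      }

      wrapper       # prints "function foo", then "local variable foo"
      echo "$foo"   # prints "variable foo"
      ```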

      1. 2

        That actually seems great. Does anybody see any drawback (besides the overhead of starting a subshell) with using this tip?

        1. 9

          Forks are slow so starting a subshell is not an insignificant cost. It also makes it impossible to return values besides an exit status back from a function.

          Zsh has “private” variables which are lexically scoped. ksh93 also switched to lexical scoping instead of dynamic scoping but note that in ksh, you need to use function name { syntax instead of name() { to get local variables.

          1. 9

            Also, in zsh you can just use always to solve the problem in the article:

            {
                 foo
            } always {
                 cleanup stuff
            }
            
            1. 3

              Every time I learn a new thing about zsh, I’m struck by how practical the feature is and how amazing it is that I didn’t know about said feature the past dozen times I really, really needed it. I looked around the internet for documentation of this, and I found:

          2. 2

            A guy on the orange site timed subshell functions to take roughly twice as long.

          3. 2

            A lot of those things are not even bashisms. I’ve recently been writing some POSIX shell scripts and learning the same thing as the author of this: shell scripts are a lot more powerful than I’d previously thought. In my case, it’s probably a result of coming from DOS and thinking of shell scripts as a slightly more advanced form of batch file, rather than as a programming language.

            That said, given the relative size of bash and lua, I’d be more inclined to consider lua if I had sufficiently complex logic that I wanted something that looked more like a program than a batch file. Lua 5.3 is around 240K. In an interactive build, it’s linked against around 300K of readline (or a smaller amount of libedit). Bash is 1.2M. In exchange for the much smaller binary size, Lua gives you a rich ecosystem of packages.

            1. 1

              I tend to use functions when I need to execute within the current shell scope. If you’re going out of your way to use a subshell it’s just easier to write a shell script instead. Do you have a reason to pick a function using subshells over a shell script?

              1. 5

                A separate script is a much more complete boundary. This of course has its pros (better defined interface) but it also has its cons:

                • You need to explicitly pass all arguments.
                  • Sometimes it is nice to just have the log level you parsed automatically be available to the subshell.
                  • You need to serialize and deserialize any data you need.
                • A lot more overhead. Fork is pretty cheap compared to starting a fresh interpreter and executing a new script.
                • For small scripts it can be easier just to see the script inline than wonder what the alternate script is doing.
              2. 1

                The way I’m used to using functions, you give the function some input, and it supplies some output, which I then use in my code. The only way to get that output in bash is by mutating a global variable. So, this whole exercise is kind of pointless, since I can’t mutate a variable from a subshell.
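
                The lost assignment can be seen in a short sketch with made-up function names. Capturing standard output is included as one common workaround, with the caveat that command substitution costs yet another subshell.

                ```shell
                result=""

                set_in_braces() { result="from braces"; }   # runs in the current shell
                set_in_parens() ( result="from parens" )    # runs in a subshell

                set_in_braces
                echo "$result"   # prints "from braces"

                set_in_parens
                echo "$result"   # still prints "from braces"

                # Text written to stdout survives the subshell and can be captured.
                emit() ( echo "from stdout" )
                result=$(emit)
                echo "$result"   # prints "from stdout"
                ```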