1. 14
  1. 4

    You can skip the coproc if you use a global variable as your mechanism of returning values.

    I usually use $REPLY since bash read uses that variable to store the result if no variable name was given.

    So:

    if ! emoji=$(short-code-emoji "$code_accum" "$cldr_file"); then 
        printf 'ERROR: Unable to get emoji :%s:\n' "$code_accum" >&2
        return 1
    fi
    printf '%s' "$emoji"
    parsing_code='false'
    continue
    

    Would become:

    if ! short-code-emoji "$code_accum" "$cldr_file"; then 
        printf 'ERROR: Unable to get emoji :%s:\n' "$code_accum" >&2
        return 1
    fi
    printf '%s' "$REPLY"
    parsing_code='false'
    continue
    

    Then you could just modify your global hash map of memoized results.
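    As a sketch of how that might look, with the $REPLY convention combined with a global associative array for memoization (the lookup body here is a placeholder for the jq call against the CLDR file in the post):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: short-code-emoji returns via $REPLY and memoizes
# results in a global associative array, so repeat lookups fork nothing.
declare -A EMOJI_CACHE   # global memo table: short code -> emoji

short-code-emoji() {
    local code=$1 cldr_file=$2
    if [[ -n "${EMOJI_CACHE[$code]+set}" ]]; then
        REPLY=${EMOJI_CACHE[$code]}   # cache hit: no child process
        return 0
    fi
    # Cache miss: do the real lookup here (jq against "$cldr_file" in
    # the original post; a placeholder table stands in for it).
    case $code in
        wave)  REPLY='👋' ;;
        heart) REPLY='❤️' ;;
        *)     return 1 ;;
    esac
    EMOJI_CACHE[$code]=$REPLY         # memoize for next time
}

short-code-emoji wave /dev/null && printf '%s\n' "$REPLY"
short-code-emoji wave /dev/null && printf '%s\n' "$REPLY"   # cache hit
```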

    1. 2

      I like your use of the REPLY variable to store the return value. This seems like a good approach if you are okay with storing the state in a global.

      1. 4

        Well, you are already storing the state of your coproc’s file descriptors in a global array, including its pid. On top of that, you can have only one coproc at a time.

        I mean, if maximum performance and the need to bring data back from child processes are key, the only hope left is to break out eval as shown in @abathur’s example, though I would probably have the caller run it (not that what was posted was unsound or anything). Bash and globals are life, much like awk and other ancient unixy things.

        In all honesty, if maximum performance is the goal here, the best approach would be to restructure this whole snippet as a jq/awk script, or at least to batch the emojis you need to fetch so jq is invoked only once.
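        As a rough sketch of the batching idea, assuming a made-up tab-separated mapping file rather than the real CLDR format, one awk pass can resolve every code at once:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of batching: resolve every short code in one awk
# pass instead of spawning a child process per lookup.
# emoji.tsv is a stand-in mapping file, not the real CLDR data.
printf 'wave\t👋\nheart\t❤️\n' > emoji.tsv

codes=(wave heart wave)
mapfile -t emojis < <(printf '%s\n' "${codes[@]}" |
    awk -F'\t' '
        NR == FNR { map[$1] = $2; next }  # first file: load the mapping
        $0 in map { print map[$0] }       # stdin: emit each known code
    ' emoji.tsv -)

printf '%s\n' "${emojis[@]}"
```

        Note that unknown codes are simply dropped here, so real batching code would want to echo the code (or an error) back to keep outputs aligned with inputs.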

        1. 3

          That is a good point about the coproc file descriptors being global; I wish that weren’t necessary. I agree that for maximum performance there are better solutions than using a coproc, but I think coprocs strike a nice balance between allowing for memoization, encapsulating logic, and keeping good ergonomics with typical Bash programming, which leans heavily on command substitutions. Thanks for reminding me about the single-coproc limitation; I am going to add a note about that to the blog post. There has also been some recent movement toward possibly removing that limitation: https://mail.gnu.org/archive/html/help-bash/2021-03/msg00207.html.

    2. 3

      Unpopular opinion: by the time a shell script has reached this level of structure, it’s time to rewrite it in a different programming language that more naturally supports structured code, data types, etc.

      1. 3

        Not unpopular.

      2. 2

        Huh, I had no idea that coproc even exists.

        1. 1

          It is indeed an obscure (but fun!) feature of shells; I had never heard of it either until I went hunting for different strategies to memoize Bash function calls. Ultimately, coproc is syntactic sugar around functionality that could be implemented with named pipes using mkfifo. As I mentioned in the post, Stéphane Chazelas has a great answer detailing the support in various shells: https://unix.stackexchange.com/a/86372/83704.
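          To make the mkfifo equivalence concrete, here is a minimal sketch (the upper-casing service is just a stand-in for real work): a coproc first, then roughly the same plumbing rebuilt by hand with named pipes:

```shell
#!/usr/bin/env bash
# A minimal coproc: a long-running child we can write to and read from.
coproc UPPER { while read -r line; do printf '%s\n' "${line^^}"; done; }

printf 'hello\n' >&"${UPPER[1]}"       # write to the coproc's stdin
read -r coproc_answer <&"${UPPER[0]}"  # read from its stdout
printf '%s\n' "$coproc_answer"

eval "exec ${UPPER[1]}>&-"             # close its stdin so it exits

# Roughly the same plumbing, spelled out with mkfifo:
dir=$(mktemp -d)
mkfifo "$dir/in" "$dir/out"
while read -r line; do printf '%s\n' "${line^^}"; done <"$dir/in" >"$dir/out" &
exec 3>"$dir/in" 4<"$dir/out"          # open both ends in the parent
printf 'world\n' >&3
read -r fifo_answer <&4
printf '%s\n' "$fifo_answer"
exec 3>&- 4<&-                         # EOF ends the background loop
rm -rf "$dir"
```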

        2. 2

          Author here; I would love to hear other folks’ approaches to memoizing in a shell such as Bash.

          1. 3

            I’m not aware of a great approach once you need to invoke it in a subshell.

            There are more options if you can re-cut the code to invoke the function directly. This is often the case if you just need to print the value; it’s less likely if you absolutely need it in a variable. For example, you can use eval to define a new function that will just re-print the value on demand later:

            #!/usr/bin/env bash
            
            function rando {
            	local argstr out
            	printf -v argstr "_%q_" "$@"	# key the memo on the quoted args
            	if ! type "rando:$argstr" &> /dev/null; then
            		out="$RANDOM"
            		# define a function that re-prints this value on demand
            		eval "rando:$argstr(){ echo '$out'; }"
            	fi
            	"rando:$argstr"
            }
            
            rando butter bubbles
            rando butter bubbles
            

            It’s technically a bit of a variation on this, but I’ve also had some ~idiomatic luck with using https://github.com/bashup/events for some cases like this.

            It’ll let you specify an event callback and check whether one is set. A callback can be any command, so if the handler isn’t already set, you can set one that uses echo/printf, and then it’ll print each time you fire the event. This comes with some overhead (not great for loops that must be as tight as possible…), but it also creates a simple affordance for registering many different ~cached values from different places in the code and then printing them all back out by firing a single event. (I don’t, unfortunately, have a terribly easy-to-follow public example; if you’re curious you might be able to work it out starting from a more complex one at https://github.com/abathur/shell-hag/blob/master/hag.bash#L497)

            1. 1

              That is a super interesting approach, defining a new function for every distinct set of arguments, though I don’t love how it pollutes the global function namespace. I have never heard of https://github.com/bashup/events, though it looks interesting on an initial glance. I’ll have to spend some time reading its code to understand how it works, thanks!

              1. 1

                > I don’t love how it pollutes the global function namespace.

                I prefixed the function names here with rando: for exactly this reason.

                > I have never heard of https://github.com/bashup/events, though it looks interesting on an initial glance. I’ll have to spend some time reading its code to understand how it works, thanks!

                It isn’t widely used. I do vaguely intend to blog about it at some point to help boost it a bit (in the broader context of work I’ve been doing to make nixpkgs a good home/ecosystem for bash/shell projects, and some bash ~profile modules I’ve been picking at).

                It isn’t magic, but I think the overhead is usually a fair trade when the event-oriented abstractions/idioms are a natural fit for a task/domain. That’s very abstract, but some examples I’ve found: internal to a modular project, simplifying ~collaboration between separate projects, lazy setup/init, and user/run-time extensibility.

          2. 1

            Forking always carries a speed penalty in bash. I need to use this more in loops, instead of forking a bunch in my inner loop.
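            A quick way to see the fork penalty for yourself is to time a command substitution against a direct call that returns via $REPLY (no numbers claimed here, since they vary by machine):

```shell
#!/usr/bin/env bash
# Compare a command substitution (one subshell fork per call) with a
# direct function call returning via $REPLY (no fork at all).
get() { REPLY='value'; }

time for ((i = 0; i < 500; i++)); do
    v=$(get; printf '%s' "$REPLY")   # forks a subshell every iteration
done

time for ((i = 0; i < 500; i++)); do
    get; v=$REPLY                    # same result, no fork
done
```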