1. 19
    1. 12

      This is really cool, and a great way to actually justify those crazy one-liners. I have a bunch of them in my own dotfiles that I should break apart and explain, if for nothing else than to remember what I have!


      One big thing I want to point out: This only works for Github, and not anything else. What about Gitlab? Or modules aliased onto different domains (like all of the Kubernetes libraries being hosted on Github while the official module path is k8s.io/kubernetes)? The blog post / title suggests that it’ll go through all of the dependencies, but then it specifically filters for only github.com.
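
      For what it’s worth, here’s a rough sketch (not the author’s pipeline, just an illustration) of how you could see every host your module dependencies live on, not only github.com:

      # list every module path and count dependencies per hosting domain;
      # NR > 1 skips the main module, which go list prints first
      go list -m -f '{{.Path}}' all | awk -F/ 'NR > 1 { print $1 }' | sort | uniq -c | sort -rn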

      This particular concern of mine isn’t super important for this blog post, but it does implicitly reinforce a sentiment I’ve seen elsewhere on the internet: Github is the only place that code lives. Again, I don’t want to distract too much from the original post, but I feel like I need to push back on that sentiment so that we don’t forget it.

      1. 3

        Maybe github is the only place where unmaintained code lives.

    2. 11

      It feels unfair to use the last commit date on a repo as a proxy for whether or not that repo is maintained. Can a project not just be “finished” without necessarily being “abandoned”?

      1. 4

        +1 - maybe as an initial read of “this hasn’t been touched in 5 years, lemme check the issues/PRs/MRs on the project to see if there’s anything of concern”, but I would definitely say that a repo not seeing updates doesn’t necessarily mean it’s dead

      2. 2

        Sadly, there are some companies (I will name no names) who do believe that such projects are no longer acceptable for use.

      3. 1

        I agree with your sentiment; however, this one-liner can be used as a starting point for further exploration.

        Given the hoopla in the industry around “supply chain security”, engineering teams are increasingly pressured to scrutinize their use of OSS. Abandoned projects are considered especially problematic.

    3. 4

      pretty cool! I hope the author doesn’t mind having the UUOC pointed out :) https://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat

      might be easier to use awk for some of the sed chain as well. certainly is an interesting idea, having up-to-date insight into your Go dependencies. Sometimes I feel like go.mod just balloons and is ignored - this definitely reminds me to take another look through some of my projects. Thank you!
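
      For example (a hypothetical grep | sed pair, not the exact chain from the post), awk can usually do the match and the extraction in one step:

      # pull the second whitespace-separated field out of matching lines
      grep 'pattern' input.txt | sed 's/^[^ ]* //; s/ .*//'
      # roughly the same thing as a single awk invocation
      awk '/pattern/ { print $2 }' input.txt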

      1. 3

        Eh. The shell is an interactive REPL in which you generate results by successively passing one command’s output to a subsequent command as input. cat filename is a totally normal and valid initial stage for any pipeline, and cat filename | grep foo is a totally normal and valid expression of a two-stage pipeline. More importantly, it represents the true “nature” of the expression, which isn’t really captured by the (possibly more efficient) grep foo < filename. And I can’t get behind the notion that efficiency is what determines the uselessness of cat 😉

        1. 1

          sure, that makes sense - logically reading the pipeline, it’s always good to start with ‘where am I getting this data’; that does seem intuitive. I just want to point out the more relevant example below the one you quoted, where

          cat file | grep pattern

          can instead be written as

          grep pattern file

          again, not a big deal - it’s just fun to call them out, since people have been guilty of the UUOC pattern for decades now. maybe it says more about how the tools are introduced than about the people learning!

          1. 2

            In the same way, people have been guilty of inappropriately calling out this supposed crime of the UUOC pattern for decades too :-)

            I agree with peterbourgon in that the use of cat filename as the first component of a pipeline makes sense for a lot of reasons, including the one he describes, but also (for me) the use of it as a placeholder, with a file of cached or sample data, for the eventual real data retrieval part, which might be an HTTP request or other expensive operation.
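
            For instance (file name and URL made up):

            # while developing: cached sample data stands in for the real fetch
            cat sample_response.json | jq -r '.items[].name'
            # later: swap only the first stage for the real, expensive retrieval
            curl -s https://api.example.com/items | jq -r '.items[].name'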

        2. 1

          It’s a shame that < is the opposite way around to > and | in a shell. In these scripts, cat is used simply because < is not the same way around as the other pipeline operators. It also highlights the asymmetry: | and > define what the target of the redirection is (a program or a file), whereas < defines what the source is.

          The > operator is comfortable shorthand but without it you’d probably have pipelines that looked like:

          $ cat input_file | tool | otherTool | fwrite output_file
          

          In some ways, it would be a lot cleaner to have this structure, where | is the only operator, but with some syntax in the shell for differentiating between files and commands.

          1. 1

            You can write < input_file tool | … tho it looks weird.

            Redirection operators are general purpose tools for manipulating the file descriptor table. The < and > operators are a combination of open(2) and dup2(2) and | is pipe(2) - there’s no syntax for dup-like functionality with |.
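
            A few concrete forms, for anyone following along (file and command names made up):

            tool < input.txt       # open(2) input.txt, then dup2(2) it onto fd 0
            tool > out.txt 2>&1    # open(2) out.txt, dup2(2) it onto fd 1; 2>&1 is a pure dup2 of fd 1 onto fd 2
            tool | otherTool       # pipe(2), write end dup'd onto tool's fd 1, read end onto otherTool's fd 0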

            1. 2

              Redirection operators are general purpose tools for manipulating the file descriptor table

              Well, kind of. There are some extensions to modify arbitrary file descriptors, but I don’t think they’re standard. And they’re limited to open and pipe, so (unless you’re on Plan 9) you need external tools to do things like redirect the output to a socket.
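
              (e.g. piping through nc(1) to get output onto a TCP socket - host and port made up:)

              some_command | nc example.com 1234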

              The < and > operators are a combination of open(2) and dup2(2) and | is pipe(2) - there’s no syntax for dup-like functionality with |.

              These (at least the first two) predate dup, let alone dup2. Originally, UNIX had a small array of file descriptors per process and found a free one by doing a linear scan over this array until it found the first free one. The shell took advantage of this by closing stdin / stdout and then opening the new file over the top. Because these were file descriptors 0 and 1 (I think stderr was actually a later addition, but I would have to look it up to be sure), they were guaranteed to be found first, so redirection was trivial to implement. AT&T UNIX v3 introduced both dup and pipe, presumably because pipe returned two file descriptors and there was no way of guaranteeing the place of the second one without clobbering the file descriptor before it, so you needed to call pipe, dup, and close for pipe.

              By the time dup2 was introduced (AT&T UNIX v7, the one Tanenbaum regarded as perfect), there was no need to maintain the guarantee that file descriptors were allocated in order, but it was sufficiently entrenched that removing the guarantee would break things (including, but not only, old versions of the shell). POSIX standardised this behaviour from the start and so all subsequent *NIX systems have been stuck with it.

              I’d still like to see a process creation API that let me explicitly populate the child’s file descriptor table, rather than taking a copy of mine and manipulating it.

              1. 2

                [Warning: idle speculation and half-formed ideas ahead!] WRT spawning processes, I think the right API is to create an empty process then endow it with whatever resources it needs to get started (i.e., fork is wrong, but so is clone, so is spawn, so is CreateProcess, etc.). It would be nice if the calls for granting resources to another process could also be used for debugging / introspection. Might even work for some kinds of RPC? I dunno if it would make sense to require all calls to take a target process handle, even when a process is operating on itself. But it could enable things like per-thread descriptor tables, distinct from a process-wide shared descriptor table. The io_uring descriptor judo is heading that way, tho it is per ring rather than per thread.

                1. 1

                  I think the right api is to create an empty process then endow it with whatever resources it needs to get started

                  That’s what Windows does. It’s quite tricky to get right with permissions models, particularly when you start adding something like CHERI or MPK (or similar schemes - Apple has an MPK-like extension) that adds a within-address-space protection model. In particular, one of the things that you want to do within an empty process is map memory.

                  On Windows, all of the system calls that modify a process take a HANDLE to the target process (which is usually a magic value meaning ‘the current process’) and so you can do things like the equivalent of mmap a file or a shared memory object into the target. The worst call, from the perspective of sub-process isolation / memory safety, is WriteProcessMemory, which is basically a memcpy from the current process to the specified process (this trivially bypasses memory safety because the specified process can be the current process).

                  On Windows, the code that implements CreateProcess[Ex] can run entirely in userspace, creating a new process, mapping system DLLs and the binary into it, mapping some memory to use as a stack, copying configuration information into the process, and creating a new thread that runs with the newly created stack. That’s great, but it means that the parent process then has the ability to completely tamper with the process. I believe on more recent versions of Windows there’s a service that actually does this for you so that you don’t hold a handle to the target process that has these rights, and that’s essential for being able to support things like signed binaries and to guarantee that they’re running in anything like a valid process environment. You definitely wouldn’t want to have an equivalent of setuid, for example, if a parent process held a handle that let it inject data and memory mappings into the setuid binary.

                  It would be nice if the calls for granting resources to another process could also be used for debugging / introspection.

                  This is, indeed, what Windows looks like, but I don’t really like it because of the permissions issue. Debugging is an inherently privileged operation. When you attach a debugger to a process, you are doing something that is allowed to arbitrarily tamper with program state. With something like CHERI or the memory key-like behaviour Apple implements with permission indirection, you can enforce policies within an application that say ‘this private key is accessible only via this small bit of my TLS library that signs a session key’, for example. When you attach a debugger, you’re exercising a permission that can violate that security policy and you’re willing to do so in the interest of making your code actually work. In contrast, I don’t want a process to be able to violate my process’ security policies just because it spawned me.

                  On macOS, for example, there are flags that you can set in the header of a binary that disallow debug interfaces from connecting to a process. You can use this with MAC-framework labels to create files on the filesystem that can be accessed only by a specific binary (or, more usefully, by a binary with a particular label conveyed via code signing). This lets you implement things like the Keychain daemon and restrict access to encryption keys to that process. Juniper did something similar with a port of NetBSD’s veriexec framework and the FreeBSD MAC framework.

                  I can imagine this being somewhat nice if integrated with Capsicum such that you had a set of rights via a process descriptor and could gradually reduce them. Especially if some things implicitly removed rights. For example, execve would remove the rights to modify the process unless the binary header explicitly opted in to keeping them (which you might want in some cases, for example to allow the parent to add and remove file descriptors from your file descriptor table as a faster alternative to sending file descriptors via UNIX domain sockets).

                  But it could enable things like per-thread descriptor tables, distinct from a process-wide shared descriptor table

                  With clone (Linux) and rfork (*BSD), you can have per-thread file descriptor tables. They’re a bit annoying though because you typically want a subset of file descriptors to be shared and others to be global. With Mach, you always have a task port that can be used to set up other ports, which makes this kind of thing possible (threads on Mach are tasks that share an address space with a parent task, file descriptor tables are a userspace abstraction over a set of port handles and so can be shared or not shared).

              2. 1

                ksh added file descriptor numbers on the left of redirection operators, and most of ksh is in POSIX: https://pubs.opengroup.org/onlinepubs/007908799/xcu/chap2.html#tag_001_007

                The Bourne shell only has dup, not dup2: https://man.freebsd.org/cgi/man.cgi?query=sh&manpath=Unix+Seventh+Edition
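
                For example (all standard POSIX shell):

                cmd 2> errors.log    # redirect fd 2 (stderr) to a file
                cmd 2>&1 | less      # duplicate fd 1 onto fd 2, so both go down the pipe
                exec 3< input.txt    # open input.txt on fd 3 in the current shell
                read -r line <&3     # read a line from fd 3
                exec 3<&-            # close fd 3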

    4. 3

      Sorting by “last commit” is probably the biggest thing missing here to identify the oldest (and most likely unmaintained) projects.
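
      A hypothetical way to bolt that on (not the original pipeline; assumes the gh CLI is installed and authenticated, and it makes one API request per repository):

      # print the last-push date for each github.com dependency, oldest first
      go list -m -f '{{.Path}}' all \
        | grep '^github.com/' \
        | cut -d/ -f2,3 \
        | sort -u \
        | while read -r repo; do
            printf '%s %s\n' "$(gh api "repos/$repo" --jq .pushed_at)" "$repo"
          done \
        | sort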

    5. 3

      Extra points for using asciinema 👍