1. 4

    Nice article, I definitely think this is needed and timely. I have seen “casual” shell injections popping up often. For example, on the xargs thread from 2 weeks ago, the post specifically advocated practices that were vulnerable to shell injection (from untrusted filenames).

    In contrast, xargs can eliminate this risk if used correctly (although, as always, that’s not easy to figure out from the man page; the key is that it mostly deals with “args”, not flat strings!)

    https://lobste.rs/s/wlqveb/xargs_considered_harmful

    https://lobste.rs/s/wlqveb/xargs_considered_harmful#c_kwsxtc (comment pointing out the shell injection)

    I also remember some “minimal” CGI code in C that someone posted that has extremely obvious HTML injection. (One of my pet peeves is that XSS has an unintuitive name; HTML+JavaScript injection is a better name for it.)


    I’m beginning to think the Shell/SQL/HTML+JS injection issue is a hole in education. I would approach it from 2 angles: “memes” and language theory.

    For memes, xkcd made a good “Bobby Tables” one in 2007, which I reposted here:

    https://old.reddit.com/r/oilshell/comments/n9qcrp/exploits_of_a_mom_sql_injection/

    Basically every working programmer should understand why the payload is:

    • Robert’); DROP TABLE Students; --
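    To make it concrete, here’s a small sketch (my own illustration, using Python’s sqlite3) of what that payload does when the query is built by string formatting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT)")
conn.execute("INSERT INTO Students VALUES ('Alice')")

name = "Robert'); DROP TABLE Students; --"

# Vulnerable: string formatting turns data into SQL. The payload closes
# the quote and paren, runs DROP TABLE, and the trailing "--" comments
# out the leftover "')". (executescript is used because, like many real
# database APIs, it runs multiple statements.)
conn.executescript("INSERT INTO Students VALUES ('%s')" % name)

tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print(tables)  # [] -- the Students table is gone
```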

    For shell injection, I suggested “The Ballad of Rimraf”:

    • The Ballad of ; rm -rf / (with the common misusage being os.system('mplayer %s' % filename))
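    A minimal sketch of that misusage and its fix, with echo standing in for mplayer so it’s harmless to run:

```python
import os
import subprocess

filename = "song.mp3; echo PWNED"  # attacker-controlled filename

# Vulnerable: the string is handed to /bin/sh, which sees two commands.
os.system('echo playing %s' % filename)
# prints "playing song.mp3", then "PWNED"

# Safe: an argv list is exec'd directly; ';' is just a character in one arg.
subprocess.run(['echo', 'playing', filename])
# prints "playing song.mp3; echo PWNED"
```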

    For HTML+JS injection, it could be the alert("pwned") restaurant or something. I didn’t think that one through.
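    For example (my sketch, using Python’s html module; a real template language does this escaping by default):

```python
import html

comment = '<script>alert("pwned")</script>'  # hostile "restaurant review"

# Vulnerable: splicing user input straight into markup runs the script.
page = "<p>Review: %s</p>" % comment

# Safe: escape at the HTML boundary so the payload renders as text.
safe = "<p>Review: %s</p>" % html.escape(comment)
print(safe)
# <p>Review: &lt;script&gt;alert(&quot;pwned&quot;)&lt;/script&gt;</p>
```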

    If someone can actually make comics, they should run with this :)


    From the language theory side, I also have a hard time explaining it, but I think there is an idea of creating an injection attack as sort of a proof by counterexample. You want to “prove” (or really informally reason) that your code is correct over arbitrary strings in an alphabet (filenames, URL parameters, etc.). The injection attack is the counterexample.

    There are actually 3 different ways to avoid injection attacks, which I outline here: http://www.oilshell.org/blog/2021/06/oil-language.html#educational-posts-to-support-string-safety

    And I think this language-based reasoning gets at why they all work (?) At least in my mind.


    (BTW I think it would be worthwhile to repeat the actionable advice up front, in addition to putting it at the end. It may be a lot for some readers to get through.)

    As another footnote, Joel Spolsky has a 2005 article about using a coding convention (really “Hungarian” notation) to avoid XSS. But I believe this is mostly obsolete if you’re using a template language, which didn’t really exist at the time (other than PHP which was/is unsafe by default!). It’s really for people who are manually creating HTML pages in VB or Java, which almost nobody does anymore.

    https://www.joelonsoftware.com/2005/05/11/making-wrong-code-look-wrong/

    1.  

      Most ORMs and DB libraries have automatically protected you from SQL injection for years and years now.

      But people keep writing articles insisting that libraries and ORMs are bad and everyone should instead write their own SQL by hand.

      1.  

        I don’t have any real stance on that – my issue is that IF you do drop down to SQL, then you need to know enough not to write SQL injections :) And also you shouldn’t write blog posts with obvious SQL / shell / HTML+JS injections.
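        For example (a sketch with sqlite3’s ? placeholders; other drivers use %s or :name, but the principle is the same), parameterization makes the Bobby Tables payload inert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT)")

name = "Robert'); DROP TABLE Students; --"

# The driver sends the value separately from the SQL text, so it is
# never parsed as SQL -- it's stored as a plain string.
conn.execute("INSERT INTO Students VALUES (?)", (name,))

print(conn.execute("SELECT name FROM Students").fetchall())
# [("Robert'); DROP TABLE Students; --",)]
```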

    1. 1

      Why provide an exploitable API, while a safe version is possible and is more direct? I don’t know, but my guess is that it’s mostly just history. C has system, Perl’s backticks correspond directly to that, Ruby got backticks from Perl, Python just has system, node was probably influenced by all these scripting languages.

      You might want to run a command that you wrote yourself and that has no user input, or that uses pipes or other actual shell features. Though spawn does have a shell parameter that gives the exec-style behavior, so arguably that’s what exec should have been by default.

      1.  

        I don’t think any explanation in the “because it is useful” class works: a bunch of newer languages don’t provide this API, and people don’t complain.

        Specifically:

        • If you wrote the command yourself and it has no user input, just use spawn.
        • If you need pipes, spawn has to have the ability to connect two processes anyway.
        • If you need other shell features, well, then spawn /bin/sh -c yourself.
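        The three cases can be sketched with Python’s subprocess (standing in for a spawn-style API):

```python
import subprocess

# 1. A fixed command with no user input: just exec it.
subprocess.run(['ls', '-l'])

# 2. Pipes: connect the two processes explicitly.
p1 = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['head', '-n', '3'], stdin=p1.stdout)
p1.stdout.close()  # so head's exit propagates SIGPIPE to ls
p2.wait()

# 3. You really want shell features: spawn the shell yourself, on purpose.
subprocess.run(['/bin/sh', '-c', 'ls | wc -l'])
```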

        I also consider the shell=true argument in node/Python to be a misfeature.

        1.  

          You never need the unsafe API to use pipelines – it can always be expressed in the safe API. Instead of:

          os.system('ls | head -n %s' % number)
          

          You can do:

          subprocess.call(['sh', '-c', 'ls | head -n "$1"', 'dummy0', str(number)])
          

          So you are basically letting the shell dynamically do the substitution rather than building up shell code yourself. This is better.
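          To see why this is safe, here’s a sketch with a hostile value (and echo standing in for the pipeline): the shell text is fixed, and the value arrives as a positional argument, so it’s never parsed as code:

```python
import subprocess

number = '3; echo PWNED'  # hostile input

out = subprocess.run(
    ['sh', '-c', 'echo "$1"', 'dummy0', number],
    capture_output=True, text=True,
).stdout
print(out)  # 3; echo PWNED  -- printed literally, nothing executed
```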

          It’s the same reason that this awk invocation is meh:

          seq 3 | awk "\$1 > $threshold"  # note double quotes
          

          And this is better:

          seq 3 | awk -v threshold="$threshold" '$1 > threshold'  # note single quotes
          
          1.  

            thanks for the awk trick.

        1. 9

          But the problem is fundamentally unsolvable, because it’s built into Julia on a basic design level.

          Would love to read more about this! My naive understanding is that this is mostly just a quality-of-implementation issue: Julia uses LLVM to compile code during execution, and lacks a tiered JIT. Are there reasons why we can’t just add an interpreter tier to Julia, beyond “someone has to do that work”?

          1. 10

            There is some debate about it here, including from a core Julia dev.

            https://news.ycombinator.com/item?id=27961251

            My (mostly uninformed) feeling is that the job of the compiler is to get rid of abstractions (e.g. monomorphizing code), and perfect optimization is a global process. So when any piece of code can be redefined at any time, that makes recompilation slow.

            Of course that doesn’t mean you can’t do some tricks, but it’s working against the grain of the problem.

            I’d also say from reading about v8 over the years, the tiers seem to become a huge tax. It’s not just “someone has to do that work”, but “every language change from now on requires more work” (from a limited group of people; it’s not orthogonal).

            I don’t think the Julia language is done evolving, so I can see that duplicating the semantics of the language in multiple places is something they would be reluctant to do. (again this is pure speculation) Hopefully there is some kind of IR that makes this less burdensome, but compilers are always messier than you’d like to think :)


            edit: I think the quote is a shorter way to explain it.

            Just as separate compilation isn’t possible for C++ template code, it’s a challenge for Julia as well.

            e.g. if all your C++ code is in templates – and there are some styles that lean that way for zero runtime cost – then C++ doesn’t have incremental compilation at all. It has plenty of duplicate compilation if you like :)

            1. 3

              It also sounds like they can improve the caching / precompilation:

              While currently precompile can only save the time spent on type-inference, in the long run it may be hoped that Julia will also save the results from later stages of compilation. If that happens, precompile will have even greater effect, and the savings will be less dependent on the balance between type-inference and other forms of code generation.

              https://julialang.org/blog/2021/01/precompile_tutorial/

              1. 1

                Yeah, caching is the first thing I thought of when I saw the “unsolvable” problem. I wonder if caching the whole heap could be an option here as it is in some Standard ML implementations, SBCL, …

              2. 3

                Given the existence of things like ghci/runhaskell that also have to compile a complex language before they start to run, I feel like it can’t be unsolvable

                1. 2

                  Are there reasons why we can’t just add interpreter tier to Julia

                  I have had this exact thought; I don’t think there is a fundamental reason why not, though it would take a tremendous amount of refactoring.

                  1. 3

                    You can already set the compilation level of Julia per function, and the lowest level is sort of an interpreter.

                    There has been some work on a fully ahead of time compiler for Julia and the core team have mentioned using more conventional JIT techniques with an interpreter level, too.

                    1. 2

                      Yeah I mean it’s not hard to see why “just go and create a second implementation of your language that retains perfect compatibility with the first” isn’t really something a lot of people want to hear.

                      1. 1

                        I imagine there’s a way you could do it incrementally with careful planning, but I don’t know enough about Julia internals to make any real statements about the level of difficulty that entails. Could be really easy for all I know.

                  1. 3

                    I don’t know… any “best of Joel Spolsky” list that doesn’t have The Law of Leaky Abstractions in it is suspect, IMO!

                    1. 3

                      Yes great point! I even invoked that 2 days ago with respect to Kubernetes: the new abstractions over the OS leak, and they don’t compose with the old ones.

                      https://news.ycombinator.com/item?id=27910897

                      I think that term has reached the common parlance, so I almost forget it comes from Joel. But I do remember reading that article for the first time, and there were arguments about it!

                      Performance and security are typically the things that leak – they’re crosscutting concerns and don’t live in a particular part of the codebase. (Also related to O(n^2) bugs which tend to happen when you call “black box” functions in a loop without realizing they’re linear)

                      I will add this to the appendix at least, thanks :)

                      1. 2

                        It’s probably just “right place at the right time,” but the first time I saw the Law of Leaky Abstractions article, I was still something of a beginner software dev, and was in that spot between “can write code that mostly works” and “can write something with a good API”.

                        I think about that article constantly, even today.

                        Performance and security are typically the things that leak

                        I’d say you’re lucky if it’s only performance and security that are leaking through the abstractions you’re working with. xD

                        1. 1

                          Yup I read that article pretty much at the time I graduated from university and started working professionally with 100K+ line codebases.

                          We were definitely taught about the power of abstraction in school, and that is kind of the way my mind leaned. But instantly I had a name for the problems that (often premature) abstraction caused, and so did everyone else, because I remember seeing that phrase around a lot back then! And it’s still survived to the point that we don’t necessarily associate it with Joel.

                          I agree that many abstractions are “wrong” and you don’t even get to performance or security. But even if you get it “right” you still have those problems :)

                          The Unix way is more to keep things transparently simple than to abstract them. Unix has some solid abstractions with processes and the file system, but they leak on purpose, like ioctl(), etc. Some things arguably were never fully abstracted (signals).

                      1. 1

                        Thanks, I fixed the link!

                      1. 7

                        The book The UNIX Programming Environment by Kernighan and Pike is also a great read on much the same topic. It’s expensive to buy new but I found a cheap second-hand copy.

                        The thing is, I find the argument really compelling… but it doesn’t seem to have caught on. Some people really like this style of computing - see the Plan 9 holdouts, for example - but UNIX moved away from it pretty quickly once it left Bell Labs (hence this paper, which uses BSD and SysV changes as examples). GNU is definitely not UNIX, and Windows never was.

                        It seems like the UNIX style appeals strongly to a few people, but has no mass appeal. Maybe it appeals strongly to developers whose main focus is building software systems, but less to end users - and hence less to developers whose main focus is building software for those end users? Contrast Kernighan and Pike’s paper with Jamie Zawinski’s Unity of Interface, which he described as, “An argument for why kitchen-sink user interfaces (such as web browsers, and emacs) are a good idea.”

                        1. 11

                          Modularity is great if it comes with composability. UNIX originally justified its existence with a typesetting system and the UNIX kind of modularity where you consume a text file and produce a text file was great for that. The only stage that wasn’t plain text was the set of printer control commands that were sent to the printer device (over a stream interface, which looked like a text stream if you squinted a bit). It doesn’t actually work for anything non-trivial.

                          Here’s a toy example: Consider ls -Slh. In a UNIX purist world, -l is the only one of these that you actually need in ls and actually you don’t need that because if -l is the default you can get the non-verbose output with ls -l | cut -f 9 -w and you can make that a shell alias if you use it often enough. You don’t need -S because sort can sort things, so you can do ls -l | sort -k5. Except that now you’ve gone away from the UNIX spirit a bit because now both sort and cut have the functionality to split a line of text into fields. Okay, now that you’ve done that how do you add the -h bit as a separate command? You can do it with some awk that splits the input and rewrites that one column, but now you’ve added an entire new Turing-complete language interpreter to the mix. You could, in fact, just do ls | awk and implement the whole sort part in awk as well. If you do this, you’ll discover that awk scripts compose a lot better than UNIX pipelines because they have functions that can take structured values.

                          Many years ago, I sent a small patch to OpenBSD’s du to add the -d option, which I use with around 90% of du invocations (du -h -d1 is my most common du invocation). The patch was rejected because you can implement the -d equivalent with -c and a moderately complex find command. And that’s fine in isolation, but it doesn’t then compose with the other du flags.

                          The UNIX-Haters Handbook explained the problem very well: We’ve implemented some great abstractions in computing for building reusable components: they’re functions that consume and produce rich data types. Composing systems implemented in terms of these primitives works great. Witness the huge number of C/C++ libraries, Python packages, and so on in the world. In contrast, plumbing together streams of text works well for trivial examples, can be forced to work for slightly more complex examples, and quickly becomes completely impossible for more complex ones.

                          Even some of the most successful kitchen-sink UIs have been successful because they support composition. Word and Photoshop, for example, each have rich ecosystems of extensions and plugins that add extra features and owe their success in a large part to these features. You can build image-processing pipelines in UNIX with ImageMagick and friends but adding an interactive step in the middle (e.g. point at the bit you want to extract) is painful, whereas writing a Photoshop workflow that is triggered from a selection is much easier. Successful editors, such as vim, emacs, and VS Code, are popular because of their extensions far more than their core functionality: even today, NeoVim and Vim are popular, yet nvi has only a few die-hard users.

                          1. 4

                            Yeah I agree the sort -k5 thing is annoying. Several alternative shells solve that to some extent, like PowerShell, nushell, and maybe Elvish (?). They have structured data, and you can sort by a column by naming it, not by coming up with a sort invocation that re-parses it every time (and sort is weirder than I thought.)

                            However I want a Bourne shell to have it, and that’s Oil, although it’s not done yet. I think what PowerShell etc. do has a deficiency in that it creates problems of composition. It doesn’t follow what I’m calling the Perlis-Thompson principle (blog posts forthcoming).

                            So Oil will support an interchange format for tables, not have an in-memory representation of tables: https://github.com/oilshell/oil/wiki/TSV2-Proposal (not implemented)

                            I also think the shell / awk split is annoying, and Shell, Awk, and Make Should Be Combined.

                            1. 1

                              Unfortunately, it’s not really something that you can solve in a shell because the problem is the set of communication primitives that it builds on. PowerShell is incredibly badly named, because it’s not really a shell (a tool whose primary purpose is invoking external programs), it’s a REPL. It has the converse problem: there’s no isolation between things in PowerShell.

                              Something like D-BUS would be a reasonably solid building block. If the shell managed a private D-BUS namespace for the current session then you could write composed commands that run a bunch of programs with RPC endpoints in both directions and some policy for controlling how they go together. You could then have commands that exposed rich objects to each other.

                              I’d be quite curious to know what a shell based around the idea of exporting D-BUS objects would look like, where if you wrote foo | bar then it would invoke both with D-BUS handles in well-known locations for each, rather than file descriptors for stdin / stdout. For example, a shell might provide a /shell/pipeline/{uuid} namespace and set an environment variable with that UUID and another with the position in the pipeline for each process so that foo would expose something in /shell/pipeline/3c660423-ee16-11eb-b8cf-00155d687d03/1 and look for something exposed as /shell/pipeline/3c660423-ee16-11eb-b8cf-00155d687d03/2, expose the /0 and /3 things for input from and output to the terminal (and these could be a richer set of interfaces than a PTY, but might also provide the file descriptor to the PTY or file directly if desired). Each shell session might run its own D-BUS bus or hook into an existing one and the shell might also define builtins for defining objects from scripts or hooking in things from other places in a D-BUS system.

                            2. 4

                              Interestingly enough, OpenBSD added the -d option to du some years ago: https://cvsweb.openbsd.org/src/usr.bin/du/du.c?rev=1.26&content-type=text/x-cvsweb-markup

                              And the reason is to be compatible with other BSDs and Linux and so we’re back to the original topic ;)

                              1. 5

                                I think I submitted my patch to add -d in 2004 or 2005, so it took OpenBSD over a decade to actually add the feature. I was running a locally patched du for the entire time I was using OpenBSD. That interaction put me off contributing to OpenBSD - the reaction of ‘this feature is a waste of time, you can achieve the same thing with this complicated find expression’ and the hostility with which the message was delivered made me decide not to bother with writing any other code that would require that I interacted with the same people.

                                1. 3

                                  Sorry to hear that. In general I find it is extremely difficult to reject patches gracefully, even when you are trying to. It’s one of those things where text loses nuance that would do the job in real life, and so you have to be extra enthusiastic about it. I usually try to start on a positive statement ‘cause that’s what people will see first, something like “I love this, thanks for submitting it. But, there’s no way I can accept this because…”. If it’s a (smallish) bug fix rather than a feature addition I often try to accept it anyway, even if it kinda sucks, and then clean it up myself.

                                  I’m all for technical rigor, but if a project wants to attract contributions, it’s nice to have a reputation for being easy to work with. And it’s better for everyone if people are encouraged to submit patches without them fearing a nasty rejection. You never know when someone sending a patch is a 13 year old self-taught hacker with more enthusiasm than sense, just starting to play with getting involved in the world outside their own head.

                                  1. 2

                                    Interesting, found the original mailing list response where the “rejection” was given: https://marc.info/?l=openbsd-tech&m=115649277726459&w=2

                                    How frustrating to receive that response, “it is not being added as it is not part of POSIX”, since -d got added some years later, with the commit message explicitly acknowledging its omission from POSIX standards at the time. :\

                                    On the upside, looks like you had the right intentions all along so I send some kudos your way for trying :)

                                    1. 3

                                      How frustrating to receive that response, “it is not being added as it is not part of POSIX”, since -d got added some years later, with the commit message explicitly acknowledging its omission from POSIX standards at the time. :\

                                      To be fair, eight years is a long time and the project might have changed its mind about POSIX.

                                      I started this comment by writing that I was not sure the reasoning was inconsistent. David’s patch was rejected in part due to portability concerns. And schwarze’s commit message does mention compatibility with other BSDs and GNU as justification.

                                      But on the other hand, support for -d appeared in NetBSD around the same time as David’s patch and in FreeBSD even before that. Soooo… you’re probably right :-)

                                      1. 2

                                        haha, yeah I mean, it’s fair to stick to POSIX, but I guess in David’s case it was a matter of the person reviewing the submission being more stringent about that than the person/people who adopted the change later.

                                        Out of curiosity I checked the FreeBSD du.c commit history and found that the -d option was added in Oct 1996! Glad they kept the full commit history upon each transition to a new versioning technology (since I’ve definitely encountered projects where that history was lost). Ah well, anyway, that’s more than I expected to learn about the history of du this week! haha :)

                              2. 5

                                To me, the closest thing to the Unix philosophy today is the Go standard library. It has lots of functions with simple APIs that take io.Readers or io.Writers and lets you compose them into whatever particular tool you need. I guess this makes sense, since it’s from Rob Pike.

                                The thing about the Unix philosophy is it’s not just about “small” tools that do one thing. It’s about finding the right abstraction that covers a wider range of cases by being simpler. Power comes from having fewer features.

                                For example, file systems before Unix were more complicated and featureful. We see today that, for example, iOS makes files complicated in an old-school, mainframe-like way, where some data is in the Photos library, some is locked in a particular app, some needs iTunes to be extracted, and some is actually in the Files app. This might be the right choice in terms of creating a lot of small, seemingly simple interfaces instead of one actually simple interface that can do everything, but it makes it much harder to extend things in ways not envisioned by the software developers.

                                1. 3

                                  To me, the closest thing to the Unix philosophy today is the Go standard library. It has lots of functions with simple APIs that take io.Readers or io.Writers and lets you compose them into whatever particular tool you need. I guess this makes sense, since it’s from Rob Pike.

                                  Yeah, he mentioned this in “Less is Exponentially More”:

                                  Doug McIlroy, the eventual inventor of Unix pipes, wrote in 1964 (!):

                                  We should have some ways of coupling programs like garden hose–screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also.

                                  That is the way of Go also. Go takes that idea and pushes it very far. It is a language of composition and coupling.

                                  It’s fun to think of Go programs like that. Like, a website could be a bunch of http handler functions slotted into a mux, and an HTML template parser wired up to the handlers’ response writers. It’s nice to think like that in terms of packages too. My favorite packages are those that I can think of as a “part” that I can hook up to other parts.

                                  The thing about the Unix philosophy is it’s not just about “small” tools that do one thing. It’s about finding the right abstraction that covers a wider range of cases by being simpler. Power comes from having fewer features.

                                  Definitely. I think the “do one thing…” etc sayings kind of miss the point. And io.Writers and io.Readers are definitely that kind of abstraction. They have one method and can therefore easily cover a wide range of types. I think Go’s interfaces work that way in general too, since they grow more specific and “narrow” with more methods, and the number of types that implement them shrinks.

                              1. 3

                                Here are all the questions / bikesheds that popped in my mind as I took the tour!

                                1. Why an explicit setvar keyword?

                                2. :d in the first write JSON example is surprising… and never explained later

                                3. Note that the $ before the quote does not mean “interpolation”. It’s an unfortunate syntax collision.

                                  What prevents removing this unfortunate syntax collision?

                                4. How many other shell operators are there beyond :-?
                                  And why add them since Oil has a modern expression language?

                                  I guess I miss the ?? / ?: operator.

                                5. @ for splice feels weirdly Perl. JS and Python users would know ... (spread syntax) or * (unpacking operator)

                                6. The brace expansion mini-language undermines the promise of the word/command/expression languages being comprehensive.

                                7. Why proc over fun or func or function?

                                8. Why -> over .? Neither JS nor Python use ->. Does anything outside of C and C++?

                                9. Is the () expression form specific for while and if?

                                10. Why both ! and not?

                                11. Is the () form in case different than for while and if?

                                12. 🤔 What does if (not try myproc) do then?

                                13. Oh! There are functions! (I clicked on the link in the Ruby-like Blocks section.)

                                14. For compatibility, you can also use None, True, and False.

                                  Compatibility with what?

                                15. Why ~== instead of ~=, keeping all comparison operators two characters?

                                16. append and extend go away with [1] + [2] or [1, @[2]]

                                17. :d showed up again. My guess is that it’s a symbol (ala Ruby) and that json can reach into the scope.

                                  Why isn’t json a pair of functions? Or isn’t read --json ala read --qsn?

                                Wow, the Oil language is exciting! A cleaned up shell, sign me up!

                                Please add versioning to it now, before it’s too late. 🙏🏾

                                1. 2

                                  @ for splice feels weirdly Perl. JS and Python users would know … (spread syntax) or * (unpacking operator)

                                  It feels more Lisp than Perl to me. In Perl, @foo is just the list stored in variable foo, not that list spliced into any outer context. Within a Lisp quasiquotation, on the other hand, ,@foo means to splice in the list stored in foo.

                                  1. 1

                                    Great feedback! I started a FAQ here: https://github.com/oilshell/oil/wiki/Oil-Language-FAQ

                                    And here are some more answers.

                                    1. Good point, I will explain :d, but : is sort of a pseudo-sigil that means “this is a variable name that will be modified. Instead of shell’s read x, you do read :x which I think is more clear.

                                    2. I mentioned the $'' issue here: https://www.oilshell.org/release/0.9.0/doc/warts.html

                                    It’s bash syntax. I wanted to change it to c'\n', but that turned out to be hard to parse in command mode, and it also introduces complications with multiline strings. c'foo' already means something in shell. It might be possible to get rid of, but I feel like it’s a lot of complexity and explaining.

                                    1. Oil supports all the shell operators, but I felt :- is the one that’s still useful. I somewhat agree about ?? and that is on Github somewhere. But overall I do feel like there’s value in keeping OSH+Oil small, not just Oil small.

                                    2. The @ kind of comes from shell itself with "$@", and PowerShell / Perl. We can’t use * because that means glob, and ... is going to be used for multiline commands.

                                    3. Brace expansion and globs are both part of the word language, and we don’t change it. I don’t see the problem; they’re “time tested” at this point.

                                    4. proc means “procedure” or “process”. Added to the wiki.

                                    5. Added to the FAQ.

                                    6. Yes if (x) and while (x) are special variants of the shell if and while. There might be a for () later – it would make sense but isn’t strictly necessary now.

                                    7. ! inverts an exit code while not inverts a Python-like boolean. Added to the FAQ.

                                    8. Not sure exactly what this means, but the case syntax is quirky with ;; and (glob) basically because of shell legacy. I decided not to change it too much.

                                    9. This is a syntax error. Try it in Oil! I think you might need to get used to “command vs. expression mode”. For another analogy, ! is part of command mode, and not is part of expression mode.

                                    10. There should be functions, but I haven’t fully defined them yet! I think they will be built ON TOP of “procs”.

                                    11. True, False, and None are compatible with Python. I will update the doc. Oil’s expression syntax is largely derived from Python, with a few exceptions. I’m working on an “Oil vs Python” and “Oil vs Shell” doc.

                                    12. Hm ~= instead of ~== is possible, but I didn’t want it to get confused with Perl’s regex matching operator. The negated form !~= or ~!= is sort of an awkward problem though. It could change.

                                    13. Yeah there is some argument for this, but I think it’s OK to have one form that mutates and one form that creates a new object.

                                    14. Good question, I’ve debated read --json and write --json. That’s too long to answer here, but join me on Zulip and we can talk about it :) (link on home page)
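                                    A couple of the answers above can be sketched in plain bash, which OSH also accepts – a hedged illustration of the ideas, not Oil-specific syntax:

```shell
# $'' is ANSI-C quoting, inherited from bash (point 2 above)
s=$'one\ntwo'
printf '%s\n' "$s"    # prints "one" and "two" on separate lines

# ! inverts an exit status in command mode (point 7 above)
! false
echo $?               # prints 0, because ! flipped false's status of 1
```

                                    (Nitpick: $'' is bash/ksh, not POSIX sh, which is exactly the compatibility wart being discussed.)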

                                    What do you mean by versioning?

                                    1. 1

                                      Mate, cheers for the extensive explanations and follow-up! 🙇🏽‍♀️

                                      But overall I do feel like there’s value in keeping OSH+Oil small, not just Oil small.

                                      Oh! I think I misunderstood. Are Oil and OSH both able to be used in the same file?

                                      What do you mean by versioning?

                                      I mean a way to indicate what version of Oil is used in a file.

                                      shopt oil:2021 opts the file into the current dialect. When the inevitable major changes come, that leaves an out with shopt oil:2048.

                                      1. 1

                                        Yes good point! bash has shopt -s compat44 for compatibility with bash 4.4, etc.

                                        I wouldn’t add that for individual options, but I think adding it for the groups makes sense: https://github.com/oilshell/oil/issues/973


                                        Yes OSH and Oil are able to be used in the same file, but you’re not really supposed to. You could put shopt -- set oil:all halfway down the file if you want, and start using Oil syntax, but it’s better to keep it separated by file.

                                        They are really the same interpreter, parameterized by shopt options. That is what the “runtime” section of the doc is supposed to convey, but I can see why that is not super clear now.

                                        Great feedback, thanks!

                                  1. 38

                                    FWIW the motivation for this was apparently a comment on a thread about a review of the book “Software Engineering at Google” by Titus Winters, Tom Manshreck, and Hyrum Wright.

                                    https://lobste.rs/s/9n7aic/what_i_learned_from_software_engineering

                                    I meant to comment on that original thread, because I thought the question was misguided. Well now that I look it’s actually been deleted?

                                    Anyway the point is that the empirical question isn’t really actionable IMO. You could “answer it” and it still wouldn’t tell you what to do.

                                    I think you got this post exactly right – there’s no amount of empiricism that can help you. Software engineering has changed so much in the last 10 or 20 years that you can trivially invalidate any study.

                                    Yaron Minsky has a saying that “there’s no pile of sophomores high enough” that is going to prove anything about writing code. (Ironically he says that in advocacy of static typing, which I view as an extremely domain specific question.) Still I agree with his general point.


                                    This is not meant to be an insult, but when I see the names Titus Winters and Hyrum Wright, I’m less interested in the work. This is because I worked at Google for over a decade and got lots of refactoring and upgrade changelists/patches from them, as maintainer of various parts of the codebase. I think their work is extremely valuable, but it is fairly particular to Google, and in particular it’s done without domain knowledge. They are doing an extremely good job of doing what they can to improve the codebase without domain knowledge, which is inherent in their jobs, because they’re making company-wide changes.

                                    However most working engineers don’t improve code without domain knowledge, and the real improvements to code require such knowledge. You can only nibble at the edges otherwise.

                                    @peterbourgon said basically what I was going to say in the original thread – this advice is generally good in the abstract, but it lacks context.

                                    https://lobste.rs/s/9n7aic/what_i_learned_from_software_engineering

                                    The way I learned things at Google was to look at what people who “got things done” did. They generally “break the rules” a bit. They know what matters and what doesn’t matter.

                                    Jeff Dean and Sanjay Ghemawat indeed write great code and early in my career I exchanged a few CLs with them and learned a lot. I also referenced a blog post by Paul Buchheit in The Simplest Explanation of Oil.

                                    For those who don’t know, he was the creator of Gmail, working on it for 3 years as a side project (and Gmail was amazing back then, faster than desktop MS Outlook, even though it’s rotted now). He mentions in that post how he prototyped some ads with the aid of some Unix shell. (Again, ads are horrible now, a cancer on the web – back then they were useful and fast. Yes really. It’s hard to convey the difference to someone who wasn’t a web user then.)

                                    As a couple other anecdotes, I remember a coworker complaining that Guido van Rossum’s functions were too long. (Actually I somewhat agreed, but he did it in service of getting something done, and it can be fixed later.)

                                    I also remember Bram Moolenaar’s (author of Vim) Java readability review, where he basically broke all the rules and got angry at the system. (For a brief time I was one of the people who picked the Python readability reviewers, so I’m familiar with this style of engineering; I had to manage some disputes between reviewers and applicants.)

                                    So you have to take all these rules with a grain of salt. These people can obviously get things done, and they all do things a little differently. They don’t always write as many tests as you’d ideally like. One of the things I tried to do as the readability reviewer was to push back against dogma and get people to relax a bit. There is value to global consistency, but there’s also value to local domain-specific knowledge. My pushing back was not really successful and Google engineering has gotten more dogmatic and sclerotic over the years. It was not fun to write code there by the time I left (over 5 years ago).


                                    So basically I think you have to look at what people build and see how they do it. I would rather read a bunch of stories like “Coders at Work” or “Masterminds of Programming” than read any empirical study.

                                    I think there should be a name for this empirical fallacy (or it probably already exists?) Another area where science has roundly failed is nutrition and preventative medicine. Maybe not for the same exact reasons, but the point is that controlled experiments are only one way of obtaining knowledge, and not the best one for many domains. They’re probably better at what Taleb calls “negative knowledge” – i.e. disproving something, which is possible and valuable. Trying to figure out how to act in the world (how to create software) is less possible. All things being equal, more testing is better, but all things aren’t always equal.

                                    Oil is probably the most rigorously tested project I’ve ever worked on, but this is because of the nature of the project, and it isn’t right for all projects as a rule. It’s probably not good if you’re trying to launch a video game platform like Stadia, etc.

                                    1. 8

                                      Anyway the point is that the empirical question isn’t really actionable IMO. You could “answer it” and it still wouldn’t tell you what to do.

                                      I think you got this post exactly right – there’s no amount of empiricism that can help you.

                                      This was my exact reaction when I read the original question motivating Hillel’s post.

                                      I even want to take it a step further and say: Outside a specific context, the question doesn’t make sense. You won’t be able to measure it accurately, and even if you could, there would be such huge variance depending on other factors across teams where you measured it that your answer wouldn’t help you win any arguments.

                                      I think there should be a name for this empirical fallacy

                                      It seems especially to afflict the smart and educated. Having absorbed the lessons of science and the benefits of skepticism and self-doubt, you can ask of any claim “But is there a study proving it?”. It’s a powerful debate trick too. But it can often be a category error. The universe of useful knowledge is much larger than the subset that has been (or can be) tested with a random double blind study.

                                      1. 5

                                        I even want to take it a step further and say: Outside a specific context, the question doesn’t make sense. You won’t be able to measure it accurately, and even if you could, there would be such huge variance depending on other factors across teams where you measured it that your answer wouldn’t help you win any arguments.

                                        It makes a lot of sense to me in my context, which is trying to convince skeptical managers that they should pay for my consulting services. But it’s intended to be used in conjunction with rhetoric, demos, case studies, testimonials, etc.

                                        It seems especially to afflict the smart and educated. Having absorbed the lessons of science and the benefits of skepticism and self-doubt, you can ask of any claim “But is there a study proving it?”. It’s a powerful debate trick too. But it can often be a category error. The universe of useful knowledge is much larger than the subset that has been (or can be) tested with a random double blind study.

                                        I’d say in principle it’s Scientism, in practice it’s often an intentional sabotaging tactic.

                                        1. 1

                                          It makes a lot of sense to me in my context, which is trying to convince skeptical managers that they should pay for my consulting services. But it’s intended to be used in conjunction with rhetoric, demos, case studies, testimonials, etc.

                                          100%.

                                          I should have said: I don’t think it would help you win any arguments with someone knowledgeable. I completely agree that in the real world, where people are making decisions off rough heuristics and politics is everything, this kind of evidence could be persuasive.

                                          So a study showing that “catching bugs early saves money” functions here like a white lab coat on a doctor: it makes everyone feel safer. But what’s really happening is that they are just trusting that the doctor knows what he’s doing. Imo the other methods for establishing trust you mentioned – rhetoric, demos, case studies, testimonials, etc. – imprecise as they are, are probably more reliable signals.

                                          EDIT: Also, just to be clear, I think the right answer here, the majority of the time, is “well obviously it’s better to catch bugs early than later.”

                                          1. 2

                                            the majority of the time

                                            And in which cases is this false? Is it when the team has lots of senior engineers? Is it when the team controls both the software and the hardware? Is it when OTA updates are trivial? (Here is a knock-on effect: what if OTA updates make this assertion false, but then open up a huge can of security vulnerabilities, which overall negates any benefit that the OTA updates add?) What does a majority here mean? I mean, a majority of 55% means something very different from a majority of 99%.

                                            This is the value of empirical software study. Adding precision to assertions (such as understanding that a 55% majority is a bit pathological but a 99% majority certainly isn’t.) Diving into data and being able to understand and explore trends is also another benefit. Humans are motivated to categorize their experiences around questions they wish to answer but it’s much harder to answer questions that the human hasn’t posed yet. What if it turns out that catching bugs early or late is pretty much immaterial where the real defect rate is simply a function of experience and seniority?

                                            1. 1

                                              This is the value of empirical software study.

                                              I think empirical software study is great, and has tons of benefits. I just don’t think you can answer all questions of interest with it. The bugs question we’re discussing is one of those.

                                              And in which cases is this false? Is it when the team has lots of senior engineers? Is it when the team controls both the software and the hardware? Is it when OTA updates are trivial? (Here is a knock-on effect: what if OTA updates make this assertion false, but then open up a huge can of security vulnerabilities, which overall negates any benefit that the OTA updates add?)

                                              I mean, this is my point. There are too many factors to consider. I could add 50 more points to your bullet list.

                                              What does a majority here mean?

                                              Something like: “I find it almost impossible to think of examples from my personal experience, but understand the limits of my experience, and can imagine situations where it’s not true.” I think if it is true, it would often indicate a dysfunctional code base where validating changes out of production (via tests or other means) was incredibly expensive.

                                              What if it turns out that catching bugs early or late is pretty much immaterial where the real defect rate is simply a function of experience and seniority?

                                              One of my points is that there is no “turns out”. If you prove it one place, it won’t translate to another. It’s hard even to imagine an experimental design whose results I would give much weight to. All I can offer is my opinion that this strikes me as highly unlikely for most businesses.

                                              1. 4

                                                Why is software engineering such an outlier when we’ve been able to measure so many other things? We can measure vaccine efficacy and health outcomes (among disparate populations with different genetics, diets, culture, and life experiences), we can measure minerals in soil, we can analyze diets, heat transfer, we can even study government policy, elections, and even personality (though it’s messy). What makes software engineering so much more complex and context dependent than even a person’s personality?

                                                The fallacy I see here is simply that software engineers see this massive complexity in software engineering because they are software experts and believe that other fields are simpler because software engineers are not experts in those fields. Every field has huge amounts of complexity, but what gives us confidence that software engineering is so much more complex than other fields?

                                                1. 3

                                                  Why is software engineering such an outlier when we’ve been able to measure so many other things?

                                                  You can measure some things, just not all. Remember the point of discussion here is: Can you empirically investigate the claim “Finding bugs earlier saves overall time and money”? My position is basically: “This is an ill-defined question to ask at a general level.”

                                                  We can measure vaccine efficacy and health outcomes (among disparate populations with different genetics, diets, culture, and life experiences)

                                                  Yes.

                                                  we can measure minerals in soil, we can analyze diets, heat transfer,

                                                  Yes.

                                                  we can even study government policy

                                                  In some way yes, in some ways no. This is a complex situation with tons of confounds, and also a place where policy outcomes in some places won’t translate to other places. This is probably a good analog for what makes the question at hand difficult.

                                                  and even personality

                                                  Again, in some ways yes, in some ways no. With the big 5, you’re using the power of statistical aggregation to cut through things we can’t answer. Of which there are many. The empirical literature on “code review being generally helpful” seems to have a similar force. You can take disparate measures of quality, disparate studies, and aggregate to arrive at relatively reliable conclusions. It helps that we have an obvious, common sense causal theory that makes it plausible.
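                                                  As a concrete sketch of what that aggregation looks like mechanically, here is inverse-variance weighting, the standard meta-analysis estimator – all effect sizes and variances below are made up purely for illustration:

```shell
# Pool three hypothetical study results: precise (low-variance) studies
# get proportionally more weight in the combined estimate.
awk 'BEGIN {
  e[1] = 0.2; v[1] = 0.04    # effect size, variance (made-up numbers)
  e[2] = 0.5; v[2] = 0.09
  e[3] = 0.3; v[3] = 0.01
  for (i = 1; i <= 3; i++) { w = 1/v[i]; sw += w; swe += w*e[i] }
  printf "pooled effect: %.3f\n", swe/sw
}'
```

                                                  The low-variance study dominates, pulling the pooled estimate toward its 0.3 – which is the whole point: one good study outweighs several noisy ones.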

                                                  What makes software engineering so much more complex and context dependent than even a person’s personality?

                                                  I don’t think it is.

                                                  Every field has huge amounts of complexity, but what gives us confidence that software engineering is so much more complex than other fields?

                                                  I don’t think it is, and this is not where my argument is coming from. There are many questions in other fields equally unsuited to empirical investigation as: “Does finding bugs earlier save time and money?”

                                                  1. 2

                                                    In some way yes, in some ways no. This is a complex situation with tons of confounds, and also a place where policy outcomes in some places won’t translate to other places. This is probably a good analog for what makes the question at hand difficult.

                                                    That hasn’t stopped anyone from performing the analysis and using these analyses to implement policy. That the analysis of this data is imperfect is beside the point; it still provides some amount of positive value. Software is in the data dark ages in comparison to government policy; what data-driven decision has been made among software engineering teams? I don’t think we even understand whether Waterfall or Agile reduces defect rates or time to ship compared to the other.

                                                    With the big 5, you’re using the power of statistical aggregation to cut through things we can’t answer. Of which there are many. The empirical literature on “code review being generally helpful” seems to have a similar force. You can take disparate measures of quality, disparate studies, and aggregate to arrive at relatively reliable conclusions. It helps that we have an obvious, common sense causal theory that makes it plausible.

                                                    What’s stopping us from doing this with software engineering? Is it the lack of a causal theory? There are techniques to try to glean causality from statistical models. Is this not in line with your definition of “empirically”?

                                                    1. 5

                                                      That hasn’t stopped anyone from performing the analysis and using these analyses to implement policy. That analysis of this data is imperfect is beside the point; it still provides some amount of positive value.

                                                      It’s not clear to me at all that, as a whole, “empirically driven” policy has had positive value? You can point to successful cases and disasters alike. I think in practice the “science” here is at least as often used as a veneer to push through an agenda as it is to implement objectively more effective policy. Just as software methodologies are.

                                                      Is it the lack of a causal theory?

                                                      I was saying there is a causal theory for why code review is effective.

                                                      What’s stopping us from doing this with software engineering?

                                                      Again, some parts of it can be studied empirically, and should be. I’m happy to see advances there. But I don’t see the whole thing being tamed by science. The high-order bits in most situations are politics and other human stuff. You mentioned it being young… but here’s an analogy that might help with where I’m coming from. Teaching writing, especially creative writing. It’s equally ad-hoc and unscientific, despite being old. MFA programs use different methodologies and writers subscribe to different philosophies. There is some broad consensus about general things that mostly work and that most people do (workshops), but even within that there’s a lot of variation. And great books are written by people with wildly different approaches. There are some nice efforts to leverage empiricism like Steven Pinker’s book and even software like https://hemingwayapp.com/, but systematization can only go so far.

                                                  2. 2

                                                    We can measure vaccine efficacy and health outcomes (among disparate populations with different genetics, diets, culture, and life experiences)

                                                    Good vaccine studies are pretty expensive from what I know, but they have statistical power for that reason.

                                                    Health studies are all over the map. The “pile of college sophomores” problem very much applies there as well. There are tons of studies done on Caucasians that simply don’t apply in the same way to Asians or Africans, yet some doctors use that knowledge to treat patients.

                                                    Good doctors will use local knowledge and rules of thumb, and they don’t believe every published study they see. That would honestly be impossible, as lots of them are in direct contradiction to each other. (Contradiction is a problem that science shares with apprenticeship from experts; for example IIRC we don’t even know if a high fat diet causes heart disease, which was accepted wisdom for a long time.)

                                                    https://www.nytimes.com/2016/09/13/well/eat/how-the-sugar-industry-shifted-blame-to-fat.html

                                                    I would recommend reading some books by Nassim Taleb if you want to understand the limits of acquiring knowledge through measurement and statistics (Black Swan, Antifragile, etc.). Here is one comment I made about them recently: https://news.ycombinator.com/item?id=27213384

                                                    Key point: acting in the world, i.e. decision making under risk, is fundamentally different from scientific knowledge. Tinkering and experimentation are what drive real changes in the world, not planning by academics. He calls the latter “the Soviet-Harvard school”.

                                                    The books are not well organized, but he hammers home the difference between acting in the world and knowledge over and over in many different ways. If you have to have scientific knowledge before acting, you will be extremely limited in what you can do. You will probably lose all your money in the markets too :)


                                                    Update: after Googling the term I found in my notes, I’d say “Soviet-Harvard delusion” captures the crux of the argument here. One short definition is: the (unscientific) overestimation of the reach of scientific knowledge.

                                                    https://www.grahammann.net/book-notes/antifragile-nassim-nicholas-taleb

                                                    https://medium.com/the-many/the-right-way-to-be-wrong-bc1199dbc667

                                                    https://taylorpearson.me/antifragile-book-notes/

                                                    1. 2

                                                      This sounds like empiricism. Not in the sense of “we can only know what we can measure” but in the sense of “I can only know what I can experience”. The Royal Society’s motto is “take nobody’s word for it”.

                                                      Tinkering and experimentation are what drive real changes in the world, not planning by academics.

                                                      I 100% agree but it’s not the whole picture. You need theory to compress and see further. It’s the back and forth between theory and experimentation that drives knowledge. Tinkering alone often ossifies into ritual. In programming, this has already happened.

                                                      1. 1

                                                        I agree about the back and forth, of course.

                                                        I wouldn’t agree programming has ossified into ritual. Certainly it has at Google, which has a rigid coding style, toolchain, and set of languages – and it’s probably worse at other large companies.

                                                        But I see lots of people on this site doing different things, e.g. running OpenBSD and weird hardware, weird programming languages, etc. There are also tons of smaller newer companies using different languages. Lots of enthusiasm around Rust, Zig, etc. and a notable amount of production use.

                                                        1. 1

                                                          My bad, I didn’t mean all programming has become ritual. I meant that we’ve seen instances of it.

                                                      2. 1

                                                        Good vaccine studies are pretty expensive from what I know, but they have statistical power for that reason.

                                                        Oh sure, I’m not saying this will be cheap. In fact the price of collecting good data is what I feel makes this research so difficult.

                                                        Health studies are all over the map. The “pile of college sophomores” problem very much applies there as well. There are tons of studies done on Caucasians that simply don’t apply in the same way to Asians or Africans, yet some doctors use that knowledge to treat patients.

                                                        We’ve developed techniques to deal with these issues, though of course, you can’t draw a conclusion with extremely low sample sizes. One technique used frequently to compensate for low statistical power studies in meta studies is called Post-Stratification.
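                                                        For anyone unfamiliar, post-stratification just reweights stratum-level estimates by known population shares, so an unrepresentative sample stops biasing the overall estimate; a toy sketch with made-up numbers:

```shell
# The sample may over-represent one stratum; weight each stratum's
# sample mean by its known share of the population instead.
awk 'BEGIN {
  share["A"] = 0.3; mean["A"] = 10    # population share, sample mean
  share["B"] = 0.7; mean["B"] = 20    # (all numbers made up)
  for (s in share) est += share[s] * mean[s]
  printf "%.1f\n", est                # 0.3*10 + 0.7*20 = 17.0
}'
```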

                                                        Good doctors will use local knowledge and rules of thumb, and they don’t believe every published study they see. That would honestly be impossible, as lots of them are in direct contradiction to each other. (Contradiction is a problem that science shares with apprenticeship from experts; for example IIRC we don’t even know if a high fat diet causes heart disease, which was accepted wisdom for a long time.)

                                                        I think medicine is a good example of empiricism done right. Sure, we can look at modern failures of medicine and nutrition and use these learnings to do better, but medicine is significantly more empirical than software. I still maintain that if we can systematize our understanding of the human body and medicine that we can do the same for software, though like a soft science, definitive answers may stay elusive. Much work over decades went into the medical sciences to define what it even means to have an illness, to feel pain, to see recovery, or to combat an illness.

                                                        I would recommend reading some books by Nassim Taleb if you want to understand the limits of acquiring knowledge through measurement and statistics (Black Swan, Antifragile, etc.). Here is one comment I made about them recently: https://news.ycombinator.com/item?id=27213384

                                                        Key point: acting in the world, i.e. decision making under risk, is fundamentally different from scientific knowledge. Tinkering and experimentation are what drive real changes in the world, not planning by academics. He calls the latter “the Soviet-Harvard school”.

                                                        I’m very familiar with Taleb’s Antifragile thesis and the “Soviet-Harvard delusion”. As someone well versed in statistics, these are theses that are both pedestrian (Antifragile itself being a pop-science look into a field of study called Extreme Value Theory) and old (Maximum Likelihood approaches to decision theory are susceptible to extreme/tail events which is why in recent years Bayesian and Bayesian Causal analyses have become more popular. Pearson was aware of this weakness and explored other branches of statistics such as Fiducial Inference). (Also I don’t mean this as criticism toward you, though it’s hard to make this tone come across over text. I apologize if it felt offensive, I merely wish to draw your eyes to more recent developments.)

                                                        To draw the discussion to a close, I’ll try to summarize my position a bit. I don’t think software empiricism will answer all the questions, nor will we get to a point where we can rigorously determine that some function f exists that can model our preferences. However I do think software empiricism together with standardization can offer us a way to confidently produce low-risk, low-defect software. I think modern statistical advances have offered us ways to understand more than the statistical approaches of the ‘70s, and that we can use many of the newer techniques from the social and medical sciences (e.g. Bayesian methods) to prove results. I don’t think, even if we start a concerted approach today, that our understanding will get there in a matter of a few years. To do that would be to undo decades of software practitioners creating systemic analyses from their own experiences, and to create a culture shift away from the individual as artisan toward a culture of standardization, both of communication of results (what is a bug? how does it affect my code? how long did it take to find? how long did it take to resolve? etc.) and of team conditions (our team has n engineers, our engineers have x years of experience, etc.) that we just don’t have now. I have hope that eventually we will begin to both standardize and understand our industry better, but in the near term this will be difficult.

                                            2. 4

                                              Here’s a published paper that purposefully illustrates the point you’re trying to make: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC300808/. It’s an entertaining read.

                                              1. 1

                                                Yup I remember that from debates on whether to wear masks or not! :) It’s a nice pithy illustration of the problem.

                                              2. 2

                                                Actually I found a (condescending but funny/memorable) name for the fallacy – the “Soviet-Harvard delusion” :)

                                                An (unscientific) overestimation of the reach of scientific knowledge.

                                                I found it in my personal wiki, in 2012 notes on the book Antifragile.

                                                Original comment: https://lobste.rs/s/v4unx3/i_ing_hate_science#c_nrdasq

                                              3. 3

                                                I’m reading a book right now about 17th century science. The author has some stuff to say about Bacon and Empiricism but I’ll borrow an anecdote from the book. Boyle did an experiment where he grew a pumpkin and measured the dirt before and after. The weight of the dirt hadn’t changed much. The only other ingredient that had been added was water. It was obvious that the pumpkin must be made of only water.

                                                This idea that measurement and observation drive knowledge is Bacon’s legacy. Even in Bacon’s own lifetime, it’s not how science unfolded.

                                                1. 2

                                                  Fun fact: Bacon is often considered the modern founder of the idea that knowledge can be used to create human-directed progress. Before him, while scholars and astronomers studied and invented things, most cultures still viewed life and nature as a generally haphazard process. As with most things in history, the reality involves more than just Bacon, and there most certainly were non-Westerners who had similar ideas, but Bacon still figures prominently in the picture.

                                                  1. 1

                                                    Hm interesting anecdote that I didn’t know about (I looked it up). Although I’d say that’s more an error of reasoning within science? I realized what I was getting at could be called the Soviet-Harvard delusion, which is overstating the reach of scientific knowledge (no insult intended, but it is a funny and memorable name): https://lobste.rs/s/v4unx3/i_ing_hate_science#c_nrdasq

                                                    1. 1

                                                      To be fair, the vast majority of the mass of the pumpkin is water. So the inference was correct to first order. The second-order correction of “and carbon from the air”, of course, requires being much more careful in the inference step.

                                                    2. 2

                                                      So basically I think you have to look at what people build and see how they do it. I would rather read a bunch of stories like “Coders at Work” or “Masterminds of Programming” than read any empirical study.

                                                      Perhaps, but this is already what happens, and I think it’s about time we in the profession raise our standards, both of pedagogy and of practice. Right now you can take a casual search on the Web and find respected talking heads talking about how their philosophy is correct, despite being in direct contrast to another person’s philosophy. This behavior is reinforced by the culture wars of our times, of course, but there’s still much more aimless discourse than there is consistency in results. If we want to start taking steps to improve our practice, I think it’s important to understand what we’re doing right and more importantly what we’re doing wrong. I’m more interested here in negative results than positive results. I want to know where as a discipline software engineering is going wrong. There’s also a lot at stake here purely monetarily; corporations often embrace a technology methodology and pay for PR and marketing about their methodology to both bolster their reputations and to try to attract engineers.

                                                      I think there should be a name for this empirical fallacy (or it probably already exists?) Another area where science has roundly failed is nutrition and preventative medicine.

                                                      I don’t think we’re even at the point in our empirical understanding of software engineering where we can make this fallacy. What do we even definitively understand about our field? I’d argue that psychology and sociology have stronger well-known results than what we have in software engineering even though those are very obviously soft sciences. I also think software engineers are motivated to think the problem is complex and impossible to be empirical for the same reason that anyone holds their work in high esteem; we believe our work is complicated and requires highly contextual expertise to understand. However if psychology and sociology can make empirical progress in their fields, I think software engineers most definitely can.

                                                      1. 2

                                                        Do you have an example in mind of the direct contradiction? I don’t see much of a problem if different experts have different opinions. That just means they were building different things and different strategies apply.

                                                        Again I say it’s good to “look at what people build” and see if it applies to your situation; not blindly follow advice from authorities (e.g. some study “proved” this, or some guy from Google who may or may not have built things said this was good; therefore it must be good).

                                                        I don’t find a huge amount of divergence in the opinions of people who actually build stuff, vs. talking heads. If you look at what John Carmack says about software engineering, it’s generally pretty level-headed, and he explains it well. It’s not going to differ that much from what Jeff Dean says. If you look at their C++ code, there are even similarities, despite drastically different domains.

                                                        Again the fallacy is that there’s a single “correct” – it depends on the domain; a little diversity is a good thing.

                                                        1. 4

                                                          Do you have an example in mind of the direct contradiction? I don’t see much of a problem if different experts have different opinions. That just means they were building different things and different strategies apply.

                                                          Here are two fun ones I like to contrast: The Unreasonable Effectiveness of Dynamic Typing for Practical Programs (Vimeo) and The advantages of static typing, simply stated. Two separate authors who came to different conclusions from similar evidence. While yes their lived experience is undoubtedly different, these are folks who are espousing (mostly, not completely) contradictory viewpoints.

                                                          I don’t find a huge amount of divergence in the opinions of people who actually build stuff, vs. talking heads. If you look at what John Carmack says about software engineering, it’s generally pretty level-headed, and he explains it well. It’s not going to differ that much from what Jeff Dean says. If you look at their C++ code, there are even similarities, despite drastically different domains.

                                                          Who builds things though? Many people build things. While we hear about John Carmack and Jeff Dean, there are folks plugging away at the Linux kernel, on io_uring, on capability object systems, and all sorts of things that many of us will never be aware of. As an example, Sanjay Ghemawat is someone who I wasn’t familiar with until you talked about him. I’ve also interacted with folks in my career who I presume you’ve never interacted with and yet have been an invaluable source of learnings for my own code. Moreover these experience reports are biased by their reputations; I mean of course we’re more likely to listen to John Carmack than some Vijay Foo (not a real person, as far as I’m aware) because he’s known for his work at id Software, even if this Vijay Foo may end up having as many or more actionable insights than John Carmack. Overcoming reputation bias and lack of information about “builders” is another side effect I see of empirical research. Aggregating learnings across individuals can help surface lessons that otherwise would have been lost due to structural issues of acclaim and money.

                                                          Again the fallacy is that there’s a single “correct” – it depends on the domain; a little diversity is a good thing.

                                                          This seems to be a sentiment I’ve read elsewhere, so I want to emphasize: I don’t think there’s anything wrong with diversity and I don’t think Empirical Software Engineering does anything to diversity. Creating complicated probabilistic models of spaces necessarily involves many factors. We can create a probability space which has all of the features we care about. Just condition against your “domain” (e.g. kernel work, distributed systems, etc) and slot your result into that domain. I don’t doubt that a truly descriptive probability space will be very high dimensional here but I’m confident we have the analytical and computational power to perform this work nonetheless.

                                                          The real challenge I suspect will be to gather the data. FOSS developers are time and money strapped as it is, and excluding some exceptional cases such as curl’s codebase statistics, they’re rarely going to have the time to take the detailed notes it would take to drive this research forward. Corporations which develop proprietary software have almost no incentive to release this data to the general public given how much it could expose about their internal organizational structure and coding practices, so rather than open themselves up to scrutiny they keep the data internal if they measure it at all. Combating this will be a tough problem.

                                                          1. 2

                                                            Yeah I don’t see any conflict there (and I’ve watched the first one before). I use both static and dynamic languages and there are advantages and disadvantages to each. I think any programmer should be comfortable using both styles.

                                                            I think that the notion that a study is going to change anyone’s mind is silly, like “I am very productive in statically typed languages. But a study said that they are not more productive; therefore I will switch to dynamically typed”. That is very silly.

                                                            It’s also not a question that’s ever actionable in reality. Nobody says “Should I use a static or dynamic language for this project?” More likely you are working on an existing codebase, OR you have a choice between say Python and Go. The difference between Python and Go would be a more interesting and accurate study, not static vs. dynamic. But you can’t do an “all pairs” comparison via scientific studies.

                                                            If there WERE a study definitively proving that, say, dynamic languages are “better” (whatever that means), and you chose Python over Go for that reason, that would be a huge mistake. It’s just not enough evidence; the languages are different for other reasons.

                                                            I think there is value to scientific studies on software engineering, but I think the field just moves very fast, and if you wait for science, you’ll be missing out on a lot of stuff. I try things based on what people who get things done do (e.g. OCaml), and incorporate it into my own work, and that seems like a good way of obtaining knowledge.

                                                            Likewise, I think “Is catching bugs earlier less expensive” is a pretty bad question. A better scientific question might be “is unit testing in Python more effective than integration testing Python with shell” or something like that. Even that’s sort of silly because the answer is “both”.

                                                            But my point is that these vague and general questions simply leave out a lot of subtlety of any particular situation, and can’t be answered in any useful way.

                                                            1. 2

                                                              I think that the notion that a study is going to change anyone’s mind is silly, like “I am very productive in statically typed languages. But a study said that they are not more productive; therefore I will switch to dynamically typed”. That is very silly.

                                                              While the example of static and dynamic typing is probably too broad to be meaningful, I don’t actually think this would be true. It’s a bit like saying “Well I believe that Python is the best language and even though research shows that Go has properties <x, y, and z> that are beneficial to my problem domain, well I’m going to ignore them and put a huge prior on my past experience.” It’s the state of the art right now; trust your gut and the guts of those you respect, not the other guts. If we can’t progress from here I would indeed be sad.

                                                              It’s also not a question that’s ever actionable in reality. Nobody says “Should I use a static or dynamic language for this project?” More likely you are working on an existing codebase, OR you have a choice between say Python and Go. The difference between Python and Go would be a more interesting and accurate study, not static vs. dynamic. But you can’t do an “all pairs” comparison via scientific studies.

                                                              Sure, as you say, static vs dynamic languages isn’t very actionable but Python vs Go would be. And if I’m starting a new codebase, a new project, or a new company, it might be meaningful to have research that shows that, say, Python has a higher defect rate but an overall lower mean time to resolution of these defects. Prior experience with Go may trump benefits that Python has (in this synthetic example) if project time horizons are short, but if time horizons are long Go (again in the synthetic example) might look better. I think this sort of comparative analysis in defect rates, mean time to resolution, defect severity, and other attributes can be very useful.

                                                              Personally, I’m not satisfied by the state of the art of looking at builders. I think the industry really needs a more rigorous look at its assumptions and even if we never truly systematize and Fordify the field (which fwiw I don’t think is possible), I certainly think there’s a lot of progress for us to make yet and many pedestrian questions that we can answer that have no answers yet.

                                                              1. 2

                                                                Sure, as you say, static vs dynamic languages isn’t very actionable but Python vs Go would be. And if I’m starting a new codebase, a new project, or a new company, it might be meaningful to have research that shows that, say, Python has a higher defect rate but an overall lower mean time to resolution of these defects.

                                                                Python vs Go defect rates also seem to me to be far too general for an empirical study to produce actionable data.

                                                                How do you quantify a “defect rate” in a way that’s relevant to my problem, for example? There are a ton of confounds: genre of software, timescale of development, size of team, composition of team, goals of the project, etc. How do I know that some empirical study comparing defect rates of Python vs. Go in, I dunno, the giant Google monorepo, is applicable to my context? Let’s say I’m trying to pick a language to write some AI research software, which will have a 2-person team, no monorepo or formalized code-review processes, a target 1-year timeframe to completion, and a primary metric of producing figures for a paper. Why would I expect the Google study to produce valid data for my decision-making?

                                                              2. 1

                                                                Nobody says “Should I use a static or dynamic language for this project?”

                                                                Somebody does. Somebody writes the first code on a new project and chooses the language. Somebody sets the corporate policy on permissible languages. It would be amazing if even a tiny input to these choices were real instead of just perceived popularity and personal familiarity.

                                                        2. 2

                                                          I meant to comment on that original thread, because I thought the question was misguided. Well now that I look it’s actually been deleted?

                                                          Too many downvotes this month. ¯\_(ツ)_/¯

                                                          1. 1

                                                            This situation is not ideal :(

                                                        1. 2

                                                          I made another pass over this. There are a few TODOs but it lists almost everything.

                                                          Let me know what you think! The language is still open to suggestion, especially from people who try it on real use cases. Actually almost everybody who has tried it has ended up influencing the language in some way (e.g. Till Red, Raphael Megzari recently)

                                                          1. 3

                                                            It was nice to read. While Oil often appears on this site, I didn’t really understand what the language is trying to do until now. It takes the conveniences of shell programming (not having to escape every string every time) and adds JSON data structures on top.

                                                            1. 2

                                                              Yes that’s one way to describe it! JSON is an almost universal interchange format, and a shell that has JSON support should also have JS/Python-like data types and expressions to complement it.

                                                              I have a draft of a blog post called “Essential Features of the Oil Language” which describes these as the four main features:

                                                              1. Python-like data types and expressions
                                                              2. Egg Expressions (Oil Regexes)
                                                              3. Abstraction With Shell-like procs and Ruby-like blocks
                                                              4. Interchange formats like JSON and QTT (quoted, typed tables)

                                                              https://oilshell.zulipchat.com/#narrow/stream/266575-blog-ideas/topic/Five.20Essential.20Features.20of.20the.20Oil.20Language

                                                              On the compatible side, we have Four Features of the OSH Language:

                                                              1. Reliable error handling (errexit)
                                                              2. Safe handling of user-supplied, untrusted data (QSN, etc.)
                                                              3. Eliminate quoting hell (arrays and splicing, mentioned in the doc)
                                                              4. Static Parsing for better error messages and tooling

                                                              So I’d say those 8 things are a good summary of what Oil’s trying to do! It’s taking a while, but I don’t see anyone else doing a graceful upgrade of shell.

                                                              Thanks for the feedback!

                                                              1. 1

                                                                There is https://www.nushell.sh/. It doesn’t try to be a scripting language, but it has similar goals as a shell. The main difference I would say is that they extend the pipe to support types, instead of splitting commands and functions as Oil does. Both have pros and cons.

                                                                1. 1

                                                                  Yup I’ve looked at almost all alternative shells and made a huge list of them! https://github.com/oilshell/oil/wiki/Alternative-Shells

                                                                  I would like to add something like:

                                                                  myls -q | where [size > 10]  # structured data that's then filtered by 'where' builtin
                                                                  

                                                                  but it’s a little far in the future. My thinking is that myproc [a, b] is actually a lazy argument list, and I can parse it with shopt --set parse_bracket (analogous to how we use shopt --set parse_paren parse_brace to make if (x) { ... } work).

                                                                  That is all hidden under bin/oil, so you don’t need to remember the names. But that is the underlying mechanism for changing the syntax, and gradually migrating OSH to Oil.

                                                          1. 2

                                                            This was very helpful for me, as someone who’s taken a look at osh/oil from time to time, but never gone that deep. A few observations/requests to say certain things explicitly.

                                                            In the list section, you mention arrays and lists. Are they different? Are dict keys always strings? Can they be quoted strings with underscores or meta chars? Are expressions always in parens?

                                                            Structurally, I wonder if it might make sense to move some examples to the top to give people more of a feel, before talking about words, and other things that might matter less to someone who lands on the page without familiarity with the project.

                                                            1. 2

                                                              Thanks, this is great feedback, I will update the doc. (Although some of it will have to go on linked docs to keep the length down.)

                                                              • Yes good point array vs. list is confusing. Oil is like Python – it only has lists. I sometimes use “array” to mean “list of strings” but I think I should settle on one term. I want to keep consistency with Python’s terminology, but “list” is sort of a bad name, since it reminds people of linked list? JavaScript uses “array” so maybe I should adopt that terminology.
                                                                • JS: array and “object” (object isn’t good)
                                                                • Python: list and dict
                                                                • Oil: array and dict? Or maybe array and map? Not sure if that’s a good idea.
                                                              • The dict keys have to be strings, unlike Python. The rule is the same as in JavaScript, except with the -> operator instead of . (dot)
                                                                • d->key is the same as d['key']
                                                                • If you have special chars, do d['foo+bar'], since d->foo+bar is parsed like addition.
                                                              • Expressions aren’t always in parens
                                                              • Yes, good point about examples. I got other feedback that the “word/command/expression” stuff was too abstract, so I will try to reduce / de-emphasize it. I think it makes sense for the END of the doc rather than the beginning.

                                                              I will update the doc, but let me know if you have more feedback!

                                                            1. 2

                                                              How large is the data in uncompressed CSV format? How long does it take to load with the sqlite CLI?

                                                              1. 4

                                                                I can answer my own question: the 100,000,000 rows sqlite3_opt.py produced when I ran it created a 1.5 GiB SQLite3 file, and a 1.8 GiB CSV file when I dumped it out. If you just wrote zeros to a file on NVMe-backed storage, the above files could be generated in 0.5 - 0.8 seconds. So 1B rows, or ~15 GiB in SQLite format, could be written in 8 seconds if it was being copied from another NVMe-backed storage device. That gives another 52 seconds for all the overhead of generating random data and writing it out in SQLite3’s internal layout.
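                                                                The back-of-envelope arithmetic above (numbers taken from this comment, with the 1-minute target from the original challenge) can be sketched as:

                                                                ```shell
                                                                awk 'BEGIN {
                                                                  gib_100m = 1.5            # SQLite file size for 100M rows
                                                                  gib_1b   = gib_100m * 10  # ~15 GiB for 1B rows
                                                                  copy_s   = 8              # NVMe-to-NVMe copy time for ~15 GiB
                                                                  print gib_1b, 60 - copy_s # GiB, and seconds left for data generation
                                                                }'
                                                                # prints: 15 52
                                                                ```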

                                                                1. 3

                                                                  So do you estimate it will be faster than the fastest Rust version? That was 100 M rows in 33 seconds.

                                                                  I’d expect you could do .import csv of a 1.5 GB CSV file on a modern machine in less than 30 seconds?

                                                                  Also this architecture is parallelizable over 2 cores – one generating CSV, and one sqlite process writing the data.

                                                                  My instinct was that the “shell version” of this program would be the fastest :)
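                                                                  Concretely, the 2-process pipeline might look something like this (a sketch only: gen_rows is a stand-in for the real data generator, and the 3-column schema is invented):

                                                                  ```shell
                                                                  # One process generates CSV on stdout; a separate sqlite3 process imports it.
                                                                  gen_rows() {
                                                                    # stand-in for the real generator
                                                                    for i in 1 2 3; do
                                                                      printf '%s,area%s,%s\n' "$i" "$i" $((i * 10))
                                                                    done
                                                                  }
                                                                  rm -f import.db
                                                                  sqlite3 import.db 'CREATE TABLE user (id INTEGER, area TEXT, age INTEGER);'
                                                                  gen_rows | sqlite3 -cmd '.mode csv' import.db '.import /dev/stdin user'
                                                                  sqlite3 import.db 'SELECT COUNT(*) FROM user;'   # prints 3
                                                                  ```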

                                                                  1. 3

                                                                    Without the overhead of generating the dataset, loading the CSV version into SQLite3 via a single import command took 3m5s on my 2020 MBP with the DB on an SSD. This works out to about 10 MiB/s.

                                                                    $ cat load.sql 
                                                                    PRAGMA journal_mode = OFF;
                                                                    PRAGMA synchronous = 0;
                                                                    PRAGMA cache_size = 1000000;
                                                                    PRAGMA locking_mode = EXCLUSIVE;
                                                                    PRAGMA temp_store = MEMORY;
                                                                    .mode csv
                                                                    .separator ","
                                                                    .timer on
                                                                    .import sqlite3_opt.csv user
                                                                    
                                                                    $ time sqlite3 import.db < load.sql
                                                                    
                                                                    real    3m5.419s
                                                                    user    2m46.685s
                                                                    sys     0m12.391s
                                                                    

                                                                    I’m not sure if there is a good way to break past the single-core bottleneck when loading data in. I can see one core sat at 100% when this above import happens.

                                                                    Even on Presto+HDFS clusters, creating a new table based on a SELECT * from another table will result in each node in the cluster building their own section of the dataset with a single core. There seems to be some enforced linearization there as well. Using Zstd instead of Zlib for compression at best improves perf by ~30%.

                                                                    Can anyone shed some light on getting past the single-core compute restrictions of SQLite when loading CSV data in?

                                                                    1. 1

                                                                      Thanks for the info! I wonder if the CSV parser is kinda slow? I hope I will find time to test it out, as I’m kinda surprised that 100M rows or 1.5 GB can’t be loaded faster than 3 minutes.

                                                                      I was also wondering whether a single INSERT statement of multiple rows would be faster. That way you use the SQL parser instead of the CSV parser.

                                                                      INSERT into user VALUES (row 1 ...), (row 2 ...), ... (row N ...) ;
                                                                      

                                                                      I think one issue is that the example code uses an if statement and separate INSERT statements, and I think the Rust version does too:

                                                                      https://github.com/avinassh/fast-sqlite3-inserts/blob/master/sqlite3_opt.py#L26

                                                                      Basically I think you could factor that code into data, and it would be faster (?). I guess the CSV method is already doing that though. I’d be interested to see your code and maybe run it on my machine.

                                                                      I don’t think sqlite has any multicore / multithreading features – which is why I think splitting it up into 2 processes (generation and loading) is probably a decent win.
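                                                                      For concreteness, the factored form would look something like this (the table and column names here are just assumptions for illustration, not taken from the thread):

```sql
-- Hypothetical example: one INSERT carrying a batch of rows, so a single
-- statement parse covers many rows instead of one statement per row.
INSERT INTO user (area, age, active) VALUES
  ('0504', 21, 1),
  ('0822', 40, 0),
  ('0608', 33, 1);
```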

                                                                      1. 2

                                                                        I did a quick test comparing inserting via CSV versus SQL. It seems SQL triggers about the same number of calls to fdatasync, but they take ~3x longer.

                                                                        $ cat test1.csv
                                                                        1,hello,123
                                                                        
                                                                        $ cat load.sql
                                                                        .mode csv
                                                                        .separator ","
                                                                        .import test1.csv test
                                                                        
                                                                        $ cat load2.sql
                                                                        INSERT INTO test (a,b,c) VALUES (1, 'hello', 123);
                                                                        
                                                                        $ strace -wc sqlite3 import.db < load.sql
                                                                        
                                                                        % time     seconds  usecs/call     calls    errors syscall
                                                                        ------ ----------- ----------- --------- --------- ----------------
                                                                         78.02    0.005216        1304         4           fdatasync
                                                                          4.29    0.000287           7        41           mmap
                                                                          2.53    0.000169           8        19         1 openat
                                                                          1.67    0.000111         111         1           execve
                                                                        ...
                                                                        ------ ----------- ----------- --------- --------- ----------------
                                                                        100.00    0.006685                   213        10 total
                                                                        
                                                                        
                                                                        $ strace -wc sqlite3 import.db < load2.sql
                                                                        % time     seconds  usecs/call     calls    errors syscall
                                                                        ------ ----------- ----------- --------- --------- ----------------
                                                                         79.94    0.015514        3878         4           fdatasync
                                                                          3.61    0.000701          17        41           mmap
                                                                          2.34    0.000454         453         1           execve
                                                                        ...
                                                                        ------ ----------- ----------- --------- --------- ----------------
                                                                        100.00    0.019406                   209        10 total
                                                                        

                                                                        I suspect something like valgrind could give a much more authoritative answer on the number of op codes needed for the operations above.

                                                                    2. 1

                                                                      I was going to mention that there’s the .import function as well, and I think Python’s sqlite3 package has the ability to call functions from the package on the db. I was loading a large data set on Friday, and a 6.44 GB CSV with about 6 columns took single-digit minutes using .import, without any of the additional performance settings.

                                                                    3. 1

                                                                      You could pipe the CSV output from Python to the sqlite tool instead of writing it to a file first.

                                                                      But the tool isn’t doing anything magic, so I wouldn’t expect it to be faster than a C or Rust program. This would just be a fast solution involving a lot less coding.
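                                                                      A sketch of what that pipeline could look like, assuming a hypothetical gen_rows.py that prints CSV rows to stdout, and a reasonably recent sqlite3 CLI (which accepts multiple dot-commands as arguments):

```shell
# Hypothetical: gen_rows.py stands in for whatever generates the CSV rows.
# .import reads from /dev/stdin, so no intermediate file is written.
python3 gen_rows.py | sqlite3 import.db '.mode csv' '.import /dev/stdin user'
```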

                                                                  1. 7

                                                                    I prefer LMDB’s approach (described in this 2012 paper.) I am not an expert on this stuff, although I’ve been building my own B-tree storage manager for a few weeks, so I’m learning rapidly. Durability is hard!

                                                                    TL;DR: SQLite either updates the file in place but saves copies of the overwritten pages for rollback, or writes to a temporary log file and periodically flushes it by doing the above. LMDB treats pages as copy-on-write, although once a page has been copied within a transaction, it can be overwritten for the rest of that same transaction. It does some very clever free-page management to provide MVCC.

                                                                    In practice LMDB is a lot faster than SQLite. There was a proof-of-concept project that replaced SQLite’s b-tree engine with LMDB and sped it up a lot, but sadly it was an old version and wasn’t kept up to date.

                                                                    1. 8

                                                                      In practice LMDB is a lot faster than SQLite.

                                                                      I feel like this needs some qualifications. There are many dimensions to performance (read vs. write, physical media like SSD, NVMe, etc.), and also hard tradeoffs that different projects reconcile in different ways (durability vs. performance).

                                                                      I’m not an expert either but these comments give more color on the subtleties:

                                                                      https://news.ycombinator.com/item?id=18413743

                                                                      Some people say LMDB loses ACID semantics to be fast in certain situations; some people say RocksDB (a LevelDB fork) is massively faster; etc.

                                                                      It’s very easy to make a proof of concept that’s faster (namely because it doesn’t have to work all the time, and because you probably don’t know if it doesn’t work all the time!).

                                                                      1. 1

                                                                        I think you’re referring to this comment about ACID:

                                                                        • LMDB by default provides full ACID semantics, which means that after every key-value write committed, it needs to sync to disk. Apparently if this happens tens of times per second, your system performance will suffer.
                                                                        • LMDB provides a super-fast asynchronous mode (MDB_NOSYNC), and this is the one most often benchmarked. Writes are super-fast with this. But a little known fact is that you lose all of ACID[…]

                                                                        Yes, just like SQLite, commits are expensive because they provide durability. But the performance of commits (the full disk flush) isn’t nearly as bad with SSDs. If you find it a problem, you can leave a transaction open and just commit-and-reopen periodically; at worst you’ll lose changes in the latest transaction.

                                                                        You might also check out libMDBX, a fork of LMDB with a lot of optimizations and improvements. I was working with it last year, because development of LMDB itself seems to have stopped or stalled since 2015.

                                                                      2. 6

                                                                        The LumoSQL project has benchmarks for most SQLite variants. Apparently the SQLite LSM tree has improved a lot and is now competitive with the LMDB backend.

                                                                        1. 2

                                                                          The SQLite LSM tree is in limbo, last I heard. It was part of the “SQLite 4” prototype that was abandoned in 2014. It’s not used in SQLite 3, unless there was a recent announcement I’ve missed.

                                                                          1. 4

                                                                            It was ported as an SQLite 3 extension in 2017, discussed here. There’s a short 2020 mailing list thread on it that suggests it’s still maintained, but not under active development due to little user interest so far.

                                                                            1. 2

                                                                              Yeah right, I actually meant to say the default KV store.

                                                                              By January 2020 the LumoSQL project concluded:

                                                                              • Howard’s 2013 performance work is reproducible
                                                                              • SQLite’s key-value store improved in performance since 2013, getting close to parity with LMDB by some measures

                                                                          2. 2

                                                                            Thank you for the link to the LMDB paper!

                                                                          1. 6

                                                                            Hi there! I’m Aloke, an engineer at Warp and author of the blog post. Happy to answer any questions.

                                                                            If you are interested, you can join the waitlist for Warp here: https://www.warp.dev/ and our discord community here https://discord.com/invite/T2p5xFgpjr

                                                                            1. 1

                                                                              Hm this part is interesting:

                                                                              To fix these issues, we create a separate grid for each command and output based on the precmd/preexec hooks we receive from the shell. This level of grid isolation ensures we can separate commands and their output without having to deal with the output of one command overwriting that of another. We found this approach to be a good balance of being faithful to the VT100 spec and using the existing Alacritty grid code with rendering blocks in a more flexible way than was traditionally prescribed by a teletype terminal.

                                                                              This is a creative solution and makes sense for compatibility. I’d be interested in any corner cases that don’t work, e.g. does it handle async commands with &? To me it would be nice to pop out another tab for that or something.

                                                                              Oil has started to provide an API over Unix domain sockets to give a “headless shell”.

                                                                              http://www.oilshell.org/blog/2021/06/hotos-shell-panel.html#oils-headless-mode-should-be-useful-for-ui-research

                                                                              Basically what I want to do is punt the whole UI question elsewhere, since I’m mainly focused on shell as a programming language.

                                                                              So I envision all the shell UI could be in a rich browser-like GUI (optionally, you can obviously still use a terminal).

                                                                              I’d say the GUI should have a terminal, but it doesn’t have to be a terminal. In other words, I think of shell as a language-oriented interface, but not necessarily a terminal-oriented one.

                                                                              I talked with the authors of https://www.withfig.com/ a little bit. They also have a creative solution that uses the OS X accessibility framework to parse the shell via OS hooks. I think the preexec hooks have a similar flavor. But really, shells should allow more introspection, and Oil is on its way. If you’re interested in exploring any of these things, feel free to join https://www.oilshell.zulipchat.com/ . There is very nascent work on a client for a headless shell there.

                                                                              1. 1

                                                                                I spent about ten minutes yesterday hunting for the waitlist signup form, then gave up. Looked again right now and still can’t find it. I’m using Safari on an iPad; maybe there’s some layout glitch hiding the form on smaller screens?

                                                                                1. 2

                                                                                  Huh, sorry about that. There’s an issue on some versions of Safari where the request access button doesn’t work; we’re in the process of getting a fix in.

                                                                                  In the meantime you can find the form here: https://zachlloyd.typeform.com/to/yrwMkgtj?typeform-embed=oembed&typeform-medium=embed-oembed

                                                                              1. 4

                                                                                Judging by the comments here I’m not interested in reading the article.

                                                                                But, why use ls | grep foo at all instead of *foo* as the argument for rm?

                                                                                1. 5

                                                                                  I was also distracted by using the output of ls in scripting, which is a golden rule no-no.

                                                                                  1. 1

                                                                                    Is this not what ls -D is for?

                                                                                  2. 4

                                                                                    Despite “The UNIX Way” saying that we have all these little composable command line tools that we can interop using the universal interchange language of plaintext, it is also said that we should never parse the output of ls. The reasons for this are unclear to me; patches that would have supported this have been rejected.

                                                                                    Definitely the glob is the right way to do this, and if things get more complex, the find command.

                                                                                    1. 4

                                                                                      “Never parse the output of ls” is a bit strong, but I can see the rationale for such a rule.

                                                                                      Basically the shell already knows how to list files with *.

                                                                                      for name in *; do  # no external processes started here, just glob()
                                                                                         echo "$name"    # quoted, in case the name contains spaces
                                                                                      done
                                                                                      

                                                                                      That covers 90% of the use cases where you might want to parse the output of ls.

                                                                                      One case where you would is suggested by this article:

                                                                                      # Use a regex to filter Python or C++ tests, which is harder in the shell (at least a POSIX shell)
                                                                                      ls | egrep '.*_test.(py|cc)' | xargs -d $'\n' echo
                                                                                      

                                                                                      BTW I’d say ls is a non-recursive special case of find, and ls lacks find’s -printf for formatting and -print0 for parseable output. It may be better to use find . -maxdepth 1 in some cases, but I’m comfortable with the above.
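                                                                                      To make that concrete, here’s roughly what a find equivalent of the ls | egrep filter might look like (assumes GNU find, whose default -regex dialect uses \( \| \) for grouping; a sketch, not a drop-in replacement):

```shell
# Non-recursive, NUL-separated version of `ls | egrep '_test\.(py|cc)$'`.
# Handles any filename, including ones containing newlines.
find . -maxdepth 1 -regex '.*_test\.\(py\|cc\)$' -print0 | xargs -0 -- echo
```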

                                                                                    2. 3

                                                                                      why use ls | grep foo at all instead of *foo* as the argument for rm

                                                                                      Almost always, I use the shell iteratively, working stepwise to my goal. Pipelines like that are the outcome of that process.

                                                                                      1. 2

                                                                                        I gave an example below – if you want to filter by a regex and not a constant string.

                                                                                        # Use a regex to filter Python or C++ tests, which is harder in the shell (at least a POSIX shell)
                                                                                        ls | egrep '.*_test.(py|cc)' | xargs -d $'\n' echo
                                                                                        

                                                                                        You can do this with extended globs too in bash, but that syntax is pretty obscure. You can also use regexes without egrep via [[. There are millions of ways to do everything in shell :)

                                                                                        I’d say that globs and find cover 99% of use cases, but I can see ls | egrep being useful on occasion.
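                                                                                        For reference, the [[ version looks something like this (bash-only, not POSIX):

```shell
# Filter with bash's built-in regex match instead of egrep; no pipeline,
# and filenames with spaces or newlines are fine since they stay as args.
for name in *; do
  if [[ $name =~ _test\.(py|cc)$ ]]; then
    echo "$name"
  fi
done
```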

                                                                                        1. 1

                                                                                          If normal globs aren’t enough, I’d use extended globs or find. But yeah, compared to default ls, find would need options to exclude hidden files and prevent recursive search. If this is something that is needed often, I’d make a function and put it in .bashrc.

                                                                                          That said, I’d use *_test.{py,cc} for your given example and your regex should be .*_test\.(py|cc)$ or _test\.(py|cc)$

                                                                                          I have parsed ls occasionally too - ex: -X to sort by extension, -q and pipe to wc for counting files, -t for sorting by time, etc.

                                                                                          And regarding the comment I made, I missed the case of too many arguments for rm *foo* (for which I’d use find again). I should’ve read the article closely enough to know why ls | grep was being used.

                                                                                        2. 1

                                                                                          That’s clearly just a placeholder pipeline. No one actually wants *foo* anyhow.

                                                                                        1. 3

                                                                                          I disagree with the article in general, but I’ll give it an up-vote because I learnt about the -L option which I’ve never seen/used before.

                                                                                          1. 4

                                                                                            I’ve never used it, but I claim it can always be replaced with -n? (see my long comment here)

                                                                                            $ seq 10 | xargs -n 3 -- echo
                                                                                            1 2 3
                                                                                            4 5 6
                                                                                            7 8 9
                                                                                            10
                                                                                            

                                                                                            I’m interested in any counterexamples. The difference appears to be that -n works on args that were already tokenized by -d or -0, while -L has its own tokenization rules? I think the former is better because it’s more orthogonal to the rest of the command.

                                                                                            Here’s a longer example:

                                                                                            $ { echo 'foo bar'; echo 'spam    eggs'; echo 'ale bean'; } | xargs -d $'\n' -n 2 -- ~/bin/argv
                                                                                            ['foo bar', 'spam    eggs']
                                                                                            ['ale bean']
                                                                                            

                                                                                            It correctly does the tokenization you want (split on newlines), and then produces batches of 2 args.

                                                                                            1. 2

                                                                                              I don’t think -L can always be replaced with -n. They appear the same because seq 10 gives only one token on each line, and -L aggregates lines. Look what happens if you have 3 tokens on each line, for example:

                                                                                              $ seq 10 | xargs -L 3 | xargs -L 2 
                                                                                              1 2 3 4 5 6
                                                                                              7 8 9 10
                                                                                              

                                                                                              While -n is tokens:

                                                                                              $ seq 10 | xargs -n 3 | xargs -n 2 
                                                                                              1 2
                                                                                              3 4
                                                                                              5 6
                                                                                              7 8
                                                                                              9 10
                                                                                              

                                                                                              I agree that -n seems more generally useful.

                                                                                              1. 3

                                                                                                Yeah, I shouldn’t have said “always replace”. I think it’s more like “-L is never what you want; you want -n” :) It does something different that’s not good. Again, I’d be interested in any realistic counterexamples.

                                                                                          1. 41

                                                                                            Eh, there are some problems with xargs, but this isn’t a good critique. First off, it proposes a “solution” that doesn’t even handle spaces in filenames (much less newlines):

                                                                                            rm $(ls | grep foo)
                                                                                            

                                                                                            I prefer this as a practical solution (that handles every char except newlines in filenames):

                                                                                            ls | grep foo | xargs -d $'\n' -- rm
                                                                                            

                                                                                            You can also pipe find . -print0 to xargs -0 if you want to handle newlines (untrusted data).

                                                                                            (Although then you have the problem that there’s no grep -0, which is why Oil has QSN. grep still works on QSN, and QSN can represent every string, even those with NULs!)
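                                                                                            Concretely, the NUL-delimited version looks like this (using echo of the argument count just to show the tokenization):

```shell
# Count matches the NUL-safe way; sh -c receives them as real argv entries,
# so even a filename containing a newline arrives as one argument.
find . -name '*foo*' -print0 | xargs -0 -- sh -c 'echo "$# files"' sh
```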


                                                                                            One nice thing about xargs is that you can preview the commands by adding ‘echo’ on the front:

                                                                                            ls | grep foo | xargs -d $'\n' -- echo rm
                                                                                            

                                                                                            That will help get the tokenization right, so you don’t feed the wrong thing into the commands!

                                                                                            I never use xargs -L, and I sometimes use xargs -I {} for simple invocations. But even better than that is using xargs with the $0 Dispatch pattern, which I still need to properly write about.

                                                                                            Basically instead of the mini language of -I {}, just use shell by recursively invoking shell functions. I use this all the time, e.g. all over Oil and elsewhere.

                                                                                            do_one() {
                                                                                               # It's more flexible to use a function with "$1" instead of -I {}
                                                                                               echo "Do something with $1"
                                                                                               echo mv "$1" /tmp
                                                                                            }
                                                                                            
                                                                                            do_all() {
                                                                                              # call the do_one function for each item.  Also add -P to make it parallel
                                                                                              cat tasks.txt | grep foo | xargs -n 1 -d $'\n' -- "$0" do_one
                                                                                            }
                                                                                            
                                                                                            "$@"  # dispatch on $0; or use 'runproc' in Oil
                                                                                            

                                                                                            Now run with

                                                                                            • myscript.sh do_all, or
                                                                                            • my_script.sh do_one to test out the “work” function (very handy! you need to make this work first)

                                                                                            This separates the problem nicely – make it work on one thing, and then figure out which things to run it on. When you combine them, they WILL work, unlike the “sed into bash” solution.


                                                                                            Reading up on what xargs -L does, I have avoided it because it’s a custom mini-language. It says that trailing blanks cause line continuations. That sort of rule seems silly to me.

                                                                                            I also avoid -I {} because it’s a custom mini-language.

                                                                                            IMO it’s better to just use the shell, and one of these three invocations:

                                                                                            • xargs – when you know your input is “words” like myhost otherhost
                                                                                            • xargs -d $'\n' – when you want lines
                                                                                            • xargs -0 – when you want to handle untrusted data (e.g. someone putting a newline in a filename)

                                                                                            Those 3 can be combined with -n 1 or -n 42, and they will do the desired grouping. I’ve never needed anything more than that.
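A quick sketch of the three modes (GNU xargs assumed; echo stands in for the real command):

```shell
# Words: input is split on whitespace, so "myhost otherhost" becomes 2 args
printf 'myhost otherhost\n' | xargs echo ping      # ping myhost otherhost

# Lines: one arg per line, so a space inside a line survives
printf 'a file\nanother\n' | xargs -d $'\n' -n 1 echo found

# NUL-delimited: safe even for untrusted filenames (newlines included)
printf 'one\0two\0' | xargs -0 echo got            # got one two
```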

                                                                                            So yes xargs is weird, but I don’t agree with the author’s suggestions. sed piped into bash means that you’re manipulating bash code with sed, which is almost impossible to do correctly.

                                                                                            Instead I suggest combining xargs and shell, because xargs works with arguments and not strings. You can make that correct and reason about what it doesn’t handle (newlines, etc.)

                                                                                            (OK I guess this is a start of a blog post, I also gave a 5 minute presentation 3 years ago about this: http://www.oilshell.org/share/05-24-pres.html)

                                                                                            1. 9

                                                                                              pipe find . -print0 to xargs -0

                                                                                              I use find . -exec very often for running a command on lots of files. Why would you choose to pipe into xargs instead?

                                                                                              1. 11

                                                                                                It can be much faster (depending on the use case). If you’re trying to rm 100,000 files, you can start one process instead of 100,000 processes! (the max number of args to a process on Linux is something like 131K as far as I remember).

                                                                                                It’s basically

                                                                                                rm one two three
                                                                                                

                                                                                                vs.

                                                                                                rm one
                                                                                                rm two
                                                                                                rm three
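This is easy to see with a throwaway directory (the mktemp path is illustrative):

```shell
# Make a few files, then delete them all with a single rm process.
dir=$(mktemp -d)
touch "$dir/one" "$dir/two" "$dir/three"
find "$dir" -type f -print0 | xargs -0 rm   # one rm, three args
ls -A "$dir"                                # prints nothing: all gone
rmdir "$dir"
```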
                                                                                                

                                                                                                Here’s a comparison showing that find -exec is slower:

                                                                                                https://www.reddit.com/r/ProgrammingLanguages/comments/frhplj/some_syntax_ideas_for_a_shell_please_provide/fm07izj/

                                                                                                Another reference: https://old.reddit.com/r/commandline/comments/45xxv1/why_find_stat_is_much_slower_than_ls/

                                                                                                Good question, I will add this to the hypothetical blog post! :)

                                                                                                1. 14

                                                                                                  @andyc Wouldn’t the find + (rather than ;) option solve this problem too?

                                                                                                  1. 4

                                                                                                    Oh yes, it does! I don’t tend to use it, since I use xargs for a bunch of other stuff too, but that will also work. Looks like busybox supports it too, in addition to GNU (I would guess it’s in POSIX).

                                                                                                  2. 10

                                                                                                    the max number of args to a process on Linux is something like 131K as far as I remember

                                                                                                    Time for the other really, really useful feature of xargs. ;)

                                                                                                    $ echo | xargs --show-limits
                                                                                                    Your environment variables take up 2222 bytes
                                                                                                    POSIX upper limit on argument length (this system): 2092882
                                                                                                    POSIX smallest allowable upper limit on argument length (all systems): 4096
                                                                                                    Maximum length of command we could actually use: 2090660
                                                                                                    Size of command buffer we are actually using: 131072
                                                                                                    Maximum parallelism (--max-procs must be no greater): 2147483647
                                                                                                    

                                                                                                    It’s not a limit on the number of arguments, it’s a limit on the total size of environment variables + command-line arguments (+ some other data, see getauxval(3) on a Linux machine for details). Apparently Linux defaults to a quarter of the available stack allocated for new processes, but it also has a hard limit of 128KiB on the size of each individual argument (MAX_ARG_STRLEN). There’s also MAX_ARG_STRINGS which limits the number of arguments, but it’s set to 2³¹-1, so you’ll hit the ~2MiB limit first.

                                                                                                    Needless to say, a lot of these numbers are much smaller on other POSIX systems, like BSDs or macOS.
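You can watch xargs respect that buffer by counting how many echo invocations it makes for an oversized argument list (GNU xargs; the exact batch count varies by system):

```shell
# ~700 KB of numbers can't fit in one 128 KiB command buffer,
# so xargs runs echo several times; wc -l counts the invocations.
seq 100000 | xargs echo | wc -l
```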

                                                                                                  3. 1

                                                                                                    find . -exec blah \; will fork a process for each file, while find . | xargs blah will fork a process per X files (where X is as many files as fit under the system’s argument-length limit). The latter can run quite a bit faster. I will typically do find . -name '*.h' | xargs grep SOME_OBSCURE_DEFINE and depending upon the repo, that might only expand to one grep.
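A newline-safe version of that last pipeline, using the -print0/-0 pairing (GNU find/xargs assumed):

```shell
# NUL-delimit filenames so even a newline in a *.h path can't break the pipe
find . -name '*.h' -print0 | xargs -0 grep -l SOME_OBSCURE_DEFINE
```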

                                                                                                    1. 4

                                                                                                      As @jonahx mentions, there is an option for that in find too:

                                                                                                           -exec utility [argument ...] {} +
                                                                                                                   Same as -exec, except that ``{}'' is replaced with as many pathnames as possible for each invocation of utility.  This
                                                                                                                   behaviour is similar to that of xargs(1).
                                                                                                      
                                                                                                        1. 3

                                                                                                          That is the real beauty of xargs. I didn’t know about using + with find, and while that’s quite useful, remembering it means remembering something that only works with find. In contrast, xargs works with anything that can supply a newline-delimited list of filenames as input.

                                                                                                          1. 2

                                                                                                            Yes, this. Even though the original post complains about too many features in xargs, find is truly the worst with a million options.

                                                                                                  4. 6

                                                                                                    This comment was a great article in itself.

                                                                                                    Conceptually, I think of xargs primarily as a wrapper that enables tools that don’t support stdin to support stdin. Is this a good way to think about it?

                                                                                                    1. 8

                                                                                                      Yes I’d think of it as an “adapter” between text streams (stdin) and argv arrays. Both of those are essential parts of shell and you need ways to move back and forth. To move the other way you can simply use echo (or write -- @ARGV in Oil).

                                                                                                      Another way I think of it is to replace xargs with the word “each” mentally, as in Ruby, Rust, and some common JS idioms.

                                                                                                      You’re basically separating iteration from the logic of what to do on each thing. It’s a special case of a loop.

                                                                                                      In a loop, the current iteration can depend on the previous iteration, and sometimes you need that. But in xargs, every iteration is independent, which is good because you can add xargs -P to automatically parallelize it! You can’t do that with a regular loop.


                                                                                                      I would like Oil to grow an each builtin that is a cleaned up xargs, following the guidelines I enumerated.

                                                                                                      I’ve been wondering if it should be named each and every:

                                                                                                      • each – like xargs -n 1, and find -exec foo \; – call a process on each argument
                                                                                                      • every – like xargs, and find -exec foo + – call the minimal number of processes, but exhaust all arguments

                                                                                                      So something like

                                                                                                      proc myproc { echo $1 }   # passed one arg
                                                                                                      find . | each -- myproc  # call a proc/shell function on each file, newlines are the default
                                                                                                      
                                                                                                      proc otherproc { echo @ARGV }  # passed many args
                                                                                                      find . | every -- otherproc  # call the minimal number of processes
                                                                                                      

                                                                                                      If anyone has feedback I’m interested. Or wants to implement it :)
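Until then, rough bash approximations of the two (the names each and every are the hypothetical part; GNU xargs does the actual work):

```shell
# each: one process per input line; every: as few processes as possible
each()  { xargs -d $'\n' -n 1 -- "$@"; }
every() { xargs -d $'\n' -- "$@"; }

printf 'a\nb\n' | each echo    # two echo invocations: "a" then "b"
printf 'a\nb\n' | every echo   # one echo invocation: "a b"
```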


                                                                                                      Probably should add this to the blog post: Why use xargs instead of a loop?

                                                                                                      1. It’s easier to preview what you’re doing by sticking echo on the beginning of the command. You’re decomposing the logic of which things to iterate on, and what work to do.
                                                                                                      2. When the work is independent, you can parallelize with xargs -P
                                                                                                      3. You can filter the work with grep. Instead of find | xargs, do find | grep | xargs. This composes very nicely
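Points 1-3 in one pipeline (the *.log pattern and the grep filter are illustrative):

```shell
# Preview first: put echo in front to see what would run
find . -name '*.log' | grep -v skip | xargs -d $'\n' echo gzip

# Then run it for real, filtered the same way, 4 jobs in parallel
find . -name '*.log' | grep -v skip | xargs -d $'\n' -n 1 -P 4 gzip
```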
                                                                                                  1. 8

                                                                                                    I’ve been under the impression that Racket-on-Chez is mostly a refactoring project to reduce the amount of C code. There is some low-level stuff that you have to do in C, but they want to do as much intermediate-level stuff as possible in a high-level Lisp language.

                                                                                                    Chez Scheme happens to provide a good foundation for all the low-level stuff and some intermediate-level stuff, so they chose it as a basis.

                                                                                                    If you look at the comparison diagram on page 78:3 of https://www.cs.utah.edu/plt/publications/icfp19-fddkmstz.pdf, they have clearly achieved that goal.

                                                                                                    1. 4

                                                                                                      Right, but the point of the article is that they have to maintain a Chez Scheme fork now too? They don’t get that for free, but rather “inherit” it. That could still be better than not doing it, of course.

                                                                                                      I looked at the Racket source code when this project started, and I was surprised to learn that it was indeed hundreds of thousands of lines of C code, just like Python, Ruby, Perl, and R! For some reason I had assumed that Lisp dialects would be more bootstrapped :)

                                                                                                      I guess you shouldn’t underestimate the speed of C, and if you want to get rid of it, you have to replace it with something pretty advanced like Chez Scheme, which has its own way of solving the portability problem. AFAIU that’s why Cisco uses it and acquired it… because the Chez Scheme compiler supports a lot of weird hardware variants.

                                                                                                      1. 8

                                                                                                        Right but the point of the article is that they have to maintain Chez scheme fork now too?

                                                                                                        Yes, but this article seems to suggest that this is not much better than before. In fact it is much better than before:

                                                                                                        • There is much less C code
                                                                                                        • A lot of intermediate-level stuff is now in Scheme
                                                                                                        • Even though it’s a fork, it still largely resembles the Chez Scheme codebase, so anyone who has worked in Chez Scheme’s codebase will have an easier time contributing to Racket’s codebase

                                                                                                        Compare this to the BSD situation: the 3 major BSDs (FreeBSD, OpenBSD, NetBSD) forked a long time ago, and they never intended to keep full compatibility with each other; but their code is still similar enough for developers to cherry-pick changes, and for contributors to move from one BSD to another.

                                                                                                        1. 1

                                                                                                          SBCL is self hosted iirc

                                                                                                      1. 8

                                                                                                        Wow it’s crazy that Chez Scheme is from 1984 and Racket is from 1995!

                                                                                                        I can definitely see that maintaining a big language implementation is a huge thing to ask, especially for 5 CS professors who are doing many other things like teaching, reviewing papers, serving on committees, etc.

                                                                                                        1. 6

                                                                                                          Even better, Chez Scheme goes back to Z80 processors. Many folks said you must stick entirely with C on weak hardware. They schemed up something better. Their historical paper (pdf) was full of interesting stuff.

                                                                                                        1. 3

                                                                                                          I use this, it’s very effective!

                                                                                                          I wrote my own graphical scripts at some point with a treemap, and then tried this and realized it’s good enough in practice :) It’s fast.

                                                                                                          1. 2

                                                                                                            This is a long article and I didn’t digest all of it, partly because I’m not great at reading Ruby (or SQL for that matter).

                                                                                                            This bit jumped out at me, and seems similar to the points in the recent Against SQL about the SQL language being non-compositional:

                                                                                                            Sequel’s join does not correspond to an algebraic join of its operands. Instead, its specification looks like “adds a term to the SQL query’s FROM clause”

                                                                                                            To be fair… There is a way to use SQL (and, sometimes, those libraries) so as to avoid the problem described here. It amounts to using SQL in a purely algebraic way. Unfortunately, that way is not idiomatic and leads to complex SQL queries, that may have bad execution plans (at least in major open-source DBMSs).

                                                                                                            I wonder if @jamii is familiar with this work and has any comments on it?

                                                                                                            Related past posts:

                                                                                                            Successor to Alf which still seems active: https://github.com/enspirit/bmg

                                                                                                            1. 2

                                                                                                              The interface is roughly similar to spark, flink, kafka streams, differential dataflow etc. It seems like the main contribution here is being able to compile that to sql without introducing crazy nested queries that the planner handles poorly. I’d be interested to see how that’s done.

                                                                                                              This kind of query-as-a-library interface is nice because you get to pawn a lot of the work off onto the host language. The ‘Reconciling heterogeneous type systems’ section near the bottom hints at the downsides. Functions in the host language are totally opaque, so they can’t be optimized or sent to the db. This also means that you can’t really do subqueries, which I find are often the easiest way to write many queries.

                                                                                                              I still haven’t decided whether it’s worth it in the long run to have to build a whole language just to get subqueries though. I’m experimenting with both approaches - language-integrated in https://github.com/jamii/dida and a full language in https://github.com/jamii/imp.

                                                                                                              1. 1

                                                                                                                Thanks for the info, the language vs. library issues make sense. I was more interested in how they avoid composition pitfalls of other DB access layers / SQL compilers.

                                                                                                                AFAIK spark and kafka don’t compile to SQL and instead implement their own subset. (There were several quirky SQL subsets at Google too for big data, and at some point people tried to unify them under a single language. Not sure if that succeeded.)

                                                                                                                But probably the barrier to that is my Ruby and SQL knowledge, as mentioned.

                                                                                                                BTW I loved the “Against SQL” article – here was my comment, pointing to some resources about dplyr, Tidy Data, and Data Frames, including critiques by the database community (the article mentioned pandas but R is where those ideas originated):

                                                                                                                https://news.ycombinator.com/item?id=27795877

                                                                                                                I only use dplyr with the “native” engine; the SQL engine does seem to be a bit hacky. They don’t really deal with the semantic issues.

                                                                                                                https://dbplyr.tidyverse.org/articles/sql-translation.html

                                                                                                                Perfect translation is not possible because databases don’t have all the functions that R does. The goal of dplyr is to provide a semantic rather than a literal translation: what you mean rather than what is done. In fact, even for functions that exist both in databases and R, you shouldn’t expect results to be identical; database programmers have different priorities than R core programmers. For example, in R in order to get a higher level of numerical accuracy, mean() loops through the data twice. R’s mean() also provides a trim option for computing trimmed means; this is something that databases do not provide.


                                                                                                                BTW Oil will likely grow a type for table-like data: https://github.com/oilshell/oil/wiki/Structured-Data-in-Oil

                                                                                                                However I would view it as “pre” structured or “pre” relational – i.e. a way of cleaning/preparing/filtering data. Analysis can be done by “proper” relational systems or R / Pandas.

                                                                                                                1. 2

                                                                                                                  here was my comment,

                                                                                                                  Ah, it’s impossible to read hn comments without the ‘unread’ marker that lobsters has. All the people who comment before reading get in there first and clutter up the comments and then it’s too late to find the thoughtful comments.

                                                                                                                  I was more interested in how they avoid composition pitfalls

                                                                                                                  That does seem like the rub for compiling to sql. For my own projects I think it will be easier to write my own execution engine but target eg sqlite storage. But I can see how compiling to sql is appealing.

                                                                                                                  AFAIK spark and kafka don’t compile to SQL and instead implement their own subset.

                                                                                                                  Yeah, the library version is roughly equivalent to directly specifying a query plan, and their sql subsets compile down to that. I think the library versions are interesting, because they are so much lower effort both to implement and adopt compared to a new query language. But flink et al are limited to jvm languages. I’m playing around in dida with trying to be a good citizen in many runtimes - there are zig, node and wasm<->js bindings so far. I’m not sure how practical this is compared to starting an entire new language but then at least getting to share more code between projects.

                                                                                                                  pointing to some resources about dplyr, Tidy Data, and Data Frames

                                                                                                                  I’ve used r dataframes a tiny amount, but I’m much more familiar with pandas. I have seen the “Is a dataframe just a table” and your “What is a dataframe” before - both were very useful.