1. 3

    I had trouble following all this (you’ve read the Common Lisp spec way more closely than I ever bothered to), but you might be interested in John Shutt’s Kernel language. To avoid unhygienic macros, Kernel basically outlaws quasiquote and unquote and constructs all macros out of list, cons and so on, which has the same effect as unquoting everything. It’s a hyperstatic system where symbols in macros always expand to their binding at definition time and can never be overridden, implying among other things that you can never use functions before defining them.

    There’s a lot I love about Kernel (it provides a uniform theory integrating functions and macros and intermediate beasts) but the obsession with hygiene is not one of them. I took a lot of inspiration from Kernel in my Lisp with first-class macros, but I went all the way in the other direction and supported only macros with quasiquote and unquote. You can define symbols in any order in Wart, and override any symbols at any time, including things like if and cons. The only things you can’t override are things that look like punctuation: parens, quote, quasiquote, unquote, unquote-splice, and a special symbol @ for apply, analogous to unquote-splice. Wart is even smart enough to support apply on macros, something Kernel couldn’t do – as long as your macros are defined out of quasiquote and unquote. I find this to be a sort of indirect sign that it gets closer to the essence of macros by decoupling them into their component pieces like Kernel did, but without complecting them with concerns of hygiene.

    (Bel also doesn’t care about hygienic macros and claims to support fully first-class apply on macros. Though I don’t understand how Bel’s macroexpand works in spite of some effort in that direction.)

    1. 2

      To avoid unhygienic macros, Kernel basically outlaws quasiquote and unquote and constructs all macros out of list, cons and so on.

      It’s easy to write unhygienic macros without quasiquote. Does Kernel also outlaw constructing symbols?
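
      For instance, a minimal sketch in define-macro style (as in Guile or Gambit; the aif/assq usage is just for illustration): plain list and quote suffice to inject the bare symbol it into the expansion, capturing whatever it means at the call site.

      (define-macro (aif test then else)
        (list 'let (list (list 'it test))
              (list 'if 'it then else)))

      ;; (aif (assq 'x alist) (cdr it) #f)  ; 'it' is captured unhygienically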

      1. 3

        No, looks like page 165 of the Kernel spec does provide string->symbol.

        1. 1

          Doesn’t that seem like a big loophole that would make it easy to be unhygienic?

          1. 2

            Depends on what you’re protecting against. Macros are fundamentally a convenience. As I understand the dialectic around hygienic macros, the goal is always just to add guardrails to the convenient path, not to make the guardrails mandatory. Most such systems deliberately provide escape hatches for things like anaphoric macros. So I don’t think I’ve ever heard someone say hygiene needs to be an ironclad guarantee.

            1. 1

              Honestly, I agree with the inclusion of escape hatches if they are unlikely to be hit accidentally; I’m just surprised that the Kernel developers also agree, since they made such a severe move as to disallow quasiquote altogether.

              So I don’t think I’ve ever heard someone say hygiene needs to be an ironclad guarantee.

              I don’t want to put words in peoples’ mouths, but I’m pretty sure this is the stance of most Racket devs.

              1. 3

                Not true, because Scheme’s syntax-rules explicitly provides an escape hatch for literals, which can be used to violate hygiene in a deliberate manner. Racket implements syntax-rules.

                On the other hand, you’re absolutely right that they don’t make it easy. I have no idea what to make of anaphoric macros like this one from the anaphoric package.

                1. 3

                  Racket doesn’t forbid string->symbol either, it just provides it with some type-safe scaffolding called syntax objects. We can definitely agree that makes it more difficult to use. But the ‘loophole’ does continue to exist.

                  I’m not aware of any macro in Common Lisp that cannot be implemented in Racket (modulo differences in the runtimes like Lisp-1 vs Lisp-2, property lists, etc.) It just gets arbitrarily gnarly.
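
                  For contrast, here’s a sketch of breaking hygiene deliberately in Racket (the usual anaphoric-if example): datum->syntax hands the introduced identifier the caller’s context, which is exactly the ‘loophole’ with scaffolding around it.

                  (define-syntax (aif stx)
                    (syntax-case stx ()
                      [(_ test then else)
                       (with-syntax ([it (datum->syntax stx 'it)])
                         #'(let ([it test]) (if it then else)))]))

                  ;; (aif (assq 'x alist) (cdr it) #f)  ; 'it' is visible in then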

                  1. 2

                    Thanks for the clarification. I have attempted several times to understand Racket macros but never really succeeded because it’s just so much more complicated compared to the systems I’m familiar with.

                    1. 3

                      Yeah, I’m totally with you. They make it so hard that macros are used a lot less in the Scheme world. If you’re looking to understand macros, I’d recommend a Lisp that’s not a Scheme. I cut my teeth on them using Arc Lisp, which was a great experience even though Arc is a pretty thin veneer over Racket.

                      1. 2

                        Have you read Fear of Macros? Also there is Macros and Languages in Racket, which takes a more exercise-based approach.

                        1. 5

                          Have you read Fear of Macros?

                          At least twice.

                          Nowadays when I need a Racket macro I just show up in #racket and say “boy, this sure is easy to write using defmacro, too bad hygienic macros are so confusing” and someone will be like “they’re not confusing! all you have to do is $BLACK_MAGIC” and then boom; I have the macro I need.

          2. 1

            To avoid unhygienic macros

            Kernel does not avoid unhygienic macros, whereas Scheme R6RS syntax-case makes it more difficult (but still possible) to write unhygienic macros. It is possible to write unhygienic code with Kernel, such as defining define-macro, without using or needing quasiquote et al.

            Kernel basically outlaws quasiquote and unquote

            Kernel does not outlaw the quasiquote and unquote semantics. There is $quote, and unquote is merely (eval symbol env), whereas quasiquote is just a reader trick inside Scheme (also see [0]).

            and constructs all macros out of list, cons and so on.

            Yes and no.

            Scheme macros, and even CL macros, are meant as a) a hook into the compiler to speed things up, e.g. compose or Clojure’s ->, or b) a way to change the prefix-based evaluation strategy to build so-called Domain-Specific Languages such as records, e.g. SRFI-9.

            Kernel eliminates the need to think “is this a macro or is this a procedure”: instead everything is an operative, and it is up to the interpreter or compiler to figure out what can be compiled (ahead-of-time) or not. That is slightly more general than “everything is a macro”, not least because an operative has access to the dynamic environment.
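
            For a flavor of that, a sketch in Kernel notation (from memory of the report, so treat it as approximate): an operative receives its operands unevaluated, plus the caller’s dynamic environment, and decides for itself what to eval.

            ($define! $and2
              ($vau (a b) env
                ($if (eval a env) (eval b env) #f)))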

            Based on your comment’s description, Wart is re-inventing Kernel or something like that (without a formal description, unlike John Shutt’s).

            re apply for macros: read page 67 at https://ftp.cs.wpi.edu/pub/techreports/pdf/05-07.pdf

            [0] https://github.com/cisco/ChezScheme/blob/main/s/syntax.ss#L7644

            1. 1

              Page 67 of the Kernel Report says macros don’t need apply because they don’t evaluate their arguments. I think that’s wrong, because macros can evaluate their arguments when unquoted. Indeed, most macro args are evaluated eventually, using unquote, in the caller’s environment. Most of the value of macros lies in selectively turning off eval for just the odd arg. And macros are most of the use of fexprs, as far as I’ve been able to glean.
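
              (To illustrate with a sketch in define-macro style rather than Wart syntax: in a while macro, unquote splices the condition and body back into the expansion so they get evaluated, eventually and repeatedly, in the caller’s environment; the macro’s only job is to defer that evaluation.)

              (define-macro (while test . body)
                `(let loop ()
                   (when ,test
                     ,@body
                     (loop))))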

              Kernel eliminates the need to think “this a macro or is this procedure”

              Yes, that’s the goal. But it doesn’t happen for apply. I kept running into situations where I had to think about whether the variable was a macro. Often, within the body of a higher-order function/macro, I just didn’t know. So the apply restriction spread through my codebase until I figured this out.

              I spent some time trying to find a clean example where I use @ on macros in Wart. Unfortunately this capability is baked into Wart so deeply (and Wart is so slow, suffering from the combinatorial explosion of every fexpr-based Lisp) that it’s hard to explain. But Wart provides the capability to cleanly extend even fundamental operations like if and def and mac, and all these use the higher-order functions on macros deep inside their implementations.

              For example, here’s a definition where I override the pre-existing with macro to add new behavior when it’s called with (with table ...): https://github.com/akkartik/wart/blob/main/054table.wart#L54

              The backtick syntax it uses there is defined in https://github.com/akkartik/wart/blob/main/047generic.wart, which defines these advanced forms for defining functions and macros:

              def (_function_ ... _args_) :case _predicate_
                _body_
              
              mac (_function_ ... _args_) :case _predicate_
                _body_
              
              mac (_function_ `_literal_symbol_ ... _args_) :case _predicate_
                _body_
              

              That file overrides this basic definition of mac: https://github.com/akkartik/wart/blob/main/040.wart#L30

              Which is defined in terms of mac!: https://github.com/akkartik/wart/blob/main/040.wart#L1

              When I remove apply for macros, this definition no longer runs, for reasons I can’t easily describe.

              As a simpler example that doesn’t use apply for macros, here’s where I extend the primitive two-branch if to support multiple branches: https://github.com/akkartik/wart/blob/main/045check.wart#L1

              Based on your comment description, Wart is re-inventing Kernel or something like that (without formal description unlike John Shutt).

              I would like to think I reimplemented the core idea of Kernel ($vau) while decoupling it from considerations of hygiene. And fixed apply in the process. Because my solution to apply can’t work in hygienic Kernel.

              I don’t make any claim of novelty here. I was very much inspired by the Kernel dissertation. But I found the rest of its language spec… warty :D

            2. 1

              Promoting solely unhygienic macros is similar, as far as I understand, to promoting “formal proofs of code are useless” or something similar about ACID or any kind of guarantees a software might provide.

              Both Scheme and Kernel offer the ability to bypass the default hygienic behavior, and hence promote the first path (least surprise, fewer hard-to-find bugs) while allowing the second (aka probably shooting yourself in the foot at some point).

              1. 1

                At least for me, the value of Lisp is in its late-bound nature during the prototyping phase, so usability is the top priority. Compromising usability with more complicated macro syntax (resulting in far fewer people defining macros, as happens in the Scheme world) in exchange for better properties in mature programs seems a poor trade-off. And yes, I don’t use formal methods while prototyping either.

                1. 1

                  syntax-rules is not much more complicated to use than define-macro, ref: https://www.gnu.org/software/guile/manual/html_node/Syntax-Rules.html
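
                  For example, the classic swap! as a minimal syntax-rules sketch: hygiene means the introduced tmp can never capture a caller’s variable, with no gensym required.

                  (define-syntax swap!
                    (syntax-rules ()
                      ((_ a b)
                       (let ((tmp a))
                         (set! a b)
                         (set! b tmp)))))

                  ;; even (swap! tmp x) works; the macro's tmp is renamed away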

                  The only drawback of hygienic macros that I know about is that they are more difficult to implement than define-macro, but then again I don’t know everything about macros.

                  ref: https://gitlab.com/nieper/unsyntax/

                  1. 1

                    We’ll have to agree to disagree about syntax-rules. Just elsewhere on this thread there’s someone describing their various unsuccessful attempts to use macros in Scheme. I have had the same experience. It’s not just the syntax of syntax-rules: Scheme is pervasively designed (like Kernel) with hygiene in mind. It makes for a very rigid language, with things like the phase-separation rules, that is the antithesis of the sort of “sketching” I like to use Lisp for.

            1. 4

              This is probably really out of date now, but it is an implementation of JavaScript in Racket (https://docs.racket-lang.org/javascript/index.html) written by Dave Herman.

              1. 2

                Thanks! Added!

                1. 2

                  In a similar vein, check out JSCert, JS-2-GIL, and KJS. I believe Gillian is the only actively developed semantics….

                  1. 2

                    Amazing! I was getting so few replies with research implementations. Thank you!

              1. 3

                I’m genuinely interested in whether that GUI can be used with a framebuffer “backend” on Linux for embedded devices, using only the DRM.

                1. 2

                  Currently it isn’t possible. It would require implementing the base widgets (rendering and input events). Part of an implementation could be simplified by using the existing racket/draw library, which sits on top of cairo.

                1. 42

                  Eh, there are some problems with xargs, but this isn’t a good critique. First off, it proposes a “solution” that doesn’t even handle spaces in filenames (much less, say, newlines):

                  rm $(ls | grep foo)
                  

                  I prefer this as a practical solution (that handles every char except newlines in filenames):

                  ls | grep foo | xargs -d $'\n' -- rm
                  

                  You can also pipe find . -print0 to xargs -0 if you want to handle newlines (untrusted data).
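
                  For example (a sketch of the pattern; the name filter is made up):

                  # NUL-delimited pipeline: handles any filename, even with newlines
                  find . -name '*foo*' -print0 | xargs -0 -- rm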

                  (Although then you have the problem that there’s no grep -0, which is why Oil has QSN. grep still works on QSN, and QSN can represent every string, even those with NULs!)


                  One nice thing about xargs is that you can preview the commands by adding ‘echo’ on the front:

                  ls | grep foo | xargs -d $'\n' -- echo rm
                  

                  That will help get the tokenization right, so you don’t feed the wrong thing into the commands!

                  I never use xargs -L, and I sometimes use xargs -I {} for simple invocations. But even better than that is using xargs with the $0 Dispatch pattern, which I still need to write about properly.

                  Basically instead of the mini language of -I {}, just use shell by recursively invoking shell functions. I use this all the time, e.g. all over Oil and elsewhere.

                  do_one() {
                     # It's more flexible to use a function with $1 instead of -I {}
                     echo "Do something with $1"  
                     echo mv "$1" /tmp   # quote $1: lines may contain spaces
                  }
                  
                  do_all() {
                    # call the do_one function for each item.  Also add -P to make it parallel
                    cat tasks.txt | grep foo | xargs -n 1 -d $'\n' -- $0 do_one
                  }
                  
                  "$@"  # dispatch on $0; or use 'runproc' in Oil
                  

                  Now run with

                  • myscript.sh do_all, or
                  • myscript.sh do_one to test out the “work” function (very handy! you need to make this work first)

                  This separates the problem nicely – make it work on one thing, and then figure out which things to run it on. When you combine them, they WILL work, unlike the “sed into bash” solution.


                  Having read up on what xargs -L does, I avoid it because it’s a custom mini-language. It says that trailing blanks cause line continuations. That sort of rule seems silly to me.

                  I also avoid -I {} because it’s a custom mini-language.

                  IMO it’s better to just use the shell, and one of these three invocations:

                  • xargs – when you know your input is “words” like myhost otherhost
                  • xargs -d $'\n' – when you want lines
                  • xargs -0 – when you want to handle untrusted data (e.g. someone putting a newline in a filename)

                  Those 3 can be combined with -n 1 or -n 42, and they will do the desired grouping. I’ve never needed anything more than that.
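
                  For instance (a sketch):

                  # one rm per batch of 42 input lines; -n controls the grouping
                  ls | grep foo | xargs -d $'\n' -n 42 -- rm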

                  So yes xargs is weird, but I don’t agree with the author’s suggestions. sed piped into bash means that you’re manipulating bash code with sed, which is almost impossible to do correctly.

                  Instead I suggest combining xargs and shell, because xargs works with arguments and not strings. You can make that correct and reason about what it doesn’t handle (newlines, etc.)

                  (OK I guess this is a start of a blog post, I also gave a 5 minute presentation 3 years ago about this: http://www.oilshell.org/share/05-24-pres.html)

                  1. 10

                    pipe find . -print0 to xargs -0

                    I use find . -exec very often for running a command on lots of files. Why would you choose to pipe into xargs instead?

                    1. 12

                      It can be much faster (depending on the use case). If you’re trying to rm 100,000 files, you can start one process instead of 100,000 processes! (the max number of args to a process on Linux is something like 131K as far as I remember).

                      It’s basically

                      rm one two three
                      

                      vs.

                      rm one
                      rm two
                      rm three
                      

                      Here’s a comparison showing that find -exec is slower:

                      https://www.reddit.com/r/ProgrammingLanguages/comments/frhplj/some_syntax_ideas_for_a_shell_please_provide/fm07izj/

                      Another reference: https://old.reddit.com/r/commandline/comments/45xxv1/why_find_stat_is_much_slower_than_ls/

                      Good question, I will add this to the hypothetical blog post! :)

                      1. 15

                        @andyc Wouldn’t the find + (rather than ;) option solve this problem too?

                        1. 5

                          Oh yes, it does! I don’t tend to use it, since I use xargs for a bunch of other stuff too, but that will also work. Looks like busybox supports it too, in addition to GNU (I would guess it’s in POSIX).

                        2. 11

                          the max number of args to a process on Linux is something like 131K as far as I remember

                          Time for the other really, really useful feature of xargs. ;)

                          $ echo | xargs --show-limits
                          Your environment variables take up 2222 bytes
                          POSIX upper limit on argument length (this system): 2092882
                          POSIX smallest allowable upper limit on argument length (all systems): 4096
                          Maximum length of command we could actually use: 2090660
                          Size of command buffer we are actually using: 131072
                          Maximum parallelism (--max-procs must be no greater): 2147483647
                          

                          It’s not a limit on the number of arguments, it’s a limit on the total size of environment variables + command-line arguments (+ some other data, see getauxval(3) on a Linux machine for details). Apparently Linux defaults to a quarter of the available stack allocated for new processes, but it also has a hard limit of 128KiB on the size of each individual argument (MAX_ARG_STRLEN). There’s also MAX_ARG_STRINGS which limits the number of arguments, but it’s set to 2³¹-1, so you’ll hit the ~2MiB limit first.
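
                          (If you just want the headline number, getconf reports it too; the value shown here is the one implied by the --show-limits output above:)

                          $ getconf ARG_MAX   # combined argv + environ byte limit
                          2097152             # = 2092882 + 2222 + 2048 (xargs's safety margin)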

                          Needless to say, a lot of these numbers are much smaller on other POSIX systems, like BSDs or macOS.

                        3. 1

                          find . -exec blah \; will fork a process for each file, while find . | xargs blah will fork a process per X files (where X is bounded by the system-wide argument length limit). The latter can run quite a bit faster. I will typically do find . -name '*.h' | xargs grep SOME_OBSCURE_DEFINE and, depending upon the repo, that might expand to only one grep.

                          1. 5

                            As @jonahx mentions, there is an option for that in find too:

                                 -exec utility [argument ...] {} +
                                         Same as -exec, except that ``{}'' is replaced with as many pathnames as possible for each invocation of utility.  This
                                         behaviour is similar to that of xargs(1).
                            
                              1. 4

                                That is the real beauty of xargs. I didn’t know about using + with find, and while that’s quite useful, remembering it means remembering something that only works with find. In contrast, xargs works with anything that can supply a newline-delimited list of filenames as input.

                                1. 3

                                  Yes, this. Even though the original post complains about too many features in xargs, find is truly the worst with a million options.

                        4. 7

                          This comment was a great article in itself.

                          Conceptually, I think of xargs primarily as a wrapper that enables tools that don’t support stdin to support stdin. Is this a good way to think about it?

                          1. 9

                            Yes I’d think of it as an “adapter” between text streams (stdin) and argv arrays. Both of those are essential parts of shell and you need ways to move back and forth. To move the other way you can simply use echo (or write -- @ARGV in Oil).
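
                            (A tiny sketch of the round trip:)

                            # lines -> argv via xargs; argv -> lines via printf
                            printf '%s\n' one two three | xargs -- echo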

                            Another way I think of it is to replace xargs with the word “each” mentally, as in Ruby, Rust, and some common JS idioms.

                            You’re basically separating iteration from the logic of what to do on each thing. It’s a special case of a loop.

                            In a loop, the current iteration can depend on the previous iteration, and sometimes you need that. But in xargs, every iteration is independent, which is good because you can add xargs -P to automatically parallelize it! You can’t do that with a regular loop.
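
                            A sketch of that (filenames assumed):

                            # compress each log independently, 4 jobs at a time
                            ls *.log | xargs -d $'\n' -n 1 -P 4 -- gzip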


                            I would like Oil to grow an each builtin that is a cleaned up xargs, following the guidelines I enumerated.

                            I’ve been wondering if it should be named each and every?

                            • each – like xargs -n 1, and find -exec foo \; – call a process on each argument
                            • every – like xargs, and find -exec foo + – call the minimal number of processes, but exhaust all arguments

                            So something like

                            proc myproc { echo $1 }   # passed one arg
                            find . | each -- myproc  # call a proc/shell function on each file, newlines are the default
                            
                            proc otherproc { echo @ARGV }  # passed many args
                            find . | every -- otherproc  # call the minimal number of processes
                            

                            If anyone has feedback I’m interested. Or wants to implement it :)


                            Probably should add this to the blog post: Why use xargs instead of a loop?

                            1. It’s easier to preview what you’re doing by sticking echo on the beginning of the command. You’re decomposing the logic of which things to iterate on, and what work to do.
                            2. When the work is independent, you can parallelize with xargs -P
                            3. You can filter the work with grep. Instead of find | xargs, do find | grep | xargs. This composes very nicely (see the sketch below)
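
                            A sketch combining all three (the .bak pattern is made up):

                            find . -type f | grep '\.bak$' | xargs -d $'\n' -- echo rm   # 1. preview
                            find . -type f | grep '\.bak$' | xargs -d $'\n' -P 4 -- rm   # 2. parallel, 3. filtered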
                        1. 2

                          Cool. A bit like the old MH mail client system.