Why oh why did they waste the opportunity to name the language Ni?
…srsly, it looks pretty cool. Shrubbery notation addresses the syntactic ugliness that’s kept me away from Lisps. A lot of the macro stuff in the paper goes over my head — I can barely figure out how to use “…” in C++ templates — but it looks very powerful. Definitely going to try out Ni Rhombus when I get a chance.
[update: turns out I made the same joke a year ago in the thread @5d22b linked to. What can I say, I watched too much Monty Python in my impressionable youth.]
Rhombus is still the interim name. IIRC, picking the “official” name is one of the last steps before the language is “done.”
I LOVE IT!
I wonder if their work is also related to WISP (whitespace lisp) https://www.draketo.de/software/wisp and https://srfi.schemers.org/srfi-110/srfi-110.html and https://readable.sourceforge.io/
IIRC all of those read to a general S-expression form. Shrubbery, the surface syntax that Rhombus uses, reads to a more constrained form of S-expressions.
Some rationale and comparison of the design choices made for Shrubbery can be read here: https://docs.racket-lang.org/shrubbery/Design_Considerations.html
Thank you for linking the paper: I saw this on the OOPSLA site yesterday but couldn’t find a link; where was this sourced (so I know where to look in the future)?
I think that link was via some sleuthing someone did on Reddit. Here is a version that is close to what will be published; the only changes should be spelling and grammar:
https://users.cs.utah.edu/plt/publications/oopsla23-faadffggkkmppst.pdf
Cool stuff! I take it the WASM code uses tail-call optimization?
From earlier posts I’ve seen about Lisp on WASM, it sounds like GC will be a significant hurdle. Hoot will have to implement that itself, right?
Hoot depends on both the tail call and GC Wasm extensions. On the GC side, Hoot will emit extra instructions to describe its types according to the Wasm GC spec and then the host VM can do the collecting.
I’m not sure where to put it, but the entire subthread is missing a look at the roadmap, which clearly shows that most runtimes do not have TC or GC available yet. Only Chrome can do it, and only with a special flag.
Yes, that’s the current status today. Firefox is in the process of actively implementing both. The proposals themselves appear to be progressing well through the standards process. Two web VMs are required to be implementing a proposal to progress through the later stages. The features should be default enabled once standardised, assuming no obstacles appear. People involved in the proposals process have estimated they should be available by the end of the year.
I believe other non-web engines are also working on these proposals, though I forget which at the moment.
Hmm, that’s unfortunate.
Unfortunate in what way…?
Scheme makes use of tail calls and GC, and those extensions are on track to be generally available in common Wasm engines this year, so it seems like a reasonable design choice to target them now.
Ideally those things would be handled internally, though, not rely on extra features being added to every WASM engine.
IIRC rolling your own GC in WASM is quite awkward/difficult, especially if you consider object references between containers. There was a post about it a month or so ago (I don’t remember the details; maybe it was the previous Spritely post?)
Also, the major WASM runtimes already contain world-class GCs, and run a GC’d language, so exposing those GCs to WASM seems a good idea for performance and interop.
(But I do get your point about piling on features that other WASM runtimes now need to add! Fortunately GC isn’t hard to implement, if you don’t care about world-class performance. I’ve done it twice this year for my smol_world project.)
Ah hmm… For these particular abilities though, it’s at the very least quite hard or potentially impossible to get the same result without some kind of engine support.
For the case of tail calls, it’s much more natural to express recursive programs (esp. ones in Scheme written to expect tail calls) in this style. Perhaps custom stack management at run-time or other hefty program transformations can be used as a workaround, but the concept of tail calls is fairly straightforward and the host engine complexity appears to be small.
For the case of GC, the proposal is far more complex, so I can understand hesitation in terms of complexity… At the same time, allowing Wasm programs to leverage the existing engine GC does make implementation drastically simpler (for languages expecting GC). Importantly, it also makes it possible to describe cycles between host-engine data and Wasm program data, which just wasn’t possible before.
I suppose a generalised version of your concern might be that you don’t want every language feature to become a Wasm extension, and I agree with that general sentiment. In the case of tail calls and GC, though, they feel (to me at least) sufficiently useful to a variety of languages, and they let the Wasm engine enable use cases that may be impossible (or very hard) otherwise.
Tail call elimination simplifies things a lot, and there’s very little reason not to have it… how to do it has been known for a long time. GCC even supports it for C in many cases, IIRC.
At any rate, I won’t go into the GC proposal stuff in depth, but here are some good motivators to see that work advance:
It means many more languages being able to become first class citizens in browser-space
It adds certain reference-integrity safety abilities to WASM languages, important and useful for ocap reasons
Usually some kind of efficient GC is already available. In the browser especially. Why not expose it?
It means being able to have a shared heap. This is really useful for garbage collection across programs: JavaScript and other languages can instantiate and share references without needing to duplicate data or deal with very difficult cycle detection and elimination problems.
Oh, so WASM GC is available already? I had the impression it was a ways out.
Wasm GC is experimentally available via flags or being implemented in at least Chrome and Firefox, perhaps some non-browser implementations as well.
It’s believed to be on track to be generally available in stable browsers and engines sometime this year, IIRC.
I’m not super up-to-date, but as I understand it, GC will be in consumer browsers by Q4 2023 (according to Andy Wingo.) And is available in development builds currently, so language implementers may want to start targeting it now.
Janet is definitely the most modern looking Lisp I’ve seen.
What about Racket?
“When someone calls a language modern, it tells you next to nothing about the language, but it tells you a fair bit about the person who said it.”
That said, Racket has a few clunky features due to its age. The class system feels very dated, and the fact that most short list operations only work on lists and not general sequence types isn’t great. The latter is somewhat addressed by the “for” family of macros but IIRC it’s something the maintainers would have done differently if they had a do over.
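(For example, the for forms work over general sequences, not just lists; a quick sketch of standard Racket behavior:

(for/list ([c "abc"]) c)          ; strings are sequences => '(#\a #\b #\c)
(for/list ([x (in-range 3)]) x)   ; => '(0 1 2)

whereas the plain list functions would reject the string.)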
“When someone calls a language modern, it tells you next to nothing about the language, but it tells you a fair bit about the person who said it.”
YES! I think the term “modern” is a thought-terminating cliché. What does it really mean? If you had a “modern” language, wrote a book about it describing it as “modern”, and 20 years passed, what would the term “modern” mean to readers?
It just shuts down conversations because no one wants to argue against it.
Agreed. I’ve dug through too many used bookstores and old libraries full of books with titles like “Modern Pascal Programming For MS-DOS 4.0” to want to use it as a term.
I’m tempted to create a terrible programming language and name it “Modern” just to try to get people to stop saying this.
You could take an amalgamation of bad features from the last 30 years of “modern” languages. It would probably be a great language!
I’m curious how you would change the Racket class system? Besides the Beta features, it’s not too different from Java or Smalltalk.
Omit it entirely. Classes were a mistake.
Most of my time is now spent using Racket in places where I could use a shell script. It’s easier to write a Racket program that invokes other programs and work with their error codes and re-direct their output to the right places. Truly a joy for me, personally, as I do like writing Lisp.
Could you provide a few idiomatic examples of replacements for typical shellscript pipelines featuring grep, seq, sort, etc.?
For the most part, the Racket standard library can do those types of jobs without sub-processes.
For grep we have regexp objects, used with either regexp-match or regexp-match? to match across strings or to filter.
seq can be mimicked with the range function, combined with iteration forms like for.
sort is done with the appropriately named Racket function sort, changing the comparison function and input list as needed.
If you do want to invoke programs as sub-processes, the output of a subprocess call can only be sent to a file stream like stdout or a plain file. Invoking multiple sub-processes one after another and continuously passing their outputs to one another involves a little bit of trickery which might be a bit complex to talk about in a comment, but it is do-able (a sketch follows the examples below). The gist is to try to write tasks using the Racket standard library, then use subprocess when you need something not covered by it.
; display all files in pwd
(for-each displayln (map path->string (directory-list)))
; display all files sorted
(for-each displayln
(sort (map path->string (directory-list)) string<?))
; regexp match over a list of sorted files
(for-each displayln
(filter (λ (fname) (regexp-match? #rx".*png" fname))
(sort (map path->string (directory-list)) string<?)))
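; A minimal sketch of the sub-process chaining mentioned above (roughly
; `ls | grep rkt`). Assumes a terminal, where current-output-port is a
; file-stream port, and that ls and grep are on PATH. Passing #f tells
; subprocess to create a pipe end for us.
(define-values (ls-proc ls-stdout ls-stdin ls-stderr)
  (subprocess #f #f 'stdout (find-executable-path "ls")))
(close-output-port ls-stdin)  ; ls reads no input
; grep's stdin is the pipe carrying ls's output
(define-values (grep-proc grep-stdout grep-stdin grep-stderr)
  (subprocess (current-output-port) ls-stdout 'stdout
              (find-executable-path "grep") "rkt"))
(subprocess-wait ls-proc)
(subprocess-wait grep-proc)
(close-input-port ls-stdout)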
As posted in a sibling message, it’s much easier to use built-in functions than to shell out and call another program. Personally, I find Racket more convenient for writing scripts that need to work in parallel. For example, a script that gets the load average from several machines in parallel over ssh: https://gist.github.com/6c7ab225610bc50a3bb4be35f8e46f18
Would also love to see examples.
Best way I can quickly sum it up is clever use of the function subprocess in Racket.
(define (start-and-run bin . args)
  ;; inherit our stdout/stdin; send the child's stderr to stdout
  (define-values (s i o e)
    (apply subprocess
           `(,(current-output-port) ,(current-input-port) stdout
             ,(find-executable-path bin)  ; resolve the program named by bin
             ,@args)))
  (subprocess-wait s))

(start-and-run "seq" "1" "10")
This sends the seq command’s output to stdout, and it allows for arbitrary commands, so you can run zero-arg sub-processes or pass however many arguments you need/like. current-output-port and current-input-port are parameters that you can adjust with a parameterize block to control the input/output from the outside.
The output port must be a file-stream port; it cannot be an output string like with call-with-output-string. So output either goes straight to stdout, or you can use call-with-output-file to control the current-output-port parameter and store the output wherever you please.
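For instance, a sketch that stores the output of the start-and-run helper above in a (hypothetical) numbers.txt, using exactly that parameterization:

; the port handed to us by call-with-output-file is a file-stream port,
; so subprocess accepts it as the child's stdout
(call-with-output-file "numbers.txt"
  (λ (out)
    (parameterize ([current-output-port out])
      (start-and-run "seq" "1" "10")))
  #:exists 'replace)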
Adding or setting key=value in a file idempotently: there is a utility for that, setconf.
There is also augeas.
I had trouble following all this (you’ve read the Common Lisp spec way more closely than I ever bothered to), but you might be interested in John Shutt’s Kernel language. To avoid unhygienic macros, Kernel basically outlaws quasiquote and unquote and constructs all macros out of list, cons and so on. Which has the same effect as unquoting everything. A hyperstatic system where symbols in macros always expand to their binding at definition time, never to be overridden. Implying among other things that you can never use functions before defining them.
There’s a lot I love about Kernel (it provides a uniform theory integrating functions and macros and intermediate beasts) but the obsession with hygiene is not one of them. I took a lot of inspiration from Kernel in my Lisp with first-class macros, but I went all the way in the other direction and supported only macros with quasiquote and unquote. You can define symbols in any order in Wart, and override any symbols at any time, including things like if and cons. The only things you can’t override are things that look like punctuation. Parens, quote, quasiquote, unquote, unquote-splice, and a special symbol @ for apply analogous to unquote-splice. Wart is even smart enough to support apply on macros, something Kernel couldn’t do – as long as your macros are defined out of quasiquote and unquote. I find this to be a sort of indirect sign that it gets closer to the essence of macros by decoupling them into their component pieces like Kernel did, but without complecting them with concerns of hygiene.
(Bel also doesn’t care about hygienic macros and claims to support fully first-class apply on macros. Though I don’t understand how Bel’s macroexpand works in spite of some effort in that direction.)
It’s easy to write unhygienic macros without quasiquote. Does Kernel also outlaw constructing symbols?
No, looks like page 165 of the Kernel spec does provide string->symbol.
Doesn’t that seem like a big loophole that would make it easy to be unhygienic?
Depends on what you’re protecting against. Macros are fundamentally a convenience. As I understand the dialectic around hygienic macros, the goal is always just to add guardrails to the convenient path, not to make the guardrails mandatory. Most such systems deliberately provide escape hatches for things like anaphoric macros. So I don’t think I’ve ever heard someone say hygiene needs to be an ironclad guarantee.
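(The classic example is anaphoric if, which deliberately captures the name it; in Common Lisp’s defmacro style it’s a one-liner, sketched here:

;; deliberately unhygienic: `it' is visible in the branches
(defmacro aif (test then &optional else)
  `(let ((it ,test))
     (if it ,then ,else)))

;; (aif (assoc 'b '((a . 1) (b . 2))) (cdr it) 'none)  => 2

Hygienic systems have to provide some escape hatch to allow this kind of deliberate capture.)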
Honestly I agree with the inclusion of escape hatches if they are unlikely to be hit accidentally; I’m just surprised that the Kernel developers also agree, since they took such a severe move as to disallow quasiquote altogether.
So I don’t think I’ve ever heard someone say hygiene needs to be an ironclad guarantee.
I don’t want to put words in peoples’ mouths, but I’m pretty sure this is the stance of most Racket devs.
Not true, because Scheme’s syntax-rules explicitly provides an escape hatch for literals, which can be used to violate hygiene in a deliberate manner. Racket implements syntax-rules.
On the other hand, you’re absolutely right that they don’t make it easy. I have no idea what to make of anaphoric macros like this one from the anaphoric package.
Racket doesn’t forbid string->symbol either, it just provides it with some type-safe scaffolding called syntax objects. We can definitely agree that makes it more difficult to use. But the ‘loophole’ does continue to exist.
I’m not aware of any macro in Common Lisp that cannot be implemented in Racket (modulo differences in the runtimes like Lisp-1 vs Lisp-2, property lists, etc.) It just gets arbitrarily gnarly.
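As a sketch of what that loophole looks like in practice, here is the same anaphoric if in Racket, where datum->syntax mints an identifier carrying the caller’s context (standard technique, names illustrative):

#lang racket
;; deliberately breaking hygiene: `it` takes the use-site's context
(define-syntax (aif stx)
  (syntax-case stx ()
    [(_ test then else)
     (with-syntax ([it (datum->syntax stx 'it)])
       #'(let ([it test]) (if it then else)))]))

(aif (assoc 'b '((a 1) (b 2))) (cadr it) 'none)  ; => 2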
Thanks for the clarification. I have attempted several times to understand Racket macros but never really succeeded because it’s just so much more complicated compared to the systems I’m familiar with.
Yeah, I’m totally with you. They make it so hard that macros are used a lot less in the Scheme world. If you’re looking to understand macros, I’d recommend a Lisp that’s not a Scheme. I cut my teeth on them using Arc Lisp, which was a great experience even though Arc is a pretty thin veneer over Racket.
Have you read Fear of Macros? Also there is Macros and Languages in Racket, which takes a more exercise-based approach.
At least twice.
Nowadays when I need a Racket macro I just show up in #racket and say “boy, this sure is easy to write using defmacro, too bad hygenic macros are so confusing” and someone will be like “they’re not confusing! all you have to do is $BLACK_MAGIC” and then boom; I have the macro I need.
Kernel does not avoid unhygienic macros, whereas Scheme R6RS syntax-case makes unhygienic macros more difficult to write but still possible. It is possible to write unhygienic code with Kernel, such as defining define-macro, without using or needing quasiquote et al.
Kernel basically outlaws quasiquote and unquote
Kernel does not outlaw the quasiquote and unquote semantics. There is $quote, and unquote is merely (eval symbol env), whereas quasiquote is just a reader trick inside Scheme (also see [0]).
and constructs all macros out of list, cons and so on.
Yes and no.
Scheme macros, and even CL macros, are meant as a) a hook into the compiler to speed things up, e.g. compose or Clojure’s =>, or b) a way to change the prefix-based evaluation strategy to build so-called Domain Specific Languages, such as records, e.g. SRFI-9.
Kernel eliminates the need to think “is this a macro or is this a procedure”: instead everything is an operative, and it is up to the interpreter or compiler to figure out what can be compiled (ahead-of-time) or not. That is slightly more general than “everything is a macro”, at least because an operative has access to the dynamic scope.
Based on your comment description, Wart is re-inventing Kernel or something like that (though without a formal description, unlike John Shutt’s).
re apply for macros: read page 67 at https://ftp.cs.wpi.edu/pub/techreports/pdf/05-07.pdf
[0] https://github.com/cisco/ChezScheme/blob/main/s/syntax.ss#L7644
Page 67 of the Kernel Report says macros don’t need apply because they don’t evaluate their arguments. I think that’s wrong, because macros can evaluate their arguments when unquoted. Indeed, most macro args are evaluated eventually, using unquote, in the caller’s environment. Most of the value of macros lies in selectively turning off eval for just the odd arg. And macros are most of the use of fexprs, as far as I’ve been able to glean.
Kernel eliminates the need to think “this a macro or is this procedure”
Yes, that’s the goal. But it doesn’t happen for apply. I kept running into situations where I had to think about whether the variable was a macro. Often, within the body of a higher-order function/macro, I just didn’t know. So the apply restriction spread through my codebase until I figured this out.
I spent some time trying to find a clean example where I use @ on macros in Wart. Unfortunately this capability is baked into Wart so deeply (and Wart is so slow, suffering from the combinatorial explosion of every fexpr-based Lisp) that it’s hard to explain. But Wart provides the capability to cleanly extend even fundamental operations like if and def and mac, and all these use the higher-order functions on macros deep inside their implementations.
For example, here’s a definition where I override the pre-existing with macro to add new behavior when it’s called with (with table ...): https://github.com/akkartik/wart/blob/main/054table.wart#L54
The backtick syntax it uses there is defined in https://github.com/akkartik/wart/blob/main/047generic.wart, which defines the advanced forms for defining functions and macros.
That file overrides this basic definition of mac: https://github.com/akkartik/wart/blob/main/040.wart#L30
Which is defined in terms of mac!: https://github.com/akkartik/wart/blob/main/040.wart#L1
When I remove apply for macros, this definition no longer runs, for reasons I can’t easily describe.
As a simpler example that doesn’t use apply for macros, here’s where I extend the primitive two-branch if to support multiple branches: https://github.com/akkartik/wart/blob/main/045check.wart#L1
Based on your comment description, Wart is re-inventing Kernel or something like that (without formal description unlike John Shutt).
I would like to think I reimplemented the core idea of Kernel ($vau) while decoupling it from considerations of hygiene. And fixed apply in the process. Because my solution to apply can’t work in hygienic Kernel.
I’m not making any claim of novelty here. I was very much inspired by the Kernel dissertation. But I found the rest of its language spec… warty :D
Promoting solely unhygienic macros is similar, as far as I understand, to promoting “formal proofs of code are useless”, or something similar about ACID or any other kind of guarantee a piece of software might provide.
Both Scheme and Kernel offer the ability to bypass the default hygienic behavior, and hence promote, first, a path of least surprise (and fewer hard-to-find bugs), while still allowing the second path (where you will probably shoot yourself in the foot at some point).
At least for me, the value of Lisp is in its late-bound nature during the prototyping phase, so usability is the top priority. Compromising usability with more complicated macro syntax (resulting in far fewer people defining macros, as happens in the Scheme world) for better properties in mature programs seems a poor trade-off. And yes, I don’t use formal methods while prototyping either.
Syntax rules are not much more complicated to use than define-macro, ref: https://www.gnu.org/software/guile/manual/html_node/Syntax-Rules.html
The only drawback of hygienic macros that I know about is that they are more difficult to implement than define-macro, but then again I do not know everything about macros.
ref: https://gitlab.com/nieper/unsyntax/
We’ll have to agree to disagree about syntax-rules. Just elsewhere on this thread there’s someone describing their various unsuccessful attempts to use macros in Scheme. I have had the same experience. It’s not just the syntax of syntax-rules: Scheme is pervasively designed (like Kernel) with hygiene in mind. It makes for a very rigid language, with things like the phase-separation rules, which is the antithesis of the sort of “sketching” I like to use Lisp for.
This is probably really out of date now, but there is an implementation of JavaScript in Racket (https://docs.racket-lang.org/javascript/index.html) written by Dave Herman.
Thanks! Added!
In a similar vein, check out JSCert, JS-2-GIL, and KJS. I believe Gillian is the only actively developed semantics…
Amazing! I was getting so few replies with research implementations. Thank you!
I’m genuinely interested in whether that GUI can be used with a framebuffer “backend” on Linux for embedded devices, using the DRM only.
Currently it isn’t possible. It would require implementing the base widgets (rendering and input events.) Part of an implementation could be simplified by using the existing racket/draw library which sits on top of cairo.
Eh, there are some problems with xargs, but this isn’t a good critique. First off it proposes a “solution” that doesn’t even handle spaces in filenames (much less, say, newlines):
rm $(ls | grep foo)
I prefer this as a practical solution (that handles every char except newlines in filenames):
ls | grep foo | xargs -d $'\n' -- rm
You can also pipe find . -print0 to xargs -0 if you want to handle newlines (untrusted data).
(Although then you have the problem that there’s no grep -0, which is why Oil has QSN. grep still works on QSN, and QSN can represent every string, even those with NULs!)
One nice thing about xargs is that you can preview the commands by adding ‘echo’ on the front:
ls | grep foo | xargs -d $'\n' -- echo rm
That will help get the tokenization right, so you don’t feed the wrong thing into the commands!
I never use xargs -L, and I sometimes use xargs -I {} for simple invocations. But even better than that is using xargs with the $0 Dispatch pattern, which I still need to properly write about.
Basically instead of the mini language of -I {}, just use shell by recursively invoking shell functions. I use this all the time, e.g. all over Oil and elsewhere.
do_one() {
# It's more flexible to use a function with $1 instead of -I {}
echo "Do something with $1"
echo mv "$1" /tmp  # quoted so spaces in $1 survive
}
do_all() {
# call the do_one function for each item. Also add -P to make it parallel
cat tasks.txt | grep foo | xargs -n 1 -d $'\n' -- "$0" do_one
}
"$@" # dispatch on $0; or use 'runproc' in Oil
Now run with
myscript.sh do_all, or
myscript.sh do_one to test out the “work” function (very handy! you need to make this work first)
This separates the problem nicely – make it work on one thing, and then figure out which things to run it on. When you combine them, they WILL work, unlike the “sed into bash” solution.
Reading up on what xargs -L does, I have avoided it because it’s a custom mini-language. It says that trailing blanks cause line continuations. Those sorts of rules are silly to me.
I also avoid -I {} because it’s a custom mini-language.
IMO it’s better to just use the shell, and one of these three invocations:
xargs – when you know your input is “words” like myhost otherhost
xargs -d $'\n' – when you want lines
xargs -0 – when you want to handle untrusted data (e.g. someone putting a newline in a filename)
Those 3 can be combined with -n 1 or -n 42, and they will do the desired grouping. I’ve never needed anything more than that.
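A quick way to see the grouping (whitespace-separated input, batches of 3):

$ seq 9 | xargs -n 3 -- echo
1 2 3
4 5 6
7 8 9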
So yes xargs is weird, but I don’t agree with the author’s suggestions. sed piped into bash means that you’re manipulating bash code with sed, which is almost impossible to do correctly.
Instead I suggest combining xargs and shell, because xargs works with arguments and not strings. You can make that correct and reason about what it doesn’t handle (newlines, etc.)
(OK, I guess this is the start of a blog post. I also gave a 5 minute presentation 3 years ago about this: http://www.oilshell.org/share/05-24-pres.html)
I use find . -exec very often for running a command on lots of files. Why would you choose to pipe into xargs instead?
It can be much faster (depending on the use case). If you’re trying to rm 100,000 files, you can start one process instead of 100,000 processes! (the max number of args to a process on Linux is something like 131K as far as I remember).
It’s basically
rm one two three
vs.
rm one
rm two
rm three
Here’s a comparison showing that find -exec is slower:
https://www.reddit.com/r/ProgrammingLanguages/comments/frhplj/some_syntax_ideas_for_a_shell_please_provide/fm07izj/
Another reference: https://old.reddit.com/r/commandline/comments/45xxv1/why_find_stat_is_much_slower_than_ls/
Good question, I will add this to the hypothetical blog post! :)
@andyc Wouldn’t the find + (rather than ;) option solve this problem too?
Oh yes, it does! I don’t tend to use it, since I use xargs for a bunch of other stuff too, but that will also work. Looks like busybox supports it too, in addition to GNU (I would guess it’s in POSIX).
the max number of args to a process on Linux is something like 131K as far as I remember
Time for the other really, really useful feature of xargs. ;)
$ echo | xargs --show-limits
Your environment variables take up 2222 bytes
POSIX upper limit on argument length (this system): 2092882
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2090660
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647
It’s not a limit on the number of arguments, it’s a limit on the total size of environment variables + command-line arguments (+ some other data, see getauxval(3) on a Linux machine for details). Apparently Linux defaults to a quarter of the available stack allocated for new processes, but it also has a hard limit of 128KiB on the size of each individual argument (MAX_ARG_STRLEN). There’s also MAX_ARG_STRINGS which limits the number of arguments, but it’s set to 2³¹-1, so you’ll hit the ~2MiB limit first.
Needless to say, a lot of these numbers are much smaller on other POSIX systems, like BSDs or macOS.
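You can query the total budget on a given machine with getconf; the number below is from one Linux box and will vary:

$ getconf ARG_MAX
2097152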
find . -exec blah \; will fork a process for each file, while find . | xargs blah will fork a process per X files (where X is as many files as fit under the system-wide argument-length limit). The latter can run quite a bit faster. I will typically do find . -name '*.h' | xargs grep SOME_OBSCURE_DEFINE and, depending upon the repo, that might expand to only one grep.
As @jonahx mentions, there is an option for that in find too:
-exec utility [argument ...] {} +
        Same as -exec, except that ``{}'' is replaced with as many pathnames as possible for each invocation of utility. This behaviour is similar to that of xargs(1).
I didn’t know about the ‘+’ option to find, but I also use xargs with a custom script that scans for source files in a directory (not in sh or bash as I personally find shell scripting abhorrent).
That is the real beauty of xargs. I didn’t know about using + with find, and while that’s quite useful, remembering it means I need to remember something that only works with find. In contrast, xargs works with anything that can supply a newline-delimited list of filenames as input.
Yes, this. Even though the original post complains about too many features in xargs, find is truly the worst, with a million options.
This comment was a great article in itself.
Conceptually, I think of xargs primarily as a wrapper that enables tools that don’t support stdin to support stdin. Is this a good way to think about it?
Yes I’d think of it as an “adapter” between text streams (stdin) and argv arrays. Both of those are essential parts of shell and you need ways to move back and forth. To move the other way you can simply use echo (or write -- @ARGV in Oil).
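A tiny illustration of both directions:

$ printf 'a\nb\nc\n' | xargs -- echo   # stdin -> argv: echo gets three args
a b c
$ echo a b c                           # argv -> a text stream
a b c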
Another way I think of it is to replace xargs with the word “each” mentally, as in Ruby, Rust, and some common JS idioms.
You’re basically separating iteration from the logic of what to do on each thing. It’s a special case of a loop.
In a loop, the current iteration can depend on the previous iteration, and sometimes you need that. But in xargs, every iteration is independent, which is good because you can add xargs -P to automatically parallelize it! You can’t do that with a regular loop.
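For example, a sketch with a hypothetical tasks.txt and work.sh:

# each input line becomes one argument; run up to 4 jobs at once
cat tasks.txt | xargs -d $'\n' -n 1 -P 4 -- ./work.sh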
I would like Oil to grow an each builtin that is a cleaned up xargs, following the guidelines I enumerated.
I’ve been wondering if it should be two builtins, named each and every?
each – like xargs -n 1, and find -exec foo \; – call a process on each argument
every – like xargs, and find -exec foo + – call the minimal number of processes, but exhaust all arguments
So something like
proc myproc { echo $1 } # passed one arg
find . | each -- myproc # call a proc/shell function on each file, newlines are the default
proc otherproc { echo @ARGV } # passed many args
find . | every -- otherproc # call the minimal number of processes
If anyone has feedback I’m interested. Or wants to implement it :)
Probably should add this to the blog post: Why use xargs instead of a loop?
It’s easier to preview what you’re doing by sticking echo on the beginning of the command. You’re decomposing the logic of which things to iterate on, and what work to do.
When the work is independent, you can parallelize with xargs -P
You can filter the work with grep. Instead of find | xargs, do find | grep | xargs. This composes very nicely
Cool. A bit like the old MH mail client system.