1. 36

    Hello, I am here to derail the Rust discussion before it gets started. The culprit behind sudo’s vast repertoire of vulnerabilities, and more broadly its bugs in general, is almost entirely one thing: its runaway complexity.

    We have another tool which does something very similar to sudo which we can compare with: doas. The portable version clocks in at about 500 lines of code, its man pages are a combined 157 lines long, and it has had two CVEs (only one of which Rust would have prevented), or approximately one every 30 months.
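
    To make the comparison concrete: the heart of a doas-style tool is a rule table and a linear scan, followed by a setuid and exec. Here is a hypothetical sketch of just the rule evaluation (the rules and names are invented for illustration; this is not doas’s actual code):

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical sketch of a doas-style rule check. The real doas parses
     * /etc/doas.conf, but the evaluation is about this simple: scan the
     * rules, last matching rule wins. */
    struct rule { const char *user, *target, *cmd; int permit; };

    static const struct rule rules[] = {
        { "alice", "root", NULL,           1 },  /* alice may run anything as root */
        { "bob",   "root", "/sbin/reboot", 1 },  /* bob may only reboot */
    };

    static int permitted(const char *user, const char *target, const char *cmd)
    {
        int ok = 0;
        for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
            const struct rule *r = &rules[i];
            if (strcmp(r->user, user) != 0 || strcmp(r->target, target) != 0)
                continue;
            if (r->cmd != NULL && strcmp(r->cmd, cmd) != 0)
                continue;
            ok = r->permit;  /* last matching rule wins */
        }
        return ok;
    }

    int main(void)
    {
        assert(permitted("alice", "root", "/bin/sh"));
        assert(permitted("bob", "root", "/sbin/reboot"));
        assert(!permitted("bob", "root", "/bin/sh"));
        assert(!permitted("mallory", "root", "/bin/sh"));
        return 0;
    }
    ```

    The real tool adds config parsing, authentication, and environment scrubbing on top of this, but there is very little surface to get wrong.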

    sudo is about 120,000 lines of code (100x more), and it’s had 140 CVEs, or about one every 2 months since the CVE database came into being 21 years ago. Its man pages are about 10,000 lines and include the following:

    $ man sudoers | grep -C1 despair
    The sudoers file grammar will be described below in Extended Backus-Naur
    Form (EBNF).  Don't despair if you are unfamiliar with EBNF; it is fairly
    simple, and the definitions below are annotated.
    

    If you want programs to be more secure, stable, and reliable, the key metric to address is complexity. Rewriting it in Rust is not the main concern.

    1. 45

      it’s had 140 CVEs

      Did you even look at that list? Most of those are not sudo vulnerabilities but issues in sudo configurations distros ship with. The actual list is more like 39, and a number of them are “disputed” and most are low-impact. I didn’t do a full detailed analysis of the issues, but the implication that it’s had “140 security problems” is simply false.

      sudo is about 120,000 lines of code

      More like 60k if you exclude the regress (tests) and lib directories, and 15k if you exclude the plugins (although the sudoers plugin is 40k lines, which most people use). Either way, it’s at most half of the claimed 120k.

      Its man pages are about 10,000 lines and include the following:

      12k, but this also includes various technical documentation (like the plugin API); the main documentation in sudo(8) is 741 lines, and sudoers(5) is 3,255 lines. Well under half of 10,000.

      We have another tool which does something very similar to sudo which we can compare with: doas.

      Except that it only has 10% of the features, or less. This is good if you don’t use them, and bad if you do. But I already commented on this at HN so no need to repeat that here.

      1. 12

        You’re right about these numbers being a back-of-the-napkin analysis. But even your more detailed analysis shows that the situation is much graver with sudo. I am going to include plugins, because if they ship, they’re a liability. And their docs, because they felt the need to write them. You can’t just shove the complexity you don’t use and/or like under the rug. Heartbleed brought the internet to its knees because of a vulnerability in a feature no one uses.

        And yes, doas has 10% of the features by count - but it has 99% of the features by utility. If you need something in the 1%, what right do you have to shove it into my system? Go make your own tool! Your little feature which is incredibly useful to you is incredibly non-useful to everyone else, which means fewer eyes on it, and it’s a security liability to 99% of systems as such. Not every feature idea is meritorious. Scope management is important.

        1. 9

          it has 99% of the features by utility

          Citation needed.

          what right do you have to shove it into my system?

          Nobody is shoving anything into your system. The sudo maintainers have the right to decide to include features, and they’ve been exercising that right. You have the right to skip sudo and write your own - and you’ve been exercising that right too.

          Go make your own tool!

          You’re asking people to undergo the burden of forking or re-writing all of the common functionality of an existing tool just so they can add their one feature. This imposes a great cost on them. Meanwhile, including that code or feature into an existing tool imposes only a small (or much smaller) cost, if done correctly - the incremental cost of adding a new feature to an existing system.

          The key phrase here is “if done correctly”. The consensus seems to be that sudo is suffering from poor engineering practices - few or no tests, including with the patch that (ostensibly) fixes this bug. If your software engineering practices are bad, then simpler programs will have fewer bugs only because there’s less code to have bugs in. This is not a virtue. Large, complex programs can be built to be (relatively) safe by employing tests, memory checkers, good design practices, good architecture (which also reduces accidental complexity), code reviews, and technologies that help mitigate errors (whether that be a memory-safe GC-less language like Rust or a memory-safe GC’ed language like Python). Most features can (and should) be partitioned off from the rest of the design, either through compile-time flags or runtime architecture, which prevents them from incurring security or performance penalties.

          Software is meant to serve the needs of users. Users have varied use-cases. Distinct use-cases require more code to implement, and thereby incur complexity (although, depending on how good of an engineer one is, additional accidental complexity above the base essential complexity may be added). If you want to serve the majority of your users, you must incur some complexity. If you want to still serve them, then start by removing the accidental complexity. If you want to remove the essential complexity, then you are no longer serving your users.

          The sudo project is probably designed to serve the needs of the vast majority of the Linux user-base, and it succeeds at that, for the most part. doas very intentionally does not serve the needs of the vast majority of the linux user-base. Don’t condemn a project for trying to serve more users than you are.

          Not every feature idea is meritorious.

          Serving users is meritorious - or do you disagree?

          1. 6

            Heartbleed brought the internet to its knees because of a vulnerability in a feature no one uses.

            Yes, but the difference is that these are features people actually use, which wasn’t the case with Heartbleed. Like I mentioned, I think doas is great – I’ve been using it for years and never really used (or liked) sudo because I felt it was far too complex for my needs; before doas I just used su. But I can’t deny that for a lot of other people (mainly organisations, which is the biggest use-case for sudo in the first place) these features are actually useful.

            Go make your own tool! Your little feature which is incredibly useful to you is incredibly non-useful to everyone else

            A lot of these things aren’t “little” features, and many interact with other features. What if I want doas + 3 flags from sudo + LDAP + auditing? There are many combinations possible, and writing a separate tool for every one of them isn’t really realistic; all of this also requires maintenance, and reliable, consistent long-term maintainers are kind of rare.

            Scope management is important.

            Yes, I’m usually pretty explicit about which use cases I want to solve and which I don’t want to solve. But “solving all the use cases” is also a valid scope. Is this a trade-off? Sure. But everything here is.

            The real problem isn’t so much sudo; but rather that sudo is the de-facto default in almost all Linux distros (often installed by default, too). Ideally, the default should be the simplest tool which solves most of the common use cases (i.e. doas), and people with more complex use cases can install sudo if they need it. I don’t know why there aren’t more distros using doas by default (probably just inertia?)

            1. 0

              What if I want doas + 3 flags from sudo + LDAP + auditing?

              Tough shit? I want a pony, and a tuba, and barbie doll…

              But “solving all the use cases” is also a valid scope.

              My entire thesis is that it’s not a valid scope. This fallacy leads to severe and present problems like the one we’re discussing today. You’re begging the question here.

              1. 3

                Tough shit? I want a pony, and a tuba, and barbie doll…

                This is an extremely user-hostile attitude to have (and don’t try claiming that telling users with not-even-very-obscure use-cases to write their own tools isn’t user-hostile).

                I’ve noticed that some programmers are engineers that try to build tools to solve problems for users, and some are artists that build programs that are beautiful or clever, or just because they can. You appear to be one of the latter, with your goal being crafting simple, beautiful systems. This is fine. However, this is not the mindset that allows you to build either successful systems (in a marketshare sense) or ones that are useful for many people other than yourself, for previously-discussed reasons. The sudo maintainers are trying to build software for people to use. Sure, there’s more than one way to do that (integration vs composition), but there are ways to do both poorly, and claiming the moral high ground for choosing simplicity (composition) is not only poor form but also kind of bad optics when you haven’t even begun to demonstrate that it’s a better design strategy.

                My entire thesis is that it’s not a valid scope.

                A thesis which you have not adequately defended. Your statements have amounted to “This bug is due to sudo’s complexity which is driven by the target scope/number of features that it has”, while both failing to provide any substantial evidence that this is the case (e.g. showing that sudo’s bugs are due to feature-driven essential complexity alone, and not use of a memory-unsafe language, poor software engineering practices (which could lead to either accidental complexity or directly to bugs themselves), or simple chance/statistics) and not actually providing any defense for the thesis as stated. Assume that @arp242 didn’t mean “all” the usecases, but instead “the vast majority” of them - say, enough that it works for 99.9% of users. Why is this “invalid”, exactly? It’s easy for me to imagine the argument being “this is a bad idea”, but I can’t imagine why you would think that it’s logically incoherent.

                Finally, you have repeatedly conflated “complexity” and “features”. Your entire argument is, again, invalid if you can’t show that sudo’s complexity is purely (or even mostly) essential complexity, as opposed to accidental complexity coming from being careless etc.

          2. 9

            I don’t think “users (distros) make a lot of configuration mistakes” is a good defence when arguing if complexity is the issue.

            But I do agree about feature set. And I feel like arguing against complexity for safety is wrong (like ddevault was doing), because systems inevitably grow complex. We should still be able to build safe, complex systems. (Hence why I’m a proponent of language innovation and ditching C.)

            1. 11

              I don’t think “users (distros) make a lot of configuration mistakes” is a good defence when arguing if complexity is the issue.

              It’s silly stuff like (ALL : ALL) NOPASSWD: ALL. “Can run sudo without a password” seems like a common theme: some shell injection is found in the web UI and because the config is really naïve (which is definitely not the sudo default) it’s escalated to root.
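
              For illustration, here’s what that looks like against a scoped rule in sudoers(5) syntax (the user and command below are invented):

              ```
              # The naïve rule: any command, as any user, with no password.
              www-data ALL=(ALL : ALL) NOPASSWD: ALL

              # A scoped rule: one command, as root only, still passwordless.
              www-data ALL=(root) NOPASSWD: /usr/sbin/service nginx reload
              ```

              With the first rule, any shell injection in the web UI is instantly root; with the second, an attacker gets to reload nginx and nothing more.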

              Others aren’t directly related to sudo configuration as such; for example this one has a Perl script which is run with sudo that can be exploited to run arbitrary shell commands. This is also a common theme: some script is run with sudo, but the script has some vulnerability and is now escalated to root as it’s run with sudo.
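
              That escalation pattern is easy to reproduce. A hypothetical sketch (the script and paths are invented) of how attacker input survives into a shell command line that a sudo-run helper would execute as root:

              ```c
              #include <assert.h>
              #include <stdio.h>
              #include <string.h>

              /* Hypothetical sketch: a helper run via sudo builds a shell command
               * from untrusted input. The helper's bug is "only" shell injection,
               * but sudo turns it into root. */
              int main(void)
              {
                  const char *user = "alice; rm -rf /";  /* attacker-controlled */
                  char cmd[256];
                  snprintf(cmd, sizeof cmd, "tar czf /backup/%s.tar.gz /home/%s", user, user);
                  /* The real helper would now call system(cmd) as root; here we
                   * just show that the injected command reaches the shell intact. */
                  printf("%s\n", cmd);
                  assert(strstr(cmd, "; rm -rf /") != NULL);
                  return 0;
              }
              ```

              Nothing in that helper touches sudo’s code, which is why these CVEs mention sudo without being vulnerabilities in it.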

              I didn’t check all of the issues, but almost all that I checked are one of the above; I don’t really see any where the vulnerability is caused directly by the complexity of sudo or its configuration; it’s just that running anything as root is tricky: setuid returns 432 results, three times that of sudo, and I don’t think that anyone can argue that setuid is complex or that setuid implementations have been riddled with security bugs.

              Others just mention sudo in passing by the way; this one is really about an unrelated remote exec vulnerability, and just mentions “If QCMAP_CLI can be run via sudo or setuid, this also allows elevating privileges to root”. And this one isn’t even about sudo at all, but about a “sudo mode” plugin for TYPO3, presumably to allow TYPO3 users some admin capabilities without giving away the admin password. And who knows why this one is even returned in a search for “sudo” as it’s not mentioned anywhere.

              1. 3

                it’s just that running anything as root is tricky: setuid returns 432 results, three times that of sudo

                This is comparing apples to oranges. setuid affects many programs, so obviously it would have more results than a single program would. If you’re going to attack my numbers, then at least run the same logic over your own.

                1. 2

                  It is comparing apples to apples, because many of the CVEs are about other programs’ improper sudo usage, similar to improper/insecure setuid usage.

                  1. 2

                    Well, whatever we’re comparing, it’s not making much sense.

                    1. If sudo is hard to use and that leads to security problems through its misusage, that’s sudo’s fault. Or do you think that the footguns in C are not C’s fault, either? I thought you liked Rust for that very reason. For this reason the original CVE count stands.
                    2. But fine, let’s move on on the presumption that the original CVE count is not appropriate to use here, and instead reference your list of 39 Ubuntu vulnerabilities. 39 > 2, Q.E.D. At this point we are comparing programs to programs.
                    3. You now want to compare this with 432 setuid results. You are comparing programs with APIs. Apples to oranges.

                    But, if you’re trying to bring this back and compare it with my 140 CVE number, it’s still pretty damning for sudo. setuid is an essential and basic feature of Unix, which cannot be made any smaller than it already is without sacrificing its essential nature. It’s required for thousands of programs to carry out their basic premise, including both sudo and doas! sudo, on the other hand, can be made much simpler and still address its most common use-cases, as demonstrated by doas’s evident utility. It also has a much smaller exposure: one non-standard tool written in the 80’s and shunted along the timeline of Unix history ever since, compared to a standardized Unix feature introduced by DMR himself in the early 70’s. And setuid somehow has only 4x the number of footgun incidents? sudo could do a hell of a lot better, and it can do so by trimming the fat - a lot of it.

                    1. 3

                      If sudo is hard to use and that leads to security problems through its misusage, that’s sudo’s fault.

                      It’s not because it’s hard to use, it’s just that its usage can escalate other more (relatively) benign security problems, just like setuid can. This is my point, as a reply to stephank’s comment. This is inherent to running anything as root, with setuid, sudo, or doas, and why we have capabilities on Linux now. I bet that if doas would be the default instead of sudo we’d have a bunch of CVEs about improper doas usage now, because people do stupid things like allowing anyone to run anything without password and then write a shitty web UI in front of that. That particular problem is not doas’s (or sudo’s) fault, just as cutting myself with the kitchen knife isn’t the knife’s fault.

                      reference your list of 39 Ubuntu vulnerabilities. 39 > 2, Q.E.D.

                      Yes, sudo has had more issues in total; I never said it doesn’t. It’s just a lot lower than what you said, and quite a number are very low-impact, so I just disputed the implication that sudo is a security nightmare waiting to happen: its track record isn’t all that bad. As always, more features come with more (security) bugs, but use cases do need solving somehow. As I mentioned, it’s a trade-off.

                      sudo, on the other hand, can be made much simpler and still address its most common use-cases, as demonstrated by doas’s evident utility

                      We already agreed on this yesterday on HN, which I repeated here as well; all I’m adding is “but sudo is still useful, as it solves many more use cases” and “sudo isn’t that bad”.

                      Interesting thing to note: sudo was removed from OpenBSD by millert@openbsd.org, who is also the sudo maintainer. I think he’ll agree that “sudo is too complex for it to be the default”, which we already agree on, but not that sudo is “too complex to exist”, which is where we don’t agree.

                      Could sudo be simpler or better architected to contain its complexity? Maybe. I haven’t looked at the source or use cases in-depth, and I’m not really qualified to make this judgement.

              2. 5

                I think arguing against complexity is one of the core principles of UNIX philosophy, and it’s gotten us quite far on the operating system front.

                If simplicity had guided sudo’s design, this particular vulnerability would not have been possible to trigger: why have a separate sudoedit binary in the first place, when it just implies the -e flag? That much is a guarantee.

                Had it ditched C, on the other hand, there would be no guarantee that this issue wouldn’t have happened.

              3. 2

                Did you even look at that list? Most of those are not sudo vulnerabilities but issues in sudo configurations distros ship with.

                If even the distros can’t understand the configuration well enough to get it right, what hope do I have?

              4. 16

                OK maybe here’s a more specific discussion point:

                There can be logic bugs in basically any language, of course. However, the following classes of bugs tend to be steps in major exploits:

                • Bounds checking issues on arrays
                • Messing around with C strings at an extremely low level

                It is hard to deny that, in a universe where nobody ever messed up those two points, there are a lot less nasty exploits in the world in systems software in particular.

                Many other toolchains have decided to make the above two issues almost non-existent through various techniques. A bunch of old C code doesn’t handle this. Is there not something that can be done here to get the same productivity and safety advantages found in almost every other toolchain for tools that form the foundation of operating computers? Including a new C standard or something?

                I can have a bunch of spaghetti code in Python, but turning that spaghetti into “oh wow argv contents ran over some other variables and messed up the internal state machine” is a uniquely C problem, but if everyone else can find solutions, I feel like C could as well (including introducing new mechanisms to the language. We are not bound by what is printed in some 40-year-old books, and #ifdef is a thing).
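
                The sudo bug itself makes the point compactly. Below is a simplified sketch, based on the published analysis of CVE-2021-3156 and not sudo’s actual source: when an argument ends in a lone backslash, the unescaping loop steps onto the terminating NUL, copies it, and keeps reading whatever happens to sit after it.

                ```c
                #include <assert.h>
                #include <ctype.h>
                #include <stdio.h>
                #include <string.h>

                /* Simplified sketch of the CVE-2021-3156 unescaping loop (not
                 * sudo's actual code). A trailing lone backslash makes `from`
                 * land on the terminating NUL, which is then copied and skipped,
                 * so the loop runs past the end of the argument. */
                static size_t unescape(const char *from, char *to)
                {
                    char *start = to;
                    while (*from) {
                        if (from[0] == '\\' && !isspace((unsigned char)from[1]))
                            from++;          /* skip the backslash */
                        *to++ = *from++;     /* may copy the NUL and continue */
                    }
                    return (size_t)(to - start);
                }

                int main(void)
                {
                    char arg[16] = "a\\";        /* argument ending in a lone backslash */
                    memcpy(arg + 3, "OOPS", 5);  /* bytes that happen to follow it in memory */
                    char out[32] = {0};
                    size_t n = unescape(arg, out);
                    printf("copied %zu bytes\n", n);  /* a correct loop would copy 2 */
                    assert(n == 6);  /* 'a', the NUL, then "OOPS" from past the end */
                    return 0;
                }
                ```

                In sudo the destination was a heap buffer sized for the escaped string, so those extra bytes became a heap overflow; a bounds-checked language would have stopped this at the first out-of-range read or write.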

                EDIT: forgot to mention this, I do think that sudo is a bit special given that its default job is to take argv contents and run them. I kinda agree that sudo is a bit special in terms of exploitability. But hey, the logic bugs by themselves weren’t enough to trigger the bug. When you have a multi-step exploit, anything on the path getting stopped is sufficient, right?

                1. 14

                  +1. Lost in the noise of “but not all CVEs…” is the simple fact that this CVE comes from an embarrassing C string fuckup that would be impossible, or at least caught by static analysis, or at very least caught at runtime, in most other languages. If “RWIIR” is flame bait, then how about “RWIIP” or at least “RWIIC++”?

                  1. 1

                    I’m confused… what does the P in RWIIP mean?

                    1. 3

                      Pascal?

                      1. 1

                        Python? Perl? Prolog? PL/I?

                      2. 2

                        Probably Python, given the content of the comment by @rtpg. Python is also memory-safe, while it’s unclear to me whether Pascal is (a quick search reveals that at least FreePascal is not memory-safe).

                        Were it not for the relative (accidental, non-feature-providing) complexity of Python to C, I would support RWIIP. Perhaps Lua would be a better choice - it has a tiny memory and disk footprint while also being memory-safe.

                        1. 2

                          Probably Python, given the content of the comment by @rtpg. Python is also memory-safe, while it’s unclear to me whether Pascal is (a quick search reveals that at least FreePascal is not memory-safe).

                          That’s possibly it.

                          Perhaps Lua would be a better choice - it has a tiny memory and disk footprint while also being memory-safe.

                          Not to mention that Lua – even when used without LuaJIT – is simply blazingly fast compared to other scripting languages (Python, Perl, &c)!

                          For instance, see this benchmark I did sometime ago: https://0x0.st/--3s.txt. I had implemented Ackermann’s function in various languages (the “./ack” file is the one in C) to get a rough idea on their execution speed, and lo and behold Lua turned out to be second only to the C implementation.
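
                          For reference, the C version of that micro-benchmark is only a few lines (a sketch, not necessarily the exact code behind the linked numbers):

                          ```c
                          #include <assert.h>
                          #include <stdio.h>

                          /* Ackermann's function: tiny to write, brutal to evaluate,
                           * which makes it a popular recursion/call-overhead benchmark. */
                          static unsigned long ack(unsigned long m, unsigned long n)
                          {
                              if (m == 0) return n + 1;
                              if (n == 0) return ack(m - 1, 1);
                              return ack(m - 1, ack(m, n - 1));
                          }

                          int main(void)
                          {
                              printf("ack(3, 3) = %lu\n", ack(3, 3));
                              assert(ack(3, 3) == 61);
                              return 0;
                          }
                          ```

                          Even small inputs like ack(3, 10) force millions of calls, so the benchmark mostly measures function-call and recursion overhead, which is where interpreters differ the most.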

                  2. 15

                    I agree that rewriting things in Rust is not always the answer, and I also agree that simpler software makes for more secure software. However, I think it is disingenuous to compare the overall CVE count for the two programs. Would you agree that sudo is much more widely installed than doas (and therefore is a larger target for security researchers)? Additionally, most of the 140 CVEs linked were filed before October 2015, which is when doas was released. Finally, some of the linked CVEs aren’t even related to code vulnerabilities in sudo, such as the six Quest DR Series Disk Backup CVEs (example).

                    1. 4

                      I would agree that sudo has a bigger target painted on its back, but it’s also important to acknowledge that it has a much bigger back - 100× bigger. However, I think the comparison is fair. doas is the default in OpenBSD and very common in NetBSD and FreeBSD systems as well, which are at the heart of a lot of high-value operations. I think it’s over the threshold where we can consider it a high-value target for exploitation. We can also consider the kinds of vulnerabilities which have occurred internally within each project, without comparing their quantity to one another, to characterize the sorts of vulnerabilities which are common to each project, and ascertain something interesting while still accounting for differences in prominence. Finally, there’s also a bias in the other direction: doas is a much simpler tool, shipped by a team famed for its security prowess. Might this not dissuade it as a target for security researchers just as much?

                      Bonus: if for some reason we believed that doas was likely to be vulnerable, we could conduct a thorough audit on its 500-some lines of code in an hour or two. What would the same process look like for sudo?

                      1. -1

                        but it’s also important to acknowledge that it has a much bigger back - 100× bigger.

                        Sorry, but I missed the mass of users protesting in the streets against tools that have 100x the code of other tools providing similar functionality.

                        1. 10

                          What?

                    2. 10

                      So you’re saying that 50% of the CVEs in doas would have been prevented by writing it in Rust? Seems like a good reason to write it in Rust.

                      1. 11

                        Another missing point is that Rust is only one of many memory safe languages. Sudo doesn’t need to be particularly performant or free of garbage collection pauses. It could be written in your favorite GCed language like Go, Java, Scheme, Haskell, etc. Literally any memory safe language would be better than C for something security-critical like sudo, whether we are trying to build a featureful complex version like sudo or a simpler one like doas.

                        1. 2

                          Indeed. And you know, Unix in some ways has been doing this for years anyway with Perl, Python, and shell scripts.

                          1. 2

                            I’m not a security expert, so I’d be happy to be corrected, but if I remember correctly, using secrets safely in a garbage collected language is not trivial. Once you’ve finished working with some secret, you don’t necessarily know how long it will remain in memory before it’s garbage collected, or whether it will be securely deleted or just ‘deallocated’ and left in RAM for the next program to read. There are ways around this, such as falling back to manual memory control for sensitive data, but as I say, it’s not trivial.

                            1. 2

                              That is true, but you could also do the secrets handling in a small library written in C or Rust and FFI with that, while keeping the rest of your bog-standard logic free of the issues that habitually plague every non-trivial C codebase.

                              1. 2

                                Agreed.

                                Besides these capabilities, ideally a language would also have ways of expressing important security properties of code. For example, ways to specify that a certain piece of data is secret and ensure that it can’t escape and is properly overwritten when going out of scope instead of simply being dropped, and ways to specify a requirement for certain code to use constant time to prevent timing side channels. Some languages are starting to include things like these.

                                Meanwhile when you try to write code with these invariants in, say, C, the compiler might optimize these desired constraints away (overwriting secrets is a dead store that can be eliminated, the password checker can abort early when the Nth character of the hash is wrong, etc) because there is no way to actually express those invariants in the language. So I understand that some of these security-critical things are written in inline assembly to prevent these problems.
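
                                To make the dead-store problem concrete, here is a minimal sketch of one classic workaround: calling memset through a volatile function pointer, so the compiler cannot prove the scrub has no effect and delete it (explicit_bzero(3) and C11’s memset_s exist for the same reason):

                                ```c
                                #include <assert.h>
                                #include <stddef.h>
                                #include <string.h>

                                /* A plain memset over a secret that is never read again is a
                                 * dead store, which the compiler may legally remove. Routing
                                 * the call through a volatile function pointer makes the call
                                 * opaque, so the scrub survives optimization. */
                                static void *(*const volatile memset_v)(void *, int, size_t) = memset;

                                int main(void)
                                {
                                    char pw[8];
                                    memcpy(pw, "hunter2", 8);  /* a stand-in secret */
                                    /* In the real pattern pw is not used after this point, so
                                     * `memset(pw, 0, sizeof pw)` could be elided entirely. */
                                    memset_v(pw, 0, sizeof pw);
                                    for (size_t i = 0; i < sizeof pw; i++)
                                        assert(pw[i] == 0);
                                    return 0;
                                }
                                ```

                                Whether any given compiler would actually elide the plain memset depends on optimization level, which is exactly why you want a construct with a guarantee rather than one that happens to work.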

                                1. 1

                                  overwriting secrets is a dead store that can be eliminated

                                  I believe that explicit_bzero(3) largely solves this particular issue in C.

                                  1. 1

                                    Ah, yes, thanks!

                                    It looks like it was added to glibc in 2017. I’m not sure if I haven’t looked at this since then, if the resources I was reading were just not up to date, or if I just forgot about this function.

                        2. 8

                          I do think high complexity is the source of many problems in sudo and that doas is a great alternative to avoid many of those issues.

                          I also think sudo will continue being used by many people regardless. If somebody is willing to write an implementation in Rust which might be just as complex but ensures some level of safety, I don’t see why that wouldn’t be an appropriate solution to reducing the attack surface. I certainly don’t see why we should avoid discussing Rust just because an alternative to sudo exists.

                          1. 2

                            Talking about Rust as an alternative is missing the forest for the memes. Rust is a viral language (in the sense of internet virality), and a brain worm that makes us all want to talk about it. But in actual fact, C is not the main reason why anything is broken - complexity is. We could get much more robust and reliable software if we focused on complexity, but instead everyone wants to talk about fucking Rust. Rust has its own share of problems, chief among them its astronomical complexity. Rust is not a moral imperative, and not even the best way of solving these problems, but it does have a viral meme status which means that anyone who sees through its bullshit has to proactively fend off the mob.

                            1. 32

                              But in actual fact, C is not the main reason why anything is broken - complexity is.

                              Offering opinions as facts. The irony of going on to talk about seeing through bullshit.

                              1. 21

                                I don’t understand why you hate Rust so much, but it seems as irrational as people’s love for it. Rust’s main value proposition is that it allows you to write more complex software that has fewer bugs, and your point is that this is irrelevant because the software should just be less complex. Well, I have news for you: software is not going to lose any of its complexity. That’s because we want software to do stuff; the less stuff it does, the less useful it becomes, or you have to replace one tool with two tools. The ecosystem hasn’t actually become less complex when you do that; you’re just dividing the code base into two chunks that don’t really do what you want. I don’t know why you hate Rust so much as to warrant posting anywhere the discussion might come up, but I would suggest, if you truly cannot stand it, that you use some of your non-complex software to filter out related keywords in your web browser.

                                1. 4

                                  Agree with what you’ve written, but just to pick at a theme that’s bothering me on this thread…

                                  I don’t understand why you hate Rust so much but it seems as irrational as people’s love for it.

                                  This is obviously very subjective, and everything below is anecdotal, but I don’t agree with this equivalence.

                                  In my own experience, everyone I’ve met who “loves” or is at least excited about rust seems to feel so for pretty rational reasons: they find the tech interesting (borrow checking, safety, ML-inspired type system), or they enjoy the community (excellent documentation, lots of development, lots of online community). Or maybe it’s their first foray into open source, and they find that gratifying for a number of reasons. I’ve learned from some of these people, and appreciate the passion for what they’re doing. Not to say they don’t exist, but I haven’t really seen anyone “irrationally” enjoy rust - what would that mean? I’ve seen floating around a certain spiteful narrative of the rust developer as some sort of zealous online persona that engages in magical thinking around the things rust can do for them, but I haven’t really seen this type of less-than-critical advocacy any more for rust than I have seen for other technologies.

                                  On the other hand I’ve definitely seen solid critiques of rust in terms of certain algorithms being tricky to express within the constraints of the borrow checker, and I’ve also seen solid pushback against some of the guarantees that didn’t hold up in specific cases, and to me that all obviously falls well within the bounds of “rational”. But I do see a fair amount of emotionally charged language leveled against not just rust (i.e. “bullshit” above) but the rust community as well (“the mob”), and I don’t understand what that’s aiming to accomplish.

                                  1. 3

                                      I agree with you, and I apologize if it came across that I think Rust lovers are irrational - I, for one, am a huge Rust proselytizer. The irrationality I mentioned was meant to be the perceived irrationality DD attributes to the Rust community.

                                    1. 2

                                      Definitely no apology needed, and to be clear I think the rust bashing was coming from elsewhere, I just felt like calling it to light on a less charged comment.

                                    2. 1

                                      I think the criticism isn’t so much that people are irrational in their fondness of Rust, but rather that there are some people who are overly zealous in their proselytizing, as well as a certain disdain for everyone who is not yet using Rust.

                                      Here’s an example comment from the HN thread on this:

                                      Another question is who wants to maintain four decades old GNU C soup? It was written at a different time, with different best practices.

                                      In some point someone will rewrite all GNU/UNIX user land in modern Rust or similar and save the day. Until this happens these kind of incidents will happen yearly.

                                      There are a lot of things to say about this comment, and it’s entirely false IMO, but it’s not exactly a nice comment, and why Rust? Why not Go? Or Python? Or Zig? Or something else.

                                      Here’s another one:

                                      Rust is modernized C. You are looking for something that already exists. If C programmers would be looking for tools to help catch bugs like this and a better culture of testing and accountability they would be using Rust.

                                      The disdain is palpable in this one, and “Rust is modernized C” really misses the mark IMO; Rust has a vastly different approach. You can consider this a good or a bad thing, but it’s really not the only approach towards memory-safe programming languages.


                                      Of course this is not representative for the entire community; there are plenty of Rust people that I like and have considerably more nuanced views – which are also expressed in that HN thread – but these comments certainly are frequent enough to give a somewhat unpleasant taste.

                                    3. 2

                                      Rust’s main value proposition is that it allows you to write more complex software that has fewer bugs

                                      I argue that it’s actually that it allows you to write fast software with fewer bugs. I’m not entirely convinced that Rust allows you to manage complexity better than, say, Common Lisp.

                                      That’s because we want software to do stuff, the less stuff it does the less useful it becomes

                                      Exactly. Software is written for people to use. (Technically, only some software - other software, such as demoscene productions, is written for the beauty of it, or for the enjoyment of the programmer; but in this discussion we only care about the former.)

                                      The ecosystem hasn’t actually become less complex when you do that

                                      Even worse - it becomes more complex. Now that you have two tools, you have two userbases, two websites, two source repositories, two APIs, two sets of file formats, two packages, and more. If the designs of the tools begin to differ substantially, you have significantly more ecosystem complexity.

                                      1. 2

                                        You’re right about Rust’s value proposition; I should have added performance to that sentence. Or I should have just said “managed language”, because, as another commenter pointed out, Rust is almost irrelevant to this whole conversation when it comes to preventing this type of CVE.

                                      2. 2

                                        While I don’t approve of the deliberately inflammatory form of the comments, and don’t agree with the general statement that all complexity can be eliminated, I personally agree that, in this particular case, simplicity > Rust.

                                        As a thought experiment, world 1 uses sudo-rs as a default implementation of sudo, while world 2 uses 500 lines of C which is doas. I do think that world 2 would be generally more secure. Sure, it’ll have more segfaults, but fewer logical bugs.

                                        I also think that the vast majority of world 2 populace wouldn’t notice the absence of advanced sudo features. To be clear, the small fraction that needs those features would have to install sudo, and they’ll use the less tested implementation, so they will be less secure. But that would be more than offset by improved security of all the rest.

                                        Adding a feature to a program always has a cost for those who don’t use that feature. If the feature is obscure, it might be overall more beneficial to have a simple version used by 90% of the people, and a complex one for the remaining 10%. The 10% would be significantly worse off in comparison to the unified program. The 90% would be slightly better off. But 90% >> 10%.

                                        1. 1

                                          The other issue is that it is a huge violation of principle of least privilege. Those other features are fine, but do they really need to be running as root?

                                    4. 7

                                      Just to add to that: In addition to having already far too much complexity, it seems the sudo developers have a tendency to add even more features: https://computingforgeeks.com/better-secure-new-sudo-release/

                                      Plugins, integrated log server, TLS support… none of that are things I’d want in a tool that should be simple and is installed as suid root.

                                      (Though I don’t think complexity vs. memory safety are necessarily opposed solutions. You could easily imagine a sudo-alike too that is written in rust and does not come with unnecessary complexity.)

                                      1. 4

                                        What’s wrong with EBNF and how is it related to security? I guess you think EBNF is something the user shouldn’t need to concern themselves with?

                                        1. 6

                                          There’s nothing wrong with EBNF, but there is something wrong with relying on it to explain an end-user-facing domain-specific configuration file format for a single application. It speaks to the greater underlying complexity, which is the point I’m making here. Also, if you ever have to warn your users not to despair when reading your docs, you should probably course correct instead.

                                          1. 2

                                            Rewrite: The point that you made in your original comment is that sudo has too many features (disguising it as a point about complexity). The manpage snippet that you’re referring to has nothing to do with features - it’s a mix between (1) the manpage being written poorly and (2) a bad choice of configuration file format resulting in accidental complexity increase (with no additional features added).

                                          2. 1

                                            EBNF as a concept aside; the sudoers manpage is terrible.

                                          3. 3

                                            Hello, I am here to derail the Rust discussion before it gets started.

                                            I am not sure what you are trying to say, let me guess with runaway complexity.

                                            • UNIX is inherently insecure and it cannot be made secure by any means
                                            • sudo is inherently insecure and it cannot be made secure by any means

                                            Something else maybe?

                                            1. 4

                                              Technically I agree with both, though my arguments for the former are most decidedly off-topic.

                                              1. 5

                                                Taking Drew’s statement at face value: There’s about to be another protracted, pointless argument about rewriting things in rust, and he’d prefer to talk about something more practically useful?

                                                1. 7

                                                  I don’t understand why you would care about preventing a protracted, pointless argument on the internet. Seems to me like trying to nail jello to a tree.

                                              2. 3

                                                This is a great opportunity to promote doas. I use it everywhere these days, and though I don’t consider myself any sort of Unix philosophy purist, it’s a good example of “do one thing well”. I’ll call out Ted Unangst for making great software. Another example is signify. Compared to other signing solutions, there is much less complexity, much less attack surface, and a far shallower learning curve.
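                                                To give a sense of that shallow learning curve: a doas policy that covers most single-user systems fits in a line or two. (A sketch, not from any particular system; adjust the group name to taste.)

                                                ```
                                                # /etc/doas.conf -- a complete, working policy
                                                # Members of the wheel group may run commands as root;
                                                # `persist` caches the password for a few minutes.
                                                permit persist :wheel
                                                ```

                                                Compare that with a sudoers grammar that needs EBNF to describe.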

                                                I’m also a fan of tinyssh. It has almost no knobs to twiddle, making it hard to misconfigure. This is what I want in security-critical software.

                                                Relevant link: Features Are Faults.

                                                All of the above is orthogonal to choice of implementation language. You might have gotten a better response in the thread by praising doas and leaving iron oxide out of the discussion. ‘Tis better to draw flies with honey than with vinegar. Instead, you stirred up the hornets’ nest by preemptively attacking Rust.

                                                PS. I’m a fan of your work, especially Sourcehut. I’m not starting from a place of hostility.

                                                1. 3

                                                  If you want programs to be more secure, stable, and reliable, the key metric to address is complexity. Rewriting it in Rust is not the main concern.

                                                  Why can’t we have the best of both worlds? Essentially a program copying the simplicity of doas, but written in Rust.

                                                  1. 2

                                                    Note that both sudo and doas originated in OpenBSD. :)

                                                    1. 9

                                                      Got a source for the former? I’m pretty sure sudo well pre-dates OpenBSD.

                                                      Sudo was first conceived and implemented by Bob Coggeshall and Cliff Spencer around 1980 at the Department of Computer Science at SUNY/Buffalo. It ran on a VAX-11/750 running 4.1BSD. An updated version, credited to Phil Betchel, Cliff Spencer, Gretchen Phillips, John LoVerso and Don Gworek, was posted to the net.sources Usenet newsgroup in December of 1985.

                                                      The current maintainer is also an OpenBSD contributor, but he started maintaining sudo in the early 90s, before OpenBSD forked from NetBSD. I don’t know when he started contributing to OpenBSD.

                                                      So I don’t think it’s fair to say that sudo originated in OpenBSD :)

                                                      1. 1

                                                        Ah, looks like I was incorrect. I misinterpreted OpenBSD’s innovations page. Thanks for the clarification!

                                                  1. 3

                                                    When I think of the phrase “spooky action at a distance” with respect to programming, the thing that always comes to my mind is mutable state. I know of no better analogy within programming to quantum mechanics’ “spooky action at a distance” than mutable state, though admittedly I know next to nothing about quantum mechanics. Mutable objects in programming seem a lot like objects that have been “quantum entangled”. I think about this a lot when I have reason to share a piece of mutable state between two objects, such as when writing a unification-based type system (which one of my current projects includes). I find it interesting to think about how the type of some expression, as a type variable, can propagate down different branches of a program tree. Then at some point the type checker unifies the type on one branch to something (more) fully specified, and suddenly, spookily, the type of expressions in a distant branch are similarly specified.
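                                                    The unification behavior described above can be sketched in a few lines of C (a toy union-find of my own, not anyone's actual type checker): two distant expressions share one mutable type cell, so binding the type through one name “spookily” updates the other.

                                                    ```c
                                                    #include <stdio.h>

                                                    /* Toy unification variables as shared mutable cells. */
                                                    typedef struct TypeVar {
                                                        struct TypeVar *link; /* NULL = representative; else points toward it */
                                                        const char *name;     /* non-NULL once bound to a concrete type */
                                                    } TypeVar;

                                                    /* Follow links to the representative cell. */
                                                    TypeVar *find(TypeVar *t) {
                                                        while (t->link) t = t->link;
                                                        return t;
                                                    }

                                                    /* Make two variables share one representative. */
                                                    void unify(TypeVar *a, TypeVar *b) {
                                                        a = find(a); b = find(b);
                                                        if (a != b) a->link = b;
                                                    }

                                                    int main(void) {
                                                        TypeVar t1 = {0}, t2 = {0};  /* types of two distant expressions */
                                                        unify(&t1, &t2);             /* they now alias one mutable cell */
                                                        find(&t2)->name = "int";     /* resolve the type on one branch... */
                                                        printf("%s\n", find(&t1)->name); /* ...and the distant branch follows: "int" */
                                                        return 0;
                                                    }
                                                    ```

                                                    Spooky, but entirely deterministic once you know the cells are shared.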

                                                    Not that I think mutable state is inherently bad – it’s a great engineering tool that is often overly maligned by functional programming purists. But I do think, just like good design of countless other things in programming, exactly how to use mutable state with clarity rather than confusion requires good taste and judgment. (This is not a claim that I necessarily have the best taste and judgment.)

                                                    Now I’d like to nitpick a statement from the OP, even though others are commenting on the same statement:

                                                    x + y

                                                    If this were written in C, without knowing anything other than the fact that this code compiles correctly, I can tell you that x and y are numeric types, and the result is their sum.

                                                    In C, without using a particular compiler that specifies semantics beyond the standard, you cannot know the result of x + y (or even the behavior of the surrounding code or the entire program!) without knowing the dynamic state of the running program, because x + y can result in undefined behavior. There is no programming language more spooky than C.
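                                                    To make that concrete, here is a minimal sketch (the helper is hypothetical, not from any standard library): signed overflow is undefined behavior, so a strictly conforming C program has to rule it out before it performs the addition at all.

                                                    ```c
                                                    #include <limits.h>
                                                    #include <stdio.h>

                                                    /* Returns 1 and stores x + y in *out if the sum is representable;
                                                       returns 0 otherwise.  Evaluating x + y when it overflows an int
                                                       is undefined behavior, so the check must happen *before* adding. */
                                                    int safe_add(int x, int y, int *out) {
                                                        if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y))
                                                            return 0;
                                                        *out = x + y;
                                                        return 1;
                                                    }

                                                    int main(void) {
                                                        int r;
                                                        printf("%d\n", safe_add(2, 3, &r) ? r : -1);       /* 5 */
                                                        printf("%d\n", safe_add(INT_MAX, 1, &r) ? r : -1); /* -1: the raw sum would be UB */
                                                        return 0;
                                                    }
                                                    ```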

                                                    1. 2

                                                      I don’t think it’s mutable state by itself that’s the problem, it’s aliased mutable state. In C, I can write code like this:

                                                      int *a = &foo;   /* assume foo sits inside a sufficiently large int array */
                                                      int *b = a + 42;
                                                      b[2] = 12;
                                                      

                                                      And this changes the value of a[44], even though I never used the name a in my mutation. That’s action at a distance and, to me, a good language should provide some static type information to tell you when things might be aliased (neither C nor C++ does a good job here).

                                                      Aliasing between concurrent execution contexts is the worst case of this. In C, there’s no protection against this at all and (worse) the language says it’s undefined behaviour if two threads concurrently update a variable and it isn’t _Atomic qualified. Shared mutable state is the worst possible kind of spooky action at a distance: you can step through a thread one instruction at a time and still see a result that you couldn’t explain from looking at that thread’s code. This is why Verona makes it impossible to express concurrently mutable state.
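                                                      A minimal C11 sketch of that last point (assuming POSIX threads): the only way to make concurrent updates to shared state defined behavior is to mark the state _Atomic (or guard it with a lock).

                                                      ```c
                                                      #include <pthread.h>
                                                      #include <stdatomic.h>
                                                      #include <stdio.h>

                                                      /* Without the _Atomic qualifier, two threads updating this
                                                         concurrently would be a data race -- UB under C11. */
                                                      static _Atomic int counter = 0;

                                                      static void *bump(void *arg) {
                                                          (void)arg;
                                                          for (int i = 0; i < 100000; i++)
                                                              atomic_fetch_add(&counter, 1);
                                                          return NULL;
                                                      }

                                                      int main(void) {
                                                          pthread_t a, b;
                                                          pthread_create(&a, NULL, bump, NULL);
                                                          pthread_create(&b, NULL, bump, NULL);
                                                          pthread_join(a, NULL);
                                                          pthread_join(b, NULL);
                                                          /* With a plain int this could print nearly anything;
                                                             with _Atomic it is always 200000. */
                                                          printf("%d\n", atomic_load(&counter));
                                                          return 0;
                                                      }
                                                      ```

                                                      Even here, single-stepping one thread tells you nothing about what the other is doing to `counter` in between, which is exactly the spookiness being described.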

                                                      1. 1

                                                        I agree, without mutable state a function can be efficient, or inefficient, but never really spooky.

                                                        1. 1

                                                          Regarding your mention of unification state propagating through different branches of the program tree: you may be interested in this paper, which defines a type system that’s strictly compositional (the type of an expression is based entirely on the type of its subexpressions).

                                                        1. 26

                                                          https://rash-lang.org

                                                          Full disclosure: I wrote it.

                                                          The documentation isn’t great, and I need to write a new line editor to improve interactivity. But it’s got a lot of strengths that are rare among shells or unique to Rash.

                                                          1. 2

                                                            Wow, this looks neat!

                                                            1. 2

                                                              https://rash-lang.org

                                                              I am very impressed by this concept and look forward to kicking the tires, if I can scrape together some time.

                                                              1. 1

                                                                If your line editor is generic, you could just point users to rlwrap

                                                                1. 3

                                                                  Users can use rlwrap currently, and the current repl implementation uses a libreadline ffi wrapper library that comes with Racket. So there is some line editing and even very basic completion for file names. However, neither of those is great. While rlwrap and readline provide basic editing, rlwrap provides no good completion. The readline library is more programmable, but the ffi library in Racket doesn’t expose its full power.

                                                                  Either way, I ultimately want a much more powerful line editor that’s more like a little emacs. If you look at zsh, for example, it has a fairly fancy line editor that is programmable and provides fancy programmable completion, cool filtering tools like zaw, various different ways to search through history, etc. Zsh’s line editor is basically the reason it’s way cooler than Bash. At the same time, you have to program it in the zsh shell language. Ugh.

                                                                  If I write a nice programmable line editor in Racket, it will not only be programmable in a nicer language (Racket, Rash, or most any Racket-based language), but it will be able to hook into the Rash parser, analyze the resulting syntax objects, etc, for richer editing, highlighting, and completion. And while my main use for it will be for Rash, I intend to make it usable for essentially arbitrary Racket languages. The line editor for the default Racket language isn’t particularly great, and there is currently no off-the-shelf support for repls that use non-s-exp syntax. So a nice new line editor is generally needed in the Racket world.

                                                              1. 4

                                                                While I can come up with some better options than the author (unset pipefail in a subshell containing the pipeline, for example), none of them actually overcomes the objection that pipefail creates this footgun (in exchange for fixing the footgun left by its absence).

                                                                As the author says, it would be really nice to have a pipefail that can be parametrised on which signals if any caused exit, or on exit codes.
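                                                                For anyone who hasn’t hit it, the footgun is easy to reproduce in bash (141 = 128 + SIGPIPE’s signal number, 13):

                                                                ```shell
                                                                #!/usr/bin/env bash
                                                                set -o pipefail

                                                                # `head` exits after one line; `yes` is then killed by SIGPIPE.
                                                                # With pipefail, the pipeline reports failure (status 141) even
                                                                # though it produced exactly the output we asked for.
                                                                yes | head -n 1
                                                                echo "pipeline status: $?"
                                                                ```

                                                                A parametrisable pipefail could be told that status 141 in a non-final stage is benign.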

                                                                1. 3

                                                                  (Edit: Oh, silly me, I realize now that SIGPIPE has a standard exit code. So… go ahead and ignore the first paragraph.)

                                                                  Having the shell track whether the failure was due to SIGPIPE unfortunately can’t generally work, because it’s the generating program that has to handle it. One could imagine a protocol whereby programs respect some DONT_ERROR_ON_SIGPIPE environment variable or something, but that would be unreliable.

                                                                  For what it’s worth, my shell language Rash does have a way to specify, for each pipeline, whether to care about the exit codes of intermediate subprocesses. The default is to care about each subprocess that terminates by the time the last subprocess does. This, of course, leads to things like potential SIGPIPE races. However, each choice has its downsides – always caring about every subprocess means you have to wait for each one to complete, which can cause some pipelines to stall while they wait for a program that was written to run forever until killed. Never caring is more obviously a bad idea.

                                                                  Rash also has a way for each pipeline stage to determine whether its exit code counts as successful (with a list, or with an arbitrary predicate function). You can define names (e.g. aliases) to re-use that logic without rewriting it every time. But frankly, handling the ad-hoc conventions for exit codes, termination conditions, etc., for each program you call from a shell is a bit of a mess no matter what tool you use.

                                                                1. 3

                                                                  Thanks for posting. This articulates something my own mind has been circling since reading those same threads (though I had connected fewer of the dots, so this was illuminating).

                                                                  I’d be curious to hear from anyone who concurs with the central premises of this post but has replaced their use of a more traditional shell (e.g. ksh, bash, dash, zsh, or their predecessors) with something like fish, xonsh, oil, etc.

                                                                  If you are not particularly concerned with needing the shell as a tool readily at your disposal when you connect to unknown and arbitrary remote systems, but only as a tool to converse with your primary workstation (as a “power user” first and a sysadmin only by its practical usefulness), how does the ground shift around what is valuable to invest your time and knowledge into?

                                                                  1. 4

                                                                    I’d be curious to hear from anyone who concurs with the central premises of this post but has replaced their use of a more traditional shell (e.g. ksh, bash, dash, zsh, or their predecessors) with something like fish . . . how does the ground shift around what is valuable to invest your time and knowledge into?

                                                                    👋 I’ve been using Fish for about 8 years, now, I guess. I’ve always had an intuitive understanding of “the shell” that’s aligned with the description in this post. That is: a terse way to orchestrate interactions with the OS — typically, one interaction at a time. But I can’t say that I make a deliberate effort to learn anything about it, because I’m almost always task-driven.

                                                                    My usage is usually iterative: run this test. Okay, now run it with the following env vars set, to change its behavior. Now again, capturing profiles, running pprof, and rg’ing for total CPU time used. Now again, but add a memory profile. Now again, but output all of the relevant information in a single line with printf. Now again, but vary this parameter over these options. Now again, but vary this other parameter over these other options. Now again, repeating everything 3 times, tabulate the output with column -t, and sort on this column. Oops, tee to a file, so I can explore the data without re-running the tests.

                                                                    Each of these steps is hitting up-arrow and editing the prompt. Fish is a blessing because it makes this so nice: the actual editing is pleasant, and the smart history means even if I don’t save this stuff in a file, I can easily recall it and run it again, even months later, with no effort.

                                                                    I don’t know if this actually answers your question… maybe it does?

                                                                    1. 2

                                                                      During my last year as an undergrad, 2004-2005, I used a Perl-based shell. It was very much like a REPL: both a REPL for Perl and a REPL for Linux. I loved it. I don’t recall why I stopped using it, though the reason is probably as simple as “I lost my primary workstation to a burglar who took almost everything of value”. I was also starting to migrate away from Perl at the time. At the time, it was great, because my deep knowledge of Perl was directly translatable to shell use.

                                                                      What I’d really love is scsh with interactive-friendly features.

                                                                      1. 4

                                                                        What I’d really love is scsh with interactive-friendly features.

                                                                        Hi, I’m the author of Rash. It’s a shell embedded in Racket. It’s designed for interactive use and scripting. It’s super extensible and flexible. And I need to rewrite all of my poor documentation for it.

                                                                        Currently the line editor I’m using for interactive Rash sessions leaves a lot to be desired, but eventually I plan to write a better line editor (basically a little emacs) that should allow nice programmable completion and such.

                                                                        Also, salient to the OP, job control is not yet supported, though that has more to do with setting the controlling process group of the host terminal. You can still run pipelines in the background and wait for them, you just can’t move them between the foreground and background and have the terminal handle it nicely.

                                                                        Replying more to the parent post: for the few scripts that I really need to run on a system that I haven’t installed Rash on, I write very short scripts in Bash. But realistically I just treat Rash as one of the things I need to get installed to use my normal computing environment with extra scripts. And a lot of scripts are intended to run in a specific context anyway – on some particular machine set up for a given purpose where I have already ensured things are installed correctly for that purpose.

                                                                        Writing scripts in Rash instead of Bash is nice because my scripts can grow up without a rewrite, and because as soon as I hit any part of the program that can benefit from a “real” programming language I can effortlessly escape to Racket.

                                                                        Using Rash instead of plain Racket (you could substitute, say, Python instead if you want) is nice because I can directly type or copy/paste the commands I would use interactively, with the pipeline syntax and everything. In practice, Rash scripts end up being a majority of normal Racket code with a few lines of shell code – most scripts ultimately revolve around a few lines doing the core thing you were doing manually that you want to automate, with the bulk of the script around it being the processing of command line arguments, munging file names, control flow, error handling… lots of things that Bash and friends do poorly.

                                                                        1. 1

                                                                          Thanks for making this, and for pointing it out here!

                                                                        2. 3

                                                                          I came to open source and the scripting world first through Perl, and that journey taught me about, and more importantly to think “in” data structures such as arrays and hashes, and combinations of those. For that I’ll be ever grateful - plus the community was absolutely wonderful (I attended The Perl Conference back in the day, and was a member of London Perl Mongers). Now I’m discovering more about the shell and related environment, such as Awk and Sed, I’m looking at Perl again through different eyes (as in some ways it’s an amalgam of those and other tools).

                                                                        3. 1

                                                                          Thanks, yes this has been brewing for a while in my head, and I finally found the opportunity to write it down. I would also be curious to hear from folks about what you say above, for sure. Always learning. Cheers.

                                                                        1. 10

                                                                          Admittedly I’m a Lisp-head, but this advice makes so little sense to me it’s almost funny. It makes more sense to me to prototype in a high-level language, why involve C at any step, unless your explicit goal is to be compatible with C (and even then…)? Most of the historical 70s and 80s AI research which included a lot of language experimentation was done in Lisp, for good reason:

                                                                          • By leveraging the Lisp reader, you get to punt initially on one of the trickiest parts of language implementation: parsing. This gives you the freedom to focus on the substance (semantics) rather than the looks (syntax). Hopefully, if you’re working on PL research, you have better ideas than just new syntax!
                                                                          • By leaning heavily on the host runtime, you get garbage collection and a nice runtime for free. Same as above. You can later gradually fill out the runtime so it can become self-hosted.
                                                                          • AST manipulation is quite easy in a language optimized for list/tree operations.

                                                                          This is a great combination to get you to the juicy part, which is the semantics. You can figure out if your language works at all, and it’s easy to swap out different parsers or change aspects of the compiler radically, which is exactly what you’d want to be able to do when you’re exploring the design space. Of course, nowadays many of the above advantages are offered by other high-level languages besides Lisp. Other commenters have already mentioned Haskell, OCaml and Python, all of which are fine ideas. But this post seems to suggest doing the second (and maybe also the first; yacc is mentioned) implementation in C, which to me sounds awfully backwards.
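                                                                          As a tiny illustration of the first point (a toy language of my own invention, not from any of the research mentioned): with the reader handing you the AST for free, a complete evaluator for a small prefix-arithmetic language is about ten lines of Common Lisp.

                                                                          ```lisp
                                                                          ;; The reader turns "(add 1 (mul 2 3))" into nested lists for free,
                                                                          ;; so the "compiler" starts directly at the semantics.
                                                                          (defun evaluate (form)
                                                                            (if (numberp form)
                                                                                form
                                                                                (destructuring-bind (op . args) form
                                                                                  (apply (ecase op
                                                                                           (add #'+)
                                                                                           (mul #'*))
                                                                                         (mapcar #'evaluate args)))))

                                                                          (print (evaluate (read-from-string "(add 1 (mul 2 3))")))  ; => 7
                                                                          ```

                                                                          There is no tokenizer, no grammar, and no parser anywhere in sight – which is the whole point.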

                                                                          1. 1

                                                                            By leveraging the Lisp reader, you get to punt initially on one of the trickiest parts of language implementation: parsing. This gives you the freedom to focus on the substance (semantics) rather than the looks (syntax). Hopefully, if you’re working on PL research, you have better ideas than just new syntax!

                                                                            Syntax heavily influences semantics. Lisp cannot parse all syntaxes. For example, let’s say I’m writing a shader language. Those are often represented as graphs. I could for example structure my programs like this:

                                                                            ___________               ___________
                                                                            | input x |               | input y |
                                                                            ___________               ___________
                                                                                  |         ________        |
                                                                                  | - - - > |  add | <- - - |
                                                                                            ________
                                                                                                |
                                                                                                |
                                                                                                v
                                                                                           ___________
                                                                                           | output z |
                                                                                           ___________
                                                                            

                                                                            A Lisp parser definitely can’t make anything useful out of that. You’ll need to change the syntax, and with it the semantics: either the pipes must be named, or you cannot reuse an output twice.

                                                                            C is only mentioned here as a language to create a bootstrapping compiler, the main purpose of which is to keep it easy to get your main compiler (the third one, written in the language itself) to your system, without trusting someone else’s binaries. C is quite a ubiquitous language, so the target system probably already has a C compiler, and you will only need to compile the second compiler to get the third, main one.

                                                                            1. 2

                                                                              Not being familiar with shaders, these diagrams don’t really ring a bell. But AFAIK shader languages are normally representable in textual form, which basically means by definition you can turn them into s-expressions. Naively, I would say your diagram shows just (define z (add x y)), which allows you to re-use z as many times as you like. I’d need a whole lot more context to figure out what you mean here, so you don’t have to bother unless this is somehow central to your point.

                                                                              C is only mentioned here as a language to create a bootstrapping compiler, the main purpose of which is to keep it easy to get your main compiler (the third one, written in the language itself) to your system, without trusting someone else’s binaries.

                                                                              Having a second compiler implemented in C which is a subset of the actual language sounds like a whole lot of extra maintenance work as you would need to develop it in lock-step with your third, final compiler. Makes much more sense to have C as (one of the) compilation targets, instead.

                                                                              1. 1

                                                                                Shaders are mainly used in graphics programming, and can be thought of as procedures that run over a field of inputs and produce another field of outputs. And maybe the simple example didn’t quite show what I meant. You got it right: this shader just takes two named inputs and outputs the sum of them into a named output, and it’s quite simple to convert it to a textual representation without changing semantics. But take this example:

                                                                                ___________               ___________
                                                                                | input x |               | input y |
                                                                                ___________               ___________
                                                                                      |         ________        |
                                                                                      | - - - > |  add | <- - - |
                                                                                                ________
                                                                                                 |    |       __________         _____________
                                                                                                 |    | - - > | divide | < - - - | literal 8 |
                                                                                                 |            __________         _____________
                                                                                                 |                 |
                                                                                                 |                 |
                                                                                                 v                 v
                                                                                            ___________      ____________
                                                                                            | output z |     | output w |
                                                                                            ___________      ____________
                                                                                

                                                                                When converting this to a simple textual representation, you now have to choose: should the result of add be assigned to a variable for reuse, or should reuse of a result be disallowed? The assignment to a variable changes the semantics, since in the graphical representation, the pipes are anonymous. Disallowing the reuse quite obviously changes the semantics as well.

                                                                                Having a second compiler implemented in C which is a subset of the actual language sounds like a whole lot of extra maintenance work as you would need to develop it in lock-step with your third, final compiler.

                                                                                You should not develop the second compiler “in lock step” with the third. You should keep it up to date, still implementing all the features described in the standard, but without anything fancy, just enough to compile your third compiler, because hopefully, at that point, the language itself has stopped undergoing major changes, and the majority of its changes are in the standard library and similar places.

                                                                                Makes much more sense to have C as (one of the) compilation targets, instead.

                                                                                Compiling things into C in my opinion is holding a shotgun to your foot. C has a lot of undefined behavior, and it can get very hard to make the compiler not hit any of that. I’m not saying it’s not possible, but it’s not practical to have such a high-maintenance feature that has so few uses. Also, notably, this doesn’t get rid of the trust problem, since this machine-generated C wouldn’t be readable, so you are still trusting whoever ran the tool.

                                                                                1. 4

                                                                                  The assignment to a variable changes the semantics, since in the graphical representation, the pipes are anonymous. Disallowing the reuse quite obviously changes the semantics as well.

                                                                                  That’s not quite right. Having the parser (or some other stage of a compiler) add automatically generated, unique names to “anonymous” program points doesn’t change the semantics of the language the user is using, since there is no way for the user to write that unique name in the input program. In fact, compilers for all kinds of languages automatically generate names for intermediate values all the time, and that’s tame compared to other transformations that an industrial strength compiler will do (eg. change loop structure, convert to continuation passing style or static single assignment, etc). However, that doesn’t change the semantics of the input language because you can’t write code in your input program that targets the transformed (intermediate or output) language code, only the input language. IE you can’t target the CPS-transformed code in intermediate compiler stages any more than you can target the intermediate names given to “anonymous” values.
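                                                                                  A sketch of such a naming pass (hypothetical Python; the graph encoding, node ids, and the “%t0” name scheme are all invented for illustration, loosely following the add/divide diagram above): only edges with more than one consumer get a generated name, and since the user cannot write “%t0” in a source program, the source semantics are untouched.

```python
import itertools

# Hypothetical encoding of a shader graph: node id -> (operator, input ids).
# Roughly the add/divide example: add feeds both output z and divide.
graph = {
    "n1": ("input-x", []),
    "n2": ("input-y", []),
    "n3": ("add", ["n1", "n2"]),
    "n4": ("literal-8", []),
    "n5": ("divide", ["n3", "n4"]),
}
outputs = {"z": "n3", "w": "n5"}

def lower(graph, outputs):
    # Count how many consumers each anonymous edge has.
    uses = {}
    for _, inputs in graph.values():
        for i in inputs:
            uses[i] = uses.get(i, 0) + 1
    for node in outputs.values():
        uses[node] = uses.get(node, 0) + 1

    # Only multiply-used edges get a generated name. "%t0" is not writable
    # in the source language, so introducing it changes nothing the user sees.
    gensym = (f"%t{i}" for i in itertools.count())
    names = {n: next(gensym) for n, c in uses.items() if c > 1}

    def expr(node, top=False):
        if node in names and not top:
            return names[node]
        op, inputs = graph[node]
        return [op] + [expr(i) for i in inputs] if inputs else op

    bindings = [[names[n], expr(n, top=True)] for n in names]
    body = [[out, expr(node)] for out, node in outputs.items()]
    return ["let", bindings, body]
```

                                                                                  Running lower(graph, outputs) on this encoding yields a let-style s-expression in which the shared add result is bound once and referenced twice: exactly the kind of intermediate form a compiler invents without the user ever seeing it.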

                                                                                  The top-level comment is absolutely right about the benefits of using something like s-expressions (and a high level language with rich data structures, garbage collection, etc) to go quickly to working on semantics. There may be many reasons you may ultimately want a syntax other than s-expressions, but s-expressions are a great intermediate format for quick compiler prototyping. Once you’ve nailed down the semantics, using the simplified intermediate syntax of s-expressions, you can add support for the “real” syntax by adding a parser to transform your input syntax into s-expressions. If you are making a new language that is really just trying to innovate syntax, then sure, you probably want to spend your time prototyping various parsers. If your language is doing something interesting and new with semantics, however, then fussing about with some complicated parser just to get started on prototyping your semantics is a waste of time. And if what you really want to do is write parsers, in 2020 I still can’t recommend Lex and Yacc to anyone.

                                                                                  Additionally, as a Racket user I’m spoiled by its #lang protocol. The #lang system allows you to specify a parser for your language provided by any arbitrary library. Thus any user can add their own new syntax, including graphical syntaxes. There is still research work to be done on aspects of mixing languages of different syntax (eg. I am writing in graphical language A but I’m getting an error from a library written in algol-ish language B, how should the error be displayed?), but it’s already pretty great. After using it for a while, not only do you get to see how shallow a lot of syntax sugar is, but you start to think of it like any other (semi-) orthogonal, replaceable component of programs you write, such as your choice in data structures, utility libraries, databases, etc. In other words, you start to consider the question of whether different syntax options will be more helpful (or harmful) for writing a particular program or component, independently of various other considerations. Spoiler alert: most general-purpose code is best just written in s-expressions, possibly with some minor extra conveniences (such as the standard quote, quasiquote, unquote shorthands), while heavier syntax really comes into its own for domain-specific languages.

                                                                                  Compiling things into C in my opinion is holding a shotgun to your foot. C has a lot of undefined behavior, and it can get very hard to make the compiler not hit any of that.

                                                                                  This is quite similar to the problem of, well, writing things in C in the first place. Except that in this case the C shotgun is only an optional target, and it is perhaps easier to guard against undefined behavior within a few code generation templates than in the thousands of lines you need to write by hand to write a compiler in C. (Actually, keeping a template UB-free is surely harder, but fixing a bug in a template fixes all uses of that template, and you will simply have less code in C-target templates than in an entire compiler written in C.)
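                                                                                  As an illustration of the template point, here is a hedged sketch (Python emitting C; __builtin_add_overflow is a real GCC/Clang builtin, while runtime_trap is a hypothetical runtime helper the generated code would link against):

```python
# One code-generation template for signed addition. The template, not the
# thousands of emitted call sites, is where signed-overflow UB is handled.
ADD_TEMPLATE = """\
static int32_t {name}(int32_t a, int32_t b) {{
    int32_t result;
    if (__builtin_add_overflow(a, b, &result)) {{
        runtime_trap("integer overflow in {name}");  /* hypothetical helper */
    }}
    return result;
}}
"""

def emit_add(name):
    # Every addition in the input program lowers to a call to this function.
    return ADD_TEMPLATE.format(name=name)

print(emit_add("lang_add_i32"))
```

                                                                                  The template is the single place where the UB has to be reasoned about; auditing or fixing it once covers every call site the compiler will ever emit.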

                                                                                  1. 1

                                                                                    The given example is not possible to replicate in s-expressions without adding a variable. If my language does not have variables, then I cannot use Lisp’s parser for prototyping my language. The fact that the compiler adds in variables is an implementation detail and should be completely irrelevant to the user. You cannot move the language I gave to s-expressions without changing the semantics.

                                                                                    Racket does have a neat system for DSLs. That does make it different from other Lisps, which from what I’ve seen just make you shove everything into s-expressions. Yet I feel like your Lisp love is a little bit overreaching. S-expressions are really hard to parse for a human. They might be very easy for a computer, but the parenthesis sea isn’t easy to discern and requires quite a bit of tooling from the IDE to work comfortably with. I believe that there are better syntax conventions that are more suited to enabling code understanding and allowing code to be read more easily.

                                                                                    I feel like trying to guard against undefined behavior in C in an automated fashion is a lot more difficult than writing a minimal-effort compiler/interpreter that is able to compile the last compiler. Also, C assumes quite a few things, and if your language design doesn’t agree with them, then you will have a bad time dealing with that.

                                                                                    1. 3

                                                                                      The given example is not possible to replicate in s-expressions without adding a variable. If my language does not have variables, then I cannot use Lisp’s parser for prototyping my language.

                                                                                      This is a distinction without a serious difference. Perhaps you will actually be prototyping an intermediate language that has a certain feature you won’t make available in the outer language, or that lacks a certain feature that you will add to the outer language via syntax sugar.

                                                                                      But at a lower level, your graph language is already stored in some serialization format. The given example is not possible to serialize without adding a variable (IE some kind of identifier that can be referenced). Maybe you serialize it to something like:

                                                                                      {"nodes": [
                                                                                        {"id": 1, "text": "input x", "vertices": [2]},
                                                                                        {"id": 2, "text": "add", "vertices": [4, 5]},
                                                                                        ...
                                                                                      ]}
                                                                                      

                                                                                      You need some kind of ID for each node in the graph so other nodes can refer to them because you can’t nest the nodes cleanly as a tree. Now let’s squint our eyes a little:

                                                                                      (1 "input x" (2))
                                                                                      (2 "add" (4 5))
                                                                                      ...
                                                                                      

                                                                                      Lo, s-expressions!
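                                                                                      The squint can be made literal in a few lines (a Python sketch; the field names simply mirror the hypothetical JSON above):

```python
import json

serialized = """{"nodes": [
  {"id": 1, "text": "input x", "vertices": [2]},
  {"id": 2, "text": "add", "vertices": [4, 5]}
]}"""

def to_sexpr(node):
    # (id "text" (vertices ...)): the same information, s-expression shaped.
    verts = " ".join(str(v) for v in node["vertices"])
    return f'({node["id"]} "{node["text"]}" ({verts}))'

for node in json.loads(serialized)["nodes"]:
    print(to_sexpr(node))
# (1 "input x" (2))
# (2 "add" (4 5))
```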

                                                                                      S-expressions are really hard to parse for a human.

                                                                                      This is the quintessential complaint against lisp. And… it’s just wrong! I’ve never met someone who has seriously used a lisp (eg. for a job, or really anything more than just “I have to use this wacko language for some university course”) that has ever struggled with this. This is discussed in bewilderment at every lisp-related meetup – “Wow, I came in to it thinking the parentheses were a big problem, but quickly realized it was no problem!”, “We hire people with no lisp experience all the time, and the syntax is never the roadblock to learning Clojure.”, etc.

                                                                                      (On the other hand, students who grudgingly learn just enough to get by for a class frequently struggle with it. As a TA for a class like this, I saw it frequently.)

                                                                                      Yes, s-expressions are hard for humans to parse when they are formatted wrong (or minified). Poorly formatted (or minified!) Javascript, C, etc is harder for humans to parse! I promise you (or almost anyone) that if you learn the standard rules for Lisp formatting (there are just a couple simple rules, far fewer than for formatting Algol-like languages), then stick to those formatting rules for reading and writing lisp, all your lisp reading problems will go away.

                                                                                      (Well, all your lisp reading problems relative to reading other languages – you still have to know how to read, probably in English, it’s easier if you can see rather than read Braille, etc. Deeply nested expressions using lots of custom-defined functions are difficult in any language, because you have to untangle what all these different functions mean. But that’s true of any language.)

                                                                                      Beginning CS courses (and books, etc) tend to cover language syntax for a lot of time. What does x[y] mean? What about curly braces? What about curly braces in another context? Where does the semicolon go? Oh, semicolons aren’t necessary after braces for a block, but they are necessary after a struct declaration? And, heaven help us, how do you parse x + y && a + b? And this is all after literally more than a decade of repeated instruction in algebra notation, order of operations, etc.

                                                                                      University Lisp courses tend to assume “you’ll just pick it up”, or maybe spend a fraction of a lecture on syntax. If people had as much instruction in lisp syntax as they get with C syntax it would not be a problem.

                                                                                      Before you say I’m just ignorant and living in a lisp bubble, let me say that I truly do sympathize with the complaint of s-expressions being hard. The first time I used a lisp was for a university programming languages course. I hated it! I really, honestly struggled with the syntax. (Of course, I tried to format it like C.) It was bad enough that I badmouthed lisp for years after that. But then I realized: “Hey, thousands of people use this, love it, and not only claim the syntax is not a problem but actually a very strong benefit. I know people who have defended lisp to me after I made fun of it, and they aren’t unrelatable super geniuses. Maybe I should actually give it serious consideration rather than adopting popular antagonistic beliefs about it after a brief, poorly instructed experience with it?” When I finally sat down to seriously learn lisp and understand its weird syntax, it ended up taking less than a day to get pretty comfortable with it.

                                                                                      I’m not claiming that your (potential) negative experience with lisp matches mine, or that the ability to read and write lisp code isn’t something that you need to take some time to learn. But learning to read and write lisp is not that hard. I submit that it’s significantly easier than many things lots of programmers learn. I think it’s significantly easier than learning how to type, learning how to use vim, or learning bash or similar shell languages (actually learning the bulk of the syntax, how substitutions work, etc, not just how to run some simple commands). Thinking about it for a moment, I think it’s actually pretty similar in difficulty to learning to use a tiling window manager. Not that hard, and well worth it.

                                                                                      They might be very easy for a computer, but the parenthesis sea isn’t easy to discern and requires quite a bit of tooling from the IDE to work comfortably with.

                                                                                      It is true that manually formatting lisp code tends to be more tedious than manually formatting languages like Javascript. But lisp auto-indenters are relatively easy to write and widely available in text editors. Your favorite editor probably already has one. Auto-indentation is the single invaluable bit of editor/IDE tooling that’s required.

                                                                                      Of course, once you also start using an editor extension that automatically keeps your parentheses balanced, then start using something like Paredit (I prefer SmartParens) to get 95% of the way to structural editing, then editing other syntaxes starts to feel like a chore when you can effortlessly navigate and transform your code structurally, always preserving syntactic correctness. Admittedly Paredit and friends take some practice, but honestly you can master the essentials within a couple of hours, then just consciously make an effort to use it a little each work day over a short period so you remember it.

                                                                                      1. 1

                                                                                        I could totally write a parser that parses the text as I pasted and interprets it. Would it be easy? No. Would it be easy to use? No. But syntax is making a contract with the user, and I don’t want to make a contract that contains any kind of IDs. No non-graphical representation can do that. The fact that the graph itself might be stored as a JSON file with IDs is not relevant, as it is not part of the contract. Yes, you can theoretically express every data structure with s-expressions. You could just as well express every data structure with nested dictionaries or a bunch of bytes with pointers. It’s like being Turing complete: it means that you can do things, but not that it’s fit for all or any of them.

                                                                                        Parsing isn’t just reading the data that is written. It is also understanding what data is written. In Lisp there is no visual distinction between instructions and data, two things that usually don’t mix in general programming. Most other languages do make that distinction, because it is important for one to quickly see what is a simple data extraction, and what is processing. Most languages have distinct looks for separate ways to look up data. Most languages have distinct looks for separate ways to store data. S-expressions don’t leave any space for those distinctions.

                                                                                        The fact that you need to format the code to be readable isn’t good. Self-formatting syntax, like Python’s, doesn’t leave any chance for that. Yes, I know, Python syntax has some bad parts; no need to get into that here.

                                                                                        About learning: sometimes, powerful tools take a lot of time to learn how to use. But when you do know how to use such a tool, it can be tens of times more useful than a tool that’s simple to learn. Professional tools are worth the time it takes to learn them.

                                                                                        If you have to write programs to help you manage the number of parentheses just to write your code, maybe the syntax isn’t that fit for humans. As I said: great for machine consumption, not that good for humans.

                                                                                        1. 3

                                                                                          It’s unfortunate that this is turning into another Lisp/s-expression syntax flamewar. It looks like you already decided that you don’t even want a textual representation for this language. This means that the question of using s-expressions or not is completely besides the point.

                                                                                          My original point was that you can punt on the “syntax” initially (which in your case would probably be some sort of visual diagram editor), and go straight to the usually more interesting parts. In practice, for your case that would mean simply defining a format for easy representation, whether that’s using s-expressions or JSON or a serialized Python dict with intermediate IDs.

                                                                                          All this means you don’t even have to spend time building a parser; you can go straight to the part where you are trying out compiler techniques to see if this is even a viable thing to do. It wasn’t about whether you find this useful or readable for the end-user. Remember, Lisp itself was originally supposed to get a non-parenthesized surface syntax with M-expressions; the syntax we currently use was no more than an intermediate syntax which made thinking about language semantics easier.

                                                                                          1. 1

                                                                                            It does have a textual representation - the ASCII art I pasted. You can write a parser to parse that. It won’t be easy, and defining the syntax will probably be a full research paper because all the current syntax definition languages only work in a single linear dimension, while this definition would have to work in 2d space. This would actually be a pretty interesting problem to take on now that I think about it.

                                                                                            Also, m-expressions actually look pretty good, and are a definite improvement from the s-expression syntax in my eyes. If only lisp actually had them implemented from the start…

                                                                                            1. 1

                                                                                              It’s unfortunate that this is turning into another Lisp/s-expression syntax flamewar.

                                                                                              Sorry about that. Somehow last night I was just in a mood to keep replying.

                                                                                            2. 2

                                                                                              About learning: sometimes, powerful tools take a lot of time to learn how to use. But when you do know how to use such a tool, it can be tens of times more useful than a tool that’s simple to learn. Professional tools are worth the time it takes to learn them.

                                                                                              I 100% agree.

                                                                                              Of course, I think s-expressions are one of those powerful 10x professional tools (for language prototyping, or for more general programming when paired with macros).

                                                                                              (10x is probably an exaggeration, or you only get 10x for certain applications that are a minority of what you do, for s-expressions and each of those other things listed. The most productive programmer I know doesn’t use vim, barely uses any features of emacs, doesn’t use a tiling window manager, and doesn’t even type properly. Of course, I still think all of those things are worth learning and make me more efficient than I would otherwise be.)

                                                                                              I could totally write a parser that parses the text as I pasted and interprets it. Would it be easy? No. Would it be easy to use? No. But syntax is making a contract with the user, and I don’t want to make a contract that contains any kind of IDs.

                                                                                              I thought we were discussing building a first prototype of a new programming language, not making contracts to users about an esolang.

                                                                                              The fact that you need to format the code to be readable isn’t good.

                                                                                              Every language needs formatting to be readable. In general, every programmer either lets a program (their editor) automatically format (indent, mostly) their code, or they do it manually. We give it little thought and do it nearly unconsciously in C and in Lisp.

                                                                                              Do you like to read code like this[1]:

                                                                                              struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
                                                                                              __acquires(rq->lock)
                                                                                              {
                                                                                              struct rq *rq;
                                                                                              lockdep_assert_held(&p->pi_lock);
                                                                                              for (;;) {
                                                                                              rq = task_rq(p);
                                                                                              raw_spin_lock(&rq->lock);
                                                                                              if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
                                                                                              rq_pin_lock(rq, rf);
                                                                                              return rq;
                                                                                              }
                                                                                              raw_spin_unlock(&rq->lock);
                                                                                              while (unlikely(task_on_rq_migrating(p)))
                                                                                              cpu_relax();
                                                                                              }
                                                                                              }
                                                                                              

                                                                                              Nobody writes code like that; everybody indents their code. Furthermore, people indent their code in standard ways, not by willy-nilly adding arbitrary amounts of indentation to whatever line: they follow logical rules.

                                                                                              Learning the standard way to indent (and use indentation for reading) Lisp code is no different than learning the standard way to indent Algol (C, Javascript, Python, …) code. It’s just a different set of rules that programmers follow practically unconsciously once they’ve internalized them.

                                                                                              Self formatting syntax, like for example Python, doesn’t leave any chances for that.

                                                                                              Python just makes some instances of poor formatting a syntax error. You still have to format it!
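                                                                                              For instance (a minimal demonstration using Python’s own compiler; the snippet is mine, not from this discussion):

```python
# Python rejects some badly formatted code outright: a body that is not
# indented raises IndentationError (a subclass of SyntaxError).
badly_formatted = "if x:\nprint(x)\n"

try:
    compile(badly_formatted, "<example>", "exec")
except IndentationError as err:
    print("rejected:", err.msg)
```

                                                                                              Other poor formatting (strange spacing within a line, gratuitous blank lines) still compiles fine, so you still have to format the rest yourself.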

                                                                                              If you have to write programs to help you manage the amount of parenthesis to write your code, maybe the syntax isn’t that fit for humans. As I said, great for machine consumption, not that good for humans.

                                                                                              1: The only program you “need” is the auto-indenter. This simple program was written decades ago. The other programs just make you more efficient, i.e. they are unnecessary but useful “professional 10x tools” that you can learn, similar to Vim. 2: You need a program (text editor) to write <C, Javascript, Python, whatever> code? It must not be fit for humans. You need a program (compiler, interpreter) to be sure your types (or even your syntax) are correct? It must not be fit for humans. You need a program to actually run your program? Not fit for humans. You need a program to send instant messages? Instant messaging must not be fit for humans. This is a silly argument. You need programs to do anything with a computer, and it’s just as easy to argue that some program (or part of a program) in some other PL toolchain is a boondoggle that makes it somehow unfit for humans. Having a machine check your parentheses is no different than having a machine check that your program is correct in other ways, and every programmer uses such programs in an edit/run/debug loop.

                                                                                              Again, these arguments about lisp syntax being difficult or unsuited for humans are only made by people who have never seriously used a lisp. Thousands of normal, everyday humans read and write lisp code quite naturally. We read and write it on white boards, we read and write it in any and every text editor, we read and write it in REPL sessions, we read and write it in IDEs that draw fancy arrows. We make funny music videos about reading and writing lisp[2]. Children read and write lisp. Humans have been quite capably reading and writing Lisp since before 1960. Lisp is very suited to human use. Reading arguments about how lisp is not suited for humans is like reading arguments that human flight is impossible, or how you can’t make a wind powered vehicle that goes directly down wind faster than the wind. People have made many polished arguments to prove those points. But in the end you don’t need to wade through lots of fancy arguments to see that they are demonstrably false – you can just look at all the people that are doing these things.

                                                                                              [1] This is a slight modification of GPL-2 licensed code available in the Linux kernel in the kernel/sched/core.c file.

                                                                                              [2] This music video deserves to be shoehorned into programming conversations more frequently than it is. It’s great. https://www.youtube.com/watch?v=HM1Zb3xmvMc

                                                                                              1. 1

                                                                                                Every language needs formatting to be readable. In general, every programmer either lets a program (their editor) automatically format (indent, mostly) their code, or they do it manually. We give it little thought and do it nearly unconsciously in C and in Lisp. Do you like to read code like this[1]:

                                                                                                No, I do not. I also did not say that C syntax is good for this reason; I said it about Python. Python is a huge syntactical leap from C-style languages (though Python syntax isn’t perfect, and I particularly don’t like that you can mix tabs and spaces, which can introduce a bunch of ambiguity). Languages with the off-side rule are indented correctly by default. So I read code like this[1]:

                                                                                                if get_load_dotenv(load_dotenv):
                                                                                                    cli.load_dotenv()
                                                                                                
                                                                                                    # if set, let env vars override previous values
                                                                                                    if "FLASK_ENV" in os.environ:
                                                                                                        self.env = get_env()
                                                                                                        self.debug = get_debug_flag()
                                                                                                    elif "FLASK_DEBUG" in os.environ:
                                                                                                        self.debug = get_debug_flag()
                                                                                                

                                                                                                Yes, you can remove some whitespace, but removing the whitespace between operators doesn’t hurt readability as much as the lack of indentation does. This is not provided by languages that don’t do that, so programmers have to rely on conventions, which is not always possible. In Lisp this problem is exacerbated by the fact that there is only a single syntactical character, blurring the line between control flow and method calls even more. The Scheme community recognized this and has a very good write-up of the problems in one of the proposals to add indentation-based syntax to Scheme.

                                                                                                My dad still has his old notebook with hand-written FORTRAN code from his university times. It’s not too rare for me to scribble some code in my notebook when I don’t have a computer around (not too often an occasion with the pandemic around). I can read, understand, and quickly check Python, C, and FORTRAN for syntactical correctness on paper. With Lisp, I have to resort to the long and error-prone process of counting the parentheses. To be completely honest, I don’t understand how Lisp survived in those times, when access to computers wasn’t as universal and most programmers didn’t have their own computers and wrote code by hand.

                                                                                                [1]: This code is from Flask src/flask/app.py file, licensed under BSD-3 clause license.

                                                                              1. 11

                                                                                It would be nice if the ACM could make its library completely open access (i.e. even after the pandemic). Is there a reason the ACM should charge access fees for its collection?

                                                                                1. 5

                                                                                  The reason they’ve stated in past discussions is that a significant amount of funding comes from institutional ACM DL subscriptions, which pay for the DL and generate some surplus to fund ACM itself. They’re worried everyone will cancel if it becomes open access.

                                                                                  A lot of individual members aren’t very happy about it, and my guess is it’s inevitable that it will eventually become open access. But it will require some changes to the organization’s finances. Either more and/or more expensive individual memberships, or more fundraising from elsewhere (big donations? grants?), or finding a way to cut costs and run the operation on less money.

                                                                                  1. 2

                                                                                    The best information I’ve seen about this is here: https://sigchi.org/2020/01/sigchi-open-access/

                                                                                    The tl;dr is that publication is apparently their most profitable activity, but they use the proceeds from publication to subsidize other activities. So open access comes down to a choice between cutting other activities that aren’t self-sustaining or obtaining other revenue streams.

                                                                                    I think they could probably do a little bit of both. But I’m biased in that I care more about the publications than their other activities and I think open access is the only conscionable path forward, especially for publicly funded research. I hope they go full open access as soon as possible.

                                                                                    1. 2

                                                                                      The ACM talks about the community in its values section. If the ACM is truly representative of the community, shouldn’t it rely on the community to support its activities (through membership fees) rather than forcing a paywall? And if the community doesn’t care enough about its other activities to support them, is it truly representative of the community?

                                                                                1. 18

                                                                                  These are interesting articles, and they highlight real issues about developing software for free software operating systems. However, I strongly disagree with almost all of the conclusions and many of the premises in this blog post series. In particular I think the author very seriously downplays the benefits that distributions bring to the free software world. Independent packaging by distributions is an essential balancing check in the user’s favor in several ways, and adds a lot of trust, robustness, and innovation to the free software world.

                                                                                  Independent packaging means that there are extra, third-party eyes on the source, even if this is often only to a shallow degree. Also, the fact that distributions run their own build infrastructure increases trust in binary distribution. Without trusted third-party machinery building packages from source, there would be much less reason to trust that opaque binaries you download actually correspond to the source you see. Reproducible builds are the logical conclusion of this line of work, and I would stress that the push for reproducible builds was done largely by distributions rather than by application programmers who wanted to target a platform.

                                                                                  The post decries the fact that distribution packagers change upstream defaults, but I want my distribution to patch bad ideas, or even user-hostile anti-features, out of the software they deliver me. An extra layer of vetting and configuring of software is strongly in the interests of end-users.

                                                                                  Additionally, distributions packaging the software in various different environments helps software to be more robust, so that it can more easily be built for new, yet unknown environments in the future rather than being weighed down by deep assumptions about its current target.

                                                                                  Independent packaging by distributions provides different options to different users with different needs. There are good reasons why some people want to use distributions that hold software back for months or years. And there are good reasons why some people use distributions that ship the freshest software.

                                                                                  Importantly, the ability of different OS components to be developed independently and redundantly gives us the freedom to explore new and different directions. The author’s unified vision leaves less room for projects like NixOS and Guix packaging, which are a much better, brighter vision of the future than Flatpak, for example.

                                                                                  Ultimately, the author and I clearly value very different things when it comes to computing, and have very different visions of what “healthy” computing platforms look like. But I think he’s really overlooking a lot of strengths that are unique to the free software world of distributions while wishing it were more like the big guys. The big guys have a lot of things we lack, that’s for certain. But the strengths provided by independent distribution, such as auditing, configuring, and changing release cycles by third-party packagers working as agents for the user, are strengths that the big players can’t have, because they are some of the very core strengths provided by free software.

                                                                                  1. 1

                                                                                    Thanks for this. Very well said.

                                                                                  1. 2

                                                                                    The main things that I note about this solution are:

                                                                                    1. At each point you run into a left-recursive parse you have to keep recursing until your recursion depth reaches the length limit. So you get an O(input-length) extra complexity factor.
                                                                                    2. You need to know the size of the input. So you can’t use it for streaming parsers.

                                                                                    It’s a cool and simple idea, but unless your inputs will all be short, on its own it’s not very usable because of the extra O(l) factor.
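                                                                                    For readers who want the idea in code, here is my own toy sketch of the length-bounded trick (the grammar E -> E '+' 'n' | 'n' and all names are hypothetical, not from the article):

```python
# Depth-bounded left recursion: a parse tree over n input characters can
# nest left-recursively at most n deep, so cutting recursion off at
# len(s) terminates while still finding the parses that exist.
# Hypothetical grammar: E -> E '+' 'n' | 'n'

def parse_E(s, i=0, depth=0):
    """Return the end index of the longest E starting at i, or None."""
    if depth > len(s):                 # the length bound stops the infinite loop
        return None
    # Left-recursive alternate: E '+' 'n'
    j = parse_E(s, i, depth + 1)
    if j is not None and s[j:j + 2] == '+n':
        return j + 2
    # Base alternate: 'n'
    if s[i:i + 1] == 'n':
        return i + 1
    return None
```

                                                                                    E.g. parse_E("n+n+n") returns 5 (the whole string) and parse_E("x") returns None; the chain of depth + 1 calls at every left-recursive entry point is exactly where the extra O(input-length) factor shows up.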

                                                                                    Shameless self-promotion regarding methods of resolving left-recursion:

                                                                                    I’m actually currently writing a paper about a novel way of handling left-recursion and an expressive parsing system built with it: parsing with delimited continuations. The implementation is here if you’re curious, though it doesn’t have documentation of any kind yet aside from my in-progress academic paper draft. But the basic idea is that when you are about to enter a left-recursive infinite loop… just don’t! Instead, capture the continuation of the current parser, set it aside, and choose a different alternate to work on. Once you have a result from some other alternate, you can feed it to the old continuation you captured and make progress on it. When you get to a point where you can’t make progress that resolves a left-recursion dependency cycle, the system can just feed it a failure result that stops the recursion.

                                                                                    The full system is basically an extended GLL, but instead of the primitives being literals, alternation, and concatenation, the primitives are alternation and procedures that take the input stream as an argument. Using those you can build up your usual suspects of parser combinators, a BNF interface, etc. But since you can use arbitrary parsing procedures you can have productions that call pre-existing parsers built with other frameworks, dynamically extend your parser at run-time for ad-hoc language composition, filter away ambiguity, do data-dependent and other kinds of non-context-free parsing, etc. It’s basically the bee’s knees for expressive parsing.

                                                                                    On the other hand, my implementation is currently, uh, a bit too slow for any practical use. I’m hoping I can optimize it a bit more – with a factor of 10 improvement it’ll be worthwhile for Racket language projects where you care enough about parser composability and extensibility that you don’t mind the parser taking as much time as the macro expander. If I could squeeze out a factor of 100 I think it would be an easy sell to the Racket community as a go-to parser for writing new languages, since at that point it would only be a small fraction of the overall compilation time. I think 10 is possible, but I’m not holding my breath for 100 without a major re-design of my implementation.
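                                                                                    As a deliberately tiny caricature of the “don’t recurse; set the continuation aside” move (my own Python simplification with a hypothetical grammar, not the actual delimited-continuation machinery or the Racket implementation):

```python
# Caricature of deferring a left-recursive alternate. Instead of
# recursing into E, the alternate  E -> E '+' 'n'  is modeled as a
# "continuation" that waits to be fed E results produced elsewhere.
# Hypothetical grammar: E -> E '+' 'n' | 'n'

def parse_E(s):
    """Return the set of prefix lengths of s that parse as an E."""
    results = set()

    def continuation(end):
        # Resume  E -> E '+' 'n'  given an already-parsed E of length end.
        if s[end:end + 2] == '+n':
            emit(end + 2)

    def emit(end):
        if end not in results:
            results.add(end)
            continuation(end)  # feed each new result to the waiting alternate

    # Non-left-recursive alternate:  E -> 'n'
    if s[:1] == 'n':
        emit(1)
    return results
```

                                                                                    Here parse_E("n+n+n") returns {1, 3, 5}: every prefix length that parses as an E, with the left-recursive alternate never looping.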

                                                                                    If you want the full details, you can email me for a (still fairly rough) paper draft. Or you can simply hope it gets accepted for OOPSLA 2020 ;).

                                                                                    1. 2

                                                                                      Thank you for the insights, and indeed, you are right about the O(l) factor, and the fact that this can not be used for a streaming parser.

                                                                                      I would love to get a draft (sent a message to your account with my email).

                                                                                      I had a simple GLL-like implementation in Python here. It doesn’t contain any noteworthy ideas other than prioritizing the alternatives based on how far along they are in parsing and how deeply nested they are (so left-recursive items have the least priority).

                                                                                    1. 11

                                                                                      This starts to hint towards the idea of Semantic Highlighting. Examples of semantic highlighting include having each variable as a different color, so if the one you were working with for a while suddenly is a different color, you made a typo somewhere.

                                                                                      Or this blog post, which introduces the idea of coloring scopes differently.

                                                                                      1. 8

                                                                                        JetBrains IDEs (like CLion) color different variables depending on if they’re local or not, and the Rust plugin can be adjusted to color mutable ones different than non-mutable ones.

                                                                                        Rainbow brackets is an amazing tool to have braces and parentheses cycle colors based on scope. The other feature I’ve found incredibly helpful, and which I miss now that I’m not on a Mac, is the Xcode option to dim everything outside of your current scope.

                                                                                        1. 5

                                                                                          JetBrains IDEs even have “semantic highlighting”, where each variable name gets a different color, so you can more easily track specific variables.

                                                                                          It’s good for tracking variables and catching typos, at least

                                                                                        2. 6

                                                                                          In practice, semantic highlighting via the rainbow-identifiers and rainbow-parens packages in emacs is the best and most useful syntax highlighting scheme I’ve used.

                                                                                          However, I can’t see a discussion about syntax highlighting being backwards and semantic highlighting without remembering some excellent academic work on the subject: WysiScript: Programming via direct syntax highlighting by Gunther and Kell in SIGBOVIK 2017 (page 123 of the PDF).

                                                                                          1. 4

                                                                                            It’s a nice idea but it’s something I’d want as a mode, not something I’d want all the time. It’s very distracting.

                                                                                          1. 9

                                                                                            To preface, I really care about autonomy, self-direction, and control with my devices. So obviously I’m a big fan of freedom-respecting software. This attitude, uh, colors the following response a bit.

                                                                                            Current

                                                                                            I use an LG G5 with Lineage OS (and no Google anything). I can’t recommend my setup if you need “official” support of anything. But for what it’s worth, it’s the least bad setup I’ve found so far, and I decided it’s sufficiently on-topic to be interesting to some readers.

                                                                                            It’s an old phone, but it’s been my go-to for some time to recommend to others who want something cheap. If you want to use the stock ROM, it’s probably getting old. I worry about the security; the kernel is pretty old. But I also don’t want to ride a $500-1000 upgrade train every year, so for now I just live on the edge like that[1]. But because it has no software controlled by Google and the likes (beyond Google’s inescapable design and steering of Android development), it has no built-in software to spy on me, hijack my brain via the subconscious assault that is advertising, or maximize engagement with irrelevant notifications. But then again, I also use a web browser.

                                                                                            I think its hardware is great. It’s physically a good size, it has enough RAM, a good enough processor, the screen is fine, the cameras are good enough for my purposes, you can (easily) replace the battery, it has an SD card slot that supports the latest standard (which supports up to 2TB, if I recall correctly), and it has a USB type C port that you can use with a dongle to use USB devices (keyboard, mouse, etc), HDMI, etc. I’m sure newer hardware is much better, but honestly the G5 is a great phone.

                                                                                            Software-wise, it’s a disappointment, but I feel that way about all phones available today. It’s no longer officially supported by LineageOS, but a few people have made some updates and it works with the latest version of LineageOS if you build it yourself (or download an image from some forum, if you like to roll the dice that way). A few things seem a little wonky on my device currently, but it’s not so bad[2]. My biggest regret is that HDMI only seems to work on the official ROM.

                                                                                            I use F-Droid to get Android apps, and what I really need is there. OpenStreetMap is good enough for navigation, there are serviceable music players, and there are apps for reading text in any format I need. Any commercial services that require an app I just do without. I don’t feel like I’m missing much so far.

                                                                                            I use a chroot environment with Arch Linux on it to be able to use various other pieces of software I rely on (they aren’t available through Termux, or I might just use it). I like to use a lot of custom scripts and such, and doing that in a chroot environment is really the only reasonable way forward for me right now. The boilerplate and tooling required to make a GUI program for Android is intolerable to me[3], although I would actually like to create some simple GUIs on a phone where I don’t usually care for them on a desktop with a keyboard.

                                                                                            I somewhat recently replaced my old G5 with a new one after I broke the camera glass somehow. I considered researching what newer phone to get, but I decided that getting a cheap G5 on eBay was the simplest thing to do, as I knew what I would be getting (e.g. that it’s rootable and how to root it), etc. My (hopeful) intention is to hold out with my current phone for a year or two and then get a Librem phone or Pinephone with sufficiently working software.

                                                                                            Future, hopefully

                                                                                            This wasn’t really requested, but I decided to write about it anyway.

                                                                                            Ultimately I want to run the same OS (NixOS, for the foreseeable future) on all my computers. I want to be able to easily write custom software (“scripts”, mostly) that will work on all my computers, using whatever language I want. And I want to have all of my custom configuration, including the list of installed programs, checked into a git repository and easily, automatically reproducible on a new machine. I have all of this already on all of my computers but my phone, and I find it, uh, incredibly frustrating that I can’t do it on my phone. Particularly, I see the smartphone as the most personal computer yet, and for most people it’s their primary computing platform, and by far the most important in many ways.

                                                                                            Once I can use my phone as a first-class computer, I intend to start using it as my primary “human interface” computer. In other words, not only will it be my go-to device that fits in my pocket and goes everywhere with me, but I’ll dock it to a laptop shell or desktop for serious work. Modern smartphones are sufficiently powerful for the majority of my computing use cases, and ssh to a more powerful remote server is always there.

                                                                                            Laptop shells like I want are not currently on the market as far as I’m aware. However, all the tech is there to build a DIY version, and I want a good ergonomic keyboard anyway. So I’ll probably find a way to strap a USB-powered monitor to a folding arm attached to a Kinesis keyboard, with a USB dongle and battery packs attached. I might use the phone display as a trackpad, mounted in the middle space of the Kinesis.

                                                                                            Docking as a desktop already works quite well. A couple of years ago my laptop SSD died, and I decided to give my phone a try docked as a desktop to do a day’s work while waiting for the replacement. Android is a terrible OS for desktop work, but the hardware was sufficiently capable. More specifically, using local files or compiling was a little slow (file access was slow but not too bad, compilation was a little painful), but for just writing or working remotely it was great.

                                                                                            [1] - I’m a grad student, so I’m not rolling in money like many professional programmers are. And also on principle I don’t want to pay hundreds to thousands of dollars for hardware that’s locked down to (mostly proprietary) garbage software[1.5] and that’s built and “maintained” for intentional, rapid obsolescence. A lightly used G5 sells on Ebay these days for $60 or so.

                                                                                            [1.5] - I’ve already toned down what I’m calling the software available on phones a few times. To be more fair, there are various measures by which a lot of today’s phone software is good. But I hate it.

                                                                                            [2] - OK, most people would probably not accept the level of wonkiness. Brightness detection doesn’t seem to work, which is mostly annoying when the touch screen doesn’t turn off when I push the phone to my face while on a call, but I’ve learned to turn the screen off while taking calls. Auto-rotation doesn’t work. But it gave me an excuse to write manual rotation scripts, which I actually wanted anyway because auto-rotation never does what I want. And HDMI doesn’t work, but I suspect it’ll never work on anything but the official ROM due to some driver nonsense that nobody will ever bother to reverse-engineer. As far as I can tell, everything else works fine.

                                                                                            [3] - I’ve created a couple of Android packages, and I never want to do it again. Or even update and re-build the ones I have made. Building software for smartphones is just insanity. I routinely write software for my other computers by writing one single file, and maybe compiling it. Building anything for Android is a relatively monumental task.

                                                                                            1. 2

                                                                                              So is there really no higher level semantic markup that compiles to TeX? That seems like the obvious solution here.

                                                                                              You can reuse the pretty-printing while also giving other tools a chance.

                                                                                              1. 3

                                                                                                Racket’s Scribble documentation system can be used in this way. Scribble can compile to TeX or HTML (and generally it’s possible to add other backends). The article is primarily about math notation, which Scribble doesn’t really do well (at least not out of the box). But Scribble does understand the structure of Racket code, and automatically generates hyperlinks in code samples. It is fully programmable (with a sane programming language, unlike LaTeX), and you can do things like literate programming with it. Someone motivated could definitely write a module that lets you write math in a reasonable manner that can be manipulated in a structured way (including compiled or interpreted) and also typeset by generating the appropriate TeX.

                                                                                                1. 1

                                                                                                  the link seems to point to this lobste.rs thread, what Scribble article did you mean to reference? thx

                                                                                                  1. 3

                                                                                                    Huh, I must have messed up my copy/paste without noticing. I meant to reference the scribble docs: https://docs.racket-lang.org/scribble/index.html

                                                                                                2. 1

                                                                                                  Pandoc allows you to convert other things to TeX but I haven’t given it a shot on anything really.

                                                                                                  1. 1

                                                                                                    Pandoc works well. The only slight friction is that if you need to embed custom LaTeX in the document then there’s no particularly nice way that I’ve found of still having the document produce good HTML (and vice-versa if you need to embed custom HTML for the HTML view).

                                                                                                  1. 11

                                                                                                    Shameless self-promotion: I use Rash, of course!

                                                                                                    It has all the power of Racket, and you can write normal (or abnormal) Racket code mixed with shell-style code. It has a line-oriented syntax that is nice for interactions (no superfluous parens while interacting with your shell), but you can drop s-expressions in anywhere, and also escape back into line-mode. Just like with Bash and friends you can copy/paste your interactions into a file to instantly have a script. But as you edit your script and generalize it, you have a REAL programming language with data structures, sane error handling, libraries, etc.

                                                                                                    Interactive completion leaves a lot to be desired, though. Some day I’ll improve that.

                                                                                                    1. 1
                                                                                                      >>> from math import pi
                                                                                                      >>> pi
                                                                                                      3.141592653589793
                                                                                                      

                                                                                                      Something doesn’t seem quite right when variables are evaluated as-is without any sigil prefix (e.g. $pi). Seems like you would often accidentally reference variables and methods. If I have an executable in my $PATH at /usr/bin/pi, what happens (or unexpectedly doesn’t happen)?

                                                                                                      1. 3

                                                                                                        Well, you just monkeypatched your pi (-:

                                                                                                        >>> from math import pi as ls
                                                                                                        >>> ls
                                                                                                        3.141592653589793
                                                                                                        >>>
                                                                                                        
                                                                                                        1. 6

                                                                                                          To take it a step further, we see what gives me an uneasy feeling about these kinds of shells:

                                                                                                          >>> which ls
                                                                                                          /nix/store/x6a3r9rsazlildaxzqay73scy6nv1inz-coreutils-8.31/bin/ls
                                                                                                          >>> ls
                                                                                                          3.141592653589793
                                                                                                          

                                                                                                          (Not disparaging the idea; it’s interesting—I can just never keep myself using these things because “I need which to not lie to me”, etc.)

                                                                                                          1. 2

                                                                                                            For what it’s worth, the ambiguity isn’t necessarily an inherent problem with shells as DSLs in a general-purpose language. In Rash (my shell in Racket) there are different pipeline operators that are explicit about taking shell commands or functions – e.g. the | operator takes a shell command (or user-defined alias) on its right-hand side, while |> and |>> take Racket functions. Of course they’re all just macros, so you can define custom ones with any behavior, including convenient but ambiguous versions.

                                                                                                            1. 2

                                                                                                              Yes :-) I guess this is, to some extent, a matter of taste. In the case of daudin it’s just inheriting the behaviour of Python, and I wanted to keep as close to the language as possible (the %cd etc. commands make me cringe a bit, but they’re all easily accessible via Python). You can do all sorts of alarming things in Python, deliberately or unintentionally (e.g., if you use id as a variable name you lose access to the built-in function of that name). Strictly speaking it’s a potential minefield, but in practice (at least for me, and I think that goes for many professional Python programmers) it’s not an issue. I guess one could replace things like sys with versions that raise exceptions if you try to modify their attributes, but AFAIK no one does.

                                                                                                              BTW, I have a similar feeling about Python’s numpy package. It feels cryptic and magical because it hides a lot of functionality behind what look like regular Python operators. In general I like Python’s approach to “magic” methods, but it always feels to me like numpy has pushed it too far (though I’m an irregular user of numpy and suppose I would quickly have a different feeling if I used it more).

                                                                                                              Sorry for so many words!
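                                                                                                              The id case is easy to demonstrate in plain Python. A minimal sketch (the shadowed/restored names are just for illustration) of how binding a name like id – or ls, in a Python-based shell – hides the original:

```python
# Python resolves names before builtins, so binding a name such as `id`
# (or, in a Python-based shell, `ls`) shadows the original in that scope.
id = "user-42"               # `id` now refers to a string, not the builtin
shadowed = not callable(id)  # True: the builtin function is hidden

del id                       # removing the binding uncovers the builtin again
restored = callable(id)      # True: name lookup falls back to builtins
print(shadowed, restored)    # True True
```

Note that the shadowing is purely lexical bookkeeping: deleting the module-level binding is enough to get the builtin back.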

                                                                                                        1. 2

                                                                                                          While I haven’t yet gotten around to implementing the same features in my rash prompt, in Zsh I have a somewhat unique prompt.

                                                                                                          The most unique feature is path coloring based on ownership and permissions. In general my philosophy on prompts is that I want all info displayed that may be relevant. To that end, I display git info when in a git repo (branch name with options for coloring the name based on regexp, whether or not there are uncommitted changes or changes in submodules, and the number of commits ahead/behind upstream), I check environment variables and display the hostname when inside ssh or tmux, I check my username and display it if it is not the username I usually use, I display the return code of the previous command if it was not 0, etc.

                                                                                                          So far for Rash I’ve really only implemented getting git info (which is frequently the info I care most about), but I plan to add libraries to make a lot more situational info available, with timeouts everywhere so my shell doesn’t lag in big repositories or remote file systems.

                                                                                                          1. 1

                                                                                                            I really wish there would be a generic scheme R7RS implementation, instead of depending on Racket.

                                                                                                            1. 1

                                                                                                              Have you looked at Chibi? It’s pretty good, and has a decent amount of useful libraries bundled.

                                                                                                              1. 1

                                                                                                                but why not depend on racket? after all, the primary reason racket broke away from its scheme roots was they felt they could design a better language as an evolution of scheme.

                                                                                                                1. 2

                                                                                                                  The latest release is discouraging to me: “the use of single-flonums appears to be rare”.

                                                                                                                2. 1

                                                                                                                  Well there’s Chez Scheme, which the latest version of racket is built on. I believe there’s an R7RS implementation floating around for it.

                                                                                                                  1. 1

                                                                                                                    I guess the question is really about how much of the code depends on some Racket specific features.

                                                                                                                    1. 4

                                                                                                                      Author here. Rash does rely on several Racket-specific features. I don’t recall any standard Scheme way to launch subprocesses (though I’ll be happy to be proven wrong). Supposing there is one, you could make a library to do the subprocess pipeline at the core of Rash. This allows things like (run-subprocess-pipeline '(ls -l) '(grep foo)). But the pipeline operator macro stuff and the line macro stuff all use Racket macro magic:

                                                                                                                      • Most notably it needs syntax-local-value for macros to communicate (see “Macros that Work Together”).
                                                                                                                      • Rash relies heavily on syntax-parse, which is perhaps not strictly necessary but it would be a frustrating pain to implement directly with syntax-case.
                                                                                                                      • Also Rash uses things like syntax parameters, and some pipeline operators use local-expand.
                                                                                                                      • One fork of the repo (not yet in master) includes binding constructs you can use in the middle of a pipeline which require new macro facilities (I’m waiting for the APIs around it to stabilize before I put it in master).
                                                                                                                      • And of course Rash uses Racket’s #lang facility.

                                                                                                                      Standard Scheme is great, but a Rash-like language in standard Scheme would be a lot more work and would lack a lot of Rash’s features. Racket’s macro system (i.e. the hygiene algorithm, the module system, syntax-local-value, syntax-parse, #lang, local-expand, and so on) really makes more ambitious DSLs possible, realistic, and interoperable.
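                                                                                                                      For comparison, the plumbing a call like (run-subprocess-pipeline '(ls -l) '(grep foo)) has to set up can be sketched in Python (not Racket – just to show the OS-level mechanics; the printf/grep stages are placeholder commands): each stage is a real process connected to the next by a pipe.

```python
import subprocess

# Stage 1 produces some lines; stage 2 filters them. The two run as
# concurrent OS processes connected by a pipe, like a shell pipeline.
p1 = subprocess.Popen(["printf", "foo\nbar\nfood\n"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["grep", "foo"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()            # drop our copy so p2 sees EOF when p1 exits
out, _ = p2.communicate()    # collect the final stage's output
p1.wait()
print(out.decode())          # only the lines containing "foo"
```

A shell DSL then has to layer syntax, redirection, and error handling over exactly this kind of process wiring.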

                                                                                                                      1. 1

                                                                                                                        I’m not sure how widely it’s implemented but there is SRFI-170 which aims to provide POSIX compatibility, which includes process spawning, but it doesn’t seem to specify the behavior of pipes, and comments on the difficulty of file descriptors vs ports. That’s likely the non-portable angle here.

                                                                                                                        However, a truly motivated individual could use cond-expand, from R7RS, to craft an implementation of pipes/ports for each supported implementation. But … based on the little I’ve followed about Rash, it’s easy to underestimate the amount of work to get this right, I think.

                                                                                                                1. 5

                                                                                                                  the language must avoid making it too easy to “silo” into a custom DSL that other programmers must learn to work on your program

                                                                                                                  Why? Why are DSLs such a bad thing? Don’t they help mitigate the “feature-grafting” you were bemoaning?

                                                                                                                  This includes a good debugger (bonus points for what Common Lisp does by not unwinding the stack!)

                                                                                                                  Anyone got a link? I’ve never heard about this before.

                                                                                                                  The inclusion of some kind of Web framework (similar to net/http in scale and simplicity) and image processing in the standard library would help with initial adoption.

                                                                                                                  This reminds me of Racket!

                                                                                                                  1. 4

                                                                                                                    I clarified my point regarding DSLs a little elsewhere in this comment thread. Yes, they do help prevent “bolt-on syndrome,” but at the same time they decrease interoperability throughout the stack by encouraging writing towers of incompatible abstractions. I think it hurts CL’s library ecosystem (but that could also have to do with its lack of “leadership,” which isn’t an issue for Racket or CHICKEN).

                                                                                                                    From A Road to Common Lisp:

                                                                                                                    In Common Lisp you can certainly choose to panic on or ignore errors, but there’s a better way to work. When an error is signaled in Common Lisp, it doesn’t unwind the stack. The Lisp process will pause execution at that point and open a window in your editor showing you the stack trace. Your warrior’s sword is hovering over the monster, waiting for you. At this point you can communicate with the running process at the REPL to see what’s going on. You can examine variables in the stack, or even run any arbitrary code you want.

                                                                                                                    Once you figure out the problem (“Oh, I see, the calculate-armor-percentage function returns 0 if a shielding spell ran out during the same frame”) you can fix the code, recompile the problematic function, and restart the execution of that function (or any other one!) in the call stack! Your warrior’s sword lands, and you move back to what you were doing before.

                                                                                                                    You don’t have to track down the bug from just a stack trace, like a detective trying to piece together what happened by the blood stains on the wall. You can examine the crime as it’s happening and intervene to save the victim. It’s like if you could run your code in a debugger with a breakpoint at every single line that only activates if something goes wrong!

                                                                                                                    1. 2

                                                                                                                      In Common Lisp you can certainly choose to panic on or ignore errors, but there’s a better way to work. When an error is signaled in Common Lisp, it doesn’t unwind the stack.

                                                                                                                      I’ve read about this before and understand why it would be a great feature in production software – almost like a superpower. What I don’t understand is why such power was not enough for everyone in the world to have switched to CL immediately, and also why none (afaik) of the languages created after CL replicated that feature. Is the feature very expensive or complex? Does it make error handling needlessly complicated for the common case?

                                                                                                                      1. 2

                                                                                                                        What I don’t understand is why such power was not enough for everyone in the world to have switched to CL immediately, and also why none (afaik) of the languages created after CL replicated that feature.

                                                                                                                        I think the short version is that it requires features which Lisp has but other languages don’t: dynamic scope, to enable different functions to mutate the context; closures, which were uncommon decades ago; macros, to make the whole thing legible; a runtime.

                                                                                                                        Does it make error handling needlessly complicated for the common case?

                                                                                                                        Well, I think that this (from the HyperSpec) is pretty legible:

                                                                                                                        (handler-case (read s)
                                                                                                                          (end-of-file (c)
                                                                                                                        (format nil "~&End of file on ~S." (stream-error-stream c))))
                                                                                                                        

                                                                                                                        HANDLER-CASE executes one statement, returning its value if none of the indicated errors are signaled, or the value of the error handler if they are; any other errors bubble up as normal. This is a pretty good analogue for:

                                                                                                                          try:
                                                                                                                            s.read()
                                                                                                                          except EndOfFile, e:
                                                                                                                            print "End of file on %s" % e.stream
                                                                                                                        

                                                                                                                        Edit: part of what I’m getting at above is that this is a pretty clear case of the Blub Paradox. Folks who don’t understand dynamic scope don’t understand why they would want it, and don’t appreciate how it makes things like error handling better — to the point that they fail to consider a useful feature useful.
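                                                                                                                        The contrast can be made concrete with a Python sketch (calculate_armor_percentage is the hypothetical function from the quoted example): an exception handler only ever sees the stack after it has unwound, so the best it can do is read the dead frames – it cannot fix the function and resume at the point of the error, as a CL restart allows.

```python
import sys
import traceback

def calculate_armor_percentage(shield_strength):
    # Buggy, per the quoted example: fails when the shield ran out (0).
    return 100 // shield_strength

try:
    calculate_armor_percentage(0)
except ZeroDivisionError:
    # By the time control reaches here, the stack has ALREADY unwound.
    # We can inspect the dead frames recorded in the traceback...
    frames = traceback.extract_tb(sys.exc_info()[2])
    failing = frames[-1].name
    # ...but there is no way to recompile the function and resume
    # execution where the error was signaled.
    print("error raised in:", failing)
```

This is the detective-at-the-crime-scene model the quoted article describes, as opposed to intervening while the crime is happening.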

                                                                                                                        1. 1

                                                                                                                          APL keeps the )SI stack and you can inspect variables and make changes. In general I guess the cost of doing that is probably too high, such that dumping a core file with the help of the OS became the common thing.

                                                                                                                        2. 2

                                                                                                                          I think there are certainly cultural issues or “lack of leadership” that cause incompatibilities like using different data representations, or having poor or misleading documentation (which both exist at the level of functions as well as the level of macros), but I think there are also technical hurdles and abstraction models that were not yet solved by the time Common Lisp’s macro system (its means of syntactic abstraction) was designed. If you compare Common Lisp and Racket, for example, Racket has huge advantages – including a hygienic macro system, a DSL for creating robust macros with good error reporting (syntax/parse), higher-order contracts to enforce invariants across boundaries, etc – that make its DSLs and abstractions much more robust and interoperable. I think there is still a lot of room to improve here, but I think Racket has already demonstrated that large-scale language-oriented programming can be interoperable, understandable, well documented, and empowering. I think powerful DSL programming has a promising future (inside and outside of Racket), and shouldn’t be ruled out due to poor results from early attempts.

                                                                                                                          1. 1

                                                                                                                            My personal view on DSLs is that they’re great for writing new applications in and not so great if people start writing libraries in them. Racket’s approach of explicitly forking off DSLs using #lang is a fairly good way of helping devs realize how big a step they’re making when they create a DSL.

                                                                                                                            Out of curiosity, does “DSL” mean the same thing to CL devs that it does to everybody else? Or does it mean something different in the context of that particular Lisp?

                                                                                                                            Also, that is an excellent feature, but… is that the default behavior for any & all CL application? Because I can see plenty of cases where opening a debugger opens you up to way more vulnerabilities than stopping everything and printing an error. Useful for development but a liability for release, IMO.

                                                                                                                            1. 2

                                                                                                                              My gut wanted to say “yes” for a moment, until I decided that I’m not at all sure what “everybody” thinks the term “DSL” means. I think it’s a very fuzzy term with many interpretations – I’ve certainly seen a lot of people disagree on or be unsure about what it means.

                                                                                                                              My guess is that CL people view DSLs roughly similar to how people in other Lisp communities view them. But there’s a lot of variance among different people I’ve talked to.

                                                                                                                              My view on what a DSL is is mostly flavored by Racket. I think there is a pretty broad spectrum of what might be considered a DSL. A simple library of functions for a particular domain may be considered a DSL – it may not change the semantics of function evaluation or anything, but it gives you a vocabulary to talk about that domain. On the heavier end you have external DSLs or heavyweight Racket #langs – full of weird syntax, weird semantics, possibly difficult to interoperate with. But I think most Lispers think primarily of DSLs somewhere in the middle – a macro (or set of macros) that compiles a custom form (tailored to some domain, even a micro-domain like “the domain of loops” or “the domain of pattern matching”) down to the base language, maybe using specialized functions built to implement the macro’s run-time semantics. This covers a broad spectrum from CL’s loop to Racket’s match to Minikanren – all have semantics specialized to some degree, but all are able to be embedded and interoperate cleanly with the host language.

                                                                                                                              But I think a lot of people have other views on what usually makes a DSL – Python or Ruby people might think about dynamic class-oriented metaprogramming, such as introspecting on a class and stuffing it with a bunch of implicit methods, instead of static macro metaprogramming (which is what I tend to do and think about), some think about eval based approaches, and others may only think of external DSLs like SQL or Bash. One’s opinion of what makes a DSL, how much of a big step it is to make or use one, and how well it interoperates may vary a lot based on what kind of DSLs you’re used to.

                                                                                                                              Do others think there is broad consensus on what a DSL is? Because I think I see several camps who largely mean different things by the term. I certainly have a definition that I give people which seems to be largely shared especially by the Racket community, but most people I run into outside of Racket and other Lisps tend to have very different views.

                                                                                                                              1. 2

                                                                                                                                From my understanding the “lighter” ones are known as eDSLs (embedded DSLs) and characterised by being hosted entirely within the source language. Some Haskell libraries (the kind that live within their own types) call themselves eDSLs, and I would call Ruby blocks eDSLs as well. “DSL” includes everything else – they may or may not have their own character-level parsers. So the main difference seems to be one of implementation rather than the “size” of the domain involved.

                                                                                                                                At least, that’s the primary distinction I’ve seen, but it definitely gets fuzzy. Is Awk a DSL? Are Racket #langs all DSLs? (edit: yes and no, I reckon)

                                                                                                                                I don’t think CLers make a distinction between libraries and (e)DSLs because the act of writing a program is thought of as defining and using a problem-specific DSL.
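                                                                                                                                The embedded end of that spectrum can be made concrete with a toy sketch in Python (the Field/to_sql names are hypothetical): the “query language” lives entirely inside the host language’s syntax via operator overloading, with no character-level parser of its own.

```python
class Field:
    """A column reference whose comparisons build expression trees."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, value):    # `age == 30` constructs a node...
        return ("=", self.name, value)
    def __gt__(self, value):    # ...rather than performing a comparison
        return (">", self.name, value)

def to_sql(expr):
    """Render one comparison node as a SQL-ish fragment."""
    op, name, value = expr
    return f"{name} {op} {value!r}"

age = Field("age")
print(to_sql(age > 30))         # age > 30
print(to_sql(age == 30))        # age = 30
```

An external DSL (SQL itself, Awk) would instead need its own parser; an eDSL like this is “just a library” that repurposes host syntax, which is why the boundary between library and eDSL is so fuzzy.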

                                                                                                                        1. 2

                                                                                                                          I was interested in using some minimal language like Go as a compilation target for other, more “DSL”-focused languages; curious to hear your thoughts on it, OP - http://andrewjudson.com/programming/2019/04/19/read-only.html

                                                                                                                          1. 5

                                                                                                                            Interesting! I think that sounds like a very good idea. I’ve been interested in writing an S-expression syntax for Go for a while now, in a similar vein to Hy for Python and Fennel for Lua, because Go gets a lot of the work of writing a Lisp out of the way upfront (GC, type system, etc).

                                                                                                                            I wonder if this might work as a technique for quickly generating a Go codebase, which could then be turned into the canonical project so as to avoid having two codebases (a one-way operation). I know this isn’t exactly what your post proposed, but it would alleviate the concern of having to maintain two distinct codebases. Debugging might be easier that way, too.
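                                                                                                                            As a toy sketch of how small such a front end could start out (all of this code is invented here, and in Python for brevity): tokenize the S-expressions, read them into nested lists, and recursively emit Go expression text.

```python
# Toy S-expression front end that emits Go source text. Invented for
# illustration; a real compiler would also handle types, statements,
# declarations, and error reporting.
import re

def tokenize(src):
    return re.findall(r"\(|\)|[^\s()]+", src)

def read(tokens):
    tok = tokens.pop(0)
    if tok != "(":
        return tok                       # atom: number or identifier
    form = []
    while tokens[0] != ")":
        form.append(read(tokens))
    tokens.pop(0)                        # drop the closing ")"
    return form

INFIX = {"+", "-", "*", "/"}

def emit_go(form):
    if isinstance(form, str):
        return form
    op, *args = form
    if op in INFIX:                      # (+ 1 2) -> (1 + 2)
        return "(" + f" {op} ".join(emit_go(a) for a in args) + ")"
    return f"{op}({', '.join(emit_go(a) for a in args)})"  # call syntax

print(emit_go(read(tokenize("(fmt.Println (+ 1 (* 2 3)))"))))
# fmt.Println((1 + (2 * 3)))
```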

                                                                                                                            Thank you for sharing! (And I also had no idea how popular the Minima theme for Jekyll was—I’ve now made some changes to my own CSS to spice mine up)

                                                                                                                            1. 2

                                                                                                                              It wouldn’t be the first time a Lisp has been used that way. The .NET garbage collector was prototyped in Common Lisp, with an aim to cross-compile it into C++.

                                                                                                                            2. 3

                                                                                                                              I think the core issue here is more that we haven’t come as far in terms of having a philosophy of good design for macros and DSLs as we have for functions. All of these problems exist at the function level as well – do I need to hunt down and deeply understand every function call that I read in source code to understand a program? No, I can frequently elide reading function definitions because we have established conventions around function design, naming, documentation, etc that help us sufficiently understand a functional abstraction without worrying about its implementation details. In a similar way to your proposal, we could say we want to inline all function calls or see the resulting assembly code to be able to read code that uses function abstractions – and sometimes we need to do that to understand poorly written code! But the conventions we have established make this less necessary. I think with better tooling, more experience and communication about good practices, etc, we can achieve this with syntactic abstractions and DSLs as well.

                                                                                                                            1. 7

                                                                                                                              Nice, there’s also closh which is a shell based on Clojure that compiles with Graal.

                                                                                                                              1. 10

                                                                                                                                In addition to Closh, which looks pretty similar to Janetsh (both are really cool!), I’d like to recommend checking out two more related projects. First is my own project, Rash, embedded in Racket. Of the shells embedded in general-purpose languages, I think Rash has the best story for extensibility of the shell language and integration of subprocesses and host-language functions (using byte streams in the Unix tradition or host-language objects more in the vein of Powershell). The second is Xonsh, embedded in Python, which I think has the most interactive polish of these embedded shells at the moment.

                                                                                                                                1. 7

                                                                                                                                  I was indeed inspired by rash, thank you! Rash is a fantastic project.

                                                                                                                                  The two main factors in making this:

                                                                                                                                   My first Lisp was Clojure and I got used to its syntax, which made me dislike Scheme, as petty as that sounds…

                                                                                                                                   The second was that the launch time of small scripts was one of my concerns, and Racket isn’t so good at that. I admit that for interactive shells and some programs it isn’t a big deal. Barring those two things, Rash is a far more mature, better tool to use.

                                                                                                                                   closh itself is hundreds of megabytes to install and starts extremely slowly (multi-second, just like Clojure), so it wasn’t something I could use.

                                                                                                                                  1. 3

                                                                                                                                    Yeah, Racket’s startup time is not great. It’s one of my least favorite things about Rash…

                                                                                                                                    I see you got job control working, and I’ll probably copy whatever you did. I tried for a day or two to get it working some time ago but didn’t find quite the right incantation of syscalls to get it to work right, so I set it aside and never got back to it.

                                                                                                                                    If you ever want to talk shop about embedded lisp shells I’m always interested to see or show cool ideas and features. The Closh author and I have talked a bit, he’s a nice guy. Also more friendly competition in the lisp shell world might motivate me to spend more time improving my own shell…

                                                                                                                                    At any rate, I look forward to seeing what you do with Janetsh.

                                                                                                                                    1. 4

                                                                                                                                      I would not have been able to make it work except for reading:

                                                                                                                                      https://www.gnu.org/software/libc/manual/html_node/Implementing-a-Shell.html

                                                                                                                                       It really is arcane stuff and I did not understand it before starting; there are probably still bugs and lots of edge cases to consider.
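                                                                                                                                       For anyone curious, the core of the dance in that chapter (child gets its own process group and the terminal; the shell waits, then takes the terminal back) looks roughly like this in Python. This is a simplified sketch of that manual’s approach, not a complete shell: real job control also puts whole pipelines in one group, tracks stopped jobs, handles SIGCHLD, and so on.

```python
# Rough sketch of foreground job launching per the glibc manual chapter
# linked above. Simplified: no pipelines, no job table, no bg/continue.
import os, signal, sys

def run_foreground(argv):
    interactive = sys.stdin.isatty()
    shell_pgid = os.getpgrp()
    if interactive:
        # The shell must survive its own tcsetpgrp calls.
        signal.signal(signal.SIGTTOU, signal.SIG_IGN)
    pid = os.fork()
    if pid == 0:
        try:
            os.setpgid(0, 0)                     # child: own process group
            if interactive:
                os.tcsetpgrp(0, os.getpid())     # grab the terminal
                signal.signal(signal.SIGTTOU, signal.SIG_DFL)  # reset before exec
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)                        # only reached if exec failed
    # The shell mirrors setpgid/tcsetpgrp to close the race with the child.
    try:
        os.setpgid(pid, pid)
    except OSError:
        pass                                     # child may already have exec'd
    if interactive:
        os.tcsetpgrp(0, pid)
    _, status = os.waitpid(pid, os.WUNTRACED)    # WUNTRACED: notice Ctrl-Z stops
    if interactive:
        os.tcsetpgrp(0, shell_pgid)              # take the terminal back
    if os.WIFSTOPPED(status):
        return None                              # job stopped, not exited
    return os.waitstatus_to_exitcode(status)

print(run_foreground(["true"]))
```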

                                                                                                                                      1. 2

                                                                                                                                         Yeah, job control is pretty ugly. There is a quote in APUE by Stevens about it, something to the effect that it’s a bunch of hacks involving the kernel, the shell, and the terminal driver… but you have to have it because it’s in POSIX :-(


                                                                                                                                        BTW I have also prototyped a non-invasive solution to the VM startup time problem, e.g. for Clojure and Rash.

                                                                                                                                        https://www.oilshell.org/blog/2018/12/05.html#toc_2

                                                                                                                                        https://github.com/oilshell/shell-protocols/tree/master/coprocess

                                                                                                                                        It would make batch invocation of such shells much faster. Basically the idea is to transparently turn a process into a coprocess (single-threaded server).

                                                                                                                                        You have a little driver that takes argv, env, and returns a status. It behaves like the original process. But then it uses some file descriptor tricks to proxy the coprocess. So it can be like a “drop-in” replacement for any tool that starts slowly.
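                                                                                                                                         A toy version of the idea (invented code here, with a far simpler protocol than the linked one, which also forwards the caller’s stdio via file-descriptor tricks): pay the interpreter’s startup cost once, then serve each “invocation” over a pipe.

```python
# Toy coprocess sketch: a single-threaded server loop behind two pipes.
# The real protocol linked above also proxies stdin/stdout/stderr and the
# environment; here a request is just argv and the reply is a status.
import json, os

def start_coprocess():
    req_r, req_w = os.pipe()
    resp_r, resp_w = os.pipe()
    if os.fork() == 0:                       # pretend this is the slow VM
        os.close(req_w); os.close(resp_r)
        requests = os.fdopen(req_r)
        replies = os.fdopen(resp_w, "w")
        for line in requests:                # one request per line
            argv = json.loads(line)
            status = 0 if argv else 1        # stand-in for real work
            replies.write(f"{status}\n")
            replies.flush()
        os._exit(0)
    os.close(req_r); os.close(resp_w)
    return os.fdopen(req_w, "w"), os.fdopen(resp_r)

def invoke(requests, replies, argv):
    """The thin driver: looks like running argv, proxies to the server."""
    requests.write(json.dumps(argv) + "\n")
    requests.flush()
    return int(replies.readline())

req, resp = start_coprocess()
print(invoke(req, resp, ["echo", "hi"]))     # startup cost already paid
```

                                                                                                                                         Each call to the driver behaves like spawning the tool, but only the first one pays for starting the interpreter.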

                                                                                                                                        If you’re interested in prototyping it in Rash or any program, let me know! It still needs implementations to work out the details, although I believe the prototype proves it’s feasible.

                                                                                                                                        1. 1

                                                                                                                                           Well, at least signals and waitpid do have the beauty that they let you do a lot of things with a single thread.

                                                                                                                                          My criticism of coprocess launchers is that we know it is possible to implement fast starting VMs (like janet), and adding things to the tower slows them down in the long term.

                                                                                                                                1. 1

                                                                                                                                   I think there are a lot of ideas mixed up in here. Dynamic typing beyond “everything is a string” and commands that can operate sensibly on such types, commands that can communicate to tooling about what types they need for completion, static typing, and (at least I see it subtextually) the tension in needs between interactive shell use, simple scripts, and larger scripts that want to turn into more mature “programs”. Different projects have worked on different areas of this. For instance, Powershell fixes the stringly-typed problems (Powershell may have its own issues, but I think the idea of using objects other than just strings is an important step forward). I think one of the most interesting projects with static typing for shells is Caml-Shcaml. And (shameless self-promotion) I have a project, Rash, that aims to solve the issues of script growth.

                                                                                                                                  Ultimately I think the direction shells ought to go is to be swallowed into general-purpose programming languages and become extension DSLs. When you start talking about a shell using “real” data types, having “real” error handling, having components that can communicate reasonably, having “real” type guarantees, etc, you wander into the realm of needing a “real” programming language. Naturally I’m inclined to think that my project is in many ways the best incarnation of this so far, but there are many projects pushing this direction (with various different strengths), like Xonsh, Closh, eshell, a new one posted on Lobsters today called Janetsh, and several others.

                                                                                                                                   By being embedded in a full, general-purpose programming language, a shell language can inherit all of the goodies of its host – libraries of functions, data structures, error handling, etc. But the embedded shell itself can stay a small DSL that focuses just on things like process and function pipelining, interactive syntax, etc. Granted, extra stuff for nice interactive sessions makes this a much bigger project (e.g. Rash does not yet have good completion, live syntax highlighting, etc), but it also provides a sane base (the presumably sane host language) to build these things and their extensions on. That’s a big step up compared to the insanity of bash and zsh customization, which has to be done in their respective languages.
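                                                                                                                                   To make that concrete, here is a tiny hypothetical pipelining eDSL in Python (invented for this comment, not Rash’s actual syntax): the | operator wires commands together, and everything else, like variables, error handling, and data structures, comes from the host language for free.

```python
# Hypothetical embedded pipeline DSL: a | b runs a and feeds its output
# to b. For simplicity each stage is materialized rather than streamed;
# a real shell would connect the stages with live pipes.
import subprocess

class Cmd:
    def __init__(self, *argv, source=None):
        self.argv = argv
        self.source = source

    def __or__(self, other):             # Cmd(...) | Cmd(...) builds a pipeline
        return Cmd(*other.argv, source=self)

    def run_capture(self, stdin=None):
        if self.source is not None:
            stdin = self.source.run_capture(stdin)
        result = subprocess.run(self.argv, input=stdin,
                                capture_output=True, check=True)
        return result.stdout

pipeline = Cmd("printf", "b\na\nc\n") | Cmd("sort")
print(pipeline.run_capture().decode(), end="")
# a
# b
# c
```

                                                                                                                                   Because a pipeline is just a host-language value, it can be stored, passed to functions, or composed further, which is exactly the kind of thing that gets painful in bash.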