Wishing jcowan all the best. He was fantastic and we appreciate his work enormously!
I must say I’m very upset with the Scheme standardization process; I think we failed the community. Interoperability was never encoded into the spec and was never achieved. Some implementers are actively against interoperability, which I believe is a very harmful mindset for the Scheme community and impacts the Scheme language negatively.
Nothing we can do about it really.
What does interoperability mean here? Interoperability with c? With older versions of scheme?
Each version of the Scheme standard describes a family of languages with features and semantics that may or may not be present in a given implementation. So it’s hard to write portable Scheme code.
For example, in R5RS and in R7RS-small, a Scheme implementation that implements the full standard will have integers of unlimited size, floats, complex numbers and rational numbers. There are distinct syntaxes for integer, rational, float and complex literals. But the only numeric type you must implement is fixed-size integers with a range large enough to describe the size of an arbitrary string or array. You don’t have to implement the full syntax for numeric literals.
The R6RS standard, which came in between R5RS and R7RS-small, mandated that you implement the full numeric tower, and had a lot of extra stuff that made Scheme much larger and more complicated than before. There was a rebellion against R6RS by Scheme implementors. So the steering committee decided to split the next Scheme standard into ‘small’ and ‘large’ variants. The R7RS-small standard was ratified in 2013 (the process started in 2009). This letter is about the collapse of the current R7RS-large standards process.
Nitpick: I think the syntax itself has to be supported, in that an implementation needs to recognise that it’s a numeric literal, but the implementation is free to raise an exception or coerce such literals to a supported numeric type:

If an implementation encounters an exact numerical constant that it cannot represent as an exact number, then it may either report a violation of an implementation restriction or it may silently represent the constant by an inexact number.
Amongst different implementations, I figure
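To make the portability hazard concrete, here is a hedged sketch (my own example, not from the thread): a few expressions that are legal in any R7RS-small system, but whose outcome depends entirely on which optional parts of the numeric tower the implementation ships.

    ;; Hedged sketch: same program, different numeric towers.
    (import (scheme base) (scheme write) (scheme inexact))

    (display (+ 1/3 1/3)) (newline)  ; exact rationals: 2/3; without them the
                                     ; literal 1/3 may be read as an inexact
                                     ; 0.333... or rejected outright
    (display (expt 2 100)) (newline) ; bignums: 1267650600228229401496703205376;
                                     ; a fixnum-only system may instead report
                                     ; an implementation restriction
    (display (sqrt -1)) (newline)    ; complex numbers: something like +i;
                                     ; otherwise an error or a NaN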
This has been said before, and won’t win me any friends, but I feel like Lisps dropped the ball on creating JSON
They have s-exps, but each dialect is incompatible
Csexp seems like the closest thing, but Rivest’s web page is down, so that tells you about how well maintained it is
https://news.ycombinator.com/item?id=36282798
Clojure has EDN but it’s similarly poorly spec’d, and not interoperable with other Lisps - https://github.com/edn-format/edn (maybe not poorly spec’d, but does not scream “friendly to implement”)
I think Lisps do not like to interoperate with other Lisps – it’s easier just to write the same code in your own Lisp (classic curse of Lisp)
Meanwhile Python and Ruby can talk to JavaScript and C++, so Unix “won” in that sense
And I also have to say that after writing a GC and looking at many Lisp GCs, it’s shocking that they don’t use serialization and processes more, because that’s the easiest and best way to ameliorate the inherent serial nature of GC – the easiest way to actually use all your cores, to use an entire machine
i.e. my conjecture is that basically any Lisp that doesn’t run on the JVM probably has contention problems … and I’m not sure if even the JVM GC can scale to 64 or 128 cores linearly
Julia is a Lisp, and very fast, but last time I checked the GC was not as strong (which is understandable because a good parallel GC is a 10 year long research project for a talented team)
I’m not going to address the majority of your comment, because I’ve spent enough time unfruitfully arguing with you about these issues, but:

inherent serial nature of GC
Due to pointer doubling, a work-inefficient parallel gc algorithm can have span in the log of the span of the live portion of the heap (which is bounded, in truly pathological cases, by the size of the heap)—so there is no want for scalability. More pertinently, a work-efficient parallel gc can have span linear in the span of the live portion of the heap; then, if you segment your heap the way you would be forced to anyway given multiple processes, the span is the same. So I’m not really sure what you’re trying to say here.
A parallel gc was recently written for sbcl. It seems to scale well. It has some minor problems that might prevent it from scaling to 64 or 128 cores, but they would not be particularly difficult to overcome.
I’m not sure why you single out gc. Most lisp compilers don’t generate particularly good code either—it may not be ‘scalability’, but large single-digit factors frequently lie there. Ultimately, attaining good performance will in general require knowledge, interest, and time, and it is rare that someone has all three.
Not sure what you mean by the pointer doubling, but yeah “inherently serial” may be a bit strong. I’m more saying that scalable and parallel GC is hard in multiple ways – GC concurrent with mutator, and even if you stop the world, parallel marking or sweeping
I guess the short version of what I’m saying is that Lisp didn’t achieve the scalability of Erlang, which at least on the face of it might have been a natural niche. (If I google “Parallel Lisp” there is lots of work on it 40 years ago, before Erlang even existed)
But maybe not on closer examination, because Lisp isn’t purely functional, while Erlang is.
Lisp implementations apparently never embraced “shared nothing” – either with a VM that copies data between tasks like Erlang, or by Unix-style serialization between processes
Pointer doubling is https://www.youtube.com/watch?v=33ZrIt-iGM4&t=1235s. A largely hypothetical concern, since you don’t actually have pathological heaps, but it demonstrates that tracing gc is asymptotically scalable even on pathological heaps.
Concurrent gc is about latency, not scalability, so beside this particular point.
I flipflop on shared-memory. But ultimately, although the fine-grained-concurrent discipline is complex and difficult, it enables the creation of very high-performance concurrent data structures. It is not a monotonic issue. But fundamentally, lisp belongs to (or, really, originates) the big-mutable-pointer-soup programming discipline, on a single thread, so it seems more coherent to me to preserve that in a move to multiple threads. You need fine-grained-concurrent data structures to implement those nice shared-nothing actor models anyway. And the BEAM has some problems with this, so I hear: nominally, large objects cannot be shared, so if you want to send a large object to another process and not suffer poor performance due to copying, you must laboriously use a separate out-of-band channel for it (c.f. https://gbracha.blogspot.com/2014/09/a-domain-of-shadows.html); also, sending a message to a process locks that process, and hence message sending is not scalable (admittedly, I don’t know of any really scalable mpsc queues satisfying the ordering properties of erlang except for an unpublished algorithm by yours truly).
I’m not sure what it means for a language to be scalable, but implementations of lisp admit the creation of applications that scale to large numbers of threads and cores, and parallel gc (such as the one I linked above) ensures that gc is not a bottleneck in the case of an application whose allocation rate is proportional to the amount of work it does.
Many of the Erlang VM design decisions require a functional language. Here’s a great HN comment that explains this in detail. It’s not an approach that works for other languages.
Yeah that makes a lot of sense - you start with the constraints of the Erlang runtime, and design a language around that. I’m not a huge fan of the language constraints that reduction scheduling implies, but it’s a principled choice.
When introducing Mojo, Chris Lattner said that “MLIR needed a syntax” – the same logic applies there. MLIR is a huge amount of work, and giving it some Python-like syntax isn’t that much work comparatively (although Mojo in totality is an absolutely huge, big ambitious language)
And for Oils our runtime for statically typed Python is starting to be “worthwhile”. If we can do a few more optimizations, it might be better than other options for string/graph workloads, and could deserve a syntax …
You’re not wrong about the serialization format, unfortunately. I have a feeling this has to do with reader extensions (as well as built-in nonstandard read syntax), so if you were to standardize some sort of s-expression format, it’d have to use a completely separate reader, so it’s not as “obvious” to do that. Essentially like JSON started out as “just call eval on your string to get the JS object” and later refined to “actually, that’s a terrible idea, always use JSON.parse”. It took a few years for people to get the overall message, IIRC.

Regarding GC, there’s at least Cyclone which supports a threaded Cheney on the MTA. It’s still on my TODO list to take a closer look at its implementation.
Yes, the whole “data language” idea is (surprisingly to me) not obvious!
I remember a coworker who wrote JavaScript, who insisted JSON has properties of JavaScript that it doesn’t have
But now JSON is used as an interchange between C++ and Python, e.g. Clang’s compile_commands.json, which has nothing to do with JavaScript
So it has successfully escaped! :) It is kind of interesting how computer languages are “psychological”, not just technical
Preserves (https://preserves.dev/) is better specced than JSON, while also providing Lisp-friendly symbols & records. I’d recommend it over EDN by default.
Weird that it has both records and dictionaries: records allow duplicate keys but dictionaries do not. Records do not specify which key takes priority when there are duplicates, which is a classic blunder.
Both records and dictionaries allow arbitrary values as keys, including compound values, which imposes challenging requirements on every implementation.
Strings are specified as sequences of unicode codepoints, but the textual syntax only allows for hex escapes in the BMP, and it doesn’t specify what to do about invalid surrogates.
How do records allow duplicate keys? They’re positional.
What are the challenging requirements for implementations of value-keyed dictionaries? Even Python manages it without much fuss, and languages like Java, Racket, Smalltalk and Rust without any fuss at all.
The string syntax underspecification is a bug. Thank you. I’ll amend the spec. The intent was to be compatible with JSON text string syntax, despite having an actual semantics.
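On the value-keyed dictionary question above: in Scheme terms the requirement is indeed unexceptional. A small sketch (mine, not from the thread), assuming an implementation that ships SRFI 69 hash tables under the (srfi 69) library name:

    ;; Sketch only: compound values as dictionary keys via a SRFI 69 hash
    ;; table keyed with equal?, roughly what a Scheme Preserves dictionary
    ;; could lean on.
    (import (scheme base) (scheme write) (srfi 69))

    (define d (make-hash-table equal?))
    (hash-table-set! d '(point 1 2) "a list as a key")
    (hash-table-set! d '#(1 2 3) "a vector as a key")
    (display (hash-table-ref/default d '(point 1 2) #f))
    (newline)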
Oh, sorry, I misread the spec: I thought it said records are a tuple of labelled values, rather than a labelled tuple of values. In which case, I wonder what is the difference between a record and a non-empty sequence. I can see there might be different API affordances, but the semantics don’t say what the differences should be.
For dictionaries, it is very common for languages to stringify keys. Awk, JavaScript, Perl, PHP - you mention Python as not needing much fuss, but my point is that you can’t use normal Python dictionaries because they only support string keys. It also means you can’t easily round-trip via other data formats that only support atomic dictionary keys.
Different API affordances, indeed. Good point on the semantics not properly motivating records; the idea is that they should serve as a labelled product (as in, sum-of-labelled-products). Compare the conventions overlaid on SPKI Sexps, where the first element of a list is required to be a symbol-like string. One could, like SPKI, encode this via lists, but then lists blur together with labelled products. Same goes for dictionaries etc.
Normal python dictionaries support arbitrary immutable (ish!) keys (e.g. {(1, 2, 3): (4, 5)}). I wouldn’t recommend following the chain of reasoning involved in the design being the way it is, it made my head hurt and caused me to despair a little about the state of humanity, but in a nutshell you can stick anything in there that’s hashable, because in python hashability and immutability are conflated, in this area.

Roundtripping isn’t a major concern: in general roundtripping of data formats via other data formats can’t be done. There’s usually a semantic mismatch there. Roundtripping of subsets, though, that is a fine idea, and it works well with Preserves; for example, you can stick to the subset of Preserves that matches JSON (when expressed as text), and interop that way.
For stringly-keyed dictionary languages, like JavaScript, I’ve gone with the approach of reusing the canonical serialization of a value. It works surprisingly well. (If it were lazily produced, it’d perhaps even be an asymptotically efficient way of working…)
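A sketch (mine, not the comment author’s) of that workaround transplanted into Scheme, using write as a stand-in for a real canonical encoder, and a string-keyed SRFI 69 table as the stand-in for a host dictionary that only accepts string keys:

    ;; Sketch: when the host dictionary only accepts string keys, key it by a
    ;; canonical serialization of the value instead.
    (import (scheme base) (scheme write) (srfi 69))

    (define table (make-hash-table string=? string-hash))

    (define (canonical-key v)
      (let ((out (open-output-string)))
        (write v out)                ; a real implementation would use the
        (get-output-string out)))    ; format's canonical encoder, not write

    (hash-table-set! table (canonical-key '(1 2 3)) 'payload)
    (display (hash-table-ref/default table (canonical-key '(1 2 3)) #f))
    (newline)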
Oops, I had a head full of too many languages!
How do you handle the fact that Python dictionaries are mutable, so they can’t be used as keys in dictionaries? I think that implies you can’t use them as is to implement Preserves dictionaries.
In JavaScript, if I serialize { "x": 1 } it sounds like that would deserialize as { "\"x\"": 1 }?

I wrote this https://preserves.dev/python/latest/values/#preserves.values.ImmutableDict to handle immutability of dictionaries. The class inherits from ordinary dict.
Re JavaScript, no, there’s this instead: https://gitlab.com/preserves/preserves/-/blob/930964ca055f84d3cd1b520204a296f53b611907/implementations/javascript/packages/core/src/flex.ts
Good catch on the underspecification! I think that’s considered a spec bug.
As for records, they’re more alternatives to sequences (arrays), much like how symbols are alternatives to strings. In the binary encoding, symbols are literally just strings with a different tag integer, and records are sequences with one required element. Both symbols and records exist for the common case of disambiguating special data constructors from your common arbitrary strings and arrays. The initial element of a record being the label is standard convention, and useful for streaming parsers / deserializers.
Dictionaries requiring arbitrary Values as keys is probably the most demanding part of the Preserves data semantics, after bytestrings. Everything else is pretty standard.

Well I think it’s safe to say that EDN is “for” Clojure – I’m not aware of it being used outside Clojure … It’s explicitly a subset of the Clojure language, and it’s not a subset of any Scheme or Lisp (due to maps and vector syntax, but not just that)
Good article which includes EDN - https://www.scattered-thoughts.net/writing/the-shape-of-data/
I had seen Preserves, and it looks nice in many ways, but I think this article is making the wrong comparison - https://preserves.dev/why-not-json.html
Basically every language user is equally unhappy with JSON, and that’s a feature, not a bug. A polyglot data language / interchange format is always going to be a compromise between different programming languages. (In contrast, EDN doesn’t make compromises – it follows Clojure)
Preserves makes different tradeoffs, e.g. it has a choice of Float or Double (a little odd IMO), and you can’t round trip that in Python or JavaScript with their native types
So it looks like there is a Float wrapper, which for most Python apps doesn’t seem like a good tradeoff: https://gitlab.com/preserves/preserves/-/blob/main/implementations/python/preserves/values.py#L42
Similarly it seems to make semantic guarantees about ordering which heavily constrain the implementation.
Minor quibble, but the text format seems a bit “maximalist” with 3 formats for binary – JSON-escaped string, hex, base64
You see the polyglot compromise issue in protobuf too – it has a strong C++ accent, and Java / Python / JavaScript users are all unhappy with it to varying degrees. (Nonetheless protobufs are widely used, and pretty good all things considered, or at least earlier versions were)
Another way to look at it is that an interchange format is in some sense limited to the INTERSECTION of features of all programming languages it supports, NOT the union – making it a very constrained problem, and JSON isn’t too bad from that perspective!
So my overall point is that there isn’t a culture of compromise in Lisps (JSON being a compromise) to facilitate coarse-grained code sharing – it seems they would rather not share code with other Lisps
(And also I’m unsure what compromises Preserves is making; it also seems the lack of compromise will limit cross-language interoperability)
https://lobste.rs/s/o2qvt1/project_mentat, a Datalog layer over SQLite written in Rust, uses EDN, albeit presumably because “It draws heavily on DataScript and Datomic.”
A polyglot data language / interchange format is always going to be a compromise

I don’t think this is true. The data language can have a semantics of its own. Then it’s up to implementations to respect the semantics or not. With JSON, there’s nothing but the syntax: the semantics is trivial.
Re floats, yes, I’m not sure about that or about what is best to do, especially given the increasing prominence of eg bfloat16 etc. The key is to have a given text mean something in and of itself, independent of any particular programming language implementation. So if doubles mean something different than floats, they need to be distinguishable.
I’d like to know what kinds of implementation constraints you’re thinking of wrt ordering too. Ultimately it’s a simple binary predicate. Integration with most implementations’ native ordering system has been fairly straightforward, too. The worst ones have been Python and JavaScript. Python doesn’t expose quite the right level of abstraction, and JavaScript exposes no support for extensible equivalences or orderings at all.
JSON is less trivial than it was intended to be! And I think you are aware of and have avoided its mistakes.
The non-trivial parts are
what to do about duplicate keys in objects - differences in implementations have caused security vulnerabilities
what is the precision of numbers - in practice it’s IEEE float64 but that isn’t what the spec says
unicode
Hah! Actually I mean “trivial” in the formal sense, as in the equivalence over JSON terms is a trivial syntactic relation (cf https://dl.acm.org/doi/10.1023/A%3A1007720632734) that essentially only relates identical texts. It’s exactly the bits you name “non-trivial” above that force the equivalence over JSON terms to be “trivial”. One may not in general relate encoded JSON terms that differ in any interesting way such as duplicate or alternatively-ordered dictionary keys, different presentations of semantically identical numbers, etc. I have collected a few examples here https://preserves.dev/why-not-json.html#json-syntax-doesnt-mean-anything .
Heh, I see.
There is an I-JSON profile that is stricter https://datatracker.ietf.org/doc/html/rfc7493 tho I have some quibbles with the way it is specified. I would prefer to specify the behaviour of the receiver rather than the sender, in particular wrt the values of numbers. Section 3 has some dangerous wording:

Protocols that use I-JSON messages can be written so that receiving implementations are required to reject (or, as in the case of security protocols, not trust) messages that do not satisfy the constraints of I-JSON.
In a security context it is a classic blunder to accept invalid messages, even if you do not trust them (whatever that might mean). This is a good way to create a confused deputy vulnerability.
What I mean by “compromise” is exactly what you’re saying with the “worst ones have been Python and JavaScript”
All programming languages have their own semantics, which their users know and like and rely on. You can define semantics for a data language, sure, but they’re by definition not the language’s semantics, and that creates a bit of friction – or sometimes enough of a problem where they abandon the tech
No matter what you choose, some languages are winners, and some are losers, to varying degrees.
This is my observation based on seeing people rewrite the code generators for protobufs a couple times (in C++/Java/Python), and also a similar data model mismatch with XML (and also SQL if you think about it)
XML was once used for struct-array / JSON-like serialization (rather than documents, which it’s actually good for), but it also has the DOM, which is a library in each language.

But it’s just way less convenient to use than native types. JSON happens to map fairly approximately and comfortably to the native types of a large set of common languages (JS, Python, Ruby, PHP, Perl – less so Lua, Lisp).
It’s a compromise that’s better than XML for many applications (not saying it’s the best compromise!)
And it works well enough in Go, Java, C++. It’s awkward, but those languages are the losers with respect to JSON. (FWIW a JSON Template language I designed had some influence on Go’s reflection, which is a little better than other static languages - https://www.oilshell.org/blog/2023/06/ysh-design.html#the-first-json-language-i-designed-2009 .)
https://diziet.dreamwidth.org/6568.html
Apparently CBOR added the unicode/bytes distinction. Well then Python, JS, and Java are winners because they have that distinction, and Go and bash are losers because they don’t.
Every decision has 2 sides – float/double, map/record, unicode/bytes, signed/unsigned, etc. You can’t make everyone happy!
Similar thing I wrote here – https://news.ycombinator.com/item?id=37170088
[Comment removed by author]
I’m not subscribed to this mailing list so I will just post here to say a sincere thank you for your efforts, John!
I can certainly understand that this is an exhausting position to be in. All I can say is that I was pessimistic from the start when I saw how many features R7RS-large was trying to standardize. It is just way too vast and comprehensive, which is a recipe for burnout, especially with the super critical, perfectionistic and conservative lot that is the Scheme community (if there even is such a thing; R6RS showed how divided the community is on fundamental things).
Should probably be folded into https://lobste.rs/s/jzuce1/my_resignation_letter_as_r7rs_large_chair
I never saw the point of R7RS-large when SRFIs exist. It should not be an issue to decouple the language specification from the libraries.
My issue with SRFIs has always been that, when I go to a Scheme’s website and it says “It implements R5RS with SRFI 17, SRFI 91, and SRFI 26”, I don’t know what that means.

But if I see R6RS, or R7RS-small, I have a pretty good idea what is in there. Personally I like R6RS, and Racket’s implementation the best, so that is what I use.
[Comment removed by author]
Some people (including myself and it would appear @Decabytes as well) find it harder to reason about and discuss ecosystems like this where there is a large universe of possible combinations. There is no single reference or document that covers what is supported. Instead there are now various opaque identifiers that must be mentally juggled and compared and remembered. (It’s great that there’s a single place to look up the meaning of each SRFI, but I don’t think that solves comprehension.)
If you are making some software, you can no longer say “works with Scheme R[number]RS implementations”, but you instead have to list out the SRFIs you use, which may or may not be supported by the user’s favoured implementation. Then you have to repeat that complexity juggling with other libraries you may also want to use.
It’s a general issue that tends to arise with any ecosystems arranged in this way. It prioritises implementer flexibility and experimentation over user comprehension of what’s supported. (Maybe that’s okay, maybe it’s not! Probably like all things … it depends on context, each person’s preference, etc.)
People have made similar complaints about the XMPP ecosystem with its XEPs, which is also an ecosystem of optional extensions.
How I feel about Haskell language extensions.
The Haskell situation is maybe a little better in that—practically speaking—there is only one Haskell compiler in widespread use. You could argue that this is a net loss for the Haskell ecosystem! But the fact that most (all?) language extensions are enabled on a per-module basis means that compatibility comes down to asking “which version of GHC are you using?” rather than needing to ask “does your compiler support TemplateHaskell? And DerivingVia? How about MultiWayIf?”
(I’ll add that the Haskell language extensions are referred to by name. If common usage in the Scheme community is to talk about “SRFI 26,” it does seem like Haskell puts less of a burden on one’s memory when talking about these things.)
There is, for example, chicken has the exact list of SRFIs it implements here https://wiki.call-cc.org/supported-standards#srfis
so if you write a program which uses SRFIs X Y and Z you just need to check if X and Y and Z are in that list. This is a very deterministic, black and white, well defined thing.
You don’t need to memorize the numbers btw. I don’t know why people think that.
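For what it’s worth, that “are X, Y and Z on the list” check can also be made by the program itself. A minimal sketch (mine, not the commenter’s), assuming an R7RS-small implementation that advertises SRFIs through its feature list – many do, though the exact identifiers vary:

    ;; Sketch: asking the implementation, rather than its wiki, which SRFIs exist.
    (import (scheme base) (scheme write))

    (cond-expand
      ((and (library (srfi 1)) (library (srfi 26)))
       (display "SRFI 1 and SRFI 26 are available here"))
      (else
       (display "SRFI 1 and/or SRFI 26 missing; fall back or bail out")))
    (newline)

    (write (features))   ; typically lists r7rs plus srfi-N identifiers,
    (newline)            ; but the exact contents differ per implementation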
It prioritises implementer flexibility and experimentation over user comprehension of what’s supported

It’s explicitly listed out what is supported; it is very easy for a user to look at the list and see that the things on the list are supported.
But I also really don’t think that’s true at all. The SRFI process is there to enable users to have understandable libraries of code that they can use, potentially coordinated across implementations. Implementors doing exploration would not take the time to specify stable APIs like that, or necessarily write documentation for them. I think you have this backwards.
Your assertion that this whole system is “very easy” has convinced me to avoid Scheme ¯\_(ツ)_/¯ it doesn’t sound easy to me
What problem are you having with this system?
Cross-checking several implementations against a list of specs before I start writing a program sounds complicated to me, but hearing that it is “very easy” for typical scheme users tells me I’m not smart enough to enjoy this language
There’s been some kind of confusion, you are now talking about a different problem.
It’s changed from checking if a set of numbers is a subset of another, to finding an intersection of multiple sets.
Great observation, it immediately brought to mind OpenGL extensions back in the day. What a nightmare.
I really enjoyed this question.
Its just a monoid in the category of endofunctors. Whats the problem?
I flagged this comment as unkind.
Alright.
Not sure about Decabytes, but for me: yes. The few times I’ve touched Scheme these numbers have been opaque and confusing.
They are all here https://srfi.schemers.org/
They are all there, but what’s there doesn’t necessarily mean the implementations are actually compliant. There’s often caveats “Our implementation just re-exports core foo in place of srfi-foo and differs in semantics” — or they won’t tell you that, and it’ll be different.

Ah, the joys of SRFIng.
That’s a totally different question? It doesn’t make sense to write that as a reply to my comment that provides a list of the SRFIs.
If a low quality implementation is providing an incorrect implementation of a specification then that is obviously a bug. I don’t know what that has to do with me though.
Yes I don’t have them all memorized because there are so many and what Schemes implement what varies. I have much better knowledge of revised reports because they are self contained groupings of specific functionality.
it’s precise
unclear would be stuff like “this has a bunch of list utilities and most of the file io functions you are used to from other places”
SRFIs would be better if they targeted more practical problems. There’s a fairly recent JSON SRFI, which is great. But, generally, most are experimental language features that have no business being standardized, most of the time, imho.
Then, there’s the terminal interface SRFI, which sounds great, but doesn’t have a fallback pure scheme implementation. So you’re at the will of implementations / library authors to build them out, which is non-trivial, and likely not fully cross compatible anyway.
The community is too small, and the ecosystem too fractured. :/
Not every SRFI can be implemented in pure scheme, but that’s an advantage, not a disadvantage, since they can specify things that would otherwise be impossible to add as an external library. It lets you know that you need an implementation to provide you SRFI-x.
This is ironic isn’t it. For such a small community to be so fractured…
The lisp eats itself.
https://en.wikipedia.org/wiki/Ouroboros
That comment reminded me of the German chemist (Kekulé) dreaming of that and realizing it fits the data on the structure of benzene.
Why is a JSON lib part of the standardisation process anyway? I don’t get the value-add vs adding the lib to the package manager.
Probably the answer is that compatibility isn’t reliable enough for that or there is no package manager, or something like that?
SRFIs are more of an API spec than an implementation. They commonly have a reference implementation, of course.
Package managers in scheme aren’t standardized… :) until r6rs you didn’t even have a common way to do libraries across implementations outside of load so… it’s all very tricky. Very very tricky. :)

@river, also.
Well, not my community, but I would want a standards process to end up with me being able to use code written under the standard in any compatible implementation with ideally zero changes. This is how C and Fortran compilers work.
Going beyond C, I’d expect standardisation to establish a common system for specifying modules and packages, project dependencies and their versions and a registry of compatible packages for people to use.
If you get all of that then you should be able to switch implementations quite easily and a common registry and package format would encourage wider code reuse rather than the current fracturing that lispers complain about.
Is that just not something schemers are interested in? (Genuine question) And if they’re not, then what’s the point of the standardisation process?
It is something some of us are very interested in.
But it is also something that some implementors are explicitly against.
Which means we all end up screwed and it makes the standards useless and harms the language and community in the long term.
The goal was the get cooperation and interoperability, but it sadly didn’t end up happening.
What a shame. Thanks for clearing that up for me.
Agree completely: things like a JSON library are great to have, useful to specify as an SRFI but not part of the language itself. The whole value of scheme is to have a tiny core language that is so flexible that you can add things like this as a library.
SRFIs would be better if they targeted more practical problems. There’s a fairly recent JSON SRFI, which is great. But, generally, most are experimental language features that have no business being standardized, most of the time, imho.

I feel this too. A lot of the newer SRFIs feel very “galaxy brain” slash “language design by the back door” slash “the SRFI author is probably the only one who wants this” but are unlikely to positively impact my day-to-day life writing Scheme programs tbh
In general I feel that there are many very smart people churning out API designs that I don’t actually like nor want to use. Maybe I’m not smart enough to appreciate them. If so that’s OK. Aesthetically, many of the designs feel very “R6RS / abstract data types / denotational semantics” focused. Which is fine I guess, but I don’t personally enjoy using APIs designed that way very much, nor do I think they’re going to “make fetch Scheme happen” for the modal industry programmer anyway

ultimately folks are free to do whatever they want with their free time so I’m not mad about it, I’m happy to just keep plugging along using (mostly unchanging) older R5RS implementations and porting code to my own toy module system, etc and relying on portable code as much as possible
FWIW I thought R6RS was “fine” until they broke all my code using DEFINE-RECORD-TYPE because something something PL nerds denotational semantics etc. I have appreciated that the R7RS-small work I’m aware of thus far doesn’t break my code in the same way R6RS did
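For readers who didn’t hit it at the time, a sketch of the kind of breakage being described (my illustration, not the commenter’s code): the SRFI 9 / R7RS-small record form and the R6RS syntactic-records form spell the “same” definition differently, so code written against one does not expand under the other.

    ;; SRFI 9 / R7RS-small style: constructor, predicate and accessors are
    ;; all named explicitly.
    (define-record-type point
      (make-point x y)
      point?
      (x point-x)
      (y point-y))

    ;; R6RS (rnrs records syntactic) style: make-point, point?, point-x and
    ;; point-y are derived from the record and field names, and the SRFI 9
    ;; form above is rejected. (Commented out so this sketch stays R7RS.)
    ;; (define-record-type point
    ;;   (fields x y))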
I believe it was R6RS that included the first operational semantics for a standard Scheme. Previously R4RS had included a denotational semantics, which R5RS had left unchanged despite there being changes in the text that required a change to the semantics.
In neither R4-, R5- nor R6RS did one need to read the semantics to make effective use of the language.
FWIW, I agree. The newer SRFIs seem very much aimed at comprehensiveness instead of ergonomics. If you look at some of the older SRFIs they seem a lot more focused and minimal.
denotational semantics is not the enemy here.
many of us were very unhappy with R6RS.
Click here to try again. If you’ve seen this page more than once, try switching accounts, or check with your organisation’s administrator to make sure that you have permission.

Presumably other people see something different here?
Indeed, it loads fine here. This is the original message from John Cowan:

This is my resignation letter as chair of the R7RS-large project. I have come to the conclusion that I can no longer serve as Chair. I am exhausted by the effort, and I do not think that there is any further hope that I can get sufficient agreement among the different players to have any hope of coming to a conclusion. On the contrary, agreement is further away than ever, and people’s views are more and more entrenched.

I was planning to wait until the 21st to announce this, but I don’t think there’s any further point in waiting. It will be up to the Steering Committee to find someone to replace me or to officially abandon the project. I have no recommendations at this time.

Consequently, I will not be presenting anything at ICFP this year.
There are various replies from others as well.