Totally agreed about kebab case. It’s an unusually major quality-of-life improvement. I’d also add being allowed to use ? in an identifier. user-record-valid? is pretty clear, both as a function or as a variable.

The argument I hear against kebab case is that it makes it impossible to write subtraction as foo-bar but like … that’s … good actually? Why are we designing our syntax specifically in order to accommodate bad readability patterns? Just put a space in there and be done with it. Same logic applies to question marks in identifiers. If there’s no space around it, it’s part of the identifier.

Agreed! (hi phil 👋)
This is mentioned in the article too in a way. In addition to the readability point you make, the author makes the argument that most of us use multi-word identifiers far, far more often than we do subtractions.
I dunno, I think there are a lot of pesky questions here. Are all mathematical operators whitespace sensitive, or just -? Is kebab-case really worth the resulting errors when someone doesn’t type things correctly?
I format my mathematical operators with whitespace, but I also shotgun down code and might leave out the spaces, then rely on my formatter to correct it.
Basically, I think kebab-case is nice, but properly reserved for lisps.
Are all mathematical operators whitespace sensitive?
Yes, of course! There’s no reason to disallow tla+ as an identifier either, or km/h for a variable holding a speed, other than “that’s the way it’s been done for decades”.
I also shotgun down code and might leave out the spaces, then rely on my formatter to correct it.
The compiler should catch it immediately since it’d be considered an unrecognized identifier.
I’m not sure if this is an argument for or against what you’re saying here, but this discussion reminded me of the old story about how Fortran 77 and earlier just ignored all spaces in code:
There is a useful lesson to be learned from the failure of one of the earliest planetary probes launched by NASA. The cause of the failure was eventually traced to a statement in its control software similar to this:
DO 15 I = 1.100
when what should have been written was:
DO 15 I = 1,100
but somehow a dot had replaced the comma. Because Fortran ignores spaces, this was seen by the compiler as:
DO15I = 1.100
which is a perfectly valid assignment to a variable called DO15I and not at all what was intended. (from https://www.star.le.ac.uk/~cgp/prof77.html)

If I see x-y, I always parse it visually as a single term, not x minus y. I think that’s a completely fair assumption to make. I have always found kebab-case easier on the eyes than snake_case; I wish the former were more prevalent in languages.

Raku (previously known as Perl 6) does exactly this: dashes are allowed in variable names, and require spaces to be parsed as the minus operator.

Crazy idea: reverse _ and - in your keyboard map :) Probably would work out well for programmers. All your variables are easier to type. When you need to use minus, which is not as often, you press shift.

I’ve tried that before, and it turns out the dash is more common than the underscore even in programming. For example, terminal stuff is riddled with dashes.
More crazy ideas.

Use ASCII hyphen (-) in identifiers, and use the Unicode minus sign (−) for subtraction.
Permit -- (two hyphens) as a synonym for − (minus). Related to the fact that some languages let you write ≤ instead of <=, and so on. Related to the fact that -- turns to – in markdown.
Your text editor automatically converts -- to − and <= to ≤.
This makes more sense if you are viewing source code using a proportional font. Identifiers consume less precious horizontal screen space in a proportional font. Hyphens are shorter than underscores, so it looks better and is nicer to read.
Use ASCII hyphen (-) in identifiers, and use the Unicode minus sign (−) for subtraction.
#include <vader.gif> Nooooooo!!!!!!!!!
I really don’t like this idea. I’m all for native support for Unicode strings and identifiers. And if you want to create locale-specific keywords, that is also fine. I might even be OK with expanding the set of common operators to specific Unicode symbols, provided there is a decent way to input them. [1]
But we should never, ever use two visually similar symbols for different things. Yes, I know, the compiler will immediately warn you if you mixed them up, but I would like to strongly discourage ever even starting down that path.
[1] Something like :interpunct: for the “·” for example. Or otherwise let’s have the entire world adopt new standard keyboards that have all the useful mathematical symbols. At any rate, I’d want to think about more symbols a lot more before incorporating it into a programming language.
The hyphen and minus sign differ greatly in length, and are easily distinguished, when the correct character codes and a properly designed proportional font are used. According to The TeXbook (Donald Knuth, page 4), a minus sign is about 3 times as long as a hyphen. Knuth designed the standards we still use for mathematical typesetting.
When I type these characters into Lobsters and view in Firefox, Unicode minus sign (−) U+2212 is about twice the width of Unicode hyphen (‐) U+2010. I’m not sure if everybody is seeing the same font I am, but the l and I are also indistinguishable, which is also bad for programming.
A programming language that is designed to be edited and viewed using traditional mathematical typesetting conventions would need to use a font designed for the purpose. Programming fonts that clearly distinguish all characters (1 and l and I, 0 and O), are not a new idea.
Sun Labs’ Fortress project (an HPC language from ~15 years ago, a one-time friendly competitor to Chapel, mentioned in the article) had some similar ideas to this, where Unicode chars were allowed in programs, and there were specific rules for how to render Fortress programs when they were printed or even edited. For example:
(a) If the identifier consists of two ASCII capital letters that are the same, possibly followed by digits, then a single capital letter is rendered double-struck, followed by full-sized (not subscripted) digits in roman font.
QQ is rendered as ℚ
RR64 is rendered as ℝ64
It supported identifier naming conventions for superscripts and subscripts, overbars and arrows, etc.
I used to have a bookmark from that project that read “Run your whiteboard!”

The language spec is pretty interesting to read and has a lot of examples of these. I found one copy at https://homes.luddy.indiana.edu/samth/fortress-spec.pdf

Thanks, this is cool!
I feel that the programming community is mostly stuck in a bubble where the only acceptable way to communicate complex ideas is using a grid of fixed width ASCII characters. Need to put a diagram into a comment? ASCII graphics! Meanwhile, outside the bubble we have Unicode, Wikipedia and technical journals are full of images, diagrams, and mathematical notation with sophisticated typography. And text messages are full of emojis.
It would be nice to write code using richer visual notations.
Use dieresis to indicate token break, as in some style guides for coöperate:

kebab-case
infix⸚s̈ubtract

(Unserious!)

Nice. All the cool people (from the 1800’s) spell this word diaëresis, which I think improves the vibe.

Ah yes, but if you want to get really cool (read: archaic), methinks you’d be even better served by diæresis, its ligature also being (to my mind at least) significantly less offensive than the Neëuw Yorker style guide’s abominable diære…sizing(?) ;-)

Thank you for pointing this out. I think that diæresis is more steampunk, but diaëresis is self-referential, which is a different kind of cool.
For me, this is not at all about typing comfort, it’s all about reading. Dashes, underscores and camel case all sound different in my head when reading them, the underscore being the least comfortable.
For me, this is not at all about typing comfort, it’s all about reading. Dashes, underscores and camel case all sound different in my head when reading them
I am the same way, except they all sound different from my screenreader, not just in my head. I prefer dashes. It’s also a traditional way to separate a compound word.

Interesting, you must have some synesthesia :-)
As far as I can tell, different variable styles don’t sound like anything in my head. They make it harder for me to read when it’s inconsistent, and I have to adjust to different styles, but an all_underscore codebase is just as good to me as an all camelCase.
I use Ctrl-N in vim so typing underscore names doesn’t seem that bad. Usually the variable is already there somewhere. I also try to read and test “what I need” and then think about the code away from the computer, without referring to specific names
I like ? being an operator you can apply to identifiers, like how it’s used with nullables in C#, or, as I recall, some kind of test in Ruby.

In Ruby, ? is part of the ternary operator and a legal method suffix, so method names like dst? are idiomatic.

Ah, that makes sense. I don’t use Ruby so I wasn’t sure, I just knew I had seen it.

In Zig, maybe.? resolves maybe to not be null, and errors if it is null. maybe? is different, in my mind.

In Ruby it’s just convention to name your function valid? instead of the is_valid or isValid you have in most languages. The ? is just part of the function name.
Here’s some other microfeatures I like:

In Python and others, f"{var=}" meaning f"var={var}" (quick demo after this list)
I really like JavaScript’s {key} being equivalent to {key: key} … it rewards consistent naming
I like C#’s (fairly new) slice notation of list[1..^1] which is equivalent to Python’s list[1:-1]. Using negative numbers to indicate counting from the end is cute but can lead to errors.
I’m a little ambivalent about obj?.prop meaning something like obj && obj.prop … but it’s definitely a microfeature and I guess is useful.
I also like C#’s constructors like x = new X {prop1 = val, prop2 = val} which is kind of like x = new X(); x.prop1=val; x.prop2=val. There’s lots of different constructor patterns like this… I’m not sure which I like best, though I do find simplistic implementations like Python and JavaScript to be tedious.
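Quick demo of the Python f"{var=}" one mentioned above (CPython 3.8+; outputs in comments):

x = 42
print(f"{x=}")      # x=42
print(f"{x + 1=}")  # x + 1=43 -- works for expressions too, source text preserved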
Doesn’t that C# slice syntax work exactly the same as the Python syntax with negative numbers? Or is that what you were saying?
edit: I think I see what was being said now after reading some comments on this article from another site: C#’s method of using ^1 instead of -1 requires the explicit choice of the programmer, whereas Python syntax could allow a negative index to slip in as an arithmetic or logic bug.
Yes, exactly that… I know I’ve definitely encountered difficult bugs where a negative number slipped in and that changed the basic semantics of slices, but instead of getting an error or the empty list (what you get with, say, a_list[1:0]) you get data that is totally plausible and also wrong.
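A contrived Python sketch of that failure mode (the off-by-one here is hypothetical, not from any real codebase):

items = ["header", "row1", "row2"]
start = 1              # intended: first data row
bug = start - 2        # off-by-one bug: bug == -1
print(items[bug:])     # ['row2'] -- plausible-looking data, silently wrong
# C#'s items[^1..] makes counting from the end an explicit, deliberate choice;
# a plain negative index there throws instead of wrapping around.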
First, there’s the config keyword. If you write config var n=1, the compiler will automatically add a --n flag to the binary. As someone who 1) loves having configurable programs, and 2) hates wrangling CLI libraries, a quick-and-dirty way to add single-variable flags seems like an obvious win.
This is not a feature I’d be thrilled about in a general-purpose programming language (although perhaps it makes sense in the context in which Chapel is used). Adding syntactic support in the language for a specific type of command-line argument parsing and mapping the result of that parse to a variable just makes me think that if I ever needed to do some kind of complicated command-line parsing with custom logic, the language would fight me on this point.
It has potential drawbacks, but what I’d really be interested in is a language where main can have any function signature: its arguments become positional command-line arguments if required or flags if optional, its documentation comment/string becomes the --help text, and argument annotations can be used to add short option names or to bypass parsing and just read the raw argument list.
I think you would be interested in OSes with structured commands. With this, the shell can prompt and provide help for arguments, do some basic parsing, and pass the results to argv.

Big two examples here are VMS with DCL (command definition/creation docs) and IBM i with CL (an example, though without context, there’s a lot to get used to!).

Raku (previously known as Perl 6) has this: https://docs.raku.org/routine/MAIN
Elvish also has this, but instead of restricting this to a main function (which Elvish doesn’t have anyway), you can do this with any function: https://elv.sh/ref/flag.html#flag:call.
This sounds good in theory, but will clash with the idea of sub-commands and mutually-exclusive option groups.
However, if you know what you want from such an option generator, it’s quite easy to fake if the language provides access to function type signatures. Python 3.x’s type information API is still somewhat clunky, but it allows that.

We are doing that in VyOS now. For a very simple example, here’s a function that returns system version data (either in JSON or formatted for humans): https://github.com/vyos/vyos-1x/blob/current/src/op_mode/version.py#L66-L73
The real advantage of having that as a library instead of a language feature is that you can use it for more than one purpose.
We have vyos.opmode that automatically exposes all show_* functions (among other reserved names) as subcommands and their arguments as options, in that case as version.py show [--funny] (that we later map to show version and show version funny commands).
However, we also have a GraphQL schema generator that exposes that function as ShowVersion query in the API.
It’s a solution that only works for us and our set of special names, but everyone can make a generator for their own needs relatively easily.
I wrote a python library years ago to generate argument parsers from function signatures (or for subcommands, an argument parser for an object, module or class with function properties). https://github.com/cmcaine/cli
You can use it to generate argument parsers for pretty much any function or for entire modules (then the first argument selects which function to call).

Nim can do something like this with the cligen library.
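The core trick behind these generators is small. A rough Python sketch (make_parser is a hypothetical helper, not the API of any of the libraries above):

import argparse
import inspect

def make_parser(fn):
    # Required parameters become positionals; parameters with a
    # default become --flags, typed after their default value.
    p = argparse.ArgumentParser(prog=fn.__name__, description=fn.__doc__)
    for name, param in inspect.signature(fn).parameters.items():
        if param.default is inspect.Parameter.empty:
            p.add_argument(name)
        else:
            p.add_argument("--" + name.replace("_", "-"),
                           default=param.default, type=type(param.default))
    return p

def greet(name, times=1):
    """Print a greeting a few times."""
    for _ in range(times):
        print("hello", name)

args = make_parser(greet).parse_args(["world", "--times", "3"])
greet(**vars(args))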
Conversely, it means that you have a language feature that doesn’t work for environments where there is no CLI. For example, WebAssembly components, bare-metal code, and so on. That’s fine if you want to write a language specifically for a limited subset of deployments but it’s not ideal.
What I really want is language support for defining these things in a library. For this example, I want some simple syntax that lets me modify a data structure at compile time from different declarations in different modules. If I have that, then I can have a CLI parser library that lets me define a Config[T, Name] in my compilation unit and have it register the type and name with the CLI parser library and have a transparent accessor for the new variable that actually queries the CLI library for the value.
Not sure if this is quite what you want, but Chapel also has config params (param being kind of like a constexpr), which are values that can be set at compile time. Thus, you can configure the program with different variables even if it won’t have a CLI.
Yeah, magical default parsing is one side of the “I don’t know how to do command line parsing” coin. The instant you want to do something a bit bespoke you’re digging through incantations. All for something that is literally “take this input and give me output”!
Is writing “config n as intParam(default=1)” or something too much?

Agreed, it makes sense for languages that are going to be used from the command-line. Nextflow handles these under params (https://www.nextflow.io/docs/edge/config.html#scope-params) in a pretty elegant way!
A lot of these are things that I consider antipatterns in a language. I firmly believe in the aspect of the Smalltalk philosophy that says no feature should live in the language if it can live in the standard library (Smalltalk took this a bit too far). Putting things in the language means that changing them is incredibly hard. Putting them in a library lets you test it at scale, evolve it, and move it into the standard library only when it’s shown to be useful, and even then you can replace it with something different if a particular deployment domain needs something different.

Anything where you have syntax for a specific data structure is in this category. If I have specific syntax for dates then I’ve had to teach the compiler about a particular date data structure representation (and does that representation properly handle the Jewish calendar? Does it take leap seconds into account?). Providing good syntax for arbitrary constructors is a much bigger win.

The litmus test I’d suggest for this is: can I define my own string representation and construct an instance of it with simple syntax that includes pretty-printing other variables in the same scope, without requiring the variables to be both referenced in the format string and listed separately (e.g. can I write “{foo} things” rather than “{} things”, foo).
I agree with most of this so don’t consider my point a rebuttal to your main argument.
If I have specific syntax for dates then I’ve had to teach the compiler about a particular date data structure representation (and does that representation properly handle the Jewish calendar? Does it take leap seconds into account?).
By this argument baked in syntax for simple arithmetic like 1 + 2 is a mistake because it doesn’t handle arbitrarily large integers, complex numbers, etc. But I think it’s clear it’s not a mistake, and it’s desirable to have the baked in syntax for the 95% case even if people have to do new BigInteger(a) or whatever for the exotic cases.
I am struggling to see why the same reasoning shouldn’t apply to dates, which are nearly as common an operation? Especially if the baked in date is at least a good, if not complete in all cases, implementation?
That is, generally speaking, I think baked in support for very common operations is a good thing.
A related case is “standard lib” vs “3rd party ecosystem”. Take http request support. In Go the standard library is the de facto solution. In Ruby you have a mess of 3rd party solutions with varying degrees of “standard adoption”: RestClient, Faraday, httparty, etc. I much prefer the situation in Go.
By this argument baked in syntax for simple arithmetic like 1 + 2 is a mistake because it doesn’t handle arbitrarily large integers, complex numbers, etc
You need, at a minimum, integers, pointers, arrays, and records in a language to be able to build everything else in the library. Even in C, 1 + 2 is only slightly special, the default type of numeric literals is int, but there are suffixes available to specify others. C++ generalises this to allow constructing any user-defined type from a number and to define the plus operator. This means that you can do things like define units as types and write 2_km / 1_s and get a result that is a speed, if you are doing physics computation and you define types for those units, whereas in C you are stuck with unitless values and get all of the errors that this implies.
I am struggling to see why the same reasoning shouldn’t apply to dates, which are nearly as common an operation? Especially if the baked in date is at least a good, if not complete in all cases, implementation?
Because, unlike machine integers, a date type can be built from smaller units. If I am writing system software and care only about UTC or seconds since some epoch, I can use a simple date representation. If I am writing a user-facing application, I can use something that understands time zones and conversion between locales and all of the fun corner cases like the missing two weeks in the Russian calendar that are just unnecessary overhead in other contexts. Primitive integers have a single representation in any problem domain: values that fit in registers. These may be used to build other abstractions (e.g. Lisp-style integers that transparently promote to big integers on overflow if overflow is worse than dynamic memory allocation in your particular problem space, as it is almost anywhere outside of a kernel) but the set of useful abstractions that it can build is infinite and that’s why you should avoid putting them in the language. It’s far easier to replace parts of the standard library (or of optional libraries) in specific domains than it is to replace core bits of the language.
A related case is “standard lib” vs “3rd party ecosystem”. Take http request support. In Go the standard library is the de facto solution. In Ruby you have a mess of 3rd party solutions with varying degrees of “standard adoption”: RestClient, Faraday, httparty, etc. I much prefer the situation in Go.
On the other hand, the 3rd party http crate in Rust is used widely across higher level crates/frameworks. For example the http::Request type is used in hyper, axum, tide. Obviously not 100% universally, just like in Go - see for example fasthttp.
Isn’t that the same hand? In ruby RestClient is used in all sorts of other 3rd party libs, and has very wide adoption. But that whole way of doing things is an anti-pattern in my experience, because it’s not a true consensus in the way a standard library is – just a general trend of many people using the same thing, which can wax and wane.
I misunderstood you - I thought that in ruby all those gems are completely independent implementations of http protocol and that this is what you are complaining about.
Still, like I mentioned - even with the high quality of net/http there are still users with needs that are better served with alternative implementations. Moreover, in case of other parts of std lib (e.g. syscall or testing/quick) it didn’t end nicely - deprecated packages are kept indefinitely and can’t be evolved because of being part of std lib.
There are 2 main reasons why putting stuff in std lib is a big risk:
it’s not easy to know ahead of time if the design of some library will be good enough long term and will make it possible to accommodate future changes to requirements
adding to std lib is easy but removing is almost impossible
Smalltalk took this a bit too far

Why do you say that?

Having only dynamic dispatch and no intraprocedural flow control in the language made it very difficult to implement efficiently. If you can statically prove that the receiver of an ifTrue message is a Boolean then you can inline it but otherwise you at least need a dynamic type check on every conditional. You can often make that one or two instructions, but one or two instructions (one of which is a conditional branch) on every if statement adds up to a lot of unavoidable overhead. With type specialisation in a trace-based JIT you can do a bit better, but it’s then a lot of complexity in the runtime to get something that most languages have for free (and means that ahead-of-time compilation with good performance is incredibly hard).
It is not too common to pass around booleans directly. Usually, you branch based on some inline predicate like a comparison. So I’m unconvinced this is a major liability. You also pay nothing extra when a branch is taken the way you expect (and you have dynamic instrumentation to tell you what to expect).
With type specialisation in a trace-based JIT you can do a bit better, but it’s then a lot of complexity in the runtime to get something that most languages have for free
Better to have that complexity in the implementation than in the language or in user code. And a performant implementation of any language will have a great deal of complexity.
ahead-of-time compilation with good performance is incredibly hard
Counterpoint: ahead-of-time compilation is undesirable anyway because you lose the ability to optimise dynamically :)
(AOT compilation also encourages people to do really horrible things like maintain stable binary interfaces and hardware ISAs.)
It is not too common to pass around booleans directly. Usually, you branch based on some inline predicate like a comparison. So I’m unconvinced this is a major liability
Consider this in Smalltalk: a isSomeProperty ifTrue: [ y doSomething ]. This will be initially lowered to something roughly equivalent to this C:
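Object *tmp = msgSend(a, #isSomeProperty);
BlockClosure *block = ...; /* closure for [ y doSomething ] */
msgSend(tmp, #ifTrue, block);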
To be able to do any optimisation, you need to inline tmp and so you want to transform it to:
bool o = msgSend_boolRet(a, #isSomeProperty);
if (o)
{
msgSend(y, #doSomething);
}
Unfortunately, this transform is impossible in Smalltalk without whole-program dataflow analysis. Instead, you end up doing something like:
Object *o = msgSend(a, #isSomeProperty);
if (o->isa == BoolClass)
{
if (o == True)
{
msgSend(y, #doSomething);
}
}
else
{
BlockClosure *tmp = ...;
msgSend(o, #ifTrue, tmp);
}
This adds a lot of conditionals to the code, which impedes later optimisation, hurts instruction cache usage, and hurts performance in the end. I’ve written an AoT compiler for Smalltalk before and this kind of thing was annoying.
That transform is the first step to then being able to do some useful type inference and replace those message-send calls with direct function calls.
Counterpoint: ahead-of-time compilation is undesirable anyway because you lose the ability optimise dynamically :)
Without getting into a philosophical discussion here about whether AoT compilation is desirable, being able to efficiently AoT compile a language is desirable whichever compilation strategy you use. Any time spent in a JIT compiler is time spent not executing the code that is being JIT’d and so it is desirable to be able to quickly get good performance with little dynamic analysis, even if you then choose to get better performance with dynamic analysis.
That seems suspect. You should rather say:

if (o == True) {
  // true branch
} else if (o == False) {
  // nothing: ifTrue: does nothing for a false receiver
} else {
  // slow path
}
Which adds no overhead when the branch is taken the way you expect, and a single (correctly predicted) branch when it’s taken the other way. You can swap the checks for true and false based on instrumentation.
…but all of this is irrelevant if you are really doing a naive, base-case dispatch for isSomeProperty. If you are, then the dispatch time will completely dominate, making an extra branch irrelevant. If you have specialised the caller appropriately, you may be able to infer that the result is always a boolean.
Without getting into a philosophical discussion here about whether AoT compilation is desirable, being able to efficiently AoT compile a language is desirable whichever compilation strategy you use. Any time spent in a JIT compiler is time spent not executing the code that is being JIT’d and so it is desirable to be able to quickly get good performance with little dynamic analysis, even if you then choose to get better performance with dynamic analysis.
Perhaps. I think this is a fairly weak argument, though; you might say that, all other things being equal, it is better to have a language that can be efficiently aot’d, but all other things are clearly not equal. Particularly considering the only thing you harm is startup time.
Which adds no overhead when the branch is taken the way you expect, and a single (correctly predicted) branch when it’s taken the other way. You can swap the checks for true and false based on instrumentation.
Not true. It adds instruction cache usage and it means that you need more complex inlining heuristics later on.
…but all of this is irrelevant if you are really doing a naive, base-case dispatch for isSomeProperty. If you are, then the dispatch time will completely dominate, making an extra branch irrelevant. If you have specialised the caller appropriately, you may be able to infer that the result is always a boolean.
Also not true, because this is a data-flow property, whereas the former is a control-flow property and so is harder to infer. To determine that a property accessor always returns a boolean, you need to ensure that every store to that ivar is always of a boolean.
Perhaps. I think this is a fairly weak argument, though; you might say that, all other things being equal, it is better to have a language that can be efficiently aot’d, but all other things are clearly not equal. Particularly considering the only thing you harm is startup time.
Again, not true. You harm the total memory overhead (you need to track more state), you harm instruction cache usage (you need to have more code for handling side exits from the slow path), and so on.
Also not true, because this is a data-flow property, whereas the former is a control-flow property and so is harder to infer. To determine that a property accessor always returns a boolean, you need to ensure that every store to that ivar is always of a boolean
Determining the control flow is a prerequisite for inferring the type of the variable. If you have no idea what code is executed following your message-send, you can not possibly infer anything about the result. And determining what code to execute is, as you say, the hard part. So I don’t understand what you are objecting to here.
The section about different number representations reminds me of when PHP got support for binary numbers, e.g. 0b101010, and then I proposed Roman number literals like 0spqr as an April Fool’s. Good times :D

Also mostly agreed with the rest, symbols are great, especially in Clojure. For the string literals, I’m not sure it’s particularly needed if you have HEREDOC syntax or if that is basically interchangeable.

0spqr idea admirabilis est. ;)

I would love symbols in more programming languages, but it’s a little tricky to implement in programming languages without a runti–
Oh, I just figured out how to implement them in something with C’s compilation and linking model. Each symbol is a global pointer to a string with its name. Each symbol also has a linker symbol with its name. Make the linker symbols exported and tell the linker to deduplicate and coalesce them, via Black Magic, and it will fix up all references to them automatically. Bingo, each symbol is represented by a unique integer that is a pointer to a unique string, and all symbols with the same string are interned.
Generating new symbols with names not previously mentioned in the program then still requires dynamic memory and some kind of global runtime, but you need to allocate memory to generate new symbols no matter what so the machinery that creates new symbols can be part of the same lib that provides your memory allocator.
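And once a runtime exists, interning is nearly free. A rough Python analogy (sys.intern; an illustration of interning in general, not of the linker trick above):

import sys

a = sys.intern("".join(["val", "id?"]))  # spelling built at runtime
b = sys.intern("valid?")
assert a is b  # one canonical object per spelling: identity compare, like a symbol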
I’d say that in statically typed languages, enums replace the main use cases of symbols.

In my view, symbols are just a special case of a global enum where you can add extra values extremely easily.

At the cost of not having type safety and exhaustiveness checking.

In Next Generation Shell I’ve experimented by adding
section "arbitrary comment" {
code here
}
and this is staying in the language. It looks good. That’s instead of
# blah section - start
code here
# blah section - end
Later, since NGS knows about sections, I can potentially add section info to stack traces (also maybe logging and debugging messages). At the moment, it’s just an aesthetic comment and (I think) an easily skip-able code section when reading.
Symbols
I’ve decided not to have symbols in NGS. My opinion (I assume not popular) is that all symbols together form one big enum, instead of multiple enums which would convey which values are acceptable at each point.
I really like the idea of sections. The only downside compared to the comment version (which I’ve seen used a fair amount) is that the end curly brace doesn’t include the name of the section (I’ve seen C++ programmers end their namespace with } // end namespace bla).
Another cool thing is that in some languages with first class block support (like Ruby) this can be rather trivially implemented. For instance, I think the following would do the trick in Ruby / Crystal:
def section(name)
  yield
end

section("section name") {
  # code here
}
Sometimes, oddly, the best syntax is no syntax. For example, in OCaml, there is no special multi-line string syntax. Instead, you can just add line breaks inside normal double-quoted strings.
I also find the ability to add your own infix operators a very nice micro-feature. However, no micro-feature comes at zero cost.
Yeah… it feels like single-line strings are fixing something that’s no longer a problem: strings accidentally being unclosed and it causing confusing errors. With syntax highlighting and halfway decent error messages it’s not that important.
BUT, the one case where I’d like a multi-line string syntax is something like:
def run_query():
    sql = """
    SELECT * FROM x
    """
Where I’d like that literal to actually resolve to "SELECT * FROM x" – dropping the leading and trailing empty line and any consistent leading whitespace (with a special case for internal empty lines).
The indentation of the closing quote determines the leading whitespace that’s stripped.
And if there’s an initial newline it’s dropped. But the last newline isn’t dropped, which I think makes sense ? (You can leave it off if you want by putting it on the same line as the last line of text.)
oil$ var x = """
> hello
> there
> """
oil$ = x
(Str) 'hello\nthere\n'
FWIW this pretty much comes from Julia, which has Python-like multi-line strings, except they strip leading whitespace
The multi-line strings are meant to supplant the 2 variants of here docs in shell.
There are unfortunately still 3 kinds of multi-line strings to go with the 3 kinds of shell strings. But I’m trying to reduce that even further:
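For what it’s worth, the behavior wished for above is approximable in today’s Python with textwrap.dedent (a sketch, not a language feature):

import textwrap

def run_query():
    sql = textwrap.dedent("""
        SELECT * FROM x
    """).strip()  # dedent strips the common leading whitespace,
    return sql    # strip() drops the surrounding newlines: "SELECT * FROM x"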
With each line prefixed by \\, you get the string “SELECT * \nFROM x”. There’s no ambiguity about leading spaces. If you wanted a leading space before SELECT or FROM, you’d just put a space there.
I’m just confused why it uses \\ instead of the more obvious “””.
Yeah, Lua’s insistence on requiring a special syntax for strings just because they contain newlines makes zero sense. A syntax for strings that don’t require escaping backslashes is fine, but there’s no reason to couple that to newline support.
OCaml does actually have a special literal-string syntax. It’s not required for multi-line strings, as you mention, but can be quite useful because you never need to (nor can!) escape anything in it. The delimiters are {[ and ]}, but also to cover the edge case of strings that contain the closing delimiter, you can put any identifier consisting of a-z or _ between the two characters of the opening delimiter, and then the closing delimiter must contain the same identifier.
This is wrong. There is no {[ ]} syntax in OCaml as of 5.0.0. You may be referring to the {| |} syntax which is not a special syntax for multi-line strings (since there’s no need for a special syntax, as normal strings already allow newlines) — it’s there to allow unescaped quotes in string literals. The side effect is that you indeed cannot escape anything.
Erlang also has pattern matching over binaries. I don’t want this as a language feature, but I do want sufficient language features to be able to implement this as a library feature. I hate writing any low-level network or filesystem (or binary file-format) code in non-Erlang languages as a result. This is especially annoying for wire protocols that have fields that are not full bytes and most-significant-bit first. Erlang handles these trivially, C forces you to experience large amounts of pain.
Yes - binary pattern matching is pretty much the biggest thing I miss on anything not BEAM. There are all sorts of little touches to make it error-free too, like the endianness notation and precise bit alignment.
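For contrast, a rough Python sketch of the manual masking and shifting that an Erlang pattern like <<Version:4, IHL:4, Rest/binary>> does for you:

# First byte of an IPv4-style header: version in the high nibble,
# header length in the low nibble.
data = bytes([0x45, 0x00, 0x00, 0x14])
version = data[0] >> 4   # 4
ihl = data[0] & 0x0F     # 5
rest = data[1:]          # everything after the first byte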
Why stop there? In Common Lisp, reader macros allow you to pretty much do whatever you want! Want a date literal? Cool. Want to mix JSON with s-expressions? No problem! Want literal regex? Yup, can do that too.
They’re also called keywords because Clojure has a different construct for symbols. (The subtle distinction between when to use a keyword and when to use a symbol is a common “discussion” point with new adopters.)
If you like the Frink date literal, you may also like Elixir’s sigils. It’s used for date, time, regex and more. You can even create your own custom sigils.
Thanks. I like sigils better than Lisp reader macros, because you can parse a sigil using a context free grammar, without executing code from the module that defines the sigil. The ability to parse a program without executing it is valuable in a lot of contexts.
I haven’t actually used them, but KDL’s “slashdash” comments to knock out individual elements are pretty interesting. From The KDL Document Language:
On top of that, KDL supports /- “slashdash” comments, which can be used to comment out individual nodes, arguments, or children:
// This entire node and its children are all commented out.
/-mynode "foo" key=1 {
a
b
c
}
mynode /-"commented" "not commented" /-key="value" /-{
a
b
}
It’s a little more clear with the syntax highlighting on the site.
I really like Python’s “in between” operator: 10 <= x < 20
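It reads as a short-circuiting conjunction, with x evaluated only once:

x = 15
assert (10 <= x < 20) == ((10 <= x) and (x < 20))
# Contrast with C, where 10 <= x < 20 parses as (10 <= x) < 20,
# i.e. 0 or 1 compared with 20 -- true for every x.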
Rust’s include_str! macro is super useful as well. Much nicer than Java’s overcomplicated getResourceAsStream() (which is still very nice). Having the ability to include data in an “executable” in a well defined way is great.
In Monte, we matured the max= syntax example; it happened to be a proposed syntactic extension for E, and we merely followed through with the proposal. We also expanded other augmented syntax. For example, this high-level syntax…
Awesome read! I love the three classes of features.
I’ve got one to throw in the mix, Nextflow’s file(). It just handles remote and local files the same, so the end user can use anything from a file on an FTP server, to s3. It just works and stages the files locally.
Totally agreed about kebab case. It’s an unusually major quality-of-life improvement.
I’d also add being allowed to use
?
in an identifier.user-record-valid?
is pretty clear, both as a function or as a variable.The argument I hear against kebab case is that it makes it impossible to write subtraction as
foo-bar
but like … that’s … good actually? Why are we designing our syntax specifically in order to accommodate bad readability patterns? Just put a space in there and be done with it. Same logic applies to question marks in identifiers. If there’s no space around it, it’s part of the identifier.Agreed! (hi phil 👋)
This is mentioned in the article too in a way. In addition to the readability point you make, the author makes the argument that most of us use multi-word identifiers far, far more often than we do subtractions.
I dunno, I think there’s a lot of pesky questions here. Are all mathematical operators whitespace sensitive, or just
-
? Is kebab-case really worth pesky errors when someone doesn’t type things correctly?I format my mathematical operators with whitespace, but I also shotgun down code and might leave out the spaces, then rely on my formatter to correct it.
Basically, I think kebab-case is nice, but properly reserved for lisps.
Yes, of course! There’s no reason to disallow
tla+
as an identifier either orkm/h
for a variable to keep speed other than “that’s the way it’s been done for decades”.The compiler should catch it immediately since it’d be considered an unrecognized identifier.
I’m not sure if this is an argument for or against what you’re saying here, but this discussion reminded me of the old story about how fortran 77 and earlier just ignore all spaces in code:
(from https://www.star.le.ac.uk/~cgp/prof77.html)
If I see
x-y
, I always parse it visually as a single term, not x minus y. I think that’s a completely fair assumption to make.I have always found
kebab-case
easier on the eyes thansnake_case
, I wish the former was more prevalent in languages.Raku (previously known as Perl 6) does exactly this: dashes are allowed in variables names, and require spaces to be parsed as the minus operator.
Crazy idea: reverse
_
and-
in your keyboard map :)Probably would work out well for programmers. All your variables are easier to type
When you need to use minus, which is not as often, you press shift
More crazy ideas.
#include <vader.gif>
Nooooooo!!!!!!!!!I really don’t like this idea. I’m all for native support for Unicode strings and identifiers. And if you want to create locale-specific keywords, that is also fine. I might even be OK with expanding the set of common operators to specific Unicode symbols, provided there is a decent way to input them. [1]
But we should never, ever use two visually similar symbols for different things. Yes, I know, the compiler will immediately warn you if you mixed them up, but I would like to strongly discourage ever even starting down that path.
[1] Something like
:interpunct:
for the “·” for example. Or otherwise let’s have the entire world adopt new standard keyboards that have all the useful mathematical symbols. At any rate, I’d want to think about more symbols a lot more before incorporating it into a programming language.The hyphen and minus sign differ greatly in length, and are easily distinguished, when the correct character codes and a properly designed proportional font is used. According to The Texbook (Donald Knuth, page 4), a minus sign is about 3 times as long as a hyphen. Knuth designed the standards we still use for mathematical typesetting.
When I type these characters into Lobsters and view in Firefox, Unicode minus sign (−) U+2212 is about twice the width of Unicode hyphen (‐) U+2010. I’m not sure if everybody is seeing the same font I am, but the l and I are also indistinguishable, which is also bad for programming.
A programming language that is designed to be edited and viewed using traditional mathematical typesetting conventions would need to use a font designed for the purpose. Programming fonts that clearly distinguish all characters (1 and l and I, 0 and O), are not a new idea.
Sun Labs’ Fortress project (An HPC language from ~15 years ago, a one time friendly competitor to Chapel, mentioned in the article) had some similar ideas to this, where unicode chars were allowed in programs, and there were specific rules for how to render Fortress programs when they were printed or even edited. for example
it supported identifier naming conventions for superscripts and subscripts, overbars and arrows, etc. I used to have a bookmark from that project that read “Run your whiteboard!”
the language spec is pretty interesting to read and has a lot of examples of these. I found one copy at https://homes.luddy.indiana.edu/samth/fortress-spec.pdf
Thanks, this is cool!
I feel that the programming community is mostly stuck in a bubble where the only acceptable way to communicate complex ideas is using a grid of fixed width ASCII characters. Need to put a diagram into a comment? ASCII graphics! Meanwhile, outside the bubble we have Unicode, Wikipedia and technical journals are full of images, diagrams, and mathematical notation with sophisticated typography. And text messages are full of emojis.
It would be nice to write code using richer visual notations.
Use dieresis to indicate token break, as in some style guides for coöperate:
kebab-case
infix⸚s̈ubtract
(Unserious!)
Nice. All the cool people (from the 1800’s) spell this word diaëresis, which I think improves the vibe.
Ah yes, but if you want to get really cool (read: archaic), methinks you’d be even better served by diæresis, its ligature also being (to my mind at least) significantly less offensive than the Neëuw Yorker style guide’s abominable diære…sizing(?) ;-)
Thank you for pointing this out. I think that diæresis is more steampunk, but diaëresis is self-referential, which is a different kind of cool.
I’ve tried that before and it turns out dash is more common than underscore even in programming. For example terminal stuff is riddle with dashes.
For me, this is not at all about typing comfort, it’s all about reading. Dashes, underscores and camel case all sound different in my head when reading them, the underscore being the least comfortable.
I am the same way, except they all sound different from my screenreader, not just in my head. I prefer dashes. It’s also a traditional way to separate a compound word.
Interesting, you must have some synesthesia :-)
As far as I can tell, different variable styles don’t sound like anything in my head. They make it harder for me to read when it’s inconsistent, and I have to adjust to different styles, but an all_underscore codebase is just as good to me as an all camelCase.
I use Ctrl-N in vim so typing underscore names doesn’t seem that bad. Usually the variable is already there somewhere. I also try to read and test “what I need” and then think about the code away from the computer, without referring to specific names
I like ? being an operator you can apply to identifiers, like how it’s used with nullables in C#, or, as I recall, some kind of test in Ruby.
In Ruby, ? is part of the ternary operator and a legal method suffix so method names like
dst?
are idiomatic.Ah, that makes sense. I don’t use Ruby so I wasn’t sure, I just knew I had seen it.
In zig
maybe.?
resolvesmaybe
to not be null, and errors if it is null.maybe?
is different, in my mind.In Ruby it’s just convention to name your function
valid?
instead of theis_valid
orisValid
you have in most languages. The ? Is just part of the function name.Here’s some other microfeatures I like:
f"{var=}"
meaningf"var={var}"
{key}
being equivalent to{key: key}
… it rewards consistent naminglist[1..^1]
which is equivalent to Python’slist[1:-1]
. Using negative numbers to indicate counting from the end is cute but can lead to errors.obj?.prop
meaning something likeobj && obj.prop
… but it’s definitely a microfeature and I guess is useful.x = new X {prop1 = val, prop2 = val}
which is kind of likex = new X(); x.prop1=val; x.prop2=val
. There’s lots of different constructor patterns like this… I’m not sure which I like best, though I do find simplistic implementations like Python and JavaScript to be tedious.Doesn’t that C# slice syntax work exactly the same as the Python syntax with negative numbers? Or is that what you were saying?
edit: I think I see what was being said now after reading some comments on this article from another site: C#’s method of using
^1
instead of-1
requires the explicit choice of the programmer, whereas Python syntax could allow a negative index to slip in as an arithmetic or logic bug.Yes, exactly that… I know I’ve definitely encountered difficult bugs where a negative number slipped in and that changed the basic semantics of slices, but instead of getting an error or the empty list (what you get with, say,
a_list[1:0]
) you get data that is totally plausible and also wrong.This is not a feature I’d be thrilled about in a general-purpose programming language (although perhaps it makes sense in the context in which Chapel is used). Adding syntactic support in the language for a specific type of command-line argument parsing and mapping the result of that parse to a variable just makes me think that if I ever needed to do some kind of complicated command-line parsing with custom logic, the language would fight me on this point.
It has potential drawbacks, but what I’d really be interested in is a language where
main
can have any function signature: its arguments become positional command-line arguments if required or flags if optional, its documentation comment/string becomes the--help
text, and argument annotations can be used to add short option names or to bypass parsing and just read the raw argument list.I think you would be interested in OSes with structured commands. With this, the shell can prompt and provide help for arguments, do some basic parsing, and pass the results to argv.
Big two examples here are VMS with DCL (command definition/creation docs) and IBM i with CL (an example, though without context, there’s a lot to get used to!).
Raku (previously known as Perl 6) has this: https://docs.raku.org/routine/MAIN
Elvish also has this, but instead of restricting this to a main function (which Elvish doesn’t have anyway), you can do this with any function: https://elv.sh/ref/flag.html#flag:call.
This sounds good in theory, but will clash with the idea of sub-commands and mutually-exclusive option groups.
However, if you know what you want from such an option generator, it’s quite easy to fake is the language provides access to function type signatures. Python 3.x’s type information API is still somewhat clunky, but it allows that.
We are doing that in VyOS now. For a very simple example, here’s a function that returns system version data (either in JSON or formatted for humans): https://github.com/vyos/vyos-1x/blob/current/src/op_mode/version.py#L66-L73
The real advantage of having that as a library instead of a language feature is that you can use it for more than one purpose.
We have vyos.opmode that automatically exposes all
show_*
functions (among other reserved names) as subcommands and their arguments as options, in that case asversion.py show [--funny]
(that we later map toshow version
andshow version funny
commands).However, we also have a GraphQL schema generator that exposes that function as
ShowVersion
query in the API.It’s a solution that only works for us and our set of special names, but everyone can make a generator for their own needs relatively easily.
I wrote a python library years ago to generate argument parsers from function signatures (or for subcommands, an argument parser for an object, module or class with function properties). https://github.com/cmcaine/cli
Nim can do something like this with the cligen library.
I wrote a python library years ago to basically do this. https://github.com/cmcaine/cli
You can use it to generate argument parsers for pretty much any function or for entire modules (then the first argument selects which function to call).
Conversely, it means that you have a language feature that doesn’t work for environments where there is no CLI. For example, WebAssembly components, bare-metal code, and so on. That’s find if you want to write a language specifically for a limited subset of deployments but it’s not ideal.
What I really want is language support for defining these things in a library. For this example, I want some simple syntax that lets me modify a data structure at compile time from different declarations in different modules. If I have that, then I can have a CLI parser library that lets me define a
Config[T, Name]
in my compilation unit and have it register the type and name with the CLI parser library and have a transparent accessor for the new variable that actually queries the CLI library for the value.Not sure if this is quite what you want, but Chapel also has
config param
s (param
being kind of like a constexpr), which are values that can be set at compile time. Thus, you can configure the program with different variables even if it won’t have a CLI.Yeah magical default parsing is one side of the “I don’t know how to do command line padding” coin. The instant you want to do something a bit bespoke you’re digging through incantations. All for something that is literally “take this input and give me output”!
Is writing “config n as intParam(default=1)” or something too much?
Agreed, it makes sense for languages that are going to be used from the command-line. Nextflow handles these under
params
https://www.nextflow.io/docs/edge/config.html#scope-params in a pretty elegant way!A lot of these are things that I consider antipatterns in a language. I firmly believe in the aspect of the Smalltalk philosophy that says no feature should live in the language if it can live in the standard library (Smalltalk took this a bit too far). Putting things in the language means that changing them is incredibly hard. Putting them in a library let’s you test it at scale, evolve to, and move it into the standard library only when it’s shown to be useful, and even then you can replace it with something different if a particular deployment domain needs something different. Anything where you have syntax for a specific data structure is in this category. If I have specific syntax for dates then I’ve had to teach the compiler about a particular date data structure representation (and does that representation properly handle the Jewish calendar? Does it take leap seconds into account?). Providing good syntax for arbitrary constructors is a much bigger win. The litmus test I’d suggest for this is: can I define my own string representation and construct an instance of it with simple syntax that includes pretty-printing other variables in the same scope, without requiring the variables to be both referenced in the format string and listed separately (e.g. can I write “{foo} things” rather than “{} things”, foo).
I agree with most of this so don’t consider my point a rebuttal to your main argument.
By this argument baked in syntax for simple arithmetic like
1 + 2
is a mistake because it doesn’t handle arbitrarily large integers, complex numbers, etc. But I think it’s clear it’s not a mistake, and it’s desirable to have the baked in syntax for the 95% case even if people have to donew BigInteger(a)
or whatever for the exotic cases.I am struggling to see why the same reasoning shouldn’t apply to dates, which are nearly as common an operation? Especially if the baked in date is at least a good, if not complete in all cases, implementation?
That is, generally speaking, I think baked in support for very common operations is a good thing.
A related case is “standard lib” vs “3rd part ecosystem”. Take http request support. In Go the standard library is the defacto solution. In Ruby you have a mess of 3rd party solutions with varying degrees of “standard adoption”: RestClient, Farraday, httpparty, etc. I much prefer the situation in Go.
You need, at a minimum, integers, pointers, arrays, and records in a language to be able to build everything else in the library. Even in C, 1 + 2 is only slightly special, the default type of numeric literals is int, but there are suffixes available to specify others. C++ generalises this to allow constructing any user-defined type from a number and to define the plus operator. This means that you can do things like define units as types and write 2_km / 1_s and get a result that is a speed, if you are doing physics computation and you define types for those units, whereas in C you are stuck with unitless values and get all of the errors that this implies.
Because, unlike machine integers, a date type can be built from smaller units. If I am writing system software and care only about UTC or seconds since some epoch, I can use a simple date representation. If I am writing a user-facing application, I can use something that understands time zones and conversation between locales and all of the fun corner cases like the missing two weeks in the Russian calendar that are just unnecessary overhead in other contexts. Primitive integers have a single representation in any problem domain: values that fit in registers. These may be used to build other abstractions (e.g. Lisp-style integers that transparently promote to big integers on overflow if overflow is worse than dynamic memory allocation in your particular problem space, as it is almost anywhere outside of a kernel) but the set of useful abstractions that it can build is infinite and that’s why you should avoid putting them in the language. It’s far easier to replace parts of the standard library (or of optional libraries) in specific domains than it is to replace core bits of the language.
On the other hand, the 3rd party
http
crate in Rust is used widely across higher level crates/frameworks. For example thehttp::Request
type is used in hyper, axum, tide. Obviously not 100% universally, just like in Go - see for example fasthttp.Isn’t that the same hand? In ruby RestClient is used in all sorts other 3rd party libs, and has very wide adoption. But that whole way of doing things is an anti-pattern in my experience, because it’s not a true consensus in the way a standard library is – just a general trend of many people using the same thing, which can wax and wane.
I misunderstood you - I thought that in ruby all those gems are completely independent implementations of http protocol and that this is what you are complaining about.
Still, like I mentioned - even with high quality of
net/http
there are still users with needs that are better served with alternative implementations. Moreover in case of other parts of std lib (eg.syscall
ortesting/quick
) it didn’t end nice - deprecated packages are kept indefinitely and can’t be evolved because of being part of std lib.There are 2 main reasons why putting stuff in std lib is a big risk:
Why do you say that?
Having only dynamic dispatch and no intraprocedural flow control in the language made it very difficult to implement efficiently. If you can statically prove that the receiver of an ifTrue message is a Boolean then you can inline it but otherwise you at least need a dynamic type check on every conditional. You can often make that one or two instructions, but one or two instructions (one of which is a conditional branch) on every if statement adds up to a lot of unavoidable overhead. With type specialisation in a trace-based JIT you can do a bit better, but it’s then a lot of complexity in the runtime to get something that most languages have for free (and means that ahead-of-time compilation with good performance is incredibly hard).
It is not too common to pass around booleans directly. Usually, you branch based on some inline predicate like a comparison. So I’m unconvinced this is a major liability. You also pay nothing extra when a branch is taken the way you expect (and you have dynamic instrumentation to tell you what to expect).
Better to have that complexity in the implementation than in the language or in user code. And a performant implementation of any language will have a great deal of complexity.
Counterpoint: ahead-of-time compilation is undesirable anyway because you lose the ability optimise dynamically :)
(AOT compilation also encourages people to do really horrible things like maintain stable binary interfaces and hardware isas.)
Consider this in Smalltalk:
a isSomeProperty ifTrue: [ y doSomething ].
. This will be initially lowered to something roughly equivalent to this C:You be able to do any optimisation, you need to inline
tmp
and so you want to transform it to:Unfortunately, this transform is impossible in Smalltalk without whole-program dataflow analysis. Instead, you end up doing something like:
This adds a lot of conditionals to the code which both impedes later optimisation, hurts instruction cache usage, and hurts performance at the end. I’ve written an AoT compiler for Smalltalk before and this kind of thing was annoying.
That transform is the first step to then being able to do some useful type inference and replace those message-send calls with direct function calls.
Without getting into a philosophical discussion here about whether AoT compilation is desirable, being able to efficiently AoT compile a language is desirable whichever compilation strategy you use. Any time spent in a JIT compiler is time spent not executing the code that is being JIT’d and so it is desirable to be able to quickly get good performance with little dynamic analysis, even if you then choose to get better performance with dynamic analysis.
That seems suspect. You should rather say:
Which adds no overhead when the branch is taken the way you expect, and a single (correctly predicted) branch when it’s taken the other way. You can swap the checks for true and false based on instrumentation.
…but all of this is irrelevant if you are really doing a naive, base-case dispatch for isSomeProperty. If you are, then the dispatch time will completely dominate, making an extra branch irrelevant. If you have specialised the caller appropriately, you may be able to infer that the result is always a boolean.
Perhaps. I think this is a fairly weak argument, though; you might say that, all other things being equal, it is better to have a language that can be efficiently aot’d, but all other things are clearly not equal. Particularly considering the only thing you harm is startup time.
Not true. It adds instruction cache usage and it means that you need more complex inlining heuristics later on.
Also not true, because this is a data-flow property, whereas the former is a control-flow property and so is harder to infer. To determine that a property accessor always returns a boolean, you need to ensure that every store to that ivar is always of a boolean.
Again, not true. You increase total memory overhead (you need to track more state), you hurt instruction-cache usage (you need more code for handling side exits from the slow path), and so on.
Determining the control flow is a prerequisite for inferring the type of the variable. If you have no idea what code is executed following your message-send, you cannot possibly infer anything about the result. And determining what code to execute is, as you say, the hard part. So I don’t understand what you are objecting to here.
The section about different number representations reminds me of when PHP got support for binary number literals, e.g.:
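    $flags = 0b101010;  // binary integer literals, added in PHP 5.4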
and then I proposed Roman numeral literals (imagine something like a hypothetical 0rXLII) as an April Fools’ joke. Good times :D
Also, I mostly agree with the rest; symbols are great, especially in Clojure.
As for the string literals, I’m not sure a separate syntax is particularly needed if you have HEREDOCs, or whether the two are basically interchangeable.
0spqr

idea admirabilis est. ;) (“an admirable idea”)

I would love symbols in more programming languages, but it’s a little tricky to implement in programming languages without a runti–
Oh, I just figured out how to implement them in something with C’s compilation and linking model. Each symbol is a global pointer to a string with its name. Each symbol also has a linker symbol with its name. Make the linker symbols exported and tell the linker to deduplicate and coalesce them, via Black Magic, and it will fix up all references to them automatically. Bingo, each symbol is represented by a unique integer that is a pointer to a unique string, and all symbols with the same string are interned.
Generating new symbols with names not previously mentioned in the program then still requires dynamic memory and some kind of global runtime, but you need to allocate memory to generate new symbols no matter what so the machinery that creates new symbols can be part of the same lib that provides your memory allocator.
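A minimal sketch of the link-time half of that idea, assuming a GNU toolchain where weak symbols with the same name are coalesced by the linker (all names here are illustrative):

    /* symbols.h: every translation unit that mentions a symbol defines
     * the same weak global; the linker merges the duplicates, so
     * sym_red ends up with one address process-wide. */
    #define DEFINE_SYMBOL(name) \
        __attribute__((weak)) const char *const sym_##name = #name;

    DEFINE_SYMBOL(red)
    DEFINE_SYMBOL(green)

    /* Identity is pointer comparison; the name is one dereference away:
     *   if (colour == sym_red) { ... }      puts(sym_red);            */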
I’d say that in statically typed languages, enums replace the main use cases of symbols.
In my view, symbols are just a special case of a global enum where you can add extra values extremely easily.
At the cost of not having type safety and exhaustiveness checking.
In Next Generation Shell I’ve experimented with adding section syntax, roughly like this (the section name is illustrative):
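    section "parse arguments" {
        ...
    }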
and this is staying in the language. It looks good. That’s instead of marker comments along the lines of:
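    # --- parse arguments ---
    ...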
Later, since NGS knows about sections, I can potentially add section info to stack traces (and maybe to logging and debugging messages). At the moment, it’s just aesthetics and (I think) an easily skippable code section when reading.
Symbols

I’ve decided not to have symbols in NGS. My opinion (I assume not a popular one) is that all symbols together form one big enum, instead of multiple enums which would convey which values are acceptable at each point.

I really like the idea of sections. The only downside compared to the comment version (which I’ve seen used a fair amount) is that the closing curly brace doesn’t include the name of the section (I’ve seen C++ programmers end their namespaces with
} // end namespace bla
).

Another cool thing is that in some languages with first-class block support (like Ruby) this can be implemented rather trivially. For instance, I think the following would do the trick in Ruby / Crystal:
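    # One possible definition; the block is the section body.
    def section(name)
      yield
    end

    section "parse arguments" do
      # ...
    end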
Sometimes, oddly, the best syntax is no syntax. For example, in OCaml there is no special multi-line string syntax. Instead, you can just add line breaks inside normal double-quoted strings.
I also find the ability to add your own infix operators a very nice micro-feature. However, no micro-feature comes at zero cost.
Yeah… it feels like single-line-only strings are fixing something that’s no longer a problem: strings accidentally left unclosed, causing confusing errors. With syntax highlighting and halfway-decent error messages, it’s not that important.
BUT, the one case where I’d like a multi-line string syntax is something like this (sketching the shape I want):
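    query = "
        SELECT * FROM x
    "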
where I’d like that literal to actually resolve to "SELECT * FROM x" – dropping the leading and trailing empty line and any consistent leading whitespace (with a special case for internal empty lines).

Oil has that!
https://www.oilshell.org/blog/2021/09/multiline.html#multi-line-string-literals-and-and
The indentation of the closing quote determines the leading whitespace that’s stripped.
And if there’s an initial newline, it’s dropped. But the last newline isn’t dropped, which I think makes sense? (You can leave it off if you want by putting it on the same line as the last line of text.)
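Transcribed from the post (so treat the details as approximate), it looks something like:

    var query = """
        SELECT *
        FROM x
        """   # the closing quote's indentation sets what's stripped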
FWIW, this pretty much comes from Julia, which has Python-like multi-line strings, except that they strip leading whitespace.
The multi-line strings are meant to supplant the 2 variants of here docs in shell.
There are unfortunately still 3 kinds of multi-line strings to go with the 3 kinds of shell strings. But I’m trying to reduce that even further:
https://lobste.rs/s/9ttq0x/matchertext_escape_route_from_language#c_vire9r
https://lobste.rs/s/9ttq0x/matchertext_escape_route_from_language#c_tlubcl
Val has this: https://github.com/val-lang/specification/blob/main/spec.md#string-literals
And I’m quite sure to have seen other languages do this, but can’t recall now.
Java’s new-ish text blocks do this too!
Aha, right. For anyone else curious it’s described in JEP 378: Text Blocks.
Zig does that very well. You write something like:
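    const query =
        \\SELECT *
        \\FROM x
    ;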
and you get the string “SELECT * \nFROM x”. There’s no ambiguity about leading spaces. If you wanted a leading space before SELECT or FROM, you’d just put a space there.
I’m just confused why it uses \\ instead of the more obvious “””.
Yeah, Lua’s insistence on requiring a special syntax for strings just because they contain newlines makes zero sense. A syntax for strings that don’t require escaping backslashes is fine, but there’s no reason to couple that to newline support.
OCaml does actually have a special literal-string syntax. It’s not required for multi-line strings, as you mention, but it can be quite useful because you never need to (nor can!) escape anything in it. The delimiters are {[ and ]}, but to cover the edge case of strings that contain the closing delimiter, you can put any identifier consisting of a-z or _ between the two characters of the opening delimiter, and then the closing delimiter must contain the same identifier.

This is wrong. There is no {[ ]} syntax in OCaml as of 5.0.0. You may be referring to the {| |} syntax, which is not a special syntax for multi-line strings (since there’s no need for one, as normal strings already allow newlines) – it’s there to allow unescaped quotes in string literals, e.g. {|he said "hi"|}, or {foo|…|foo} with an identifier. The side effect is that you indeed cannot escape anything.

For examples of these not mentioned: Erlang has symbols called atoms, and VB.NET has date literals (and XML literals).
Erlang also has pattern matching over binaries. I don’t want this as a language feature, but I do want sufficient language features to be able to implement this as a library feature. I hate writing any low-level network or filesystem (or binary file-format) code in non-Erlang languages as a result. This is especially annoying for wire protocols that have fields that are not full bytes and most-significant-bit first. Erlang handles these trivially, C forces you to experience large amounts of pain.
Yes - binary pattern matching is pretty much the biggest thing I miss on anything not BEAM. There are all sorts of little touches that make it error-free too, like the endianness notation and precise bit alignment.
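For a taste, a sketch along classic lines (the field names are mine): unpacking the first fields of an IPv4 header, with sub-byte fields and explicit endianness:

    %% 4-bit and 6-bit fields fall out of the packet directly.
    <<Version:4, IHL:4, DSCP:6, ECN:2, TotalLen:16/big, Rest/binary>> = Packet.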
Why stop there? In Common Lisp reader macros allow you to pretty much do whatever you want! Want a date literal? Cool. Want to mix JSON with sexpressions? No problem! Want literal regex? Yup, can do that too.
I guess you meant s-expressions, cough ;)
yes, of course. Thanks for being pedantic.
I think it was a kebab case joke?
I don’t think there’s a joke in there.
s-expressions
is the way it’s spelled. The hyphen emphasizes “Ess expressions” over “sex pressions”.

Now who’s being pedantic?
(The article has a whole section on kebab case…)
Now I’m confused! :)
If you were joking, I didn’t pick up on it. I wasn’t trying to be pedantic, just to offer the background of why it’s s- and not just plain s.

Examples of symbols:
:example
example
:example
:example
'example
Clojure also has Symbols, IIRC called keywords (I guess because they are most often used as keys in mapping data structures).
They’re also called keywords because Clojure has a different construct for symbols. (The subtle distinction between when to use a keyword and when to use a symbol is a common “discussion” point with new adopters.)
There was an attempt to add XML literals and XML querying syntax to JavaScript (the E4X syntax), but it never got wide enough support.
Scala has XML literals. I imagine this is the PL equivalent of seeing someone’s old picture from high school.
If you like the Frink date literal, you may also like Elixir’s sigils. It’s used for date, time, regex and more. You can even create your own custom sigils.
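For example, two of the built-in ones:

    ~D[2021-09-01]   # a Date
    ~r/hello/i       # a case-insensitive Regex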
Thanks. I like sigils better than Lisp reader macros, because you can parse a sigil using a context free grammar, without executing code from the module that defines the sigil. The ability to parse a program without executing it is valuable in a lot of contexts.
I haven’t actually used them, but KDL’s “slashdash” comments to knock out individual elements are pretty interesting. From The KDL Document Language:
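Roughly (a simplified example rather than the site’s exact one):

    node1 1 2 /-3 4       // the slashdashed 3 is ignored
    /-node2 "disabled" {  // the whole node is knocked out
        child
    }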
It’s a little more clear with the syntax highlighting on the site.
Clojure supports something similar with its #_ reader macro, which makes the reader ignore the next form. It’s pretty handy for debugging.

Clojure also has a comment macro that ignores its body and evaluates to nil. I rarely use it.

Racket (and maybe other Scheme-family languages?) has this as well with S-expression comments:
#;
Easy to remember, since ; is a line comment.

10 <= x < 20
Rust’s include_str! macro is super useful as well. Much nicer than Java’s overcomplicated getResourceAsStream() (which is still very nice). Having the ability to include data in an “executable” in a well-defined way is great.

Have you ever looked at the Rebol language? In terms of PL paradigms, it might be described as “everything is a microfeature”.
Agreed. Additionally, Rebol has a rich literal syntax. An example taken from http://www.rebol.info/ :
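Along these lines (my examples of typical Rebol literal types):

    26-Feb-2024           ; date!
    10x20                 ; pair!
    $1.50                 ; money!
    1.2.3                 ; tuple!
    user@example.com      ; email!
    http://www.rebol.com  ; url!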
In Monte, we matured the max= syntax example; it happened to be a proposed syntactic extension for E, and we merely followed through with the proposal. We also expanded other augmented-assignment syntax; the general pattern is that a high-level form like x max= y would be equivalent to (roughly) the less-sugared x := x.max(y).
Optional chaining and defaulting operators are my pick.
Awesome read! I love the three classes of features.
I’ve got one to throw in the mix: Nextflow’s file(). It just handles remote and local files the same, so the end user can use anything from a file on an FTP server to S3. It just works and stages the files locally.
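Something like (illustrative paths):

    local  = file('data/sample.fastq')
    remote = file('s3://my-bucket/data/sample.fastq')  // fetched and staged like a local file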