On a slightly different but very related topic…
Another interesting viewpoint is the Anders Hejlsberg perspective. In a talk about TypeScript (I don’t have the reference handy), he discusses the existing ECMAScript import syntax. That choice makes building a completion system in an editor very hard (effectively impossible), because the user specifies the package/module last.
Yeah, I do kinda wish ES had gone with
from 'foo' import {bar};
instead of
import {bar} from 'foo';
Yes, having seen that video, it hurts me every time I write TypeScript imports. This only shows that language design should be an incremental, feedback-loop-driven process in which the entire development environment is considered. As a positive example: C# LINQ queries start with from and end with select to guide IDE completions, rather than the reverse order just because “SQL does it”.
Another tip for language designers: the compiler should not be a “black box”; it should instead be built as a service that can be queried by the IDE (cf. the Language Server Protocol). This was also mentioned by Anders.
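To make the “compiler as a service” point concrete, here is a minimal sketch (the file name and source text are invented for illustration) using the TypeScript compiler API, where the same machinery that compiles code also answers editor queries such as completion:

import * as ts from "typescript";

// One in-memory source file (invented example); an editor would track the open buffer instead.
const fileName = "example.ts";
const source = "const user = { name: 'Ada' }; user.";

// A minimal LanguageServiceHost: the compiler exposed as a queryable service.
const host: ts.LanguageServiceHost = {
  getScriptFileNames: () => [fileName],
  getScriptVersion: () => "1",
  getScriptSnapshot: (name) =>
    name === fileName ? ts.ScriptSnapshot.fromString(source) : undefined,
  getCurrentDirectory: () => process.cwd(),
  getCompilationSettings: () => ({}),
  getDefaultLibFileName: (options) => ts.getDefaultLibFilePath(options),
};

const service = ts.createLanguageService(host);

// Ask the service what completes right after "user." (the end of the source).
const completions = service.getCompletionsAtPosition(fileName, source.length, undefined);
console.log(completions?.entries.map((e) => e.name)); // should include "name"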
Some more links about it here:
http://www.oilshell.org/blog/2017/12/15.html#appendix-lexing-the-c-language
It’s variously called the “lexer hack”, among other names.
There’s an interesting section on the parser for the CompCert C compiler. I guess they parse twice so that the second parse can be more obviously correct (?)
Re: CompCert – that is funny, because the original C designers wanted the language to be compilable in a single pass.
They’ve succeeded – too well. C has to be parsed in a single pass, and can’t be cleanly separated into tokenization and AST-building passes.
I think clang still uses separate steps. See https://en.m.wikipedia.org/wiki/Lexer_hack and especially the references to Eli Bendersky’s site on that page.
The C people have called out two big mistakes in the language almost from the start: making declaration mirror usage (i.e., to get an int out of int (*fn[7])(int), you evaluate (*fn[i])(42)), and the operator precedence of the bitwise operators.
I’m afraid Haskell wins this one:
Not a fan of Haskell in this regard – I think the split of parameters from their types comes at a cost, just as fancier type inference does. But yeah, Haskell (or rather Idris with its single :) may occupy a different local optimum from the one I described.
Haskell is moving even more in this direction. There was the quirk that, while type signatures could be on separate lines, kind signatures (the “types of types”, useful in places like datatype and type family declarations) could not. But there’s a new extension called StandaloneKindSignatures that allows that. For example:
{-# LANGUAGE StandaloneKindSignatures #-}
import Data.Kind (Type)
type MyEither :: Type -> Type -> Type
data MyEither a b = MyLeft a | MyRight b
I definitely do not like Haskell (hence “I’m afraid”), but credit is given where credit is due. Using polymorphism in Haskell is much more pleasant than using it in a language that requires you to explicitly spell out every single type variable.
What is the cost in separating function signatures from function argument lists?
I don’t see how separating the signature would change the properties of type inference, nor would using a single : improve anything in this regard.
Another aspect that would be more in line with the content of your article is how class constraints are defined:
fromIntegral :: (Num b, Integral a) => a -> b
Here we can see that Haskell took kind of both roads at the same time :)
While I agree with the spirit, the conclusion is not the only possible one.
I would rather say: “language designers, use only a single type in your language and you will not have this problem”. For example, your language may have only strings, or only multidimensional arrays of floats.
Tcl/shell and APL, respectively.
Related: https://wiki.c2.com/?StringlyTyped
In terms of readability, one nice feature of this choice is that optional type declarations become much easier to read, since one does not have to guess whether the first bareword is supposed to be an identifier or a type. Just read left-to-right in all cases. For example, in E:
fn x, y { x + y }
and
fn x :Int, y :Int { x + y }
Similarly, in Zephyr ASDL, declarations are backwards, with types coming before identifiers. This is correct, though, by the very same logic, because Zephyr ASDL allows omitting identifiers, but not types! For example:
As someone who’s not used to the language, the comma there left me extremely confused about what was being defined. It took me a second, and only after reading your second definition did my brain stop parsing it as multiple definitions of the variables x and y in some unknown higher scope, which made no sense given the rest of the information.
If I may be frank, I am unconvinced that this matters at all. Humans are used to inconsistencies in language. We can look at a code fragment containing the words String and username in any order and interpret it correctly.
It stops being easy with bigger types, e.g. if the type is a callback function type.
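As a TypeScript-flavoured illustration of that point (the names query, sql and onRow are invented): with name-first syntax the identifier stays at the front even when its type grows into a function type:

// The parameter name onRow still leads, however large its type gets.
function query(
  sql: string,
  onRow: (row: Record<string, unknown>, index: number) => void,
): void {
  // a real implementation would call onRow once per result row
}

// A rough C spelling of that callback parameter buries the name in the
// middle of the declarator instead: void (*onRow)(struct row *, int)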
It still makes a difference for parsers. If the language supports more complex syntax like argument destructuring, or C-like definition-mirrors-usage, then the grammar for Type ident can get messy, while ident: Type starts with the simpler part and has a convenient separator.
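A toy recursive-descent sketch of that argument (the grammar is invented, not any real language): with ident: Type the parser commits after one identifier and a separator, and never needs to ask “does this token name a type?” – the question that forces the lexer hack in C:

type Token = string;

// Grammar: param := IDENT ':' type      type := IDENT ('[' ']')*
function parseType(tokens: Token[], pos: number): { type: string; next: number } {
  let type = tokens[pos];
  let next = pos + 1;
  while (tokens[next] === "[" && tokens[next + 1] === "]") {
    type += "[]";   // e.g. Int[] or Int[][]
    next += 2;
  }
  return { type, next };
}

function parseParam(tokens: Token[], pos: number): { name: string; type: string; next: number } {
  const name = tokens[pos];                    // the first token is always the identifier
  if (tokens[pos + 1] !== ":") throw new Error("expected ':'");
  return { name, ...parseType(tokens, pos + 2) };
}

console.log(parseParam(["xs", ":", "Int", "[", "]"], 0));
// => { name: 'xs', type: 'Int[]', next: 5 }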
You can put the type on its own line, Haskell-style. If it’s so large as to require multiple lines, ident: Type ends up hanging off awkwardly to the right.
This one certainly is cleaner, but you have the same “when does this phrase end” problem elsewhere, in e.g. expressions. You may already be using tools such as precedence, mandatory casing, and parentheses in other areas of the language to disambiguate types, patterns, and identifiers, so you can just re-use them here.
I am not arguing that Type ident is better[1], but it seems every argument in favor of one style can be given an equally particular counterargument in favor of the other. It’s like the eternal tabs/spaces bikeshed, except it doesn’t actually seem to affect most programmers enough for them to have a strong opinion.
[1] I am a little fond of the idea of being able to treat types as values or patterns, with the value Type having the type Type -> Type, obviating the need for type annotations as a separate construct. Obviously, this only makes sense with prefix notation in the first place.
The first point is wholly unconvincing. Consistent vertical spacing is a bugbear overall, but if you must have it, add more spaces. This also stops you from mangling all of your prefix qualifiers into ones that have the same number of characters. I’m sorry, but var and const are more readable than var and val.
Use whatever you like and don’t listen to people on the internet telling you what to do.
I agree; listen to this guy. (Your argument is a bit self-defeating…)
I think it’s good to be receptive to new, truthful information so you can take it into account when making your own decision. For example, this article contains the information that the order of i: Int syntax has something in common with the order of lambda syntax.
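In TypeScript terms (variable names invented), that common ordering looks like this – both the annotated declaration and the lambda introduce the name first, with its type or body following:

// Declaration: the name comes first, its type after the separator.
const count: number = 3;

// Lambda: the bound name likewise comes first, the body after the arrow.
const double = (n: number): number => n * 2;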
Because why would you care to evaluate arguments and reasoning when you can just trust your gut instinct, save a few minutes now, and make things harder for yourself for the entire time you’re using your own language?
Why even discuss anything then?
I thought the main advantage of ident: Type was that it’s much easier to parse, but that’s not brought up in this article. Is the parsing argument not actually that important?
It depends – as long as the language is not overloading random symbols like (x) * y to mean both “multiply x with y” and “cast the dereference of y to x” it’s usually fine.
I certainly hope that prefix operators like & and * fall out of favor in the future altogether.
But hey, it’s fAmiLiRaTy!1!! to add random stuff to a language that could have been made to look like a normal method invocation.
I don’t think that ‘methods’ should be considered ‘normal’, nor do I like postfix function-call syntax like a.f(), because it only makes sense when there’s a single special parameter that always comes first – which covers a very small subset of all useful functions.
I have never ever experienced this issue. Either one parameter is special, then pick that one; if none are “special”, flip a coin. There is literally no harm: in the best case it makes many things more ergonomic (auto-completion etc.), and in the worst case it doesn’t matter.
Have you tried programming in a language with multiple dispatch and/or open classes? I strongly prefer it (as instantiated by Julia), and .method() calls undermine that paradigm.
There’s the huge harm of having inconsistent syntax in your language for a.f(b, c) and f(a, b, c), along with a.f(b, c) being harder to read.
Then let’s not have global functions?
That’s debatable.
Global functions are the most basic functionality any language should include.
They are terrible in terms of modularity. See C.
They have nothing to do with modularity.
So what’s going to happen when multiple libraries define the same global functions?
The same thing that happens when multiple libraries define entities with the same name in any programming language: some form of link failure, or some form of priority system choosing one over the other.
That sounds rather silly.
Modules/namespaces have existed for, what, 40 years already? Why not use them?
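A small TypeScript sketch of the modules/namespaces answer (the module paths ./jsonlib and ./yamllib and their parse exports are hypothetical): the collision is resolved explicitly at the import site, not by link order or a priority rule:

// Both (hypothetical) modules export a function named "parse"; the importer disambiguates.
import { parse as parseJson } from "./jsonlib";
import { parse as parseYaml } from "./yamllib";

const config = parseYaml("key: value");
const data = parseJson('{"key": "value"}');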
Ease of parsing should never be a priority in programming language design. Unless the language is designed to be interpreted on microcontrollers, that’s just not a sensible value proposition. In aggregate, much more developer time will be spent using a language than implementing it, so a bit more work spent on the implementation is worth it.
I disagree in general. Programmers need a quick write-compile-test feedback cycle, and a slow-to-parse language like C++ or Scala slows that cycle down.
Also, the easier a language is to parse, the faster a good ecosystem appears: linters, syntax highlighting, static analysis, a language server. Java is still ahead of C++ here.
However, for the specific topic of type-identifier order, the impact on speed is negligible.
How fast it’s possible to parse is fine to consider. That’s a separate issue from how easy it is to write a parser.
That can be solved easily enough with a centralized parsing library like libclang.
Standard, simple data structures and data formats that can be easily parsed and emitted with simple code mean you can write that code in the language you’re already working in. Having to rely on some centralised parsing library with a particular API and ABI is a terrible situation to be in. FFI is always second class at best.
Most (mature) languages are implemented in themselves. I see no reason why a reusable parser wouldn’t be.
Even a regular, simple grammar (by programming language standards) is likely to be complex enough that you wouldn’t want to hand-parse it anyway. There are some notable exceptions to this, like lisp and forth.
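As a sketch of why lisp is one of the exceptions, here is a deliberately minimal s-expression reader (no error handling for unbalanced input) – small enough that hand-writing it in your own language is no burden:

type SExpr = string | SExpr[];

function readSExpr(src: string): SExpr {
  // Pad parentheses with spaces, then split on whitespace to tokenize.
  const tokens = src.replace(/[()]/g, " $& ").trim().split(/\s+/);
  let pos = 0;
  function read(): SExpr {
    const tok = tokens[pos++];
    if (tok !== "(") return tok;            // an atom
    const list: SExpr[] = [];
    while (tokens[pos] !== ")") list.push(read());
    pos++;                                  // consume the closing ")"
    return list;
  }
  return read();
}

console.log(readSExpr("(+ 1 (* 2 3))"));    // => [ '+', '1', [ '*', '2', '3' ] ]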
I kind of agree, but there’s probably a sweet spot for syntactic complexity which is well below what some languages top out at. For example, COBOL and its modern descendant SQL likely have a syntax that is too heavy, in that it’s hard for humans to know at a glance exactly how more complicated expressions will be parsed. It’s also possible to eschew reserved words and allow programmers to use every possible sequence of characters as a name, but modern languages have backed off from that, too. So while ease of parsing shouldn’t drive design, I think a good designer will end up with a language that doesn’t require nearly as much syntactic complexity as some older languages do.
COBOL is basically constructed to prevent expressing complicated expressions.
If SQL has a parent, it is probably PL/1.
The C people finally understood this when they made Go.
This thread covers that part of a talk by Rob Pike.
https://news.ycombinator.com/item?id=4705051