To address two of the most common responses I’ve gotten for this:
Sum types are an “unusual” basis type with arguably the widest (but not universal) adoption.
Lots of people said that pairs were just tuples. I disagree because pairs use a key lookup, while tuples use an index lookup. It’s the difference between ("a" => "b")<a> and ("a", "b")[0].
They don’t though? Python tuples use index lookup because its tuples are really immutable lists. You can’t index a tuple in OCaml, Erlang, or Rust.
Tuples are closer to structs with positional (rather than named) fields (in fact Erlang’s “structs” are syntactic sugar over tuples), but in general you can’t “index” a tuple any more than you can “index” a struct (which you can in some languages like JavaScript, but that doesn’t mean structs “use an index lookup”).
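For what it’s worth, here is a rough Python sketch of the distinction being argued above: positional access on a tuple, key access on a pair-like mapping, and a struct-with-positional-fields that allows both. The one-entry dict standing in for a Raku pair and the Point type are assumptions made up for illustration, not anything from the post.

```python
from collections import namedtuple

# A tuple is addressed by position.
t = ("a", "b")
first = t[0]

# A one-entry dict stands in for a Raku-style pair: addressed by key.
pair = {"a": "b"}
value = pair["a"]

# A namedtuple is the "struct with positional fields" view: the same
# data can be read by field name or by position.
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)
by_name = p.x
by_position = p[0]

print(first, value, by_name, by_position)
```

Whether positional access counts as “index lookup” is exactly the point in dispute; the sketch only shows that Python happens to permit both on the same value.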
Off-topic, but you might want to put your name somewhere on that page :-) I figured it was you based on topic and not remembering seeing anyone else use buttondown.email, although I presume it’s a service anyone could use
[Edit] came up with an on-topic question! In Raku, are hashes really just sets of pairs? So you can have k1→v1 and k1→v2 in the same map?
A hash is a set of pairs where each key occurs once. So it’s a set, with a special condition attached. That is a common mathematical way of thinking about hashes.
Lots of people said that pairs were just tuples. I disagree because pairs use a key lookup, while tuples use an index lookup. It’s the difference between (“a” => “b”) and (“a”, “b”)[0].
In traditional Lisps, lists are built up out of pairs (aka “cons cells”). Association lists are lists with “raw” pairs in them which map keys to values. For example, in Scheme:
(alist-ref 'a '((a . 1) (b . 2))) => 1
The notation with the dot in between two items denotes a pair. Or more generally, an improper list (as in, a sequence of pairs in which the final pair’s cdr doesn’t denote the empty list). For example (a . (b . c)) is a pair which has the symbol a in its car and a pair in its cdr which has b as its car and c as its cdr. May also be written as (a b . c) for brevity.
Clojure and its other modern derivatives dispense with improper lists. There, you can only build lists which end in nil (the empty list). This simplifies a lot of code because it doesn’t need to deal with the weird edge case where a list isn’t a proper list, but it also removes some of the flexibility that pairs offer, like alists. Of course, Clojure has extensive support for maps as a distinct type so it doesn’t need alists. But it feels weird to someone who is used to the traditional Lisps because the pair is such an essential basic building block.
BTW, in Scheme you can use assoc to obtain the entire pair by its key:
(assoc 'a '((a . 1) (b . 2))) => (a . 1)
So in traditional Lisps, pairs are definitely just tuples. But they are often used as maps.
Lisp atoms are the same thing as symbols. Lua strings do double duty as symbols, being unique and immutable. The other performance benefit of symbols over strings is that they’re compared by pointer equality, which is much faster.
How about complex numbers? They’re a built-in type in C, oddly.
Does CSS count as a language? It has some unusual built in types, like colors and distances.
REBOL has special syntax for URLs and email addresses.
CSS’s unusual types line up with something I’ve been thinking: special syntax for a type makes sense when you expect all programmers in that language to use that type. But “all programmers” is everything from enterprise programmers to embedded programmers, and very few types fit that bill. So you see special syntax more often in DSLs like CSS, where you’ve already filtered for a smaller subset of programmers.
A detail that rarely seems to get mentioned about Rebol is that its syntax and semantics are very similar to Logo. Most people know it as “that turtle language”, but Logo is a quite powerful and expressive functional language as demonstrated in Computer Science Logo Style.
Nitpick: atoms are any type of indivisible thing. This includes symbols of course, but also numbers and even strings. Anything but a pair, essentially. See also the ATOM function which returns whether a given thing is an atom or not.
How about complex numbers? They’re a built-in type in C, oddly.
Lisps also typically have complex number support.
Does CSS count as a language? It has some unusual built in types, like colors and distances.
That reminds me of Kawa, which has support for quantities, which is a pretty awesome idea if you ask me (no clue if it’s really useful in practice). It’s a lot like CSS’ support for units (like 2px vs 3cm vs 4pt etc).
One more for the list, I think: the unevaluated expression. If it’s available it gets used a lot for ad-hoc knowledge representation, especially if that knowledge has tree-like structures. Also gets used for representing desired computations. All the same things YAML gets used for, really.
In Lisp-likes, such as Guile, you use the single quote:
guile> (+ 3 5)
8
guile> '(+ 3 5)
(+ 3 5)
In Prolog, it’s data objects: anything that looks like myfunctor(rg1, ...) is a data object, a tree-like structure of head and arguments that simply reflects the syntax (tree) you typed in, so that you can examine or traverse it.
In R, it’s the formula object. Most famously when telling lm, the linear modeling function, what the model you want to fit looks like. Much like in Prolog, the language looks like a list of a functor and its args, so the receiver (most often a modeling function of some kind) can examine what sort of expression the caller constructed.
> myformula <- y ~ x1 + x2
y ~ x1 + x2
> as.list(myformula)
[[1]]
`~`
[[2]]
y
[[3]]
x1 + x2
> Expression<Func<int, int, int>> e = (x, y) => x + y;
> from p in e.Parameters select $"{p.Type} {p.Name}"
System.Int32 x
System.Int32 y
> e.Body.NodeType
Add
> e.Compile()(2, 3)
5
The intended use is to compile map/filter/etc calls to SQL queries.
I don’t think this counts because I’m not sure if you can stick it in a normal variable, but Scala has by-name parameters. This allows you to pass an unevaluated block to a function.
It’s in the same spirit, certainly! R, too, lets functions choose whether to take a parameter as an evaluated value or as unevaluated code. Mainly used to let users type names to be evaluated in a data frame’s context. That you can type names without quotes is a bonus.
Also, your blog post is just very very cool. The more unusual basis types have such a big effect on what kinds of knowledge a language does (or does not) allow you to easily create and manipulate as a value at runtime.
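Python has a rough analogue of the unevaluated expression too, via the standard ast module. A minimal sketch, assuming the toy expression 3 + 5 from the Guile example above: parse it into a tree, look inside it, and only evaluate it later.

```python
import ast

# Parse source text into an expression tree without evaluating it.
tree = ast.parse("3 + 5", mode="eval")

# The tree can be inspected like any other value...
node = tree.body
op_name = type(node.op).__name__
operands = (node.left.value, node.right.value)

# ...and evaluated later, on demand.
result = eval(compile(tree, "<expr>", "eval"))

print(op_name, operands, result)
```

Unlike a Lisp quote or an R formula there is no dedicated literal syntax here; the expression has to start out as a string, which is probably why it feels less like a basis type.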
They left out functions! Shaaaaaaaame. :-P I won’t get on their case for other first-class control thingies like coroutines though.
Surprised pointers/references aren’t on the list. Maybe the trend of making everything invisibly pass-by-reference makes most programmers not think about them, but I think we’d probably be better off if they were first-class objects more of the time. Maybe not for scripting languages, but lots of the fiddly edges of mid-level languages like C# seem to come from wanting to pretend pointers don’t exist. (If your lang is Really Immutable like Erlang or Haskell, then pretending pointers don’t exist works a lot better.)
Still waiting for a programming language with first-class patterns for pattern matching.
Bottom values like nil are often a sort of weird edge-case, especially in some languages like Lua where they’re treated not as an object, but as the absence of an object.
That’s the main ones I can think of right now. Hmmm, I expected the list to be longer.
It’s not super-clear but I mentioned functions and pointers as domain-specific universal basis types:
Some domains have additional basis types: systems languages have pointers, dynamic languages have key-value mappings, FP languages have first-class functions, etc.
You should go look at basis types you can find in theorem proving languages like Coq. Nothing like trying to decipher what propositional universe you’re in.
I bet there’s a lot of unusual stuff in Mathematica, but I’m not paying $400 just to check
Mathematica has many uncommon “basis types”, for example rules and patterns are first-class citizens and you can also copy-paste images and graphics around as expressions so I think those count too?
P.S.: Actually there is wolframcloud.com that is free for trying out Mathematica
Arrays can have any number of dimensions, though I find more than 2 dimensions awkward. If I have 3 or more I’m usually better off with a list of 2D arrays (via boxing).
oh no
But 2D arrays are quite useful! Which is why the most popular 2D array by far is the dataframe.
The EXPRESS modelling language has bags, and arguably, SQL has as well (when you don’t do an ORDER BY, that is).
NULL in SQL is quite unique in that very few languages have the same semantics on NULL. Many think of it as a “not there” value (and it is often used that way), but I think it’s better explained as a value “we don’t know” yet. In a sense, UNKNOWN would be a better name, as select (null = 1) is null returning true sounds quite confusing to people, whereas select (unknown = 1) is unknown is probably more understandable. Arguably, the absence of a value and a missing/unknown one are two different things, so we should have both.
Also, Postgres at least has both literal rows and literal tables. I don’t know whether or not that is in conformance to the standard or a local innovation.
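The NULL behaviour described above is easy to poke at from Python. A minimal sketch, assuming SQLite (through the standard sqlite3 module) as a stand-in; other databases may differ in details, and nothing here is Postgres-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = 1 is neither true nor false: the comparison itself yields NULL,
# which the sqlite3 module maps to None.
comparison = conn.execute("SELECT NULL = 1").fetchone()[0]

# Asking whether that result IS NULL gives a definite answer (1 = true).
is_null = conn.execute("SELECT (NULL = 1) IS NULL").fetchone()[0]

print(comparison, is_null)
conn.close()
```

Reading None as “unknown” rather than “absent” makes the second result look a lot less surprising.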
In some Prolog systems you can attach metadata to the (logical) variables, some constraint systems are implemented this way. This metadata is completely transparent, and only shows up if you request it explicitly. I don’t know if it counts as a data type though!
I think elixir/erlang terms might count. They’re a weird amalgamation of lists and symbols and maybe pairs as defined here, but their implementation and use makes them feel like more.
Isn’t “term” just the erlang word for an erlang value? It’s not an actual thing, the actual things are numbers (specifically integers and floats), atoms (~symbols), bitstrings, functions, pids, lists, maps, tuples, and a pair of oddities like ports and refs.
No booleans (boolean operators work on special atoms), no structs (records desugar to tuples), no strings (in common fp fashion string literals yield lists of integers).
I ‘spose. The way keyword lists compose in elixir didn’t initially seem covered by any of the semantics of those quite right, but I’m not convinced of this anymore.
I’ll confess to not knowing much about elixir so I’ve no idea what keyword lists are.
In erlang I know alists / plists which are common ways to encode associations (maps), and iolists which are ad-hoc rope-ish structures for cheap(ish) output composition
I think keyword lists are what elixir calls plists, there’s a syntactic sugar for using them to call functions with named parameters. Thing is it’s exactly that, sugar, not really a basis type. Still works in a way I’ve never really seen anywhere else though.
I’m really confused about exactly what the topic of this post and thread is. Any topic is fine with me! But I’ve been dying to know - are we talking about syntax, implementation, a combination of both, or what?
The original post says:
These types are always supported with dedicated syntax, as opposed to being built out of the language abstractions.
I’m confused about what “as opposed to” means there, and it affects whether Elixir plists count or not.
It’s been a while since I used Elixir, but keyword lists are basically sugar around turning [{:key, value}, {:key, value}] into [key: value, key: value]. It’s a common pattern in OTP, so Elixir tries to make it easy to express.
Another: Nix has paths as a basis type.
They represent the file or directory that they resolve to relative to the source file they’re in. If you coerce them to a string, that file or directory is “interned” into the nix store and the resulting absolute path is used as the string.
Nushell has this too and it’s a blessing for shell scripts.
It also has file sizes as a type, which can be noted down as 10mb, 0.01gb or whatever you like and converted to bytes using into int. The same thing for Durations would be great for normal languages.
Go has that, actually. The durations are defined like so:
and outside the time package you can do 5 * time.Second to denote 5 seconds, for example.
No, Go does not actually have a proper Duration type. You can do time.Sleep(123) expecting seconds, and forget the Duration const, expecting seconds but get nanoseconds, it’s not opaque.
Kotlin is a language with a proper duration type, and you have to do 123.seconds, which will convert the Int to a Duration. But it is not a built-in type, which is what I was pointing towards.
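To make the “opaque duration” point concrete, here is a small Python sketch using the standard library’s timedelta. The sleep_plan helper is hypothetical and only exists to show a function that refuses a bare number; it is not how any of the languages mentioned above implement durations.

```python
from datetime import timedelta

def sleep_plan(delay: timedelta) -> float:
    """Hypothetical helper: accepts only a timedelta and returns seconds."""
    if not isinstance(delay, timedelta):
        raise TypeError("expected a timedelta, not a bare number")
    return delay.total_seconds()

five_seconds = timedelta(seconds=5)
print(sleep_plan(five_seconds))

try:
    sleep_plan(123)  # a bare int carries no unit, so it is rejected
except TypeError as err:
    print(err)
```

Like Kotlin’s Duration, timedelta is a library type rather than a literal built into the language, so it illustrates the opacity argument and not the built-in-type one.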
I would assume pairs have bespoke behaviour to ignore the identity of the second parameter, such that it’s not required to be hashable or equatable.
I want to hear more about REBOL!
Rebol’s literal types include:
10x20
foo@example.com
<img src="example.jpg">
The reference card is here.
Sorry for being a smug Lisp weenie :)
I find the never (!, ⊥) type an interesting addition.
One more for the list, I think: the unevaluated expression. If it’s available it gets used a lot for ad-hoc knowledge representation, especially if that knowledge has tree-like structures. Also gets used for representing desired computations. All the same things YAML gets used for, really.
C# has one of these too.
My own additions:
Bottom values like nil are often a sort of weird edge-case, especially in some languages like Lua where they’re treated not as an object, but as the absence of an object.
That’s the main ones I can think of right now. Hmmm, I expected the list to be longer.
Do active patterns in F# count?
Kinda regret not writing about SNOBOL, the first language with a regex type, and arguably the most elegant of them all.
In a “JUMP IF MATCHED” kind of way; I don’t regard SNOBOL as a very elegant language though the pattern thing is cool and original.
Python has a couple of obscure ones to do with slicing.
One is slice objects, which you normally construct implicitly by using slicing syntax. a[1:25:2] is the same as a.__getitem__(slice(1,25,2))
The other is Ellipsis, also spelled as a literal ... in source. I don’t think anything in the standard library uses it? But https://docs.python.org/3/library/constants.html#Ellipsis indicates that slicing for custom types is the intended use.
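The slice and Ellipsis objects mentioned above can be seen directly with a throwaway class. A minimal sketch; Probe is a hypothetical container whose only job is to hand back whatever key Python passes to __getitem__.

```python
a = list(range(30))

# Slicing syntax builds a slice object and hands it to __getitem__.
same = a[1:25:2] == a.__getitem__(slice(1, 25, 2))

class Probe:
    """Hypothetical container that just reports the key it was given."""
    def __getitem__(self, key):
        return key

probe = Probe()
sliced_key = probe[1:25:2]
ellipsis_key = probe[..., 0]

print(same, sliced_key, ellipsis_key)
```

Multi-dimensional array libraries are the usual consumers of keys like the last one, which fits the “slicing for custom types” reading of the docs.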
Please correct me if I’m wrong, but I don’t believe Booleans in Haskell are supported with special syntax at all. They’re simply defined as a type with two constructors, True and False. https://hackage.haskell.org/package/ghc-prim-0.10.0/docs/src/GHC.Types.html#Bool
In Haskell, the if then else conditional expression is special syntax.
Yes, but the Bool type is not - by itself - special. It’s just got a special annotation on it that tells the compiler “hey, here’s the thing you need to work with when lowering if-then-else syntax”. You could define your own Bool and give it the same annotation and (ignoring the naming conflict with the type in the standard library) it would work just fine with if-then-else.
I think this special annotation is part of the GHC implementation of Haskell, and not part of the Haskell language standard. If you just read the language standard, then Bool is special.
One could argue the same about lists, but they too are just an algebraic data type.
R definitely has data frames - it’s based on AT&T S which pioneered data frames. The project started in 1995. It was probably the very first open source implementation of data frames (which differ from arrays in that the columns are heterogeneous)
R has the best data frame API there is — tidyverse. Example code here:
What Is a Data Frame? (In Python, R, and SQL) -https://www.oilshell.org/blog/2018/11/30.html
Julia, python, and others are still trying to catch up with R after more than a decade. This is widely acknowledged by the people doing all that hard and deep work.
R has less awareness among software engineers because its main users are people doing statistical computing, often at universities
Ok I guess this means I really need to publish a draft blog post with the ggplot2 plotting library and xargs. That’s the first open source implementation of Grammar of graphics and it’s still head and shoulders above basically any other plotting library.
—
Pairs in Raku are interesting! Didn’t know that
I had this impression that Forth had some interesting unusual basis types, but thinking through it, none of the things that are odd in Forth take the form of values in the sense we would understand in other programming languages. Which isn’t terribly surprising since it’s an untyped language without functions and where variables are a thing added in the standard library.
I’m a little disappointed that sets, when finally mentioned, are attributed to Python when they were in fact pioneered by Wirth’s languages 40 years or so earlier and have enormous significance in systems programming.
Part of my reasoning here is that sets, at least of limited size, can be accessed efficiently and used for efficient bitwise operations (e.g. bitboards in games). However this appears to escape many authors, who are so intent on generalisation that they are prepared to implement a set as a linked list: something which is spectacularly unsuited to a sparse unordered structure.
In fact I’m tempted to suggest that “set of X”, “array of X” and possibly “bag of X” are fundamental types at least as important as integer, boolean and so on; and that these are in part defined by the fact that certain operations may be applied /efficiently/ to them irrespective of the type X at least up to some defined size (machine word size for a set, cache page size for an array and so on).
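The “small set as a machine word” idea above can be sketched in a few lines of Python. This assumes members are small non-negative integers (say 0 to 63); make_set and contains are hypothetical helper names, and Python ints are arbitrary precision, so the word-size limit is only notional here.

```python
# Hypothetical example: a set of small integers packed into one int.
def make_set(*members: int) -> int:
    bits = 0
    for m in members:
        bits |= 1 << m
    return bits

def contains(bits: int, member: int) -> bool:
    return (bits >> member) & 1 == 1

a = make_set(1, 3, 5)
b = make_set(3, 4, 5)

union = a | b          # set union is a single OR
intersection = a & b   # set intersection is a single AND
difference = a & ~b    # members of a that are not in b

print(bin(union), bin(intersection), bin(difference))
```

Each set operation is one bitwise instruction regardless of how many members are present, which is the efficiency property the comment is pointing at.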
You did not read the article correctly, they are not attributed to Python or anything else. It says “seen in Python”.
Also, sets appeared in programming after they appeared in computer science from set theory, a fundamental branch of logic.
oh no
oh NO
Queries in C#?
It’s been a while since I used Elixir, but keyword lists are basically sugar around turning [{:key, value}, {:key, value}] into [key: value, key: value]. It’s a common pattern in OTP, so Elixir tries to make it easy to express.
So that’s just syntactic sugar for an alist, which is not a basis type of erlang (it’s just a list of 2-tuples).