I sympathize with wanting a cool language for this, but I also am a little miffed by them not just using Lua. This is literally what Lua is for, and it’s pretty good at it.

The built-in support for queries and tables looks really nice! I wonder if there are other languages featuring this. It bears a clear resemblance to PL/SQL.

It appears very similar to q/kdb+ in that regard (I think I read that it was inspired by q in fact).

Lil blisters and oozes with a remarkable and innovative collection of Bad Ideas.

JSON can’t represent NaN or Infinities, so Lil can’t represent them in its numbers, either.

If you are doing this, then also get rid of -0 as a special number distinct from 0, but also mysteriously equal to 0. The IEEE -0 value is not an integer, it is not a real number, and the behaviour of various common mathematical operations on this value is entirely arbitrary and impossible to predict based on a knowledge of mathematics. Just keep things simple for the user, I say.

Uniform operator precedence. PEMDAS be damned, I say- expressions evaluate right-to-left unless acted upon by parentheses or brackets.

Good idea. Most new languages think it is a good idea to have 20 or so levels of operator precedence, and there’s no way I will ever memorize all these levels. Much better to keep the grammar simple enough that you can learn it.

Absolutely no runtime errors. Except for, y’know, the kind that are caused by interpreter bugs, the host OS, sunspot activity, or a general malaise. Lil has excruciatingly straightforward control flow and operates upon the highly dubious yet daring, even brave, premise that any consistent behavior at runtime is conceivably more useful than a crash.

A mushy, coercion-happy closed type system that generally aims at that lofty and deeply problematic goal of “doing the right thing” and generalizing operations over all reasonable datatypes.

This is the opposite of simple. Based on personal experience using such languages, this is a terrible idea, for several reasons.

If there is no static type checking, then I want to get runtime errors so that I get notified of the exact time and place when my program attempts to do something meaningless. Otherwise code is much harder to debug.

Code is easier to understand if the operations have clear semantics. For example, I prefer that x + y means we are adding numbers, and that + supports the usual identities (x + y can be changed to y + x without changing the value computed, x + 0 == x for any valid argument of +, etc).
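
Even without any type coercion, IEEE floats already violate some of these identities; a quick illustration (Python is used here purely for concreteness):

```python
nan = float("nan")
assert not (nan + 0 == nan)   # the identity x + 0 == x fails when x is NaN
assert not (nan == nan)       # == is not even reflexive for NaN
```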

I do see one possible reason for making this terrible design decision, which is ease of implementation, so you don’t have to add a lot of error handling code to your interpreter, design good error messages, present errors to the user in a helpful way, etc. In my experience, good error handling adds a non-trivial amount of complexity to a hobby language interpreter.

The IEEE -0 value is not an integer, it is not a real number, and the behaviour of various common mathematical operations on this value is entirely arbitrary and impossible to predict based on a knowledge of mathematics.

Like what? As far as I can tell, -0 behaves equivalently to +0 in basically all cases, most importantly comparisons. There are a few weird edge cases that can produce -0 instead of +0, but since -0 is equal to +0 you aren’t going to notice unless you do a bitwise compare.
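
The “won’t notice unless you do a bitwise compare” point is easy to demonstrate (Python used just for concreteness; copysign exposes the sign bit without bit-twiddling):

```python
import math

neg_zero = -1e-200 * 1e-200      # a tiny negative product underflows to -0.0
assert neg_zero == 0.0           # it compares equal to +0.0...
assert str(neg_zero) == "-0.0"   # ...but is observably distinct when printed
assert math.copysign(1.0, neg_zero) == -1.0           # its sign bit is set
assert math.copysign(1.0, math.sqrt(-0.0)) == -1.0    # sqrt(-0.0) is also -0.0
```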

Section 6.3 of the 2008 standard appears to specify all the edge cases quite clearly, and there are basically three ways to produce -0: rounding a negative value with the result being 0; the rounding implicit in the fused multiply-add operation, which can produce -0; and sqrt(-0), which equals -0. As far as I’ve read, -0 as an input to basically any operation functions identically to +0.

The primary edge case I’ve seen cited as a problem is how they behave in division: x/0 = Inf, x/-0 = -Inf.

That’s a good one I didn’t know about, thanks.

Let’s turn the question around. Suppose you started using a language similar to Python or Javascript, except that there is no difference between +0 and -0. They are the same number: -0 as an input to any numeric operation returns the same result as +0, without any exceptions. In fact, the expressions -0 and +0 both print as the string “0”. This violates the IEEE standard, but would you even notice the difference? Would you care? My theory is that most people don’t know about the magic properties of -0, and don’t care. For an end-user programming language like Lil, or like the new language I’m designing, getting rid of the special magic associated with -0 simplifies the language and removes a footgun.

Let’s answer the question instead. I’m not arguing that anyone wants or needs a sign bit, I’m asking for what nasty edge cases its current implementation introduces.

-0 behaves equivalently to +0 in basically all cases

-0 as an input to basically any operation functions identically to +0

You are using the word “basically” as a weasel word, meaning that -0 is equivalent to +0 most of the time, except when it isn’t. The nasty edge cases all lie in those cases where -0 isn’t equivalent to +0. Those are the cases where your code can go wrong.

People learn the properties of the real numbers in elementary and high school. The problem with the IEEE standard is that it contains two elements, NaN and -0, which are not real numbers, and which violate the axioms of important arithmetic operations. Having learned real numbers in high school, and not being trained in the correct use of floating point numbers, most people will probably just write code as if floats were real numbers, and not think through the consequences of “what if this expression returns NaN” or “what if this expression returns -0” every time they write an arithmetic expression. That creates opportunities for code to go wrong if these values arise. If you eliminate NaN and -0 from a programming language, then these footguns go away.

The infinities are also not real numbers, but they cause fewer problems in practice, because there is a sensible way to extend the real number system with +infinity and -infinity in a way that doesn’t break the axioms of real arithmetic.

One of the laws of real arithmetic is the law of trichotomy. Every real number is either negative, zero, or positive.

The NaN and -0 values in IEEE floats violate the law of trichotomy. -0 sometimes represents a small negative number, and sometimes represents zero; it depends on the context. This context is not stored in the number itself, so you may need to track it externally in order for your code to work. As a result, -0 by itself, without this context, cannot unambiguously be treated as being definitely zero, or definitely a small negative number. Instead it is something different, just its own thing. This creates a problem when defining new numeric operations that aren’t specified by the IEEE standard. What happens when you pass -0 as an input? Often there is no right answer that produces the correct behaviour for all use cases.

For example, how would you implement the sign(x) operator, which returns 0 if the input is 0, -1 if the input is negative, or +1 if the input is positive? If the input is a real number, then the input obeys the law of trichotomy, and the code is trivial. There are multiple ways to write the code that produce the same results. If the input is -0, then different ways to write the code (see previous sentence) may produce different results, and these different behaviours may be unexpected if you have internalized the law of trichotomy. No matter what sign(-0) returns, it will be incorrect for some reasonable use cases. Different languages produce different results for sign(-0), and in some cases this might be by accident: maybe the library code is written by somebody who unconsciously assumes that the law of trichotomy is true. Ultimately, the way you work around this problem is by providing multiple different versions of the sign operator and training people on which version to use in different circumstances. This complexity isn’t necessary if you don’t include -0 in your language.
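
The ambiguity is easy to reproduce. Here are two reasonable implementations of sign (in Python, purely for concreteness) that agree on every ordinary real number but disagree on -0.0:

```python
import math

def sign_by_comparison(x):
    # Trichotomy-style: classify by comparing with zero.
    if x < 0:
        return -1
    if x > 0:
        return 1
    return 0

def sign_by_sign_bit(x):
    # IEEE-style: copy the sign bit onto 1.0.
    return math.copysign(1.0, x)

assert sign_by_comparison(-3.0) == -1 and sign_by_sign_bit(-3.0) == -1.0
assert sign_by_comparison(-0.0) == 0      # treats -0.0 as zero
assert sign_by_sign_bit(-0.0) == -1.0     # treats -0.0 as negative
```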

Another issue is the equality operator. In the real number system, the equality operator x==y satisfies the following requirements:

It is an equivalence relation, which means that if x, y, z are real numbers, then:

x==x

x==y implies y==x

x==y and y==z implies x==z

x==y is true if and only if x and y are the same number.

NaN violates the first requirement, -0 violates the second requirement. This matters when writing code, because people internalize these axioms and may subconsciously assume that they are true when reasoning about code.
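
Both violations are easy to observe (Python here, just for concreteness), including inside a generic abstraction like list membership, which short-circuits on object identity before falling back to ==:

```python
nan = float("nan")
assert nan != nan              # NaN breaks reflexivity (the first requirement)
assert nan in [nan]            # yet membership "finds" it, via an identity check
assert -0.0 == 0.0             # IEEE == says these are equal...
assert str(-0.0) != str(0.0)   # ...though they are operationally distinguishable (the second requirement)
```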

The new language I’m working on is very simple. All data is represented by immutable values. There is a single generic equality operator that satisfies the above requirements and works on all values, and there is code that doesn’t work correctly unless the equality operator satisfies the requirements. If I allow the NaN and -0 values into my language, then I need a second floating point equality operator that satisfies the IEEE requirements. I need to give different names to the two equality operators, and I need to train users on which equality operator to use in which cases. Similar to the problem of needing two sign operators.

Other language designers care less about these issues. They put multiple equality operators in their language (Scheme has =, eq, eql and equal, for example). They provide generic abstractions that fail in various ways when you put -0 or NaN into the abstraction, but so what, the code does what it does, floating point numbers are evil, just deal with it. Eliminating footguns is less important than other issues, such as simplicity of implementation and conformance to the IEEE standard. I’m not saying other people are wrong if their priorities are different than mine, I’m just saying that things don’t have to be this way, there are other ways to design a programming language.

Other language designers care less about these issues. They put multiple equality operators in their language (Scheme has =, eq, eql and equal, for example).

Scheme has multiple equality relations because of mutability (and because floating-point numbers are inexact, but that’s a whole clusterfuck I don’t want to touch right now), not because of -0 or nan. Take for example common lisp, which has the same equality relations as scheme, but which has no infinities or nans.

-0 sometimes represents a small negative number, and sometimes represents zero, it depends on the context …

This is a highly misleading thing to say, when exactly the same thing applies to +0 (or unsigned 0, if it is the only 0). Toggle the inexact bit if you care.

[Comment removed by author]

In what respect am I mistaken about scheme? You said that scheme has many equality operators because of anomalies in ieee 754 (-0, nan). I said no, scheme has many equality operators not because of such anomalies, but because of mutability and inexactitude, and it would continue to have many equality operators even if it did not have -0 or nan. Do you disagree? Which operators do you think could be removed?

The problem I described concerning the interaction between equality operators and IEEE floats is something I learned about by reading the Scheme standard. Here’s an example.

In Chez Scheme,

> (eq? +nan.0 +nan.0)
#t
> (= +nan.0 +nan.0)
#f

and

> (eq? -0.0 0.0)
#f
> (= -0.0 0.0)
#t

The eq? predicate tests if two values are operationally equivalent, and it is also an equivalence relation, so it must consider NaN equal to itself, and it must consider 0.0 and -0.0 to be different numbers. The = predicate implements IEEE semantics, so it cannot be an equivalence relation, and it must have the opposite behaviour. If we were using Posits as our floating point standard, instead of the IEEE thing, then this behavioural difference would go away. So this behavioural difference is not caused by inexactness, it is caused by the IEEE standard and the non-mathematical behaviour of the special entities NaN and -0, which are not real numbers.

Although I am heavily influenced by Scheme, I think there is a lot of complexity that could be eliminated. Why four equality operators? Can’t you get by with just one?

I am working on a simple language that has a single equality operator. Like eq? in Scheme, it tests if two values are operationally equivalent, and it is an equivalence relation. I adopted those specifications from Scheme. My language includes floating point numbers. If I fully implement the IEEE standard, then I need two equality operators, corresponding to eq? and = in Scheme. I don’t want to do that. In order to simplify the number system and bring it closer to the numbers in mathematics, I decided to omit NaN and -0 from the language.

Also, sorry about the strong language in my prior post. I can’t edit the comment, so I deleted it.

Do you have integers in your language? Or more than one size of floating-point number? If so, you must sometimes distinguish between 3 and 3.0, or between 3s0 and 3d0, but will also sometimes want to consider them the same. That is what = and eqv? are for.

Do you have mutability in your language? If so, then you will sometimes find it desirable to check whether two objects have the same identity, and sometimes whether they have the same structure. That is what equal? and eqv? are for.

eq?, it is true, is a performance hack and could have been elided from the start.

I do not find it incoherent to be completely immutable. I do, however, find it somewhat suspect to have only one number type, especially if that type is a floating-point type.

As usual, henry baker has something interesting to say.

You are asking good questions.

I have only one kind of number in my language. There are integers, but the integers are just a subset of the real numbers. I do not distinguish between 1 and 1.0, they are the same number. There is just a single set of arithmetic rules that apply uniformly to all numbers, instead of having different numeric types that obey different rules. Internally, there is more than one numeric representation, but we don’t expose these different internal representations as distinct data types that the user must choose between when writing numeric code. Non-integral real numbers are represented internally as 64 bit floats. Integers are represented internally either as 64 bit floats or as bignums.
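
A minimal sketch of the idea (in Python; this is a hypothetical illustration, not the author’s implementation): expose a single kind of “number” by canonicalizing integral values, so that 1 and 1.0 are the same value and large integers stay exact:

```python
def canonical(x):
    # Integral values get one canonical representation (Python's int is
    # already a bignum); everything else stays a 64-bit float.
    if isinstance(x, float) and x.is_integer():
        return int(x)
    return x

assert canonical(1.0) == canonical(1) == 1
assert type(canonical(1.0)) is type(canonical(1))   # same representation, same number
assert canonical(2.0 ** 80) == 2 ** 80              # exactly representable integral float stays exact
```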

I have mutability in my language. There are mutable local variables, assignment statements, while statements, etc. So you can write imperative code. But I don’t have shared mutable state, because that creates a kind of complexity that I want to eliminate. Thus, I don’t have mutable global variables, nor do I have pointers to mutable objects as first class values (as found in Lisp and Smalltalk influenced languages). There is no predicate that tests identity of mutable objects. All data is represented by immutable values. If you assign a container value (like an array) to a local variable, then you can mutate that local variable, eg by assigning a new value to one of the array elements, and the array will be efficiently updated using copy-on-write. Sometimes this kind of language design is called “value semantics” or “copy semantics”.
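
A rough sketch of value semantics (again in Python, which actually has reference semantics for lists, so an explicit copy simulates what a copy-on-write implementation would do transparently on assignment):

```python
import copy

def assign(value):
    # In a value-semantics language every assignment behaves as if it copied;
    # a real implementation defers the copy until the first write (copy-on-write).
    return copy.deepcopy(value)

a = [1, 2, 3]
b = assign(a)            # plays the role of plain assignment in such a language
b[0] = 99                # mutating the local variable b...
assert a == [1, 2, 3]    # ...leaves a untouched: no shared mutable state
assert b == [99, 2, 3]
```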

One simplification this affords: we don’t need a distinction between mutable container object types and immutable container object types. This distinction will otherwise lead to complexity and weird arbitrary design choices. Eg, in Python, strings are immutable, tuples are immutable, but arrays are mutable. However strings are mutable in Scheme (except for strings derived from string literals, which are immutable).

I agree, Henry Baker is interesting. He says “object identity can be considered to be a rejection of the “relational algebra” view of the world in which two objects can only be distinguished through differing attributes.” By rejecting object identity, my language embraces the ‘relational algebra’ view of the world, and achieves a simpler semantics.

Kahan, ‘Much Ado About Nothing’s Sign Bit’

Augh, my head. XD I think I managed to extract some particles of meaning out of that though, thanks.

Kahan doesn’t speak for all numeric analysts. The Posit standard doesn’t have a signed zero, and fixes all the problems I described in my extended reply elsewhere in this thread.
