1. 12
    1. 18

      Most successful programming languages are created by practical implementation-oriented developers, and not by theoretically-minded programming language designers.

      Creating a programming language practical enough to see widespread adoption is a huge amount of work. There is a ton of coding involved. So the people who create these languages really like coding. They probably like coding a lot more than they enjoy writing documentation and creating a comprehensive programming language specification.

      It is possible to create a comprehensive programming language specification, and it has been done, but it is a ton of work, and to enjoy that work and get it done, you have to make it your main focus, and you probably prefer theory to coding.

      But these are almost side issues. It’s difficult to create a comprehensive programming language specification that guarantees that a program has the same results in all implementations.

      • You can design the language to have bullet-proof abstractions that don’t leak implementation details, which allows these implementation details to vary without breaking programs. Theoreticians know how to do this, but pragmatic “worse is better” coder types believe it is impossible (they will cite Spolsky’s Law of Leaky Abstractions) and they also believe it is undesirable, because the cost of bullet-proof abstractions is less expressive power and being unable to program at something close to the bare metal.
      • Or you can have leaky abstractions like Python’s is operator, and then hard code all of your implementation details in the specification. For example, in Python 2.7.17, 2**8 is 2**8 = True, 2**9 is 2**9 = False, 10**2 == 10**2 = True. Why I don’t know. The specification would have to specify all this.

      The “leaky abstraction” style of language specification (for something like Python) would be such a nightmare to read, write and maintain that it would never happen. If you were committed to having a comprehensive language spec, you would redesign the language to simplify the spec, and this would lead to a simpler, more abstract and more predictable language.

      I would actually prefer to use a simple, abstract and predictable language, with a comprehensive spec, where it is feasible to learn the entire language and be able to predict the behaviour of programs from my knowledge of the language.

      1. 6

        The only language I’m aware of that fully meets this criterion is actually Standard ML. But Standard ML implementations still differ from each other, not in what this language construct will do in what situation or on what hardware, but in terms of packages that are available or unavailable. And also in things that the language specification does not discuss, like how the build system should work. And also, now that I think of it, in terms of other concerns like, does it support separate compilation, and how good or expensive are the compiler optimizations.

        Still, there is such a thing as formal semantics, and for some reason most programming language authors are not very enthusiastic about using it, probably for the reasons you gave.

        1. 6

          As someone who uses and enjoys using SML, what hurts it the most is the lack of a standardized C FFI.

          Your SML programs are (mostly) portable across implementations… provided you do not need to use C libraries.

          1. 1

            How much work is it to set up to build with different implementations? My sense is that Moscow ML wants to act like a C compiler but SML/NJ wants you to write a build configuration and find your sources somehow. But I never really got the latter to work.

            1. 3

              I’ve gotten SML/NJ CM files to work. Here’s a small project for reference:

              sr.ht/~thon/thon

              But I haven’t tried getting it to compile across other implementations.

              1. 2

                Thank you!

            2. 2

              It is a lot of work. I don’t think there is even a portable way to query (within SML code) whether you are using this or that compiler. Effectively, you have to use makefiles or something like that.

      2. 5

        I would actually prefer to use a simple, abstract and predictable language, with a comprehensive spec, where it is feasible to learn the entire language and be able to predict the behaviour of programs from my knowledge of the language.

        Scheme seems like the obvious option?

        1. 4

          And then you find that the assumption “I am able to predict the behaviour of programs from my knowledge of the language [specification]” isn’t actually true in practice for such a language either, because that approach doesn’t scale. For any serious application you are so many layers removed from the spec that you need to reason about what your functions are supposed to do, not to mention what your database is expected to contain or what an API is supposed to return.

          The number of bugs solved by “applying the spec” (tracing it to language semantics) is vanishingly small.

        2. 2

          I like Scheme, but it doesn’t have a “comprehensive spec” in the sense of this article. There’s a ton of implementation defined behaviour, including whether two integers are equal in the sense of eq?, which parallels the case of comparing integers using is in Python. For me, the many different Scheme equality operators are a language design anti-pattern, with their implementation defined behaviour (whatever the code happens to do in this release of the compiler or in this run of the program, those are the semantics). The hobby language I’m designing is a mostly functional language with immutable values and a single equality operator. Two values are equal if and only if they have the same printed representation. That is a far, far simpler definition of equality than is found in any Lisp or object-oriented language.

          1. 2

            Two values are equal if and only if they have the same printed representation.

            Does every value have a printed representation?

          2. 2

            How do you deal with integer and floating point comparisons? Is integer 0 the same as floating 0? (in IEEE 754 they’re bit-for-bit the same) What about integer 1 and floating 1? (in IEEE 754, bit-for-bit different)

            1. 2

              There is a generic numeric type which contains all of the representable integers and real numbers. The numerals 1 and 1.0 denote the same number. There are generic arithmetic operations which take one or more numeric arguments, and return the closest representable number to the correct result. The IEEE 754 standard has two special values, NaN and -0, which have weird equality semantics, and which are not real numbers. These are not included in the number system. Instead, 0/0 raises an exception and -0 is 0.
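
              A minimal sketch of the number model described above, written in Python rather than in the hobby language itself. The names `num`, `show`, `equal` and `divide` are hypothetical, and using a Python float as the single numeric type is an assumption made only for illustration. Whether infinities belong to the number system is not stated above, so the sketch leaves them alone.

```python
import math


def num(x):
    """Normalize a Python int or float into the sketch's single numeric type."""
    value = float(x)
    if math.isnan(value):
        # NaN is excluded from the number system.
        raise ArithmeticError("NaN is not part of this number system")
    if value == 0.0:
        return 0.0  # collapses -0.0 into 0.0
    return value


def show(x):
    """Return the one printed representation of a number."""
    value = num(x)
    if value.is_integer():
        return str(int(value))  # 1 and 1.0 both print as "1"
    return repr(value)


def equal(a, b):
    """Two values are equal if and only if they print the same."""
    return show(a) == show(b)


def divide(a, b):
    """Generic division; Python itself raises ZeroDivisionError for 0/0."""
    return num(num(a) / num(b))


print(equal(1, 1.0), show(-0.0), equal(1, 1.5))
```

              Note that Python raises `ZeroDivisionError` for any float division by zero, not only for 0/0, so this sketch is stricter than the description strictly requires.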

      3. 5

        For example, in Python 2.7.17, 2**8 is 2**8 = True, 2**9 is 2**9 = False, 10**2 == 10**2 = True. Why I don’t know. The specification would have to specify all this.

        This exact case is because the CPython implementation – and other languages do this too! – caches small integer values as a performance optimization. Every integer from -5 to 256 (inclusive) is cached, which is why it works for 2**8 – no matter how many times you write expressions which resolve to 256, the resulting int values are the same in-memory object because it’s pulled from the cache.
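
        A small sketch for observing the cache. It parses the integers at run time so that the bytecode compiler's constant folding cannot interfere. The helper names are hypothetical, and the identity results are CPython implementation behaviour, not something the language guarantees.

```python
import sys

# Assumption: the identity results discussed here are CPython behaviour only.
IS_CPYTHON = sys.implementation.name == "cpython"


def same_object(text):
    """Build the same integer twice at run time and compare identities."""
    a = int(text)
    b = int(text)
    # `a is b` and `id(a) == id(b)` agree while both objects are alive.
    assert (a is b) == (id(a) == id(b))
    return a is b


def same_value(text):
    """Value equality, which holds on every implementation."""
    return int(text) == int(text)


for text in ("256", "257"):
    print(text, "equal:", same_value(text), "identical:", same_object(text))
```

        On CPython, 256 is expected to come from the cache (identical) while 257 is built as two separate objects (equal but not identical). Other implementations are free to behave differently.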

        There are also more complex cases where expressions can “surprisingly” return True with the is operator, which usually boil down to the CPython bytecode compiler doing constant folding and realizing it doesn’t need to create two separate in-memory objects for multiple expressions that resolve to the same constant value.

        But I don’t personally think this is a great example of specification failure. The is operator is behaving exactly as it’s defined to by the language specification: the language specification says that x is y is True if and only if id(x) == id(y). The specification of id() also says that id() is required to return “an integer which is guaranteed to be unique and constant for this object during its lifetime”. Returning the memory address of the object, which is what CPython chooses to do to implement id(), is thus permissible, and the documentation calls out that this is an implementation detail of CPython, and not a requirement for all Python implementations.

        The language specification also does not require or forbid implementation-specific optimizations like the small integer cache, or the bytecode compiler’s constant folding (in fact, the language specification does not require a bytecode compiler to exist at all). Unless your expectation is that a “comprehensive” spec would have to lay out every conceivable case in which two Python names might be bound to the same in-memory value, which for a language like Python would require either an infinitely-long spec (since it’s possible that any type might have some kind of caching or other optimization), or a spec that says any type might do this and avoids naming specific examples. But even then it wouldn’t belong in the language specification, since the use of memory address as the return value of id() is an implementation detail.

      4. 3

        I would actually prefer to use a simple, abstract and predictable language, with a comprehensive spec, where it is feasible to learn the entire language and be able to predict the behaviour of programs from my knowledge of the language.

        Be careful what you wish for https://lobste.rs/s/ylzyde/ecmascript_spec_is_ready_use_interpreter :D

    2. 2

      I would like to see more executable specifications of languages in this form:

      • Strongly typed with algebraic data types
      • No regard for memory management – this implies GC, and leaves out plain C, C++ and Rust.
        • Also: strings must be values like in Python, not buffers as in C / C++ / Rust. (I think OCaml used to break this rule, but no longer does.)
      • Error messages are the simplest possible or “factored out” (i.e. throw an exception and then recover error state at the top level).
      • The simplest tree interpreter possible
        • This is equivalent to big step operational semantics, as far as I understand
        • There’s also the option of small step operational semantics, but to be honest I don’t see any real benefit. It seems harder to execute.
        • Rules out bytecode interpreters and compilers
      • No regard for performance otherwise
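
      A rough sketch of the shape described in the list above, for a made-up toy expression language. Python dataclasses stand in for algebraic data types here, which is only an approximation since Python does not check them statically. The node types `Num`, `Var`, `Add`, `Let` and the function `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Let:
    name: str
    bound: "Expr"
    body: "Expr"


Expr = Union[Num, Var, Add, Let]


def evaluate(expr, env):
    """Tree-walking evaluator: one case per node type, in big-step style."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        # Simplest possible error handling: an unbound name raises KeyError.
        return env[expr.name]
    if isinstance(expr, Add):
        return evaluate(expr.left, env) + evaluate(expr.right, env)
    if isinstance(expr, Let):
        bound_value = evaluate(expr.bound, env)
        return evaluate(expr.body, {**env, expr.name: bound_value})
    raise TypeError(f"unknown node: {expr!r}")


program = Let("x", Num(2), Add(Var("x"), Num(3)))
print(evaluate(program, {}))
```

      Each branch of `evaluate` reads like one evaluation rule, with memory management left to the host language's GC.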

      From my experience with https://www.oilshell.org/, this is a pretty compact way to represent a language.

      Once you leave out those concerns, the implementation becomes very short.

      For example, a Rust, Go, or Zig specification in this form would be pretty interesting IMO.

      1. 2

        note for myself: “no regard for performance otherwise” includes “no threads”

    3. 2

      Other people’s mileage might vary, but I do not need “comprehensively defined” behaviors for nonsensical operations. For example, there is no useful way to define the behavior of using an invalid array index.

      What I need is an easy way to detect that I am doing something nonsensical, so that I can fix it.

      1. 3

        For example, there is no useful way to define the behavior of using an invalid array index.

        In Python, the behavior is defined: it throws an IndexError.

        In Java, the behavior is defined: it throws an ArrayIndexOutOfBoundsException.

        In C#, the behavior is defined: it throws an IndexOutOfRangeException.

        etc.
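
        A minimal sketch of the Python case. The helper name `describe_lookup` is hypothetical. One detail worth keeping in mind is that Python lists accept negative indices, so only indices outside the range from `-len(items)` to `len(items) - 1` raise.

```python
def describe_lookup(items, index):
    """Index into a list and report either the value or the defined error."""
    try:
        return f"value {items[index]!r}"
    except IndexError as error:
        return f"IndexError: {error}"


items = [10, 20, 30]
print(describe_lookup(items, 1))   # a valid index
print(describe_lookup(items, -1))  # negative indices count from the end
print(describe_lookup(items, 30))  # out of range, so IndexError is raised
```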

        1. 1

          The key word is “useful”. Throwing an exception instead of silently manipulating memory wrong might make your program less dangerous, but it does not make it less wrong.

          1. 4

            “Program throws an exception” is not a synonym for “program is incorrect”.

            Going back to Python as an example, several protocols in the language itself use exceptions as a signaling mechanism. Most notably iteration – Python’s own built-in iteration constructs, like the for loop, determine when to stop iterating by catching a StopIteration exception raised by the iterator.
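
            As a rough illustration, here is approximately what a for loop does, spelled out by hand. The function name `manual_for` is hypothetical, and this is a simplification of the real loop rather than its exact implementation.

```python
def manual_for(iterable, body):
    """Approximate `for item in iterable: body(item)` using the iterator protocol."""
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # The exception is the normal "no more items" signal, not an error.
            break
        body(item)


collected = []
manual_for([1, 2, 3], collected.append)
print(collected)
```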

            So what exactly is meant by throwing an exception is a language-by-language thing, and no general language-agnostic statements can be made about exceptions.

            Which is why we always get long tangents in threads where, say, a person who really only knows C++ confidently declares that exceptions are always bad and always wrong and must always be avoided at literally any cost… and then a dozen people have to chime in and explain that this view is not universal and plenty of languages have different approaches to exceptions.

            1. 4

              “Program throws an exception” is not a synonym for “program is incorrect”.

              This is true, but ‘program throws an exception’ and ‘program is incorrect’ intersect. When a program throws an exception, it is correct only if the author wrote an exception handler that correctly handles that condition.

              There’s been some really interesting work over the last 20 years on exposing index sets into the type system such that indexes are always defined via set-theoretic operations on an index set. This can guarantee, by construction, that an index is in bounds. A lot of it was originally motivated by the cost of Java bounds checks, but it’s useful for both performance and correctness. I’d love to see that kind of thing make it into mainstream languages.

            2. 1

              Mmm… I think it is fair to say that, if you write your program in such a way that you expect to evaluate arr[idx], where idx is not a valid index for arr… then you are writing unclear code.

              1. 2

                Again, your assertion is not portable across languages.

                See section 6.10.2 of the Python Language Reference for an easy counterexample: due to the multiple ways user-provided classes may overload operators, there’s a precedence of mechanisms the in and not in operators in Python will use. If the class defines __contains__(), then that is used directly, falling back to __iter__(), and then finally falling back to:

                if a class defines __getitem__(), x in y is True if and only if there is a non-negative integer index i such that x is y[i] or x == y[i], and no lower integer index raises the IndexError exception. (If any other exception is raised, it is as if in raised that exception).

                Please, before you make another assertion about what exceptions or particular instances of them mean, reconsider whether what you’re saying is truly a general statement, or one tied to specific languages/paradigms you’re used to.
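
                A small sketch of the `__getitem__` fallback quoted above. The class `Squares` is hypothetical. It defines neither `__contains__()` nor `__iter__()`, so `in` probes successive indices and treats the first IndexError as the end of the sequence.

```python
class Squares:
    """A sequence-like object that only defines __getitem__."""

    def __init__(self, count):
        self.count = count
        self.probed = []  # records which indices were requested

    def __getitem__(self, index):
        self.probed.append(index)
        if not 0 <= index < self.count:
            # Raised deliberately: this is how the fallback learns where to stop.
            raise IndexError(index)
        return index * index


hit = Squares(5)
print(9 in hit, hit.probed)    # found at index 3, so probing stops there

miss = Squares(5)
print(7 in miss, miss.probed)  # not found; the probe of index 5 raises IndexError
```

                In the second case the out-of-range access is part of normal operation: without it, the membership test would have no way to terminate.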

                1. 1

                  Well, perhaps my experience is too limited, but I have never seen any practical situation where you have an array of, say, 25 elements, and you actually want to fetch the 30th. The natural reaction, if you ask any normal person, is “What the heck is the 30th element of a collection of 25 elements?”

                  Even if a particular language or language implementation defines the behavior of fetching said 30th element, that does not detract from the fact that attempting this operation is likely a mistake.

                  Of course, there are no hard absolutes. Maybe you want your array indices to “wrap around”, so that the 25th element is the 0th, the 26th element is the 1st, and so on. Or maybe you are purposefully causing an IndexError to be raised, because you want to redirect the control flow to an exception handler you have written elsewhere. Who knows.

                  But I still claim, with a reasonable degree of confidence, that, when most programmers index an array, they actually intend to use valid indices.

                  1. 3

                    And if it is a case of a logic error by the programmer, there’s a wide world of difference between languages which say “this is well-defined, and the defined behavior is we throw an exception that crashes your program if not caught”, and languages which say “this is undefined, the compiler is free to launch the missiles and reformat your hard drive”. The former is a much nicer way to work.

                    1. 2

                      Exceptions are obviously less dangerous than outright undefined behavior, but I am not really sure that they are “nice”. They are a lesser evil, but they still have a major downside. Namely, resource management (not just memory, which you can reasonably argue is best managed by a GC) is so much harder in the presence of exceptions and other forms of nonlocal control flow.
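
                      To make the resource-management concern concrete, here is a small Python sketch. The `Resource` class and both lookup functions are hypothetical. The first function skips its cleanup when the indexing raises; the second relies on a `with` block, which is the extra discipline that nonlocal control flow demands.

```python
class Resource:
    """A stand-in for a file handle, lock, or connection."""

    def __init__(self, log):
        self.log = log
        log.append("open")

    def close(self):
        self.log.append("close")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, traceback):
        self.close()
        return False  # do not swallow the exception


def leaky_lookup(items, index, log):
    resource = Resource(log)
    value = items[index]  # if this raises, close() below never runs
    resource.close()
    return value


def safe_lookup(items, index, log):
    with Resource(log):
        return items[index]  # close() runs even when this raises


def run(lookup):
    """Call a lookup with an invalid index and return the event log."""
    log = []
    try:
        lookup([1, 2, 3], 30, log)
    except IndexError:
        log.append("IndexError")
    return log


print(run(leaky_lookup))
print(run(safe_lookup))
```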

                      IMO, what actually would be “nice” is a semantics so clear that you have no trouble checking that your program has no undefined behavior. (Possibly after adopting certain safe programming practices. Also possibly using a combination of automatic and manual checks.) C and C++ fail this criterion, not because they have UB, but rather because UB in these languages is designed for compiler writers (who want every single opportunity to do optimizations), rather than for language users.