1. 28

  2. 26

    I’m wholly unconvinced a single number type is a good thing. It’s true, you don’t need to think as hard if you unify your integer and floating-point types. But that just provides an opening to introduce subtle new bugs in cases where the semantic differences between the two matter.

    1. 15

      I can’t even slightly imagine a world where unifying naturals or integers with reals is anything but a giant explosion waiting to happen. I do think there’s something interesting in pretending that numbers are ideal (thus infinite) by default and having Int32, Int64, etc only as special cases.

      1. 2

        Ermm, don’t Lisps do that?

        1. 8

          Scheme has something like that, but it’s a bit more finnicky than that. Instead of taking the full algebraic perspective Scheme distinguishes abstractly between exact and inexact values and computations. It’s dynamically typed, however, and so a little difficult to understand exactly whether or not it truly “unifies integers and reals” in any sense.

          Javascript is well-known for making that unification and it lacks any notion of exact/inexact. So, instead, you just face a bunch of issues.

          Real numbers probably just straight shouldn’t support equality. It’s a theoretically undecidable operation.

          1. 4

            Relatedly, does anyone know much about what it would take to replace “real numbers” with “real intervals”? It seems like something that has certainly been tried in the past. It would neatly solve equality issues by replacing them with more quantitative measures like “what is the (relative) measure of the overlap of these two intervals?”.

            Do real intervals form a field? (They ought to.) What does linear algebra look like here?

            1. 3

              Mathematically, I agree that real intervals should form a field (I haven’t looked into the corner cases,though). In the context of ‘Real numbers probably just straight shouldn’t support equality,’ I’m unsure how you’re going to define your identity values for + and *, since they depend on equality of real intervals which presumably depends on on equality of reals. There’s a similar argument for inverses.

              1. 2

                Why do they depend upon equality? You’d need some form of equality to prove their properties but their operation shouldn’t require it.

              2. 1

                Spire has an interval type but it relies on an underlying implementation (which might be “computable real”). I don’t see what you’re suggesting or how it would work - if you want to compute precisely with real intervals then you just do twice as much work. If you want to compute with single values and have strict control over the error margins, wouldn’t you just work with computable reals?

                Intervals form a semiring but not a field.

                1. 1

                  I’m more spitballing than thinking hard about it—evidenced by the fact that I didn’t even bother checking whether they were a field myself.

                  I just think it’s really interesting that real numbers are non-computable. Handling this sanely has roots in Brouwer and likely you could find even earlier ones. Andrej Bauer has some interesting papers on this as well, if I recall.

                2. 1

                  I did some experiments with plotting formulas with interval arithmetic last year. (The formula parser is still broken because all the operators associate right.) Playing around with it may give you some ideas for what works well and what works badly in naïve interval arithmetic. There’s been some work that I don’t fully understand by Jorge Eliécer Flórez Díaz extending interval arithmetic to do super-efficient ray-tracing which I think may solve some of the problems.

                3. 1

                  Real numbers probably just straight shouldn’t support equality. It’s a theoretically undecidable operation.

                  Theoretically undecidable under what model? Tarski demonstrated real closed fields are complete and decidable.

                  1. 2

                    I think the difference is that while decidability pertains to equality of arbitrary formulae over reals, I’m talking about arbitrary equality of values. If you receive two real numbers knowing nothing more about them and wish to build an algorithm for testing equality then you’ll hit an infinite regress. You can easily show two reals are equal up to some pre-defined precision, though.

                    1. 1

                      Hm. I’m going to have to go back to my modern analysis texts, but I’m pretty sure Tarski-Seidenberg provides for equality: by constructing the reals through Dedekind cuts over the rationals, you can guarantee a unique representation for each real number.

                      This is all mathematical, though - while I’m 99% sure Tarski has an algorithm to prove two reals are equal, I think the implementation is unreasonably slow. I don’t know what’s new on this front. But I know which textbook I’m cracking open over the weekend!

                      1. 1

                        I’d be very interested to see that. I believe it’s possible to prove that two reals constructed from some algebra of reals are equal (though I think the algorithm is superexponential), but I’m fairly sure there’s no constructive function (n m : Real) -> n = m. Uniqueness of construction is a little different, though. You probably would get away with proving that by stating that if there were two representations you’d derive contradiction.

            2. 7

              Agreed. In particular, many languages would benefit from the addition of first-class rationals, complex numbers, etc.

              1. 5

                Obligatory Guy Steele link: https://www.youtube.com/watch?v=_ahvzDzKdB0

            3. 27

              I’m sure there are people out there who would benefit from reading this article, since some of the things it says are true, and they are very important things about programming. But in general I find that I have little patience for its level of bombast and self-assurance, particularly the parts that consist of poorly-thought-out ideas and careless repetition of vaguely-understood rumors, and I fear that in some ways it will steer programmers who are even more novice than the author in bad directions. It’s no secret that I’m no fan of Bob Martin, and this dude seems to be imitating a lot of Martin’s weaknesses.

              The big things Seemann gets wrong

              First, the dude is not talking about language features; he’s talking about language capabilities. And it’s a deeply important insight that less expressive languages can be preferable. But he never gets to why; he seems to think that this is merely because they exclude “invalid programs”, a term he seems to have no idea what he means by; certainly he never even flirts with a definition. Were he to read, for example, Dijkstra’s short famous letter which he cites, he would immediately discover that the exclusion of “invalid programs” is no concern of Dijkstra’s here; rather, Dijkstra is concerned with our ability to “retain our intellectual grasp on the processes generated by” our programs, that is, to understand what our programs do.

              This fundamental tradeoff between analyzability and expressiveness, of course, is not a discovery of Dijkstra’s; Chomsky published his discovery of what is now known as the Chomsky hierarchy in 1956, explicitly linking it with automata theory, and incidentally destroying the credibility of behaviorism; in which Turing had already shown the halting problem intractable for Turing machines in 1936, while it’s clearly tractable for simpler classes of automata. (I include these papers not because you need to read them — they’re quite a bit denser than Dijkstra’s letter above — but just to show that these ideas date back quite a while.) Much of the success of the WWW is owed to TimBL’s informal Principle of Least Power (1998), which says that you should do as much as possible with languages that are as unexpressive as possible, specifically so that they can be analyzed.

              But Seemann shows no sign of having heard of any of this stuff, and consequently he’s sort of groping towards it but getting the main thrust wrong, while broadcasting his profoundly limited understanding as a great revelation. It’s not really that hard to find examples of the λ-calculus online, which is a sort of reductio ad absurdum of his position: you can easily have a Turing-complete programming language with nothing more than functions, and you can even implement it efficiently, but it’s still hard to program in, and it’s still extremely bug-prone. Adding some type-checking and some syntactic sugar is what brings us to Haskell.

              Second, in addition to his historical ignorance, Seemann seems to be almost totally unaware of the things going on in the world of programming languages today. Sure, he mentions assembly, F#, Haskell, C#, Java, C, C++, JS, and T-SQL, and even, with some uncertainty, OCaml; but what about Golang, whose use of error codes is apparently completely outside his experience because it totally demolishes one of his minor supporting points? What about Erlang, which has exceptions that aren’t transfers of control? What about Scala? What about Clojure, whose pervasive homoiconicity nevertheless did not remove its need to have a powerful reflection API, demolishing another of his minor supporting points? What about Rust, whose lifetimes provide memory-safety without a garbage collector? What about the continued pervasive use of C++, apparently because there are places where there is apparently no alternative — AAA games and the Microsoft Windows kernel — which suggests that his thesis of minimalism having drawbacks is at least somewhat uncertain? And these are not obscure niche languages like T-SQL; they’re arguably among the most fashionable and exciting languages out there at the moment!

              The small things Seemann gets wrong

              The point at which this article really began to get my hackles up was when I saw the “All possible instructions” (maybe this is supposed to mean “sequences of instructions”) vs. “Valid programs” Venn diagram, which makes some pretense at proportionality. Of course, since Seemann never bothers to define, even informally, what he means by “valid”, it’s hard to say for sure, but for any reasonable definition, the set of valid programs is insanely tiny compared to the set of all possible programs. Like, it’s not 10%, or 1%, or 0.1%; it might be 10⁻¹⁰⁰⁰, if you’re on a microcontroller with very little program space.

              A central concept in Procedural [sic], Imperative [sic], and Object-Oriented [sic] languages is that you can change the value of a variable while executing your program. That’s the reason it’s called a variable.

              No, it’s called a variable because that’s what they’re called in mathematics. If f(x) = x² + 3, then x is a variable in f, but that doesn’t imply that its value can change while evaluating f. Instead, it means that it can vary between evaluations of f. We got this terminology via FORTRAN, the FORmula TRANslator; pre-FORTRAN programming languages typically did not call them “variables”, typically referring to memory locations as “registers” or, as in FLOW-MATIC, “data items”..

              Early computer programmers quickly discovered that writing in machine code was extremely error-prone, and the code was also as unreadable as it could be. One way to address this problem was to introduce assembly language, but it didn’t solve the underlying problems: most assembly languages were just ‘human-readable’ one-to-one mappings over machine code, so you still have unlimited ways to express incorrect programs.

              Nobody who has programmed in machine code would say this. If “the underlying problems” are error-proneness and unreadability, assembly language is dramatically better than machine code: it calculates jump offsets for you, allocates space to your “variables” for you, and lets you use human-readable names for those variables; and consequently it is enormously more readable. Better assemblers can do arithmetic calculations at compile time, conditionally exclude sections of code, and do code generation with user-written macros. Also, there isn’t a one-to-one mapping between machine-code programs and assembly programs: a single assembly program can map to many different possible machine-code programs, depending on where you tell the assembler to put things, and a single machine-code program can be produced by an incomprehensibly large number of different possible assembly programs. And all of this has been true since the 1950s, even though the machine code of the machines of the time was considerably easier to read.

              In C, you can express almost all valid programs. …You’re okay with that, because the options that were taken away from you were bad for you.

              This first statement is so vague that it cannot be said to have a truth value, since not only do we not know what a “valid program” is, we also don’t have a metric defined over the space of programs. If by “program” we mean “possible blob of machine code”, then it’s clearly false, since the vast, vast majority of possible blobs of machine code, even valid ones for some reasonable definition of “valid”, are not a possible output from any given C compiler. Again, we’re not talking about being able to produce 10% of all possible valid programs; we’re talking about more like 10⁻¹⁰⁰⁰, or worse if you’re programming something with more memory than your dishwasher.

              If we cast about desperately for some sense in which this statement could be true, we are left with the possibility that we are talking not about the number of programs but about the number and importance of things the program can do. But in fact this isn’t true either. In pure C, you can’t make a system call; you can’t bitbang; you can’t write constant-time code; you can’t handle an interrupt; you can’t set up the TLB; you can’t write to I/O ports unless they’re memory-mapped; you can’t set up a stack and therefore you can’t switch threads; you arguably can’t even do a nonlocal exit, although setjmp and longjmp are in the standard library if not the language. In fact, very nearly the only useful programs you can write in C, without invoking library functions written in assembly language, are /bin/true and /bin/false.

              And yet C is dramatically more expressive and also more analyzable than assembly language. Also, it’s usually more portable. Why is that?

              I would say it’s not because you can’t specify which registers your variables and temporaries go into, or where they’re stored in the stack frame, or which instruction to use to calculate x + 4 * y. You could add all those abilities to C, and nothing would change. (GCC does in fact add the ability to do things like write interrupt handlers or invoke SSE instructions to C.) It’s because you don’t have to; the compiler lets you program at a higher level of abstraction, marshaling the machine resources to achieve the goals you’ve set. It’s not about taking away capabilities. It’s about automating drudgework. Seemann’s analysis is just wrong, and in fact not taking Seemann’s approach is one of the reasons C was popular in the 1970s and 1980s, compared with languages like Pascal, whose standard version wouldn’t let you write a function that could operate on an array of any size.

              This sparked a decade-long controversy, but these days we’ve come to understand that Dijkstra was right [and that goto should not be included in programming languages].

              And yet the just-designed Golang has goto; the just-written Linux kernel has a hundred thousand gotos in it; we just hit a serious security vulnerability in the just-written WebKit that famously involved goto (although it was caused, not by goto, but by what is arguably a bug in C’s syntax that has been corrected in different ways by languages like Perl, Ruby, and Python, and was never present in e.g. the line leading to OCaml); and although Scheme’s more powerful version of goto, call-with-current-continuation, has had its removal mooted for R7Rs, but R7RS-small still has it as standard in §6.10.

              As a side note, I just this weekend translated a goto-filled tokenizer from C to Java, which supposedly has no gotos, through the simple expedient of translating

              goto foo;


              foo: for (;;) {
                continue foo;

              and translating

              goto foo;


              foo: for (;;) {
                break foo;

              with some renaming and elision of redundant elements (dead break statements, multiple redundant labels in the same place, etc.) to preserve comprehensibility.

              (I do think each use of goto needs to be defended, and here I defend my C by noting that one of the uses Structured Programming with go to statements calls out as acceptable is implementing a finite state machine like this. Still, the exercise made me feel that perhaps the nested-loops version is actually more comprehensible, and even in this case goto was a detriment to readability, however much easier it may have been to write in the first place.)

              So I think Seemann’s “we” here includes only a small, provincial group of novice programmers with little contact with the outside world.

              Since GOTO [sic] has turned out to be an entirely redundant language feature, a language without it doesn’t limit your ability to express valid programs, but it does limit your ability to express invalid programs.

              In fact, as Böhm and Jacopini famously proved in 1966, at the beginning of the whole controversy, the set of computations you can express with goto, and with just the Kleene operators of concatenation, selection, and repetition, are precisely equivalent; and of course goto does not give you any new ways to crash. There are some cases where this results in duplicating parts of your program, but keep in mind that in the worst case you can rewrite any flow-control graph with a single auxiliary state variable:

              int state = 0;
              for (;;) {
                switch(state) {
                case 0: /* do stuff; */ if (p) state = 1; else state = 2; break;
                case 1: /* …

              But this, of course, completely eliminates the structure and clarity that Dijkstra and Knuth and so on were seeking to achieve.

              If by “programs” Seemann means not “computable functions” but rather “flow-control graphs”, this is still false in the presence of multi-level break and continue, as in Java; I showed this informally above, but it was proved formally by Kosaraju in 1973. Also, it is not clear in what sense a flow-control graph could be “invalid”.

              If by “programs” Seemann again means “blobs of object code” then, again, his statement seems to be both false (in the presence of multi-level breaks) and vacuous, unless we define “invalid” circularly to mean “containing irreducible control-flow subgraphs”.

              everyone seems to agree that errors should be handled via some sort of exception mechanisms; at least, everyone can agree that error codes are not the way to go (and I agree).

              Except, apparently, in Golang. I’m not saying Golang made the right decision here (although the fatal compiler warning when you fail to use an error code that was returned to you helps quite a bit), but it does kind of, you know, exist. Oh also C++ at Google doesn’t allow exceptions. And John Carmack isn’t convinced. Nor was Joel Spolsky, back when he was still programming. And while Yosef Kreinin is guardedly more positive on them in C++, he still recommends using error codes in most cases.

              As Robert C. Martin also pointed out, in older languages, including C and C++, you can manipulate pointers, but as soon as you introduce polymorphism, you no longer need ‘raw’ pointers. Java doesn’t have pointers; JavaScript doesn’t do pointers;

              JS has almost nothing but pointers. Every variable is a pointer, unless it’s a number, which in some implementations is also implemented transparently as a pointer. The thing that distinguishes C and C++ (and Golang) from object-graph languages like JS, Java (except for primitive types), Smalltalk-80, and LISP (1959! “older languages”, my fetid buttocks!) is that they have both pointers and things that aren’t pointers. Like, you can nest a struct inside another struct, and you don’t have to allocate the inner struct separately on the heap. The pointer-related features removed going from C to Java are pointer arithmetic, pointers-to-pointers, explicit referencing and dereferencing, and function pointers.

              This is a good example of how Seemann is talking about removing expressiveness rather than features. To compensate for the absence of pointer arithmetic in Java, you need first-class arrays, strings, and arguably the Iterator interface and Arrays.asList, and you still lose the ability to write offsetof and to treat a substring of an array as an array without copying its contents. (Golang does a better job at this by replacing pointer arithmetic with “slices”.) To compensate for the absence of function pointers, you need dynamic dispatch. To compensate for the absence of explicit referencing and dereferencing, initialization now must be explicit, and now has nondeterministic runtime and may throw an exception. To boot, in Java, any access to a part of any non-primitive object can potentially throw a NullPointerException. These are good tradeoffs for many applications, but the consequence is that Java is a much less minimalistic language than C, with many more features which interact in complicated ways.

              you don’t need access to pointers in order to be able to pass values by reference. Pointers can be abstracted away.

              You do need pointer semantics for that — also known as reference semantics — and the only way to “abstract away” pointer semantics is to eliminate mutability. Surprising aliasing bugs are so widespread that they commonly appear in programming FAQs for object-graph languages, along with the surprising limitations. The only way you could think this amounts to “abstracting away” pointers is if you have gone a long time without trying to introduce someone to programming.

              Most strongly typed languages give you an opportunity to choose between various different number types: bytes, 16-bit integers, 32-bit integers, 32-bit unsigned integers, single precision floating point numbers, etc. That made sense in the 1950s

              In the 1950s almost everybody was programming in machine code or assembly, referring directly to registers, which generally were not 16-bit or 32-bit. The high-level scientific languages from the 1950s (early versions of FORTRAN) did not support multiple precisions or unsigned numbers, although they did by necessity support both integer (“fixed point”) and floating point, generally without explicit variable declarations. The high-level business languages from the 1950s (basically FLOW-MATIC, though A-0 and B-0 existed, a little) did of course support different field widths, in part because that was also their output format. None of these were typically 16-bit or 32-bit. The weirdo languages from the 1950s (LISP, ALGOL-58) didn’t have separate numeric types either.

              With the modern computer resources we have today, a programming language that would restrict you to a single, sane number type would be superior to the mess we have today.

              What would that mythical single, sane number type be? Perhaps we should use 64-bit machine integers, giving up the possibility of representing fractions entirely? Or perhaps we should use 64-bit machine floating-point, so that it is sometimes the case that x == x + 1, you can’t calculate ((a+b)!/a!/b!) for reasonable values of a and b, addition is no longer associative, and there’s no way to statically verify that you’re not trying to index into an array with a fraction? (We’ve tried that in JS and Lua, and Crockford likes it but wishes it was decimal, which is how Seemann knows it’s the wrong answer.) Maybe we should follow Perl 6’s apparent lead and simply use arbitrary-precision fractions for everything, so that running n iterations of z ← z² + c takes O(2) time and memory, and any arithmetic operation can potentially throw an OutOfMemoryException or take several minutes to complete? There are leaky abstractions, but every number type on a computer is an abstraction with Niagara Falls running through it.

              And I’m not sold on this idea that performance is no longer important. Sure, there are a lot of things that performance isn’t important for. But in fact there are a lot of things where we just scale up the problem to the performance of our available machines: image rendering, machine learning, signal processing, finite element analysis, HFT, and apparently debug log parsing. If all you want to do with computers is what you could already do with computers 20 years ago, sure, use Python, use 64-bit ints for everything integer (but be careful with those doubles! Caveat calculator!), go wild. If you want to do something that is only possible with this year’s GPU or FPGA, or that really needs all those gigabytes of RAM, you are going to have to care about whether your numbers are 8 bits or 64 bits. You might even have to care about whether they’re 15 bits or 16 bits if you’re on an FPGA.

              In a complicated program with a big call stack, it’s impossible to reason about the code, because everything could mutate, and once you have tens or hundreds of variables in play, you can no longer keep track of them.

              While I’m sympathetic overall to the argument against mutability, this argument goes much too far. Nobody has ever been able to reason about a program that has more than 20 mutable memory locations? Goodness, what did he write this post with, a pencil? Am I reading it on a microfilm Memex? Does he think web browsers arise by natural selection, unaided by human cognition?

              There are some important drawbacks to mutability that Seemann is totally missing here, because his concept of reasoning about programs is, I guess, really vague. He calls out the issue of how a function you call could mutate a parameter in a surprising way, but the problems with mutation are much bigger than that:

              • The very simplest problem is that when you’re reading a piece of code and you want to know where the value of a variable came from, you have to look at all of the code between where you are and where the variable initially came into being, rather than just looking at its declaration. Even without things like subroutine calls, loops, or if statements, in between, that could still be a chunk of code.
              • Aliasing means that mutating a thing via reference a may also mutate, behind your back, a thing accessible via reference b, if they happen to be the same thing. The smallest example of this is probably that the no-temporary-variable in-place swap trick *x ^= *y; *y ^= *x; *x ^= *y; is buggy if x and y happen to refer to the same memory location: it will zero it, while a correct swap implementation would do nothing.
              • That means that the reason you need the ability to test reference equality (Seemann’s next complaint) is that you have mutability.
              • Re-entrancy problems generally arise from getting re-entered when you’re partway done with a mutation.
              • Similarly, when you lock something, it’s usually because it’s mutable, and you want to prevent it from being observed partway through a mutation when its invariants are broken. Other threads can mutate things, and they can do it in strange ways.
              • Similarly, exception-safety problems are invariably a result of partially-completed mutation. (Unless there’s an exception I don’t know about?) The original C++ STL was chock-full of exception-safety problems, as is Google’s code base.
              • Mutation, although it can enable optimizations, also causes performance problems.
              • Pervasive mutability makes it really hard to extend the language in new ways, and for example it’s really what doomed STM.NET.
              • In the absence of mutability, parametrically polymorphic container types have covariant typing: if RGBAColor is a subtype of RGBColor, it’s perfectly safe to treat an immutable List<RGBAColor> as an immutable List<RGBColor>, using C++/Java syntax. With mutability, parametrically polymorphic container types have invariant typing, which is a royal pain in the ass, because treating them as covariant would permit invalid mutations, and treating them as contravariant would permit invalid retrievals. (Note that Java’s typing rules for arrays do not take this into account and are therefore unsound; storing a statically-type-checked object into a valid index in an array in Java can thus raise an exception.)

              many people might object that Haskell is too difficult and unintuitive, but to me, that kind of argumentation is reminiscent of the resistance to removing GOTO [sic]

              To me, this kind of argumentation is reminiscent of things written by people who don’t know Haskell. (Not that I do.)

              Most of Haskell’s difficulty, as far as I can tell, doesn’t have to do with immutability. It has to do with its type system, maybe its syntax, and maybe the level of abstraction necessary to write programs in what is basically syntactic sugar for the λ-calculus.

              (still flaming…)

              1. 3

                I had to tweet a link to this. The depth and breadth of this comment is bracing.

                1. 3

                  Thank you, I’m flattered.

                2. 3

                  Thank you for this excellent comment! I’d love to hear your thoughts of this research project I am working on which attempts to name and score various unsafe language vectors. I’ve had a hard time getting feedback, so any would be really appreciated.


                  1. 3

                    This is a really interesting idea. I agree with the concern about overfitting, but the overall idea certainly merits research, I think. Right now it seems to start with errors created by programming languages (such as null dereferences) and give credit for avoiding them. A less well fitted version might start with a formal proof and add or subtract for ways the implementation guarantees alignment with or could diverge from the formal system, respectively.

                    1. 2

                      It’s a fun idea! It seems potentially prone to overfitting, though. And it’s going to be especially difficult to convince people that you didn’t overfit.

                      1. 1

                        Do you have any suggestions for avoiding overfitting with this model? Or improvements?

                        1. 1

                          That’s a really, really hard problem, so I don’t know what to recommend. I mean, the point is to use criteria that matter in practice, right? But that means that inevitably you are going to choose criteria that favor the languages that work best in practice, which is to say, in your own judgment. In theory, maybe interval math, dimensional analysis (like in Frink), or integer range types (like in Ada) would be really useful.

                          1. 4

                            I had written a long comment here elaborating on this theme, and then tried to delete a word with ^W and lost it. I was going to call out a few different things that make this difficult:

                            • Unfamiliarity: the vast majority of possible safety features are so unfamiliar that we can barely guess if they’re really important or not. Interval math, dimensional analysis (I think Fortress also had this? Also Boost.Units), integer range types, other type-system-based ways of showing array indices valid, total functional programming, loop variants, transactions, type-system-guaranteed purity (like in Haskell), guaranteed maximum memory usage (to prevent out-of-memory errors, as in MISRA-C or COBOL).
                            • Language context-dependence: nullability is a much bigger problem in Java than in C++ or Golang because so many things could potentially be null, and they all default to null.
                            • Domain context-dependence: dimensional analysis doesn’t matter at all in compilers. Sum-type switch case analysis probably doesn’t matter at all in numerical programming, the kind with big arrays of floats.
                            • Begging the question: maybe compile-time safety isn’t the end to aim for, but merely a means to that end. Maybe other things that we could do with potential errors, depending on the domain, might include:
                              • detecting them automatically at run-time, as with run-time type and bounds checking;
                              • automatically generating test cases to find them, as with afl-fuzz or QuickCheck;
                              • fixing them interactively in the debugger when they happen, as in Smalltalk or Minsky on ITS;
                              • proving them correct by a means other than a type system that produces compilation errors, as with coqasm;
                              • logging them comprehensively so that we can reproduce them later, as with mec-replay, rr-project, or ocamldebug time travel, or just regular logging;
                              • running them under an operating system that does comprehensive security labeling of data and processes to ensure that even if they are buggy they cannot reveal secrets;
                              • wrapping them in transactions so that we can prevent crashing programs from affecting users, except to temporarily deny them service, as I suppose in CICS;
                              • checking the output at run-time against a mechanically-checkable spec, as with AppArmor;
                              • restarting failing subsystems, as in Erlang or Fox’s granular-restarts paper;
                              • peremptorily rebooting the computer.
                            • Degrees of freedom: if you have N languages and compare them on N merits, you can almost surely find a linear weighting of merits such that any of the N languages comes out on top (N×N matrices are almost surely nonsingular), although in that case you might be constrained to make some of the weights negative. If you permit nonlinear transformations of factors (and surely you won’t claim that your existing weighting is in any sense linear) or if you permit combinations of factors, you can do the same thing with many fewer than N merits.

                            So, those are the problems leading to, in some sense, “overfitting”, that you have to overcome.

                    2. 1

                      Correction: I said “his thesis of minimalism having drawbacks” when I meant “his thesis of minimalism having no drawbacks”.

                      1. 1

                        That was a fantastic comment. Thank you!

                        Could you recommend some good reads about the history of computing and computer science?

                        1. 1

                          Thank you! You’re welcome!

                          I really enjoyed Gleick’s The Information, related to which the Dead Media Working Notes are fascinating in a dead-end kind of way. There’s a lot of stuff on Tom Jennings’s site (previously wps.com). (Yes, the inventor of FidoNet.) Reading random EWDs and Turing-award lectures is also surprisingly informative. And of course the Computer History Museum has a lot of great stuff online.

                        2. 1

                          This is a really great and useful rebuttal. Thanks for having taken the time to write this. The article deserves it, in my opinion.

                          1. 1

                            Thank you! You’re welcome!

                          2. 1

                            You should definitely put this on a blog or somewhere more permanent than a comment.

                            1. 1

                              Thanks! Yeah, I need to revive kragen-tol.

                          3. 14

                            Daira Hopwood gave an amazing talk at the Emerging Languages Camp at Strange Loop 2013 where she introduced her language, Noether. The core conceit is that it decomposes into sub-languages, hierarchically, which each sub-language adds a single new capability (thereby potentially introducing a new class of bug). The idea would be that you write your feature in whatever sublanguage gives you only those features you need, without pulling in higher-level capabilities (and their attendant bugs) unless you specifically need them. If you do, you block off that unit of code in its own sort of sandbox.

                            Really neat stuff. https://thestrangeloop.com/sessions/noether-symmetry-in-programming-language-design

                            1. 6

                              A recording of the presentation: http://www.infoq.com/presentations/noether

                              1. 5

                                This was indeed a really interesting talk. Many parts were relevant to this discussion. One I particularly like is the notion of separating the ability to throw exceptions from the ability to catch them. Throwing “out of memory” makes sense (even if it is a goto in disguise!); catching it does not, for the most part.

                              2. 7

                                Most strongly typed languages give you an opportunity to choose between various different number types: bytes, 16-bit integers, 32-bit integers, 32-bit unsigned integers, single precision floating point numbers, etc. That made sense in the 1950s, but is rarely important these days; we waste time worrying about the micro-optimization it is to pick the right number type, while we lose sight of the bigger picture.

                                There are, of course, many other reasons than efficiency. For example, to specify a quantity (which cannot be negative) in a type-safe way, you need an unsigned integer.

                                Java doesn’t have pointers; […] As these languages have demonstrated, you don’t need access to pointers in order to be able to pass values by reference.

                                I am not even sure what this sentence means. Either you pass by value, or you pass by reference. Java does have pointers (it passes by value), it’s just that you cannot apply pointer arithmetic and they are called references (to make things more confusing).

                                However, it turns out that mutation is also a large source of defects in software, because it makes it much harder to reason about code. Consider a line of C# code like this: (example of a map follows)

                                This is only a problem if your language cannot mark mutability. In Haskell or Rust, you can read if a function is mutating using its signature.

                                1. 8

                                  For example, to specify a quantity (which cannot be negative) in a type-safe way, you need an unsigned integer.

                                  I don’t see this one as very compelling, at least for a “traditional”, fairly limited type system. Yes, this-value-must-be-nonnegative is a kind of constraint you might have on some values, and it would be nice to enforce it. But so is this-value-must-be-between-0-and-59 (in a “minutes” field, say). These are just integer-range constraints. If a language wants to support integer-range constraints—or go in whole-hog and support full dependent types—then I don’t have a problem with that, if it’s worktable. But special-casing an integer >= 0 as the only range constraint supported: meh.

                                  1. 4

                                    Re: “Java doesn’t have pointers”, there’s an attempt to distinguish pointers from other forms of reference which, ultimately, are implemented as pointers but with the compiler enforcing restrictions on their behavior. In the context of that distinction, “pointers” means “… with type coercion and arithmetic, like C does it”. It seems that you did get the distinction but just not the terminology? I’m not a huge fan of it either; I’d rather call references “non-aliasing pointers” or something to that effect. But oh well.

                                    1. 4

                                      I am familiar with the terminology, I wrote too much C++ :). Maybe I was nitpicking, but my point was that even with pointers you do not pass by reference. Unless you pass a pointer by reference, of course :).

                                      1. 6

                                        Oh. :) Yes, I see what you mean - the pointer’s value gets copied down to the callee, which can mutate it without the caller caring, which is pass-by-value. That’s a sensible remark. I’m realizing how messed-up the terminology in this area is. :)

                                      2. 3

                                        Well, true references simply don’t have NullPointerException.

                                    2. 4

                                      For what it is worth, ocaml hits many of the points on the list. But has other complexities.

                                      1. 4

                                        What would you consider the complexities of OCaml to be? (Serious here, I’m very interested in evaluating OCaml use in production :)

                                        1. 4

                                          exceptions are one thing ocaml did not leave out. i’ve been going through some of my old ocaml code and rewriting it to strictly use option or result types rather than throwing exceptions. i’ve been helped by the decision to use jane street’s “core” library everywhere (see their stance on exceptions here: https://blogs.janestreet.com/core-principles-uniformity-of-interface/)

                                          1. 4

                                            I really like core, but have a hard time, so far, just diving in and using it everywhere. Core_kernel helps a bit.

                                            It’s interesting how completely SML embraced exceptions and the consequences.

                                            1. 3

                                              IME, most people start off loving exceptions but as they start feeling the pain of difficult-to-predict control flow, thanks to exceptions, they tend to want to reduce the use of exceptions. SML not being so widely used in industry might be a reason for the community not pushing back against exceptions. That is, by no means, a knock against SML, but just a possible reason.

                                              The suggestion in Ocaml is to not let exceptions escape an API. One should prefer an error monad for API-crossing errors. I have found this to be really great and enjoyable.

                                          2. 3

                                            While I love the type system, HM is known for having difficult error messages. Obviously all explicitly typed programs fall into the same circle as valid programs. I happen to think the trade-off of HM error messages is correct but I’ve learned the tricks of figuring them out.

                                            Polymorphic variants are great, but also lead to terrible error messages.

                                            The lack of a good concurrency model in Ocaml means we currently have two competing solutions, which sucks. A lot. So there is extra complexity there because of simplicity.

                                            IMO, these less is more language wars never really capture the true problem. Making a language simple doesn’t mean everything is simple, it might mean that complexity gets pushed elsewhere. Ocaml’s runtime is simple because it doesn’t deal with concurrency or parallelism. But that means the complexity gets shifted over to the user. Is that better? I’m not sure. I think “no”. Go compiler is simple because of no generics, but it means that burden is now on the developer. Java is a pretty simple language but the runtime is hugely complex. But by the same token, C++ has a lot of stuff in it and IMO nobody needs a huge chunk of it, so removing it from the language does not necessarily shift the complexity to the user, it just removes it from everywhere.

                                            1. 4

                                              IMO, these less is more language wars never really capture the true problem. Making a language simple doesn’t mean everything is simple, it might mean that complexity gets pushed elsewhere. Ocaml’s runtime is simple because it doesn’t deal with concurrency or parallelism. But that means the complexity gets shifted over to the user.

                                              This is a very important point; something like garbage collection may look complex, and one may want to avoid it in their language, but that implies that all users must do manual memory management in all applications. It seems worth taking a hit on simplicity for the sake of taking away extra complexity for users.

                                        2. 3

                                          Among the various bubbles he has, I would like to see “understandable programs” as a subset of valid programs.

                                          Broadly, I think Kolmagorov Complexity and other stuff show that the power of a valid program can be huge compared to it’s length but usually such programs are also very hard to understand, understanding usually requiring breaking things down into predictable parts.

                                          One of the aims of object oriented programming is to be able to turn a valid but not understandable program into an understandable program and I think in contrast functional programming aims to make sure programs remain understandable through their lifespan .

                                          1. 4

                                            Dijkstra didn’t state that GOTO should be eliminated entirely, but within the context of the state of programming in 1968, it caused a huge number of problems. This was before constructs like switch/select and other control structures were commonplace so there was a lot of branch and jump style programming going on. He even says that GOTO is useful for fatal errors, i.e. an exception-like mechanism.

                                            Here’s a neat annotated version of Dijkstra’s letter

                                            1. 2

                                              Less is more…. until you need that wee bit “more” is needed an appropriate..

                                              Then you have a hell of kludges and backdoors and halfarsery to do what needs to be done.

                                              I quite like the D language approach….

                                              Give you all features, but allows / encourages you to work in the smallest safest paradigm that is appropriate, and provides sane well thought out “gateways” between the domains.

                                              1. 1

                                                Then you have a hell of kludges and backdoors and halfarsery to do what needs to be done.

                                                Were you thinking of something in particular? I find the likes of unsafePerformIO and rust’s unsafe blocks to be clear and simple.

                                              2. 2

                                                That made sense in the 1950s, but is rarely important these days; we waste time worrying about the micro-optimization it is to pick the right number type, while we lose sight of the bigger picture.

                                                I’m shortening i64s to i32s or lower all day in Lucene to make them fit memory better and run faster. Aggregations really love if you are doing that, due to better packing in memory.

                                                There’s a dangerous trend of “I certainly don’t need it, so no one needs it”, especially when it comes to machine numbers. The same goes for floats etc. Maybe, if you are building “just normal” applications, this is uninteresting, but for many fields (statistics, metrics gathering, databases, embedded devices,…) these things are interesting.

                                                1. 2

                                                  It’s an interesting article but unfortunately the author mixes some sound concepts like orthogonal language design with bald assertions like:

                                                  everyone seems to agree that errors should be handled via some sort of exception mechanisms; at least, everyone can agree that error codes are not the way to go

                                                  Both points are wrong - many consider exceptions harmful and at least one modern language encourages the use of error returns over its exception mechanism.

                                                  And the assertion that having a single number type is a good idea… I have no words…

                                                  1. 2

                                                    Go is the black sheep of modern languages; a lot of its design goes against the current consensus.

                                                    Those who dislike exceptions are (IME) generally advocating error-or-success types (Option, Either), which are very different from error codes.

                                                    1. -1

                                                      Looks like you mixed up some links. You write “at least one modern language”, but link to http://blog.golang.org/error-handling-and-go.

                                                      1. 3

                                                        zik’s links seem correct to me: from errorhandling-and-go (emphasis added): Go “language’s design and conventions encourage you to explicitly check for errors where they occur (as distinct from the convention in other languages of throwing exceptions and sometimes catching them).”

                                                        1. 0

                                                          But that wasn’t the point that I found confusing, right?