1. 17

  2. 8

    A couple notes on the article (specifically, the one it links to at the beginning, The Logical Disaster of Null).

    Null is a crutch. It’s a placeholder for I don’t know and didn’t want to think about it further

    I disagree. In C, at least, NULL is a preprocessor macro, not a special object, “which expands to an implementation-defined null pointer constant”. In most cases, it’s either 0, or ((void *)0). It has a very specific definition and that definition is used in many places with specific meaning (e.g., malloc returns NULL on an allocation failure). The phrase, “It’s a placeholder for I don’t know and didn’t want to think about it further”, seems to imply that it’s used by programmers who don’t understand their own code, which is a different problem altogether.
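    To make that concrete, here is a minimal sketch of NULL as malloc's specific, documented failure signal rather than a vague "I don't know" (dup_or_null is an illustrative name, not a standard function):

    ```c
    #include <stdlib.h>
    #include <string.h>

    /* Returns a heap copy of s, or NULL. Both NULLs here have precise,
       documented meanings the caller is expected to check for. */
    char *dup_or_null(const char *s)
    {
        if (s == NULL)
            return NULL;                    /* nothing to copy */
        char *copy = malloc(strlen(s) + 1);
        if (copy == NULL)
            return NULL;                    /* allocation failed */
        strcpy(copy, s);
        return copy;
    }

    int main(void)
    {
        char *p = dup_or_null("hello");
        if (p != NULL)
            free(p);
        return 0;
    }
    ```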

    People make up what they think Null means, which is the problem.

    I agree. However, again in C, this problem doesn’t really exist, since there are no objects, only primitive types. structs, for example, are just logical groupings of zero or more primitive types. I can imagine that, in object-oriented languages, the desire to create some sort of NULL object can result in an object that acts differently than non-NULL objects in exceptional cases, which would lead to inconsistency in the language.

    Another article linked to from The Logical Disaster of Null talks about how using NULL-terminated character arrays to represent strings was a mistake.

    Should the C language represent strings as an address + length tuple or just as the address with a magic character (NUL) marking the end?

    I would certainly choose the NULL-terminated character array representation. Why? Because I can easily just make a struct that has a non-NULL-terminated character array, and a value representing length. This way, I can choose my own way to represent strings. In other words, the NULL-terminated representation just provides flexibility.

    1. 4

      “On Multics C on the Honeywell DPS-8/M and 6180, the pointer value NULL is not 0, but -1|1.”

      1. 3

        The C Standard allows that. It basically states that, in the source code, a value of 0 in a pointer context is a null pointer and shall be converted to whatever value that represents in the local architecture. So that means on a Honeywell DPS-8/M, the code:

        char *p = 0;
        

        is valid, and will set the value of p to be -1. This is done by the compiler. The name NULL is defined so that it stands out in source code. C++ has rejected NULL and you are expected to use the value 0 (I do not agree with this, but I don’t do C++ coding).

        1. 2

          I believe C++11 introduced the nullptr keyword which can mostly be used like NULL in C.

          1. 1

            Correct. Just for reference, from the 1989 standard:

            “An integral constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.”

        2. 3

          I would certainly choose the NULL-terminated character array representation. Why? Because I can easily just make a struct that has a non-NULL-terminated character array, and a value representing length. This way, I can choose my own way to represent strings. In other words, the NULL-terminated representation just provides flexibility.

          That’s not a very convincing argument IMO, since you can implement either of the options yourself no matter which one is supported by the stdlib; the choice of one doesn’t in any way impact the potential flexibility. On the other hand, NULL-terminated strings are much more likely to cause major problems due to how extremely easy it is to accidentally clobber the NUL byte, which happens all the time in real-world code.
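          As a concrete sketch of how easily the terminator gets clobbered (function names are illustrative): strncpy famously writes no NUL at all when the source fills the buffer exactly.

          ```c
          #include <assert.h>
          #include <string.h>

          /* strncpy does NOT write a terminator when strlen(src) >= the
             buffer size -- a classic way the NUL byte gets clobbered. */
          void unsafe_copy(char dst[8], const char *src)
          {
              strncpy(dst, src, 8);       /* no NUL if src has 8+ chars */
          }

          void safe_copy(char dst[8], const char *src)
          {
              strncpy(dst, src, 7);
              dst[7] = '\0';              /* force termination */
          }

          int main(void)
          {
              char a[8], b[8];
              unsafe_copy(a, "12345678"); /* a is now NOT a valid C string */
              safe_copy(b, "12345678");
              assert(strlen(b) == 7);     /* b truncated, but terminated  */
              (void)a;                    /* reading a with strlen is UB  */
              return 0;
          }
          ```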

          And the language not supporting Pascal-style strings means that people would need to reach for one of a multitude of different and incompatible third-party libraries, and then convince other people on the project that the extra dependency is worth it. Even then, you need to be very careful when passing those strings to any other third-party functions that need them.

          1. 1

            You make a good point. Both options for strings can be implemented. As for Pascal strings, it is nice that a string can contain a NUL character somewhere in the middle. I guess back in the day when C was being developed, Ritchie chose NULL-terminated strings partly because the traditional Pascal string used its first byte for the length, capping strings at 255 characters. Nowadays, since computers have more memory, you could just use the first 4 bytes (for example) to represent string length; in C it could just be written as struct string { int length; char *letters; }; or something like that.
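            A minimal sketch of that struct, using the illustrative names above; note that, unlike a NUL-terminated string, it can hold a NUL in the middle:

            ```c
            #include <assert.h>
            #include <stdlib.h>
            #include <string.h>

            /* Length-prefixed string, as sketched above. Names are
               illustrative, not from any standard library. */
            struct string {
                int   length;
                char *letters;
            };

            struct string string_from(const char *src, int len)
            {
                struct string s;
                s.length  = len;
                s.letters = malloc(len);   /* no terminator byte needed;
                                              error handling elided */
                memcpy(s.letters, src, len);
                return s;
            }

            int main(void)
            {
                /* An embedded NUL is just another byte here. */
                struct string s = string_from("ab\0cd", 5);
                assert(s.length == 5);
                assert(s.letters[2] == '\0');
                free(s.letters);
                return 0;
            }
            ```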

            From Ritchie: “C treats strings as arrays of characters conventionally terminated by a marker. Aside from one special rule about initialization by string literals, the semantics of strings are fully subsumed by more general rules governing all arrays, and as a result the language is simpler to describe and to translate than one incorporating the string as a unique data type.”

        3. 4

          It seems to me like the problems with null all stem from the fact that it is implemented as a bottom type, not the fact that it exists at all. Are there languages that implement null as a separate type, and not a subtype of all types? Is there a good reason why most languages don’t do this?

          1. 12

            Are there languages that implement null as a separate type, and not a subtype of all types?

            At this point you’ve got an option type, because it’s no longer legal for a variable of type T to contain a null. Instead, to contain a null that variable must be of a union type T|Null, a.k.a. option, a.k.a. maybe. There are lots of languages that do this (Kotlin, Swift, Haskell, Scala, Rust).
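            For what it’s worth, even plain C can mimic the idea with a tagged struct; a minimal sketch with illustrative names, where “no value” is a tag the caller must check rather than a pointer that inhabits every type:

            ```c
            #include <assert.h>
            #include <stdbool.h>

            /* A hand-rolled option type for int. */
            struct option_int {
                bool present;
                int  value;        /* only meaningful when present is true */
            };

            struct option_int some(int v) { return (struct option_int){ true, v }; }
            struct option_int none(void)  { return (struct option_int){ false, 0 }; }

            /* Index of needle in haystack, or "none" -- no in-band -1. */
            struct option_int find_index(const int *haystack, int n, int needle)
            {
                for (int i = 0; i < n; i++)
                    if (haystack[i] == needle)
                        return some(i);
                return none();
            }

            int main(void)
            {
                int xs[] = { 4, 8, 15 };
                assert(find_index(xs, 3, 8).present);
                assert(find_index(xs, 3, 8).value == 1);
                assert(!find_index(xs, 3, 99).present);
                return 0;
            }
            ```

            Unlike a real option type the compiler can’t force you to check the tag, but the shape of the API at least makes the “absent” case explicit.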

            1. 3

              Ruby, but that’s probably not what you mean.

              1. 3

                TypeScript does this with the strict null checks flag on.

                1. 2

                  Yes, in .NET F# is a good example of this if you’re not interoping with C#. In C# we’re finally reaching a point where nullable reference types will no longer be the default and must be explicitly declared and will have proper compile time checking. It will be a warning by default for new projects, and you can also choose to make it a compile time error if you want.

                2. 2

                  Bear with me, this might sound dumb, but I find it super confusing when you have some object reference, which might be null itself, and it’s also got object values/references inside, which could also be null. So a value can be more-or-less null/unusable in multiple ways, but sometimes it will(!) be usable with almost nothing not-null, depending on context. Each time I step into the code I’ve got to re-establish which things are going to be present and why, depending on context. And add null-checks everywhere. I wish I knew the name for this pattern. (the errorless data structure, the bag of holding, &c) I’m totally down with make illegal state unrepresentable but it’s hard to refactor once the code is already written, inherited-from, corner-case’d, and passed around everywhere.

                  It’s the same with functions. I swear I saw a line of code today that was like below (paraphrasing). I mean sure, I can get used to anything, but it just looks to me like a failure mode.

                  return service.Generate(data, null, null, null, null, null, null);
                  
                  1. 1

                    I wonder what would happen if say, 64K of data was mapped to virtual address 0 and made read-only [1]. That way, NULL pointers wouldn’t crap out so much. A NULL pointer would point to an address of NUL bytes, so it’s a valid C string. In IEEE-754 all zeros represents 0. All pointers lead right back to this area. If you use “size/pointer” type strings, then you get 0-byte (or 0-character) strings. It just seems to work out fine.

                    It’s probably a horrible idea, but it would be fun to try.

                    [1] It would be nice if this “read-only” memory acted like ROM: it could be written to, but nothing would actually change.

                    1. 2

                      I’ve had some fun thoughts about this before :D

                      I was sketching out ideas of a microcontroller design that could potentially “not have registers” and also try to avoid lots of arbitrary hardcoded memory addresses. In practice it always ended up having registers in the form of a couple of internal busses and some flags, but it would look like it mostly didn’t have registers as far as programmers were concerned.

                      I wanted to make the “program counter” a value stored at memory address zero. This would also mean the ‘default’ value of memory address zero set in the ROM would be the entry point in the code, which I thought was pretty.

                      This also simplified a few things from the circuitry point of view:

                      • No JMP instruction, just MOV 0,value
                      • No halt instruction, just MOV 0,0

                      Some thought later made me realise that using low memory addresses for critical things was a bad idea. When a program wigs out it can start writing to arbitrary random addresses, and address zero is a very common target in many bugs. Overwriting address zero would make the CPU jump to new code and potentially make things harder to debug.

                      In the end I thought it best to set up the first 64 bytes or so of memory as an intentional trap instead, i.e. any read or write to those bytes would immediately halt the processor. A lot less elegant, but a lot more practical.


                      Back to your idea.

                      Letting the first 64K of memory be usable would allow a lot of programs to keep running, a lot like the old “Abort/retry/ignore” allowed us to do in the DOS days. For some bugs this would be brilliant and let you try and gracefully recover (eg finish saving a document).

                      Alas there would also be a chance of data being damaged (eg files getting overwritten) if you continue into unknown territory; so I think it would still be worth bringing up an A/R/I style dialog. Even if only so we can blame the users if something goes wrong :P

                  2. 1

                    Suggesting that Haskell has no null is doubtful. A diverging computation, e.g. null = 1 + null, is typeable as, say, Integer, but admits no value. So, there you go: another bottom value that fits in every type.

                    1. 6

                      The nice thing about bottom, compared to null, is that it’s not observable. In other words, we can’t do something like:

                      myValue = if isBottom myOtherValue
                                   then thingA
                                   else thingB
                      

                      In this sense, bottom acts more like a thrown exception (control flow) than a null (data value); although it looks like a null since laziness combines data flow with control flow.

                      In a sense, this more faithfully represents an “unknown” value, since attempting to figure out which value it is also ends up “unknown”. In contrast, null values are distinct from other values. For example, if I don’t know the value of a particular int, the answer to “is it 42?” should be “I don’t know”, not “no”. Three-valued logic attempts to act like this, but is opt-in.
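                      A minimal sketch of that opt-in three-valued behaviour in C (names are illustrative): asking whether an unknown value equals 42 answers “unknown”, not “no”.

                      ```c
                      #include <assert.h>

                      /* Kleene-style three-valued logic. */
                      enum trilean { NO, YES, UNKNOWN };

                      struct maybe_int {
                          int known;   /* 0 = value is unknown */
                          int value;
                      };

                      enum trilean equals(struct maybe_int m, int n)
                      {
                          if (!m.known)
                              return UNKNOWN;   /* "is it 42?" -> "I don't know" */
                          return m.value == n ? YES : NO;
                      }

                      int main(void)
                      {
                          struct maybe_int unknown   = { 0, 0 };
                          struct maybe_int forty_two = { 1, 42 };
                          assert(equals(unknown, 42)   == UNKNOWN);
                          assert(equals(forty_two, 42) == YES);
                          assert(equals(forty_two, 7)  == NO);
                          return 0;
                      }
                      ```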

                      1. 1

                        Please could you explain how “laziness combines data flow with control flow”?

                        1. 2

                          Just the idea that evaluation proceeds based on the definition of data, rather than separate control mechanisms, e.g. we can write ifThenElse as a normal function:

                          ifThenElse True  x y = x
                          ifThenElse False x y = y
                          
                          1. 2

                            Oh, I see. Thanks. I was thinking that the usual C-style control mechanisms, e.g. if and for, still kind of entangle control flow and data flow, albeit with special syntax rather than just functions. I wonder if it is possible to disentangle this? What would that look like?