Let’s stop copying C (eev.ee) | 56 points | tags: c, plt, programming

  2. 28

    so old that it probably couldn’t get a programming job

    ~_~

    1. 12

      Nice list. Some of these are echoed in “things Rust shipped without”, e.g. accidental octal and bitwise precedence.

      http://graydon2.dreamwidth.org/218040.html

      1. 10

        On this topic, Richard Gabriel’s ‘The End of History and the Last Programming Language’ is a great read from the past. It goes into why so many languages look like C and what it takes for a language to be successful. It’s funny that elsewhere around the same time he predicted that if C was ever replaced, the replacement language would look like C and be more dynamic. This was before Java. JavaScript took it even further.

        (essay begins on p13 of the PDF)

        http://www.dreamsongs.com/Files/PatternsOfSoftware.pdf

        1. 4

          Thanks for the pointer – I have this book! I bought it maybe 10 years ago after enjoying a lot of Richard Gabriel’s writings.

          I didn’t remember the essay but went back and read it. What really struck me is how much the world changed since he wrote it.

          The last line is: “Right now the history of programming languages is at an end, and the last programming language is C.” That line doesn’t appear to have aged well, but I would say that he’s correct in the sense that C turned out to be a great language for writing other languages, and it has almost no competitors in that regard. Python/Ruby/JS/JVM/etc. are all written in C or C++.

          He was also wrong about C++ – he predicted it would fail because the performance model was too complex, and that it required too much mathematical sophistication.

          Instead it was a fantastic success in terms of adoption, with every browser being written in C++, and languages like Rust and Go being specifically designed to replace C++. Even if you discount all existing C++ code, I’m sure there’s more of it written every day than Rust and Go combined, maybe even 10x more.

          I would say that C++ succeeded because of compatibility with C. That’s pretty much it. Every other reason is dwarfed. Parts of it require an incredible degree of “mathematical sophistication” (his term), and it’s often not straightforward to reason about its performance from the source text, but that doesn’t matter. Compatibility wins.


          I feel a little bad for Gabriel, because RIGHT after he wrote that essay, there was a huge change in the landscape of programming languages.

          I would say what really dictated the change in languages was the platforms – the Web, and then mobile phones. The Web gave us Perl, Python, PHP, Ruby, JS, etc. He mentions this, but it isn’t the core of the essay.

          The four points of his theory are:

          1. Languages are accepted and evolve by a social process
          2. Successful languages must have modest or minimal resource requirements
          3. Successful languages must have a simple performance model
          4. Successful languages must not require users to have mathematical sophistication

          I would say that #2-#4 were true in the 1990s for competitors to C, but they are no longer true, because the environment changed. Computers got bigger and server computing became more important. Python and Ruby are FAR from the most efficient languages, even among interpreted languages. They have tons of bells and whistles and hidden indirection that even Lisp didn’t have.

          #1 is of course true. It feels obvious at this point, but that’s not a knock on Gabriel writing in the 1990s.

          I would say the most important consideration now is compatibility with existing code, and a smooth transition path from legacy code.

          This theory predicts TypeScript over other JS alternatives, because it appears to be the one with the smoothest transition path.

          Another example is that (obviously) ANY systems language needs some kind of C interop to be adopted. It can’t start the world from scratch again. C++ interop is even better, although that is a very difficult problem.


          Anyway, thanks again for the pointer… It was a good read, but it did remind me of how much the world changed. He talked about “local wizards” porting languages too, but that was in an era when people learned programming from printed books (as I did) rather than the web, and before git, GitHub, etc. I hope Gabriel is happy with the language diversity we now have!

          EDIT: In retrospect I contradicted myself – there are 2 important factors: compatibility, and drastic changes in the computing environment, and the second is probably more important. Maybe AI or quantum computing will change everything. I don’t think that will happen in the next 20 years, but it’s possible.

          Then again, Unix has not only survived, but thrived, throughout the Web and mobile era, with perhaps a dip in the PC era. So despite drastic changes, we’ll likely still have much of the same code and languages around. It seems like what happens is that systems get added to, rather than replaced. It’s rare that anything gets replaced.

          C will still be there, and Python will still be there, but maybe they’ll be sitting below something else, like magic intentional AI code :)

          1. 2

            I would say that he’s correct in the sense that C turned out to be a great language for writing other languages, and it has almost no competitors in that regard. Python/Ruby/JS/JVM/etc. are all written in C or C++.

            Does that make it so, though? I hold a healthy dose of skepticism toward any language that is not implemented in itself. And, perhaps surprisingly, many languages are fortunately made with that kind of obvious dogfooding in mind. So maybe it’s not that C and C++ are particularly great at the task, but that they are the most popular languages, so you’d expect them to be used for more of everything – including compiler construction.

            1. 2

              Yeah I should say I don’t think C is a great language for writing other languages, from the viewpoint of 2019. My own language Oil is not written in C :) I’m explicitly trying to avoid too much low-level detail in the implementation.

              But it is absolutely great compared to assembly! Language designers don’t worry about portability nearly as much as they used to, largely due to C. And LLVM, which is written in C++.

              C really is portable assembler. And that was a huge advance for the field of programming. It used to be that programs were obsolete when the hardware they ran on became obsolete!

          2. 1

            Nit pick: I’m seeing the essay on page 122 of the PDF (or 111 by the numbers at the bottom of the page).

          3. 19

            On the one hand, this makes sense if you’re writing code for a machine that has many gigabytes of RAM and a CPU with a clock speed of several GHz, and your code doesn’t have to touch the hardware directly.

            On the other hand: if the hardware doesn’t allow for such luxuries, several of these points don’t make much sense (multi-variable return through tuples, iterators, ..), so the only languages that still make a fair comparison are probably Forth and Fortran.

            I’ll note some of my thoughts:

            C is fairly old — 44 years, now!

            HTTP turns 30 this year, and TCP/IP is more than 10 years older than HTTP. It’s a bit weird that people think that anything that has a double-digit age is necessarily bad.

            Alas, the popularity of C has led to a number of programming languages’ taking significant cues from its design

            Of course, stupidly copying already-existing things isn’t a good idea (and it’s especially hard to notice them if they’re the only possibilities you know of), but then again, if you can afford it, you aren’t forced to use C. (But don’t overdo it, Electron programs are very unusable on my machines.)

            Textual inclusion

            That’s an artifact of the hardware it was first developed on. (Although the compiler could be made to read symbol information from already-built object files, I guess.)

            Optional block delimiters

            Braces and semicolons

            Same thing, as it makes the parser much easier to implement.

            Bitwise operator precedence

            Increment and decrement

            !

            Assignment as expression

            Switch with default fallthrough

            These are quirks or legacy cruft indeed. (Although, somehow, chained assignments like a = b = c = d result in better-optimized code on some platforms[citation needed])
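
            A minimal illustration of that expression-ness (plain standard C, nothing else assumed):

            #include <stdio.h>

            int main(void) {
                int d = 42, a, b, c;
                a = b = c = d;  /* assignment yields a value, so this parses as
                                   a = (b = (c = d)); all four end up as 42 */
                printf("%d %d %d %d\n", a, b, c, d);
                return 0;
            }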

            Leading zero for octal

            That made sense on the 18-bit PDPs (B, C’s predecessor, was developed on the PDP-7): an 18-bit word splits evenly into six octal digits.
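
            And the footgun is still with us today; a small sketch of the accidental-octal trap:

            #include <stdio.h>

            int main(void) {
                int mode = 0755;  /* octal on purpose: 493 decimal, chmod-style */
                int zip  = 01234; /* octal by accident: 668, not 1234 */
                printf("%d %d\n", mode, zip);
                return 0;
            }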

            No power operator

            Integer division

            Another artifact of the PDP hardware: there was no hardware instruction for pow, nor did it have an FPU, so you’d still need some way to divide numbers. (The majority of the hardware I wrote code for doesn’t have an FPU either. And yes, most of those were made after 2000. Then again, some of them don’t have a division — or sometimes even a multiply — instruction either.)
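
            As a sketch of the “something” you’d roll yourself without a pow instruction or an FPU: integer exponentiation by squaring, using only multiplies and shifts (ipow is just my name for it):

            #include <stdio.h>

            /* raise base to exp using O(log exp) multiplies */
            static unsigned ipow(unsigned base, unsigned exp) {
                unsigned result = 1;
                while (exp) {
                    if (exp & 1)
                        result *= base;
                    base *= base;
                    exp >>= 1;
                }
                return result;
            }

            int main(void) {
                printf("%u\n", ipow(3, 7));  /* 2187 */
                return 0;
            }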

            C-style for loops

            As iterators would generate too much cruft (and LTO-style optimizations weren’t really possible), this was the most expressive construct that enabled a whole range of iteration-style operations.
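
            A sketch of that range: counting, pointer chasing, and a two-index scan all fit the same three clauses (sum_list and reverse are made-up examples, not from the article):

            #include <stddef.h>

            struct node { int value; struct node *next; };

            int sum_list(const struct node *head) {
                int sum = 0;
                for (const struct node *p = head; p != NULL; p = p->next)
                    sum += p->value;
                return sum;
            }

            void reverse(int *a, size_t len) {
                if (len < 2) return;
                for (size_t lo = 0, hi = len - 1; lo < hi; lo++, hi--) {
                    int tmp = a[lo]; a[lo] = a[hi]; a[hi] = tmp;
                }
            }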

            Type first

            I doubt it’s the ‘type first’-part of the syntax that causes the problems, but rather how pointer and array types are indicated.
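
            A few valid declarations that show where the confusion actually lives: the base type does come first, but the declarator wraps around the name:

            int *a[10];          /* a: array of 10 pointers to int */
            int (*b)[10];        /* b: pointer to an array of 10 ints */
            int (*f)(void);      /* f: pointer to a function returning int */
            int *(*g[3])(void);  /* g: array of 3 pointers to functions
                                    returning pointer to int */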

            Weak typing

            Again, if you’re working close to the hardware, you want to be sure how things are actually represented in memory (esp. when working with memory-mapped IO registers, or when declaring the IDT and GDT on an x86, or …), as well as type-punning certain data.
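
            A sketch with a made-up register address (bare-metal only, so don’t run it on a hosted OS); the point is that the casts pin down the exact bits:

            #include <stdint.h>
            #include <string.h>

            /* hypothetical memory-mapped GPIO output register */
            #define GPIO_OUT (*(volatile uint32_t *)0x40020014u)

            void set_pin(unsigned n) {
                GPIO_OUT |= (uint32_t)1 << n;  /* the bit layout is the contract */
            }

            /* type punning: inspect a float's bit pattern; memcpy is the
               well-defined way to do this in modern C */
            uint32_t float_bits(float f) {
                uint32_t u;
                memcpy(&u, &f, sizeof u);
                return u;
            }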

            Bytestrings

            Single return and out parameters

            More instances of a mix of the need to know the in-memory representation, and legacy cruft.

            Silent errors

            Exceptions require a lot of complex machinery (setjmp/longjmp, interrupt handling, …), which mightn’t be feasible for a number of reasons (CPU speed, the need for accurate/‘real time’ timing, …). The “monadic” style seems to be implemented with a lot of callbacks, which isn’t that useful either. (Of course, there could be a better way of implementing those.)
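
            For contrast, the error-code idiom being complained about; the check is entirely voluntary, and nothing complains if you drop it:

            #include <errno.h>
            #include <stdio.h>
            #include <string.h>

            int main(void) {
                FILE *f = fopen("/no/such/config", "r");
                if (f == NULL) {  /* forget this if, and the error silently vanishes */
                    fprintf(stderr, "fopen: %s\n", strerror(errno));
                    return 1;
                }
                fclose(f);
                return 0;
            }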

            Nulls

            On some platforms, dereferencing a null sometimes does make sense: on AVRs, the general registers are mapped at 0x0000 to 0x001F, on the 6502, you’d access the famous zero page (although C doesn’t work that well on the 6502 to begin with), for some systems, the bootloader/… resides there (and is not readable in normal operation mode), and even on Linux, you can do this:

            #include <fcntl.h>    // open, O_WRONLY
            #include <sys/mman.h> // mmap and the MAP_*/PROT_* flags
            #include <unistd.h>   // write, close

            // needs root --->
            int fd = open("/proc/sys/vm/mmap_min_addr", O_WRONLY);
            write(fd, "0\n", sizeof("0\n"));
            close(fd);
            // or echo 0 | sudo tee /proc/sys/vm/mmap_min_addr
            // <---

            // or create an ELF file whose segment headers map data to address 0.
            // (PAGE_SIZE: use sysconf(_SC_PAGESIZE) portably; fd is conventionally -1 for MAP_ANONYMOUS)
            void* map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
                             MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);

            *((size_t*)map) = 0; // works!
            

            (EDIT: re: ELF file that maps something to 0: see also, note the ORG 0)

            And that’s why it’s considered undefined behaviour.

            No hyphens in identifiers

            That’s another syntax trade-off when using infix operators: whitespace-insensitivity or hyphens in identifiers, pick one. (Or use a different symbol for subtraction, but that’d very probably result in something silly.)

            1. 22

              “this makes sense if you’re writing code for a machine that has many gigabytes of RAM and a CPU with a clock speed of several GHz”

              There were people using more memory-safe, non-C-like languages in the 80s. Modula-2 was bootstrapped on a PDP-11/45. Amigas had AmigaE. Ada and Java subsets are used in embedded systems today. People used Schemes, OCaml, and ATS with microcontrollers.

              You’re provably overstating what ditching C requires by a large margin.

              1. 3

                You’re right in that aspect, although I still have to resort to hand-coding assembly now and then (avr-gcc isn’t that great, so I doubt OCaml, let alone Scheme, would be faster). But then again, it’s not always the case that 128k of data needs to be processed within a millisecond (because of weird memory timings).

                1. 10

                  Meanwhile Forth lets you code even closer to assembly than C does without forcing you to give up on interactivity.

                  1. 5

                    Hence what I wrote in the original comment:

                    […], so the only languages that still make a fair comparison are probably Forth and Fortran.

                    I usually tend to roll my own Forth compiler (or ‘compiler’) when I need to write a lot of boilerplate code and there’s no (good enough) C compiler available.

                    1. 7

                      ‘Steve, did he just tell me to go Forth myself?’

                      ‘I believe he did, Bob.’

                      1. 1

                        I usually tend to roll my own Forth compiler

                        Well, I do admire that you straight up roll your own on the platform. The fact that it’s easy to do is one of Forth’s design strengths. I do wonder whether you do it straightforwardly, like an interpreter does, or came up with any optimizations. The Forth fans might be interested in the latter.

                        1. 2

                          There aren’t many optimizations in it (because all the code that needs to run quickly is written in assembly), hence the quotation marks. It is compiled (at least to a large extent), though; there’s no bytecode interpreter.

                          Also, before you lose any sleep: this is definitely NOT for products that will be sold, it’s only for hobby projects (demoscene).

                          1. 2

                            Oh OK. Far as sleep, all the buggy products just help justify folks spending extra on stuff I talk about. ;)

                    2. 4

                      That’s more fair. Resolving the weird stuff at language level might be trickier.

                  2. 8

                    Optional block delimiters

                    Braces and semicolons

                    Same thing, as it makes the parser much easier to implement.

                    S-expression parsing is far easier to implement — the sort of thing a high-school student can do in a weekend. S-expressions always seem like such a win that it’s remarkable to me how few languages use them.
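
                    To put a number on “easier”, here’s a sketch of a complete reader: no error handling, nothing is ever freed, and all the names are mine, but it parses and echoes nested lists:

                    #include <ctype.h>
                    #include <stdio.h>
                    #include <stdlib.h>
                    #include <string.h>

                    /* An atom is a run of characters that aren't spaces or parens;
                       a list is '(' expr* ')'. Nodes with atom == NULL are lists. */
                    typedef struct Node { char *atom; struct Node *child, *next; } Node;

                    static const char *src;  /* cursor into the input */

                    static void skip_ws(void) { while (isspace((unsigned char)*src)) src++; }

                    static Node *read_expr(void) {
                        skip_ws();
                        if (*src == '(') {                        /* list */
                            src++;
                            Node *list = calloc(1, sizeof *list);
                            Node **tail = &list->child;
                            for (skip_ws(); *src && *src != ')'; skip_ws()) {
                                *tail = read_expr();
                                tail = &(*tail)->next;
                            }
                            if (*src == ')') src++;               /* consume ')' */
                            return list;
                        }
                        const char *start = src;                  /* atom */
                        while (*src && !isspace((unsigned char)*src) && *src != '(' && *src != ')')
                            src++;
                        Node *n = calloc(1, sizeof *n);
                        size_t len = (size_t)(src - start);
                        n->atom = malloc(len + 1);
                        memcpy(n->atom, start, len);
                        n->atom[len] = '\0';
                        return n;
                    }

                    static void print_expr(const Node *n) {
                        if (n->atom) { fputs(n->atom, stdout); return; }
                        putchar('(');
                        for (const Node *c = n->child; c; c = c->next) {
                            print_expr(c);
                            if (c->next) putchar(' ');
                        }
                        putchar(')');
                    }

                    int main(void) {
                        src = "(define (square x) (* x x))";
                        print_expr(read_expr());                  /* echoes the input back */
                        putchar('\n');
                        return 0;
                    }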

                    1. 6

                      That’s also true. (I once wrote a Lisp interpreter in TI-BASIC. In high school, indeed.)

                      EDIT: although, it’s still easier than whitespace-sensitive syntax, which the article was comparing it to. (I should’ve been more explicit.)

                      1. 2

                        S-expressions always seem like such a win that it’s remarkable to me how few languages use them.

                        Yes, they’re extremely easy to parse, but there’s a reason Lisp is said to stand for Lost In Stupid Parentheses. Obviously structuring your code nicely can alleviate most of the pain, but I definitely see the appeal of a less consistent structure in favour of easy readability. IMO some of the parser programmer’s comfort is a price worth paying to improve the user’s experience.

                        (For serialization, on the other hand, S-expressions are indeed very underrated.)

                        1. 1

                          Which is what Julia did. I read that it was sugar-coated syntax around femtolisp.

                      2. 7

                        On some platforms, dereferencing a null sometimes does make sense: on AVRs, the general registers are mapped at 0x0000 to 0x001F, on the 6502, you’d access the famous zero page (although C doesn’t work that well on the 6502 to begin with), for some systems, the bootloader/… resides there (and is not readable in normal operation mode), and even on Linux, you can do this….. And that’s why it’s considered undefined behaviour.

                        Dereferencing the macro NULL or (void *)0 is as far as I know always undefined behavior. Even if you have something at address 0x0000, the bit representation of NULL doesn’t have to be identical to that of 0x0000. According to the abstract C machine NULL simply doesn’t point to any valid object, and doesn’t have to have any predefined address other than that expressed by (void *)0.

                        1. 5

                          This. A null pointer isn’t “a pointer to address 0”, it’s “a pointer to nothing valid that can be used as an in-band marker for stuff”.

                          1. 2

                            ‘Username checks out’, as I believe the kids say nowadays

                            1. 0

                              Well, technically, your response is true.

                              On the other hand, NULL is pretty much always defined as (void*)0, and dereferencing it pretty much always gets compiled to something like mov var, [0] or ldr var, [#0] or … The standard is only a standard :), I consider the compiler extensions etc. as part of the language when coding, for practical reasons.

                              1. 1

                                On the other hand, NULL is pretty much always defined as (void*)0, and dereferencing it pretty much always gets compiled to something like mov var, [0] or ldr var, [#0] or … The standard is only a standard :), I consider the compiler extensions etc. as part of the language when coding, for practical reasons.

                                This is simply incorrect and you’re missing the point completely. (void *)0 is a valid definition for NULL because (void *)0 is the C language’s null pointer. If 0x0666 is the actual address of the null pointer, then it’s your compiler’s responsibility to translate each (void *)0 to 0x0666. In math you often use 0 and 1 to represent neutral elements of operations, even if the operations don’t actually happen on numbers (for example: the zero-vector (…), the identity function, etc.), and especially not on the actual 0 and 1 we know.

                                Here’s what GCC 8.2 does on x86-64, probably the most-used architecture right now.

                                P.S: I keep using (void *)0 but that’s equivalent to 0 which is equivalent to NULL in a context involving pointers.

                            2. 2

                              write(fd, "0\n", sizeof("0\n"));

                              sizeof("0\n") is three, not two.

                              re: ELF file that maps something …

                              45 bytes: https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

                              1. 1

                                sizeof("0\n") is three, not two.

                                $ printf '0\n\0' | sudo tee /proc/sys/vm/mmap_min_addr
                                0
                                $ hexdump -C /proc/sys/vm/mmap_min_addr
                                00000000  30 0a                                             |0.|
                                00000002
                                

                                It’s not a disaster :)

                                45 bytes: https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

                                I’m very familiar with that; I’ve made some programs that misuse the knowledge presented there (a, b), and furthermore, this trick makes the program linked from here work again:

                                Here you can find an even smaller hello-world program, apparently written by someone named Kikuyan (菊やん). It is only 58 bytes long, […]

                            3. 6

                              Oops, I just realised this article is from 2016. I hope Eevee doesn’t mind that I just sent her some comments about this article.

                              My choice quotes:

                              Lisps exist on a higher plane of existence where the very question makes no sense.

                              and

                              The alternative is dromedaryCase, which is objectively wrong and doesn’t actually solve this problem anyway.

                              1. 4

                                Interestingly enough, C95 specifies and, or, not, and some others as standard alternative spellings, though I’ve never seen them in any C code and I suspect existing projects would prefer I not use them.

                                Heh, so it does this by letting you #include <iso646.h>, which just contains some lines like #define or ||. I think actually using this “feature” with existing code seems pretty error prone. If anyone has decided that compl is a good variable name they’re going to be wondering why the compiler is treating it like a ~ character…
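
                                  For the curious, a tiny sketch of the alternative spellings in action (standard C95, nothing else assumed):

                                  #include <iso646.h>
                                  #include <stdio.h>

                                  int main(void) {
                                      int a = 1, b = 0;
                                      if (a and not b)         /* expands to: a && !b */
                                          puts("alternative spellings work");
                                      printf("%d\n", compl 0); /* expands to: ~0, prints -1 */
                                      return 0;
                                  }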

                                1. 3

                                  A very interesting read, thank you for posting!

                                  It’s interesting to see, though, that regarding curly braces and indents the writer seems to contradict himself. On the one hand he suggests using braces for clarity and making the code less error-prone here, while he suggests eliminating braces (or well, indents) here.

                                  Also, man, it’s weird what syntax decisions some languages come up with… Looking at you, Perl 6, using :8<777> for octal number notation.

                                  It’s really great to see though that programming languages have improved throughout the years, even though everybody seems to be copying each other. That’s how I see it anyways.

                                  1. 1

                                    Agreed on the syntax decisions. At least we are slowly meandering towards using 0o777 for octal, for parity with 0xFF for hex, or just omitting octal literals entirely since they’re so seldom useful these days. Python 3, C#, and Rust each take one of these routes, for instance. Maybe Go 2 will do it too someday?

                                    Lisps that use # for starting many different non-symbol data types get an honorable exemption from this, though annoyingly Clojure seems to use the 0-prefix alone for octal numbers. Ah well.

                                  2. 2

                                    Ruby doesn’t automatically give required files their own namespace, but doesn’t evaluate them in the caller’s namespace either.

                                    I don’t think this is true. The reason it seems that way is that module and class definitions in Ruby are scope gates (meaning the scoping rules are somewhat non-lexical). An example to demonstrate that requires are indeed evaluated in the caller’s context is as follows:

                                    # a.rb
                                    module A
                                      def self.call_b
                                        B::call
                                      end
                                    end

                                    # Note that B is nowhere in sight
                                    A.call_b

                                    # b.rb
                                    module B
                                      def self.call
                                        puts "Called inside module B"
                                      end
                                    end

                                    # c.rb
                                    require './b'
                                    require './a'
                                    
                                    1. 4

                                      You’re right but it’s worth understanding how it ends up this way. It’s been many years since I’ve written significant Ruby code so I could be fuzzy on the details but I’ll take a stab at it.

                                      The reason it works is that constant lookup and definition (capitalized names) follows different rules from other kinds of definitions: local variable, class variable (@@), instance variable (@), and method (def) (ignoring globals ($) and special variables and special constants for now). Most of the various scoping rules have useful properties but it does make Ruby a complicated language if you want to really dig down into the details.

                                      One way to start looking into constant lookup behavior is calling the Module.nesting method. When that list has been traversed and it still hasn’t found the constant then it checks the top level scope (I believe this order changed between Ruby 1.8 and 1.9, so I’d have to check). In the case of the Ruby top-level scope, this happens to be equal to opening Object (if you check what the class of self is at the top level you get Object). So in your example, the cross-file sharing is happening because b.rb adds B to Object. a.rb then looks for A::B and fails to find it, falling back to Object::B.

                                      The scope instance at the top level of each module is also the default target for method definition outside of a class or module block so the following is true:

                                      def abc; end
                                      abc # valid call
                                      Object.private_instance_methods.include? :abc #=> true (top-level defs become private methods on Object)
                                      

                                      However if I use self at the top-level:

                                      def self.xyz; end
                                      xyz # valid call
                                      Object.private_instance_methods.include? :xyz #=> false
                                      

                                      This method ends up on the eigenclass/singleton class of self (NB: the singleton class always seemed to confuse non-Ruby coders, as it sounds similar to but is not a singleton object).

                                      Likewise, setting a local or instance variable does not leak across file scopes. Class variables do and you can test this in irb with two prompts like:

                                      > @@foo = 42
                                      > irb # spawn a new irb evaluation context, similar to having a new file
                                      > @@foo #=> 42
                                      

                                      So with that tested, we can see that Ruby files “leak” quite a bit, but with modules being the preferred container for writing out code at the top level, this is rarely a problem outside of lazy or novice coding. This seems similar to Python’s “we’re all adults” approach to public fields on classes.

                                      The fact that Ruby doesn’t enforce rigid separation is a double-edged sword, but for the ~10 years that I wrote Ruby full time, it was never an issue. The main recommendation is to guard yourself when bringing in third-party code. I usually had a script run that would diff constants and methods before and after a require call to see what something touched. It can be pretty telling if something finds it to be an advantage to implicitly modify a bunch of things, and I generally avoided using those libraries. Those that haven’t tell me their horror stories of “that time they had to use Ruby”, which almost always boil down to this issue.

                                      1. 1

                                        Your explanation should be a blog post.

                                        I’ve been writing Ruby for a while as well but the scoping rules still bite me and I use a subset I can reason about by just reading the source to avoid accidentally evaluating something in a caller’s context. Usually that means putting things inside modules or classes and avoiding script level evaluation like in the example I posted.

                                        1. 2

                                          Thanks for the suggestion. I’ve been meaning to check up on how Ruby has changed over the past few years so this might be a good way to do so.

                                    2. 2

                                      I’ve read a bit about Oberon and I really found myself enjoying the simplicity of the language. I know Oberon was used to write the Project Oberon operating system, but I wonder what the ergonomics of having Oberon interact with the processor were like.

                                      1. 2

                                        A lot of these points seem very nit-picky, or dependent on the goals of the language. Many of these options are not inherently bad (e.g. for loops). Switch with default fallthrough is fine imo. The default behaviour is really just preference and what the designer thinks will be used the most. In languages which default to break, there should definitely be a fallthrough statement, though.

                                        As @pcy said, the type weirdness is mostly due to pointer/array semantics (and typedef). Weak typing is mostly due to everything historically being int by default. Integer division isn’t that bad (since not all computers have FPUs). Having separate operators is fine as well.

                                        Increment and decrement are again not “bad.” They do not hinder the language. ! is completely fine; this is probably the most baseless complaint. Might as well remove * and & while you’re at it.

                                        Returning an error code kinda sucks. Those tend to be important, but nothing in the language actually reminds you to check them

                                        Non-standard C to the rescue! __attribute__ ((warn_unused_result)) (for gcc/clang) and _Check_return_ (for msvc) allow you to make sure code does something with the return value. You may object that this isn’t in the core language, but you can always just use some defines/ifdefs to solve that.
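
                                        A sketch of how that looks in practice; do_work here is a made-up stand-in for any fallible function:

                                        #include <stdio.h>

                                        #if defined(__GNUC__) /* gcc and clang */
                                        #  define MUST_USE __attribute__((warn_unused_result))
                                        #else
                                        #  define MUST_USE
                                        #endif

                                        MUST_USE static int do_work(void) { return -1; }

                                        int main(void) {
                                            do_work();            /* warning: ignoring return value of 'do_work' */
                                            int err = do_work();  /* fine: the result is inspected below */
                                            if (err)
                                                fprintf(stderr, "do_work failed\n");
                                            return err ? 1 : 0;
                                        }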

                                        Assignment as expression

                                        Once again, compiler features come to the rescue. -Wparentheses (enabled by -Wall in gcc and clang) or /W4 (for msvc) will warn if you don’t enclose your assignment in parentheses when used as a conditional, e.g.

                                        int x;
                                        /* warns */
                                        if (x = 0)
                                          ;
                                        /* fine */
                                        if ((x = 0))
                                          ;
                                        

                                        This extra syntax draws attention to the assignment (and will warn if you accidentally type = instead of ==). Personally, I almost never use this syntax (preferring to do err = func(); if (err) /* ... */; over if ((err = func())) /* ... */;).

                                        Braces and semicolons

                                        I like my braces. You get to choose what indentation style to use, without getting yelled at by the compiler all the time. As someone who uses tabs, the default of spaces for indentation is a sore spot.

                                        1. 1

                                          The first point is a “maybe” for Forth. You’d namespace it in a new vocabulary if you wanted. Forth didn’t have braces removed; they were never there to begin with. Braces are a spook. You’ll have 0= to invert a condition for an IF..THEN statement. You can also have hyphens in Forth words. I could make a word : 2*7-5 7 5 - 2 * ; and the interpreter wouldn’t even blink an eye.
