The article presents a lot of interesting and useful facts. However, whenever I read this, I ponder: are there many languages that are lower level than C? (Besides assembly, of course.) I get that architectures nowadays are much more complex than the PDPs of yore, and that much of that complexity is not exposed to C (or assembler, for that matter!), but if C is close to being the lowest-level language available, I think it’s very nitpicky to say that one of the lowest-level languages available is not low level.
(edit: also, if examples of lower-level languages are few and far between, and obscure… I think that kinda reinforces the point instead of disproving it.)
Plus, if you play a bit with GodBolt, really a good amount of C maps quite clearly to CPU instructions. (Even if there’s microcode below and all that yada yada.)
I insist: it’s good to be reminded that C doesn’t actually map all that closely to the underlying hardware. But I’m not a huge fan of expressing that as “C is not a low level language”. Although I really cannot come up with an alternative title that is concise and accurate.
Are there many languages that are lower level than C?

Part of my point was that ‘low level’ is not a one-dimensional property. CUDA, for example, is a lower-level language on an NVIDIA GPU, because it more directly exposes the memory model and the scheduling. It is a higher-level language on a CPU because that scheduling and batching model needs to be emulated.
The problem with thinking of C as ‘low level’ is that both hardware and compilers have built a lot of complex abstractions to try to maintain that fiction. We’re starting to hit some walls there. Cache coherency protocols really struggle over about 128 cores but are necessary for presenting a C-like abstract machine to programmers. A language that enforced shared XOR mutable would not need this and could allow the compiler to insert explicit cache management operations (safe Rust has this property but unfortunately you can’t write any multithreaded code in Rust without at least some standard-library code that violates it in unsafe blocks, though you could potentially rewrite this code to do the right things with caches).
Within my limited knowledge, I think I agree completely with the point you are making and I think your arguments are solid (and interesting to me!).
But reading some replies in this thread, I’m starting to think that we in computing don’t have a definition of “low level language” that we all agree on.
Also, I asked if there are many languages lower level than C; I know they exist. (I even mentioned assembly. Now I’m wondering if we also have a definition of what a “programming language” is, because there are discussions about that too.) But if we find few examples, then for sure C is not the lowest level language available, but maybe it’s “low”? (Although personally, I’m fine calling it a medium level language. I think it sits in a distinct tier from both CUDA on a GPU and assembly.)
(Still, I’m discussing semantics and word usage. Unfortunately, this seems to be an unsolvable problem.)
edit: re-reading the article, I think I agree with Perlis’ definition of low/high level, and that is why I think C is one of the lowest level programming languages out there. Maybe we can accept this definition and say “languages being low level or high level according to Perlis’ definition… have some connection to performance, but not as much as you would expect”?
Plus, if you play a bit with GodBolt, really a good amount of C maps quite clearly to CPU instructions. (Even if there’s microcode below and all that yada yada.)

As I understand it, the CPU instructions have evolved alongside C, even if they’re not actually close to what the processor is doing in practice. These days they are possibly better thought of as an API for the processor, which has to do a bunch of work to undo the “low level” PDP-11-style programming patterns in order to get things to run fast and efficiently. From what I understand, the author of the article is advocating for processors that support higher performance programming patterns for languages other than C, to avoid this dance entirely.
Yup, you quoted my mention of “microcode”. (Although I’m not 100% sure it’s the accurate term. I mean the stuff that executes below assembly.)
But on, say, x86, I believe you cannot easily access that from any programming language, right? You may or may not strike upon the right incantations in your code that get you the “best performing microcode”, but the chances of doing that might not be very related to the specific programming language you use, or to whether it’s high level or not.
(I realize that’s likely one of the points of the article. But still, I think the main “issue” is that we don’t agree on a definition of what “low level” means, and so it’s hard to discuss what’s lower level than C, etc.)
I think arguing whether or not something is a low level language is fruitless because everyone is bringing a ton of baggage to the conversation.
The point for me is that C is no more or less “low level” than many other compiled languages. Maybe there could be (or is) a language that more cleanly maps to what a modern CPU does but I’m not sure it’s worth calling that “more low level”. Perhaps saying it’s “a better abstraction” would be the way to phrase it.
That’s more or less what I’m thinking. I’m saying that if “low level language” is a valid term, few languages other than C would qualify; but saying “low level language” is a “useless” term is also quite valid.
I think there’s a useful “classification” where C is near to one end, and most other common languages are closer to the other end. Because historically in this business we are terrible at agreeing on clear terminology, maybe it’s a fool’s errand to try and articulate this, though.
On one paw, yes, you can often predict the way that small snippets of code will compile in Godbolt, especially without LTO and PGO or running BOLT. But I have also seen beginners (including myself) confidently state misconceptions about if statements and pointers that they learned from a quick rundown on YouTube, a Tweet, or in classes, and they miss the huge amount of nuance in how compilers decide to compile these.

I’ve spent plenty of time massaging GCC and Clang into compiling a function the way I want it to compile, and I’ve noticed plenty of times that different kinds of IR constants affect the way that if statements vectorize or generate cmovs. The fact that it’s difficult to make two different compilers agree on what code should look like is, in my opinion, a strong indicator that these concepts aren’t very low-level. I recall in my college days one student telling a C++ teacher to turn an if-else chain into a switch statement for performance reasons that he couldn’t articulate (he got this from a Twitter post about YandereSim code, which isn’t even C or C++). Afterwards I put the teacher’s code into Godbolt, enabled optimizations, and saw the two equivalent implementations generate the same code.

Telling beginners that a switch statement is a jump table is misleading, as is telling them that pointers are indirections, or that if statements are branches. So I’m skeptical that we can generally predict what generated code will look like in a useful way.
Certainly, that is a misconception we need to get rid of. Like “writing assembler is the most effective way to get the most performance”, which I think fell out of style some time ago.
Performance is hard. Fortunately, most people (in my admittedly limited experience) live in spaces where getting the performance they want can be done by reasonably simple benchmarking and experimentation, and not by intimate knowledge of modern computer architecture, which evolves so quickly that most of us have been hopelessly out of date since we got our general computing education.
I feel the message is: once measurement and experimentation fail to get you the performance you need, abandon all ~~hope~~ previous knowledge that you thought you had about how a computer works, including anything you thought you knew about C from your CS degree decades ago.

I remember SPJ had something called C– (https://www.cs.tufts.edu/~nr/c--/). I think they wanted to have a universal backend for GHC in the pre-LLVM era.
I don’t think C– is any lower level than C in the sense this article is talking about, and it’s less so in some ways (the GC runtime interface, etc.).
Might be totally misremembering, but I think I remember having seen it explained that the single biggest thing that made C– used in GHC quite different from C is that in C– some things like stack manipulation were explicit.
Edit to clarify: I am not at all sure that the thing that was called “C–” in GHC had any lineage in common with the one mikea linked above. I am almost certain there have been at least two entirely unrelated languages both called “C–” because of having similar aims.
There’s B, the language that directly influenced C. And possibly Forth.
B was not lower-level than C. It was word-oriented, which makes it a bad fit for byte-addressed computers. And for space reasons (it needed to fit on a PDP-7 with 4K 18-bit words) it compiled to a threaded-code interpreter.
There were some stack machines built to execute Forth directly; Chuck Moore himself got into that near the end of his career, after all the important ships had long since sailed. The Forth community couldn’t agree on much, in terms of language features. But there was a broad consensus that the success of C over Forth was a terrible tragedy in terms of expressiveness and abstraction; basically the same complaints as the Lisp community had.
The article gives an example of how Fortran is lower-level than C in at least one important respect. I’m not an expert but I believe many other languages are better than C in the same important respect (aliasing) - I often see OCaml and Ada mentioned.
And surely many many functional languages have better low-level control over shared memory than C does? The article discusses Erlang and Smalltalk in this respect. (Side issue, but the main implementations of Smalltalk, although they’re usually used in a high-level way, have extremely low-level primitives.)
Hmm, I’m not an expert, but in my view, lack of aliasing is not a “low level” characteristic. It’s more a case of a “lack of features” helping optimization.
(For example, SQL being high-level and very declarative makes it easier for the optimizer to find parallelization opportunities, but it doesn’t make SQL a low level language, but rather the opposite.)
I feel the article fails in providing a clear, unambiguous definition of what is a “low level language”.
The one I used for myself is that a programming language is low level if the generated binary is linear in the size of the original code. Mainly, that every instruction in the language maps directly to a set of assembly instructions, without any loop. If you don’t use C preprocessor instructions, this was probably the case before, but probably not anymore.
But this article appears to say that even if C were “low level” according to my definition, we could still consider that it is not “low level” by what most people would agree low level means. Mainly, when you code in C you cannot have a clear idea of the instructions that will be executed, because recent compilers make so many optimisations that some of them are hidden and implicit.

What I take from this article is that, unless you are coding in assembly, the CPU has gained so much complexity that it has become almost impossible to really have a full understanding of how your program will behave once compiled, even with C, which I also consider a “low level” language.
And from my perspective, this gives yet another good reason to consider that C is not fully suitable for “low level programming” anymore, and that for most use cases we might now give other programming languages a chance.

Unless you are programming for specific hardware you control that cannot use these latest CPU optimisations, you should probably consider using a higher level programming language; you will probably not lose much control, but you will gain a lot more ability to express yourself.
Yeah, I think I’m coming to similar conclusions: https://lobste.rs/s/xjwix5/c_is_not_low_level_language_2018#c_up17ze
I would also add that not even coding in assembly is close enough to the real metal!
As I mention in the linked post, I think low-level languages have some connection to performance, but it’s not a strong connection. I think I already said it somewhere, but using higher-level languages can make optimizations easier… and this is before considering complex modern architectures. (E.g. the “lower level” a language is, the more complex it might be for a compiler/interpreter to infer parallelization automatically.)
Wow, this is now the fourth most downloaded article in the ACM Digital Library, and the second most recent of the top eight. Amazing what clickbait titles will do to your bibliometrics.
I still find this argument as unconvincing as it is intriguing. As far as I can tell, the first part is about hardware being far more complex than what “C” gives you access to. Yes, computers are not fast PDP-11s any more.
However…most of this hardware is just as hidden from assembly language and the machine instruction set. Is machine language “not a low-level language”? Seriously?
The second part is about optimizations. It vastly overstates the effectiveness and importance of fairly fringe optimizations, and mistakenly characterizes optimizations that go against the spirit of C as essential to the language. See Proebsting’s Law, “Fran Allen got all the good ones”, “What every compiler writer should know about programmers, or ‘Optimization’ based on undefined behaviour hurts performance”, and “The death of optimizing compilers”.
Of course, the title is true in one sense, but not in the sense intended by the author: from when C came out until at least the mid-to-late 80s, C was considered a high level language, not a low level one, because the low level language was assembly.