1. 26
  1. 8

    “fast” and “slow” are slightly misleading euphemisms. The CPU runs at the same speed; a CPU-inefficient language simply uses more instructions to deliver the same result.

    Pity Nim is not in the comparison. It should be very close to C in most regards.

    1. 4

      Is Faster Greener?

      This is even more complicated than what the article is talking about. For example, new Intel CPUs only enable AVX hardware when it’s being used (see latest PoC||GTFO), and I dare to bet the same goes for floating-point things. Interrupt-heavy tasks probably use much more energy because of the hardware machinery involved, as well as communication between CPU cores.

      Tasks that need the GPU as well are probably even more demanding, but including those in comparisons is cheating.

      The Pros of Compiled Languages

      This isn’t exactly a surprise: the fewer abstractions (incl. the JIT or interpreter) the code has to go through to do “a single thing”, the less energy is used. The same goes for the imperative vs. the rest comparison.

      1. 4

        I’m a little confused by the different numbers for javascript and typescript, since ts compiles to js. I’m guessing they ended up running two very different sets of programs, without realizing they’re the same language. :)

        1. 3

          Assembly on 4-bit MCU’s, or maybe the smallest MCU on the most cutting-edge, low-power node with power-optimized cells.

          1. 5

            A carefully engineered ASIC. Then you can skip all the instruction decoding cruft etc.

            Although that isn’t exactly software anymore.

            1. 4

              I was going with software because they were comparing software. That said, the title question says programming languages. There are HDL’s, HLS, and so on that produce ASIC’s: programming languages for hardware. You win under that interpretation. :)

            2. 4

              Careful there, because not all assembly (on the same architecture, on the same computer) is equal.

              Back in college, I wrote a program in 8088 assembly that needed to calculate the sin and cos of values. So for example, if I wanted x * sin(50) (since I was working with degrees), I would take x, multiply it by 7660 (the sin of 50 multiplied by 10,000), and then divide the result by 10,000. Faster than floating point and good enough. But if, instead of creating a table of 0 to 1 in 10,000 increments, I had made a table of 0 to 1 in 65,536 increments, I could have eliminated the DIV entirely [1].

              Ah, experience.

              Also, a binary search in C (with a large enough array) will beat a linear search in assembly, even with poor code generation from the C compiler. As Michael Abrash said: “always know your data.”
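              A rough C illustration of why (a comparison-counting sketch, not the 8088 code above): on a sorted array, a linear scan does up to n comparisons while a binary search needs at most about log2(n), a gap no amount of hand-tuned assembly in the inner loop can close.

              ```c
              #include <assert.h>
              #include <stdio.h>

              /* Count comparisons a linear search makes before finding `key`. */
              static long linear_search(const int *a, long n, int key) {
                  long cmps = 0;
                  for (long i = 0; i < n; i++) {
                      cmps++;
                      if (a[i] == key) break;
                  }
                  return cmps;
              }

              /* Count comparisons a binary search makes on the same sorted array. */
              static long binary_search(const int *a, long n, int key) {
                  long cmps = 0, lo = 0, hi = n - 1;
                  while (lo <= hi) {
                      long mid = lo + (hi - lo) / 2;
                      cmps++;
                      if (a[mid] == key) break;
                      else if (a[mid] < key) lo = mid + 1;
                      else hi = mid - 1;
                  }
                  return cmps;
              }

              int main(void) {
                  enum { N = 1000000 };
                  static int a[N];
                  for (long i = 0; i < N; i++) a[i] = (int)i;  /* sorted data */

                  long lin = linear_search(a, N, N - 1);  /* worst case: last element */
                  long bin = binary_search(a, N, N - 1);
                  printf("linear: %ld comparisons, binary: %ld comparisons\n", lin, bin);
                  assert(bin <= 20);  /* log2(1,000,000) is just under 20 */
                  return 0;
              }
              ```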

              [1] On the 8088, a MUL instruction would leave the result in DX:AX (DX containing the high word of the result). If the sin table of 0 to 1 was in 65,536 increments, I would have had to shift the result right by 16 bits, but AX is already 16 bits, so the value I would want would be in DX. No shift necessary.
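              The footnote’s trick can be sketched in C. The 7660 constant is from the comment above; 50208 is the corresponding entry for a 65,536-increment table (round(sin(50°) × 65,536)). With a power-of-two scale, the scale-back divide becomes just taking the high word:

              ```c
              #include <assert.h>
              #include <stdio.h>

              int main(void) {
                  int x = 1000;

                  long sin50_10k = 7660;             /* round(sin(50 deg) * 10,000) */
                  long r1 = x * sin50_10k / 10000;   /* scale back down: needs a DIV */

                  long sin50_64k = 50208;            /* round(sin(50 deg) * 65,536) */
                  long r2 = (x * sin50_64k) >> 16;   /* scale back down: just keep the
                                                        high 16-bit word (DX on 8088),
                                                        no DIV instruction at all */

                  printf("with DIV: %ld, without DIV: %ld\n", r1, r2);
                  assert(r1 == 766 && r2 == 766);    /* both approximate 1000*sin(50 deg) */
                  return 0;
              }
              ```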

              1. 2

                Yeah, definitely gotta watch the timings. Nice tricks. Thanks for the Abrash link, too. I will note that the search example isn’t apples to apples, given I’d do a binary search in assembly for the comparison.

              2. 3

                You mean ArrayForth on a GreenArrays GA144? http://www.greenarraychips.com/

                1. 1

                  Great example! Maybe, depending on the goal posts. Really energy-efficient. But…

                  “and a prototype in 130 nm has also performed well.”

                  That’s a limitation of his insisting on his primitive, DIY CAD tools. If he targeted a later node, his numbers would be even more impressive. He’s holding himself back with the same tools that made him so strong. Just imagine what he could do on 28nm SLP or lower, even just synthesizing his otherwise highly-efficient designs. He won’t understand all its inner workings. Buyers will understand its value. He seems not to accept that, though. The world’s loss…

                  1. 2

                    I think it’s all about the economics. As chip designers go, they were working on a shoestring budget. Tape-out is always really expensive, but older processes are much more affordable. If GreenArrays could find customers willing to buy their designs in quantity, moving to a finer process wouldn’t be too much of a stretch, even for their homegrown tools. Starting with an old, cheap process was just good risk management.

                    Commercial EDA tools are pretty expensive too, but I doubt Chuck (or the others) would have used them even if they were free. For starters, they never really believed in synthesis. They wanted simplicity and complete control over the entire design, and were willing to trade a lot for that. Can’t be better without being different!

                    1. 1

                      That’s true. Yet, small firms crank out standard-cell designs in shuttle runs all the time, ranging from 90nm to 28nm. Like with your company, one can also get piles of money from DARPA or something to fund the initial R&D costs. eASIC used to do prototypes with their eBeam machines for around $50k, too.

                      All that’s about economics, though, which isn’t the cause here. You already figured that out:

                      “but I doubt Chuck (or the others) would have used them even if they were free. For starters, they never really believed in synthesis. They wanted simplicity and complete control over the entire design”

                      There it is. As I said, he is ideologically committed to a specific language, an unusual bit size, total control when it’s unnecessary (outside Intel etc.), and compatibility with obsolete-for-current-nodes EDA. That’s quite a box he drew around himself. Being an elite engineer, he pulled off some amazing designs with impressive numbers despite the constraints. I even watch his progress, since high-security might require old nodes with future tech emulating his.

                      Nonetheless, his limitations are self-imposed, based on personal preferences more than economics. That’s why Adapteva managed to crank out several iterations with a few million, mostly funded through sales. They cut obstacles out instead of adding them. Maybe add the mining ASIC’s, too, with one 28nm design costing a few hundred thousand due to its simplicity.

                    2. 1

                      “Buyers will understand its value.”

                      I’m very skeptical of this. There are so many barriers for buyers in terms of the massive paradigm shift that I doubt pushing even more power efficiency would make much of a difference. The objections people have to using GreenArrays have everything to do with having to learn everything over from scratch.

                      1. 1

                        There are other companies in that sector selling low-power chips with both normal and odd cores. They seem to be making money. All he’s gotta do is make something compatible with existing toolchains and I.P., with better numbers. Instead, he goes extra weird with it, forcing a paradigm shift plus a product re-evaluation.

                        Great tech. Bad marketing. Common problem.

                2. 3

                  I’m surprised that Go uses less memory than C does.

                  Kinda weird that they considered Java to be an interpreted language; it’s been effectively JIT compiled for years now.

                  1. 13

                    You shouldn’t be surprised. Go binaries are thoroughly stripped down; they don’t even depend on libc. Go’s runtime barely has any static state, but libc has to keep around quite a bit of legacy cruft. That’s why Pascal does even better than Go for memory use: it barely has a runtime at all.

                    I’m not a big fan of Go, but I will admit the toolchain does a great job linking.

                    1. 8

                      On the other hand, one could statically link C code with musl, enable LTO, and use -ffunction-sections -fdata-sections -Wl,--gc-sections to strip unused symbols, getting rid of most of the legacy cruft.

                      However, nobody does this (except maybe LTO), and none of it is enabled by default (for good reason).
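                      A sketch of that recipe with plain gcc (the musl variant would swap in musl-gcc; the flags themselves are standard GCC/binutils options, and hello.c is just a stand-in program):

                      ```shell
                      # Create a trivial C program to link.
                      printf '%s\n' '#include <stdio.h>' 'int main(void){puts("hello");return 0;}' > hello.c

                      # LTO plus per-function/per-data sections lets the linker
                      # garbage-collect every section nothing references.
                      gcc -Os -flto -ffunction-sections -fdata-sections \
                          -Wl,--gc-sections -o hello hello.c

                      ./hello
                      ```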

                    2. 1

                      JIT compilation is by definition only partial compilation. It is also a run-time procedure, and its effectiveness is never in the neighbourhood of the promises in whitepapers.

                      1. 1

                        This article demonstrates otherwise ;-p