1. 2

    This article seems a little oversimplified. In a complex ISA like x86, some instructions expect things to be in particular registers, or write to particular registers. Some instructions can operate on multiple registers, but not all registers. Some constraints exist by convention, because if a function calling convention places the first four arguments in rcx, rdx, r8 and r9, the optimal place for data might be determined by a later function call. Some registers are guaranteed to be preserved across a function call, but others are not, which limits the lifetime of registers. At some point this becomes like most compiler optimizations, where the compiler has to balance competing factors: is the benefit of having this value in edi to perform an iteration worth the cost of pushing other data onto the stack to make way for it? Etc.

    1. 1

      I agree, there are things that I left out, especially in regard to applying this to a real system. I didn’t want to muddy the core idea with the nuances that complex ISAs bring. Linear scan does a pretty terrible job of keeping track of these things, so most “real” compilers using it (LLVM < 3) have a second rewrite stage. LLVM is very interesting in this regard because its current algorithm is very similar to linear scanning in its greedy roots, and the heuristic weights for choosing a register to assign or spill are generated by analyzing the code. It doesn’t need to go back and forth, because the “weightings” are pre-determined. I’m interested to see how LLVM handles instructions with fixed register operands, as you mentioned; I’ve never thought deeply about that.
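      For readers who haven’t seen it, the core of classic linear scan (Poletto & Sarkar) fits in a few lines. This is a from-scratch sketch, not code from any real compiler; the interval format and the spill heuristic are simplified for illustration:

```python
import bisect

# Minimal sketch of classic linear-scan allocation (Poletto & Sarkar).
# The interval format and spill heuristic are simplified for illustration.

def linear_scan(intervals, num_regs):
    """intervals: list of (name, start, end) live intervals.
    Returns ({name: register}, {spilled names})."""
    free = list(range(num_regs))   # registers not currently in use
    active = []                    # (end, name) pairs, sorted by end point
    assigned, spilled = {}, set()

    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Expire intervals that ended before this one starts.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assigned[old])
        if free:
            assigned[name] = free.pop()
        elif active and active[-1][0] > end:
            # No free register: spill the live interval that ends last,
            # handing its register to the current interval.
            _, victim = active.pop()
            assigned[name] = assigned.pop(victim)
            spilled.add(victim)
        else:
            spilled.add(name)      # current interval ends last: spill it
            continue
        bisect.insort(active, (end, name))
    return assigned, spilled
```

      Real allocators layer a lot on top of this: precolored registers for instructions with fixed operands, caller/callee-saved splits, and the second rewrite stage mentioned above.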

    1. 1

      About a month ago, I started my blog not intending to share it anywhere. Someone found one of my posts and shared it here and it took off. Since then, besides college apps, I’ve been writing blog posts about topics that I think are poorly covered by other resources. I’m pretty new to writing so I would love feedback on my style and explanations.

      1. 1

        I read a comment on HN claiming that LLVM uses a neural net for register allocation - can anyone here confirm (ideally with some kind of a reference/citation link) or deny?

        1. 2

          Sort of. LLVM has its own improvement on the basic linear-scan algorithm which allows for multiple different weights and heuristics along the way. They use something called a Hopfield network, a one-layer neural network, to determine which variable gets spilled. There’s also been research, not under LLVM as far as I’m aware, on using neural networks to decide which algorithm to run at compile time.
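          Conceptually the Hopfield-network part is small: each node carries a bias toward “keep in register” (+1) or “put in memory” (−1), symmetric links pull neighbors toward agreement, and you iterate until the states stop changing. A toy sketch with an invented graph and weights, not LLVM’s actual data structures:

```python
# Toy illustration of the Hopfield-network idea behind LLVM's
# SpillPlacement pass. The graph, biases, and weights are made up.

def hopfield_settle(bias, links, max_iters=100):
    """bias: list of per-node preferences; links: {(i, j): weight}.
    Returns a list of +1/-1 decisions at the fixed point."""
    n = len(bias)
    state = [1 if b >= 0 else -1 for b in bias]

    def w(i, j):  # symmetric weight lookup
        return links.get((i, j), links.get((j, i), 0.0))

    for _ in range(max_iters):
        changed = False
        for i in range(n):
            field = bias[i] + sum(w(i, j) * state[j] for j in range(n) if j != i)
            new = 1 if field >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:  # fixed point reached
            break
    return state

# Node 0 strongly prefers a register, node 2 prefers memory, and
# node 1 is neutral but linked to node 0, so it follows node 0.
print(hopfield_settle([2.0, 0.0, -2.0], {(0, 1): 1.0}))  # → [1, 1, -1]
```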

          Sources:
          https://www.slideshare.net/mobile/chimerawang/llvm-register-allocation-59885569
          https://llvm.org/doxygen/SpillPlacement_8cpp_source.html

        1. 2

          The author provided GPT-3 with training data where all the questions have definite answers, and the answer is always “X is Y”.

          Unless I missed something, there was no training data where the answer is “That makes no sense,” or “I don’t know,” or “Actually, the U.S. did not exist as such in 1700, so there was no president.”

          Is it any wonder that GPT-3 followed suit and answered all the questions the same way, with the best approximation?

          I don’t think I would expect any more from a human either, if the human’s knowledge base was somehow made a clean slate, e.g. a child human.

          If you were training a child in this manner, you’d probably get similar results.

          Also, there was no opportunity for re-training. When you’re teaching a child, they’re bound to get some answers wrong, and then you would correct them, and they would also learn from that.

          No such opportunity was provided here, though I don’t know if that is technically possible with GPT-3.

          1. 3

            The author provided GPT-3 with training data where all the questions have definite answers, and the answer is always “X is Y”.

            The training data for GPT algorithms is just massive amounts of English language text from the Internet, isn’t it? I’m not sure how that’s consistent with “questions that have definite answers” - most of the training data text wouldn’t be in the form of any kind of question because most English language sentences are not questions.

            1. 2

              Training data is the wrong term - this is better termed “prompt data”, which is used to “set the stage” for GPT predictions.

            2. 2

              I’m unsure if GPT-3 can respond like that, although that would be an interesting thing to add to this. Another option would be to create some sort of autoencoder framework that lets the network determine when it is responding to something it’s never really seen before. Uber has a very interesting write-up about doing that.
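              A minimal sketch of that reconstruction-error idea, using a plain linear autoencoder (PCA via SVD) rather than anything from Uber’s actual write-up: inputs far from the training manifold reconstruct poorly, and that error is the novelty signal you’d threshold.

```python
import numpy as np

# Sketch of reconstruction-error novelty detection with a linear
# autoencoder (PCA via SVD). Illustrative only: the data, dimensions,
# and scale here are made up.

def fit_linear_autoencoder(X, k):
    """Return the top-k principal directions and the data mean."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return vt[:k], mean

def reconstruction_error(x, components, mean):
    """Encode x down to k dims, decode back, measure what was lost."""
    centered = x - mean
    recon = components.T @ (components @ centered)   # decode(encode(x))
    return float(np.linalg.norm(centered - recon))

rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 10))            # "true" 2-D manifold in 10-D
X = rng.normal(size=(500, 2)) @ basis       # in-distribution training data
comps, mean = fit_linear_autoencoder(X, k=2)

familiar = rng.normal(size=2) @ basis       # lies on the training manifold
novel = rng.normal(size=10) * 5.0           # does not
```

              On data like this, `reconstruction_error(familiar, comps, mean)` comes out near zero while the novel point’s error is large.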

            1. 5

              the python graph library networkx has an implementation of kosaraju’s algorithm that we used to replace a bunch of ad-hoc cycle detection code in our python dependency graph building library ‘importlab’.

              1. 1

                That’s super interesting. I’d think there are better cycle detection algorithms than just counting SCCs, but that’s definitely a quick way to do it.

                1. 1

                  we specifically wanted SCCs as part of the deal because if a group of dependencies form a strongly connected component we need to analyse them all as a single unit, but then we can treat that unit as a node in a later graph traversal.
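                  That “unit as a node” step is the condensation of the graph. networkx has it built in as `nx.condensation`; hand-rolled, it’s just an edge relabeling. A sketch over a toy graph:

```python
# Build the condensation of a directed graph: a DAG whose nodes are the
# graph's strongly connected components. (networkx offers this directly
# as nx.condensation; this is a hand-rolled equivalent for illustration.)

def condense(graph, sccs):
    """graph: {node: [successors]}; sccs: list of node sets.
    Returns the condensation as {component index: set of component indices}."""
    comp_of = {v: i for i, comp in enumerate(sccs) for v in comp}
    dag = {i: set() for i in range(len(sccs))}
    for u, succs in graph.items():
        for v in succs:
            if comp_of[u] != comp_of[v]:   # drop intra-component edges
                dag[comp_of[u]].add(comp_of[v])
    return dag

# Toy dependency graph: a <-> b form one SCC, c stands alone.
g = {"a": ["b"], "b": ["a", "c"], "c": []}
print(condense(g, [{"a", "b"}, {"c"}]))  # → {0: {1}, 1: set()}
```

                  Any later traversal (topological sort, reachability) then runs on the DAG, which is what makes the per-unit analysis order well defined.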

                  1. 1

                    Oh nice. I wonder if you could apply SCCs to control flow graphs as well; I think they translate well to general compiler problems.

              1. 1

                I would have enjoyed code over pseudocode, as the difference between the two would have been minimal, but I understand that it’s not the point of the post. I have a soft spot for learning-through-teaching, and I find it tempting to stop at a point that’s sufficient for me to move on to the next thing.

                Here’s code with as few differences from the post as I felt comfortable with; someone please let me know if there’s a mistake: https://gist.github.com/aneksteind/d822c2ebebd04adb3d6014e9c0e163be. Lots of better and faster implementations are a Google search away.

                edit: linked to code

                1. 1

                  At a quick glance, your code looks alright. If you’re going to use global variables, you’ll have to reset them correctly. I actually recommend implementing this iteratively, as it’ll run way faster.

                  I was considering sharing my actual code since I had to write a solution for the course anyway, but as you said, that wasn’t really the purpose of the post. Do you think a gist somewhere near the end with an actual implementation might be better?

                  1. 1

                    Yes, agreed on the iterative implementation; the naive recursive implementation will also blow the stack for a sufficiently large graph, but as mentioned, I wanted to change as little as possible from your post. A link to your gist in the post would be a great touch, IMO.

                    edit: there’s an iterative implementation as OP recommended on Rosetta code (https://rosettacode.org/wiki/Kosaraju#Python)
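                    For anyone following along, here’s roughly what the iterative version looks like in plain Python. This is a from-scratch sketch in the spirit of the Rosetta Code one, not OP’s code, using an explicit stack for both DFS passes so deep graphs can’t overflow the call stack:

```python
# Iterative Kosaraju: two explicit-stack DFS passes, no recursion.
# Assumes every node appears as a key in the adjacency dict.

def kosaraju_scc(graph):
    """graph: {node: [successors]}. Returns a list of SCCs as sets."""
    # Pass 1: DFS on the original graph to compute finish order.
    visited, order = set(), []
    for root in graph:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            node, succs = stack[-1]
            for nxt in succs:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:                      # all successors done: finish node
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in reverse finish order.
    rev = {v: [] for v in graph}
    for u, succs in graph.items():
        for v in succs:
            rev[v].append(u)
    assigned, sccs = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        comp, stack = set(), [root]
        assigned.add(root)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in rev[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        sccs.append(comp)
    return sccs
```

                    The saved iterators in pass 1 are what let the DFS resume mid-adjacency-list after returning from a child, which is the fiddly part people usually get wrong when de-recursing.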

                1. 6

                  This is nice.

                    Being old skool, I would routinely start my bash scripts with a usage() function (containing a bunch of echo statements amounting to the same thing) that would be called by any failure in the getopts processing, but this is a simple alternative that is useful both for the user and for a casual reader of the script.

                  1. 2

                    Yeah, this is a really clever way to make your code more readable and make it easier to get help.

                  1. 1

                    Oh hey, this is me. I didn’t intend to share this until I made some more progress on the series. This is my first attempt at technical writing, so please tear it apart. I’m wondering, how did you find this?