1. 3

    What are the benefits wrt another high performance GC? O(1) allocation and de-allocation is the norm.

    1. 7

      Thank you for asking. This implementation provides short O(1) pause times, regardless of working set size, and it also compacts. Maybe such things do already exist, but I couldn’t find such things when I searched as I was thinking about this project, so I thought this would be new. I’m happy to compare to another implementation if you could point me to one.

      1. 3

        I see. Given the expensive read barrier, I think there are other ways to achieve similar characteristics with better trade-offs (I can find references tomorrow, but it is a bit late here, sorry). However, your design seems to make a lot of sense for a compacting database (the expensive read barrier can be factored in the cost to access external pages, storage overhead might be a problem though).

        Small question: when the GC thread is active, what happens if the main thread fills the region before the GC can complete?

        1. 1

          Thanks, I’ll be happy to read them. Regarding your question: I use the term “region” in the code to refer to a range within the CB/ring-buffer. That region is allocated by the main thread for use by the GC thread, and then the main thread never touches it. If you’re referring to the CB/ring-buffer itself, as in “what happens if the main thread fills the CB before the GC completes?”, then the answer is that the main thread will resize the CB upwards to the next power of 2 and copy all of the contents of the old CB (including the still-being-written GC consolidation region). Then, when the main thread observes that the GC has completed, it will copy forward the data from the old CB’s consolidation region to the equivalent range in the new CB. In this way, the GC thread only knows about the CB that was given to it at the start of the GC/consolidation, and the main thread is responsible for the shift to a larger CB.

          Thanks! –Dan
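
          A minimal sketch of that grow-and-copy step, assuming hypothetical names (this is not the project’s actual code):

          ```rust
          // Hypothetical sketch of the grow step described above: the CB is
          // always a power-of-two in size; when it fills while the GC thread is
          // still consolidating, the main thread allocates the next power of two
          // and copies the old contents forward, consolidation region included.

          fn grown_capacity(current: usize) -> usize {
              (current + 1).next_power_of_two() // e.g. 1024 -> 2048
          }

          fn grow_cb(old: &[u8]) -> Vec<u8> {
              let mut new_cb = Vec::with_capacity(grown_capacity(old.len()));
              new_cb.extend_from_slice(old); // still-being-written region copied as-is
              new_cb
          }
          ```

          The GC thread keeps writing into the old buffer’s consolidation region; only once the main thread observes GC completion does it copy that range forward into the new CB.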

      2. 2

        Yeah, it looks like an exercise to write such a GC. Other pauseless GCs have similar characteristics and are arguably cheaper in many cases. This implementation requires a hash table on the side for lookups, and most likely requires a read barrier (to rewire objects moved from zone A to B to C).

        1. 3

          Thanks for taking a look! Yes it does require a read-barrier for the objects which have shifted locations. I’d be interested if you could link me to the best implementations you know of.

          1. 3

            Reading what you did led me immediately to Azul’s Pauseless GC. Both do tons of work on the read side to ensure the GC thread can finish marking and compacting in a predictable way. In their case, I think they trap pointer reads to outdated pages to update the reference (in the compacting phase).

            Edit: I was thinking through their marking phase and yours (from my understanding, your zone B is for marking). One thing I don’t quite understand is how you can make sure mutations in zone A have no pointers back to zone B / C objects, and hence that you can do the marking phase correctly without consulting zone A?

            1. 2

              I cheat by not using pointers but instead handles. Each allocation is just given an increasing integer ID, which is “dereferenced” indirectly through the O(log32(N)) objtable. This should be a reasonably shallow data structure, but is definitely not free, so adds overhead to the dereferences.
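
              A rough sketch of that handle indirection, with hypothetical names and a plain HashMap standing in for the O(log32(N)) objtable:

              ```rust
              use std::collections::HashMap;

              type Handle = u64;

              // Side table mapping each handle (an increasing integer ID) to the
              // object's current offset. Dereferences go through this table, so
              // the GC can move an object by updating only the table entry.
              struct ObjTable {
                  next_id: Handle,
                  locations: HashMap<Handle, usize>,
              }

              impl ObjTable {
                  fn new() -> Self {
                      ObjTable { next_id: 0, locations: HashMap::new() }
                  }

                  fn allocate(&mut self, offset: usize) -> Handle {
                      let id = self.next_id;
                      self.next_id += 1;
                      self.locations.insert(id, offset);
                      id
                  }

                  // Called when the GC moves an object; handles held elsewhere
                  // stay valid because they never encode an address.
                  fn relocate(&mut self, h: Handle, new_offset: usize) {
                      self.locations.insert(h, new_offset);
                  }

                  fn deref(&self, h: Handle) -> Option<usize> {
                      self.locations.get(&h).copied()
                  }
              }
              ```

              The indirection is exactly the read-barrier cost discussed upthread: every dereference pays for the table lookup.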

              1. 1

                Yeah, by pointers, I meant handles. I guess whenever zone A tries to reference an object already moved to zone B, you copy it out of zone B back into zone A? Otherwise I cannot see how the marking phase can avoid missing objects referenced from zone A.

                1. 1

                  Yeah, the vision is that all objects are either “small” (e.g. tuples under some arity [envisioned, but not yet present in this implementation]), in which case they are copied from B or C into A when they need to become mutable, or larger (e.g. maps/collections) but persistent objects, which are mutated via path-copying of the path to the mutated leaf; such a copied path is written out in A and is similarly considered “small”.
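
                  A toy sketch of the path-copying idea for those larger persistent objects (hypothetical types; the real representation would live in the CB zones):

                  ```rust
                  use std::rc::Rc;

                  // "Mutating" a leaf of a persistent tree copies only the nodes
                  // on the path from the root to that leaf; untouched subtrees
                  // are shared. The freshly copied path is the "small" data
                  // written out in zone A.
                  enum Tree {
                      Leaf(i64),
                      Node(Rc<Tree>, Rc<Tree>),
                  }

                  // Returns a new root in which the leaf at `path` (bits:
                  // 0 = left, 1 = right) is replaced; everything off the path
                  // is shared via Rc.
                  fn set_leaf(t: &Rc<Tree>, path: &[u8], value: i64) -> Rc<Tree> {
                      match (t.as_ref(), path) {
                          (Tree::Leaf(_), []) => Rc::new(Tree::Leaf(value)),
                          (Tree::Node(l, r), [0, rest @ ..]) => {
                              Rc::new(Tree::Node(set_leaf(l, rest, value), Rc::clone(r)))
                          }
                          (Tree::Node(l, r), [1, rest @ ..]) => {
                              Rc::new(Tree::Node(Rc::clone(l), set_leaf(r, rest, value)))
                          }
                          _ => panic!("path does not match tree shape"),
                      }
                  }
                  ```

                  Only the copied path is new allocation; the old root remains valid, which is what lets the GC thread consolidate old zones without seeing mutation.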

            2. 3

              If you haven’t seen it already, you might find some of the work on Shenandoah interesting - specifically how they’ve optimized forwarding pointers/read barriers.

              https://developers.redhat.com/blog/2019/06/27/shenandoah-gc-in-jdk-13-part-1-load-reference-barriers/
              https://developers.redhat.com/blog/2019/06/28/shenandoah-gc-in-jdk-13-part-2-eliminating-the-forward-pointer-word/

        1. 5

          What does O(n) memory usage mean? Every language has O(n) memory usage, where n is the number of allocated objects.

          1. 3

            Given that this is intended to work on a text file, I assume n is the size of the input.

          2. 1

            I slightly changed my mind about error reporting after implementing a parser myself.

            Yes, getting all syntax errors at once is nice, but it tends to considerably bloat the code. So I just try to generate a precise message on the first error and exit.
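
            A minimal sketch of that fail-fast approach (hypothetical names): produce a precise location and message for the first error, then stop, with no recovery machinery at all:

            ```rust
            // Report only the first error, precisely, instead of recovering
            // and collecting all syntax errors.
            struct ParseError {
                col: usize,
                msg: String,
            }

            fn parse_digits(input: &str) -> Result<Vec<u32>, ParseError> {
                for (i, ch) in input.chars().enumerate() {
                    if !ch.is_ascii_digit() {
                        // bail out immediately with an exact position
                        return Err(ParseError {
                            col: i + 1,
                            msg: format!("expected digit, found {:?}", ch),
                        });
                    }
                }
                Ok(input.chars().filter_map(|c| c.to_digit(10)).collect())
            }
            ```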

            1. 3

              (Whoa, I am putting my hat on!)

              This is mostly about what you value.

              People praise Rust’s error reporting. Also, most compiler projects place some value on implementation simplicity. These two facts are related: Rust’s error reporting is superb because Rust has a near-complete disregard for implementation simplicity. According to the Rust project’s values as I understand them, as long as it’s simple for users, the implementation can be arbitrarily complex, almost without limit. This is different from most other projects, and it is also probably not what you want.

              1. 1

                Exhibit A: Rust does this. “notice the capitalization”.

                1. 1

                  This is a type error. Do you have a reference about how Rust handles parse errors?

                  1. 2

                    Parse errors in Rust are handled by the strategy of special-casing known issues, so it’s a mixed bag. If you feed it randomly broken code, you’ll get the usual crappy “expected x or y, but found z”. But there are cases where it will try a fix and re-parse, and cases where it’ll go the extra mile to show a good error; e.g. for unbalanced parentheses or missing brackets it tries to guess which one was missing:

                    1 | fn foo() {}}
                      |          --^ unexpected closing delimiter
                      |          |
                      |          this block is empty, you might have not meant to close it
                    
                2. 1

                  Sorry, I didn’t articulate it well the first time. Trying again:

                  • I still very much appreciate getting all syntax errors at once
                  • for my (very limited use case) implementing a full-blown error recovery wasn’t worth it
                  • having faced the complexity of it I have more compassion for “line N: syntax error”.
              1. 25

                Deno is worth checking out for people who want JS/TS but don’t want Node’s design flaws. It comes from Node’s original creator.

                TSC must be ported to Rust. If you’re interested in collaborating on this problem, please get in touch.

                For once, I agree with a RIIR.

                1. 7

                  I’m excited for this as well. See also: https://github.com/swc-project/swc

                  1. 1

                    RIIR

                    I don’t understand why. Rust does not seem particularly suited for the task.

                    1. 2

                      For which reasons do you believe that to be the case?

                      1. 3

                        A compiler needs to do a lot of symbolic manipulations.

                        More often than not, especially with a non-trivial type system such as TypeScript’s, this implies manipulating directed acyclic graphs, or even worse, arbitrary directed graphs if the processed language has a strong enough notion of recursion. Reasoning about lifetimes and sharing of graphs is notoriously difficult, even with logics stronger than Rust’s borrow model, such as separation logic. This alone is very annoying. If the graphs are local enough, as is the case with CFGs when doing intra-procedural analysis, you can just keep allocating and release everything when done with the function. But type systems do not always lend themselves to local analysis. Then, like C/C++, Rust code tends to be efficient when computations have a regular structure (e.g. traversing arrays), not when chasing pointers with hardly predictable lifetimes, which, again, is often the case when doing symbolic/static analysis.

                        So two of Rust’s selling points are not really applicable; quite the opposite.

                        1. 8

                          I firmly believe that Rust today is the best language for compilery tasks, based on two observations:

                          • The language for writing compilers is ML
                          • The language of choice for production-ready compiler implementations is C++

                          Rust, as an ML in C++ clothing, combines the relevant benefits of the two.

                          I specifically disagree that the “it’s hard to work with graph data in Rust” criticism is relevant. Even in a GC’d language, representing a complex graph as a literal graph of objects is often inconvenient/inflexible, and you would use an arena and indices, just like in Rust.

                          My anecdotal experience tells me that Rust is a more convenient language than Kotlin for compiler front-ends.

                          I think I can get behind an idea that a sort of “idealized ML” would be a better language to implement a compiler in, but I think Rust’s quality of implementation, today, in practice, blows all existing functional languages out of the water.
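
                          For example, the “arena and indices” pattern might look like this (a toy sketch, not taken from any particular compiler):

                          ```rust
                          // Nodes live in a Vec (the arena) and edges are plain u32
                          // indices, so arbitrary, even cyclic, graphs need no
                          // lifetimes, Rc, or unsafe pointer chasing.
                          #[derive(Default)]
                          struct Graph {
                              nodes: Vec<Node>,
                          }

                          struct Node {
                              label: &'static str,
                              successors: Vec<u32>, // indices into `nodes`
                          }

                          impl Graph {
                              fn add_node(&mut self, label: &'static str) -> u32 {
                                  self.nodes.push(Node { label, successors: Vec::new() });
                                  (self.nodes.len() - 1) as u32
                              }

                              fn add_edge(&mut self, from: u32, to: u32) {
                                  self.nodes[from as usize].successors.push(to);
                              }
                          }
                          ```

                          The indices double as compact, copyable handles, which also plays well with the regular array-traversal access patterns mentioned upthread.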

                          1. 2

                            And if you don’t care about the C++ clothing, you can just use ML. The Rust compiler used to be fast when it was written in OCaml. ;)

                            1. 1

                              … and it had way fewer features and checks …

                              1. 1

                                Regarding the C++ clothing and quality of implementation :-) Windows support and parallelism, for example, are two major benefits of Rust in comparison to OCaml, and they are mostly unrelated to the language per se.

                            2. 2

                              To be honest, anything but JS would be an improvement for this task.

                              I’m okay with Rust because it’s already used in Deno and the community has considerable inertia.

                              Your points seem reasonable. I would love to see, in general, web-related compilers/transpilers written in more efficient stuff than JS.

                              1. 1

                                Working on a compiler in Rust. It doesn’t seem to be an issue.