I love this project!
I see it as extremely ambitious & modest at the same time.
Ambitious in that re-implementing a language as widely used and as big as Ruby is a huge undertaking.
And modest because from what I understand, the project is done purely for the fun of it & not trying to change the PL world.
It's wonderful to follow along with the video updates & see how a language like Ruby would be implemented.
Ruby is an underrated language in the PL community & it's always nice to see people giving it more attention.
Thanks so much for the kind words! It’s definitely just for fun. It turns out that enough people are confused by the just-for-fun concept that I made a website to explain it: https://justforfunnoreally.dev/
Awesome – just linked to this from my website.
Andreas Kling, author of SerenityOS, also made a video about contributing a performance enhancement to the Natalie project with some really novel (to me) tools and techniques: https://www.youtube.com/watch?v=b4PZgvPYkP4
Hello Lobste.rs friends! Thanks for giving my little for-fun project a look.
So this takes Ruby code and transpiles it to C++? Is the goal just a faster Ruby? At a high level, then, is it solving a similar problem to Crystal?
Crystal is a bit different. They have static types with a Ruby-like syntax. Natalie is aiming to be more compatible with “real” Ruby, at least as much as possible. (Certain things just aren’t possible due to Natalie being an AOT compiler, e.g. runtime eval and runtime require/load of other files.)
Interesting. What GC does the generated C++ code use? How is the performance of the generated code? Why do you use autotools?
It looks like it uses Ruby Rake, not Autotools, fwiw
Autotools is mentioned in the README on GitHub:
Probably to build onigmo, a dependency
Yep, we just need autotools for Onigmo. But someday I’d love to learn how to implement my own regex engine!
I keep wanting to implement https://swtch.com/~rsc/regexp/regexp2.html
It looks wonderfully simple and easy
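For anyone curious what that article describes: a regex is compiled into a tiny instruction set (`char`, `match`, `jmp`, `split`) and run by a small VM. Here is a minimal C++ sketch of the non-backtracking, Thompson-style simulation, with the program for `a+b+` compiled by hand. It assumes a lot for simplicity (no parser, no captures, the match must cover the whole input), the names `Inst`, `thompson_match` and `a_plus_b_plus` are made up for illustration, and it is not how Onigmo or Natalie do it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Op { Char, Match, Jmp, Split };

struct Inst {
    Op op;
    char c = 0;  // byte to compare, for Char
    int x = 0;   // jump target for Jmp, first branch for Split
    int y = 0;   // second branch for Split
};

// Add the thread at `pc` to `list`, following Jmp/Split immediately so the
// list only ever holds Char and Match instructions. `on_list` deduplicates.
static void add_thread(const std::vector<Inst>& prog, std::vector<int>& list,
                       std::vector<bool>& on_list, int pc) {
    assert(pc >= 0 && static_cast<std::size_t>(pc) < prog.size());
    if (on_list[pc]) return;
    on_list[pc] = true;
    const Inst& inst = prog[pc];
    if (inst.op == Op::Jmp) {
        add_thread(prog, list, on_list, inst.x);
    } else if (inst.op == Op::Split) {
        add_thread(prog, list, on_list, inst.x);
        add_thread(prog, list, on_list, inst.y);
    } else {
        list.push_back(pc);
    }
}

// Runs all threads in lockstep, one input character at a time, so the cost is
// O(program size * input length) with no backtracking.
bool thompson_match(const std::vector<Inst>& prog, const std::string& input) {
    std::vector<int> clist, nlist;
    std::vector<bool> on_list(prog.size(), false);
    add_thread(prog, clist, on_list, 0);
    for (char ch : input) {
        nlist.clear();
        std::fill(on_list.begin(), on_list.end(), false);
        for (int pc : clist) {
            if (prog[pc].op == Op::Char && prog[pc].c == ch)
                add_thread(prog, nlist, on_list, pc + 1);
        }
        std::swap(clist, nlist);
        if (clist.empty()) return false;  // every thread died
    }
    for (int pc : clist)
        if (prog[pc].op == Op::Match) return true;
    return false;
}

// Hand-compiled program for a+b+
std::vector<Inst> a_plus_b_plus() {
    return {
        {Op::Char, 'a'},        // 0
        {Op::Split, 0, 0, 2},   // 1: loop back to 0 or continue to 2
        {Op::Char, 'b'},        // 2
        {Op::Split, 0, 2, 4},   // 3: loop back to 2 or continue to 4
        {Op::Match},            // 4
    };
}
```

The `split` instruction is what makes this a simulation of an NFA: both branches are tried "at once" by keeping them on the same thread list, instead of trying one and backtracking into the other.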
Also, it looks like mark/sweep with a free list.
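For readers who haven't seen the pattern, here is a minimal C++ sketch of mark/sweep over a fixed pool with an intrusive free list. It is a toy under stated assumptions (fixed capacity, each object holds at most two references, roots are passed in explicitly), `ToyHeap` and `Cell` are hypothetical names, and it is not Natalie's actual allocator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// A toy heap object: bookkeeping bits plus up to two outgoing references.
struct Cell {
    bool in_use = false;
    bool marked = false;
    Cell* next_free = nullptr;      // free-list link, only meaningful when !in_use
    std::array<Cell*, 2> refs{};    // outgoing pointers (may be nullptr)
};

class ToyHeap {
public:
    explicit ToyHeap(std::size_t capacity) : cells_(capacity) {
        for (auto& c : cells_) push_free(&c);  // every cell starts out free
    }

    // Pop a cell off the free list; returns nullptr when the pool is exhausted.
    Cell* allocate() {
        if (!free_) return nullptr;
        Cell* c = free_;
        free_ = c->next_free;
        c->in_use = true;
        c->next_free = nullptr;
        c->refs = {};
        return c;
    }

    // Mark everything reachable from `roots`, then sweep unmarked cells back
    // onto the free list. Returns how many cells were freed.
    std::size_t collect(const std::vector<Cell*>& roots) {
        std::vector<Cell*> work(roots.begin(), roots.end());
        while (!work.empty()) {
            Cell* c = work.back();
            work.pop_back();
            if (!c || c->marked) continue;
            assert(c->in_use);
            c->marked = true;
            for (Cell* r : c->refs) work.push_back(r);
        }
        std::size_t freed = 0;
        for (auto& c : cells_) {
            if (c.in_use && !c.marked) {
                c.in_use = false;
                push_free(&c);
                ++freed;
            }
            c.marked = false;  // reset for the next cycle
        }
        return freed;
    }

    std::size_t free_count() const {
        std::size_t n = 0;
        for (Cell* c = free_; c; c = c->next_free) ++n;
        return n;
    }

private:
    void push_free(Cell* c) { c->next_free = free_; free_ = c; }

    std::vector<Cell> cells_;  // never resized, so Cell* pointers stay valid
    Cell* free_ = nullptr;
};
```

Threading the free list through the dead cells themselves means allocation is just a pointer pop, and the sweep phase needs no extra memory.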
Interesting, in that case I wonder how they track gc roots on the stack; will have a look at the source code.
Hm, yeah, this is a little unusual: they scan the stack, with a range determined by a dummy var and setjmp():
And then they query the allocator for known pointers:
Interesting. Looks like a conservative GC for potential roots on the stack which point to memory managed by the allocator.
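A minimal C++ sketch of the "scan a range of words and ask the allocator" idea, run over a simulated stack so the result is deterministic. In a real collector, as described above, the range would run from a dummy local in the collecting function to a recorded stack bottom, with setjmp() first spilling registers into a jmp_buf that lives inside that range. `AllocationRegistry` and `scan_conservatively` are hypothetical names, and this sketch assumes only exact start addresses count (real scanners often accept interior pointers too).

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical allocator bookkeeping: the set of addresses it has handed out.
class AllocationRegistry {
public:
    void record(const void* p) { known_.insert(reinterpret_cast<std::uintptr_t>(p)); }
    bool is_known(std::uintptr_t word) const { return known_.count(word) != 0; }

private:
    std::set<std::uintptr_t> known_;
};

// Conservatively scan [begin, end) one word at a time. Each *slot* is read,
// but the value stored in it is never dereferenced; it is only compared
// against known allocations. An integer that happens to equal an allocation's
// address is kept as a root too: false positives, but no false negatives.
std::vector<std::uintptr_t> scan_conservatively(const std::uintptr_t* begin,
                                                const std::uintptr_t* end,
                                                const AllocationRegistry& registry) {
    assert(begin <= end);
    std::vector<std::uintptr_t> roots;
    for (const std::uintptr_t* slot = begin; slot < end; ++slot) {
        std::uintptr_t word = *slot;  // read the slot, not what it points to
        if (registry.is_known(word)) roots.push_back(word);
    }
    return roots;
}
```

Usage: record a few objects, build a fake stack holding small integers plus pointers to some of them, and only the recorded addresses that appear on the fake stack come back as roots. Everything else would be swept.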
Could someone explain why this is valid? https://github.com/natalie-lang/natalie/blob/d0d3acf1179618dbea3cb8027d41a7d3fb9a49e2/src/gc.cpp#L39
I would expect the stack to contain a lot of values which can’t be treated as pointers, especially if you traverse a partially-FFI stack. How can they just deref every single intptr_t-aligned value?
That’s what “conservative” means !!! It’s a big hack :)
“Conservative” as in “we have false positives for pointers (integers/strings/etc. that look like pointers) but no false negatives”. False positives means we leak memory; false negatives mean we free memory that’s used (in tracing GC, which this is.)
I prefer to call it imprecise! “conservative” makes it sound like a good thing!
This is how Boehm GC works. I believe there were some experimental evaluations decades ago that argued that conservative GC is OK (and it is in almost all cases).
And yes the main motivation for it is that it relieves you of the precise rooting problem! I pointed out this useful SpiderMonkey article in my blog posts on GC:
This is a tough problem which I underestimated before implementing Oil! A surprising benefit of the Python to C++ translation is that it helps with this problem – we can do the rooting automatically in most cases, because of the code generation.
With plain C/C++, you have to manually tell the GC about your heap-allocated objects, or do imprecise scanning.
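To make "manually tell the GC" concrete, here is a minimal C++ sketch of precise rooting with an RAII handle that registers each local reference on a root list the collector could walk. It is loosely inspired by the idea behind SpiderMonkey's Rooted<T> but is not its API; `Object`, `Rooted`, `root_list` and `live_root_count` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Object { int value = 0; };

// Hypothetical global root list that a precise collector would walk.
std::vector<Object**>& root_list() {
    static std::vector<Object**> roots;
    return roots;
}

// Registers the address of its pointer slot on construction and unregisters
// it on destruction, so the collector can find (and, if it moves objects,
// update) every live local reference. Forgetting to wrap a local in Rooted is
// exactly the class of bug that precise rooting makes you guard against.
class Rooted {
public:
    explicit Rooted(Object* p) : ptr_(p) { root_list().push_back(&ptr_); }
    ~Rooted() {
        // Roots are created and destroyed in LIFO order, like C++ scopes.
        assert(!root_list().empty() && root_list().back() == &ptr_);
        root_list().pop_back();
    }
    Rooted(const Rooted&) = delete;
    Rooted& operator=(const Rooted&) = delete;
    Object* get() const { return ptr_; }

private:
    Object* ptr_;
};

std::size_t live_root_count() { return root_list().size(); }
```

Storing `&ptr_` rather than the pointer value is the design choice that lets a moving collector rewrite the local in place.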
With reference counting, you have to manually jiggle ref counts everywhere (like CPython).
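A minimal C++ sketch of what "jiggling ref counts" means in practice, in the spirit of CPython's incref/decref discipline. `RcObject`, `incref`, `decref`, `Holder` and the `g_live_objects` counter are hypothetical names for illustration, not CPython's API.

```cpp
#include <cassert>
#include <cstddef>

int g_live_objects = 0;  // counts objects not yet freed, for demonstration

struct RcObject {
    std::size_t refcnt = 1;  // the creator owns the first reference
    RcObject() { ++g_live_objects; }
    ~RcObject() { --g_live_objects; }
};

RcObject* rc_new() { return new RcObject(); }

void incref(RcObject* o) { ++o->refcnt; }

// Drop one reference; free the object when the count reaches zero.
void decref(RcObject* o) {
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) delete o;
}

// Every place that stores a pointer must incref, and every place that drops
// one must decref; missing either leaks or frees too early.
struct Holder {
    RcObject* item = nullptr;
    void set(RcObject* o) {
        if (o) incref(o);         // take the new reference first,
        if (item) decref(item);   // then release the old one (safe if o == item)
        item = o;
    }
};
```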
One of the points of PyPy is that you don’t have to “hard-code” this decision everywhere in the interpreter, and go through big/painful migrations like SpiderMonkey did when they wanted to change their strategy. Instead, the interpreter is “parameterized” by this decision at code gen time.
So Oil also has that ability, although it’s much simpler than PyPy. Memory management isn’t littered all over the codebase.
Tbh, I know about the conservative GC, just got confused about that line specifically. I thought it derefs the value on the stack rather than the stack pointer - it was late already :-)
But that’s a good GC summary, thank you!
That’s pretty wild!
Curious what you find out :P