I love these posts man. Very deep dives into lots of interesting topics. Still a consistent story line that’s easy to follow.
Thanks schneems! I had a lot of fun writing this one. Something I haven’t really dug into before.
They’re even fascinating for ppl like me who don’t know much about Ruby at all.
So merry ChristMatz 2022, we got 3.2 where YJIT is stable but off by default. In 2023, with 3.3, YJIT is built in but paused by default, so you can enable it from code rather than having to compile it in and pass a flag to the executable. No relevant change this year.
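For reference, the in-code enable looks roughly like this (a sketch, assuming Ruby 3.3+):

    # Flip YJIT on from Ruby code instead of at launch (Ruby 3.3+).
    RubyVM::YJIT.enable if defined?(RubyVM::YJIT) && !RubyVM::YJIT.enabled?

    # The older style still works too:
    #   ruby --yjit app.rb
    # or
    #   RUBY_YJIT_ENABLE=1 ruby app.rb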
Anyone got a sense of when it’ll be on by default, or when the flag will be removed altogether? Why might someone avoid YJIT, and why can’t we give ’em an LTS version?
I dunno, seeing all that feature flagging in the low level implementation puts a squirm in my spine. I don’t think it’s like wrong, I just wouldn’t want to be maintaining it. OTOH, that’s all focused and prolly low churn stuff, so I suppose it ain’t hurting no one.
I believe another reason it is still off by default is more people-related: ruby-core is mostly Japanese developers working in C, while YJIT is maintained by Shopify employees and written in Rust.
Ruby has many of the same problems as Linux – some groups want to move to Rust for safety and speed but the existing maintainers often don’t know Rust at all.
Ruby has improved quite a lot. I love Ruby and used it extensively in the late 2000s. MRI was a disaster. Slow and had memory leak issues in long-running processes. Right now, it’s much faster and more stable. Night and day difference. Leaving Ruby aside, it never ceases to amaze me how good and performant the JVM is. The results in this benchmark are quite stunning.
yea, it’s a great question, and something I thought of mentioning in the article (then forgot).
I think the primary reason it’s still off by default is memory overhead. Any JIT will usually add a not-insignificant amount of memory usage on top of the runtime.
That said, YJIT is really configurable about the amount of memory it will consume. By default I think it’s 48 megs per process max? I know GitHub uses it but tunes it down a bit, to more like 16 megs. So possibly in the future it will be on, but with a lower default max.
Would be curious to hear from the YJIT team on their thoughts on that!
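For reference, the knob I mean is the executable-memory cap, which you can lower at launch (value in MiB; rough sketch, defaults vary by Ruby version):

    # Roughly the GitHub-style tuning mentioned above:
    #   ruby --yjit --yjit-exec-mem-size=16 app.rb
    #
    # And you can peek at what YJIT reports at runtime:
    p RubyVM::YJIT.runtime_stats if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?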
48MiB is how much executable memory it will use, but it also has to keep track of some metadata, which usually accounts for 2–3x the executable memory. So you’ll easily end up with a 150–200MB overhead with the default settings.
3.4 will have a much more ergonomic memory setting: https://github.com/ruby/ruby/pull/11810
And yes, you are right, the main blocker for enabling it by default really is the extra memory usage. The YJIT team never formally proposed it, but from informal discussions with other Ruby committers it’s clear that memory would be the blocker.
Ah right, thanks for the clarification byroot. Not the first time I’ve thought through that incorrectly - glad to have a clearer setting. Thanks!
I think the main reason not to flip it on by default in 3.3+ was that it could break container / memory-constrained environments if folks upgrade blindly and don’t monitor and increase memory limits appropriately. It can also increase startup time for interactive use cases, such as CLIs.
I dunno if that was really the right call, but it seems the more conservative approach still holds: I haven’t heard of any change in the default YJIT settings for 3.3+.
I think Integer#success is a typo (I don’t see ri docs for it). It should be Integer#succ. But good post, I enjoyed it!
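For anyone skimming, succ is just “next integer”:

    1.succ     # => 2
    (-1).succ  # => 0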
Whoops! Thanks for the typo find! Fixed.
It’s interesting. I would think that you’d want everything written in Ruby (which would allow for JIT optimizations), and then hand pick things to optimize in C-Ruby (for interpretation), but they’re doing the exact opposite! :D
If they were starting from scratch, but with YJIT, I’m sure they would. They’re undoing earlier manual optimizations in favor of the new thing that’ll do it automagically.
In the past (10+ years ago) the primary path to improving Ruby MRI perf was to optimize all the hot code paths in C. Or you could go with alternative Ruby runtimes like Rubinius or JRuby or one of the many others, but MRI has always been the primary runtime. This pre-dated any sort of production-ready JIT development for MRI.
So now I think there are a lot of perf-sensitive code paths where you have to carefully unwind the C (or essentially feature-flag it, as the post shows) in ruby-core to let YJIT do its magic.
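A loose user-level analogy (not ruby-core code, just a sketch of the idea): a loop written entirely in Ruby is one contiguous thing YJIT can compile, while hopping into a C-implemented builtin every iteration leaves the JIT a bunch of opaque boundaries it can’t see through:

    require "benchmark"

    N = 2_000_000

    # Everything in Ruby: YJIT can compile the whole loop.
    def pure_ruby_sum(n)
      total = 0
      i = 0
      while i < n
        total += i
        i += 1
      end
      total
    end

    # Same result, but bouncing between the C-implemented inject and a Ruby
    # block on every step.
    def via_builtin(n)
      (0...n).inject(0) { |acc, i| acc + i }
    end

    Benchmark.bm(12) do |x|
      x.report("pure ruby")   { pure_ruby_sum(N) }
      x.report("via builtin") { via_builtin(N) }
    end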
IIRC Ruby 3.3 does not have YJIT on by default but 3.4 will. With that change they can favor refactors that help the JIT, versus today where they need to balance performance in both the on and off execution modes.
No, YJIT still isn’t on by default in 3.4.
Perhaps there could be some analysis why the C implementation isn’t/cannot be as efficient as YJIT generated code.
I don’t know anything specific about the Ruby interpreter/YJIT, but a semi-educated guess would be that the C implementation has to handle every possible case: what if the object has this method overridden, or throws an exception (which has very specific control flow in Ruby, and you can jump back to the source, or use it for some dynamic behavior, e.g. execute code based on the method name), possibly calling back and forth between the C and Ruby worlds multiple times.
Meanwhile, a JIT compiler can simply ignore these cases in the happy path, and do an expensive, but rare de-optimization when it realizes it was wrong.
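A contrived sketch of that assumption/de-opt dance (illustrative only, please don’t do this for real):

    def add_many(n)
      sum = 0
      n.times { |i| sum += i }
      sum
    end

    add_many(1_000_000)   # gets hot; the JIT can assume Integer#+ is the plain builtin

    class Integer
      alias_method :__orig_plus, :+
      def +(other)          # redefining a core operator invalidates that assumption
        __orig_plus(other)
      end
    end

    add_many(1_000_000)   # still correct, now via the slower de-optimized path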
Also, the FFI overhead is non-negligible: what the C path optimizes is not a whole function rewritten in C, but Ruby code that repeatedly calls into C for very, very short segments, each call with overhead. There is a fundamental limit here, similar to Amdahl’s law, on how much speedup we can get this way.
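Back-of-envelope version of that limit (numbers made up): if only a fraction p of the time sits in segments the C rewrite can speed up, the overall gain is capped no matter how fast those segments get:

    p_frac  = 0.4    # assumption: 40% of runtime is in the C-accelerated segments
    s       = 10.0   # assumption: each of those segments becomes 10x faster
    speedup = 1.0 / ((1 - p_frac) + p_frac / s)
    puts speedup     # => ~1.56x overall, even though the C parts got 10x faster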
This boils down to the age-old interpreter-vs-native-code discussion. Nothing special about Ruby here: the interpreter decides, bytecode by bytecode, which piece of C code to call to do something, versus native code just doing that something directly.
Interpreters can be fast. Not having to interpret is obviously faster.