My primary project at work uses number-of-units-processed-per-syscall as a primary metric. We’re sitting at around 100:1 right now, which is about a 5x improvement over a few months ago. The number of system calls absolutely matters. Before we reached a ratio of about 20:1, we were losing work units to buffer exhaustion; now we’re scaling considerably higher on the same hardware.
Gack! That isn’t a micro-optimization.
That is “Completely replace your core threading model with something radically different.”
Hi! Thanks for reading my post.
Flipping that configure switch does not radically change the threading model. It simply changes the method by which pre-emption occurs. Let me explain further.
--enable-pthread: This enables the use of a single OS-level thread which runs a simple loop. It essentially “pings” the Ruby VM at a set rate. The Ruby VM uses this ping to know when it is time to switch between threads.
--disable-pthread: This disables that single OS-level thread. Instead, a timer signal (SIGVTALRM) sends periodic pings to the Ruby VM at a set rate. The Ruby VM uses this ping to know when it is time to switch between threads.
The actual threading implementation itself is unchanged; only the source of the timer changes. Twenty million sigprocmask calls is quite a price to pay for time tracking.
I thought I’d take a closer look… so I pulled the latest Ruby 2.4 source, ran ./configure --help, and found…
--enable-pthread obsolete, and ignored
So I pulled the oldest Ruby on the Ruby website: 2.0.0-p648.
So I took a closer look at your post. Oh yes. You’re using 1.8.7
OK, so you’re probably right.
But I will say that this particular micro-optimization doesn’t matter much anymore.
I think you will find moving to 2.4 will speed things up way way more than tweaking that option. (Plus give you a lot of very nifty new stuff).
Anyhoo, pull 1.8.7-p374 and, yup, what you’re talking about is in eval.c.
Hmm. The get/set context seems to be in the longjmp/setjmp implementation, which is used in the Ruby thread context switching.
Hmm. The fact that shaving a setjmp/longjmp lets them get by without sigprocmask makes me worried. It sort of implies that there is a (possibly narrow) window in which signal delivery at thread context switch time might do unfortunate things.
Threading is very hard. Threading in the presence of signal delivery is extremely hard to get perfectly right.
I became a little obsessed with this post, since commercial reasons forced me, against my will, to write a swapcontext-based scheduler… And yes, sigprocmask is a hot spot, and no, I can’t get rid of it without creating a horrid little window of nastiness.
It’s one of these things where the devil is in the fine fine fine details.
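For the curious, the shape of such a scheduler is easy to sketch. This is a toy with one coroutine and a made-up `coro_body`, not my production code; the point is that on glibc each swapcontext saves and restores the signal mask, so you pay a sigprocmask (rt_sigprocmask) per switch whether you want it or not.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, coro_ctx;
static int steps = 0;

/* Toy task: do a unit of work, then yield back to the scheduler. */
static void coro_body(void) {
    for (int i = 0; i < 3; i++) {
        steps++;
        swapcontext(&coro_ctx, &main_ctx); /* yield */
    }
}

static int run(void) {
    static char stack[64 * 1024];
    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp = stack;
    coro_ctx.uc_stack.ss_size = sizeof stack;
    coro_ctx.uc_link = &main_ctx;
    makecontext(&coro_ctx, coro_body, 0);

    for (int i = 0; i < 3; i++) {
        /* Each swapcontext saves and restores the signal mask:
           a hidden sigprocmask on every switch, the hot spot in
           question. */
        swapcontext(&main_ctx, &coro_ctx);
    }
    return steps;
}
```

In a real scheduler with thousands of switches per second, that per-switch sigprocmask is exactly the cost you end up staring at in the profiler.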
I’m glad you’re looking at this a bit more. I really wanted to go back much earlier and look at the development of this feature to see when, if ever, a regression was introduced. Presumably there was some ruby version without this feature? Or a glibc that didn’t call sigprocmask? And then somebody made a change “it’s only one syscall” that destroyed performance, but nobody seemed to notice until much later.
I would look for signal handlers doing the wrong thing on thread context switch.
But I sort of wouldn’t bother trying to debug something that is way beyond end of life.
It’s a pretty heavy cost to pay for programs that don’t even use threads, however.
Every time somebody says, “Let’s add threading to something. If you don’t need it it won’t cost you.”…. I sigh.
It always costs.
In hidden complexity, in hidden bugs, even (in the case of Java) hidden threads whirring about doing stuff you didn’t explicitly ask for.
Every time somebody says, “Let’s add threading to something. If you don’t need it it won’t cost you.”
People say that? Where do they buy their drugs, because I definitely don’t want anything from their supplier?
I have no idea where they’re buying it…
Follow-up to: https://blog.packagecloud.io/eng/2017/02/21/set-environment-variable-save-thousands-of-system-calls/