Note: this seems to be an old presentation Microsoft just recently uploaded.
The majority of the ‘common coupling’ they identified was through the ‘current’ global variable. This seems really dubious when you consider that ‘current’ is a pointer to the currently running task. It’s a place to store per-task information that is not typically shared between multiple tasks — essentially thread-local storage in the kernel. So it’s not really very global. The kernel switches out the ‘current’ pointer whenever a context switch occurs. It’s hard to say whether this is really bad for maintainability or not.
They also avoided using tools other than LXR and their eyes; some Perl scripts were apparently involved to collate the numbers. A good question is whether, after reviewing 3 MLOC by hand, you can trust those numbers. Also, the presenter/PI claimed the analysis was not possible with existing tools. I find that hard to believe, even in 2002 (the year of the study). Maybe a lot has happened since then, but it seems like symbolic execution would be the way to go for this kind of study.
One of the big things that was missed, but also planned for further study, is aliasing. Say that instead of writing

current->foo = 1;

you use a helper function/macro like

set_foo(current, 1);

Their study would not catch set_foo() as contributing to common coupling.
One question I have is whether the BSDs use a style where most global variables are referenced via helper functions. If so, it seems reasonable that they’d over-count the number of common couplings in Linux and under-count them in BSDs.
The whole thing smelled pretty bogus to me. He explicitly acknowledges the vagueness of “maintainability”, and then proceeds to quantify it with a single, apparently arbitrarily selected metric. Empirically, some ~14 years have gone by, with numerous corresponding MLOC added to Linux, and it continues to be… maintained (further developed, even!), despite all prophecies of impending doom.
(And given the context, his use of the terms “kernel” and “module” were really needlessly confusing – to put it kindly.)
I kinda want my hour back.
Although the presentation was interesting (and the video is worth watching), note that this is from March 2005.
So why is Microsoft Research publishing this now, 11 years later? Perhaps to prove that the thesis is wrong…
Somebody found it in the archives? I mean, you find an old unpublished video. Is the alternative to just never publish it?
It would be really interesting to see an update of this paper/presentation, to see how things have changed since then.
Thanks MS, this was really interesting.
How useful is a statistical analysis like this of such a large C codebase?