1. 6

So I have come across a very simple refactoring, a very simple change to the way in which we do refactoring.

It is deceptively simple and obvious, no deep magic, no deep thought required, just a small subtle shift in what we do every day.

Prefer parameter passing to function invocation.

And I’m starting to think this is the most important refactoring of them all.

Quite a bold claim, so needs some bold justifications.

I have become obsessed with this question….

“If Coupling is Bad, what is a Constructive approach to producing Decoupled Designs?”

So I shepherd a multimegaline body of C code for a living.

It has 17 real time years / 300 man years of stepwise refinement, D.R.Y. refactoring and feature addition…

Mostly Good Stuff, but more and more coupling.

And I have finally worked out why, and what to do about it.

Imagine you have that most common of C language code smells.

The Number 1, Top of the List, Technical Debt Item in every C program everywhere.

A too loooong function.

result_t looooongFunction( input_t input)
{
   result_t result;
.
.
.
.
// carries on for several screenfulls.
.
.
.
.
   return result;
}

So what do we do with it?

Obvious! What we always do!

Extract Method!

intermediate_t guts( input_t input, other_t otherStuff)
{
    intermediate_t intermediateValue;
.
.
.
.
.
   return intermediateValue;
}

result_t looooongFunction( input_t input)
{
   result_t result;
.
.
.
.
   intermediate_t intermediateValue = guts( input, otherStuff);
.
// carries on for fewer screenfulls.
.
.
.
.
   return result;
}

No problem! We have all done this, many times.

In fact we may even do it several times on the same function.

intermediate_t guts( input_t input, other_t otherStuff)
{
    intermediate_t intermediateValue;
.
.
.
.
.
   return intermediateValue;
}

other_t other( input_t input)
{
   other_t result;
.
.
.
   return result;
}

result_t looooongFunction( input_t input)
{
   result_t result;
.
. // A little stuff
.
   other_t otherStuff = other( input);
.
. // A bit more
.
   intermediate_t intermediateValue = guts( input, otherStuff);
.
// carries on for fewer screenfulls.
.
.
.
.
   return result;
}

Now it’s looking really clean! Grreat! We starting to see which way is up…

looooongFunction
  |->other
  |->guts

In fact, we may tackle the insides of the stuff we extract….

inner_t deepArcana( other_t otherStuff)
{
   inner_t inner;
.
.
.
.
   return inner;
}

intermediate_t guts( input_t input, other_t otherStuff)
{
    intermediate_t intermediateValue;
.
.
    inner_t inner = deepArcana( otherStuff);
.
.
   return intermediateValue;
}

other_t other( input_t input)
{
   other_t result;
.
.
.
   return result;
}

result_t looooongFunction( input_t input)
{
   result_t result;
.
. // A little stuff
.
   other_t otherStuff = other( input);
.
. // A bit more
.
   intermediate_t intermediateValue = guts( input, otherStuff);
.
// carries on for fewer screenfulls.
.
.
.
.
   return result;
}

All this is perfectly standard, this is the organic way programs grow from “hello world” to major programs.

Except eventually we notice the call graph from looooongFunction has explosive fan out.

looooongFunction
   |--other
   |--guts
       |--deepArcana

After many years, after many iterations of doing this…

We notice looooongFunction has exploded into a nested tree of functions many level deep.

Often with much higher than a factor of two fan out at every level….

(I have changed the function names format to save me trying invent names)

looooongFunction
|->A
|  |->A1
|  |  |->A11
|  |  |  |->A111
|  |  |  |->A112
|  |  |->A12
|  |  |  |->A121
|  |  |  |->A122
|  |->A2
|  |  |->A21
|  |  |  |->A211
|  |  |  |->A212
|  |  |  |  |->A2121
|  |  |->A22
|  |  |  |->A221
|  |  |  |->A222
|->B
|  |->B1
|  |  |->B11
|  |  |  |->B111
|  |  |  |->B112
|  |  |->B12
|  |  |  |->B121
|  |  |  |->B122
|  |->B2
|  |  |->B21
|  |  |  |->B211
|  |  |  |->B212
|  |  |->B22
|  |  |  |->B221
|  |  |  |->B222

Now usually there is a bunch of D.R.Y. refactoring going on so instead of a tree, you (hopefully) have an Acyclic digraph with very high rank…

looooongFunction
|->A
|  |->A1
|  |  |->A11
|  |  |  |->A111
|  |  |  |->A112
|  |  |->A12
|  |  |  |->reuseA111
|  |  |  |->A122
|  |->A2
|  |  |->reuse A11
|  |  |->A22
|  |  |  |->some nidjit has totally crossed up the layering, reuses B1
|  |  |  |->A222
|->B
|  |->B1
|  |  |->B11
|  |  |  |->B111
|  |  |  |->B112
|  |  |->B12
|  |  |  |->reuse A121
|  |  |  |->access the database. Just try test anything in the B or A tree without a full database! Hah!
|  |->B2
|  |  |->B21
|  |  |  |->B211
|  |  |  |->B212
|  |  |->reuse B12
|  |  |  |->B221
|  |  |  |->B222

We find looooongFunction is actually getting to be hell to test. (Well, I bet it wasn’t under a unit test to start with…)

Can we do better?

Try this idea.

ie. Instead of a callgraph that looks like…

 looooongFunction
    |--other
    |--guts
        |--deepArcana
Can we make the call graph shallower?
looooongFunction
   |--other
   |--deepArcana
   |--guts

Instead of using function calls as a mechanism, use parameter passing.

Instead of guts() increasing the depth of the call graph, we can pass the value of deepArcana() it needs as a parameter.

inner_t deepArcana( other_t otherStuff)
{
   inner_t inner;
.
.
.
.
   return inner;
}

intermediate_t guts( input_t input, other_t otherStuff, inner_t inner)
{
    intermediate_t intermediateValue;
.
.
.
.
   return intermediateValue;
}

other_t other( input_t input)
{
   other_t result;
.
.
.
   return result;
}

result_t looooongFunction( input_t input)
{
   result_t result;
.
. // A little stuff
.
   other_t otherStuff = other( input);
.
. // A bit more
.
   intermediate_t intermediateValue = guts( input, otherStuff, deepArcana( otherStuff));
.
.
// carries on for fewer screenfulls.
.
.
.
.
   return result;
}

By making our call graph shallower, we make our code easier to test, easier to reuse, less coupled.

With luck we can play this trick several times.

With luck and thought we can stop storing intermediate value, and use parameters.

This trick is particularly powerful if any of the fan out calls is very hard to test.

For example any service.

A service is a function whose effect, and result, depends on things other than its parameters or state. eg. Timers and I/O.

If anywhere deep in the bowels of a deep call graph you invoke a service, it is hard to test, hard to reuse that code.

Repeated use of this refactoring is going to have two obvious side effects.

Side Effects Shallow but Explosive Fan Out call graphs.

Some where near the top of the call graph you are going to have explosive fan out… to very shallow small call trees, with the functions invoked “Yoda” style.

Yoda Style - Invoke the function you are going to need, pass result as a parameter instead of invoking the function where you need the result.

looooongFunction
|->A121
|->access the database. 
|->B
|  |->B111
|  |->B112
|  |->B11
|  |->B12
|  |->B1
|  |->B21
|  |->B211
|  |->B212
|  |->B221
|  |->B222
|  |->B2
|->A
|  |->A111
|  |->A112
|  |->A11
|  |->A122
|  |->A12
|  |->A1
|  |->A22
|  |->A222
|  |->A2

And the other obvious side effect is the number of parameters are going to explode.

Coping with too many parameters. From long parameter lists to objects.

Suddenly having way way too many parameters is actually A Good Thing!

Why? Because you will start to notice common bundles of parameters, sharing common precondition expressions.

Guess what! They are objects.

And the common precondition is the class invariant for the object.

result1_t fn1( b_t b, c_t c, d_t d)
{
   precondition_assert( aPredicate( b, c) && anotherPredicate( d));
   .
   . // Do stuff with b, c, and d
   .
   return result1;
}

result2_t fn2( e_t e, b_t b, c_t c)
{
   precondition_assert( aPredicate( b, c) && someOtherPredicate( e));
   .
   . // Do stuff with b, c, and e
   .
   return result2;
}

void fn(void)
{
   b_t b;
   c_t c;
   d_t d;
   e_t e;

   .
   . // Establish precondition for b and c and d
   .
   result1_t result1 = fn1( b, c, d);
   .
   . // Establish precondition for e
   .
   result2_t result2 = fn2( e, b, c);
   .
   . // Do stuff
   ,

}

So bundle them up into an object and pass that instead.

class BC {
   b_t b;
   a_t a;

public:

   BC(...) { 
       .
       . // Establish precondition aPredicate(b,c) AKA the class invariant for b and c
       b = ...;
       c = ...;
   }

   // Provide only getters, since we now have ensure the precondition aka the class invariant _always_ holds.
   b_t b() const { return b;}
   c_t c() const { return c;}
}

result1_t fn1( BC_t bc, d_t d)
{
   // precondition for bc guaranteed by class invariant and static type checking!
   precondition_assert( anotherPredicate( d)); 
   .
   . // Do stuff with bc.b(), bc.c(), and d
   .
   return result1;
}

result2_t fn2( e_t e, BC_t bc)
{
   // precondition for bc guaranteed by class invariant and static type checking!
   precondition_assert( someOtherPredicate( e));
   .
   . // Do stuff with b, c, and e
   .
   return result2;
}

void fn(void)
{
   BC_t bc(...)
   d_t d;
   e_t e;

   .
   result1_t result1 = fn1( bc, d);
   .
   . // Establish precondition for e
   .
   result2_t result2 = fn2( e, bc);
   .
   . // Do stuff
   ,

}

Yup. We do fan out one more level….

We changed from….

fn
 |->fn1
 |->fn2

to…

fn
 |->BC::BC()
 |->fn1
 |   |->BC::b
 |   |->BC::c
 |->fn2
     |->BC::b
     |->BC::c

but the leaf nodes are just const getters. Pure functions. Simple.

So we add an proviso… it’s OK to fan out to getters.

  1.  

  2. 4

    This works really well to untangle some resource or lock acquisition nests.

    1. 2

      To give the standard ‘moderation’ response: clearly there is a balance. Any complex program will have a high rank in the call graph. And that’s probably good. I think, in general, this criteria is not actually that good though.

      In C, I think the criteria for how to construct a program should be focused around ownership of resources. In general, it is preferred to keep resource creation and destruction in the same level. Using this criteria, one naturally tends towards flatter call graphs in their C code.