1. 38

  2. 41

    Somebody on reddit points out:

    This study is looking at Java, and I think that skews the result. As the author said, Java has a bunch of getters and setters that are one line each. Since the metric here is “average length of function in class”, any object with a lot of properties (and thus getters and setters) would have a low average, even if it is filled with other complicated, long functions.

    So another conclusion might be: “objects with a lot of internal state are error prone”.

    Maybe the study would be better if getters/setters were excluded; just don’t count them at all.

    1. 13

      lies, damned lies, and statistics

      1. 4


        There are some zero-length functions. These are likely methods of abstract classes, meant to be overridden.

        It’s possible that the conclusion should actually be that virtual functions / abstract classes are a large source of defects:

        1. Having multiple implementations (some or all of which may not be visible to the person calling the method) makes it much harder / impossible to reason about what the method might actually do.
        2. Scattering the implementations across multiple different code bases makes it harder to see whether a) The code calling the functions is correct. b) The code implementing the functions is correct.
        3. Often the way a virtual function is called imposes certain requirements on its implementation which aren’t all adequately expressed by the type system, and may not be well documented.
        4. In many cases the actual implementation is delegated to a third party, who may not fully understand the intention of the person who created the abstract class.

        The first graph could do with some error bars or a measure of standard deviation; or possibly it’s just not a very useful graph, since we can’t tell the difference between a class that has a few long methods plus a bunch of zero-length virtual methods or single-line getters and setters, and a class that has lots of medium-length methods.

        1. 4

          Java? Reminds me of Uncle Bob’s advice on this…

          1. 2

            I don’t at all think this only applies to Java because of the side effect of exposing statefulness.

            A couple of years ago I wrote a blog post along similar lines in Ruby (though not scientifically), calling out what I called The Local Variable Aversion Anti-Pattern. When I posted it to reddit, my favorite response was this monstrosity, which was a PERFECT example of exactly what I was talking about. I’m not sure there’s a lot of statefulness there, but I have major problems understanding that code versus writing it as a 5-line method (3 lines, depending on your interpretation):

            class ClientsController < ApplicationController
              def show
                client = Client.find(params[:id])
                client_serializer = ClientSerializer.new(client)
                render json: client_serializer.serialize(params[:fields])
              end
            end

            … I don’t even want to inline the reddit version, but it’s like 27 lines of nonsense.

            1. 2

              At the risk of making a fool of myself.

              The graphs suggest that the defect density is high for very short functions: 3 lines on average max. But when you get to 4 lines, the defect density is low in both graphs.

              For example, in a class with 10 properties with a getter and setter for each, you have 20 one-liners. Then you can reach 4 (rounding up from 3.5) by adding just one real 55-line function, or two real functions making 57 lines, or three real functions making 61 lines, etc.

              For a class of N properties with 2N getter/setter lines-of-code and M real functions with Lm lines-of-code, in order to reach the ‘4-bucket’, this needs to hold for Lm:

              (Lm + 2N) / (M + 2N) >= 3.5
              Lm + 2N              >= 3.5(M + 2N) = 3.5M + 7N
              Lm                   >=               3.5M + 5N
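              The arithmetic can be sanity-checked with a short script (a sketch; the class shape and the function name are hypothetical):

              ```ruby
              # N properties => 2N one-line getters/setters; M real functions
              # totalling lm lines of code in all.
              def average_method_length(n_props, m_real, lm)
                (lm + 2 * n_props).to_f / (m_real + 2 * n_props)
              end

              puts average_method_length(10, 1, 55).round(2)  # one 55-line function: 3.57
              puts average_method_length(10, 2, 57)           # two functions, 57 lines: 3.5
              ```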

              So, in order to escape the ‘high-defect-density+very-short-functions’ category a class needs to have at least

              • 3.5 lines of code per real function on average
              • for every property, an additional 5 lines of code in the set of real functions

              Obviously getters/setters skew the results, but does their presence push your average ‘many-properties class’ into the ‘high-defect-density+very-short-functions’ category? I don’t know, but wasn’t pre-2010 Java a pretty verbose language (meaning the average real function also has a high line count)?

            2. 24

              There’s nothing I hate more than having to break up clean logic and pollute the class/module/whatever namespace by breaking a single function into many functions that are useless on their own, just to please some stupid code complexity / function size metric.

              1. 13

                It depends on how big the “clean logic” is. I worked on a code base that had a method with a cyclomatic complexity of 712. I wanted to break it up into a few smaller functions when I needed to modify it, but was told not to touch it. We later had a bug in it, so I had to re-read it. I found a suspect area in a 50-line chunk of quantitative logic that I then extracted into its own method and tested exhaustively (e.g. for all reasonable combinations of inputs) and immediately found a hole in the logic.

                Now, I’m not saying that it was wrong to initially write the thing in one big procedural blob, but there’s a point at which you should probably think about refactoring.

                The issue I have is with rules like “methods should never be more than four lines long,” which cause me to immediately write off anything else the author has to say.

                1. 1

                  Any time you can extract a pure computation which isn’t simple enough to be “obviously correct” (defining “binaryMultiply(a, b int)” is probably a mistake, but “GCD(a, b int) int” probably isn’t) then I think that’s a good thing.

                  Many functions (and applications for that matter) run along the lines of:

                  • unpack/read/unmarshall/deserialise data on which to operate
                  • check constraints, return errors
                  • operate on data
                  • pack/write/marshall/serialise data

                  the “read” and “write” might be database, http request, etc.

                  In this case, the “operate on data” is a prime candidate for extraction - it should require no external dependencies (those are the things from which you’ll read and write) and can often be expressed as a pure function.
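                  A minimal sketch of that shape (the handler, names, and JSON payload here are all invented), with the “operate on data” step extracted as a pure function:

                  ```ruby
                  require "json"

                  # Pure computation: no I/O and no hidden state, so it can be
                  # tested exhaustively on its own.
                  def apply_discount(total, percent)
                    raise ArgumentError, "bad percent" unless (0..100).cover?(percent)  # check constraints
                    total - total * percent / 100.0
                  end

                  def handle(request_body)
                    order = JSON.parse(request_body)                          # unpack
                    total = apply_discount(order["total"], order["percent"])  # operate
                    JSON.generate("total" => total)                           # pack
                  end

                  puts handle('{"total": 200, "percent": 10}')  # {"total":180.0}
                  ```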

                2. 1

                  If you do this well, you end up with one function that looks like clean pseudocode of the algorithm and a bunch of functions marked always-inline that each describe one step of the algorithm and are well named. You can review the pseudocode and the small functions independently: if the small functions don’t do exactly what you’d expect from the name, they’re wrong; if the pseudocode doesn’t describe the algorithm you’d expect, it’s wrong.

                  You can also do this very badly and just pull everything out into chunks that correspond to sequences of operations but not to logical structure in the algorithm and end up with spaghetti code that’s even harder to review than the original.
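                  A toy Ruby version of the good case (all names invented; Ruby has no always-inline hint, so this only illustrates the shape):

                  ```ruby
                  # Top level reads like pseudocode of the algorithm.
                  def normalize(scores)
                    kept = reject_outliers(scores)
                    shifted = shift_to_zero(kept)
                    scale_to_unit(shifted)
                  end

                  # Each step is small and named for exactly what it does.
                  def reject_outliers(scores)
                    scores.select { |s| s.between?(0, 100) }
                  end

                  def shift_to_zero(scores)
                    min = scores.min
                    scores.map { |s| s - min }
                  end

                  def scale_to_unit(scores)
                    max = scores.max
                    max.zero? ? scores : scores.map { |s| s.to_f / max }
                  end

                  p normalize([10, 50, 90, 400])  # [0.0, 0.5, 1.0]
                  ```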

                  1. 1

                    The problem with metrics isn’t that they are wrong, but that we don’t get buy-in from all team members on them.

                    Rewriting bad code as worse code to “please the metric” is of course a terrible thing. But the goal of the metric (and what we need to work on to get more buy-in from teams using metrics) is to inform you of where your code has gone wrong, and to invite you to rethink how it should be. Taking a big procedural blob and breaking it into 6 “step” procedures and a main one that calls them each in order is of course a disaster. But banishing such big procedural blobs from the codebase will make maintenance nicer in the long run.

                  2. 8

                    I feel there’s a fine line between breaking up functions for no reason and adding clarity. If you have two lines of code that are 2 x 10 chained function calls, wrapping them in a function named extractXfromY implicitly explains what the code does. Taking 5 lines of simple variable assignments, arithmetic, database calls or similar and pushing them into a separate class does little more than lower the line count of your functions.

                    Just as line count is a terrible indicator of productivity and progress, it is also a terrible indicator of code quality. A 100-line function with a clear purpose and a few inline comments sprinkled into it might be vastly preferable to having to jump back and forth between the “main” function and smaller bites of code just to figure out what is happening. As with all things, it depends.
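                    For instance (a contrived sketch; the data shape and the method name are invented), naming a chain can document intent without real decomposition:

                    ```ruby
                    # The name says what the chain extracts; the body is still
                    # one expression, just hidden behind a descriptive label.
                    def extract_emails_from_users(users)
                      users.select { |u| u[:active] }.map { |u| u[:email] }.compact.uniq.sort
                    end

                    users = [
                      { active: true,  email: "b@example.com" },
                      { active: true,  email: "a@example.com" },
                      { active: false, email: "c@example.com" },
                      { active: true,  email: nil },
                    ]
                    p extract_emails_from_users(users)  # ["a@example.com", "b@example.com"]
                    ```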

                    1. 1

                      I agree that SLOC is not a perfect metric, and of course in different languages a reasonable SLOC will look different. In a language with tonnes of boilerplate, a 4-line function might be unachievable. In a higher-level language, a 100-line function/procedure/class/thing would be madness. Depends on context.

                    2. 6

                      I always start by writing my logic as a single monolithic blob. Then I split for one of these reasons:

                      1. I notice that a part of the blob can be extracted and given a more generic type and a name that perfectly captures the behavior such that you’d never have to check the implementation unless you’re debugging it
                      2. I need to reuse a part of it somewhere else and that means those two places in the codebase need to be “in sync”, so if one place starts behaving differently, the other needs to change as well, and extracting functions and reusing them enforces that structurally.

                      More often than not, (2) happens because I’m writing tests for a piece of code, so I need to reuse the bits I’m testing in a way that is in sync with the original. But it can also happen due to new features that are somehow similar and related.

                      1. 5

                        In Java most methods are short, and 60% of Java code appears in methods containing five or fewer lines. See figure 7.22 in http://knosof.co.uk/ESEUR/ESEUR-draft.pdf

                        It is not surprising that most reported faults occur in short Java methods.

                        1. 3

                          This is something I’d like to see studied more, but at the same time, it makes a sort of intuitive sense (which may be the most dangerous type of making sense).

                          Humans are much better equipped to spot patterns in longer code than they are to learn what is effectively a new vocabulary, which is what a ton of small, named functions amounts to.

                          1. 3

                            I think that what matters is not necessarily the length of the function, but that it keeps itself at a single level of abstraction. As with many “rules” in programming, it doesn’t apply all the time, and judgement should be applied.

                            1. 3

                              Yes, it is, especially when the individual pieces are “useless” on their own. Even though I am sure there are more, I can only think of one pattern that often causes very small functions: getters generating keys based on properties.

                              As with many programming truths and best practices, it’s always good to know why things are good or bad instead of blindly following some rule. After all, the goal is to write easy-to-understand, maintainable (and sometimes performant) code, and the point of tools that flag code smells is to spot those smells early. The goal is to remove the smells, not merely to make the tool report better results. In most cases the tools are right, but when fixing things please make sure you don’t just work around the warning but actually fix the issue.

                              I do quite a bit of consulting work and I have seen quite a few examples of working around warnings, or of “technically not breaking the rule” while actually making things a lot harder to follow. For example, breaking up huge functions by splitting them into functionname1, functionname2, functionname3 and passing the whole context from one to the next, just to land under what was considered the maximum allowed function body length. Another example I have seen a few times is code documentation of the form “ThingDoer does things”, which adds nothing beyond the name; in at least one case the behaviour was non-obvious, and the misunderstanding led to bugs.

                              I’m not arguing against these tools, but I think it’s a good approach to treat them like other programming tools: one doesn’t blindly accept an autocomplete suggestion, but checks whether it is actually what one wants to do. The same check should be applied to any potential code smell.

                              1. 2

                                I’d like to see a similar study examine fundamentally different languages. My intuition is that these bugs are more often caused by an unintended mixing of state and effects than by messing up the business logic. In a classic imperative/impure language, it can be hard to decompose code into many routines unless the programmer is particularly careful with how and when they change the state of the world; otherwise, it’s all too easy to make such a program feel more decomposed than it really is. In languages like Java, it’s easy for innocently-named methods to hide things like state/data dependencies, DB transactions, and network effects from the caller. Back when I worked on a large Python codebase, this was probably the #1 source of bugs: calling code which transitively fired unintended effects (or more precisely, effects which were intended for one transitive code path, but not for another).
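                                A tiny Ruby sketch of that kind of hiding (names invented): a method that reads like a pure accessor but mutates state on first call:

                                ```ruby
                                # Hypothetical example: `token` looks like a plain getter, but it
                                # lazily creates and caches a value, i.e. a hidden state change
                                # (in real code this might also be a network call or DB read).
                                class Session
                                  def initialize
                                    @token = nil
                                  end

                                  def token
                                    @token ||= generate_token
                                  end

                                  private

                                  def generate_token
                                    rand(36**8).to_s(36)  # stand-in for an expensive side effect
                                  end
                                end

                                s = Session.new
                                p s.token == s.token  # true: the second call silently reuses cached state
                                ```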

                                1. 1

                                  Fantastic. Thank you for posting and taking this approach. So many “best practices” turn into sacred cows with very little scientific reasoning behind them, mostly just industry consensus. This is powerful because it enables me to have a discussion with my team.