1. 32
  1.  

  2. 24

    Has anyone else found that making many small helper methods tends to make reading the code much more difficult though? This is something that I spend a lot of time musing about, and writing half-assed tools about (I have a bad scheme clone of imatix’s gsl).

    When I see a function that is just a list of helper functions, I have to zip around look up the implementation for all of them! Especially when they’re not re-used across the code base, you’re learning a whole set of semantics that are unnecessary for your problem domain.

    If the top level function was fully documented, then I’d only need to do that when I change something. Which is an improvement, but then it means that changing something is the slowest part, which is the opposite of maintainable code.

    A function name is only better than a line comment in that it delineates the start and the end of the code it’s referring to, unless it’s reused, but YAGNI if it’s not reused already. You could mark sections within comments like pre-function programming…

    Ah shit. I think I’m reverse engineering literate programming again.

    1. 10

      Has anyone else found that making many small helper methods tends to make reading the code much more difficult though?

      Depends on the length and complexity of the function.

      I think it’s much cleaner to have a function calling a list of well named functions each with a specific purpose than many lines of if else switch for loops etc.

      As far as maintenance, I suspect you know what you’re trying to modify and can go directly to which function is requiring the modification. Why would you need to zip around looking at all of them?

        1. 8

          I agree. My understanding is that this is partly based on people applying generally good rules in a way too draconian way. So while it’s great that code quality is something that people look at nowadays and build tools to improve it I think they arrived at a point where they might be either counter productive or too limited.

          Yeah, it’s good to have short functions that aren’t too complex, but I think everyone has seen code that became clearer when increasing it’s size.

          My favorite example is cyclomatic complexity, which is a great concept to quantify complexity and give a clue on when it’s worth to separate, however it sometimes can lead to the complexity becoming even more of a nuisance when that complexity is split up, to make the tool checking your code quality happy.

          I don’t know what the solution is and have seen different bad outcomes of bringing code quality (tools) into companies. Next to justifying insanity by throwing around some quote or some tool, I’ve seen the other side as well, which is starting to use such a tool and instead of taking its input simply configuring it so that the code just passes checks. Other than that I have seen extremely smart ways, just as silly patterns to work around code quality checkers, leading in the code being hard to read, rather than more easy.

          Something where these tools do very badly is that they are often not made with the programming language in mind and for obvious reasons not with the programming task at hand. So the outcome is usually that writing something like parsers or converters in a readable and efficient way makes most of these tool think code is horribly complex.

          I think general statements are hard to make. Sometimes I prefer to split up stuff into helpers, sometimes it is way clearer to kind of write “paragraphs” of code with some comment above them, but only when it cannot be reused as it is code that requires what happens before and after and the context/the amount of variables on the stack is both big and specific, but even here I wouldn’t consider that a general rule.

          A while ago I read an article about testing. If I remember correctly, it was linked here, but I can’t find it right now. It was about how not every test needs to be written or every single line of code needs to be tested and that despite testing being extremely important and something people should do. It still is nonsense to not test code, just like it being nonsense to have big programs in only a few functions.

          In other words: In my opinion, part of what make a programmer a good/experienced programmer is knowing what the right thing to do is.

          1. 3

            My favorite example is cyclomatic complexity

            So I decided to play with oclint and wrote essentially one function in three different styles. I also snuck a typo in one of them to see if the tool would give a shit (and it wouldn’t). oclint complains about fun2, which is arguably the most straightforward and a reasonably idiomatic way to write this kind of code in C. It does not complain about complexity in the other two functions…

            It also nags about all the variable names because they are too short. How useless. Here’s the code:

            int
            fun1(int op, int l, int r)
            {
                return
                  op == 1 ? l + r
                : op == 2 ? l - r
                : op == 3 ? l * r
                : op == 4 ? l / r
                : op == 4 ? l % r
                : op == 6 ? l << r
                : op == 7 ? l >> r
                : op == 8 ? l == r
                : op == 9 ? l != r
                : 0;
            }
            
            int /* line 17 */
            fun2(int op, int l, int r)
            {
                if (op == 1) return l + r;
                if (op == 2) return l - r;
                if (op == 3) return l * r;
                if (op == 4) return l / r;
                if (op == 5) return l % r;
                if (op == 6) return l << r;
                if (op == 7) return l >> r;
                if (op == 8) return l == r;
                return 0;
            }
            
            int
            fun3(int op, int l, int r)
            {
                switch (op) {
                case 1: return l + r;
                case 2: return l - r;
                case 3: return l * r;
                case 4: return l / r;
                case 5: return l % r;
                case 6: return l << r;
                case 7: return l >> r;
                case 8: return l == r;
                default: return 0;
                }
            }
            

            Summary: TotalFiles=1 FilesWithViolations=1 P1=0 P2=1 P3=9

            /tmp/ocl/oclint-0.12/bin/cat.c:17:1: high npath complexity [size|P2] NPath Compexity Number 256 exceeds limit of 200

            1. 1

              Now suppose our poor programmer had started with fun2 and thought that there was no way to cheat the tool and reduce complexity without breaking the function down into smaller pieces… tada, more code, a layer of indirection, and an extra pair of checks to maintain behavior with the old numbering scheme! And now the tool will happily accept it (well it still nags about names that are shorter than three letters).

              Only now it’s much harder for somebody reading the code to determine which number matches which operation. Of course one could introduce yet another layer of indirection by replacing the numbers with identifiers, and then you also need to make sure the table & defines stay in sync. If that’s too much work, we can introduce yet more complexity by having another tool generate the table at build time.. anything goes as long as the tool is happy!

              int op_add(int l, int r) { return l + r; }
              int op_sub(int l, int r) { return l - r; }
              int op_mul(int l, int r) { return l * r; }
              int op_div(int l, int r) { return l / r; }
              int op_mod(int l, int r) { return l % r; }
              int op_shl(int l, int r) { return l << r; }
              int op_shr(int l, int r) { return l >> r; }
              int op_eq(int l, int r) { return l == r; }
              
              int (* const optab[])(int, int) = {
                  NULL,
                  op_add,
                  op_sub,
                  op_mul,
                  op_div,
                  op_mod,
                  op_shl,
                  op_shr,
                  op_eq,
              };
              
              int
              fun4(int op, int l, int r)
              {
                  if (op < 1 || op >= sizeof optab / sizeof *optab )
                          return 0;
                  return optab[op](l, r);
              }
              
              1. 1

                I think the tool’s right. With fun2 the flow is much less obvious; in fun1 and fun3 it’s immediately clear that only one path will be taken.

                1. 2

                  What in fun2 would even suggest that more than one path may be taken?

                  And then why does that not apply to fun3? Both are reliant on returns, nicely lined up.

                  fun1 is the least idiomatic of them all, and while the way I lay it out should give a strong clue about its behavior, it might still take a moment longer for the average C programmer to determine that it does indeed do what it should (the sneaky typo aside).

                  1. 2

                    What in fun2 would even suggest that more than one path may be taken?

                    There are a bunch of different if statements. A priori, any combination of them is possible.

                    And then why does that not apply to fun3? Both are reliant on returns, nicely lined up.

                    It does to a certain extent, but less so, because a switch is an immediate indication that only be one path will be taken - fall-through is so rarely desirable that the reader is inclined to assume it won’t happen.

                    fun1 is the least idiomatic of them all

                    Interesting, because I found it the clearest / most readable.

            2. 4

              I like style C with blocks

              var x int
              { // minor function 1
                  ...code..
                  x = whatever
              }
              

              my coworker hates it though

              1. 4

                I like to use the same pattern. It’s nice because you don’t have to go around looking for tiny helper functions that only get called once while still properly scoping the lifetime of your variables so readers can forget about them when leaving a scope.

            3. 8

              I find the “many small helpers” style unbearable when any of the helpers mutate shared state.

              Having many short pure functions can still be confusing, but it’s vastly more manageable for me.

              1. 7

                Has anyone else found that making many small helper methods tends to make reading the code much more difficult though?

                Yes, I’ve been saying this for a while.

                Show me your half-assed tools and I’ll show you mine.

                1. 3

                  NOW THAT’S A WHOLE ASS, not half. That’s awesome!

                  Notes on yours: Reminds me a lot of the tanglec program that the axiom guy daly came up with. I like your notation a lot, but I’m not sure I like the before/after notation… I’ll take a deeper look at this and try to think more clearly about it. The testing part is fascinating, totally different ideas than I would ever think of.

                  The iMatix guys take it a different way and say that your literate programs have “models” within them that you can pull out and give some meaning to, by encoding them in well documented XML, which is then expanded into code.

                  Mine doesn’t support looping correctly yet in the template language, because I don’t want to re-invent monads. but here it is. It doesn’t cover what you’re doing quite, as I’m trying to go with a higher order concept of “models” that the gsl folks came up with.

                  I have some work locally that’s about joining these two ideas. Scheme as the meta-language, SXML/XML “models” to contain program models, and tangle/weave (with TeX or HTML, not sure yet) for source style. It’s extremely half-baked as there are a lot of missing parts.

                  1. 4

                    Thanks! Yes, I’m aware of Tim Daly’s work and we’ve exchanged comments in various places on the internet. I tend to think his emphasis on typography – shared with classic Literate Programming – is a tarpit. But it’s working for him, and I don’t really have complete confidence in my approach yet ^_^

                    I hadn’t come across the iMatix/gsl approach, do you have a pointer to learn more about it? That’ll help me gain context on your tool. Thanks.

                    1. 2

                      I tend to think his emphasis on typography – shared with classic Literate Programming – is a tarpit.

                      Having tried it, I am inclined to agree – at least for myself, who is susceptible to typography twiddling. On the other hand, are you familiar with Docco, the “quick-and-dirty, hundred-line-long, literate-programming-style documentation generator”? I have had good experiences with it.

                      Docco (and its clones) take as input ordinary source files containing markdown-formatted comments, and render them to two-column HTML files with comments to the left, and the code blocks that follow them to the right. That encourages writing the comments as a narrative, but limits the temptation to twiddle with typography – it basically lets you keep a nicely read- & skimmable version of the source code at hand at all times.

                      All of this is far more modest than what you are working on, of course. I’ll be keeping an eye on it!

                      1. 2

                        I’m a little familiar with Docco. As you said, it’s a little better because you don’t tweak the presentation endlessly. However, I think it also throws the baby out with the bathwater by dropping the flexibility to reorder parts of a program to present them in the most comprehensible way. There’s a few “quasi-literate” systems like that out there. They’re certainly better than nothing! But I think they have trouble scaling beyond short single-file examples.

                      2. 2

                        The best place to see a good example is: https://imatix-legacy.github.io/mop/index.html They are the ones that wrote ZMQ.

                        The basic idea is simple. Put same data in a serialized hierarchical format, that is also very easy for humans to read (for gsl, this is XML). It’s best if it has a CDATA like feature, for unquoted/raw data.

                        Next, there is what is called the “template”. This is another format of data, that you apply to your xml files to generate output.

                        Finally, there is what is called the “script”. This lets you do things like include xml files as subtrees of others, run them through multiple templates, and execute pretty basic code.

                        Through these pieces, you can generate what gsl calls the model. With a given set of scripts and templates, you can hypothetically reduce the “uniqueness” of your implementation to a series of XML files. zproto and other tools use this to actually generate entire server & client implementations with XML descriptions of your format. You can see a description of how zproto kind of works here : http://hintjens.com/blog:75

                        So it’s the happy combination of a templating language, some xml / hierarchy manipulation tooling, and a simple scripting language. I then took it into whacko land by saying all of that just sounds like lisp macros with some custom reader settings, so why not implement each of these three pieces as scheme, and add on top my desire for literate programming as part of the “generation” work it does. It’s not very far off the ground yet, but I do think there is something in here. We’ll see.

                        1. 3

                          Most interesting. I see the connection you’re drawing with Lisp macros. It sounds like GSL is basically lisp macros that get a tree of data rather than just individual variables. So like lots of nested pattern matching on the s-expression you pass into the macro, so that you can name different parts of it to access them more conveniently. Does this seem right? Now I wish I could read more about it, and that the original series had more posts :)

                    2. 3

                      That looks like it ends up equivalent to AOP, which is a nightmare to debug when done badly (and any widely-used programming tool ends up being used badly). I read of someone doing something like this in Ruby that was allegedly highly maintainable - writing the simple version of a function, thoroughly testing it, then locking it down and using monkeypatching to handle cross-cutting concerns rather than complicating the original function - but it’s so contrary to my experience that I struggle to believe it.

                      What I prefer is something like the “final tagless” style - just enough inversion of control, write your function in terms of commands and you can invoke the same function in either a simple or a complex context, provided it offers those commands.

                      1. 3

                        AOP is very different from MOP in terms of actual day-to-day usage and thought pattern, but I think you’re entirely right that “how will it be used badly?” is a great pattern to try and compare approaches.

                        One part of AOP that makes things more difficult is there is no way to see the steps/expansions visually anywhere other than with tooling. With MOP (largely, I’m only talking about gsl), it’s something explicit in your templates, and you can run iterations of gsl to see what is generated very easily. Hypothetically you could use gsl to implement AOP-style programming, but most of the culture around MOP is to focus on the “model” that is repeated in your business logic (think state machines) and making them easier to see. AOP is much more about taking large cross cutting things like logging or error handling and rolling them up into these aspects.

                        As I’m thinking more about this though, I’m seeing more of your point. It’s a slippery slope to just munging crap everywhere.

                      2. 3

                        This is extremely interesting! I am a big fan of literate programming because it allows me to create a narrative for understanding my program, but LP is not without its problems: you often don’t get access to your regular development tools (syntax highlighting and auto-indenting for your language, integration with IDE-like utils such as Racer for Rust) and this makes development more difficult and painful.

                        Your idea allows the creation of a similar narrative—order the fragments in an order that tells the story to the reader—but also gives the programmer access to his usual coding environment, save perhaps for a quick fix to allows for the directive at the top.

                      3. 4

                        A function name is better in another way: it doesn’t become outdated as easily as a comment.

                        Has anyone else found that making many small helper methods tends to make reading the code much more difficult though?

                        I have found it makes reading the code somewhat more difficult, on the first read through, iff I need to thoroughly understand or review all the code. Once I’ve done that, it makes reading the code easier, because I don’t need to parse every line to understand what it does: it tells me what it does and I know I can trust the name to be accurate.

                        If I only need to fix a bug and know I can trust the author, I know I don’t need to check all the implementations.

                        If you constantly need to check all the implementations, then it indeed only hurts to make many small well named methods. But I assert that means you are doing something wrong.

                        1. 2

                          look up the implementation? I’m confused why wouldn’t the implementation be in the function which contains the helper functions?

                          1. 2

                            Look at the link that mikejsavage posted above for the “helper” functions I’m thinking of.

                            Putting the functions inline inside a function is only useful in languages that support that style (which c++ does wonderfully now), but I’m thinking about a slightly higher point around documentation.

                            1. 1

                              Oh so our definitions of helper functions are entirely different things o__o. cool. I guess author did say helper methods, not helper functions.

                          2. 1

                            Nope. Small helpers make it easier to read, provided they’re well-named and well-typed. As @danielrheath says, if they mutate state they’re very hard to understand - but don’t do that, thread state through explicitly instead.

                          3. 10

                            I have read the article and agree with its examples.

                            An observation: when reading other people’s code, I have almost never thought “this comment is unneccessary and I wish it wasn’t there”. I have frequently thought “I wish this code had more comments.”

                            We should be careful, perhaps, with advice that biases programmers towards commenting less.

                            [Edited to remove stray text.]

                            1. 6

                              An observation: when reading other people’s code, I have almost never thought “this comment is unneccessary and I wish it wasn’t there”

                              I have. I’ve worked with large codebases that have every single line commented, and almost every comment was completely useless, repeating what the code did. Deleting most of them made the code far easier to follow.

                              It doesn’t help that some of the comments that didn’t describe the next line of code were just wrong.

                              1. 3

                                Yeah, even everyone’s favorite whipping boy “increment x” comment can be useful if you focus on the surrounding context.

                                1. 1

                                  Its a whipping boy when it ostensibly gets used like this:

                                  /* increment x by 1 */
                                  x = x + 1;
                                  

                                  Taking that as a stance that its never useful is a tautology, its not useful in the majority of the cases its used.

                                  Personally, I take the approach of documenting the overall what and why. And any exceptions and cases where I know someone will go: this is dumb should be done like X. When I’ve tried that and it doesn’t work i’ve written a half page treatise starting out like: You might ask yourself, why in the world do we do X? What were you smoking? and then launching into the explanation of why if you go down that road, you’re going to find it was a cul de sac and come back to the hack.

                                  All that to say, I think comments are an art form a bit like writing a book. Its easy to produce crap, its harder to be succinct and hit the sweet spot of competing concerns like: out of date comments, wrong comments, pointless comments, comments for the sake of comments, comments that make no sense, comments to appease crazy reviewers, etc.. and the thing they’re useful for imo, conveying things the code cannot.

                              2. 3

                                The one not mentioned here is the comment referencing the ticket number. That’s what the vcs is for, put your comment there.

                                1. 3

                                  That’s very succinct and useful.

                                  I’d like to add an emphasis about all of it depending very much on good judgment. Which only comes with time, so there can’t be any rule about when there are too many comments, or temp variables, or helper functions.

                                  1. 2

                                    Given that this is by someone at Google I wonder whether this is related to Go’s famous:

                                    Function should have comment or be unexported

                                    1. 3

                                      You can tell they aren’t go programmers because they recommend replacing

                                      int width = ...; // Width in pixels.
                                      

                                      with

                                      int widthInPixels = ...;
                                      

                                      while any competent go programmer would recommend replacing it with

                                      int w = ...;
                                      

                                      and perhaps even a zero-width space if go2 will allow it.

                                      (yes this is a joke)

                                      1. 1

                                        Famous it might be, but that error message is new to me, so thanks for posting it. I think it’s rather lovely that the compiler enforces that.

                                        1. 2

                                          I agree. It’s one thing I really like. Especially when it makes people reflect and describe it in a different way than the implementation. Code comments are also great to explain not what you do (the steps), but also why you do them and what the goal is and sometimes writing the documentation alone, which makes one think about a function is enough to create a clearer and “better” solution.