1. 36

Retitled to

Is legacy code hard to read or do you just not get it yet?

in response to feedback. Thanks!

  1. 33

    This POV is dangerous imo.

    Which philosophy would you rather your team adhere to?

    1. If it’s taking everyone else too much time to understand your code, you need to make it more clear.
    2. Everyone here is a professional. If you can’t read someone’s code, it’s your fault. Try harder. Spend more time.

    Which philosophy do you think will lead to a higher quality code base?

    I think most people would choose 1., and if you are inclined to choose 2. it’s because taking the time to be clear and communicate well through your code is hard or annoying for you.

    There are orders-of-magnitude difference in simplicity and readability in code bases, and I think pretending otherwise is bad code apologism.

    discover what problem it actually intends to solve, as opposed to the problem I thought it solved

    Why wasn’t that clear from the names and code structure? Why wasn’t that documented?

    figure out some complexities involved in solving that problem and stop underestimating it

    start understanding how the code addresses that complexity

    What if it addresses it poorly? What if what is currently addressed by 300 lines of opaque code could have been addressed in 40 lines of clear code? This is not hyperbole… such differences are common.

    start understanding how a lot of the seemingly unnecessary complexity deals with relevant use cases and edge cases

    Much more common, ime, is that after gaining a full understanding you realize a much better and simpler solution exists.

    understand that the structure of the code makes sense, solves the problem and cannot be improved in obvious ways

    This is incredibly rare.

    1. 10

      Pretending otherwise is bad code apologism.

      I don’t intend to pretend otherwise. I’m just focusing on my experience, in which quite a few codebases turned out to be reasonable, and in particular more reasonable than I initially thought.

      Why wasn’t that clear from the names and code structure? Why wasn’t that documented?

      It was. I just hadn’t understood yet. Like reading a book for the second time and discovering all those things you overlooked the first time.

      Example: some time ago I adapted the scripts of the Debian logcheck package for our purposes. My initial reaction to viewing the code was the usual: “this seems overcomplicated and hard to follow”. I had originally read the man page, but of course I only remembered what was relevant for our purposes. Reading the code and the documentation again, it became clear that it addresses many more use cases than just ours. All the use cases it addresses are clear from the documentation, names and code structure, but only once you realize what they are. Also, I had to get used to reading a shell script of that size again. Until I had gone back and forth a few times, I did not have a clear picture, which made the code look overcomplicated. It isn’t: it’s pretty nice and, once understood, was quite easy to adapt, requiring only precision surgery.

      What if it addresses it poorly? What if what is currently addressed by 300 lines of opaque code could have been addressed in 40 lines of clear code? This is not hyperbole… such differences are common.

      I’ve encountered those cases as well, but they haven’t been common for me. Perhaps I’m just lucky.

      Much more common, ime, is that after gaining a full understanding you realize a much better and simpler solution exists.

      Does the fact that a better and simpler solution exists mean that the original code was bad? I believe a lot of code is ‘reasonable’, even if it can be improved.

      Example: I’m currently revisiting an 8 year old codebase mostly written by myself, where I can reduce the code by quite a bit by using better (and fewer) abstractions and reusing libraries. However, an important part of why that is possible is that we have developed better abstractions, better libraries and a better understanding of the more important and less important aspects of the problem we are solving since then. The solution is structurally almost the same, but simpler. Even counting the library code it’s less code and more robust. I would definitely call it ‘better’. Was the original code bad? I don’t think so. It was suboptimal, but on a scale of 0-10 it wasn’t below a 6. The possible changes weren’t obvious.

      understand that the structure of the code makes sense, solves the problem and cannot be improved in obvious ways

      This is incredibly rare.

      Perhaps we have a different understanding of ‘obvious’. I’ve worked on codebases where, after understanding the code, thinking (and experimenting) for a couple of days resulted in important improvements (which then took several additional days of work to actually perform). I don’t count such improvements as “obvious”.

      (Aside: really bad code can make finding the improvements harder, but I’m talking about ‘reasonable’ code where significant improvements are nevertheless possible, but take quite a bit of time to figure out.)

      1. 4

        Again, your point as stated above is much more reasonable and I agree with a lot of it. A few points I’d still push back on…

        All the use cases it addresses are clear from the documentation, names and code structure, but only once you realize what they are.

        I actually think not having the code itself reflect the use-cases is a major problem. Even if you meant that there are exponentially many use-cases that can be fulfilled by combinations of cmd-line flags, etc., the kinds of use cases being addressed should exist in the code, if only in comments. Leaving it as implicit information that other devs must puzzle out for themselves is unacceptable. If, after puzzling it out, you realize the code is a very elegant implementation, the code should still be faulted for making you puzzle it out yourself.

        I would definitely call it ‘better’. Was the original code bad? I don’t think so. It was suboptimal, but on a scale of 0-10 it wasn’t below a 6.

        I see no contradiction in saying simultaneously:

        1. Writing the code this way was the correct decision at the time
        2. This code is bad now

        And 6 is pretty bad imo. Which doesn’t mean it must be fixed right now – that’s a decision with more inputs.

        On this point we’re sort of arguing the semantics of “bad,” but I will say that I think, in general, better results come from having high standards for yourself and others than from “accepting that things are messy and imperfect.” I’m not saying that latter POV is never appropriate, but it is a very slippery slope, and I think it will more often lead to shoddy, lazy work than to good, balanced work that’s not overly burdened with perfectionism – which I think is the implicit claim of your argument.

      2. 7

        I think everybody’s conflating a lot of things:

        1. What does this code do?
        2. What problem is it supposed to solve?
        3. What are all the intermediary pieces used to solve that?
        4. What does it do that it shouldn’t? What should it do that it doesn’t?
        5. How do I know any of this?

        The difficulty in assembling the answers to these questions, especially coming in cold, might best be called “cognitive load”.

        I think there are probably automated ways to measure cognitive load. There are also some obvious trade-offs. Suppose I have a bash script with 20 lines of code that solves a problem and is valuable. So what if it takes 2 hours to understand? Isn’t that better than a 30k-line codebase that solves the same problem, where you spend two weeks thrashing around figuring out what it does and how to fix it?

        Many coders would look at the bash example and balk. That’s a crazy amount of time to spend understanding just a few lines of code! But what they’re missing is that it’s not the line-of-code-count. That has nothing to do with anything. It’s the time it takes to understand everything you need to understand and provide value, no matter what the language or line count. Similarly, a huge codebase that reads like beautiful English might be a much worse situation. It could make you waste a lot of time reading and in the end give you the appearance of understanding these things without your actually understanding them.

        1. 1

          The title is also a bit… too simply put. Of course, when you “get” it, it’s simple. The whole problem is that some code is hard to “get”, and becomes a time sink.

        2. 15

          Digging into unfamiliar code bases is a very important skill that not enough people have mastered. Throwing it away and rewriting “seems” easier, but that’s just hubris. Typically, you’ll have to re-learn all the things that the people working on that code base learned the hard way.

          Of course, some existing code bases really are shit :) But better suspend that judgment until you know why the code is the way it is.

          1. 17

            Of course, some existing code bases really are shit :) But better suspend that judgment until you know why the code is the way it is.

            Chesterton’s Fence is a really good first principle for code archaeology. Try to remember that the people who built this Cyclopean tangle of non-Euclidean nightmares you’re neck deep in a) didn’t do it all at once b) were operating in a different context than you are in now and c) almost certainly weren’t setting out to screw the maintainers.

            The principle of charity is an underrated tool in a programmer’s toolbox.

            1. 7

              I will add as a corollary or addendum, that many people fail to re-examine their code to check if the prior conditions justifying odd constructs remain true. It’s wonderful that you discovered this neat workaround to allow compiling for 16 bit windows. But why do we still care, today?

              1. 5

                For every workaround or ‘patch’ I insist people document when it can be removed. Or at least ‘what you can do to quickly verify it is still needed’. Usually ‘when we upgrade to version x of some library’, “when there are no instances of ‘special case’ in any customer databases anymore” or ‘when we implement a cleaner solution for use case y (see issue Y in our issue tracker)’.
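                A minimal sketch of what such a note might look like (the version condition is a made-up example, not from this thread):

```python
import sys

def parse_size(text):
    """Parse a size like '4k' into bytes."""
    # WORKAROUND: str.removesuffix only exists on Python >= 3.9,
    # so on older interpreters we strip the unit manually.
    # REMOVE WHEN: we drop support for Python 3.8
    # (quick check: sys.version_info on our oldest supported target).
    if text.endswith("k"):
        if sys.version_info >= (3, 9):
            digits = text.removesuffix("k")
        else:
            digits = text[:-1]
        return int(digits) * 1024
    return int(text)

print(parse_size("4k"))  # prints 4096
```

                The point is not the code but the comment: a reader years later can verify the precondition instead of cargo-culting the workaround.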

              2. 3

                I have been using this without knowing its source for years, so thanks!

            2. 10

              Even though I don’t agree with the overgeneralization that the title might suggest, I think the main takeaway is

              wait until you actually understand it before concluding anything

              I have seen plenty of cases of new team members who jump to conclusions about some codebase without having really tried to understand it. Often the case is that the code is not organized as they would have done it; sometimes the domain itself has a level of complexity that is not so obvious; sometimes people claim that “implementing X should be way simpler than that” but actually that codebase has to handle X under the circumstances of Y and Z, and that’s why it’s more complex than you expected.

              These are all situations that people must learn to deal with. There are cases where all that is true but the code is still more complex than it should be, so it should really be heavily refactored or rewritten; but the first step is to make an effort to understand it — or else how do you guarantee that you understand the behavior that you’re trying to reimplement?

              1. 7

                I dispelled this myth for myself by going and reading some code. There’s lots available.

                I discovered a lot by doing this! The world doesn’t have a shortage of legacy code to go and try to read.

                1. 4

                  Good suggestion! Perhaps we should have a thread recommending good code bases to read and study.

                2. 5

                  I still think that code should be relatively easy to follow if you have the necessary domain knowledge. If it’s not (for example, because it’s not clear what problem the code solves), you should add a comment. I’ve seen a lot of hate for comments, and I don’t understand why. Who cares if it makes your code twice as long? I spend the majority of the time thinking anyway, typing takes like 5 extra seconds, and it helps me to “see the bigger picture”. If you don’t need the comments, fine, just skip them. In most editors they have a low-contrast color anyway, so it’s no mental burden to skip them (unlike extra code, where you have to decide for each line “does this matter?”).

                  1. 4

                    Comments go stale, the code doesn’t. There are exceptions of course. A well placed comment can be very useful on a particularly obtuse line; a line in a section that’s conserved, i.e. resists change due to its utter importance.


                    1. 1

                      Every fault-resistant system relies on multiple, redundant sources of information to detect faults.

                      The sentence above paraphrases a tweet that somebody wrote, some time in the last few months, in reply to either @hillelogram or @graydon_pub. I looked for the tweet, but could not find it again; I wish I could credit the tweeter. I believe the context was a discussion about type systems, which are equally unable to point out where your code is wrong, only where it is inconsistent.

                      Detecting faults in the code, and keeping code correct: both are easier if the intended behaviour is recorded outside the code as well as inside it. Then wrong code will manifest as a code-comment inconsistency, and so will a wrong comment. To create a hidden fault, you need to mess up both the comment and the code.
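                      A minimal sketch of that redundancy (a hypothetical function, not from the thread): the docstring and the assertion each restate the intended behaviour, so a bug in either the code or the stated intent surfaces as an inconsistency between the two rather than staying hidden.

```python
def clamp(x, lo, hi):
    """Return x limited to the inclusive range [lo, hi]."""
    # Redundant statement of intent: whatever the implementation
    # does, the result must never leave [lo, hi].
    result = min(max(x, lo), hi)
    assert lo <= result <= hi, "clamp violated its documented range"
    return result

print(clamp(15, 0, 10))  # prints 10
```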

                      Comments go stale, the code doesn’t.

                      There is only one, limited, sense in which code ‘does not go stale’: its behaviour will always be the actual behaviour. What we care about, however: does the actual behaviour match the correct behaviour? Here, comments and code become stale in the same way: when they cease to describe the correct behaviour.


                      D. Richard Hipp (of SQLite fame) made a convincing case that comments make you articulate the idea two ways, which can reveal errors in the thought process. – @laakeus [My addendum: it might be in this Wrocław 2009 presentation, p80-p94. SQLite, A Lesson In Low-Defect Software - or - A Journey From A Quick Hack To A High-Reliability Database Engine, and how you can use the same techniques to reduce the number of bugs in your own software projects –Sietsebb]

                      1. 1

                        About slide 84, if anyone’s interested. Dr. Hipp’s opinions do carry some weight, but I would point out that SQLite is a large project with relatively few, high-calibre contributors, whereas the typical commercial project has developers of varying levels of ability and engagement. And for a typical website, “correctness” isn’t really relevant beyond “marketing isn’t complaining about the site” (a banal example).

                        1. 1

                          Good point! Hipp would probably grant you that point, too: in an earlier slide he writes

                          End result of DO-178B/ED-12B….

                          • Software that has very few defects also…
                          • Expensive software
                          • Software that takes a long time to bring to market
                          • Boring software

                          The points ‘expensive’ and ‘slow to market’ are often more than small companies can afford.

                          As for correctness of websites, that does indeed mostly take place outside the realm of ‘verifiably correct implementation of algorithms on state+inputs’. I would add one exception, though: websites very often process personal information, even personally identifying information. The code we write has to be written such that we can guarantee it keeps PII secure; that is part of our professional duty.

                    2. 3

                      We agree that code should be relatively easy to follow. The question is: when do we think code is ‘easy to follow’? I believe that is almost never the case the first time you read it, but can often become the case after you’ve spent some time getting acquainted with it. So we shouldn’t judge code based on our initial impression, but on our verdict after we’ve spent some time getting acquainted with it. And we should be wary of the ghost of the initial impression.

                      1. 3

                        This is fair, and I’m more inclined to agree with the thesis as stated above than the way it was presented in the article.

                        Still, though, the “length of time it takes to get it” is a very important metric, and code bases can vary widely in that metric. I’ll grant you that “the experience of the dev reading it” is one input to how long it takes, but clear naming, good architecture, modularity, etc… are also major inputs.

                        So while you’re right that it’s always good to step back and ask if your own ignorance or inexperience with the design principles being used are what’s making it hard to read… that doesn’t mean you shouldn’t ever look for fault in the code itself. And often, if you are experienced, it’s crystal clear when the code was written by less experienced people (or rushed people, etc) who have simply made objective mistakes, or sub-optimal decisions. I mean, that is a real thing. The title of your piece seems to suggest it’s not.

                    3. 5

                      I wonder if some of the people taking issue with the article aren’t missing the forest for the trees.

                      I see this piece as an exhortation to treat every obstacle as an opportunity and to stop fueling the rampant negativity I see wearing away at many people in tech, causing them to think the grass is greener on the other side of the fence, take a new job, and only then realize that all they really gained is a short lived dose of novelty to hold off the tedium and cynicism that will soon overwhelm them again.

                      I don’t see it as an excuse to write bad code or lower the bar for readability and maintainability.

                      1. 4

                        Then there’s the legacy code that even the owners have disowned.

                        Reload is the GCC equivalent of Satan. […] What does reload do? Good question. The what is still understandable. Don’t ask about the how.

                        Face it: some code is just garbage, and there’s no easy rule to quantify it.

                        1. 4

                          {a subject} isn’t hard, you just don’t get it yet.

                          This is how knowledge acquisition works. If you get it, then it’s easy.

                          1. 3

                            Very true, but we’ve also learned a lot about how attitude and mindset can influence how quickly you “get it”.

                            Having the self confidence to take a step back, breathe, and say “This is just code. I will pick it apart and learn it until I understand it better” can be incredibly empowering.

                            1. 2

                              I have two different responses to this.

                              One is a bit off topic: I’d say some things are still hard after you ‘get it’. They require significant thought every time you think about them. That can be due to intrinsic or accidental complexity.

                              The other response is that the question I address is: if you don’t get it yet, do you blame yourself or the code? I have the impression programmers tend to blame the code too often and even retain a feeling that the code is to blame after they’ve realized and admitted they just didn’t understand it. I also regularly hear colleagues assert some code is too complex, even though after years they still haven’t come up with a better solution.

                              1. 1

                                Well… it’s not as black and white as that. There are existing solutions to compare against.

                                For example, I once had to integrate IBM QRadar into an application as a log-aggregation server. I got a PDF of 100+ pages. Compare that to Elasticsearch or whatever FOSS tool, where the same integration takes a single curl call.

                              2. 3

                                In my experience the majority of corporate code is shit. This week I witnessed some of that shit: for example, a reduction of SQL code from 200 lines to a single one. The original code was from a company specializing in DWH/BI. I have seen bad designs from MS, Oracle, IBM, HP etc. employees that were all working with my company (it’s government, hence the big players).

                                Not having automated tests is the rule; having them is an exception I almost never witness. The same goes for documentation. In general, automation of any kind is not there.

                                People hanging out on GitHub or here are not the norm. Speaking of which, almost nobody I have worked with (and in the previous few years I was exposed to thousands of engineers/devs) has a GitHub account (or if they do, it’s unused), and the majority of people, even devs, don’t know how to use git properly, or at all (sysadmins never do).

                                In my company alone, I regularly see commented-out code tryouts (more of them than actual code).

                                Most of the code is procedural, even in the OO paradigm, and there are ALWAYS multiple concerns in each procedure.

                                In such an environment, good code is not something you will find, and fixing some of it may be a nightmare. Ironically, sometimes it is not, and the code is very easy to follow, given that the programmers knew only a few concepts.

                                Now, I have also had exposure to a few senior programmers, and their code was extremely complicated, overgeneralized, optimized in advance and all in all hard to follow until you really devoted time to it. Why do we have 99 classes when 1 or 2 would suffice?

                                Like in any discipline, 80% of everything is junk. There is no way around it, only rationalizations.

                                I am in the “if you want something done properly, do it yourself” camp. Depending on the topic and the time involved, it takes less time to recreate something than to fix existing crap.

                                What I want to adopt next is Architectural Decision Records (ADRs). Docs and tickets are not enough. People need to know, after X years, why we chose such and such, and that while it may look moronic, the academic solution didn’t work.

                                1. 1

                                  How would you do ADRs? As comments at the top of code? Under docs/?

                                  1. 1

                                    I would keep them in a docs repository/path as part of the technical documentation.

                                    If you ask me how I would connect code abstractions to an ADR file: what I usually do is put a lot of README.md’s in folders all around, which can mention things like tickets, ADRs, etc. I just love README.md because you can put it anywhere, it’s as close to the source as possible without polluting it, and all major git frontends render it right away.
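                                    For reference, a minimal ADR file in such a docs path could look like this (the field names follow Michael Nygard’s widely copied template; the decision itself is a made-up example):

```
docs/adr/0001-use-message-queue.md:

# 1. Use a message queue for billing events

Status: accepted
Context: synchronous billing calls were timing out under load.
Decision: publish billing events to a queue; a worker consumes them.
Consequences: eventual consistency; simpler retry handling.
```

                                    A README.md next to the relevant code can then point at docs/adr/0001-use-message-queue.md and the associated tickets, as described above.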

                                2. 2

                                  I don’t really agree with this, with the exception of developers who aren’t familiar with the domain or the language/paradigm/libraries being used.

                                  In general, good code makes it easy to “get it” - that’s what makes it good code.

                                  1. 2

                                    Heh. If I had a nickel for every team I’ve seen rewrite a codebase they didn’t understand without realizing that the main reason the new one is easier to “get” was going through the process… I’d have like 25 cents.