1. 28
  1.  

  2. 22

    I think it comes down to this: if someone’s reading your code, they’re trying to fix a bug or otherwise trying to understand what it’s doing. Oddly, a single large file of spaghetti code, the antithesis of everything we as developers strive for, can often be easier to understand than a finely crafted object-oriented system. I find I would much rather trace through a single source file than sift through files and directories of the interfaces, abstract classes, and factories of the sort many architect nowadays. Maybe I have been in Java land for too long?

    1. 10

      This is exactly the sentiment behind schlub. :)

      Anyways, I think you hit the nail on the head: if I’m reading somebody’s code, I’m probably trying to fix something.

      Leaving all of the guts out semi-neatly arranged and with obvious toolmarks (say, copy and pasted blocks, little comments saying what is up if nonobvious, straightforward language constructs instead of clever library usage) makes life a lot easier.

      It’s kind of like working on old cars or industrial equipment: things are larger and messier, but they’re also built with humans in mind. A lot of code nowadays (looking at you, Haskell, Rust, and most of the trendy JS frontend stuff that’s in vogue) basically assumes you have a lot of tooling handy, and that you’d never deign to do something as simple as adding a quick patch; this is similar to how new cars are all built with the heavy expectation that either robots assemble them or that parts will be thrown out as a unit instead of being repaired in situ.

      1. 6

        You two must be incredibly skilled if you can wade through spaghetti code (at least the kind I have encountered in my admittedly meager experience) and prefer it to helper function calls. I very much prefer being able to consider a single small issue in isolation, which is what I tend to use helper functions for.

        However, a middle ground does exist, namely using scoping blocks to separate out code that does a single step in a longer algorithm. It has some great advantages: it doesn’t pollute the available names in the surrounding function as badly, and if turned into an inline function can be invoked at different stages in the larger function if need be.

        The best example of this I can think of is Jonathan Blow’s Jai language. It allows many incremental differences between “scope delimited block” and “full function”, including a block with arguments that can’t implicitly access variables outside of the block. It sounds like a great solution to both the difficulty of finding where a function is declared and the difficulty in thinking about an isolated task at a time.
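        A rough sketch of that middle ground in Python, which has no bare scope blocks, so a nested function with explicit arguments stands in for Jai’s argument-taking block (all names here are made up for illustration):

```python
def process_order(items, tax_rate):
    # "Block with arguments": a nested function that receives its
    # inputs explicitly, so it can't implicitly touch surrounding
    # variables, yet doesn't pollute the enclosing namespace either.
    def compute_total(items, rate):
        subtotal = sum(price * qty for price, qty in items)
        return subtotal * (1 + rate)

    taxed = compute_total(items, tax_rate)
    # Unlike an anonymous scope block, the same step can be invoked
    # again at a later stage of the larger function.
    untaxed = compute_total(items, 0.0)
    return taxed, untaxed
```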

        1. 2

          It’s a skill that becomes easier as you do it, admittedly. When dealing with spaghetti, you only have to be as smart as the person who wrote it, which is usually not very smart :D.

          As others have noted, where many fail is too much abstraction: too many layers of indirection. My all-time worst experience was going 20 method calls deep to find where the code actually did something, and that’s not counting the many meaningless branches that did nothing. I actually wrote them all down on that occasion as proof of the absurdity.

          The other thing that kills you when working with others’ code is the functions/methods that don’t do what their names say. I’ve personally wasted many hours debugging because I skipped over the function that mutated data it shouldn’t have, judging from its name. Pro tip: check everything.

          1. 2

            Or you can record what lines of code are actually executed. I’ve done that for Lua to see what the code was doing (and using the results to guide some optimizations).
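            For reference, a minimal sketch of the same technique in Python using sys.settrace (the tracer is naive and the function names are made up):

```python
import sys

def executed_lines(func, *args):
    """Run func(*args) and record which of its lines actually execute."""
    code = func.__code__
    hits = set()

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            # Record line numbers relative to the function definition.
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hits

def classify(x):        # relative line 0
    if x > 0:           # relative line 1
        return "pos"    # relative line 2
    return "other"      # relative line 3

# Only the branch actually taken shows up in the results.
print(executed_lines(classify, 5))    # {1, 2}
print(executed_lines(classify, -5))   # {1, 3}
```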

            1. 1

              Well, I wouldn’t say “incredibly skilled” so much as “stubborn and simple-minded”, at least in my case.

              When debugging, it’s easiest to step through iterative changes in program state, right? Like, at the end of the day, there is no substitute for single-stepping through program logic and watching the state of memory. That will always get you the ground truth, regardless of assumptions (barring certain weird caching bugs and other weird stuff…).

              Helper functions tend to obscure overall code flow since their point is abstraction. For organizing code, for extending things, abstraction is great. But the computer is just advancing a program counter, fiddling with memory or stack, and comparing and branching. When debugging (instead of developing), you need to mimic the computer and step through exactly what it’s doing, and so abstraction is actually a hindrance.

              Additionally, people tend to do things like reuse abstractions across unrelated modules (say, for formatting a price or something), and while that is very handy it does mean that a “fix” in one place can suddenly start breaking things elsewhere, or that instrumentation (ye olde printf debugging) can end up with a bunch of extra noise. One of the first things you see people do for fixes in the wild is to duplicate the shared utility function, append a Hack or 2 or Fixed or Ex to the function name, and patch and use the new version in the code they’re fixing!

              I do agree with you generally, and I don’t mean to imply we should compile everything into one gigantic source file (screw you, JS concatenators!).

              1. 3

                I find debugging much easier with short functions than stepping through imperative code. If each function is just 3 lines that make sense in the domain, I can step through those and see which is returning the wrong value, and then I can drop frame and step into that function and repeat, and find the problem really quickly - the function decomposition I already have in my program is effectively doing my bisection for me. Longer functions make that workflow slower, and programming styles that break “drop frame” by modifying some hidden state mean I have to fall back to something much slower.

                1. 2

                  I absolutely agree with you that when debugging, it boils down to looking and seeing, step by step, what the problem is. I also wasn’t under the impression that you think that helper functions are unnecessary in every case, don’t worry.

                  However, when debugging, I still prefer helper functions. I think it’s that the name of the function will help me figure out what that code block is supposed to be doing, and then a fix should be more obvious because of that. It also allows narrowing down of an error into a smaller space; if your call to this helper doesn’t give you the right return, then the problem is in the helper, and you just reduced the possible amount of code that could be interacting to create the error; rinse and repeat until you get to the level that the actual problematic code is at.

                  Sure, a layer of indirection may kick you out of the current context of that function call and perhaps out of the relevant interacting section of the code, but being able to narrow down a problem into “this section of code that is pretty much isolated and is supposed to be performing something, but it’s not” helps me enormously to figure out issues. Of course, this only works if the helper functions are extremely granular, focused, and well named, all of which is infamously difficult to get right. C’est la vie.

                  Anyways, you can do that with a comment and a block to limit scope, which is why I think that Blow’s idea about adding more scoping features is a brilliant one.

                  On an unrelated note, the bug fixes where a particular entity is just copied and then a version number or what have you is appended hit way too close to home. I have to deal with that constantly. However, I am struggling to think of a situation where just patching the helper isn’t the correct thing to do. If a function is supposed to do something, and it’s not, why make a copy and fix it there? That makes no sense to me.

                  1. 1

                    It’s a balance. At work, there’s a codebase where the main loop is already five function calls deep, and the actual guts, the code that does the actual work, is another ten function calls deep (and this isn’t Java! It’s C!). I’m serious. The developer loves to hide the implementation of the program from itself (“I’m not distracted by extraneous detail! My code is crystal clear!”). It makes it so much fun to figure out what happens exactly where.

              2. 2

                A lot of code nowadays (looking at you, Haskell, Rust, and most of the trendy JS frontend stuff that’s in vogue) basically assumes you have a lot of tooling handy, and that you’d never deign to do something as simple as adding a quick patch

                I do quick patches in Haskell all the time.

                1. 1

                  I’ll add that one of the motivations for improved structure (e.g. functional programming) is to make it easier to do those patches. Especially anything that brings extra modularity or isolation of side effects.

              3. 6

                I think it’s a case of OO in theory versus OO as dogma. I’ve worked in fairly object-oriented codebases where the class structure really was useful in understanding the code: classes had the responsibilities their names implied, and those responsibilities pertained to the problem the total system was trying to solve (i.e. no abstract bean factories; no business or OSS effort has ever had a fundamental need for bean factories).

                But of course the opposite scenario has been far more common in my experience, endless hierarchies of helpers, factories, delegates, and strategies, pretty much anything and everything to sweep the actual business logic of the program into some remote corner of the code base, wholly detached from its actual application in the system.

                1. 7

                  I’ve seen bad code with too many small functions and bad code with god functions. I agree that conventional wisdom (especially in the Java community) pushes people towards too many small functions at this point. By the way, John Carmack discusses this in an old email about functional programming stuff.

                  Another thought: tooling can affect style preferences. When I was doing a lot of Python, I noticed that I could sometimes tell whether someone used IntelliJ (an IDE) or a bare-bones text editor based on how they structured their code. IDE people tended (not an iron law by any means) towards more, smaller files, which I hypothesized was a result of being able to go to a definition more easily. Vim / Emacs people tended instead to lump things into a single file, probably because both editors make scrolling to lines so easy. Relating this back to Java, it’s possible that because almost everyone in Java land uses a heavyweight IDE (and because Java requires one class per file), there’s a bias towards smaller files.

                  1. 1

                    Yes, vim also makes it easy to look at different parts of the same buffer at the same time, which makes big files comfortable to use. And vice versa, many small files are manageable, but more cumbersome in vim.

                    In many IDEs I miss that functionality of looking at different parts of the same file at once.

                2. 3

                  Sometimes we break things apart to make them interchangeable, which can make the parts easier to reason about, but can make their role in the whole harder to grok, depending on what methods are used to wire them back together. The more magic in the re-assembly, the harder it will be to understand by looking at application source alone. Tooling can help make up for disconnects foisted on us in the name of flexibility or unit testing.

                  Sometimes we break things apart simply to name / document individual chunks of code, either because of their position in a longer ordered sequence of steps, or because they deal with a specific sub-set of domain or platform concerns. These breaks are really in response to the limitations of storing source in 1-dimensional strings with (at best) a single hierarchy of files as the organising principle. Ideally we would be able to view units of code in a collection either by their area-of-interest in the business domain (say, customer orders) or platform domain (database serialisation). But with a single hierarchy, and no first-class implementation of tagging or the like, we’re forced to choose one.

                  1. 4

                    Storing our code in files is a vestige of the 20th century. There’s no good reason that code needs to be organized into text files in directories. What we need is a uniform API for exploring the code. Files in a directory hierarchy is merely one possible way to do this. It happens to be a very familiar and widespread one but by no means the only viable one. Compilers generally just parse all those text files into a single Abstract Syntax Tree anyway. We could just store that on disk as a single structured binary file with a library for reading and modifying it.
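                    A toy illustration of the idea in Python, using the standard ast module (a pickled AST stands in for the hypothetical structured binary format; the sample function is made up):

```python
import ast
import pickle

source = '''
def greet(name):
    return "Hello, " + name
'''

# Compilers parse text into a tree anyway; do it once up front.
tree = ast.parse(source)

# Store the structured form instead of the text file.
blob = pickle.dumps(tree)

# A uniform API for exploring the code: walk the tree directly.
loaded = pickle.loads(blob)
names = [n.name for n in ast.walk(loaded) if isinstance(n, ast.FunctionDef)]
print(names)                 # ['greet']

# Text becomes just one possible view, rendered on demand.
print(ast.unparse(loaded))
```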

                    1. 3

                      Yes! There are so many more ways of analysis and presentation possible without the shackles of text files. To give a very simple example, I’d love to be able to substitute function calls with their bodies when looking at a given function, then repeat for the next level if that isn’t enough, etc. Or see the bodies of all the functions which call a given function in a single view, on demand, without jumping between files. Or even just reorder the set of functions I’m looking at. I haven’t encountered any tools that would let me do this.

                      Some things are possible to implement on top of text files, but I’m pretty sure it’s only a subset, and the implementation is needlessly complicated.

                      1. 1

                        Anyone who truly thinks this would be better ought to go learn some lisp.

                        1. 1

                          I’ve used Lisp but I’m still not sure what your point is here. Care to elaborate?

                          1. 2

                            IIRC, the s-expr style that Lisp is written in was originally meant to be the AST-like form used internally. The original plan was to build a more sugared syntax over it. But people got used to writing the s-exprs directly.

                            1. 1

                              Exactly this: some binary representation would presumably be the AST in some form, which Lisp s-expressions are, serialized to and deserialized from text. Specifically:

                              It happens to be a very familiar and widespread one but by no means the only viable one.

                              XML editors come to mind that provide a tree view of the data, as one possible alternative editor. I personally would not call this viable, certainly not desirable. Perhaps you have other graphical programming environments in mind; I haven’t found any (that I’ve tried) to be usable for real work. Maybe you have something specific in mind? Excel?

                              Compilers generally just parse all those text files into a single Abstract Syntax Tree anyway

                              The resulting parse can depend on the environment in many languages. For example, the C preprocessor can generate vastly different code depending on how system macros are defined. This is desirable behavior for OS/system-level programs. The point here is that in at least this case the source actually encodes several different programs, or versions of programs, not just one.

                              My experience with this notion that text is somehow not desirable for programs is colored by using visual environments like Alice, or trying to coerce GUI builders into the layout I want. Text really is easier than fighting arbitrary tools. Plus, any non-text representation would have to solve diffing and merging for version control. Tree diffing is a much harder problem than diffing text.

                              People who decry text would have much more credibility with me, if they addressed these types of issues.

                        2. 1

                          Yes, I’m 100% in agreement.

                      2. 2

                        That’s literally true! I work with some old code and things are really easy. There are lots of files, but they’re all divided up in such an easy way.

                        On the other hand, in a new project that is divided into lots of tiers with strict guidelines, it becomes hard for me to even find the line where a bug occurs.

                      3. 7

                        Another aspect of code navigation that’s not often given much consideration is greppability/searchability. Basically: how powerful a tool do you need to statically (without running the code) get a good idea of where a particular line of code dispatches to? Every time an indirection is introduced, you raise the bar for how powerful the code analysis tool must be to keep from having to guess at where something is, unless you preserve the uniqueness of the name used. The two practices that seem to make this sort of analysis most difficult are interfaces and RabbitMQ-dispatched microservices.

                        This isn’t to say that using interfaces and microservices is a bad thing, but that they trade off easy navigability for some other quality (in the cases that I’m thinking of, interfaces are used to help testability in C#, and microservices are used for, among other things, reducing IL->x86 JIT times in C# by breaking up the monolith).

                        1. 3

                          On the flip side, how searchable is assembly? You can search for individual instructions but you can’t search for any higher level patterns in the code, which is what abstractions usually name.

                          It just seems like none of the languages at any level of abstraction lend themselves very well to analysis or exploration. I think it’s partly because of the attachment to representing programs as text, which is limiting.

                          1. 4

                            Note that I said indirection, not abstraction as such. Not all abstractions represent semantic indirection. Function calls, for example, can be static jumps, and are usually pretty easy to analyze, provided that the types in question aren’t crazy.

                            1. 3

                              A grep derivative tailored towards finding specifically patterns in assembly code would make for a really interesting project actually…

                          2. 6

                            In the linked interview, this stood out:

                            Seibel: Have you ever done literate programming à la Knuth or read literate programs?

                            Abelson: Not really. I know the words, but I don’t even know what it is really.

                            Wow, forget about reading source code; now we’re talking about (not) reading papers, books? Ones I’ve heard of?! It’s just a bit surprising. But I think it falls in line with what he was saying earlier about reading just what you’re interested in. I will admit I’m quite interested in it.

                            Often, when I’m reading code, I’m trying to find something: the logic that controls some specific effect, the actual calculation behind a value, &c. After about 3 wrong guesses, I start to feel lost. It reminds me of a class I took where the textbook lacked both a Table of Contents and an Index. It was maddening, not to mention time-consuming, finding information in there. At work, we’ll spend a year hitting Find Anywhere before we would spend half a day “creating the index” and writing good summary/design/post-facto docs (I think a good index is curated, not auto-generated).

                            1. 5

                              Some associations that formed in my mind as I read this:

                              a) Conventional approaches to making codebases readable are like still lifes or depictions of a single scene. But since codebases are non-linear, what they need is something more like a mural. Something that can be read in many different ways. Like a Thangka, or a codex (source). Something with multiple “centers” of narrative, to use Dave West’s terminology.

                              b) We typically focus on the ability to find the right information. OP is pointing out that knowing what to ignore is under-emphasized, and the desire to modify provides a powerful “ignore this” heuristic.

                              c) There’s the old story about how paving cowpaths is often a better way to decide where footpaths should go. (The best source I could find is the second occurrence of ‘footpath’ in http://tomslee.net/2008/03/mr-googles-guid.html) In these terms, perhaps we programmers are prematurely paving narratives in our codebase. In the terms of James C Scott, programmers are being Authoritarian High Modernist, in assuming that they understand all the ways future readers may try to read their code. Some humility for the inherent illegibility of the activity of programming (particularly how other people read code) may be in order.

                              d) There’s a body of work on how the human brain comprehends messes. If you see your codebase as a mess, that results in one set of ideas for improvements. But the jungle metaphor results in a whole other set of ideas.

                              e) When we programmers read code, we do so purposefully but non-linearly. Like Bruce Willis in “Die Hard”: barging through walls, moving through elevator shafts and air-conditioning ducts. However, when we write code, we assume an idealized well-behaved reader who will read our creation aimlessly but in order, content to follow our lead in all regards. This is a pretty big disconnect in all our minds.

                              f) Often you have something useful to tell the reader, but no obvious place to put it that the reader is likely to see at the right time. Perhaps making code easier to explore is just a matter of creating Schelling points.

                              (Posted originally in a comment on OP.)

                              1. 3

                                I’m hearing echoes of Seymour Papert’s theories of psychological constructivism: https://en.wikipedia.org/wiki/Constructionism_(learning_theory)

                                1. 3

                                  Indeed. Also Peter Naur’s “programming as theory building”. There’s a CS subculture here that is woefully under-appreciated.

                                  1. 2

                                    Yes! I’m less familiar with Constructionism itself, but I love Mindstorms.

                                2. 3

                                  Agree with the larger point, but have trouble imagining the first suggestion being maintainable: projects already have trouble keeping docs up to date, and keeping feature nubs (which are likely tightly tied to other implementation details) around seems quite difficult.

                                  Also not sure the initiative would be there on the user side, given the article’s (accurate) statements that people mostly choose to open up a codebase when they need it to do something it doesn’t already do.

                                  1. 1

                                    …projects already have trouble keeping docs up to date and keeping feature nubs around seems quite difficult.

                                    It shouldn’t be hard to check for consistent feature nubs in CI. And really that’s all it takes. Python’s doctests have shown that it is possible for documentation to be kept up to date. The mainstream now finds updating code without tests to be unacceptable. Outdated documentation will hopefully soon get the same bar. Maybe it takes a little extra effort, just like it takes extra effort to write tests. But both yield dividends in the long term.
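                                    For example, a Python doctest ties documentation to behaviour, so CI fails the moment they drift apart (the function here is made up):

```python
def format_price(cents):
    """Format a whole number of cents as a dollar string.

    The examples below are documentation that CI actually executes;
    if the code's behaviour changes, the docs fail loudly instead of
    silently going stale.

    >>> format_price(1999)
    '$19.99'
    >>> format_price(5)
    '$0.05'
    """
    return f"${cents // 100}.{cents % 100:02d}"

if __name__ == "__main__":
    import doctest
    doctest.testmod()   # silent when the docs and the code agree
```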

                                    not sure the initiative would be there on the user side, given the article’s (accurate) statements around people mostly choosing to open up a codebase when they need it to do something it doesn’t already

                                    OP doesn’t take a position on whether the change people crack open a codebase to make is small or large. I’ve certainly tried many times to navigate the code for open source projects like gcc, vim and firefox. I wasn’t expecting it to be easy. I would totally have accepted subgoals of understanding something over a few weeks.

                                    1. 3

                                      From my understanding, feature nubs would need to be coupled to the implementation and kept in a separate branch/commit/etc. Folks have a lot of trouble keeping functional patches up to date with the master branch (e.g. a lot of the work neomutt has had to do). Asking an OSS team (whose code constitutes most of the code one might read) to do so seems like it might be a pretty large burden on top of maintaining a project.

                                      Well, he may not talk about the number of people who want to read code (not positive I got what you’re saying there), but he does open with “Most programmers agree that we don’t read enough code”. Yes, but it is important to note that you are someone who cares about reading code (you hosted this article and write about this on your website). I think once you look at the subset of people who read code, then the subset of those who might do these exercises, and then the subset of those who would go on to contribute to the project, it may start to feel like a pretty high cost.

                                      1. 2

                                        There’s room for maneuver with the tooling. @smalina and I hack on a project written in a form of Literate Programming where every feature is in a separate file. Just one example of how it’s possible to reduce the management burden for such scaffolding. If we decide it’s useful we can solve these problems.

                                        Even with a separate branch, it should be useful even if it’s not up to date with master, right? Better than nothing? I think we make bigger compromises everyday with conventional workflows.

                                        I think once you look at the subset of people who read code, then the subset of those who might do these exercises, and then the subset of those who would go on to contribute to the project, it may start to feel like a pretty high cost.

                                        If you start with the premise that the goal is to combat personnel churn, it may well be worthwhile. Think of companies or open source projects as pipelines turning smart noobs into experts on their code.

                                        1. 2

                                          Paraphrasing myself, I’m much less sure of the solution than I am of the problem. I apologize that I’m about to address your points out-of-order but I think your second point is actually more general.

                                          On the topic of the subset of people who would actually go to the trouble to go through a project’s exercises, I’m as skeptical as you are that any of this will turn un-motivated people into avid code explorers. But that’s OK with me! If I can recommend things that actually make reading programs more pleasant for people like myself, akkartik, you, and many other Lobsters readers I suspect, that’s a huge win in my book.

                                          You make another good point that it’ll be hard to get maintainers to write exercises or “maps”. I agree that getting people to keep normal project docs up to date is hard already, and if I were arguing maintainers should just do more documenting and more explaining it would be a pipe dream (it may still be). However, what I may not have emphasized enough in the post is that I think it would be a net win if some projects did less module-level documenting in exchange for more high-level mapping and commit pointing. In other words, the same amount of work, just a different emphasis. Two other points about commit katas as I imagine them:

                                          1. One nice thing about a commit kata as opposed to a tutorial is that it can’t become stale. Having gone through a bunch of tutorials recently, I’ve been on the wrong end of stale instructions and they’re a huge motivation zapper and an annoying time sink for maintainers who have to deal with or write PRs to fix them.
                                          2. If you can really set up a way to host or point to multiple versions of a kata and people actually do it, you get a nice network effect where early kata contributors provide “free” help for later contributors.