Threads for GeoffWozniak

    1. 18

      Where I’ve seen the pyramid of death, it’s been where a “single point of return” code style was enforced. I get that it can be complicated to monitor control flow when there are a ton of exit points, but I don’t think it’s ever materially impacted my debugging. So, I generally early-out like the article advocates.
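
      As a minimal sketch of the contrast (do_work is a made-up helper):

      int do_work(const char *s); /* hypothetical */
      /* Early-out style: each failed precondition returns immediately. */
      int process(const char *input) {
          if (input == NULL)
              return -1;
          if (input[0] == '\0')
              return -1;
          return do_work(input);
      }
      /* Single-point-of-return style: the same checks nest instead. */
      int process_nested(const char *input) {
          int result = -1;
          if (input != NULL) {
              if (input[0] != '\0') {
                  result = do_work(input);
              }
          }
          return result;
      }

      Two checks are tolerable; five or six and the nested version becomes the pyramid.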

      1. 8

        My theory about the “single point of return” mantra is that it’s pushed by those worried about resource management. The early-return version of this in C code is the goto-error style of handling resource cleanup. Single return has a better argument there (although I’m not all that convinced by it), and it seems that lesson gets blindly applied to every other situation.
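
        In case the goto style is unfamiliar, here’s a minimal sketch (load_config and the fixed 4096-byte read are invented for illustration):

        #include <stdio.h>
        #include <stdlib.h>
        /* Acquire two resources; every exit funnels through one cleanup chain. */
        int load_config(const char *path) {
            int ret = -1;
            char *buf = NULL;
            FILE *f = fopen(path, "r");
            if (f == NULL)
                goto out;
            buf = malloc(4096);
            if (buf == NULL)
                goto close_file;
            if (fread(buf, 1, 4096, f) == 0)
                goto free_buf;
            ret = 0; /* success: fall through to the same cleanup */
        free_buf:
            free(buf);
        close_file:
            fclose(f);
        out:
            return ret;
        }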

        1.  

          That would make sense. This was C# code, though, so there wasn’t much manual resource management.

          1.  

            I think holding out for single return in code with very little resource management makes even less sense.

            Personally, I find that mandating a single point of exit is almost always flawed. MISRA-C requires it (or at least used to, but it still advises it, as I recall) and every defense of it I see is weak. I suspect the fact that MISRA still pushes for it is why this trope lives on.

            (This is not to say that you should never aim for a single exit point in functions or procedures. It’s not a bad rule of thumb. But mandating it always ends up with some ridiculous-to-read code.)

      2.  

        In Rust I like to put that logic into a different function and have that function return an enum. It’s easy to test, and the main interface ends up very readable.

        Of course it can’t always be done that way but it’s nice when it can.

      3.  

        The only time I prefer a single exit point is when I need to do some cleanup before exiting, or in C when I want to make sure the compiler will elide a copy. Since C doesn’t have guaranteed copy-elision semantics, the lesson I’ve learned is that with a single return point most compilers will be able to optimize accordingly, though I’m not sure to what extent this is still necessary since I haven’t verified it across compilers in some time.

    2. 1

      Hi Lobste.rs! Excited to share the software engineering quiz we’ve been working on.

      I’ve been writing and teaching about expert topics in software design for a long time; you might have seen some of them here. This is my attempt to condense many of these ideas into a small interactive format that can produce a sense of “Wow, there’s a lot of deep ideas I don’t know!”

      The quiz is very short, but we’ve put a lot of work into getting a broad range of ideas into just 5 questions, making the correct answers ironclad in spite of the minimal context, and trying to preemptively answer every objection that comes up, including (and especially) the idea that there are no objective answers in software design.

      Is your reaction “Wow, this is interesting” or “Gawd, these guys are such know-it-alls”? Excited for your feedback!

      1. 38

        I think the quiz is frustrating because it’s being deliberately obtuse. The “correct” choice is often worded in some manner that is tangential to the problem and only makes sense if the test taker can read your mind.

        Take the very first question: “What is a design issue with this class?” The correct answer is “A) The constructor should take two parameters, not four” which is a very, very weird way to say “You should use the type system to make it harder to pass the parameters incorrectly.” The issue is not that there are four parameters, that is completely beside the point. It really feels like you’re going for a “gotcha!” moment the way the questions are worded.
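
        (For concreteness, the intended fix is presumably something like the following, where Point and Dimension are hypothetical names:)

        typedef struct { int x, y; } Point;
        typedef struct { int width, height; } Dimension;
        typedef struct { Point origin; Dimension size; } Rectangle;
        /* Two typed parameters instead of four ints: swapping the
           arguments is now a compile error rather than a silent bug. */
        Rectangle make_rectangle(Point origin, Dimension size) {
            Rectangle r = { origin, size };
            return r;
        }

        That’s a fine idea, but “two parameters, not four” doesn’t communicate it.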

        1. 7

          I figured that this was the answer by process of elimination, but I agree that the wording is weird.

          What I found annoying about this particular question is that it says “two instead of four”. Why limit ourselves to only two extra types (/s)? If a programmer could mess up the order of x, y, width, and height, they could also mess up the order of x and y when creating a point or width and height when creating a dimension! In that case, maybe what we really want is to create the types HorizontalPosition, VerticalPosition, Width, and Height. We could use those to create a position and dimension, but now that all components have different types, maybe our four-argument Rectangle constructor isn’t so bad since it’s impossible to mess up the order.

          Similarly, the methods area and perimeter both return an int even though they are different kinds of measures. Surely if a programmer can mess up the order of the parameters of a rectangle, they can most certainly use a perimeter where an area is needed, so we should protect them by introducing more types, e.g., Perimeter, Area, Volume, etc.

          I’m being intentionally cheeky because I don’t believe in universal answers to programming design problems. If the users of the Rectangle class are used to creating rectangles with x, y, width, height, then having a pos, dim constructor would create unnecessary friction for them. If the system is to deal with a huge number of rectangles, the idea of storing rectangles in arrays (either row-ordered or column-ordered) could be a better design than a Rectangle class. Let’s not automatically create extra abstractions because some high-level principle says we should.

        2. 1

          Take the very first question: “What is a design issue with this class?” The correct answer is “A) The constructor should take two parameters, not four” which is a very, very weird way to say “You should use the type system to make it harder to pass the parameters incorrectly.” The issue is not that there are four parameters, that is completely beside the point. It really feels like you’re going for a “gotcha!” moment the way the questions are worded.

          Hi Relax,

          Would it help if the wording of the question was changed to “Which of the following changes would fix the design issue in this class?”

          If the answer was phrased “Use the type system to make it harder to pass the parameters incorrectly,” that would give away the answer. Writing plausible distractors is tough, and writing ones that look as plausibly correct as the “Use the type system…” answer is toughest.

          Originally I wanted to make the quiz open-ended where it would ask for a free response, but, implementation cost aside, that would make it much more cognitively demanding on the quiz-taker.

          1. 8

            The problem is that there isn’t any single “design issue” with the class that needs to be fixed, in the general case. There are lots of issues with the code, all with different impact and relevance depending on details of context which aren’t part of the question as stated. For example, there are plenty of situations where not caching the area/perimeter calculations represents a far larger design issue than the parameters accepted by the constructor.

          2. 4

            It might help to reframe the question as “Identify a footgun for the end user of this class”, because it may be easier to write distractors for that without tipping your hand entirely.

            1. 1

              Hmmm…so, something like this?

              Q: Identify the most significant footgun for the end user of this class.

              A. The arguments must be passed in a certain order.
              B. The getArea and getPerimeter functions recompute the value each time.
              C. Oh dear, I’m having trouble coming up with more distractors thanks to the narrower scope.

              1. 4

                I see your point, and it is a tricky one pedagogically. I think the issue we all bump into here is that you’re asking folks to skip the step of identifying the problem and to jump straight to evaluating the best solution–and as you’ve seen here, a lot of people disagree with the diagnosis of the problem.

                One classic technique is to take the correct answer and create a couple of variants that are just a bit wrong: for example here, offering to have three arguments (anchor, width, height presumably), or a single argument that’s just a map, or named arguments à la Python.

          3. 2

            I’m not really sure that’s much of an improvement. It might be better to ask the taker, “What is the biggest problem you see with this class?” and adjust the answers accordingly.

            Multiple choice tests are tough to do right – in avoiding giving away the answer you’ve gone completely the other direction. If that’s what you’re going for, great, but just realize it makes for a frustrating experience for the taker.

        3. 1

          I think the quiz is frustrating because it’s being deliberately obtuse. The “correct” choice is often worded in some manner that is tangential to the problem and only makes sense if the test taker can read your mind.

          I’ve re-read the quiz with this lens in mind.

          I can see that being a fair criticism of questions 3 and 4. (I have some ideas for how to improve question 3; less so for question 4.) Are you intending the criticism to apply to any of the others?

          I know you cited question 1 as an example, but I don’t find that a fair cop, because the discussion explains why, yes, it really should take two parameters even if passing parameters incorrectly was not an issue. For example, if you see a C function that takes a char* and an integer for its length, hopefully we’d be agreed it would be better off taking a proper string type, even though argument order is not an issue.
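
          Concretely, the contrast I mean is something like this, where str_t is a stand-in for whatever string abstraction the codebase has:

          #include <stddef.h>
          /* Raw version: the caller keeps pointer and length in sync by hand. */
          size_t count_spaces_raw(const char *data, size_t len);
          /* With a string type, the length travels with the data. */
          typedef struct { const char *data; size_t len; } str_t;
          size_t count_spaces(str_t s) {
              size_t n = 0;
              for (size_t i = 0; i < s.len; i++)
                  if (s.data[i] == ' ')
                      n++;
              return n;
          }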

          This might be an argument in favor of making the quiz longer, trying to find questions that only rely on one point, as part of the lesson seems to have been lost here.

          1. 4

            I’d say it applies to question 2 as well, although there the answer is somewhat “obvious” through the process of elimination.

            Definitely question 5 too – it feels like a simplified version of something more interesting but we’re supposed to know how the more real version should be refactored.

            The answer to 3 is kind of ironic considering it’s avoiding the type system solution style of question 1.

            Anywho, I know this took a lot of work and it’s probably hard to receive criticism, so I really respect that you’re listening and trying to make it better all around.

      2. 24

        Saying these all have objective answers, and then justifying these objective answers with one person’s courses and strangeloop talks, makes it hard for me to accept these are objective.

        1. 11

          “Wow, there’s a lot of deep ideas I don’t know!”

          Just looking at the first question, I’ve lived through situations where all the answers are legitimate concerns, with the constructor being the least of them. So to say that the answer is objective is, frankly, objectively wrong.

          1. 1

            Hi Geoff,

            Can you clarify about the other situations? Are you saying that you’ve lived through situations where e.g.: caching or changing an int to a double was required, or are you specifically saying you have an example situation which is closer to the question? If so, can you elaborate?

            1.  

              As much as I do not want to drag this thread out…

              Here are the choices presented to me for the question on a Rectangle class.

              1. The constructor should take two parameters, not four
              2. The application should pass around arrays of integers instead of the Rectangle type to avoid the overhead of classes
              3. Rectangle should be made to be an implementation of an IRectangle interface
              4. The class should cache the area and perimeter instead of computing them anew each time

              The “correct” answer, as described in the quiz, is 1. This is a reasonable thing to do, but not objectively better than the other three because there simply isn’t enough information to make an informed choice.

              First, there is no information at all about an IRectangle interface in the question. Does this make it a bad design decision? As the reader, I have no idea. For the sake of argument, let’s discount it entirely as an answer because of the lack of information. But I have been in situations where, say, the framework I’m working in requires the use of an interface, or perhaps coding standards do. Is the framework or coding standard a good design? It doesn’t matter because by not following it or attempting to subvert it in the name of supposed objective truth you are creating either massive churn or political upheaval. If such a situation applies, then making sure it follows the interface requirement is more important than the constructor.

              How about passing around arrays of integers? This is absolutely more important than the number of arguments to the constructor if performance really matters. Game programming is filled with these kinds of data-oriented design decisions because accessing flat arrays of raw data is considerably faster than chasing pointers. And then, maybe you want a four-argument constructor (or even more!) because regularly computing the values uses too many cycles compared to just looking them up. This also covers the other case about caching the area and perimeter: you may even pre-compute them in the constructor. I have applied this approach many times for highly utilized data structures.

              At best, this question could be said to be “objectively correct” for the incredibly narrow use case that is presented. But to generalize and imply that other decisions are incorrect is itself incorrect. There is plenty of experience out there to refute it. Would I typically use a two argument constructor for a rectangle class? Very likely. Will I always? Don’t count on it. The same can be said for the rest of the quiz. I can’t regard this quiz as anything useful and would not recommend it to anyone studying software engineering.

              1.  

                Hi Geoff,

                You may have noticed this question has been changed since the last time you took it. The IRectangle answer replaces the “change int to double” answer (partially for consistency with the upcoming TypeScript version, where int and double types don’t exist). The question now only asks “Which of the following is most likely to be a design improvement to this class,” and it sounds like you’re in agreement.

      3. 15

        Is your reaction “Wow, this is interesting” or “Gawd, these guys are such know-it-alls”? Excited for your feedback!

        Definitely the latter. Making up some “deep principle” according to which some answer is correct does not make the answer objective.

        1. 4

          OK, after doing the whole thing I feel more positive. I mostly agree with the principles. Things that irk me a bit are the way that the quiz presents itself as being the ultimate source of truth. If you change the tone of the quiz to be more about “thinking about code architecture” instead of “applying design principles” my opinion would be much more favourable.

          Feedback, a bit more structured:

          1. The first question feels contrived. There is no context, and in my opinion a design is good when it works well in the context that it’s in. Without context, this is a simple implementation, which is good. Complicating it with introducing classes/structs for the dimensions and position seems bad to me.
          2. The whole premise of being objective is flawed, in my opinion. Different “deep principles” are often at odds. For example, in the first question, the principle “keep it simple” is directly at odds with the “English language test” (which I have never heard of). I like using a single design principle which is called “use common sense”.
          3. The answers are very verbose, because you are essentially convincing someone your opinion is right.
          4. The whole “99% of software engineers get this wrong” thing… The kindest way to put it is “I wouldn’t do that”.
      4. 8

        Honest feedback, with a little context first:

        1. I’m on your email list and enjoy your articles
        2. I didn’t disagree with any of your explanations

        That said, the questions were very confusing and lacked necessary context. It felt like a game of “guessing what the teacher wants”. Sometimes I could guess. When I missed it was always a tossup between the correct answer and some other answer that would be correct in another context.

        Two hallmarks of good code by any standard are clear communication and avoiding ambiguity, which this quiz does not achieve. Indeed, I think every single question had some amount of ambiguity:

        Question 1: It’s a GUI environment – perhaps one where millions of rectangles are being drawn. How do I know the lack of caching isn’t relevant?

        Question 2: This one was easier to guess right, but still the phrase “is just plain incorrect, even though it always works” threw me off.

        Question 3: Correctly guessed that we needed some version of a “type” for the config, and just assumed that’s what you were getting at with “enum”. The phrase “contains the substring “(140, 200, 200)” or some equivalent” just seemed odd and I don’t think you can reasonably assume people will connect that with “a phrase you must use when constructing the type you need”.

        Question 4: How do I know that the copying of what could be thousands of comments on thousands of posts at scale won’t be a problem? How do I know that concurrent modification will be an issue? Maybe the comments are read only? Again, the intended answer is a perfectly reasonable point – my problem is that there are other reasonable points here as well.

        Question 5: Often inlining is the best solution. Especially when the logic is only used in one place. How are we to know it’s not? Making functions for everything, while (yes) providing the encapsulation you mention in the explanation, can also make a code base much harder to read. Depending on the situation, I might think the two function solution was ideal, and I might think it was a clear mistake.

      5. 7

        Hey! I liked that the questions do touch on concepts I think are fundamental to software design, and force you to think about all of them in concert (and decide which supersede which on a case-by-case basis). I’ve learned from the explanations, as well as validated my own understanding.

        With that said, and even keeping in mind your defense about objectivity both in the quiz and in this comment, I’d like to comment: please reconsider the use of the word “objective”. A subjective question (one based on personal experience) does not change its subjective nature regardless of how well-founded an answer is, or how unanimous people are in answering the question. One doesn’t necessarily need to take away merit from an argument just because it’s subjective, if that helps relieve the pressure of using “objective”.

        Subjective questions are typically operationalized, that is, an operational definition is given which can be used as an objective question. E.g. the subjective “what soda brand is best” could be given an operational definition of “what soda brand will be voted the most when a population is asked ‘what soda brand is best’” – the answer is literally the count of votes, regardless of how people reasoned out their vote. If we think the operational definition is well-posed, we can try to form some answer to the original subjective question, but it doesn’t mean the original question is now objective – we simply have a proxy we can objectively answer.

        I feel the same about “which code is best”-style questions (although the entire body of software design is so large that I’ll admit I can’t – and wouldn’t dare to – prove it to you from first principles that it is subjective). Are we sure we’re answering “which code is best”? Or are we answering an objective proxy, such as “which code patterns are employed by successful software teams, where success is measured as X” for some operational definition of success (which btw I don’t even think the quiz attempts to do, except possibly in a very implicit way)? Answers to the latter can be valuable I think, of course, but they don’t change the nature of the original question.

        This absolutely isn’t meant to take away merit from the principles behind the quiz. I think there are deep principles behind it, and I think software designers should take the time to digest this instead of reflexively and defensively disagreeing if they get it wrong. Cynical takes would be “this is not objective, therefore this is wrong” or “this is subjective, so I’ll only listen to it if I agree”, both of which I wouldn’t condone. But I think the world needs more precision and nuance around language, not less – there’s already enough confusion between fact and opinion, subjective and objective.

        IMO don’t just normalize “this is objective, therefore listen to me”; also normalize “subjective questions are okay; listen to this subjective answer because the principles are well-constructed”.

        Then again, this entire piece (as you can probably tell from my constant “I think” hedging) is itself subjective, so take that as you will :)

        1. 1

          Hi igemnace,

          Would a fair summary of your core argument be “It cannot be objective because the actual value of the software is not objective?”

          I wish I knew my philosophy well enough to name what philosopher you’re taking after. :)

      6. 6

        These answers and the way they’re framed show very little regard for the subjectivity of context or much justification for the “objectively” correct answer other than software design principles that are just some other guys’ opinions: in other words, for different ways of knowing.

        For example, you claim in the XOR explanation that the correct answer is the one in which the programmer does not have to think as much. Now, I agree with reducing mental load where possible and practical, but what even constitutes mental load is contextual, and so is the applicability of the principle. Sometimes you want the gory details spelled out in front of you, and sometimes abstracting a one-liner out to a function (let alone two) is just silly.

        Or for the rectangle one, I disagree that having more constructs (Point, Dimension) is universally better. That means when I want to construct a Rectangle I first have to construct two other Things first, which could be less desirable for performance reasons, and also just strikes me as gross.

        So maybe this makes me the first person to dispute that your answers are better, let alone objectively correct. Somehow, I doubt it.

      7. 6

        None of these questions are “ironclad”. You seem very convinced of yourself.

        I honestly wonder how anyone can walk away from that feeling they “learned” something.

        I feel you have some good points, worth making, but these are not the examples you want. It’s 5/5 nonsense.

      8. 8

        Calling any of this “objective” left a foul taste in my mouth. def new_rectangle(x, y, w, h) has been fine for the past 300,000 years of human existence.

    3. 24

      Whenever I see this kind of post (which is a decent one, by the way), I really want to encourage people to read what may be the best book on debugging I’ve ever read (and I’ve read a lot of them).

      Dave Agans’ Debugging gives nine rules and expands on them with stories and examples to illustrate the reason for the rules. It’s a short book that pretty much every software developer should read. The rules are:

      1. Understand the system
      2. Make it fail
      3. Quit thinking and look
      4. Divide and conquer
      5. Change one thing at a time
      6. Keep an audit trail
      7. Check the plug
      8. Get a fresh view
      9. If you didn’t fix it, it ain’t fixed

      One thing that is frequently left off of stuff like this submitted story is keeping an audit trail. Every time I start debugging something nasty (and I work on a compiler, so things get nasty) I take notes in my notebook or, if cut and paste is needed, I fire up a notes.org file and put stuff in there after making a directory to store all the files I’m using for the investigation.

      After about 25 years of doing this stuff I haven’t found anything to really add to Agans’ rules, nor have I felt the need to think any are unnecessary. They are really good steps to keep in mind, and it’s very much worth it to read his book.

      1. 4

        To add: after you fix the bug, going back to the audit notes can help you find a wealth of open source contributions in your tooling and dependencies.

        Checked docs for the right thing but didn’t find what you needed? That’s a doc commit.

        Hit an error message that could be improved? That’s a code commit or a UX issue.

        The worse the bug, the longer the debugging session: the more opportunities to find contribution gold.

      2. 2

        One thing that is frequently left off of stuff like this submitted story is keeping an audit trail. Every time I start debugging something nasty (and I work on a compiler, so things get nasty) I take notes in my notebook or, if cut and paste is needed, I fire up a notes.org file and put stuff in there after making a directory to store all the files I’m using for the investigation.

        I often put my audit trail into a chat channel when debugging. It provides an opportunity to rubber-duck debug, and possibly someone will notice and provide a fresh view.

        1. 4

          Comments on tickets are also great.

          I feel like I’m just different than most though. Nobody but me likes to write things down.

      3. 2

        Love this advice, especially “Make it fail” and “Keep an audit trail”.

      4. 1

        Came here to recommend that same book. It’s an excellent resource

      5. 1

        I cannot overstate how useful “audit trails” are for understanding hard problems/bugs.

        It’s so easy to get lost in details, to forget previous test results, or even to forget why you were testing something in the first place.

    4. 21

      I was pretty skeptical. Turns out I was correct to be.

      Notably, it doesn’t support: structs; enums / unions; preprocessor directives; floating point; 8 byte types (long/long long or double); some other small things like pre/post cremements [sic], in-place initialization, etc., which just didn’t quite fit; any sort of standard library or i/o that isn’t returning an integer from main(); casting expressions

      So, not C. Or even really anything close to C. Still a fun exercise, but it’s not C. Maybe it could count as C circa 1980.

      Compilers have a reputation for being complex—GCC and Clang are massive, and even TCC, the Tiny C Compiler, is tens of thousands of lines of code—but if you’re willing to sacrifice code quality and do everything in a single pass, they can be surprisingly compact!

      Pascal was a single pass compiler and it supported much more than this does. The first publicly available close-to-C compiler that I know of from the late 1970s was single pass and supported pre/post-increment expressions and basic preprocessor directives. It did not support everything, lacking structures and floating point support as well (part of the reason for the lack of floating point support was that it was not standardized at the time). It lacked support for enumerations since they did not exist in C then. It was roughly 2000 lines of C code (including comments) and could compile itself.

      The compilers mentioned as massive are that way because they support stuff that people want or need, so I’m not sure if this was said with tongue planted firmly in cheek.

      And “sacrificing code quality” is doing a lot of work in that sentence.

      1. 20

        The point of the compiler is to be something interesting to learn from, not a standards-compliant compiler. I picked 500 lines up front, and this is what fit. I definitely think the majority of C features (maybe minus the preprocessor since that’s famously corner-casey) could fit in 1k lines of Python, but that wouldn’t be as approachable for a blog post.

        1. 12

          My contention is that the title makes it sound like a C compiler when it is not, in fact, a C compiler.

          I’m fine with the effort. I’m sure it was a fun exercise. It looks like it was. And the language certainly is a reasonable subset of C. But it’s not the C that the title makes it seem.

          1. 3

            I thought it was a C/C++ compiler.

      2. 2

        It’s a little bit bigger than Small C https://en.m.wikipedia.org/wiki/Small-C but not much

        1. 3

          Well Small C is a few thousand lines of code, so… :-)

          I think a better comparison is c4 (https://gitee.com/yenmuse/c4/blob/master/c4.c), which implements more than my compiler, including a custom VM, in ~500 lines, albeit some of them are pretty long.

          1. 3

            Well Small C is a few thousand lines of code, so… :-)

            The first Small C was written in Small C and was, in fact, the compiler I referred to. Its original version is only about 2000 lines. Given that it was written in the subset of C that it supported (which was very reasonable given the time) and it targeted 8080 assembly, 2000 lines is pretty good.

          2. 1

            Er, yes :-) I meant size of language rather than size of code, tho I think it might be hard to fit a Python interpreter and c500 onto an 8 bit micro - could it run in MicroPython on an Arduino?

        2. 1

          The original version of tcc was an IOCCC entry, a self-hosting C compiler. It fitted on a single screen, though with multiple statements per line and single-character names for all variables. I think the pre-obfuscated version was close to 500 lines.

          I think it skipped a lot of error checking. If you fed it invalid C code, it would generate something (much like other C compilers, but for a larger set of invalid inputs).

    5. 12

      I think a post with some background on this would be appreciated.

      1. 13

        There’s a story on Wired from a couple days ago: https://www.wired.com/story/apple-csam-scanning-heat-initiative-letter/

    6. 18

      I’m rarely more than a few steps away from a notebook and pencil. In my experience, it’s the act of writing the notes that matters far more than reading them - most of my notes are never read, but the act of writing focuses my thinking in a way that often helps me remember.

      1. 6

        I have tried many things, and paper and pen/pencil have always won out.

        the act of writing focuses my thinking in a way that often helps me remember.

        This is why I find I never have to search my notes.

      2. 3

        This is me. Personally I think there’s a lot of value in shifting my focus away from the computer.

        That said, I have reached a point at work where I take a lot of notes that I do need to revisit and I’ve been wanting a better structure. It’s been the excuse I needed to finally cave and order a Remarkable tablet.

        I do mean “excuse” - I love the e-ink display on my ebook reader and I’ve been coveting the Remarkable since I first saw one about five years ago, but haven’t quite managed to justify the cost where pen and paper would do. It’s on its way so we’ll see whether it will offer enough over physical paper to keep me.

        1. 2

          I’ve been using a Remarkable for years now and it’s one of my favorite purchases ever.

          It’s expensive, but well worth it for people who strongly prefer pen and paper workflows with the disposable income to afford it.

    7. 14

      Some counterpoints, based on me doing this for the better part of 25 years now.

      Duplication of Knowledge is the Worst

      Sort of. What’s arguably worse is abstracting something away before you know what the abstraction is. “Copy pasta” is generally bad, but it’s acceptable to do it for a little while until you’ve figured out what kind of pattern you’re working with.

      TDD Is Legit, and It’s a Game-Changer

      I think the evidence is clear that this is not the case. My experience with it was that I wrote a lot of useless and brittle tests. What I do think is important, however, is to think about how to test something before you write it, and sketch out that test, possibly writing it. At least working out the tests first is a really good idea. Write the tests then make them pass? Not necessarily.

      Evidence is King

      It depends. Often rhetoric will do you a lot better, even if the evidence is in your favour. You can dump a ton of evidence in a code review, for example, but if you can’t summarize it or help someone make sense of it, they may ignore it or twist it in an unexpected way. (This can happen anyway, so don’t get too attached.) Having the facts to make your case is often crucial, but communicating effectively can be more powerful than the evidence. (Sad, but true.)

      1. 5

        communicating effectively

        I think that’s actually one of the most important skills to learn as a developer. It doesn’t just help explain concepts to co-workers, customers or managers but it will also improve your code, as code is communication. From human to machine, but also human-to-human, as code will be read and re-read over and over.

      2. 3

        What’s arguably worse is abstracting something away before you know what the abstraction is.

        I hear this a lot, and I have to push back. The crucial difference between duplicated logic and a bad abstraction is that you don’t know where duplication exists. It’s inherently non-local, and the ability to reason locally has to be one of the most important goals in all of software engineering.

        At least with a bad abstraction, you know it’s there because you can do a “find all references” / grep. Then, to unwind the abstraction, you just replace it with its implementation, and then you can re-abstract it however you want. Basically, the cost of a bad abstraction isn’t all that bad, and the cost of unwinding it also isn’t that bad. The cost of duplicated logic is absolutely terrible in comparison.

        1. 3

          That depends on where the logic lives. Two functions just below one another is typically fine. Clearly documenting the fact that there is a “missing abstraction” in a TODO is also fine, assuming a reasonably disciplined dev team.

          A nice hack I picked up at my previous job for such cases is to put a special “identifier” in the comments of all the places that need fixing, like “duplicated-invoice-country-selection” or something equally greppable. Then when it’s time to refactor you can do it in one go. Bonus points for making a backlog ticket for it.
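
          Something like this, in whatever comment syntax applies (the tag and file name are invented):

          /* DUPLICATION(duplicated-invoice-country-selection): same logic
             as in invoice_report.c; grep this tag when unifying them. */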

        2. 2

          The thing I tend to see with bad abstractions is that the “locality” is semi-imagined, because they are frequently implemented through large amounts of control flow, and I prefer having very similar code in different places to going through shared code and figuring out whether it’s relevant in the specific case.

          So in a way it goes against what you want to achieve with code being local. On the other hand, if you have a bit of duplicated code you have a good overview, with everything local to the function.

          This is not to disagree with you on the theoretical level, but more that in certain contexts you still don’t really have a benefit.

          Of course this also depends on how much duplication there is, and how it’s done. But that’s what I mean by context. It’s just that over my career, too often “improvements” have been made in the name of abstraction and DRY that led to everything ending up in one giant ball of mess, because code that was easy to understand, with just minor differences, was combined and/or “abstracted” through control flow and special variables and options that decide what is called and how. Then the next step is to split up that giant ball into multiple functions, but here the next problem emerges: these functions are barely (or simply not) useful independently of the rest, so you have a giant ball that is now non-local, because every time you debug it you have to go through a dozen functions. But at least your code linter is happy.

          So while in general it’s a good idea to abstract, deduplicate, and split up functions, I think that in the context of an experienced programmer giving advice to a new programmer they can all do quite a bit of harm, even when I agree with them, because I know how it’s meant and especially how it isn’t.

          1. 2

            This is not to disagree with you on the theoretical level, but more that in certain contexts you still don’t really have a benefit.

            Well, my true opinion is that in software, we’re simply always stuck between a rock and a hard place, and there is no optimal approach anywhere. i.e. no silver bullets. So I agree with that.

      3. 2

        Red-green-refactor TDD is absolutely legit, and was a game changer in my career. There are places where it’s not appropriate though: if you don’t know what you’re building, but are simply exploring the design space. For example, if you’re building a game you want to make sure you have complete design freedom in all but a few corners of the code base. Conversely, if you’re building a graphics engine, there are probably important invariants that you want to always enforce.

        1. 1

          I used TDD back when I was working on cable box software. It was good for data structures. Integration testing, however, was incredibly more useful for the product. It caught more issues than unit tests ever did.

          I recently used it when writing a compiler pass. I had the control flow and logic of the pass figured out, but there was a supporting data structure that I needed to make it work. TDD seemed like a good thing to use.

          I thought I had it all figured out, but I didn’t and I only realized it after I had written an exhaustive test suite and implementation that passed those tests. (Integration tests caught the problem.) That was ditched and everything was rewritten. Well, as fate would have it, there was another problem and all the tests and implementation were, again, thrown away.

          Finally I sat down, designed the structure and basically proved it was correct, without writing tests but keeping in mind what the tests would be. I worked through it, coded the structure, then wrote all the tests. The tests were more documentation than anything else, although they did test every possibility. This time, it worked.

          My personal post-mortem on the work was that TDD didn’t really help me and did not lead me (directly) to a good design. In fact, not coding helped more than coding: my time would have been better spent writing a thorough design document. I spent a lot of time writing what proved to be useless tests (and there were many of them). Am I “blaming” TDD? No. I think it’s a useful technique and have used it here and there. But my experience with it and all the mixed feelings I’ve heard about it over the years lead me to conclude it is occasionally useful and very far from a game changer.

          1. 1

            I’m afraid the situation you described does sound like exactly the condition I mentioned - the first implementation was effectively a prototype. TDD isn’t going to help with discovering the design, only with implementing it. It is not a silver bullet.

            As a side note, TDD is orthogonal to unit vs. integration testing. You can do TDD with any test type which uses assertions. There’s nothing wrong with writing even acceptance tests in a TDD fashion.

    8. 1

      This is a trip down memory lane because I had a laminated version of the original drawing on my wall as a kid. I had no idea there was a vector drawing of it!

      Also, you should avoid putting long comments in the story description. Best to put it as a comment.

      1. 1

        Awesome you had this! And thanks for the advice.

    9. 4

      It’s cool to see eglot included, but it has been easy to use it in older versions of Emacs for as long as it’s existed, since it’s a single file with no dependencies. A lot of people seem excited about tree-sitter and wayland, but those don’t really seem relevant to any of my own personal use cases. The thing I am looking forward to using is seeing what kind of wild hacks people end up doing with the native sqlite support.

      1. 2

        A lot of people seem excited about tree-sitter and wayland, but those don’t really seem relevant to any of my own personal use cases.

        Same. I don’t care about Wayland at all (I don’t use Emacs in graphical environments) and I’ll try tree-sitter but I have this nagging feeling it’s over-hyped.

        I am also happy about the inclusion of eglot, although it won’t change much for me. I might start using use-package now that it’s not a dependency.

        Better long line support seems long overdue. It’s a bit of an embarrassment, to be honest.

        1. 3

          and I’ll try tree-sitter but I have this nagging feeling it’s over-hyped.

          If you are not using exclusively lisps, then syntax-aware selection (explained, e.g., here https://zed.dev/blog/syntax-aware-editing) is huuuuge. It’s one tiny feature, but for me like 60% of non-trivial edits are funneled through it.

          TreeSitter gives you pretty great tools for implementing syntax-aware selection, but, if I understand correctly, the actual feature isn’t in core Emacs. The closest thing to what it should be is probably https://github.com/magnars/expand-region.el/pull/279, but that’s an unmerged PR to a third party package. (There are also things like combobulate, but I think they go a bit too far; syntax-aware selection is an 80% solution at 10% UX complexity.)

          1. 4

            If you are not using exclusively lisps

            Right; that’s the impression I get from tree-sitter mostly; the rest of the world is finally catching up to what we’ve been doing with paredit since *checks comment header of paredit.el* 2005, apparently? =D

    10. 8

      I’ve used every 29.1 pretest and rc across Windows, Mac, and Linux (Ubuntu, Debian, Manjaro, and Pop!_OS) and I’ve encountered no major issues in the last few months. I’ve run these commands more than I’ve ever done before so maybe I will remember them this time lol

      sudo apt install build-essential autoconf automake texinfo libgtk-3-dev libxpm-dev libjpeg-dev libgif-dev libtiff-dev libgnutls28-dev libncurses-dev libjansson-dev libgccjit-10-dev
      ./configure --with-tree-sitter
      make -j16
      sudo make install

      Really happy to see this release. Can’t wait to see all the new modes adopting tree-sitter. Haven’t been this excited about an Emacs release in a while! Looking forward to the day when this is the default Emacs on most Linux Distros but that will take a couple years

      1. 5

        Emacs has made incredible progress since the late 2000s. At the time, you had to manage package libraries and versions manually. Plus, there was a lot of glue code you had to copy-paste into ~/.emacs to get it all working. For example, setting up Emacs for Ruby on Rails was quite hard. The outcome was fragile and often broke with updates.

        With ELPA and use-package, everything has become much more streamlined, and as a consequence a very rich package ecosystem has emerged. My only complaint right now is that many packages still require a bit of ~/.emacs code to get up and running. Plus, documentation about how to mix and match packages to get IDE-like features is not great. The best source for that is, paradoxically, Doom Emacs. IMHO, this scares beginners away compared to VS Code, which is a shame.

        1. 3

          I still remember the day when I read Your Text Editor Is Malware; I removed all the Emacs packages and manually git cloned or copy-pasted them to get some illusion of safety.

          I guess the primary benefit of not using a package manager is getting a better understanding of their dependencies.

      2. 3

        I tried the build instructions here and they didn’t work on Debian 12. Just FYI, you need to use libgccjit-12-dev instead of libgccjit-10-dev.

        The point being that you probably shouldn’t blindly copy this in the hopes it will work.

    11. 2

      With a proper CI setup the cost of supporting all the tools is next to nothing.

      I think this grossly underestimates the cost of setting up and maintaining those CI workflows.

      Also, this must be the kind of embedded that uses a filesystem since the post is advocating for use of gcov. In many contexts, that’s not possible.

    12. 7

      When we log a message, do an HTTP POST, it shouldn’t be that hard.

      Based on the writing I don’t know who is saying this (the implementor or the customer), but no matter what type of software you develop, and no matter who says it, “shouldn’t be that hard” is always a red flag phrase.

      If it’s the customer who said it, the implementor should be scoping out the work and asking the customer questions that make it clear it’s not such a simple task. Even I, a lowly compiler developer, could tell from the description of the setup in this post that network considerations are going to matter. When I see “log aggregation with an HTTP endpoint”, my immediate questions before knowing anything else:

      • How reliable does it have to be?
      • How often should it be logging (so how much traffic will there be)?
      • What authentication/authorization aspects are there?

      Were these asked of the customer? I have no idea. If they were not, then shame on the implementor and my general lack of sympathy to them. If they were asked, then I hope the author learned something and gets better at estimates.

      If it’s the implementor who said it, it’s not much different than above. The difference being the implementor didn’t know what they didn’t know, although lots of other people know it. So hopefully they have learned to ask around or reflect on something before making estimates.

      To me, this reads as a story of someone kvetching about their lack of experience without knowing they lack it, or one of those stories where one wants to say the job was bad instead of the effort put forth to do it. Maybe that’s harsh, but there is so much missing I can’t really find myself feeling sympathy for the author.

      1. 11

        I’m still waiting for CPUs that have to be paid per cycle. So basically you pay the CPU chip maker and then it downloads a certificate and then it’ll run for, say, 50 trillion cycles, after which the certificate is invalidated and a new one has to be bought, otherwise the CPU will not budge. That would be one of the purest forms of rent-seeking imaginable.

          1. 4

            I mean chips that you physically own yet still have to rent.

        1. 11

          Isn’t that more or less how IBM charges for their mainframe systems?

          1. 10

            That is exactly how z/OS hardware was billed (and probably in some cases still is). MIPS/MSU capacity billing is a thing there. It also (kind of) makes sense: you own a mainframe, you do most of your bulk processing end-of-quarter, which requires a lot of resources, so why not pay for that capacity when you use it (and not all the time).

            This also means that IBM hardware you have has theoretical processing power, it’s just locked until you pay for it :-)

            1. 1

              So pretty much “the cloud at home”? Do you pay for the hardware itself, or does it just get installed and you pay for use (with some fixed capacity I assume)?

              Also how does it work exactly, does the z machine have a permanent link to IBM, or does IBM just pull usage data (possibly in person) at some frequency like utilities?

              1. 2

                …or does it just get installed and you pay for use…does the z machine have a permanent link to IBM…

                Basically, yeah, from what I understood when I worked there.

                The IBM model of billing for computing is how this whole computing thing got started. The home computing revolution might just be a blip in history where, for a short time, individuals owned machines instead of companies.

        2. 2

          Oracle literally did this years ago for some hardware JVM features.

    13. 4

      I find the modal popups slightly annoying, but I also find them to be easy to dismiss and not that much of a bother. (Perhaps my setup makes it easier than for others, I don’t know.)

      I don’t see a need to remove such links. At most, tag them. To be honest, I find the annoyance of modal popups on the same level as posts with an extensive number of interstitial images, and I don’t see a need to remove or flag those posts either.

      1. 3

        I do find the blog posts that include looping “meme” gifs from The Office and the like just awful to read. I hope this trend dies very soon.

    14. 7

      I tried using Fossil for my personal projects but discovered that the following two Git features have become essential for my workflow: (1) partial file commits and (2) rewriting commit history. Fossil doesn’t have (1) and is specifically designed to disallow (2). As much as I liked the built-in bug tracker and wiki, I couldn’t use Fossil and went back to Git.

      1. 5

        We have Fossil repos at work, and aside from the fact that Fossil integrates with almost nothing, these two things are huge pain points, especially the lack of history rewrites.

        Most repos have been moved away from Fossil at this point.

      2. 4

        For (1), Fossil docs recommend stashing the changes and splitting them into patches:

        fossil stash save -m 'my big ball-o-hackage'
        fossil stash diff > my-changes.patch
        

        In place of each git add -p you’d call fossil stash diff instead. I agree it’s a bit less convenient; an interactive version would be nice.

        For (2), what’s your use-case? For me Fossil removes a lot of the cases for which I would use history rewriting in Git. Note that Fossil also has amend, which changes history non-destructively - you can peek at the previous state. As for the lack of rebases, there’s a good write-up here.

        EDIT: you can also delete content when really needed.

    15. 11
      # Note that make uses := for assignment instead of =
      # I don't know why that is.  The sooner you accept that this isn't
      # bash/sh, the better.
      

      This was addressed in a comment there (that was made over 8 years ago), but seems to have been ignored: it’s one of many kinds of assignments in GNU Make. Given that the comment is that old and the Gist was apparently updated within the last couple of days, it’s a bit disappointing to see that bit about assignment is still there. And the fact that the SHELL is assumed to be a recent Bash is a letdown too.

      The difference between := and = is quite important: = defines a recursively expanded variable, whose value is re-expanded each time it is used, while := defines a simply expanded variable, whose value is fixed at the point of definition. For example, given FOO = $(BAR) followed by BAR := 42, $(FOO) expands to 42; with FOO := $(BAR) instead, FOO keeps whatever value BAR had at the point of assignment (here, empty).

      1. 3

        Given that the comment is that old and the Gist was apparently updated within the last couple of days

        (Apologies for the tangent, but…)

        I don’t understand how gists handle time. The gist was created on January 14, 2015 and then edited a few times that same day. As far as I can see, there have been no revisions since that first day eight years ago. (While I’m at it, the last revision reports a diff of “No changes.” I have no idea what a revision of no changes means either, but let’s set that aside and focus on time.) The most recent comment says it was made and edited “19 hours ago.” Nevertheless, when I first visited this gist this morning, it reported, “Last active 3 minutes ago.” A few hours ago, it said “Last active 44 minutes ago.” Now it says “Last active 52 minutes ago.” I checked each time, and each time there were no revisions or new comments reported. (Yes, it’s weird that I keep checking, but…)

        Does anyone know what counts as “active” here? There have been no new revisions and no new comments all day.

        I just updated the gist, and now it says “Last active 4 minutes ago.” I think the site is trolling me (us).

        1. 2

          I was going based on what it said on the Gist, but I rarely use Github and wasn’t sure how to check when it was last edited. Thanks for pointing that out!

          I was wondering if the “last active” thing was referring to the user, and not the Gist. Given the timeline and what I’m seeing now, it might be the user.

          1. 1

            I was wondering if the “last active” thing was referring to the user, and not the Gist. Given the timeline and what I’m seeing now, it might be the user.

            I had the same thought, but I don’t think that’s the answer either.

            If you check that user’s gists, they have lots of different reported times for creation or last active. There’s not one shared time for all the user’s gists. (Same for, e.g., my gists.)

            In some way the times are per gist, but I can’t see what they track. Maybe that code is simply broken. The gists part of GitHub seems pretty unloved to me in general.

        2. 1

          I have no idea what a revision of no changes means either

          I think that can happen when the commit message changes. No clue what it means for a gist.

    16. 3

      I think this idea needs to go back to the drawing board.

      1. 2

        Thanks for your feedback. Some details and suggestions would be helpful.

        1. 3

          This tool is a secure way to share passwords

            No it’s not; it’s a complete disaster. You encourage people to insert sensitive data (i.e. passwords) into a random website. You might[0] have implemented this completely client-side, but there is no sane way to enforce that it is always done this way. It also leads people to believe they can put their secrets into a (maybe phishing) website.

            [0] I’m too lazy to check this right now

          1. 1

            but there is no sane way to enforce that

              .. that this can’t be changed for one specific user after, for example, the host is compromised

              I recommend using messengers like Signal or Matrix for such things. There is no easy way to target a specific user, and they have applications for everyone, with a good track record.

    17. 3

      The uploader has not made this video available in your country.

      So much for that, I guess.

    18. 5

      It’s great to see someone writing this up, but a few comments:

      • The usual extension for preprocessed C is .mii, not .tu. Using this will let syntax highlighting work correctly.
      • The first figure with the pipeline is missing the assembler step. In clang, this is integrated, but historically it is not (and I think GCC keeps it separate).
      • Having the compiler and driver in a single binary and the linker in a separate one is historical: clang needed a gcc-compatible driver and already had a load of the argument-parsing machinery. There was an effort for a while to provide a universal compiler driver in the LLVM project, which would probably have been cleaner (to drive clang, flang, and so on), but it was abandoned.
      • cc is not just convenient, it is part of POSIX.
      • GNU Binutils is available on all of the listed platforms, but it isn’t the default on more than half of them.
      • The multi-file section is very *NIX-specific. Windows build systems typically make the opposite tradeoff (invoking the compiler with multiple translation units) because process-creation costs are higher (and, I think, the Visual Studio compiler will do some transparent sharing of common include processing if you do).
      • The authors of lld and mold might be surprised to learn that linking cannot be parallelised. The authors of LINK.EXE and mold would be surprised to learn that you need a full relink from scratch if a single input file changes.
      • The language detection section contradicts itself. In this example, the compiler (not the driver) is setting up the default search paths.
      1. 4

        The usual extension for preprocessed C is .mii, not .tu.

        GCC will use .i (traditionally for C) and .ii (for C++). .mii is used for Objective-C++.

        The first figure with the pipeline is missing the assembler step. In clang, this is integrated, but historically it is not (and I think GCC keeps it separate).

        Yeah, it’s separate in GCC. GCC’s code generator outputs assembly and then invokes as. This is actually a bit of a pain sometimes because not only do you need an assembler, but there are cases you won’t know the size of an instruction during codegen because the assembler might be able to change it.

        1. 4

          GCC will use .i (traditionally for C) and .ii (for C++). .mii is used for Objective-C++.

          You’re right. I tend to default to .mii because then it doesn’t matter whether the input is C, C++, Objective-C, or Objective-C++ for syntax highlighting to work.

          GCC’s code generator outputs assembly and then invokes as. This is actually a bit of a pain sometimes because not only do you need an assembler, but there are cases you won’t know the size of an instruction during codegen because the assembler might be able to change it.

          This is even more true with the Plan 9 toolchain (which Go uses), which expands pseudos that contain relocations at link time. Depending on the distance / address of the target, you may need 1-3 instructions on modern architectures to materialise the address and the Plan 9 linker picks the shorter sequence. RISC-V tries to do the inverse and emit the inefficient sequence in the compiler and then ‘relax’ it back by deleting instructions and updating all other label addresses, which causes a huge amount more (complex) work in the linker than any other modern architecture / ABI.

          1. 1

            RISC-V tries to do the inverse and emit the inefficient sequence in the compiler and then ‘relax’ it back by deleting instructions and updating all other label addresses, which causes a huge amount more (complex) work in the linker than any other modern architecture / ABI.

            We have examined this kind of approach in our linker work and it really seems RISC-V made a mistake here.

            1. 3

              it really seems RISC-V made a mistake here.

              This statement works in almost any context.

        2. 1

          Yes, .i can be found in the 7th Edition cc source

      2. 2

        Nit: c99 is POSIX, not cc.

        About parallel linking, that was in reference to using the historical linker, and lld and mold etc are mentioned later on.

        1. 2

          Ah, you’re right. I’m fairly sure cc was in POSIX 1997 but I can’t work out how to search that version.

          1. 1

            It was, but deprecated in favour of the c89 command. Putting the language revision in the command name seems like a mistake… https://pubs.opengroup.org/onlinepubs/007908799/xcu/cc.html

            1. 1

              Putting it in the name made some sense because you could detect c89 or c99 support by just checking for the file. C99 also introduced some breaking changes and so you generally didn’t want cc to compile with an unspecified dialect because that would break either old or new code.

      3. 1

        The authors of lld and mold might be surprised to learn that linking cannot be parallelised. The authors of LINK.EXE and mold would be surprised to learn that you need a full relink from scratch if a single input file changes.

        I interpreted that part to mean that the end-result of the linking operation is a single entity, as opposed to compilation, where every compilation unit can be done in parallel independently of one another.

        So it’s possible to link in parallel, but you still need to aggregate everything together in a single linked artifact. (This also seems to be strongly implied by the diagram for the linker chapter)

        1. 2

          That’s kind-of true, but there’s a lot of nuance. Most compilers do some form of separate compilation, but there’s a case to be made (especially on modern hardware) for doing whole-program compilation and some languages do. Modern C/C++ compilers have an option to do this, though they still build IR for translation units one at a time. Linking involves several logical steps that broadly fit into two categories:

          • Resolving symbols
          • Copying (the sections that contain) referenced symbols into the resulting output.

          Both of these can be done somewhat incrementally and this is actually what mold does: it starts running before all object code is available. Resolving symbols can be done as soon as the symbol definition is available and sections can be copied into the output eagerly if there is space reserved for them.

          1. 1

            but there’s a case to be made (especially on modern hardware) for doing whole-program compilation and some languages do.

            Absolutely. We are trying to get our customers to build with LTO on embedded projects because they are small enough that modern hardware can actually do the complex whole-program optimizations that were mostly dreamed about back in the 80s and 90s. (Although in practice, all you really need for the big benefit is to be able to get the whole call-graph into memory so you can inline effectively.) The largest programs for most customers will be maybe 100 compilation units with a total program size of about 1MB, with roughly 1000 functions. With current workstation hardware, that will easily fit in memory and can be analyzed quickly.

    19. 2

      I think I’m mostly in agreement with the summary here. My biggest beef with Agile is the true believers and the proselytizers who don’t allow for variation. Specifically, the interpretation of “working iteratively”. I think iterative creation of software is fundamentally important, but the forms it takes vary wildly and trying to nail it down is basically futile. Yet there are those who insist on two-week iterations with a deliverable at the end of each one. Asserting such an approach is universal is demonstrably false, yet it persists.

      I don’t totally agree with the breakdown of the practices into bad/good/brilliant as outlined by the OP, but they are largely correct. I’d like to see another book on the subject, possibly by a different author, in a couple years since the book being reviewed is now almost 10 years old.