1. 3

    I’m missing Lehman’s laws of software evolution, at least the first two:

    (1974) “Continuing Change” — a system must be continually adapted or it becomes progressively less satisfactory.

    (1974) “Increasing Complexity” — as a system evolves, its complexity increases unless work is done to maintain or reduce it.

    1. 1

      re: conway’s law

      i think you can work around this to some extent:

      • have more components than people (an order of magnitude or so?)
      • avoid “utils” and other catch-all dumping grounds
      • have your components be searchable

      you’re still going to get libraries/tools/etc. that follow org structure, but you can get a lot outside of that structure too, and that’ll be more reusable

      1. 2

        Why work around it? Isn’t the purpose of Conway’s Law to accept the inevitable rather than fight it?

        FWIW I’ve worked in an environment that follows your suggestions and it still followed Conway’s Law.

        1. 1

          Yes. This goes beyond software, too. The way to exploit Conway’s law is to shape the organisation after the system you desire to build. This implies things like smaller, cross-functional teams* with greater responsibility (in order to get less coupling between system components). That way you maximise communication efficiency.

          * Lean people would advocate that the team should be co-located too. The idealist in me still clings to the Microsoft research that showed org chart distance mattered orders of magnitude more than physical distance.

      1. 1

        I find that naming things is hard when I don’t (yet) understand what I’m building - but easy when I do.

        Does this mean that domain understanding is hard and the difficulty with naming things is a side effect?

        1. 2

          That’s one way to interpret the evidence. Another is that the difficulty of naming things doesn’t change, only your perception of it. In other words, when you know the domain well, you’re good at fooling yourself that you’ve come up with great names, when in fact they may be no better than before.

          Not suggesting anything either way, just dropping in with a reminder of how not receiving a signal is not evidence of there being no signal.

        1. 3

          I get the feeling that this is a really good way to describe to crowds of non-computer people what it is I do all day and why it’s so hard. Sure, there are many other aspects of the job, but this is one that’s not talked about a lot in the mainstream, yet should be easy enough to understand. And the ideas of hair-splitting generalise well to some other aspects of the craft.

          1. 5

            This is exactly what used to be known as modularity. Module boundaries can absolutely cross technologies.

            For the classic reads on the subject, check out David Parnas’ papers on modularity and designing for extension and contraction (the original names for these things).

            1. 2

              “Designing Software for Extension and Contraction” is awesome, I’ve read it!

            1. 2

              This is a pet peeve of mine, but any prediction/forecast like this is useless without sufficient detail to make it falsifiable. What does one mean by “the year of the ARM workstation”? What market share? In which countries? What does “TypeScript broadly seen as an essential tool” mean? What evidence would one need to know that turned out to be false?

              Vague predictions without sufficient detail to be falsifiable tend to boil down to various fancy ways of saying “popular technology will keep being popular”. By putting it in concrete terms you force yourself to be honest.

              Additionally, there’s a huge difference between 95% and 55% confidence. I’d recommend anyone forecasting tech here attach a confidence level to their prediction. That way you can be scored and we can get an aggregate crustacean prediction skill score too!
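              A minimal sketch of how such scoring could work, using the Brier score (the forecasts below are invented):

```python
# Brier score: mean squared error between stated confidence and outcome.
# 0 is perfect; always guessing 50% scores 0.25; lower is better.
def brier_score(forecasts):
    """forecasts: list of (confidence, outcome) pairs,
    with confidence in [0, 1] and outcome True/False."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# Hypothetical end-of-year results for two forecasters:
overconfident = [(0.95, True), (0.95, False), (0.95, False)]
calibrated = [(0.55, True), (0.55, False), (0.95, True)]

print(brier_score(overconfident))  # ≈ 0.6025: punished for confident misses
print(brier_score(calibrated))     # ≈ 0.169
```

              Averaging everyone’s scores would give the aggregate; the hard part is agreeing on unambiguous resolution criteria up front.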

              1. 5

                It seems like they tried to coordinate the size reduction from the top, which is hard for many reasons, not least that size problems – like performance problems – can be characterised as death by a thousand cuts.

                I’ve read about an alternative approach to deal with that problem, which is to find out how much an oversize app costs (which they did) and then translate that down to an optimal price point of $/kB.

                Let every team sell size reductions on their little slice of functionality at that price, and you have everyone incentivised to reduce size in whichever way they reasonably can without accidentally incurring greater costs than just shipping the oversize application would result in. The localised decision making will make much better choices on what to optimise than any global control could. Teams that build large functionality will have better opportunities to reduce their size by a fraction than teams building tiny features. Just as it should be!
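                A toy sketch of that mechanism, with every number invented for illustration:

```python
# Derive a price from the estimated business cost of the oversize app,
# then let each team compare the payout against its own cost of
# shaving kilobytes. All figures are made up.
price_per_kb = 120_000 / 8_000  # e.g. $120k estimated cost / 8,000 kB over budget

teams = {
    # team: (kB it could plausibly save, engineering cost in $ of doing so)
    "media": (2_000, 18_000),
    "checkout": (300, 9_000),
    "settings": (50, 4_000),
}

for team, (savings_kb, cost) in teams.items():
    payout = savings_kb * price_per_kb
    verdict = "worth doing" if payout > cost else "not worth it"
    print(f"{team}: payout ${payout:,.0f} vs cost ${cost:,} -> {verdict}")
```

                Each team only acts when the payout exceeds its local cost, which is the point: nobody reduces size at a greater cost than shipping the oversize app would incur.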

                1. 1

                  Interesting approach. I wonder: is this much different from running perf stat to find out the number of instructions executed (and other cache-related metrics, I suppose) by a running process?

                  1. 2

                    If you’re using consistent hardware then yeah, that kinda works (you may end up sharing caches with other running processes…).

                    In a series of runs on different cloud VMs, though, while instructions will probably (at a guess, untested) be fairly consistent, cache misses will be less consistent:

                    • Cache sizes may be inconsistent across VM instances. GitHub Actions says it uses Standard_DS2_v2 Azure instances, and if you look at the Azure docs (https://docs.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series?toc=/azure/virtual-machines/linux/toc.json&bc=/azure/virtual-machines/linux/breadcrumb/toc.json) it seems this involves 4 different CPU models across 4 Intel generations, with differing L2 and L3 cache sizes.

                    • You’re sharing hardware with other VMs, which can impact the processor’s caches, in particular L3.

                    Found this empirical research where they tried to measure cache sizes on cloud instances over time, and it’s noisy, especially for L3: http://www.cs.cornell.edu/courses/cs5412/2018sp/recitation_slides/cs5412-cache-weijia.pdf (slide 22). Seems like if it was just about CPU models you wouldn’t see that level of variation.

                    (I’ll update the article with this info at some point).

                    1. 1

                      Oh, this is very interesting and relevant to a future project of mine. Thank you for doing the deep digging. I didn’t expect such a detailed response!

                  1. 5

                    I write scripts both in Python and Bash/ZSH but I feel like Perl 5 would be the sweet spot between those two languages for sys admin stuff.

                    I don’t know if Perl 6 serves that spot as well, but it may be an interesting choice for sys admins if it does.

                    1. 6

                      I think Perl 6 serves that spot well in many senses except one: it doesn’t come preinstalled with virtually every real operating system.

                      1. 5

                        Perl 6 has been renamed to Raku: https://www.raku.org/

                      1. 2

                        I’ve worked on projects which mandated use of various code formatters and linters, and without fail they’ve all had, in my opinion, a negative impact on code readability.

                        1. 19

                          My experience has been the exact opposite. In fact, I currently hold that collaborative development without mandated formatting will be filled with pointless arguments about silly details that an automatic formatter would just fix permanently. It’s like yet another thing no human (except for the few who decide the standard and author the tooling) should have to ever think about again.

                          I think Go perfected this approach from the beginning, and most others have followed suit, more or less. The longer a language has existed without de facto formatters, the worse the situation is. Of course, languages like Python are a bit of a problem, since formatting a piece of Python code is not exactly a deterministic problem. Black does a pretty good job though.

                          1. 5

                            I’m okay with all code formatters, as long as they don’t put record separators on the start of the following line. :D

                            1. 3

                              collaborative development without mandated formatting will be filled with pointless arguments about silly details that an automatic formatter would just fix permanently.

                              I used to think this too, but then I realised it depends on what the composition of your team is like. I’m currently in a team with very experienced engineers who basically always run issues through a mental cost/benefits model. We don’t have pointless arguments about silly details.

                              We tried using a linter and auto-formatter for a while, but we found that it only increased the time we spent on silly details. Sure, we got a great feeling of progress by fixing lint issues, but from a bigger perspective, it was just a game of chasing a lower lint count, and not providing any actual value.

                            2. 4

                              I feel like linters pull you towards a certain level of readability. Whether they pull you up or pull you down depends on where you started.

                              After using a linter for a while, learning from the feedback it’s giving you, you end up writing code that already satisfies the linter before it gets linted. At that point, the linter only ever gives you false positives: when you’ve broken the rules deliberately because it makes the code better.

                              1. 2

                                I agree with your assessment. The problem arises when linters are run in CI or a pre-push hook and you can’t ignore them.

                                Currently fighting jsPrettier, which is made to be as unconfigurable as possible. They call it opinionated. Not a fan.

                              2. 3

                                while I do think this is true of the RuboCop defaults and the AirBnB eslint config, I don’t think this is true of all code formatters/linters (I like PEP8 and gofmt, for example). Code formatters help to identify and keep consistent the values and styles of a community. The end result of this is that those communities, by nature of preserving their own values and styles, wind up excluding divergent values and styles, which … is the entire point. The entire point is to exclude differing styles. If you really don’t like a code formatter or linter on a project, that’s a hint that you either need to learn to adjust your practices to work in harmony with the people already on the project, -or- that the people working on the project don’t share your values and you might be better off working on different projects with people who have values more similar to your own.

                                1. 2

                                  I’d love to use good formatters/linters, but most languages simply lack them.

                                  So I’d rather use none than a bad one.

                                  1. 1

                                    I agree that readability is worse than for “ideally formatted stuff”, but coworkers like it (being a team player = free brownie points for later) and it is nice to not have to futz about with layout since it will get mangled anyways.

                                    Acceptance leads to peace of mind (for me anyways)

                                  1. 3

                                    I used to find this invaluable, but I mostly used it to fix commands where I forgot to type sudo. Now I’m more familiar with my keyboard shortcuts (after switching to programmable keyboards), and I just jump back to the beginning of the previous command and type sudo before running the command again. Also, I switched to fish shell which has better autocomplete, so I make fewer typos.

                                    1. 3

                                      Also at least in Bash you have !! which resolves to the previous command, so sudo !! will execute the previous command under sudo.

                                      1. 1

                                        Huh, that would have really been good to know! Now I hit up and home, and type sudo, so I think that’s the same number of keystrokes.

                                        1. 1

                                        Trips me up on OpenBSD with ksh, where !! is a command not found.

                                      1. 4

                                        Many of these points are worked out in detail in Engineering a Safer World by Nancy Leveson. Book is available for free by the publisher on the author’s request. Highly recommended perspective.

                                        1. 1

                                          Thank you for this! I’ve been thinking about migrating to a different provider but what’s held me back is that I would like to configure my new server(s) slightly more automatically. I use Ansible at work but hate it. I’ve tried getting started with CFEngine3 which is brilliant but I never quite get to a point where it’s the thing I reach for.

                                            A plain Makefile is actually fairly ingenious.

                                          1. 1

                                            I switched from Cfengine to a shell script for my Raspberry Pi homeserver and it feels good enough.

                                              I recently decided to use simple disk images as a backup strategy. Not sure yet if I’ll keep that.

                                          1. 3

                                            I very nearly switched my Linux servers over to Fedora 32, as it felt reasonably clean and coherent for a mainstream distro (available by default with most VPS providers). But then I ran into an SELinux issue which cost me hours because many SELinux related errors apparently don’t point to SELinux being the cause in any way, they just come up as something generic like ‘permission denied’. It just feels very bolted on rather than nicely integrated. I don’t mind configuring things, but the lack of discoverability just killed it for me unfortunately.

                                            1. 3

                                              SELinux is always the first thing I disable after installing a CentOS server. It’s just too much of a departure from the standard *nix permission model and its random errors (well, not random of course but they seem that way when looked at from a standard *nix point of view) make no sense to me at all.

                                              You’re totally right, it feels bolted on. None of the standard tools can even manipulate its “special” permissions, so it feels like a parallel universe. Of course, standard *nix permissions still apply, so you need to know both these universes to effectively manage a server.

                                              1. 2

                                                 I’d rather pick CentOS for my servers instead of Fedora. The 8-month (or so) upgrade cycle is a bit too much for servers - some of my past servers have had longer uptimes than a desktop/laptop distro support cycle.

                                                1. 1

                                                   Have you considered that it might be by design that a security feature does not advertise itself to what it thinks is illegal access?

                                                  1. 2

                                                    The problem is that it doesn’t advertise itself to legitimate access, either. I can’t remember the last time I’ve seen a machine with properly-configured SELinux policies, even in a corporate setting. The tooling and the documentation are so bad (e.g. a while back Fedora’s docs started by telling you to install a package that didn’t exist anymore!) that lots of people just give up, set its policy to permissive and get on with their lives.

                                                    I don’t recommend it, either, but I completely understand why it happens. I picked up SELinux at a former workplace a while back, but not by reading the docs, there are no useful docs. Someone showed it to me and gave me their cheatsheet. They’d picked it up at a former workplace, too, from someone who worked at Red Hat.

                                                    1. 1

                                                      Not particularly no, I’d prefer to know how my own server is working and exactly where to look when things go wrong.

                                                  1. 6

                                                    This article is generally good, but misses one critical point. It leads with the problem statement:

                                                    […] sometimes, you may stumble upon a very hard code review. […] So here’s the dilemma:

                                                    • Do you try to spot obvious flaws and get it merged, hoping nothing slips through the cracks of your imperfect knowledge?

                                                    • Or do you block the merge until you’ve tested extra-thoroughly the change against all scenarios you can imagine, risking the deadline?

                                                    And then goes on to answer it based only on personal opinion. That seems like an ineffective way to approach the problem. After all, the question is about an economic tradeoff, and economics have a thing or two to teach us here.

                                                     The way to approach this type of problem is to estimate the cost and probability of failure, and then only review as much as is necessary to balance this cost of failure with the cost of the time spent reviewing.

                                                    If you find it hard to do that sort of calculation, I strongly recommend reading How To Measure Anything which goes through how to properly estimate, decompose, and run monte carlo simulations on problems to get accurate 90% confidence intervals and probabilities.
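                                                     As a crude illustration of that kind of calculation (all inputs below are invented placeholders for your own calibrated estimates):

```python
import random

random.seed(1)

def expected_total_cost(review_hours, trials=100_000):
    """Monte Carlo estimate of review cost plus expected failure cost."""
    hourly_rate = 100            # $ per reviewer hour (assumed)
    base_failure_p = 0.30        # chance an unreviewed change ships a defect
    halving_hours = 4            # every 4 hours of review halves that risk
    failure_cost = (2_000, 50_000)  # range of what a shipped defect costs, $

    total = 0.0
    for _ in range(trials):
        p = base_failure_p * 0.5 ** (review_hours / halving_hours)
        cost = review_hours * hourly_rate
        if random.random() < p:
            cost += random.uniform(*failure_cost)
        total += cost
    return total / trials

for hours in (0, 1, 2, 4, 8, 16):
    print(f"{hours:2d}h of review -> expected total cost "
          f"${expected_total_cost(hours):,.0f}")
```

                                                     The point is not these specific numbers, but that “how much review is enough” becomes answerable: review until the marginal hour costs more than the risk it removes.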

                                                    The points mentioned in the article are presented as “this is what to do when you are asked to review a difficult change”. However, they’re not hard and fast rules on what to do with difficult changes. They are ways to reduce the cost of reviewing, to the point where it is economically sensible to review more and thus reduce the risk of failure further. The ones I think are the most important are:

                                                     • Ask to review smaller chunks at a time. Split the change up in a way that keeps the system working throughout, and integrate one little piece at a time. (Consistent with lean/agile/devops practices.)

                                                    • Ask about how you would know that the code works. Ideally, this would lead to automated tests, but if not, at least the author can supply one or more test plans/scripts for manual testing. (This in and of itself tends to reveal errors with 0 effort spent on reviewing…)

                                                    1. 2

                                                       I mostly agree with you, but ask yourself: from the given information, can you really estimate the risk involved? Monte Carlo simulations do not help if you don’t have reliable data. Garbage in, garbage out. So when deeply in doubt I prefer rules of thumb and general advice.

                                                       What the article is lacking are examples of cases where the author has applied these recommendations and how it went. Either the author does not have any, or for some reason does not want to tell them.

                                                      Also thank you for the book recommendation. I will definitely check it out.

                                                      1. 1

                                                        From the given information, can you really estimate the risk involved?

                                                        Yeah, that’s a tough problem. Some approaches are suggested in the book I recommended, but I also know people like Taleb argue rather convincingly that some of those approaches might not be fruitful either. I’m not yet entirely sure where to stand.

                                                    1. 8

                                                      I like having log levels, but I tend to find the conventional syslog-defined hierarchy both overly complex and too one-dimensional. In general, I really only use logging at two levels:

                                                      • Effectively “INFO” level, in which I want to log discrete events in the system and their result, preferably with a unique and traceable event ID. Generally where I want production logging most of the time. (Incidentally, this level is where event-oriented frameworks like Honeycomb really shine for me.)

                                                      • Effectively “TRACE” level: log everything that’s going on so I can trace program flow. Useful for development, or when trying to understand what the hell is going on during a particularly bad incident, but often heavyweight or slow.

                                                       One innovation I’ve seen a few times is to define multiple “trace” levels for different components of a system so that you can get more granular logging for specific code paths. E.g., I often want better logs to understand what’s going on with a particular task like DB queries, or a complex calculation, or whatever. I like being able to just log that component’s execution without producing huge logs from everything else in the program.

                                                      1. 2

                                                         Hm. Doesn’t practically every logging solution out there support a “source” attribute (facility in syslog terminology) that solves the last thing you mentioned without creating multiple equivalent levels?

                                                        1. 1

                                                          I think that helps with filtering, absolutely!

                                                           However, I realize that the services where I see this help are also ones in which managing at the component level helps at the log production stage – either because the logging is verbose enough to slow down the service itself, or to overload the logging system or fill up local disks. That wasn’t clear in my first comment, though :)

                                                           I can also see the argument that such verbose logging is itself a code smell, but I see the issue mostly in open source or proprietary code where I don’t own the code or have resources to maintain a fork. So I like getting the extra knob to turn.

                                                          1. 1

                                                            I think most of the logging solutions I’ve used have allowed turning logs on/off dynamically based on source, too. Maybe that’s not as common, though.

                                                        2. 1

                                                           What a different experience to my own! I am laser focused on three (different!) levels:

                                                          • WARN: a part of a something broke in a way that is recoverable
                                                          • ERROR: a part of something broke in a way that is not recoverable
                                                          • FATAL: the entire thing broke in a way that is not recoverable… and it’ll just nip off and kill itself.
                                                          1. 1

                                                            I generally distinguish between “something I need to know about, probably sooner rather than later” and “something I can use to figure out problems if they occur”. I’ve never seen much value in distinguishing between warn, error, and fatal because I need to know about all of them and usually implement a fix sooner rather than later.

                                                        1. 3

                                                           intellects vast, cool and unsympathetic regarded this code with envious eyes and slowly and surely drew their plans against it.

                                                          This made me realise there are way too few H.G. Wells references in the stuff I write.

                                                          1. 4

                                                            Working with people who know what they’re doing.

                                                            That means hiring sysadmins to do the sysadmin work, not just saying “devops” like a prayer and hoping your intern can somehow secure your servers against the wild west that is the internet. Ops is a skill that takes years to hone, and you’re disrespecting the devs and the admins when you pretend they’re interchangeable. Devops was supposed to be sysadmins taking on the tools of programmers and programmers targeting the (new) platforms of admins, not a magic word that allows cutting headcount without losing capability.

                                                            It means having actual testers, not just saying “devs (or the office admin) are testers now” and praying that somehow a different hat will allow people to see bugs in code they wrote and tested to the best of their ability last week. Programmers make terrible testers, and you have no idea how happy your devs will be if you have even a single dedicated professional test engineer to work with them.

                                                            It definitely means CI, and probably CD if you’re delivering via HTTP.

                                                             It means listening to your seniors when they tell you something is broken and needs to be rejiggered, even if it’s broken in a way you can’t see. Contrary to popular belief, most programmers do not want to “gold plate” things, they just don’t want to work in the code equivalent of a workshop where there are cables stretched across the room at waist height, the floor is slick with grease, and there’s no shielding on the dangerous parts of the equipment.

                                                            1. 1

                                                              You have misunderstood the meaning of DevOps. The term refers to bringing the developers closer to the customers (the day-to-day operations of the product/service they are building) through a set of mechanisms, including continuous delivery, which gives you quick feedback on what you are building.

                                                              It’s a very common misunderstanding that DevOps should mean “use developers for ops work” but reality is not that simple (or stupid!)

                                                              1. 4

                                                                In my experience reality is, in fact, that simple and stupid.

                                                                1. 4

                                                                  while I agree completely with the intended “real” meaning of the term, I disagree on the last point.

                                                                   Reality is that stupid, and companies have and do very literally disband/defund their ops staff on the basis that “With DevOps our devs can do Ops”. This is often but not always combined with a management-decided move to some combination of “cloud” computing and/or outsourcing every possible dependency to SaaS.

                                                              1. 2

                                                                All the points here so far are very good.

                                                                Fortunately, we don’t have to resort to guessing either: there’s been lots of research on this. It boils down to motivation. A motivated workforce leads to a great environment. Even better: there are concrete, proven steps to get there.

                                                                These steps are known by different names depending on who you ask, but some examples are Deming’s 14 Points for Management, lean production, and DevOps culture. What they have in common is they lead to autonomy, pride of work, security, mastery, purpose and those things that Pink brings up as components of intrinsic motivation.

                                                                Almost everything else mentioned in this discussion is a consequence of an intrinsically motivated workforce. When you have the steps in place, the rest tends to fall out of that naturally. The steps are necessary and frequently sufficient – they are what underlies all the other things people bring up.

                                                                The interesting thing is that just like with cognitive behavioural therapy, you don’t need to start out from a place of motivation to get to these steps. You can enforce these steps and this will lead to motivation!

                                                                 I strongly suggest reading up on this – the world would be a better place if more people knew about it.

                                                                1. 1

                                                                  This is interesting – is there any reason for me to switch to it on an existing application? E.g. does maintenance get easier when using native browser functionality instead of the React DOM diffing?