1. 27

    This article and all other Medium articles are what’s killing the web.

    1. 2

      For special projects, I use a Fabriano notebook.

      However, I have recently (re-)discovered that having a super cheap spiral notebook increases my note-taking a hundredfold, and that note-taking is a fantastic way to organize thoughts. I have since filled many of them (something like 140-page Hilroy 1-subject notebooks).

      I’ve been using the Pilot G2 exclusively for over 10 years now. I buy them in bulk and always have 3 to 5 of them on me. I enjoy seeing them spread into other people’s hands wherever I go, so when someone asks for a pen I’m always the first to offer them one of mine, and I never ask for it back.

      1. 2

        The G2 is a fantastic pen. I prefer the 0.38 tip size because the ink flows so well and takes just a moment to dry.

        1. 2

          I’m not an artist or anything like that, but I simply love products by Fabriano. I always carry with me an A4 Notebook (Glued Long Side) with dots instead of lines. The 85 g/m² paper is perfect for the 0.1 mm Uniball Pin. I’ve been using those for a long time now.

          I feel like everything looks better with this combination, especially mathematical stuff.

          1. 1

            Pilot G2

            So you may like this post

            1. 1

              Thanks for the tip! Pilot also makes a G2 Limited, which has a fancier exterior but takes the same refills, so those should also work.

          1. 10

            I use Jekyll and Hugo, but I’m constantly wrestling with them. Too many features. Too much magic. Too blog-centric. They should be called “static blog generators”.

            I dream of a “unified theory of static site generation” in which it would be as easy to host a blog as any other kind of content using a number of more generic primitives that give the user flexibility without creating complexity.

            1. 2

              I’m in the same boat as you. Older stuff done in Jekyll, newer stuff done in Hugo. Feels like I’m only using 10% of the available bells & whistles, but have to fight the other 90% trying to inject themselves into my workflow. I’m considering moving everything to soupault, which conceptually seems much more interesting than the rest of the pack.

              1. 2

                I use Hugo for my website and made everything from scratch. What are the things you found that are too blog-centric?

                1. 3

                  That was about a year ago, so I may be a bit rusty, but basically I had a hierarchy of different kinds of content plus some cross-cutting elements. I wanted control over how the different kinds of content are displayed in their index pages, and I needed multiple levels of nesting.

                  Hugo makes it easy to create lists of different kinds of content, with the minimum flexibility required to produce a blog. But when I wanted more control over index pages, links to sub-content, and multiple levels of nesting, I found it very difficult to use. I had to read a ton of resources, watch hours of video tutorials, and still couldn’t get it done exactly how I wanted it to be. I could have done it all in Django in 2 hours.

                2. 1

                  I too use Hugo. It can be a bit overwhelming, I admit.

                  But may I suggest using https://github.com/kaushalmodi/hugo-debugprint in your layout, so that while in dev mode you can see all your page variables at the bottom of the page.

                  Has helped me a lot.

                  1. 1

                    Thanks! I’ll give it a try next time.

                1. 3

                  I think GPL is getting less popular in the industry in general. Of course you can install all of those packages back if you want them, at least for now.

                  1. 3

                    I wish more people would give Proxmox a chance. It’s a very capable hypervisor based on Debian with a decent web GUI.

                    1. 4

                      There are SO many hypervisors out there and everybody’s got a favorite horse in that race.

                      1. 6

                        The term ‘hypervisor’ has a very specific, narrow definition. Proxmox isn’t a hypervisor. In fact there are only a few hypervisors.

                        Wow, the Wikipedia page on hypervisors is somewhat wrong.

                        Looking at the “Classification” section, I would only call the “bare-metal” software listed there hypervisors. bhyve and KVM are hypervisors, too, albeit weird fat ones.

                        Here is an easy-to-spot error on this Wikipedia page: VirtualBox is not a hypervisor product, full stop. I don’t know when they started calling that kind of software a hypervisor, much less a “type-2”… Is this the result of some marketing strategy?

                        Hypervisors run on bare metal. That’s part of the definition. VirtualBox is an application. (Albeit one that invokes some exotic CPU instructions…)

                        1. 3

                          This reminds me of the OS argument. Linux is not an OS, it’s just a kernel. Ubuntu is not an OS either, it’s a distribution.

                      2. 3

                        xcp-ng is also pretty cool

                        1. 1

                          I haven’t tried using Proxmox at all yet, but I definitely want to! The web GUI does look pretty nice.

                          1. 2

                            We use it at work, after switching from VMware. We’re a medium-small company, and we use it to manage telephony and networking servers/clusters. The sysadmins seem to love it.

                        1. 2

                          I had the non-phone version N810 and thought it was the best portable piece of tech ever made until I got the original iPod touch as a gift a few months later.

                          1. 2

                            I just pulled mine out but it doesn’t seem to be taking a charge. :-( The 770 still works, though!

                          1. 23

                            Oh dang another essay on empirical software engineering! I wonder if they read the same sources I did

                            Reads blog

                            You watched the conference talk “What We Know We Don’t Know”, by Hillel Wayne, who, also disturbed by software’s apparent lack of scientific foundation, found and read as many scholarly papers as he could find. His conclusions are grim.

                            I think I’m now officially internet famous. I feel like I crossed a threshold or something :D

                            So I’m not sure how much of this is frustration with ESE in general or with me in particular, but a lot of quotes are about my talk, and so I’m not sure if I should be defending myself? I’m gonna err on the side of defending myself, mostly because it’s an excuse to excitedly talk about why I’m so fascinated by empirical engineering.


                            One thing I want to open with. I’ve mentioned a couple of times on Lobsters that I’m working on a long term journalism project. I’m interviewing people who worked as “traditional” engineers, then switched to software, and what they see as the similarities and differences. I’ve learned a lot from this project, but one thing in particular stands out: we are not special. Almost everything we think is unique about software, from the rapid iteration to clients changing the requirements after we’ve released, happens all the time in other fields.

                            So, if we can’t empirically study software engineering, it would follow that we can’t empirically study any kind of engineering. If “you can’t study it” only applied to software, that would make software Special. And everything else people say about how software is Special turns out to be wrong, so I think that’s the case here, too.

                            I haven’t interviewed people outside of engineering, but I believe it goes even further: engineering isn’t special. If we can’t study engineers, then we can’t study lawyers or nurses or teachers or librarians. Human endeavor is incredibly complex, and every argument we can make about why studying software is impossible extends to any other job. I fundamentally reject that. I think we can usefully study people, and so we can usefully study software engineers.

                            Okay so now for individual points. There’s some jank here, because I didn’t edit this a whole lot and didn’t polish it at all.

                            You were disappointed with Accelerate: The Science of Lean Software and DevOps. You agreed with most of its prescriptions. It made liberal use of descriptive statistics.

                            Accelerate’s research is exclusively done by surveying people. This doesn’t mean it’s not empirical- as I say in the talk, qualitative information is really helpful. And one of my favorite examples of qualitative research, the Gamasutra Study on Crunch Mode, uses a similar method. But it’s far from being settled, and it bothers me that people use Accelerate as “scientifically proven!!!”

                            1. Controlled experiments are typically nothing like professional programming environments […] So far as I know, no researcher has ever gathered treatment and control groups of ten five-developer teams each, put them to work M-F, 9-5, for even a single month, in order to realistically simulate the conditions of a stable, familiar team and codebase.

                            You’d be surprised. “Two comparisons of programming languages”, in “making software”, does this with nine teams (but only for one day). Some labs specialize in this, like SIMULA lab. Companies do internal investigations on this- Microsoft and IBM especially have a lot of great work in this style.

                            But regardless of that, controlled experiments aren’t supposed to be holistic. They test what we can, in a small context, to get solid data on a specific thing. Like VM Warmup Blows Hot and Cold: in a controlled environment, how consistent are VM benchmarks? Turns out, not very! This goes against all of our logic and intuition, and shows the power of controlled studies. Ultimately, though, controlled studies are a relatively small portion of the field, just as they’re a small portion of most social sciences.

                            For that matter, using students is great for studies on how students learn. There’s a ton of amazing research on what makes CS concepts easier to learn, and you have to use students for that.

                            1. The unpredictable dynamics of human decision-making obscure the effects of software practices in field data. […] This doesn’t hold for field data, because real-life software teams don’t adopt software practices in a random manner, independent from all other factors that might potentially affect outcomes.

                            This is true for every form of human undertaking, not just software. Can we study teachers? Can we study doctors and nurses? Their world is just as chaotic and dependent as ours is. Yet we have tons of research on how educators and healthcare professionals do their jobs, because we collectively agree that it’s important to understand those jobs better.

                            One technique we can use is cross-correlating among many different studies on many different groups. Take the question “does Continuous Delivery help?”. Okay, we see that companies that practice it have better outcomes, for whatever definition of “outcomes” we’re using. Is that correlation or causation? Next we can look at “interventions”, where a company moved to CD, and see how it changed their outcomes. We can see what practices all of the companies share and where they differ, to see what cluster of other explanations we have. We can examine companies where some teams use CD and some teams do not, and correlate their performance. We can look at what happens when people move between the different teams. We can look at companies that moved away from CD.

                            We’re not basing our worldview off a single study. We’re doing many of them, in many different contexts, to get different facets of what the answer might actually be. This isn’t easy! But it’s worth doing.

                            1. The outcomes that can be measured aren’t always the outcomes that matter. […] So in order to effectively inform practice, research needs to ask a slightly different, more sophisticated question – not e.g. “what is the effect software practice X has on ‘defect rate’”, but “what is the effect software practice X has on ‘defect rate per unit effort’”. While it might be feasible to ask this question in the controlled experiment setting, it is difficult or impossible to ask of field data.

                            Pretty much all studies take this as a given. When we study things like “defect rate”, we’re always studying it in the context of unit time or unit cost. Otherwise we’d obviously just use formal verification for everything. And it’s totally feasible to ask this of field data. In some cases, companies are willing to instrument themselves- see TSP or the NASA data sets. In other cases, the data is computable- see research on defect rates due to organizational structure and code churn. Finally, we can cross-correlate between different projects, as is often done with repo mining.

                            These are hard problems, certainly. But lots of things are “hard problems”. It’s literally scientists’ jobs to figure out how to solve these problems. Just because we, as layfolk, can’t figure out how to solve them doesn’t mean they’re impossible to solve.

                            1. Software practices and the conditions which modify them are varied, which limits the generality and authority of any tested hypothesis

                            This is why we do a lot of different studies and test a lot of different hypotheses. Again, this is an accepted fact in empirical research. We know it’s hard. We do it anyway.

                            But if you’re holding your breath for the day when empirical science will produce a comprehensive framework for software development – like it does for, say, medicine – you will die of hypoxia.

                            A better analogue is healthcare, the actual system of how we run hospitals and such. That’s in the same boat as software development: there’s a lot we don’t know, but we’re trying to learn more. The difference is that most people believe studying healthcare is important, but that studying software is not.

                            Is this cause for despair? If science-based software development is off the table, what remains? Is it really true as Hillel suggests, that in the absence of science “we just don’t know” anything, and we are doomed to an era of “charisma-driven development” where the loudest opinion wins, and where superstition, ideology, and dogmatism reign supreme?

                            The lack of empirical evidence for most things doesn’t mean we’re “doomed to charisma-driven development.” Rather it’s the opposite: I find the lack of evidence immensely freeing. When someone says “you are unprofessional if you don’t use TDD” or “Dynamic types are immoral”, I know, with scientific certainty, that they don’t actually know. They just believe it. And maybe it’s true! But if they want to be honest with themselves, they have to accept that doubt. Nobody has the secret knowledge. Nobody actually knows, and we all gotta be humble and honest about how little we know.

                            Of course not. Scientific knowledge is not the only kind of knowledge, and scientific arguments are not the only type of arguments. Disciplines like history and philosophy, for instance, seem to do rather well, despite seldom subjecting their hypotheses to statistical tests.

                            Of course science isn’t the only kind of knowledge! I just gave a talk at Deconstruct on the importance of studying software history. My favorite software book is Data and Reality, which is a philosophical investigation into the nature of information representation. My claim is that science is a very powerful form of knowledge that we as software folk not only neglect, but take pride in our neglecting. It’s like, yes, we don’t just have science, we have history and philosophy. But why not use all three?

                            Your decision to accept or reject the argument might be mistaken – you might overlook some major inconsistency, or your judgement might be skewed by your own personal biases, or you might be fooled by some clever rhetorical trick. But all in all, your judgement will be based in part on the objective merit of the argument

                            Of course we can do that. Most of our knowledge will be accumulated this way, and that’s fine. But I think it’s a mistake to be satisfied with that. For any argument in software, I can find two experts, giants in their fields, who have rigorous arguments and beautiful narratives… that contradict each other. Science is about admitting that we are going to make mistakes, that we’re going to naturally believe things that aren’t true, no matter how mentally rigorous we try to be. That’s what makes it so important and so valuable. It gives us a way to say “well you believe X and I believe not X, so which is it?”

                            Science – or at least a mysticized version of it – can be a threat to this sort of inquiry. Lazy thinkers and ideologues don’t use science merely as a tool for critical thinking and reasoned argument, but as a substitute. Science appears to offer easy answers. Code review works. Continuous delivery works. TDD probably doesn’t. Why bother sifting through your experiences and piecing together your own narrative about these matters, when you can just read studies – outsource the reasoning to the researchers? […] We can simply dismiss them as “anti-science” and compare them to anti-vaxxers. […] I witnessed it play out among industry leaders in my Twitter feed, the day after I started drafting this post.

                            I think I know what you’re referencing here, and if it’s what I think it is, yeah that got ugly fast.

                            Regardless of how Thought Leaders use science, my experience has been the opposite of this. Being empirical is the opposite of easy. If I wanted to not think, I’d say “LOGICALLY I’m right” or something. But I’m an idiot and want to be empirical, which means reading dozens of papers that are all maddeningly contradictory. It means going through papers agonizingly carefully because the entire thing might be invalidated by an offhand remark.[1] It means reading a paper’s references, and the references’ references, and trawling for followup papers, and reading the followup papers’ other references. It means spending hours hunting down preprints and emailing authors because most of the good stuff is locked away by the academic paper hoarders.

                            Being empirical means being painfully aware of the cognitive dissonance in your head. I love TDD. I recommend it to beginners all the time. I think it makes me a better programmer. At the same time, I know the evidence for it is… iffy. I have to accept that something I believe is mostly unfounded, and yet I still believe in it. That’s not the easy way out, that’s for sure!

                            And even when the evidence is in your favor, the final claim is infuriatingly nuanced. Take code review! “Code Review works”. By works, I mean “in most controlled studies and field studies, code review finds a large portion of the extant bugs in reviewed code in a reasonable timeframe. But most of the comments in code review are not bug-finding, but code quality things, about 3 code improvements per 1 bug usually. Certain things make CR better, and certain things make it a lot worse, and developers often complain that most of the code review comments are nitpicks. Often CRs are assigned to people who don’t actually know that area of the codebase well, which is a waste of time for everyone. There’s a limit to how much people can CR at a time, meaning it can easily become a bottleneck if you opt for 100% review coverage.”

                            That’s a way more nuanced claim than just “code review works!” And it’s way, way more nuanced than about 99% of the Code Review takes I see online that don’t talk about the evidence. Empiricism means being more diligent and putting in more work to understand, not less.


                            So one last thought to close this out. Studying software is hard. People bring up how expensive it is. And it is expensive, just as it’s expensive to study people in general. But here’s the thing. We are one of the richest industries in the history of the world. Apple’s revenue last year was a quarter trillion dollars. That’s not something we should leave to folklore and feelings. We’re worth studying.

                            [1]: I recently read one paper that looked solid and had some really good results… and one sentence in the methodology was “oh yeah and we didn’t bother normalizing it”

                            1. 3

                              Hi Hillel! I’m glad you found this, and thank you for taking the time to respond.

                              I’m not sure you necessarily need to mount a defense, either. I didn’t consciously intend to set your talk up as the antagonist in my post, but I realize this is sort of what I did. The attitude I’m trying to refute (that empirical science is the only source of objective knowledge about software) is somewhat more extreme than the position you advocate. And the attitude you object to (that software “can’t be studied” empirically, and nothing can be learned this way) is certainly more extreme than the position I hoped to express. I think in the grand scheme of things we largely share the same values, and our difference of opinion is rather esoteric and mostly superficial. That doesn’t mean it’s not interesting to debate, though.

                              Re: Omitted variable bias

                              You seemed to suggest that research could account for omitted variable bias by “cross-correlating” studies

                              • across different companies
                              • within the same company, before and after adopting/disadopting the practice
                              • across different teams within the same company.

                              I submit to you this is not the case. Continuing with the CD example, suppose CD doesn’t improve outcomes but the “trendiness” that leads to it does. It is completely plausible for

                              • trendy companies to be more likely to adopt CD than non-trendy companies
                              • trendy teams within a company to be more likely to adopt CD than non-trendy teams
                              • a company that is becoming more trendy is more likely to adopt CD, and to be trendier after the adoption than before
                              • a company that is becoming less trendy is more likely to disadopt CD and be trendier before the disadoption than after

                              If these hold, then all of the studies in the “cross-correlation” you describe will still misattribute an effect to CD.

                              You can’t escape omitted variable bias just by collecting more data from more types of studies. In order to legitimately address it, you need to do one of:

                              • Find some sort of data that captures “trendiness” and include it as a statistical control.
                              • Find an instrumental variable.
                              • Find data on teams within a company that were randomly assigned to CD (so that trendiness no longer correlates with the decision to adopt).

                              If you don’t address a plausible omitted variable bias in one of these ways, then you basically have no guarantee that the effect (or lack of effect) you measured was actually the effect of the practice, and not the effect of whatever social conditions or ideology led to the adoption of your practice (or something else that those social conditions caused). This is a huge threat to validity, especially to “code mining” studies, whose only dataset is a git log and which therefore have no possible hope of capturing or controlling for the social or human drivers behind the practice. To be totally honest, I assign basically zero credibility to the empirical argument of any “code mining” study for this reason.
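
                              To make the concern concrete, here is a minimal simulation sketch (Python; the “trendiness” variable and every number below are invented purely for illustration): CD has no true effect on outcomes, yet a naive adopters-vs-non-adopters comparison finds one, and only controlling for the confounder makes it shrink away.

                              import numpy as np

                              # Hypothetical illustration of omitted variable bias: "trendiness" drives
                              # both CD adoption and outcomes; CD itself does nothing here.
                              rng = np.random.default_rng(0)
                              n = 100_000
                              trendiness = rng.normal(size=n)                    # hidden confounder
                              adopts_cd = (trendiness + rng.normal(size=n)) > 0  # trendy teams adopt CD more often
                              outcome = 2.0 * trendiness + rng.normal(size=n)    # driven only by trendiness

                              # Naive comparison: CD appears to "work".
                              naive = outcome[adopts_cd].mean() - outcome[~adopts_cd].mean()
                              print(f"naive CD effect:      {naive:+.2f}")       # strongly positive

                              # Controlling for the confounder: compare CD vs non-CD within narrow
                              # bands of trendiness; the apparent effect shrinks toward zero.
                              bands = np.digitize(trendiness, np.linspace(-3, 3, 61))
                              effects = []
                              for b in np.unique(bands):
                                  mask = bands == b
                                  if adopts_cd[mask].any() and (~adopts_cd[mask]).any():
                                      effects.append(outcome[mask & adopts_cd].mean()
                                                     - outcome[mask & ~adopts_cd].mean())
                              print(f"controlled CD effect: {np.mean(effects):+.2f}")  # much closer to zero

                              This is also why simply collecting more confounded data of the same kind would only sharpen the wrong estimate.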

                              Re: The analogy to medicine

                              As @notriddle seemed to be hinting at, professions comprehensively guided by science are the exception, not the rule. Science-based lawyering seems… unlikely. Science-based education is not widely practiced, and is controversial in any case. Medicine seems to be the major exception. It’s worth exploring the analogy/disanalogy between software and medicine in greater detail. Is software somehow inherently more difficult to study than medicine?

                              Maybe not. You brought up two good points about avenues of software research.

                              Companies do internal investigations on this- Microsoft and IBM especially have a lot of great work in this style.

                              and

                              In some cases, companies are willing to instrument themselves- see TSP or the NASA data sets.

                              I think analysis of this form is miles more persuasive than computer lab studies or code mining. If a company randomly selects certain teams to adopt a certain practice and certain teams not to, this solves the realism problem because they are, in fact, real software teams. And it solves the omitted variable bias problem because the practice was guaranteed to have been adopted randomly. I think much of the reason medicine has been able to incorporate empirical studies so successfully is because hospitals are so heavily “instrumented” (as you put it) and willing to conduct “clinical trials” where the treatment is randomly assigned. I’m quite willing to admit that we could learn a lot from empirical research if software shops were willing to instrument themselves as heavily as hospitals, and begin randomly designating teams to adopt practices they want to study. I think it’s quite reasonable to advocate for a movement in that direction.

                              But whether or not we should advocate for more/better data and more research is orthogonal to the main concern of my post: in the meantime, while we are clamoring for better data, how ought we evaluate software practices? Do we surrender to nihilism because the data doesn’t (yet) paint a complete picture? Do we make wild extrapolations from the faint picture the data does paint? Or should we explore and improve the body of “philosophical” ideas about programming, developed by programmers through storytelling and reflection on experience?

                              It is very important to do that last thing. I wrote my post because, for a time, my own preoccupation with the idea that only scientific inquiry had an admissible claim to objective truth prevented me from enjoying and taking e.g. “A Philosophy of Software Design” seriously (because it was not empirical), and realizing what a mistake this was came as something of a personal revelation.

                              Re: Epistemology

                              Science is about admitting that we are going to make mistakes, that we’re going to naturally believe things that aren’t true, no matter how mentally rigorous we try to be. That’s what makes it so important and so valuable. It gives us a way to say “well you believe X and I believe not X, so which is it?”

                              Science won’t rescue you from the fact that you’re going to believe things that aren’t true, no matter how mentally rigorous you try to be. Science is part of the attempt to be mentally rigorous. If you aren’t mentally rigorous and you do science, your statistical model will probably be wrong, and omitted variable bias will lead you to conclude something that isn’t true.

                              Science, to me, is merely a toolbox for generating persuasive empirical arguments based on data. It can help settle the debate between “X” and “not X” if there are persuasive scientific arguments to be found for X, and there are not persuasive scientific arguments to be found for “not X” – but just as frequently, there turn out to be persuasive scientific arguments for both “X” and “not X” that cannot be resolved empirically and must be resolved theoretically/philosophically. (Or – as I think describes the state of software research so far – there turn out to be persuasive scientific arguments for neither “X” nor “not X”, and again, the difference must be resolved theoretically/philosophically.)

                              [Being empirical]… means reading dozens of papers that are all maddeningly contradictory. It means going through papers agonizingly carefully because the entire thing might be invalidated by an offhand remark.[1] It means reading a paper’s references, and the references’ references, and trawling for followup papers, and reading the followup papers’ other references.

                              That’s a way more nuanced claim than just “code review works!” And it’s way, way more nuanced than about 99% of the Code Review takes I see online that don’t talk about the evidence. Empiricism means being more diligent and putting in more work to understand, not less.

                              I value this sort of disciplined thinking – but I think it’s a mistake to brand this as “science” or “being empirical”. After all, historians and philosophers also agonize through papers, crawling the reference tree, and develop highly nuanced, qualified claims. There’s nothing unique to science about this.

                              I think we should call for something broader than merely disciplined empirical thinking. We want disciplined empirical and philosophical/anecdotal thinking.

                              My ideal is that software developers accept or reject ideas based on the strength or weakness of the argument behind them, rather than whims, popularity of the idea, or the perceived authority or “charisma” of their advocates. For empirical arguments, this means doing what you described – reading a bunch of studies, paying attention to the methodology and the data description, following the reference trail when warranted. For philosophical/anecdotal arguments, this means doing what I described – mentally searching for inconsistencies, evaluating the argument against your own experiences and other evidence you are aware of.

                              Occasionally, this means the strength of a scientific argument must be weighed against a philosophical/anecdotal argument. The essence of my thesis is that, sometimes, a thoughtful, well-explained story by a practitioner can be a stronger argument than an empirical study (or more than one) with limited data and generality. “X worked for us at Dropbox and here is my analysis of why” can be more persuasive to a practitioner than “X didn’t appear to work for undergrad projects at 12 institutions, and there is not a correlation between X and good outcome Y in a sampling of Github Repos”.

                              1. 2

                                Hi, thanks for responding! I think we’re mostly on the same page, too, and have the same values. We’re mostly debating degrees and methods here. I also agree that the issues you raise make things much more difficult. My stance is just that while they do make things more difficult, they don’t make it impossible, nor do they make it not worth doing.

                                Ultimately, while scientific research is really important, it’s only one means of getting knowledge about something. I personally believe it’s an incredibly strong form- if philosophy makes one objective claim and science makes another, then we should be inclined to look for flaws in the philosophy before looking for flaws in the science. But more than anything else, I want defence in depth. I want people to learn the science, and the history, and the philosophy, and the anthropology, and the economics, and the sociology, and the ethics. It seems to me that most engineers either ignore them all, or care about only one or two of these.

                                (Anthro/econ/soc are also sciences, but I’m leaving them separate because they usually make different claims and use different ((scientific!)) methods than what we think of as “scientific research” on software.)

                                One thing neither of us has brought up that is also important here: we should know the failure modes of all our knowledge. The failure modes of science are really well known: we covered them in the article and our two responses. If we want to lean more heavily on history/philosophy/anthropology, we need to know the problems with using those, too. And I honestly don’t know them as well as I do the problems with scientific knowledge, which is one reason I don’t push it as hard- I can’t tell as easily when I should be suspicious.

                              2. 3

                                What a fantastic response.

                                When doctors get involved in fields such as medical education or quality improvement and patient safety, they often have a similar reaction to Richard’s. The problem is in thinking that the only valid way to understand a complex system is to study each of its parts in isolation, and that if you can’t isolate them, you should just give up.

                                As Hillel illustrated nicely here, you can in fact draw valid conclusions from studying “complex systems in the wild”. While this is a “messier” problem, it is much more interesting. It requires a lot of creativity, but also more rigor in justifying and selecting the methodology, conducting the study, and interpreting the results. It is very easy to do a subpar study in those fields, which reinforces the perception that the fields are “unscientific”.

                                A paper titled Research in the Hard Sciences, and in Very Hard “Softer” Domains by D. C. Phillips discusses this issue. Unfortunately, it’s behind a paywall.

                                1. 3

                                  Can we study teachers? Can we study doctors and nurses?

                                  The answer to that question might be “no”.

                                  When you’re replying to an article that’s titled “The False Promise of Science”, with a bunch of arguments against empirical software engineering that seem applicable to other fields as well, and your whole argument is basically an analogy, you should probably consider the possibility that Science is Just Wrong and we should all go back to praying to the sun.

                                  The education field is at least as fad- and ideology-driven as software, and the medical field has cultural problems and studies that don’t reproduce. Many of the arguments given in this essay are clearly applicable to education and medicine (though not all of them obviously are, I can easily come up with new arguments for both fields). The fundamental problem with applying science to any field of endeavor is that it’s anti-situational at the core. The whole point of The Scientific Method is to average over all but a few variables, but people operating in the real world aren’t working with averages, they’re working with specifics.

                                  The argument that software isn’t special cuts both ways, after all.


                                  I’m not sure if I actually believe that, though.

                                  The annoying part about this is that, as reasonably compelling as it’s possible to make the “science sucks” argument sound, it’s not very conducive to software engineering, where the whole point of the practice is to write generalized algorithms that deal with many slight variants of the same problem, so that humans don’t have to be involved in every little decision. Full-blown primitivism, where you reject Scalable Solutions(R) entirely, has well-established downsides like heightened individual risk; one of the defining characteristics of modernism is risk diffusion, after all.

                                  Adopting hard-and-fast rules is just a trade-off. You make the common case simpler, and you lose out in the special cases. This is true both within the software itself (it’s way easier to write elegant code if you don’t have weird edge cases) and with the practice. The alternative, where you allow for exceptions to the rules, is decried as bad for different reasons.

                                  1. 6

                                    That is absolutely a valid counterargument! In response, I’d like to point out that we have learned a lot about those fields! Just a few examples:

                                    I don’t know very much about classroom teaching or nursing, so I can’t deep-dive into that research as easily as I can software… but there are many widespread and important studies in both fields that give us actionable results. If we can do that with nursing, why not software?

                                    1. 1

                                      To be honest, I think you’re overselling what empirical science tells us in some of these domains, too. Take the flipped classroom one, since it’s an example I’ve seen discussed elsewhere. The state of the literature summarized in that post is closer to: there is some evidence that this might be promising, but confidence is not that high, particularly in how broadly this can be interpreted. Taking that post on its own terms (I have not read the studies it cites independently), it suggests not much more than that overall reported studies are mainly either positive or inconclusive. But it doesn’t say anything about these studies’ generalizability (e.g. whether outcomes are mediated by subject matter, socioeconomic status, country, type of institution, etc.), suggests they’re smallish in number, suggests they’ve not had many replication attempts, and pretty much outright says that many studies are poorly designed and not well controlled. It also mentions that the proxies for “learning” used in the studies are mostly very short-term proxies chosen for convenience, like changes in immediate test scores, rather than the actual goal of longer-term mastery of material.

                                      Of course that’s all understandable. Gold-standard studies like those done in medicine, with (in the ideal case) some mix of preregistration, randomized controlled trials, carefully designed placebos, and longitudinal follow-up across multi-demographic, carefully characterized populations, etc., are logistically massive undertakings, and expensive, so basically not done outside of medicine.

                                      Seems like a pretty thin rod on which to hang strong claims about how we ought to reform education, though. As one input to qualitative decision-making, sure, but one input given only its proper weight, in my opinion significantly less than we’d weight the much better empirical data in medicine.

                                  2. 2

                                    Dammit, man. That was a great response. I don’t think I’ll ever comment anything anywhere just so my comment won’t be compared to this.

                                    1. 1

                                      My favorite software book is Data and Reality, which is a philosophical investigation into the nature of information representation.

                                      A beautiful book, one of my favorites as well.

                                      rest of post….

                                      While I thought the article articulated something important which I agree with, its conclusion felt a bit lazy and too optimistic for my taste – I’m more persuaded by the POV you’ve articulated above.

                                      While we’re making analogies, “writing software is like writing prose” seems like a decent one to explore, despite some obvious differences. Specifically relevant is the wide variety of different and successful processes you’ll find among professional writers.

                                      And I think this explains why you might be completely right that something like TDD is valuable for you, even though empirical studies don’t back up that claim in general. And I don’t mean that in a soggy “everyone has their own method and they’re all equally valid” way. I mean that all of your knowledge, the way you think about programming, your tastes, your knowledge of how to practice TDD in particular, and on and on, are all inputs into the value TDD provides you.

                                      Which is to say: I find it far more likely that TDD (or similar practices with many knowledgeable, experienced supporters) has highly context-sensitive empirical value than none at all. I don’t foresee such practices being one day unmasked by science as the sacred cows of religious zealots (though they may be that in some specific cases too).

                                      For something like TDD, the “treatment” group would really need to be something like “people who have all been taught how to do it by the same expert over a long enough time frame and whose knowledge that expert has verified and signed off on.”

                                      I’m not shilling for TDD, btw – just using it as a convenient example.

                                      The broader point is that effects can be real but extremely hard to show experimentally.

                                      1. 1

                                        “We’re not basing our worldview off a single study. We’re doing many of them, in many different contexts, to get different facets of what the answer might actually be.”

                                        That’s exactly what I do for the sub-fields I study. Especially formal proof which I don’t understand at all. Just constantly looking at what specialists did… system type/size, properties, level of automation, labor required… tells me a lot about what’s achievable and allows mix n’ matching ideas for new, high-level designs. That’s without even needing to build anything which takes a lot longer. That specialists find the resulting ideas worthwhile proves the surveys and integration strategy work.

                                        So, I strongly encourage people to do a variety of focused studies followed by integrated studies on them. They’ll learn plenty. We’ll also have more interesting submissions on Lobsters. :)

                                        “When someone says “you are unprofessional if you don’t use TDD” or “Dynamic types are immoral”, I know, with scientific certainty, that they don’t actually know.”

                                        I didn’t think about that angle. Actually, you got me thinking maybe we can all start telling that to new programmers. They get warned the field is full of hype, trends, etc. that usually don’t pan out over time. We tell them there’s little data to back most practices. Then, experienced people cutting them down or pushing them onto a new trend might have less effect, especially on their self-confidence. Just thinking aloud here rather than being committed to the idea.

                                        “Science is about admitting that we are going to make mistakes”

                                        I used to believe science was about finding the truth. Now I’d go further than you. Science assumes we’re wrong by default, will screw up constantly, and are too biased or dishonest to review the work alone. The scientific method basically filters bad ideas to let us arrive at beliefs that are justifiable and still might be wrong. Failure is both normal and necessary if that’s the setup.

                                        The cognitive dissonance makes it really hard, like you said. I find it a bit easier to do development and review separately. One can be in go mode, iterating on stuff; at another time, in skeptical mode, critiquing the stuff. The go mode also gives a mental break and/or refreshes the mind, too.

                                        1. 1

                                          You’d be surprised. “Two comparisons of programming languages”, in “making software”, does this with nine teams (but only for one day).

                                          My reading (which is congruent with my experiences) indicates a newly-put-together team takes 3-6 months before productivity stabilizes. Some schools of management view this as ‘stability=groupthink, shuffle the teams every 6 months’ and some view it as ‘stability=predictability, keep them together’. However, this indicates to me that you might not be able to infer much from one day of data.

                                          1. 2

                                            To clarify, that specific study was about nine existing software teams- they came to the project as a team already. It’s a very narrow study and definitely has limits, but it shows that researchers can do studies on teams of professionals.

                                          2. 1

                                            People bring up how expensive it is. And it is expensive, just as it’s expensive to study people in general. But here’s the thing. We are one of the richest industries in the history of the world. Apple’s revenue last year was a quarter trillion dollars. That’s not something we should leave to folklore and feelings. We’re worth studying.

                                            I don’t think I understand what you’re saying. Software is expensive, and for some companies, very profitable. But would it really be more profitable if it were better studied? And what exactly does that have to do with the kinds of things that the software engineering field likes to study, such as defect rates and feature velocities? I think that in many cases, even relatively uncontroversial practices like code review are just not implemented because the people making business decisions don’t think the prospective benefit is worth the prospective cost. For many products or services, code quality (however operationalized) makes a poor experimental proxy for profitability.

                                            Inasmuch as software development is a form of industrial production, there’s a huge body of “scientific management” literature that could potentially apply, from Frederick Taylor on forward. And I would argue it generally is being applied too: just in service of profit. Not for some abstract idea of “quality”, let alone the questionable ideal of pure disinterested scientific knowledge.

                                            1. 1

                                              Mistakes are becoming increasingly costly (e.g., commercial jets falling from the sky) so understanding the process of software-making with the goal of reducing defects could save a lot of money. If software is going to “eat the world”, then the software industry needs to grow up and become more self-aware.

                                              1. 1

                                                Aviation equipment and medical devices are already highly regulated, with quality control processes in place that produce defect rates orders of magnitude less than your average desktop or business software. We already know some things about how to make high-assurance systems. I think the real question is how much of that reasonably applies to the kind of software that’s actually eating the world now: near-disposable IoT devices and gimmicky ad-supported mobile apps, for example.

                                          1. 1

                                            I use Apple Notes and Things.

                                            1. 14

                                              I think it’s good to give people feedback when they’re being harsh/unkind but I don’t think it’s worth a downvote.

                                              A downvote to me is like saying “this comment doesn’t belong here”, as opposed to “thank you for your comment but I wish you were kinder in the way you wrote it”.

                                              1. 4

                                                This is a good point. I agree that my downvote in that situation would be “please express that differently”, not “don’t express that”.

                                                On the other hand, in-thread discussion seems to risk derailing the discussion, and is a bit too explicit of a lecturing stance. (I imagine I might react more positively to some relatively subtle expression of “I really didn’t like your tone” than to being called out publicly.)

                                                1. 4

                                                  “in-thread discussion seems to risk derailing the discussion, and is a bit too explicit of a lecturing stance.”

                                                  In the political metas, most of the community already voted in favor of such comments in any thread where they thought something needed calling out. They have also been doing that for years now. So, taking this risk is already standard practice here.

                                                  Might as well follow that practice by simply pointing out the problem in a civil way. They’ll have a chance to improve.

                                                  1. 3

                                                    What about some sort of flagging mechanism for tone? Wouldn’t have to be restricted to abrasiveness; could easily include things like sexist / racist terms, etc. It’d be orthogonal to correctness (i.e. regular upvotes / downvotes), and could be filtered on separately.

                                                  1. 1

                                                    I fall into the same boat as you. I think it is worthwhile feedback, but don’t know if I think it should count as an actual downvote.

                                                  1. 4

                                                    Learn C++ in 21 Days by Jesse Liberty is one of my all-time favorite programming books, by one of my all-time favorite technical writers. Just like in all book series, “X in Action”, “Learning Y”, “Z for Dummies”, quality varies from one book to the next.

                                                    The series are marketing’s way of indicating the size of the book (short, medium, long), the assumed level of experience (beginner, intermediate, advanced), and the way it is meant to be consumed (start to finish, read what interests you, cookbook, reference, etc.).

                                                    I’m not fond of the marketing around the “For Dummies” series of books, but I’ve found a couple that were great at introducing me to (non-technical) subjects I was vaguely interested in. If you learn what the marketing is trying to convey and evaluate each book on its technical merits, your choice of books will expand, and that’s a good thing.


                                                    The novice phase of learning anything is unavoidable. Even if you write a 10,000-page book meant to be read over 10 years, you will still have novices who’ve read the first few hundred pages and “know enough to be dangerous”. The problem is not in the learning materials; it is in people’s poor ability to evaluate themselves, especially when they are novices. The solution to this problem is external evaluation. So before you allow people to write potentially dangerous code, you need to evaluate them through exams (in college), technical interviews, and supervised work.

                                                    1. 11

                                                      5 Whys is like a buggy DFS that visits only the first adjacent node and ignores the rest, and whose stop condition is reaching a stack size of 5.

                                                      1. 3

                                                        That’s what I got from the article too, even though it didn’t go so far as to suggest BFS as an alternative. It should be 1 Why And 4 Why Elses.
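
                                                        To push the analogy a little further (a toy sketch only; the cause graph and function names below are invented for illustration), the difference between the two traversals looks something like this in Python:

                                                        # Invented cause graph, purely illustrative.
                                                        causes = {
                                                            "outage": ["bad deploy", "traffic spike"],
                                                            "bad deploy": ["no staging env", "rushed review"],
                                                            "no staging env": ["budget cut"],
                                                            "traffic spike": ["marketing campaign"],
                                                        }

                                                        def five_whys(event, depth=5):
                                                            """The 'buggy DFS': always follow the first cause, ignore the rest."""
                                                            chain = [event]
                                                            while depth > 1 and causes.get(chain[-1]):
                                                                chain.append(causes[chain[-1]][0])  # first adjacent node only
                                                                depth -= 1
                                                            return chain

                                                        def all_whys(event):
                                                            """Visit every cause edge instead of just the first one."""
                                                            for cause in causes.get(event, []):
                                                                yield (event, cause)
                                                                yield from all_whys(cause)

                                                        print(five_whys("outage"))       # ['outage', 'bad deploy', 'no staging env', 'budget cut']
                                                        print(list(all_whys("outage")))  # every edge, including the ignored 'traffic spike' branch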

                                                      1. 10

                                                        Serve up the AMP page to Google bots, and the non-AMP page to everyone else.

                                                        1. 7

                                                          When you visit an AMP page from Google’s results page, it’ll have a google.com url. You can’t get the higher ranking without serving the AMP version.

                                                          1. 4

                                                            Is this really doable? I.e. do you have experience / data that shows that this is something you can do without getting penalized?

                                                            1. 3

                                                              It’s hard to do, because Google bots for crawling AMP won’t tell you whether they’re Google bots or regular users.

                                                              1. 2

                                                                Agreed. It’s hard, but if we don’t fight back Google’s going to hoover up everything.

                                                                I, for one, will not sit idly by and let the free and open Internet die. I remember the walled gardens of the ’80s and the siloed access of the ’90s with AOL.

                                                                1. 1

                                                                  I dunno. I still miss getting free frisbees in the mail every month.

                                                            2. 4

                                                              That won’t work — Google search users will get the AMP page anyway. Part of Google’s AMP implementation is that you no longer host the site yourself.

                                                              1. 2

                                                                That part would work fine. People are talking about giving AMP where AMP is not necessary, not when it is expected.

                                                            1. 1

                                                              I was baffled at first but it makes sense. This is the sort of thing more appropriate for an extension. No reason to have it as a browser feature. And as mentioned by many, JS is a lot more powerful and convenient for tracking. I doubt that anyone is using this in the real world.

                                                              1. 8

                                                                If you haven’t come across it, I also highly recommend Bob Nystrom’s book Crafting Interpreters, available for free. It has two parts: first he goes over building a tree-walking interpreter in Java, then building a bytecode compiler & VM in C.

                                                                This second part is still a work in progress, but he’s kept a strong pace; the last chapter was released about a week ago.

                                                                Thank you for the great write up. I’m on a similar learning path and I really enjoyed it and got me excited to write my own compiler as well!
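                                                                    For anyone new to the term, “tree-walking” just means evaluating the program by recursing directly over its syntax tree. A minimal sketch of the idea (in Python rather than the book’s Java, and not code from the book):

                                                                        # Expressions are tuples: ("lit", value) or (op, left, right).
                                                                        def evaluate(node):
                                                                            if node[0] == "lit":
                                                                                return node[1]
                                                                            op, left, right = node[0], evaluate(node[1]), evaluate(node[2])
                                                                            if op == "+":
                                                                                return left + right
                                                                            if op == "*":
                                                                                return left * right
                                                                            raise ValueError("unknown operator: " + op)

                                                                        # 1 + 2 * 3, written as a tree:
                                                                        print(evaluate(("+", ("lit", 1), ("*", ("lit", 2), ("lit", 3)))))  # 7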

                                                                1. 6

                                                                  I’m hoping to get the chapters done by the end of 2019. If you’re impatient, all of the code for the entire book is already done. (In fact, I wrote all of the code and carefully split it into chapters before I wrote the first sentence of prose.)

                                                                  You can see it all here: https://github.com/munificent/craftinginterpreters/tree/master/c

                                                                  1. 2

                                                                    Thanks! I had briefly noticed Crafting Interpreters before, but I’m glad to hear it’s worth a second look.

                                                                    Thanks for the kind words, and I hope you will continue on your journey!

                                                                  1. 7

                                                                        In 2005, WordPress themes were the hottest internet commodity, but they were all designed for left-to-right languages and built with CSS.

                                                                        Before then, it was easy to flip an entire site to right-to-left with <html dir="rtl">: designs were based on HTML tables and would correctly get flipped horizontally by that attribute. CSS-based designs were a regression from that point of view, because they were filled with hard-coded directions like margin-left and padding-right.

                                                                        I wrote a Python script full of regexes that converted all of those CSS properties, including combined ones like margin: 1px 2px 3px 4px;, and put it up as an online converter. It’s been running ever since. It briefly went offline a couple of years ago and I got a lot of emails asking me to fix it. I haven’t had to touch it since I wrote it, and I wouldn’t dare change any of it by now.
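                                                                        The core transformation is roughly this (a minimal sketch, not the actual converter’s code; the real thing handles many more properties and edge cases):

                                                                            import re

                                                                            def flip_css(css):
                                                                                # Swap the direction suffix: margin-left <-> margin-right, etc.
                                                                                css = re.sub(
                                                                                    r"\b(margin|padding|border)-(left|right)\b",
                                                                                    lambda m: m.group(1) + "-" + ("right" if m.group(2) == "left" else "left"),
                                                                                    css,
                                                                                )
                                                                                # Flip 4-value shorthands: top right bottom left -> top left bottom right.
                                                                                css = re.sub(
                                                                                    r"\b(margin|padding):\s*([^;\s]+)\s+([^;\s]+)\s+([^;\s]+)\s+([^;\s]+)",
                                                                                    r"\1: \2 \5 \4 \3",
                                                                                    css,
                                                                                )
                                                                                return css

                                                                            print(flip_css("margin: 1px 2px 3px 4px; padding-left: 8px;"))
                                                                            # margin: 1px 4px 3px 2px; padding-right: 8px;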

                                                                    1. 7

                                                                      This seems like more of an anti-feature to me. Maybe in limited uses it won’t be too bad?

                                                                          <joke> Maybe the Perl folks are starting to move to Ruby now? </joke>

                                                                      1. 6

                                                                            This snippet is Kotlin (which had the opportunity to make it a reserved word from the get-go), but it’s applicable, and IMHO it’s a good example of how readable, succinct code can come out of this:

                                                                            nums.filter { it > 5 }.sortedBy { -it }.map { it * 3 }
                                                                        

                                                                        I’m a fan, at least. @1 is an uglier sigil to me, but that’s history.

                                                                        1. 11

                                                                              I’ll argue that any feature that has ever been added to any programming language has at least a few good use cases. It’s not like language designers add features just for the craic; they do it to solve real problems.

                                                                          The question isn’t so much “does this language feature make a certain type of problem easier to solve?”, but rather “does this solve enough problems to offset the costs of adding it to the language?”

                                                                              Adding features to languages comes with real costs. It increases programmers’ cognitive load, it makes tools harder to write, it makes future language improvements and changes harder as features interact with features, etc.

                                                                              In this particular case, I’m not so sure it’s a good trade-off. The problem it solves is having to type an explicit parameter (|a|), which strikes me as a small problem at best.

                                                                          1. 9

                                                                            They all seem like warts to paper over a lack of proper partial application.

                                                                            1. 4

                                                                                Partly, though you can use these variables to apply arguments deeper than the first position.
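                                                                                A rough Python rendering of that point (Python has no numbered parameters, so a lambda stands in for them; the names here are made up):

                                                                                    from functools import partial

                                                                                    # Classic partial application can only fix leading arguments:
                                                                                    pow_base2 = partial(pow, 2)          # pow(2, x)

                                                                                    # A shorthand lambda (what _1-style parameters emulate) can slot
                                                                                    # the free argument into any position:
                                                                                    pow_exp2 = lambda base: pow(base, 2) # pow(x, 2)

                                                                                    print(pow_base2(10), pow_exp2(10))   # 1024 100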

                                                                            2. 6

                                                                                Swift has had $0, $1, etc. since 1.0. I thought I’d never use this syntax when I first saw it, but I was very wrong. Your example is exactly where it shines.

                                                                                On paper it looks magical, but in practice coming up with arbitrary names for a parameter is probably less clear and adds more cognitive load, both when writing and when reading the code. Here’s the same example with an explicit “good” parameter name:

                                                                                nums.filter { num -> num > 5 }.sortedBy { num -> -num }.map { num -> num * 3 }
                                                                              
                                                                              1. 5

                                                                                  Oleg Kiselyov has an interesting take on the subject. I suppose Kotlin took this from Scala’s _.

                                                                              2. 2

                                                                                  I think ‘limited uses’ is key. I expect we (team/employer) will adopt it, restricted to one-line blocks, enforced by a RuboCop rule.

                                                                                Haven’t seen Clojure mentioned yet in the comments, but that’s where I first encountered this kind of thing.

                                                                                1. 1

                                                                                  That could potentially encourage people to write ‘smarter’ and more magical one-liners.

                                                                                2. 1

                                                                                  Well, it’s really close to Perl’s $_[1], but that doesn’t work in blocks, if I recall correctly.

                                                                                  I think some people really want Perl but are afraid to admit that.

                                                                                1. 4

                                                                                  Honest question: why are C programmers so keen on libraries being a single source file? I guess it’s great if a library is simple and small, but a single file can also get very big…

                                                                                  1. 3

                                                                                    Probably because of the lack of a package and dependency management system.

                                                                                    1. 2

                                                                                      After my experience with old software and pip (specifically, trying and failing to get OsChameleon up and running), I feel safer knowing that a tarball of my code will build and work forever, given POSIX 2008 support and a good C compiler. Introducing a package management system into a programming language feels to me like a disgusting, half-baked replication of a problem that is already solved for 99% of Linux systems.

                                                                                  1. 4

                                                                                    Very interesting read. Makes me think pure functional programming is a leaky abstraction.

                                                                                    1. 6

                                                                                      I dare say most abstractions become leaky if you push on them hard enough. :) Perhaps one dimension of the utility of an abstraction is just how hard you have to push before it leaks.

                                                                                      1. 2

                                                                                        Many of the problems they discuss are specific to Haskell and GHC, rather than being a general problem with all pure functional languages. Things like: they can’t inline loops, they have to copy a string in order to pass it to a foreign function. Even the problem with not being able to write low level code in a way that will be compiled and optimised predictably is fixable with the right language abstractions. (This is on my mind because I’m working on fixing some of these issues in my own pure functional language.)

                                                                                        1. 2

                                                                                          I get where you’re coming from, but in my experience Haskell is a lot leakier than others due to thunking and laziness. OCaml is a functional language, but I don’t find it any leakier than, say, Python, because it’s strict and has a straightforward runtime. (Which isn’t to say Haskell is bad/slow/whatever, just that the abstraction tower gets a lot higher the moment you pull laziness into the picture.)
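                                                                                          A loose Python analogy to that kind of laziness leak (generators defer work the way thunks do, so costs and errors surface far from their source; the example is made up):

                                                                                              def squares(xs):
                                                                                                  # Nothing is computed here; like a thunk, work is deferred.
                                                                                                  return (x * x for x in xs)

                                                                                              gen = squares([1, 2, "three"])   # no error yet...
                                                                                              try:
                                                                                                  for value in gen:            # prints 1, 4, then raises
                                                                                                      print(value)
                                                                                              except TypeError as exc:
                                                                                                  print("error surfaced only at consumption time:", exc)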

                                                                                        1. 7

                                                                                          From the README:

                                                                                          pydis is an experiment to disprove some of the falsehoods about performance and optimisation regarding software and interpreted languages in particular.

                                                                                          Unfortunately many programmers [..] spend countless hours by making life harder for themselves in the name of marginal performance gains [..]

                                                                                          The aim of this exercise is to prove that interpreted languages can be just as fast as C

                                                                                            Okay, fair enough, but a SET operation is 80% of the speed of Redis, a GET is 60%, and others are as low as 40%!

                                                                                          Even the best case is not “marginal performance gains”. Perhaps the project will improve, but in its current state it seems to disprove the point it’s trying to make.

                                                                                            That doesn’t mean I completely disagree with the point as such. But Python does come with a real, measurable performance cost (one that may be offset by making it easier to develop faster algorithms, which is why Mercurial is faster than git), and attempting to deny that seems a bit strange to me.

                                                                                          1. 10

                                                                                              I poked around a bit, and he’s using the hiredis Python package to parse requests, which is a thin wrapper around the C library, so he’s not actually doing everything in Python. Also, his implementations aren’t functional matches: he doesn’t update any query data, for example. So the actual Python code looks faster than it really is here.

                                                                                            I’d cut him a bit of slack; it looks like he’s still early in Uni and probably doesn’t know much better.

                                                                                              Tangent: Python is a bad language for proving how fast you are; you’d probably want to try LuaJIT or K.

                                                                                            1. 4

                                                                                                Update: I raised an issue, and it was closed as “invalid” because hiredis is part of the Python ecosystem. I don’t think this is an accurate benchmark, or that it will become one.

                                                                                              1. 3

                                                                                                  Look at it from the point of view of a developer writing a Python web app. The main takeaway may be that an in-app cache is all you need; see the sketch below.
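                                                                                                  A minimal sketch of that idea (function name and numbers are made up; for many apps a process-local memo like this removes the need for a separate cache server):

                                                                                                      import functools, time

                                                                                                      @functools.lru_cache(maxsize=4096)
                                                                                                      def expensive_lookup(key):
                                                                                                          time.sleep(0.05)           # stand-in for a slow query
                                                                                                          return key.upper()

                                                                                                      expensive_lookup("user:42")    # slow on the first call
                                                                                                      expensive_lookup("user:42")    # served from the in-process cache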

                                                                                              2. 2

                                                                                                  Totally agree. He says “marginal”, then in the same breath reports a 40% drop in performance. 40% is not marginal, period.

                                                                                                1. 7

                                                                                                  Getting 60% of obsessively-optimized C performance with idiomatic Python would be unbelievably good, like revolutionarily good. It’s much more likely he’s made a benchmarking mistake.

                                                                                                  1. 1

                                                                                                      Interesting theory regarding a benchmarking mistake. Isn’t it possible, though, that this is simply because the Python interpreter itself is obsessively-optimized C for running this sort of idiomatic Python code?

                                                                                                      Okay, maybe not obsessively optimized, but still pretty decent.

                                                                                                    1. 2

                                                                                                        It’d be revolutionary because in most other cases Python gets nowhere near that close. In the benchmarks game, at least, you see it barely crack 1-10% of C’s speed.