1. 18

    All of these articles are frustrating because they use different environments and test sets and none of the ones I’ve read have posted the test sets up. Some people use random characters, some people use existing files. Some people use files of 1 MiB, some 100 MiB, some several GiB in size. Not only that, but the people programming the replacements don’t even normalize for the difference in machine/processor capability by compiling the competitors and GNU wc from scratch. The system wc is likely to be compiled differently depending on your machine. The multithreaded implementations are going to perform differently depending on whether you’re running Chrome when you test the app, etc.

    This would easily be solved by using the same distribution as a live USB, sharing testing sets, and compiling things from scratch with predefined options, but nobody seems to want to go to that much effort to get coherent comparisons.

    1. 7

      I tested this on a fresh install of Fedora 31, so I didn’t really see any benefit of running it on a LiveUSB. As I mentioned in the article, the wc implementation I used for comparison has been compiled locally with gcc 9.2.1 and -O3 optimizations. I’ve also listed my exact system specifications there. I’ve used the unprocessed enwik9 dataset (Wikipedia dump), truncated to 100 MB and 1 GB.
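      For anyone reproducing the setup, the truncation itself is a one-liner; a minimal sketch, assuming the raw enwik9 dump sits in the current directory (the output file names are made up):

      ```sh
      # Produce the truncated test files described above.
      head -c 100000000 enwik9 > enwik9-100mb   # first 100 MB
      head -c 1000000000 enwik9 > enwik9-1gb    # first 1 GB
      ```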

      I understand your frustrations with the previous posts, but I’ve tried to make my article as unambiguous as possible. Do give it a read; if you have any further suggestions or comments, I’d be happy to hear them!

      1. 8

        These posts are all pointless because wc doesn’t represent high performance C code. If anyone cared about optimizing wc, it would use SIMD extensions like AVX to count multiple chars per cycle and trash all these blog posts (edit: apparently someone did, see this post on lobste.rs).

        The real takeaway: all these languages are fast enough for general purpose use, because they beat a C program that everyone considers fast enough for general purpose use.

        1. 8

          Someone did write a C version with SIMD. They got a ~100x speedup over wc.

          1. 5

            So they’re not pointless. The value in these posts (this one included) is that they describe how to solve and optimize a problem in various languages.

        2. 3

          Most fail to mention the test setup’s locale as well! GNU’s version of wc, at least, uses the definition of “whitespace” from your current locale when counting words, while the linked Go implementation hard-codes whitespace as “ \t\n\r\v\f”. Whether this impacts speed and/or correctness depends on your locale.

          1. 5

            As mentioned in the article, the test files are us-ascii encoded. I’m comparing with the OS X implementation, not GNU, and I have used the same definition of whitespace as they do. I didn’t mention this in the post for the sake of brevity.

            1. 3

              Well, unless you remove the wide-character whitespace function call (iswspace) from the C version and replace it with an if/case statement as in the Go version. Then the C version is faster than the Go version, though by a small margin (they probably compile to similar machine code):

              https://lobste.rs/s/urrnz6/beating_c_with_70_lines_go#c_flic9z
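              For illustration, a hard-coded ASCII whitespace check of the kind being compared (a minimal sketch in Go; the function name is made up, and this is not the linked implementation):

              ```go
              package main

              import "fmt"

              // isASCIISpace hard-codes the whitespace set " \t\n\r\v\f" mentioned
              // upthread, instead of consulting the locale the way the C version's
              // iswspace() call does.
              func isASCIISpace(b byte) bool {
                  switch b {
                  case ' ', '\t', '\n', '\r', '\v', '\f':
                      return true
                  }
                  return false
              }

              func main() {
                  fmt.Println(isASCIISpace('\t'), isASCIISpace('x')) // true false
              }
              ```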

              Still, the parallelization done in Go is nice!

        1. 2

          Not to discourage the more elaborate solutions, but a comment on the simple solution discussed early on, of just incrementally increasing the size of the integers being searched over:

          Unfortunately, when a program is restarted, we lose its progress. Regions that were already explored are explored again. If we increase int to int9, Sentient finds new solutions but it also finds all the same solutions again. This is wasteful.

          Wasteful in a sense, but not that wasteful, especially if you increment in some reasonable step size, like say 8 bits at a time. For example if you first search in an int8, then increment to int16 and restart the search, you do re-visit the int8 part of the space, but that’s only 0.39% of the int16 search space. If you increase again to int24 and restart, you’ll now have wastefully triple-searched 0.0015% of the search space, and double-searched 0.39% of the search space. That’s… not that much waste. This is essentially the same reason that iterative-deepening search isn’t that wasteful.
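          A quick check of that arithmetic as a tiny Go program (the percentages match the ones quoted above):

          ```go
          package main

          import "fmt"

          func main() {
              int8Space := float64(uint64(1) << 8)   // 256 candidate values
              int16Space := float64(uint64(1) << 16) // 65,536
              int24Space := float64(uint64(1) << 24) // 16,777,216

              fmt.Printf("int8 region within int16:  %.2f%%\n", 100*int8Space/int16Space)  // ~0.39%
              fmt.Printf("int8 region within int24:  %.4f%%\n", 100*int8Space/int24Space)  // ~0.0015%
              fmt.Printf("int16 region within int24: %.2f%%\n", 100*int16Space/int24Space) // ~0.39%
          }
          ```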

          1. 1

            This is very true! I do this a lot today manually and it works pretty well. Often the simple solutions are best.

          1. 1

            Weekend project to learn how to query APIs in JS and see if I can get what I want out of Wikidata. For a project I’m working on it would be nice to be able to put in arbitrary terms like “rock” or “Obama” or “grasshopper” and get an image back. Years ago people used to use the shut-down Google Image Search API for this, but Wikidata is nonprofit, will probably be around for a while, and arguably returns slightly better curated results, when it has them. So far it seems probably usable for my purposes. I thought I’d throw up a little web demo of the results so far. You can enter a word in one of several languages and (maybe) get an image back.

            There’s code at the end but this is the first thing I’ve actually written in JS so it may or may not be any good. It does seem to work in both Firefox and Chrome at least.

            1. 2

              Oh, nice, seems like a good reading follow-up to this post.

              1. 23

                Oh dang another essay on empirical software engineering! I wonder if they read the same sources I did

                Reads blog

                You watched the conference talk “What We Know We Don’t Know”, by Hillel Wayne, who, also disturbed by software’s apparent lack of scientific foundation, found and read as many scholarly papers as he could find. His conclusions are grim.

                I think I’m now officially internet famous. I feel like I crossed a threshold or something :D

                So I’m not sure how much of this is frustration with ESE in general or with me in particular, but a lot of quotes are about my talk, and so I’m not sure if I should be defending myself? I’m gonna err on the side of defending myself, mostly because it’s an excuse to excitedly talk about why I’m so fascinated by empirical engineering.


                One thing I want to open with. I’ve mentioned a couple of times on Lobsters that I’m working on a long term journalism project. I’m interviewing people who worked as “traditional” engineers, then switched to software, and what they see as the similarities and differences. I’ve learned a lot from this project, but one thing in particular stands out: we are not special. Almost everything we think is unique about software, from the rapid iteration to clients changing the requirements after we’ve released, happens all the time in other fields.

                So, if we can’t empirically study software engineering, it would follow that we can’t empirically study any kind of engineering. If “you can’t study it” only applied to software, that would make software Special. And everything else people say about how software is Special turns out to be wrong, so I think the same is true here.

                I haven’t interviewed people outside of engineering, but I believe it goes even further: engineering isn’t special. If we can’t study engineers, then we can’t study lawyers or nurses or teachers or librarians. Human endeavor is incredibly complex, and every argument we can make about why studying software is impossible extends to any other job. I fundamentally reject that. I think we can usefully study people, and so we can usefully study software engineers.

                Okay so now for individual points. There’s some jank here, because I didn’t edit this a whole lot and didn’t polish it at all.

                You were disappointed with Accelerate: The Science of Lean Software and DevOps. You agreed with most of its prescriptions. It made liberal use of descriptive statistics.

                Accelerate’s research is exclusively done by surveying people. This doesn’t mean it’s not empirical- as I say in the talk, qualitative information is really helpful. And one of my favorite examples of qualitative research, the Gamasutra Study on Crunch Mode, uses a similar method. But it’s far from being settled, and it bothers me that people use Accelerate as “scientifically proven!!!”

                1. Controlled experiments are typically nothing like professional programming environments […] So far as I know, no researcher has ever gathered treatment and control groups of ten five-developer teams each, put them to work M-F, 9-5, for even a single month, in order to realistically simulate the conditions of a stable, familiar team and codebase.

                You’d be surprised. “Two comparisons of programming languages”, in “making software”, does this with nine teams (but only for one day). Some labs specialize in this, like SIMULA lab. Companies do internal investigations on this- Microsoft and IBM especially have a lot of great work in this style.

                But regardless of that, controlled experiments aren’t supposed to be holistic. They test what we can, in a small context, to get solid data on a specific thing. Like VM Warmup Blows Hot and Cold: in a controlled environment, how consistent are VM benchmarks? Turns out, not very! This goes against all of our logic and intuition, and shows the power of controlled studies. Ultimately, though, controlled studies are a relatively small portion of the field, just as they’re a small portion of most social sciences.

                For that matter, using students is great for studies on how students learn. There’s a ton of amazing research on what makes CS concepts easier to learn, and you have to use students for that.

                1. The unpredictable dynamics of human decision-making obscure the effects of software practices in field data. […] This doesn’t hold for field data, because real-life software teams don’t adopt software practices in a random manner, independent from all other factors that might potentially affect outcomes.

                This is true for every form of human undertaking, not just software. Can we study teachers? Can we study doctors and nurses? Their world is just as chaotic and dependent as ours is. Yet we have tons of research on how educators and healthcare professionals do their jobs, because we collectively agree that it’s important to understand those jobs better.

                One technique we can use is cross-correlating among many different studies on many different groups. Take the question “does Continuous Delivery help”. Okay, we see that companies that practice it have better outcomes, for whatever definition of “outcomes” we’re using. Is that correlation or causation? Next we can look at “interventions” where a company moved to CD and see how it changed their outcomes. We can see what practices all of the companies share and where they differ, to see what cluster of other explanations we have. We can examine companies where some teams use CD and some teams do not, and correlate their performance. We can look at what happens when people move between the different teams. We can look at companies that moved away from CD.

                We’re not basing our worldview off a single study. We’re doing many of them, in many different contexts, to get different facets of what the answer might actually be. This isn’t easy! But it’s worth doing.

                1. The outcomes that can be measured aren’t always the outcomes that matter. […] So in order to effectively inform practice, research needs to ask a slightly different, more sophisticated question – not e.g. “what is the effect software practice X has on ‘defect rate’”, but “what is the effect software practice X has on ‘defect rate per unit effort’”. While it might be feasible to ask this question in the controlled experiment setting, it is difficult or impossible to ask of field data.

                Pretty much all studies take this as a given. When we study things like “defect rate”, we’re always studying it in the context of unit time or unit cost. Otherwise we’d obviously just use formal verification for everything. And it’s totally feasible to ask this of field data. In some cases, companies are willing to instrument themselves- see TSP or the NASA data sets. In other cases, the data is computable- see research on defect rates due to organizational structure and code churn. Finally, we can cross-correlate between different projects, as is often done with repo mining.

                These are hard problems, certainly. But lots of things are “hard problems”. It’s literally scientists’ jobs to figure out how to solve these problems. Just because we, as layfolk, can’t figure out how to solve these problems doesn’t mean they’re impossible to solve.

                1. Software practices and the conditions which modify them are varied, which limits the generality and authority of any tested hypothesis

                This is why we do a lot of different studies and test a lot of different hypotheses. Again, this is an accepted fact in empirical research. We know it’s hard. We do it anyway.

                But if you’re holding your breath for the day when empirical science will produce a comprehensive framework for software development – like it does for, say, medicine – you will die of hypoxia.

                A better analogue is healthcare, the actual system of how we run hospitals and such. That’s in the same boat as software development: there’s a lot we don’t know, but we’re trying to learn more. The difference is that most people believe studying healthcare is important, but that studying software is not.

                Is this cause for despair? If science-based software development is off the table, what remains? Is it really true as Hillel suggests, that in the absence of science “we just don’t know” anything, and we are doomed to an era of “charisma-driven development” where the loudest opinion wins, and where superstition, ideology, and dogmatism reign supreme?

                The lack of empirical evidence for most things doesn’t mean we’re “doomed to charisma-driven development.” Rather it’s the opposite: I find the lack of evidence immensely freeing. When someone says “you are unprofessional if you don’t use TDD” or “Dynamic types are immoral”, I know, with scientific certainty, that they don’t actually know. They just believe it. And maybe it’s true! But if they want to be honest with themselves, they have to accept that doubt. Nobody has the secret knowledge. Nobody actually knows, and we all gotta be humble and honest about how little we know.

                Of course not. Scientific knowledge is not the only kind of knowledge, and scientific arguments are not the only type of arguments. Disciplines like history and philosophy, for instance, seem to do rather well, despite seldom subjecting their hypotheses to statistical tests.

                Of course science isn’t the only kind of knowledge! I just gave a talk at Deconstruct on the importance of studying software history. My favorite software book is Data and Reality, which is a philosophical investigation into the nature of information representation. My claim is that science is a very powerful form of knowledge that we as software folk not only neglect, but take pride in our neglecting. It’s like, yes, we don’t just have science, we have history and philosophy. But why not use all three?

                Your decision to accept or reject the argument might be mistaken – you might overlook some major inconsistency, or your judgement might be skewed by your own personal biases, or you might be fooled by some clever rhetorical trick. But all in all, your judgement will be based in part on the objective merit of the argument

                Of course we can do that. Most of our knowledge will be accumulated this way, and that’s fine. But I think it’s a mistake to be satisfied with that. For any argument in software, I can find two experts, giants in their fields, who have rigorous arguments and beautiful narratives… that contradict each other. Science is about admitting that we are going to make mistakes, that we’re going to naturally believe things that aren’t true, no matter how mentally rigorous we try to be. That’s what makes it so important and so valuable. It gives us a way to say “well you believe X and I believe not X, so which is it?”

                Science – or at least a mysticized version of it – can be a threat to this sort of inquiry. Lazy thinkers and ideologues don’t use science merely as a tool for critical thinking and reasoned argument, but as a substitute. Science appears to offer easy answers. Code review works. Continuous delivery works. TDD probably doesn’t. Why bother sifting through your experiences and piecing together your own narrative about these matters, when you can just read studies – outsource the reasoning to the researchers? […] We can simply dismiss them as “anti-science” and compare them to anti-vaxxers. […] I witnessed it play out among industry leaders in my Twitter feed, the day after I started drafting this post.

                I think I know what you’re referencing here, and if it’s what I think it is, yeah that got ugly fast.

                Regardless of how Thought Leaders use science, my experience has been the opposite of this. Being empirical is the opposite of easy. If I wanted to not think, I’d say “LOGICALLY I’m right” or something. But I’m an idiot and want to be empirical, which means reading dozens of papers that are all maddeningly contradictory. It means going through papers agonizingly carefully because the entire thing might be invalidated by an offhand remark.[1] It means reading papers’ references, and the references’ references, and trawling for followup papers, and reading the followup papers’ other references. It means spending hours hunting down preprints and emailing authors because most of the good stuff is locked away by the academic paper hoarders.

                Being empirical means being painfully aware of the cognitive dissonance in your head. I love TDD. I recommend it to beginners all the time. I think it makes me a better programmer. At the same time, I know the evidence for it is… iffy. I have to accept that something I believe is mostly unfounded, and yet I still believe in it. That’s not the easy way out, that’s for sure!

                And even when the evidence is in your favor, the final claim is infuriatingly nuanced. Take code review! “Code Review works”. By works, I mean “in most controlled studies and field studies, code review finds a large portion of the extant bugs in reviewed code in a reasonable timeframe. But most of the comments in code review are not bug-finding, but code quality things, about 3 code improvements per 1 bug usually. Certain things make CR better, and certain things make it a lot worse, and developers often complain that most of the code review comments are nitpicks. Often CRs are assigned to people who don’t actually know that area of the codebase well, which is a waste of time for everyone. There’s a limit to how much people can CR at a time, meaning it can easily become a bottleneck if you opt for 100% review coverage.”

                That’s a way more nuanced claim than just “code review works!” And it’s way, way more nuanced than about 99% of the Code Review takes I see online that don’t talk about the evidence. Empiricism means being more diligent and putting in more work to understand, not less.


                So one last thought to close this out. Studying software is hard. People bring up how expensive it is. And it is expensive, just as it’s expensive to study people in general. But here’s the thing. We are one of the richest industries in the history of the world. Apple’s revenue last year was a quarter trillion dollars. That’s not something we should leave to folklore and feelings. We’re worth studying.

                [1]: I recently read one paper that looked solid and had some really good results… and one sentence in the methodology was “oh yeah and we didn’t bother normalizing it”

                1. 3

                  Hi Hillel! I’m glad you found this, and thank you for taking the time to respond.

                  I’m not sure you necessarily need to mount a defense, either. I didn’t consciously intend to set your talk up as the antagonist in my post, but I realize this is sort of what I did. The attitude I’m trying to refute (that empirical science is the only source of objective knowledge about software) is somewhat more extreme than the position you advocate. And the attitude you object to (that software “can’t be studied” empirically, and nothing can be learned this way) is certainly more extreme than the position I hoped to express. I think in the grand scheme of things we largely share the same values, and our difference of opinion is rather esoteric and mostly superficial. That doesn’t mean it’s not interesting to debate, though.

                  Re: Omitted variable bias

                  You seemed to suggest that research could account for omitted variable bias by “cross-correlating” studies

                  • across different companies
                  • within one same company before and after adopting/disadopting the practice
                  • across different teams within the same company.

                  I submit to you this is not the case. Continuing with the CD example, suppose CD doesn’t improve outcomes but the “trendiness” that leads to it does. It is completely plausible for

                  • trendy companies to be more likely to adopt CD than non-trendy companies
                  • trendy teams within a company to be more likely to adopt CD than non-trendy teams
                  • a company that is becoming more trendy is more likely to adopt CD and to be trendier after the adoption than before
                  • a company that is becoming less trendy is more likely to disadopt CD and be trendier before the disadoption than after

                  If these hold, then all of the studies in the “cross-correlation” you describe will still misattribute an effect to CD.

                  You can’t escape omitted variable bias just by collecting more data from more types of studies. In order to legitimately address it, you need to do one of:

                  • Find some sort of data that captures “trendiness” and include it as a statistical control.
                  • Find an instrumental variable
                  • Find data on teams within a company that were randomly assigned to CD (so that trendiness no longer correlates with the decision to adopt).

                  If you don’t address a plausible omitted variable bias in one of these ways, then basically you have no guarantee that the effect (or lack of effect) you measured was actually the effect of the practice and not the effect of whatever social conditions or ideology led to the adoption of your practice (or something else that those social conditions caused). This is a huge threat to validity, especially to “code mining” studies, whose only dataset is a git log and which therefore have no hope of capturing or controlling for the social or human drivers behind the practice. To be totally honest, I assign basically zero credibility to the empirical argument of any “code mining” study for this reason.
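                  To make the misattribution concrete, here is a toy simulation (all numbers and names hypothetical): CD has zero causal effect, outcomes depend only on “trendiness”, and yet CD still correlates strongly with outcomes in the “field data”:

                  ```go
                  package main

                  import (
                      "fmt"
                      "math"
                      "math/rand"
                  )

                  // corr computes the sample Pearson correlation of x and y.
                  func corr(x, y []float64) float64 {
                      n := float64(len(x))
                      var sx, sy, sxx, syy, sxy float64
                      for i := range x {
                          sx, sy = sx+x[i], sy+y[i]
                          sxx, syy, sxy = sxx+x[i]*x[i], syy+y[i]*y[i], sxy+x[i]*y[i]
                      }
                      cov := sxy/n - (sx/n)*(sy/n)
                      return cov / math.Sqrt((sxx/n-(sx/n)*(sx/n))*(syy/n-(sy/n)*(sy/n)))
                  }

                  func main() {
                      rng := rand.New(rand.NewSource(1))
                      const n = 10000
                      cd, outcome := make([]float64, n), make([]float64, n)
                      for i := 0; i < n; i++ {
                          trendiness := rng.Float64()
                          if rng.Float64() < trendiness { // trendier teams adopt CD more often
                              cd[i] = 1
                          }
                          // Outcomes depend on trendiness alone, never on cd[i].
                          outcome[i] = trendiness + 0.1*rng.NormFloat64()
                      }
                      // Prints a correlation well above zero despite CD having no effect.
                      fmt.Printf("corr(CD, outcome) = %.2f\n", corr(cd, outcome))
                  }
                  ```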

                  Re: The analogy to medicine

                  As @notriddle seemed to be hinting at, professions comprehensively guided by science are the exception, not the rule. Science-based lawyering seems… unlikely. Science-based education is not widely practiced, and is controversial in any case. Medicine seems to be the major exception. It’s worth exploring the analogy/disanalogy between software and medicine in greater detail. Is software somehow inherently more difficult to study than medicine?

                  Maybe not. You brought up two good points about avenues of software research.

                  Companies do internal investigations on this- Microsoft and IBM especially has a lot of great work in this style.

                  and

                  In some cases, companies are willing to instrument themselves- see TSP or the NASA data sets.

                  I think analysis of this form is miles more persuasive than computer lab studies or code mining. If a company randomly selects certain teams to adopt a certain practice and certain teams not to, this solves the realism problem because they are, in fact, real software teams. And it solves the omitted variable bias problem because the practice was guaranteed to have been adopted randomly. I think much of the reason medicine has been able to incorporate empirical studies so successfully is because hospitals are so heavily “instrumented” (as you put it) and willing to conduct “clinical trials” where the treatment is randomly assigned. I’m quite willing to admit that we could learn a lot from empirical research if software shops were willing to instrument themselves as heavily as hospitals, and begin randomly designating teams to adopt practices they want to study. I think it’s quite reasonable to advocate for a movement in that direction.

                  But whether or not we should advocate for better data and more research is orthogonal to the main concern of my post: in the meantime, while we are clamoring for better data, how ought we evaluate software practices? Do we surrender to nihilism because the data doesn’t (yet) paint a complete picture? Do we make wild extrapolations from the faint picture the data does paint? Or should we explore and improve the body of “philosophical” ideas about programming, developed by programmers through storytelling and reflection on experience?

                  It is very important to do that last thing. I wrote my post because, for a time, my own preoccupation with the idea that only scientific inquiry had an admissible claim to objective truth prevented me from enjoying and taking e.g. “A Philosophy of Software Design” seriously (because it was not empirical), and realizing what a mistake this was was somewhat of a personal revelation.

                  Re: Epistemology

                  Science is about admitting that we are going to make mistakes, that we’re going to naturally believe things that aren’t true, no matter how mentally rigorous we try to be. That’s what makes it so important and so valuable. It gives us a way to say “well you believe X and I believe not X, so which is it?”

                  Science won’t rescue you from the fact that you’re going to believe things that aren’t true, no matter how mentally rigorous you try to be. Science is part of the attempt to be mentally rigorous. If you aren’t mentally rigorous and you do science, your statistical model will probably be wrong, and omitted variable bias will lead you to conclude something that isn’t true.

                  Science, to me, is merely a toolbox for generating persuasive empirical arguments based on data. It can help settle the debate between “X” and “not X” if there are persuasive scientific arguments to be found for X, and there are not persuasive scientific arguments to be found for “not X” – but just as frequently, there turn out to be persuasive scientific arguments for both “X” and “not X” that cannot be resolved empirically and must be resolved theoretically/philosophically. (Or – as I think describes the state of software research so far – there turn out to be persuasive scientific arguments for neither “X” nor “not X”, and again, the difference must be resolved theoretically/philosophically).

                  [Being empirical]… means reading dozens of papers that are all maddeningly contradictory. It means going through papers agonizingly carefully because the entire thing might be invalidated by an offhand remark.[1] It means reading papers’ references, and the references’ references, and trawling for followup papers, and reading the followup papers’ other references.

                  That’s a way more nuanced claim than just “code review works!” And it’s way, way more nuanced than about 99% of the Code Review takes I see online that don’t talk about the evidence. Empiricism means being more diligent and putting in more work to understand, not less.

                  I value this sort of disciplined thinking – but I think it’s a mistake to brand this as “science” or “being empirical”. After all, historians and philosophers also agonize through papers, crawling the reference tree, and develop highly nuanced, qualified claims. There’s nothing unique to science about this.

                  I think we should call for something broader than merely disciplined empirical thinking. We want disciplined empirical and philosophical/anecdotal thinking.

                  My ideal is that software developers accept or reject ideas based on the strength or weakness of the argument behind them, rather than whims, popularity of the idea, or the perceived authority or “charisma” of their advocates. For empirical arguments, this means doing what you described – reading a bunch of studies, paying attention to the methodology and the data description, following the reference trail when warranted. For philosophical/anecdotal arguments, this means doing what I described – mentally searching for inconsistencies, evaluating the argument against your own experiences and other evidence you are aware of.

                  Occasionally, this means the strength of a scientific argument must be weighed against a philosophical/anecdotal argument. The essence of my thesis is that, sometimes, a thoughtful, well-explained story by a practitioner can be a stronger argument than an empirical study (or more than one) with limited data and generality. “X worked for us at Dropbox and here is my analysis of why” can be more persuasive to a practitioner than “X didn’t appear to work for undergrad projects at 12 institutions, and there is not a correlation between X and good outcome Y in a sampling of Github Repos”.

                  1. 2

                    Hi, thanks for responding! I think we’re mostly on the same page, too, and have the same values. We’re mostly debating degrees and methods here. I also agree that the issues you raise make things much more difficult. My stance is just that while they do make things more difficult, they don’t make it impossible, nor do they make it not worth doing.

                    Ultimately, while scientific research is really important, it’s only one means of getting knowledge about something. I personally believe it’s an incredibly strong form- if philosophy makes one objective claim and science makes another, then we should be inclined to look for flaws in the philosophy before looking for flaws in the science. But more than anything else, I want defence in depth. I want people to learn the science, and the history, and the philosophy, and the anthropology, and the economics, and the sociology, and the ethics. It seems to me that most engineers either ignore them all, or care about only one or two of these.

                    (Anthro/econ/soc are also sciences, but I’m leaving them separate because they usually make different claims and use different ((scientific!)) methods than what we think of as “scientific research” on software.)

                    One thing neither of us have brought up, that is also important here: we should know the failure modes of all our knowledge. The failure modes of science are really well known: we covered them in the article and our two responses. If we want to more heavily lean on history/philosophy/anthropology, we need to know the problems with using those, too. And I honestly don’t know them as well as I do the problems with scientific knowledge, which is one reason I don’t push it as hard- I can’t tell as easily when I should be suspicious.

                  2. 3

                    What a fantastic response.

                    When doctors get involved in fields such as medical education or quality improvement and patient safety, they often have a similar reaction to Richard’s. The problem is in thinking that the only valid way to understand a complex system is to study each of its parts in isolation, and that if you can’t isolate them, you should just give up.

                    As Hillel illustrated nicely here, you can in fact draw valid conclusions from studying “complex systems in the wild”. While this is a “messier” problem, it is much more interesting. It requires a lot of creativity but also more rigor in justifying and selecting the methodology, conducting the study, and interpreting the results. It is very easy to do a subpar study in those fields, which reinforces the perception that the fields are “unscientific”.

                    A paper by D. C. Phillips, titled Research in the Hard Sciences, and in Very Hard “Softer” Domains, discusses this issue. Unfortunately, it’s behind a paywall.

                    1. 3

                      Can we study teachers? Can we study doctors and nurses?

                      The answer to that question might be “no”.

                      When you’re replying to an article that’s titled “The False Promise of Science”, with a bunch of arguments against empirical software engineering that seem applicable to other fields as well, and your whole argument is basically an analogy, you should probably consider the possibility that Science is Just Wrong and we should all go back to praying to the sun.

                      The education field is at least as fad- and ideology-driven as software, and the medical field has cultural problems and studies that don’t reproduce. Many of the arguments given in this essay are clearly applicable to education and medicine (though not all of them obviously are, I can easily come up with new arguments for both fields). The fundamental problem with applying science to any field of endeavor is that it’s anti-situational at the core. The whole point of The Scientific Method is to average over all but a few variables, but people operating in the real world aren’t working with averages, they’re working with specifics.

                      The argument that software isn’t special cuts both ways, after all.


                      I’m not sure if I actually believe that, though.

                      The annoying part about this is that, as reasonably compelling as it’s possible to make the “science sucks” argument sound, it’s not very conducive to software engineering, where the whole point of the practice is to write generalized algorithms that deal with many slight variants of the same problem, so that humans don’t have to be involved in every little decision. Full-blown primitivism, where you reject Scalable Solutions(R) entirely, has well-established downsides like heightened individual risk; one of the defining characteristics of modernism is risk diffusion, after all.

                      Adopting hard-and-fast rules is just a trade-off. You make the common case simpler, and you lose out in the special cases. This is true both within the software itself (it’s way easier to write elegant code if you don’t have weird edge cases) and with the practice. The alternative, where you allow for exceptions to the rules, is decried as bad for different reasons.

                      1. 6

                        That is absolutely a valid counterargument! In response, I’d like to point out that we have learned a lot about those fields! Just a few examples:

                        I don’t know very much about classroom teaching or nursing, so I can’t deep-dive into that research as easily as I can software… but there are many widespread and important studies in both fields that give us actionable results. If we can do that with nursing, why not software?

                        1. 1

                          To be honest, I think you’re overselling what empirical science tells us in some of these domains, too. Take the flipped classroom one, since it’s an example I’ve seen discussed elsewhere. The state of the literature summarized in that post is closer to: there is some evidence that this might be promising, but confidence is not that high, particularly in how broadly this can be interpreted. Taking that post on its own terms (I have not read the studies it cites independently), it suggests not much more than that overall reported studies are mainly either positive or inconclusive. But it doesn’t say anything about these studies’ generalizability (e.g. whether outcomes are mediated by subject matter, socioeconomic status, country, type of institution, etc.), suggests they’re smallish in number, suggests they’ve not had many replication attempts, and pretty much outright says that many studies are poorly designed and not well controlled. It also mentions that the proxies for “learning” used in the studies are mostly very short-term proxies chosen for convenience, like changes in immediate test scores, rather than the actual goal of longer-term mastery of material.

                          Of course that’s all understandable. Gold-standard studies like those done in medicine, with (in the ideal case) some mix of preregistration, randomized controlled trials, carefully designed placebos, and longitudinal follow-up across multi-demographic, carefully characterized populations, etc., are logistically massive undertakings, and expensive, so basically not done outside of medicine.

                          Seems like a pretty thin rod on which to hang strong claims about how we ought to reform education, though. As one input to qualitative decision-making, sure, but one input given only its proper weight, in my opinion significantly less than we’d weight the much better empirical data in medicine.

                      2. 2

                        Dammit, man. That was a great response. I don’t think I’ll ever comment anything anywhere just so my comment won’t be compared to this.

                        1. 1

                          My favorite software book is Data and Reality, which is a philosophical investigation into the nature of information representation.

                          A beautiful book, one of my favorites as well.

                          rest of post….

                          While I thought the article articulated something important which I agree with, its conclusion felt a bit lazy and too optimistic for my taste – I’m more persuaded by the POV you’ve articulated above.

                          While we’re making analogies, “writing software is like writing prose” seems like a decent one to explore, despite some obvious differences. Specifically relevant is the wide variety of different and successful processes you’ll find among professional writers.

                            And I think this explains why you might be completely right that something like TDD is valuable for you, even though empirical studies don’t back up that claim in general. And I don’t mean that in a soggy “everyone has their own method and they’re all equally valid” way. I mean that all of your knowledge, the way you think about programming, your tastes, your knowledge of how to practice TDD in particular, and on and on, are all inputs into the value TDD provides you.

                            Which is to say: I find it far more likely that TDD (or similar practices with many knowledgeable, experienced supporters) has highly context-sensitive empirical value rather than none at all. I don’t foresee them being one day unmasked by science as the sacred cows of religious zealots (though they may be that in some specific cases too).

                          For something like TDD, the “treatment” group would really need to be something like “people who have all been taught how to do it by the same expert over a long enough time frame and whose knowledge that expert has verified and signed off on.”

                          I’m not shilling for TDD, btw – just using it as a convenient example.

                          The broader point is that effects can be real but extremely hard to show experimentally.

                          1. 1

                            “We’re not basing our worldview off a single study. We’re doing many of them, in many different contexts, to get different facets of what the answer might actually be.”

                            That’s exactly what I do for the sub-fields I study. Especially formal proof which I don’t understand at all. Just constantly looking at what specialists did… system type/size, properties, level of automation, labor required… tells me a lot about what’s achievable and allows mix n’ matching ideas for new, high-level designs. That’s without even needing to build anything which takes a lot longer. That specialists find the resulting ideas worthwhile proves the surveys and integration strategy work.

                            So, I strongly encourage people to do a variety of focused studies followed by integrated studies on them. They’ll learn plenty. We’ll also have more interesting submissions on Lobsters. :)

                            “When someone says “you are unprofessional if you don’t use TDD” or “Dynamic types are immoral”, I know, with scientific certainty, that they don’t actually know. “

                              I didn’t think about that angle. Actually, you got me thinking maybe we can all start telling that to new programmers. They get warned the field is full of hype, trends, etc that usually don’t pan out over time. We tell them there’s little data to back most practices. Then, experienced people cutting them down or pushing them onto a new trend might have less effect. Especially on their self-confidence. Just thinking aloud here rather than committed to the idea.

                            “Science is about admitting that we are going to make mistakes”

                              I used to believe science was about finding the truth. Now I’d go further than you. Science assumes we’re wrong by default, will screw up constantly, and are too biased or dishonest to review the work alone. The scientific method basically filters bad ideas to let us arrive at beliefs that are justifiable and still might be wrong. Failure is both normal and necessary if that’s the setup.

                              The cognitive dissonance makes it really hard, like you said. I find it a bit easier to do development and review separately. One can be in go mode iterating stuff. At another time, in skeptical mode critiquing the stuff. The go mode also gives a mental break and/or refreshes the mind, too.

                            1. 1

                              You’d be surprised. “Two comparisons of programming languages”, in “making software”, does this with nine teams (but only for one day).

                                My reading (which is congruent with my experiences) indicates a newly-put-together team takes 3-6 months before productivity stabilizes. Some schools of management view this as ‘stability=groupthink, shuffle the teams every 6 months’ and some view it as ‘stability=predictability, keep them together’. However, this indicates to me that you might not be able to infer much from one day of data.

                              1. 2

                                To clarify, that specific study was about nine existing software teams- they came to the project as a team already. It’s a very narrow study and definitely has limits, but it shows that researchers can do studies on teams of professionals.

                              2. 1

                                People bring up how expensive it is. And it is expensive, just as it’s expensive to study people in general. But here’s the thing. We are one of the richest industries in the history of the world. Apple’s revenue last year was a quarter trillion dollars. That’s not something we should leave to folklore and feelings. We’re worth studying.

                                I don’t think I understand what you’re saying. Software is expensive, and for some companies, very profitable. But would it really be more profitable if it were better studied? And what exactly does that have to do with the kinds of things that the software engineering field likes to study, such as defect rates and feature velocities? I think that in many cases, even relatively uncontroversial practices like code review are just not implemented because the people making business decisions don’t think the prospective benefit is worth the prospective cost. For many products or services, code quality (however operationalized) makes a poor experimental proxy for profitability.

                                Inasmuch as software development is a form of industrial production, there’s a huge body of “scientific management” literature that could potentially apply, from Frederick Taylor on forward. And I would argue it generally is being applied too: just in service of profit. Not for some abstract idea of “quality”, let alone the questionable ideal of pure disinterested scientific knowledge.

                                1. 1

                                  Mistakes are becoming increasingly costly (e.g., commercial jets falling from the sky) so understanding the process of software-making with the goal of reducing defects could save a lot of money. If software is going to “eat the world”, then the software industry needs to grow up and become more self-aware.

                                  1. 1

                                    Aviation equipment and medical devices are already highly regulated, with quality control processes in place that produce defect rates orders of magnitude less than your average desktop or business software. We already know some things about how to make high-assurance systems. I think the real question is how much of that reasonably applies to the kind of software that’s actually eating the world now: near-disposable IoT devices and gimmicky ad-supported mobile apps, for example.

                              1. 23

                                Not good in principle, mostly because it seems pretty sloppy. In terms of impact, though, I would guess the vulnerable configuration is incredibly rare?

                                The privilege escalation scenario here is that you’ve given a user sudoers access to run commands as (ALL, !root), i.e. as any user except root. This bug lets them upgrade that into being able to run them as root, also. Is there any remotely common scenario where you would have that kind of sudoers setup? I can vaguely imagine something like that from old-school multiuser academic Unix servers, but even there it’d be a somewhat exotic setup (restricted sudoers there are typically restricted to specific users they can sudo to, like the apache user or something, or a prof being able to sudo to their students, but not to ALL, !root).
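                                 For concreteness, a hypothetical sudoers entry of the kind described (user, host, and command are all made up), along with the reported bypass (CVE-2019-14287):

                                 ```
                                 # Hypothetical sudoers entry: alice may run a command as any user
                                 # except root.
                                 alice ALL = (ALL, !root) /usr/bin/vi

                                 # The bug: requesting user ID -1 (or its unsigned form 4294967295)
                                 # slips past the !root exclusion, and the command ends up running
                                 # as root:
                                 #   sudo -u#-1 /usr/bin/vi
                                 ```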

                                1. 18

                                   If hosting a NAS at a friend’s house isn’t an option, Backblaze B2 might be a nice option ($5/TB/month)

                                  1. 8

                                    +1 for Backblaze B2. I use it with Restic and it works great.

                                    1. 7

                                      I use B2 and have been happy with it. I use rclone to interact with it.

                                      1. 3

                                        rclone seems neat. Thanks for the pointer!

                                      2. 3

                                        I’d recommend using fiber inside your house for faster backups.

                                        1. 2

                                          The only issue I found with Backblaze is that it requires your phone number upon registration.

                                           This is a serious turn-off.

                                          1. 3

                                            Why is this a problem? Genuine naive question.

                                            1. 6

                                              because they don’t need it

                                              1. 2

                                                Not OP but for me:

                                                • I don’t give my number out to anyone except people I know
                                                 • I have a Google Voice number which a lot of companies flag as “not a real number” for some reason
                                                1. 1

                                                  Spam, data collection, identification

                                                2. 1

                                                   Ouch, I didn’t know that. That’s indeed a negative. Especially when you have to link payment details anyway.

                                                3. 2

                                                  Backblaze B2 + Duplicati backs up all the non-application files on my laptop for ~$0.10/month. It’s been a few months and I don’t even think they’ve billed me yet since the monthly bill is so small.

                                                  1. 2

                                                    @timvisee after reading your and other recommendations, I’m trying out BackBlaze. I have a lot of photos, around 60,000. I’m currently backing them all up individually (as in, I just point to the folder and sync the folder). Should I be creating tars of them and backing those up? Thanks!

                                                    1. 2

                                                       Tarring would probably be faster, yes, since it avoids lots of expensive file creations.
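                                                       A minimal sketch of that approach, reusing the rclone setup mentioned upthread (archive and bucket names are made up):

                                                       ```sh
                                                       # Bundle the photos into a single archive, then upload one large
                                                       # object instead of ~60,000 small ones.
                                                       tar -czf photos.tar.gz ~/Photos
                                                       rclone copy photos.tar.gz b2:my-backup-bucket/photos/
                                                       ```

                                                       The trade-off is that restoring or updating a single photo then means fetching the whole archive.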

                                                  1. 43

                                                    I would guess that a very substantial proportion of the people who read Lobsters have heard of Timsort.

                                                    1. 22

                                                      That, and TimSort really only makes sense in certain situations:

                                                      • Comparisons have to be extremely expensive. Think dynamic dispatch. This is because TimSort itself performs a bunch of branches and integer comparisons to keep track of galloping scores and the stack invariant.

                                                           • You need a stable sort. If you don’t need a stable sort, pattern-defeating quicksort will probably do better (see the sketch below).
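                                                           A minimal illustration of the stable/unstable trade-off using Go’s standard library (for the record, Rust’s sort_unstable is a pattern-defeating quicksort):

                                                           ```go
                                                           package main

                                                           import (
                                                               "fmt"
                                                               "sort"
                                                           )

                                                           type rec struct {
                                                               key int
                                                               tag string
                                                           }

                                                           func main() {
                                                               xs := []rec{{2, "a"}, {1, "x"}, {2, "b"}}
                                                               // SliceStable keeps equal-keyed records in their original order;
                                                               // sort.Slice would be free to use a faster unstable algorithm.
                                                               sort.SliceStable(xs, func(i, j int) bool { return xs[i].key < xs[j].key })
                                                               fmt.Println(xs) // [{1 x} {2 a} {2 b}]
                                                           }
                                                           ```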

                                                      1. 11

                                                        Quicksort is the Achilles of sorting algorithms: unbeatably fast, easy to implement, in-place; but with the vulnerable heel of bad worst-case performance (the worst case being pre-sorted data in the naïve implementation) and instability.

                                                        1. 5

                                                          There’s a fairly easy fix to that, called introsort: start with quicksort, but bail out to a guaranteed O(n log n) sort like heapsort if it takes too long. In the bail-out case, you lose constant-factor performance compared to if you had used heapsort in the first place, but you avoid quicksort’s O(n^2) worst case, while still getting its good performance in non-pathological cases. It’s used in practice in .NET and some C++ STL implementations.
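                                                               A minimal introsort sketch in Go (illustrative only, not any particular library’s implementation):

                                                               ```go
                                                               package main

                                                               import (
                                                                   "fmt"
                                                                   "math/bits"
                                                               )

                                                               func introsort(xs []int) {
                                                                   depth := 2 * bits.Len(uint(len(xs))) // ~2*log2(n) recursion budget
                                                                   sortRange(xs, depth)
                                                               }

                                                               func sortRange(xs []int, depth int) {
                                                                   for len(xs) > 1 {
                                                                       if depth == 0 {
                                                                           heapsort(xs) // quicksort is taking too long: bail out
                                                                           return
                                                                       }
                                                                       depth--
                                                                       p := partition(xs)
                                                                       sortRange(xs[:p], depth)
                                                                       xs = xs[p+1:] // iterate on the right half instead of recursing
                                                                   }
                                                               }

                                                               // Lomuto partition with the last element as pivot (the naive scheme
                                                               // whose worst case is pre-sorted input, as noted above).
                                                               func partition(xs []int) int {
                                                                   pivot, i := xs[len(xs)-1], 0
                                                                   for j := 0; j < len(xs)-1; j++ {
                                                                       if xs[j] < pivot {
                                                                           xs[i], xs[j] = xs[j], xs[i]
                                                                           i++
                                                                       }
                                                                   }
                                                                   xs[i], xs[len(xs)-1] = xs[len(xs)-1], xs[i]
                                                                   return i
                                                               }

                                                               // heapsort guarantees O(n log n) regardless of input order.
                                                               func heapsort(xs []int) {
                                                                   for i := len(xs)/2 - 1; i >= 0; i-- {
                                                                       siftDown(xs, i, len(xs))
                                                                   }
                                                                   for end := len(xs) - 1; end > 0; end-- {
                                                                       xs[0], xs[end] = xs[end], xs[0]
                                                                       siftDown(xs, 0, end)
                                                                   }
                                                               }

                                                               func siftDown(xs []int, root, end int) {
                                                                   for {
                                                                       child := 2*root + 1
                                                                       if child >= end {
                                                                           return
                                                                       }
                                                                       if child+1 < end && xs[child+1] > xs[child] {
                                                                           child++
                                                                       }
                                                                       if xs[root] >= xs[child] {
                                                                           return
                                                                       }
                                                                       xs[root], xs[child] = xs[child], xs[root]
                                                                       root = child
                                                                   }
                                                               }

                                                               func main() {
                                                                   xs := []int{5, 2, 8, 1, 9, 3}
                                                                   introsort(xs)
                                                                   fmt.Println(xs) // [1 2 3 5 8 9]
                                                               }
                                                               ```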

                                                          1. 3

                                                                 Quicksort -> Heapsort is the method I used. It worked fine in practice. I love solutions like that. Another, unrelated one was sklogic’s trick of using a fast, dumb parser first to see if the input is correct. If it wasn’t, he switched to one that made error messages easier.

                                                            I bet there’s more of this stuff waiting to be found for situations where people are shoving every case into one algorithm.

                                                        2. 4

                                                          Comparisons have to be extremely expensive. Think dynamic dispatch.

                                                               That explains why Python uses it as its standard sort.

                                                          1. 9

                                                            Yeah. That’s exactly why Python uses TimSort.

                                                            More tellingly, where Rust uses an algorithm that’s related to TimSort for its stable sorting algorithm, they didn’t implement “galloping” because it’s not worth it. https://github.com/rust-lang/rust/blob/7130fc54e05e247f93c7ecc2d10f56b314c97831/src/liballoc/slice.rs#L917

                                                        3. 10

                                                          I consider myself relatively knowledgeable about many different topics of CS and had not heard of Timsort until this article. What’s the point of your comment? That the article is not worth posting as you presume that it is widely known?

                                                          1. 4

                                                            The point is that the title of the article, and of this submission, is inaccurate. I would call the title clickbait because for most readers, the article doesn’t deliver what it promises – a sorting algorithm “you’ve” never heard of. I think the article itself is fine; it’s just the title that is a lie.

                                                            1. 5

                                                              That seems to be a really low-value comment. For whom is the remark actually intended? For other crustaceans to look at, nod and agree, thinking, “yes, I too possess the superior knowledge”? Does every submission with a title that invokes “you” need to be correct and ‘deliver on its promise’ for all possible “you”s? C’mon.

                                                              1. 4

                                                                Yes, I suppose jgb could have been more explicit in why they brought up their guess. (I wrote an explanation of my interpretation of the comment, not the original comment.)

                                                                Does every submission with a title that invokes “you” need to be correct and ‘deliver on its promise’ for all possible “you”s?

                                                                I think every article with a title that invokes “you” needs to be correct and ‘deliver on its promise’ for the majority of possible “you”s in its audience. If a title says “you’ll love this” and most readers don’t love it, the title was wrong, and it wasted people’s time by getting them to open the article on false pretenses. It is up to article authors to adhere to that principle or not.

                                                                As for titles of submissions of articles with clickbait titles, there can be a conflict between submission titles that reflect the author’s intent and titles that accurately describe the article. I don’t have a simple answer as to when submission titles should differ from the article title.

                                                                1. 3

                                                                  I think every article with a title that invokes “you” needs to be correct and ‘deliver on its promise’ for the majority of possible “you”s in its audience.

                                                                  I think I agree with this, and I think my concern comes down to disagreeing instead with the notion that the majority(/“a very substantial proportion”) of Lobsters readers have heard of Timsort. Short of a poll there’s not an actual answer to that; I just felt particularly rankled because I hadn’t, and presumably if I had I wouldn’t have bothered or thought to comment myself.

                                                                  I err on the side of preserving the article title in the submission, which I think is pretty common. Accordingly, I think most Lobsters are primed to see submission titles that aren’t necessarily addressing them as Lobsters readers, but in some context that might be quite removed.

                                                          2. 2

                                                            I thought it played a pretty big role in the Oracle vs. Google lawsuit too, making it one of the more famous algorithms.

                                                            However, I see “rangeCheck” mentioned a lot, which is a trivial part of TimSort.

                                                            https://en.wikipedia.org/wiki/Oracle_America,_Inc._v._Google,_Inc.

                                                            Here it seems to cite TimSort. But for some reason I can’t find a lot of sources that talk about TimSort and the lawsuit, even though at the time I remember it being a prominent thing.

                                                            https://forums.appleinsider.com/discussion/149435/google-engineers-defend-source-code-email-in-oracle-lawsuit-over-java/p4


                                                            edit: here’s another one mentioning TimSort and the lawsuit.

                                                            https://majadhondt.wordpress.com/2012/05/16/googles-9-lines/

                                                            Googling “rangeCheck timsort lawsuit” digs up some results.

                                                          1. -1

                                                            Comments should almost never exist. They are untyped and often get outdated and then cause more harm than good. There are no refactoring tools for them. It’s generally a sign that the language is lacking necessary features or the code needs to be rewritten. An exception can be made for temporary comments like “todos”, but that’s it.

                                                            Instead of comments, make your code describe exactly what it does. useLongVariableAndMethodNames. Never have single-letter variables (yes: even use the word “index” instead of “i”). Put the time you would spend writing comments into better tests and code.

                                                            Check out the book Clean Code if you want some great advice.

                                                            1. 12

                                                              I don’t see how using “index” instead of “i” helps. If you’re obviously looping through an array, and the for loop isn’t stupidly complicated, the letter “i” conveys exactly as much information as the word “index” to the reader.

                                                              Nobody reads “for (i = 0; i < len(elements); ++i) do_thing(elements[i])” and wonders what that cryptic “i” could possibly mean.

                                                              1. -4

                                                                Next generation dev tools will rely increasingly on machine learning to aid in development. When you write “index” instead of “i”, you are providing a whole lot of additional information (reduction in entropy through narrowing of the search space). This helps folks building data driven tools.

                                                                You can start using this sort of thing today. I highly recommend TabNine.

                                                                1. 6

                                                                  I get your point, but I hope (perhaps in vain) that the next generation learns from everything the previous generations have worked to provide. For instance, the almost universal convention of i meaning ‘index’.

                                                                  My vision for next-gen development involves eDSLs and metaprogramming more than machine learning. In this case, perhaps a “comment” sublanguage that is human-readable but could be used to mechanically check the invariants and reasoning that our current comments are attempting to provide.

                                                                  1. 2

                                                                    perhaps a “comment” sublanguage that is human-readable but could be used to mechanically check the invariants

                                                                    I agree with this. In my Tree Languages I try to avoid having any global “comment” node type, and instead only add very specific comment nodeTypes to languages when necessary (such as the “todo” comment nodeType in my Grammar Language https://jtree.treenotation.org/designer/#standard%20grammar and a “reference” nodeType in a data flow language which only takes a strongly typed URL). If someone needs to add a different type of comment, there’s no mechanism to do that themselves; instead they must bring their problem upstream so we can examine it and either add some new semantics to the language or add a new subclass of comment for that particular category.

                                                              2. 19

                                                                I’d take Robert Martin with a grain of salt. While he’s a good writer, he’s also against a lot of powerful coding techniques, including strong type systems, formal verification, and property testing.

                                                                Re “make your code describe exactly what it does”: comments are really good for describing things that aren’t what the code exactly does. Descriptive code describes things at the same level as the implementation itself, and cannot speak outside that implementation. Some comments where that matters:

                                                                • “While the public method is Shutdown.run, the API call actually happens in RequestBuilder.send via ResponseParser.”
                                                                • “While this is a performance bottleneck, we’re not at a scale which it matters. If we have performance problems, optimize this first.”
                                                                • “We’re using the Runge-Kutta approximation with the Fehlberg adjustment.”
                                                                • “We do not need to call foobar() here for reasons XYZ. Please do not submit a pull request adding it.”
                                                                • “This is not the event invocation module. You want file ABC. See issue #452 for the naming reasons.”
                                                                • Any of the 2D data representations here.

                                                                Sure, comments fall out of date, but that’s less because there’s something wrong with comments and more because programmers never bothered to treat comments with dignity and care. I’ve never seen a truly self-describing codebase, but I’ve seen plenty that needed a lot more comments.
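
                                                                To make that concrete, here’s a minimal sketch (an invented example of mine, not from any particular codebase) where the comment carries information the code alone cannot:

                                                                  #include <stdint.h>
                                                                  
                                                                  /* Do not "simplify" this to (a + b) / 2: that overflows for
                                                                     values near UINT32_MAX, which is exactly the case this function
                                                                     exists to handle.  Please do not submit a pull request changing
                                                                     it. */
                                                                  uint32_t midpoint(uint32_t a, uint32_t b)
                                                                  {
                                                                      return a / 2 + b / 2 + (a & b & 1);
                                                                  }
                                                                  

                                                                Descriptive names can say what midpoint computes; only the comment can forbid the “obvious” rewrite and say why.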

                                                                1. 2

                                                                  [against] strong type systems, formal verification, and property testing.

                                                                  I use xmobar, which was written in Haskell, which professes all of the above.

                                                                  Every 30 minutes it segfaults on my system. I literally wrote a program to start it, and restart it when it segfaults, because it’s cheaper on my time than figuring out why it segfaults.

                                                                  Those things are part of good code, but they don’t make good code. They don’t guarantee that a system is failsafe, bulletproof, or otherwise stable.

                                                                  1. 7

                                                                    That’d be a good argument if he made it. But he’s saying they’re all fundamentally inferior to unit testing:

                                                                    the reason we are facing bugs that kill people and lose fortunes, the reason that we are facing a software apocalypse, is that too many programmers think that schedule pressure makes it OK to do a half-assed job. […]

                                                                    Better REPLs are not the answer. Model Driven Engineering is not the answer. Tools and platforms are not the answer. Better languages are not the answer. Better frameworks are not the answer. […] I stood before a sea of programmers a few days ago. I asked them the question I always ask: “How many of you write unit tests on a regular basis?” Not one in twenty raised their hands.

                                                                    1. 1

                                                                      fundamentally inferior to unit testing

                                                                      I’d hands down agree with this. If I had to get in a plane with software that was tested, or one with software that wasn’t tested but was written in a very strongly typed language, I’d choose the former every time.

                                                                      Now, he is NOT against strong typing, and he specifically says that. He’s just saying that unit testing is more important, which I agree with. He is for BOTH, which I am too.

                                                                  2. 0

                                                                    These are good points, and I appreciate the examples.

                                                                    against … strong type systems

                                                                    Is that true? Source? I never got that impression from his writings, but maybe I overlooked it.

                                                                    Your examples reminded me of a 2nd place where comments are good (besides temporary todos): links to references. But I don’t think you should put untyped blobs of 2D art or anything like that in source code directly. Instead, have something like “reference http://foo.url”. Almost by definition, comments are the wrong grammar for anything/everything. If something can’t be explained well in code, it certainly can’t be explained well in comments, which are a completely untyped grammar. Instead, perhaps a link to an SVG or some other explanation is the proper channel.

                                                                    1. 13

                                                                      Is that true? Source? I never got that impression from his writings, but maybe I overlooked it.

                                                                      He straight up calls them the dark path. He also says “You don’t need static type checking if you have 100% unit test coverage.”

                                                                      Almost by definition, comments are the wrong grammar for anything/everything. If something can’t be explained well in code, it certainly can’t be explained well in comments, which are a completely untyped grammar.

                                                                      I’d argue the exact opposite. Comments aren’t tractable to analyze, but they have infinite expressiveness. Some things we can’t explain well in code because code’s expressiveness is limited by its need to be tractable. Code can only tell us what “is”, it cannot tell us anything else. That’s the price of having something that’s implementable.

                                                                      1. 4

                                                                        Comments aren’t tractable to analyze, but they have infinite expressiveness. Some things we can’t explain well in code because code’s expressiveness is limited by its need to be tractable.

                                                                        I’d disagree with this. Natural language has finite expressiveness, and is also limited by its need to be tractable to analyze. You can’t write only for yourself, but have to write in a way that is actually tractable to analyze by an external reader, who is a finitely limited human, not some kind of magical being living outside of space and time!

                                                                        Granted, humans are much better at this currently than computers, so the difference can seem huge, to the point where humans can be glossed as having “infinite” ability. But I’m not sure it’s a difference in kind rather than amount. The quantitative difference can even narrow very quickly if you don’t make strong implicit assumptions about your reader. For example, say your comment is written in a C++ codebase in English, and your reader is a strong C++ programmer, but has weak English fluency. Are your comments still infinitely expressive and tractable for the reader to analyze?

                                                                        1. 1

                                                                          He specifically says he is not anti strong type systems:

                                                                          I don’t want you to think that I’m opposed to statically typed languages. I’m not.

                                                                          That being said, thanks for pasting that. Very interesting read. The crux of the article is that you should always be “writing lots and lots of tests, no matter what language you are using!”, which I agree with.

                                                                    2. 15

                                                                      This is some of the worst advice I’ve ever seen with respect to comments. If someone followed these rules, their code wouldn’t pass my code review.

                                                                      Much better advice: do what is necessary to communicate understanding of code to others reading it. If comments are required, then do it. Whether the language is expressive enough or not is immaterial. Reality dictates.

                                                                      1. -2

                                                                        It’s forward looking. Comments should only be temporary. They are a code smell. Temporary todo comments are okay. Comments that are links to references (which then can be in a language more appropriate to an “Explanation”) are okay.

                                                                        I highly recommend reading Clean Code if you haven’t before.

                                                                        Untyped blob comments are by definition a terrible grammar with infinite entropy and no tooling help, and should be used very sparingly.

                                                                        1. 9

                                                                          This is the most ridiculous take on commenting I’ve ever seen. If this is what’s in Clean Code, then it only reinforces my opinion that reading Bob Martin’s writing is a complete and total waste of time. (Which I initially formed by trying to read his writing.)

                                                                          I don’t care about being “forward looking.” I care about whether myself and others can read my code and understand it. Comments are one of a few critical tools that help make that happen. Removing that tool because of some ideological nonsense about it being “untyped” and it therefore possibly being incorrect is one of the most crazy cases of “don’t throw the baby out with the bathwater” that I’ve ever seen.

                                                                          1. -2

                                                                            my opinion that reading Bob Martin’s writing is a complete and total waste of time.

                                                                            My god sir, stop spending time reading my silly comments and go get yourself a paper copy of Clean Code! It will be worth its weight in gold, if you care about improving your skills.

                                                                            don’t throw the baby out with the bathwater

                                                                            I do live in the future a bit, as my time is spent designing new languages. But I stand 100% by my advice, if you’ve mastered most of the other aspects of programming. And perhaps that’s too much of an ideal, and indeed I almost always do provide an escape hatch in my languages to not throw the baby out, as you put it. But if you are resorting to comments other than for a very small few categories of buckets such as 1) temporary todos 2) links to references, you are doing something wrong. Perhaps it’s using the wrong language, perhaps it’s not understanding all the features available in the language, perhaps it’s bad identifier names, perhaps it’s functions that are doing more than one thing, perhaps it’s not writing tests, perhaps it’s function bodies that are too long, perhaps it’s because your readers don’t understand the programming language, perhaps it’s flag arguments, perhaps it’s too many parameters, perhaps it’s side effects… and on and on. There should never be a reason why you need to describe something in an undefined implicit grammar for a “human machine”. I 100% stand by this and have an immense amount of data on my side (I’ve read and worked on code and worked on language design in probably more languages than nearly anyone on earth at this point, ftr, so it would take some really novel evidence and data, and not some exposition, to convince me otherwise).

                                                                            1. 6

                                                                              Your experience is so far removed from mine (and many others that I know) that there is just no way we are going to see eye to eye. It is a waste of my time to get into it with you and give you data. You throwing your experience around is also really off putting to be honest. Others have plenty of experience too and strongly disagree with you, like I do. As I said, your methodology wouldn’t pass my code review. Not even close.

                                                                              1. 0

                                                                                that there is just no way we are going to see eye to eye

                                                                                That’s fine.

                                                                                You throwing your experience around is also really off putting to be honest.

                                                                                I know, I was afraid of that. I just wanted to put out there that to convince me otherwise would take lots of data, not argument, since my corpus is quite large.

                                                                                Others have plenty of experience too and strongly disagree with you

                                                                                That’s fine. I work on the future looking at data from the past to find new things and if everyone agreed with me I wouldn’t be inventing anything novel.

                                                                                As I said, your methodology wouldn’t pass my code review. Not even close.

                                                                                And I think it’s good that you take code reviews seriously. I work on different types of things where the goal of good language design is that you’d never have to write a comment to explain what the heck is going on, as the language should allow for it.

                                                                                But again, thank you for that chat (and I would read Clean Code. It’s a fantastic book!)

                                                                              2. 5

                                                                                if you are resorting to comments other than for a very small few categories of buckets such as 1) temporary todos 2) links to references, you are doing something wrong

                                                                                I find these kinds of nuance-less “strong” opinions very tiresome. Observational evidence clearly shows that there are many highly productive programmers working on high quality code bases which use comments in ways other than you described. Indeed, the entire concept of Literate Programming is pretty much the exact opposite of what you’re advocating.

                                                                                Sometimes, there is more than one path to excellence. And what works for one person might not work so well for another. Quite frankly, every time I see someone say “do X or you’re doing it wrong” – where “X” is a reasonably popular practice – especially without showing any understanding of why people do “X” (other than “they’re stupid” or “they’re wrong”), my response is something along the lines of “oh, he just hasn’t bothered to actually understand why people are doing the thing he’s so aggressively opposed to”. Turns out that most of the time people are not complete idiots and have actual valid reasons for doing the things they do (both in software dev and outside of it).

                                                                                I 100% stand by this and have an immense amount of data on my side

                                                                                You can’t say something like that and then not show your alleged “immense amount of data”.

                                                                                1. 0

                                                                                  Literate Programming is pretty much the exact opposite of what you’re advocating.

                                                                                  I love LP and I would say Literate programming and no comments are not opposed. In fact, I would say the opposite. Great programming languages should be self-documenting and readable. IMO we are barely scratching the surface of great programming languages, and my research is focused on this.

                                                                                  just hasn’t bothered to actually understand why people are doing the thing he’s so aggressively opposed to”.

                                                                                  What are you talking about? I am clearly demonstrating at least 2 buckets of categories for why people are commenting code (todos and references) that I think should always be there in languages for comments. I am also very open to more buckets. I am against any type of “# or // or /*” syntax however, as that just shows the language designers have put little thought into why people are using comments and what they are using them for.

                                                                                  You can’t say something like that and then not show your alleged “immense amount of data”.

                                                                                  Well, for starters, the OP should read Clean Code. I’ve read that book probably 5 times by this point (and about 500 other programming books – Practice of Programming, Pragmatic Programmer, Pattern on the Stone, ANTLR 4 Definitive Reference, Learn You a Haskell, SICP, Brey’s Intel x86 assembly book, Mastering Regex, Programming Collective Intelligence, Algorithms + Data Structures = Programs, are some good ones to start with). Here are 19 new languages I’ve designed https://github.com/treenotation/jtree/tree/master/langs. One of those took me an hour and sat at #1 on Lobsters for 2 days. We’ve also built the largest database of computer languages in the world by far, and will be releasing it soon, here: https://github.com/treenotation/cldb.

                                                                                  Turns out that most of the time people are not complete idiots

                                                                                  I never said this and don’t think anything of the sort. I just think the OP’s comments were a bit sophomoric (nothing wrong with that, we all go through phases of that). The OP started his response with “This is some of the worst advice I’ve ever seen with respect to comments”. That would be like me going up to LeBron James and saying “your basketball tips are some of the worst I’ve ever seen”. As Guido once said to me, ‘When your reputation precedes you, I might answer your questions’. I think OP has a long way to go before I might take his opinions seriously over Bob Martin’s.

                                                                        2. 7

                                                                          One loop: index.

                                                                          Two loops: index, jndex.

                                                                          1. 1

                                                                            Generally you’d be iterating through collections of 2 different types, or 2 orthogonal directions. So you’d want something like manufacturerIndex, modelIndex.

                                                                            1. 2

                                                                              What if you’re iterating twice through the same data?

                                                                              For example, day 2, part B of 2018’s Advent of Code is essentially the following problem: Given a list of strings, all of the same length, find the unique pair that differs by one character in one position only. (That such a pair exists and is unique is promised by the problem statement.)

                                                                              Pretty much everyone will solve this by doing:

                                                                              for (int i = 0; i < len; i++) {
                                                                                for (int j = i+1; j < len; j++) {
                                                                                  // check the condition, return
                                                                                }
                                                                              }
                                                                              

                                                                              Is the time spent coming up with longer loop variable names worth it, when you could call them i and j and maybe comment that the condition being checked is commutative so we’re essentially checking a symmetric matrix and can start the inner loop at i+1?
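
                                                                              For what it’s worth, here’s a complete version of that sketch in C – the function and variable names are mine, not from anyone’s actual solution:

                                                                                #include <stdio.h>
                                                                                
                                                                                /* Returns 1 if a and b (assumed equal length, per the problem
                                                                                   statement) differ in exactly one position. */
                                                                                static int differ_by_one(const char *a, const char *b)
                                                                                {
                                                                                    int diffs = 0;
                                                                                    for (int i = 0; a[i] != '\0'; i++) {
                                                                                        if (a[i] != b[i] && ++diffs > 1)
                                                                                            return 0;
                                                                                    }
                                                                                    return diffs == 1;
                                                                                }
                                                                                
                                                                                int main(void)
                                                                                {
                                                                                    const char *ids[] = { "abcde", "fghij", "fguij" };
                                                                                    int len = sizeof ids / sizeof ids[0];
                                                                                
                                                                                    /* The condition is symmetric, so the inner loop starts at i+1. */
                                                                                    for (int i = 0; i < len; i++) {
                                                                                        for (int j = i + 1; j < len; j++) {
                                                                                            if (differ_by_one(ids[i], ids[j]))
                                                                                                printf("%s / %s\n", ids[i], ids[j]);
                                                                                        }
                                                                                    }
                                                                                    return 0;
                                                                                }
                                                                                

                                                                              Even fully spelled out, i and j read fine here; the comment about symmetry does the explanatory work that longer names wouldn’t.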

                                                                          2. 6

                                                                            What language features are you thinking of that can explain why the code was written one way instead of another?

                                                                            As far as out of date comments go, it’s a code review problem.

                                                                            1. -1

                                                                              If you can’t explain it in the language itself, then choose a more appropriate language for the explanation (diagrams, for example) and add a comment with a link to that.

                                                                              As far as out of date comments go, it’s a code review problem.

                                                                              No. When you write a comment you are also making up an ad hoc, implicit grammar on the spot, with no tooling. Grammars are important to help people and tools communicate. You are asking a code reviewer to infer what the grammar is in a comment, to parse that example, and then run checks in their head to make sure nothing broke. It should be obvious why that is a big problem.

                                                                            2. 5

                                                                              Instead of comments, make your code describe exactly what it does. useLongVariableAndMethodNames. Never have single-letter variables (yes: even use the word “index” instead of “i”). Put the time you would spend writing comments into better tests and code.

                                                                              How about this as a guiding principle: do what needs to be done for people to understand your system.

                                                                              The key bit about this is that there are people on the other side of this equation, with different ways of understanding and picking up information. It’s all good and well for me to say, well, my unit tests describe the behavior of the system, but if people struggle to pick up information quickly that way, it’s not an efficient means of documenting the system (though it continues to serve a purpose in partially enforcing the behavior of the system).

                                                                              I like the useLongVariableAndMethodNames technique - I also don’t think it works for every situation, and in those times, you need other tricks.

                                                                              1. -1

                                                                                How about this as a guiding principle: do what needs to be done for people to understand your system.

                                                                                I like the intent behind it, but generally comments to me are a code smell that something has been done wrong. So I like this principle better: “if you are resorting to comments, figure out what you are doing wrong”. But as there are dozens of things (at least) you have to master to avoid resorting to comments, falling back to temporary comments (todos) in the meantime is fine.

                                                                              2. 3

                                                                                I have a program where the following three lines of code are preceded by a block comment:

                                                                                if not location.path:match "/$" then
                                                                                  return HTTP_MOVEPERM . uurl.escape_path:match(location.path .. "/")
                                                                                end
                                                                                

                                                                                The context is a web server, having gone through a path where it ends up pointing to a directory. Now, can you explain why this code exists? And why do I even care to check this case? If you can answer this, then maybe the comment block that precedes it could be removed.

                                                                                1. 1

                                                                                  Two ideas for how you could improve this: 1) create a long method name that explains what/why something is happening; 2) add a temporary todo comment.

                                                                                  Very rough pseudocode:

                                                                                  # todo We had some bad links that pointed to an invalid url. Keep this here until those are all gone.
                                                                                  this.redirectBadLinks()
                                                                                  
                                                                                  redirectBadLinks()
                                                                                   return HTTP_MOVEPERM . uurl.escape_path:match(location.path .. "/")
                                                                                  

                                                                                  I suggest reading Clean Code though; it does a much better job of explaining great practices than I can.

                                                                                  1. 5

                                                                                    You are partially correct. On the server, it’s not a bad link—it does lead to the proper resource. But a close reading of RFC-3986 reveals that <http://example.com/foo> and <http://example.com/foo/> have to be treated as two separate URLs. Fine, but I can’t control what links people use. And without the trailing ‘/’ the path merging rules break (section 5.2.3): a relative link like bar.html resolves against <http://example.com/foo/> to <http://example.com/foo/bar.html>, but against <http://example.com/foo> to <http://example.com/bar.html>.

                                                                                    You may counter that a comment in the form of -- see RFC-3986 sec. 5.2.3 is enough, but not really. Or perhaps a pointer to a document elsewhere in the repository, but that’s another file that needs to be maintained along with the code. You might be fine with that, I’m not.

                                                                                    Another aspect of comments is working around bugs in libraries. It’s a sad fact that not everyone gets to use open source and fix every bug they come across in third party code. But I have a project that is littered with comments about working around bugs in a proprietary third party library whose source code we have no access to.

                                                                                    Oh, I found another example of a good comment in some code I’ve written (not the one dealing with the proprietary library):

                                                                                    /*--------------------------------------------------------------------
                                                                                    ; Basically, if g_poolnum is 0, then idx *has* to be 0.  That only
                                                                                    ; happens in two cases---when the program first starts up, and when the
                                                                                    ; program has received too many requests in a short period of time and
                                                                                    ; just gives up and resets back to 0.
                                                                                    ;--------------------------------------------------------------------*/
                                                                                    
                                                                                    if (g_poolnum == 0)
                                                                                    {
                                                                                      idx = 0;
                                                                                    }
                                                                                    

                                                                                    Can the variable names be better? Perhaps. Am I going to wrap this up into a function like reset_pool_if_number_left_is_zero()? No. It would be a) one more bloody function to test, b) only called from one location in the entire code base. Not worth it to me (I’m not a fan of large numbers of small functions).

                                                                                    I think we’re going to have to agree to disagree on your view of comments.

                                                                                    1. 2

                                                                                      In the limit, method names have exactly the same problem that comments do: they are written in natural language and can become out of sync with what the method actually does.

                                                                                      I could have answered that “why” posed above (once I eventually parsed the regex: Sunday morning brain), but only because I’ve encountered the problem before. If foo matches a directory name on the server and you have (say) an index.html file inside it, and you serve the index instead of first doing the redirect, then any relative links in it will be resolved incorrectly by the browser.

                                                                                      1. 0

                                                                                        In the limit, method names have exactly the same problem that comments do: they are written in natural language and can become out of sync with what the method actually does.

                                                                                        This is true in theory, but in practice I find it much less likely if you are following other best practices (like having short function bodies, functions that do what they say and only what they say, functions that do one thing, etc.). But it does indeed happen, and when I see it occur I’ll drop a “todo This fn name is out of date”.

                                                                                1. 9

                                                                                  There are students that say “Why do I need computer science in my life? It’s useless! I shouldn’t have to remember all this crap!”.

                                                                                  Everyone says this about everything in high school, and I don’t even think that it’s wrong, but rather that it could be the point.

                                                                                  I believe the mistake is in seeing high school as a kind of lite-university, when you basically start learning whatever you’re going to study and have as your career later on (which is great for all those who don’t know yet). Instead everything until high school should be seen as general education. This isn’t quite the case, I’m the first to admit it, because of all the stress that is being put on the educational system, from parents who might just want the best, to schools and teachers who have to maintain some grade-average.

                                                                                  I had CS in “high school” (the German variant) too, and while I certainly disagreed with points and procedures in the curriculum (I thought Python might be better than Java for teaching algorithms, but I’m a bit more reserved about that now), I always held modesty up as a virtue in class: never say stuff like “Well actually, …” or laugh down at teachers for not knowing about the newest frameworks. I was a tutor at university last year, and seeing people who do think that they are “too good for introductions” is really annoying for educators. Ultimately, I was right in being modest, for there were many things I did learn by not having an attitude that made me think I was above and beyond all of it.

                                                                                  On a side note, one reason CS theory is nice is that it doesn’t change that much (at least when it comes to the basics; the halting problem isn’t turned on its head daily).

                                                                                  the string functions from string.h, including the dreaded strtok

                                                                                  I might have missed something, but what’s the issue with strtok? It’s a C function with state, but other than that it’s pretty average C.
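
                                                                                  The usual complaint, as far as I know, is exactly that state: it’s hidden and static, so you can’t tokenize two strings at once, and strtok also mutates its input. A quick sketch of the footgun (strtok_r being the POSIX reentrant alternative):

                                                                                    #include <stdio.h>
                                                                                    #include <string.h>
                                                                                    
                                                                                    int main(void)
                                                                                    {
                                                                                        char records[] = "a,b;c,d";
                                                                                    
                                                                                        /* strtok keeps its position in hidden static state and writes
                                                                                           NUL bytes into its argument, so a second, nested strtok on
                                                                                           another delimiter would silently clobber this outer scan. */
                                                                                        for (char *rec = strtok(records, ";"); rec != NULL;
                                                                                             rec = strtok(NULL, ";")) {
                                                                                            printf("record: %s\n", rec);
                                                                                            /* Nested splitting must use the reentrant form instead,
                                                                                               which keeps its state in a caller-supplied pointer:
                                                                                               char *save; strtok_r(rec, ",", &save); */
                                                                                        }
                                                                                        return 0;
                                                                                    }
                                                                                    

                                                                                  Whether that rises to “dreaded” is a matter of taste.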

                                                                                  1. 7

                                                                                    I believe the mistake is in seeing high school as a kind of lite-university, when you basically start learning whatever you’re going to study and have as your career later on (which is great for all those who don’t know yet). Instead everything until high school should be seen as general education.

                                                                                    I agree, and I will say this: No part of college is job training. Vocational schools are job training. You can see this because it’s right there in their name, which should be your first clue. A college education is designed to do one thing: Create researchers and academics in an academic field. If someone can apply what they learned in a degree program to practical careers, that’s one thing, but it shouldn’t be the end goal of that degree program.

                                                                                    1. 2

                                                                                      I’m sympathetic to not viewing universities as purely job-training; among other reasons, if you really did want purely job-training, they are a pretty convoluted and inefficient way to deliver it. But I think you go a bit too far in saying that their primary purpose is to create researchers and academics, at least for the bachelor’s degree. Even in the era when many fewer people went to university, it would have been still too few if it were really only aspiring researchers who attended!

                                                                                      The specifics vary by country & era, but if you look at who got university/college degrees in, say, late 19th century America, a lot were on their way to just general “educated person jobs”. Stuff like school teacher or principal, civil servant, tutor, editor, etc. Jobs that came with a certain level of prestige and expectation of being educated beyond high-school level, but not necessarily as a researcher in a specific subject (hence the popularity at the time of broad liberal-arts degrees).

                                                                                      1. 2

                                                                                        Also worth noting that it’s relatively recent that people get a degree in a topic relating to their eventual jobs. Yes, college was about education, and not necessarily just educating researchers, but that doesn’t mean it became vocational.

                                                                                    2. 2

                                                                                      Everyone says this about everything in high school, and I don’t even think that it’s wrong, but rather that it could be the point.

                                                                                      With that quote, I was pointing out the irony that computer science (HS) students say that computer science feels useless to them. They are the ones who chose to study computer science in school, who either have 5 hours a week or 7 for the “harder” variant; they voluntarily enrolled in this programme. Middle school together with the first two years of high school is considered general education; the last two years of high school (the blog post applies to these years) really are considered lite-university. At the end you even get a “programmer helper” license, with which you can get a job at other schools as a sysadmin.

                                                                                      1. 1

                                                                                        If it’s anything like our system (after 7th grade we had to choose a focus between “natural sciences”, “languages” and “social sciences”), then I would guess there will always be people who choose “what they dislike the least” instead of following a real interest. Also CS has the “Computer” appeal, and as you mention, computers are used to play games, and there are certainly a lot of people who love to play games.

                                                                                        1. 3

                                                                                          Also CS has the “Computer” appeal, and as you mention, computers are used to play games, and there are certainly a lot of people who love to play games.

                                                                                          Not to mention how many people get into the field by wanting to make their own video games.

                                                                                    1. 13

                                                                                      It’s interesting to see compile-time framed as if it were a tooling issue. Language design affects it deeply. Go will be fast no matter what. Languages like Scala and C++ have some features that are inherently costly. At that point, the real gains come from providing guidance about what features/dependency structures to avoid.

                                                                                      1. 7

                                                                                        A possible counter-example here is D, which has the same monomorphization compilation model as C++, but is prized for its fast compiler.

                                                                                        1. 2

                                                                                          Is there anything reasonably in-depth written on why it’s faster? It seems implausible that it’s entirely due to just being a really good compiler and 100% unrelated to the design of D as a language. Two heuristic reasons to guess that there must be some relationship: 1) the primary D designer was also the primary compiler implementer with a long history of compiler implementation, so seems likely to have been at least somewhat influenced in his design of D by implementation concerns, and 2) nobody has written a fast C++ compiler, despite there being a big market there (certainly bigger than the market for a fast D compiler). I could be wrong though!

                                                                                          1. 1

                                                                                            I unfortunately don’t know of such an article. The folklore explanation, I think, is that the dmd backend is much faster than llvm/gcc.

                                                                                      1. 3

                                                                                        Have a day off in London on Saturday after a conference this past week, then flying back to the US on Sunday. And then the fall semester starts on Monday…

                                                                                        1. 6

                                                                                          Two things on this:

                                                                                          I just recently started playing with Twitter’s APIs, and was a bit surprised to discover that you can only retrieve the most recent 3,200 tweets of a user. Any tweets older than that are forever inaccessible by the semi-public API. Note that you seem to need to provide a rather detailed description of what you intend to do with it in order to get developer keys. It seems that there are commercial APIs available for a very substantial price that may allow accessing older tweets, but nobody talks much about them. I did hear a rumor that their Firehose access - all tweets sent by anybody in realtime - costs 30% of your company’s revenue, whatever that is. I’m not sure if that’s true, but it does seem odd that so much of our history on Twitter is, for all practical purposes, forever locked away behind extremely expensive contracts.

                                                                                          It also seems that Twitter’s Rules are being weaponized, by both sides of the political divide, in attempts to control the conversation. The ban lists seem semi-random, and the decisions of what is and is not considered hateful seem rather arbitrary, possibly depending on which particular moderator gets a particular case. Aside from the difficulty and expense of accessing old data in general, it’s entirely possible people are running algorithms to search Twitter for potentially actionable things said against their favorite figures, even if they were said very long ago.

                                                                                          1. 2

                                                                                            Twitter has been slowly cutting all the good bits out of itself ever since they looked up and said “oh HEY we gotta make some money!” a few years back.

                                                                                            This is why decentralized platforms will prevail, because people are just not willing to pay for social media en masse.

                                                                                            1. 3

                                                                                              This is why decentralized platforms will prevail, because people are just not willing to pay for social media en masse.

                                                                                              Users are not going to be paying for Twitter any time soon, either? Twitter’s selling their content, but the platform remains free to use, and I can’t see that changing.

                                                                                              Companies, meanwhile, are perfectly willing to pay Twitter for access to their data.

                                                                                              1. 2

                                                                                                That’s right. That’s why the Fediverse’s model will win, IMO, but I predict it will take much more work to get where it needs to be for truly widespread adoption - that being, making it point-and-drool easy to start your own instance.

                                                                                                Right now you have to have some basic sysadmin skills in order to run one.

                                                                                            2. 2

                                                                                              I might misremember this, but I believe Twitter made a big splash about how you (as a logged-in user) could now access all your tweets. I seem to remember downloading an archive back when that happened.

                                                                                              1. 3

                                                                                                You can download your own tweets, yeah. In the web interface, it’s under Settings->Account->Your Twitter data. What’s not possible to do easily is get the full tweet history of anyone else.

                                                                                                1. 1

                                                                                                  What’s not possible to do easily is get the full tweet history of anyone else.

                                                                                                  I’m certain that this is for performance reasons. No doubt if you pay for API access it’s possible.

                                                                                            1. 3

                                                                                              It’s always interesting to see people tip their hand by what terms they think should be used instead… because they are always wildly different terms! This one chooses the following:

                                                                                              The hype on terms like “machine learning” and “AI” is a rebranding of the terms “statistics” and “general programming logic”.

                                                                                              You could make an argument for there being some overlap between those fields, but they aren’t the same fields, historically or today, and the overlap is nowhere near 100%. Other proposals for what AI is “just”, or should be considered a subdiscipline of, have historically included: cybernetics, applied psychology, cognitive science, philosophy, systems engineering, HCI, computational logic, operations research, decision theory, …

                                                                                              The other comment I’d make is that AI as a field isn’t “rebranded” from anything recently… the field has existed for 60+ years! There are indeed some unwarranted hyped up claims (often from people who also don’t know the history of the field they claim to be in) that could be criticized as shallow rebranding, but criticizing badly done or overhyped AI research is a bit different than trying to wholesale erase a field without really studying it.

                                                                                              1. 12

                                                                                                This has nothing to do with tech, it happens everywhere in higher paid jobs when comparing the US to the EU. Everyone is giving you gut feel answers, but this is an economics question and has answers that we can back up with data.

                                                                                                The driving force for this is inequality. The inequality in the US is far larger than that in the EU. So people who have worse jobs have worse lives than in the EU, and people who have really good jobs do far better. Just to put this in context: you’re in the top 1% of earners in France with $215k per year, but you need $475k per year to be in the top 1% in the US.

                                                                                                You can also see this from another angle. You say that devs are paid doctor or lawyer salaries in the US compared to the EU. But those are EU doctors and lawyers. In the US they also make far more.

                                                                                                Another, much smaller contributor to this is that the average salary in the US is 30% higher than in the richer EU countries.

                                                                                                1. 4

                                                                                                  You can see this across almost all scales in almost all fields, too. As a professor in Denmark, I made significantly less than most American professors—but our PhD students and non-academic staff made significantly more than most American equivalents. In the US you often have 5x or more ratios between different levels, e.g.: cafeteria worker makes $20k, PhD student makes $25k, prof makes $100k, senior administrator makes $500k. In Denmark, it’s more often 0.5x than 5x, something like: cafeteria worker makes $40k, PhD student makes $55k, prof makes $70k, senior administrator makes $100k. By American standards, some of these salaries are low and some are high.

                                                                                                  1. 2

                                                                                                    The driving force for this is inequality. The inequality in the US is far larger than that in the EU.

                                                                                                    It also has to do with global inequality.

                                                                                                    1. 1

                                                                                                      I was investigating software engineering jobs in the EU about 1-2 years ago; this was roughly the conclusion I came to. The EU has less inequality and, usually, better social benefits. While I like being paid US salaries, I can’t help but think the EU is generally a healthier place on most axes.

                                                                                                    1. 13

                                                                                                      Regarding this point:

                                                                                                      Scheme seems to be an interesting compromise.

                                                                                                      Scheme is relevant to this post, I think, but in a somewhat different way. It has basically tried all the responses to the dilemma outlined, at various points and sometimes simultaneously. Various Schemes added more or fewer things. The 6th revision (R6RS) tried to standardize large amounts of similar functionality, but was rejected by significant parts of the community for being too big/complex. Racket split off at this point, mostly going the “big” direction and rebranding itself as not a Scheme. The 7th revision (R7RS) responded by issuing two language standards, one following each path, named R7RS-large and R7RS-small. Various implementations have chosen either of those standards, or some other point near or between them, or stuck with R6RS, or stuck with R5RS, etc.

                                                                                                      I definitely think all this experimentation is interesting, but I’d argue the jury is still out on whether any kind of stable compromise in the design space has been reached.

                                                                                                      1. 13

                                                                                                        At least R7RS seems to be pretty much universally accepted, and makes it possible to write portable libraries. We’re not quite there yet, but I believe Scheme is definitely close to the point where it is easy enough to write portable code. As for all the other points, as I was reading the article I kept thinking “Scheme fits the bill” all the time, until of course the author mentioned it.

                                                                                                      1. 18
                                                                                                        1. Bellard is as impressive as always.

                                                                                                        2. Someone found a use-after-free.

                                                                                                        1. 8

                                                                                                          A bit of context on the ‘someone’ for those interested: qwertyoruiop is the individual who created (half of) the Yalu jailbreak for iOS 10, and has contributed to many other big jailbreaking releases for both iOS and other platforms (e.g., PS4).

                                                                                                          1. 2

                                                                                                            I don’t understand that use after free. Isn’t that a legit use of js? That is: isn’t the interpreter doing what it is supposed to do? Or not?

                                                                                                            1. 1

                                                                                                              A use-after-free means using the contents of a pointer after the memory it points to has been released, which can allow writing to arbitrary parts of the process running the JavaScript interpreter. That means it allows escaping the JavaScript sandbox, and as such allows a webpage to take full control of your computer. Like any serious security bug, it is a way for a virus or malware to install itself on a computer. So no, it is definitely not a legit use of js.
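
                                                                                                              To make the bug class concrete, here is a minimal toy C sketch (my own example, not the QuickJS bug) of a dangling pointer reading data written through a later allocation:

                                                                                                                  #include <stdio.h>
                                                                                                                  #include <stdlib.h>
                                                                                                                  #include <string.h>

                                                                                                                  int main(void) {
                                                                                                                      char *buf = malloc(32);
                                                                                                                      if (!buf) return 1;
                                                                                                                      strcpy(buf, "original");
                                                                                                                      free(buf);                 /* chunk goes back to the allocator */

                                                                                                                      char *other = malloc(32);  /* same size: may reuse the chunk */
                                                                                                                      if (!other) return 1;
                                                                                                                      strcpy(other, "attacker data");

                                                                                                                      /* The bug: buf is dangling, so this read is undefined behavior
                                                                                                                       * and will often print "attacker data" instead of "original". */
                                                                                                                      printf("%s\n", buf);

                                                                                                                      free(other);
                                                                                                                      return 0;
                                                                                                                  }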

                                                                                                              1. 1

                                                                                                                i was asking if the use-after-free bug is in the JS or in the interpreter.

                                                                                                                1. 2

                                                                                                                  The bug is in the interpreter here. The JS in the link is a proof-of-concept exploit for the bug.

                                                                                                          2. 1

                                                                                                            This is a quick reminder that script VMs are hard to develop, especially for complex languages such as JavaScript. Never ever run arbitrary code in these kinds of interpreters, even if you believe you have hardened them by removing privileged functions or I/O. FWIW, don’t even try to run arbitrary code in widely used engines such as SpiderMonkey or V8 if they are not sandboxed. RCEs still get found every now and then.

                                                                                                          1. 10

                                                                                                            Rolling one’s own Unicode! This library sounds like it could be useful on its own:

                                                                                                            A specific Unicode library was developed so that there is no dependency on an external large Unicode library such as ICU. All the Unicode tables are compressed while keeping a reasonable access speed.

                                                                                                            The library supports case conversion, Unicode normalization, Unicode script queries, Unicode general category queries and all Unicode binary properties.

                                                                                                            The full Unicode library weighs about 45 KiB (x86 code).
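
                                                                                                            For those curious how such tables stay small: the usual trick is a multi-stage lookup that splits codepoints into fixed-size blocks and stores each distinct block only once. Below is a toy C sketch of that general technique, with a made-up “is ASCII digit” property; this is not QuickJS’s actual data layout, which layers more compression on top:

                                                                                                                #include <stdint.h>
                                                                                                                #include <stdio.h>

                                                                                                                /* stage2 holds only the distinct 256-entry blocks. */
                                                                                                                static const uint8_t stage2[2][256] = {
                                                                                                                    { 0 },                                   /* block 0: property false everywhere */
                                                                                                                    { ['0']=1, ['1']=1, ['2']=1, ['3']=1, ['4']=1,
                                                                                                                      ['5']=1, ['6']=1, ['7']=1, ['8']=1, ['9']=1 },  /* block 1: U+0000..U+00FF */
                                                                                                                };

                                                                                                                /* stage1 maps each 256-codepoint block to a stage2 block. */
                                                                                                                static const uint8_t stage1[256] = { [0] = 1 };  /* only block 0x00 differs */

                                                                                                                static int is_ascii_digit(uint32_t cp) {
                                                                                                                    if (cp > 0xFFFF) return 0;               /* toy table covers the BMP only */
                                                                                                                    return stage2[stage1[cp >> 8]][cp & 0xFF];
                                                                                                                }

                                                                                                                int main(void) {
                                                                                                                    printf("%d %d\n", is_ascii_digit('7'), is_ascii_digit(0x4E00));  /* prints: 1 0 */
                                                                                                                    return 0;
                                                                                                                }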

                                                                                                            1. 10

                                                                                                              To my knowledge tmux doesn’t advertise/support privilege separation by pane, so I’m not sure how big a deal this is in practice (I’m fairly certain that if you start a tmux session as a user you cannot send commands to another tmux session which was started as root, for example).

                                                                                                              1. 4

                                                                                                                Agreed, I’m not seeing the vulnerability here. At least, not with tmux.

                                                                                                                1. 0

                                                                                                                  I believe the issue is that, given a single tmux session with multiple windows/panes, you can send-keys to any pane. The user could have used su or sudo to open and keep a root shell in one of the panes. So the send-keys is done as non-root, but the keystrokes/characters go into a root shell.

                                                                                                                  As far as I know, you can’t send from one shell session to another (as the same user) with ssh or common shells like bash or zsh.

                                                                                                                  1. 3

                                                                                                                    I think you are confusing shells, remote connection protocols, and terminal multiplexers. tmux is a terminal multiplexer; its job is to allow users with access to the tmux server to access multiple shells or programs on a single display. This is also “scriptable” using the provided tmux commands. The pseudo-terminals that are accessible under tmux are under the user’s control; the user also decides what programs are running in these pseudo-terminals (and by decision, I mean the user actually has to authenticate willingly as root, using su).

                                                                                                                    I really like tedu’s comment above. The analogy is quite simple to explain: suppose you have multiple VTs, with a different user logged on in each. You now have a monitor and a keyboard attached to this system. Does this mean the keyboard is vulnerable to privilege escalation, since you can switch consoles with Ctrl+Alt+Fn?

                                                                                                                    Here’s another analogy: you have a web browser logged in to your webmail. Does this mean that the shell spawning the browser is vulnerable to privilege escalation? Obviously, someone with shell access to that system, under your username, can read the browser cookies and use them to access your webmail account.

                                                                                                                    1. 4

                                                                                                                      I really like tedu’s comment above. The analogy is quite simple to explain: suppose you have multiple VTs, with a different user logged on in each. You now have a monitor and a keyboard attached to this system. Does this mean the keyboard is vulnerable to privilege escalation, since you can switch consoles with Ctrl+Alt+Fn?

                                                                                                                      The thing that I think could actually trip people up is that VTs are routinely used to run different things at different privilege levels (especially while debugging system issues). For example, say you log on as root on VT1, and as a throwaway user on VT2 to try something out. On any Unix I know of, it wouldn’t be necessary to log out of VT1 before running untrusted code on VT2, because a script run as an untrusted user on VT2 shouldn’t be able to send keystrokes to the root shell on VT1, even though you, the operator, could switch to VT1 and type keystrokes there. (There are also programmatic ways to send stuff to other terminals, but on a typical Unix, the permissions to do that require running as root; see the sketch below.)
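
                                                                                                                      For the curious, the classic programmatic route on Linux is the TIOCSTI ioctl, which pushes bytes into a terminal’s input queue as if they were typed. This C sketch is illustrative only: opening another user’s VT device normally requires root, which is exactly the point, and recent kernels can disable TIOCSTI altogether.

                                                                                                                          #include <fcntl.h>
                                                                                                                          #include <stdio.h>
                                                                                                                          #include <string.h>
                                                                                                                          #include <sys/ioctl.h>
                                                                                                                          #include <unistd.h>

                                                                                                                          int main(void) {
                                                                                                                              const char *cmd = "id\n";            /* bytes to "type" */
                                                                                                                              int fd = open("/dev/tty1", O_RDWR);  /* fails for non-root users */
                                                                                                                              if (fd < 0) {
                                                                                                                                  perror("open /dev/tty1");
                                                                                                                                  return 1;
                                                                                                                              }
                                                                                                                              for (size_t i = 0; i < strlen(cmd); i++)
                                                                                                                                  ioctl(fd, TIOCSTI, &cmd[i]);     /* inject one byte at a time */
                                                                                                                              close(fd);
                                                                                                                              return 0;
                                                                                                                          }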

                                                                                                                      People sometimes treat tmux as a replacement for VTs, which might be surprising given that code in one pane can potentially execute commands at the highest privilege level available in any pane of the tmux session, and/or as the owner of the tmux process itself. So it generally shouldn’t be used for sysadmin-type tasks where you might sometimes drop privileges to an untrusted user before running things. (Maybe nowadays those should all be automated and/or explicitly sandboxed anyway, but it’s not uncommon with traditional Unix system administration to sometimes have such tasks.)

                                                                                                                      edit: This is wrong, see below…

                                                                                                                      1. 7

                                                                                                                        I think this reflects a misunderstanding of what’s happening. tmux does not allow a program from one pane to send keystrokes to another pane. tmux allows the user who ran tmux to send keystrokes to their tmux session.

                                                                                                                        If user alice runs tmux, and then uses su to switch to bob in one session, bob cannot send keystrokes to any other alice session. alice, however, can send keystrokes to the session with a bob login, because alice still owns the terminal it’s running in.

                                                                                                                        1. 2

                                                                                                                          I think this reflects a misunderstanding of what’s happening. tmux does not allow a program from one pane to send keystrokes to another pane.

                                                                                                                          Yes, if the above is truly not allowed, then I admit I have misunderstood the original post. However, I just tested the following script, after using su (including manual password entry) to set up pane 1 to have a root shell. It does indeed type into the root shell in pane 1, including pressing Enter to execute the command.

                                                                                                                          #!/bin/sh
                                                                                                                          # Types a command into pane 1, where the root shell is running.
                                                                                                                          tmux send-keys -t 1 "echo 'hello'"
                                                                                                                          # tmux recognizes the argument "Enter" as a key name, so this presses Enter.
                                                                                                                          tmux send-keys -t 1 "Enter"
                                                                                                                          

                                                                                                                          I admire the knowledge level and logic skills of those for whom this is unsurprising, or who downplay it as equivalent to some other already-known attack vector. I myself would not have conceived of this use of tmux.

                                                                                                                          1. 1

                                                                                                                            Ah, right, sorry, I think I did misunderstand. So the case where you drop privileges in one tmux pane to a user who doesn’t own the tmux session is still safe-ish, which I thought this link had claimed wasn’t the case.

                                                                                                                          2. 1

                                                                                                                            It is possible to send keystrokes from VT1 to VT2 by using the uinput kernel module. This is essentially the behaviour happening in tmux. If a user is able to access /dev/uinput (or the tmux session), they can send any keystroke to any VT (or any tmux window/pane).

                                                                                                                            Edit: I do agree that the typical Linux system does not have this module loaded or the device accessible by default. Neither is a single user’s tmux session accessible to others by default.
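
                                                                                                                            For concreteness, here is a minimal C sketch of the uinput approach, following the pattern in the kernel’s uinput documentation (the device name and key choice are arbitrary). It injects one ‘a’ keypress into whatever currently has input focus, and requires write access to /dev/uinput:

                                                                                                                                #include <fcntl.h>
                                                                                                                                #include <linux/uinput.h>
                                                                                                                                #include <string.h>
                                                                                                                                #include <sys/ioctl.h>
                                                                                                                                #include <unistd.h>

                                                                                                                                static void emit(int fd, int type, int code, int value) {
                                                                                                                                    struct input_event ie = {0};
                                                                                                                                    ie.type = type;
                                                                                                                                    ie.code = code;
                                                                                                                                    ie.value = value;
                                                                                                                                    write(fd, &ie, sizeof(ie));
                                                                                                                                }

                                                                                                                                int main(void) {
                                                                                                                                    int fd = open("/dev/uinput", O_WRONLY | O_NONBLOCK);
                                                                                                                                    if (fd < 0) return 1;                /* usually needs root */

                                                                                                                                    ioctl(fd, UI_SET_EVBIT, EV_KEY);     /* we will send key events */
                                                                                                                                    ioctl(fd, UI_SET_KEYBIT, KEY_A);     /* ...for the 'a' key only */

                                                                                                                                    struct uinput_setup usetup = {0};
                                                                                                                                    usetup.id.bustype = BUS_USB;
                                                                                                                                    strcpy(usetup.name, "fake-keyboard");  /* hypothetical name */
                                                                                                                                    ioctl(fd, UI_DEV_SETUP, &usetup);
                                                                                                                                    ioctl(fd, UI_DEV_CREATE);
                                                                                                                                    sleep(1);                            /* let the device register */

                                                                                                                                    emit(fd, EV_KEY, KEY_A, 1);          /* key press */
                                                                                                                                    emit(fd, EV_SYN, SYN_REPORT, 0);
                                                                                                                                    emit(fd, EV_KEY, KEY_A, 0);          /* key release */
                                                                                                                                    emit(fd, EV_SYN, SYN_REPORT, 0);

                                                                                                                                    ioctl(fd, UI_DEV_DESTROY);
                                                                                                                                    close(fd);
                                                                                                                                    return 0;
                                                                                                                                }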

                                                                                                                    2. 1

                                                                                                                      Just to satisfy my curiosity, I logged into my ubuntu machine and started a tmux session as root. Then I started a session as my regular user. As expected, the sessions aren’t able to see one another.

                                                                                                                      1. 1

                                                                                                                        The problem scenario is different than that:

                                                                                                                        Start one session as a non-root user. Open two windows/panes. su or sudo to become root in one pane. Then you can send-keys from the non-root pane to the root pane. send-keys can be executed from the CLI (that is, can be done from shell scripts or programs executing shell commands).

                                                                                                                        1. 1

                                                                                                                          I understand that, it just doesn’t personally bother me all that much. It doesn’t seem conceptually different than running a sudo command, and then running some shell script which then attempts to do something with sudo permissions. It is worth being aware of, but seems more like an expected (if edge-case) property of how tmux works rather than some security hole. That’s just my personal view though.

                                                                                                                          1. 1

                                                                                                                            Well, for me, the key difference is passwordless vs. not.

                                                                                                                            1. 1

                                                                                                                              But that would be the same with the sudo case too, right? I don’t know what the sudo password expiry time limit is, but on my system, if I issue a sudo command, there is a window of time in which a subsequent sudo command will not require a password (and if it comes from a script etc. issued by my user in that session, there is no difference).
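
                                                                                                                              For reference, that window is sudo’s credential cache. It is controlled by the timestamp_timeout option in sudoers (typically 15 minutes by default, though distributions vary), and setting it to 0 forces a password prompt every time:

                                                                                                                                  # Sketch only; edit /etc/sudoers with visudo. The value is in
                                                                                                                                  # minutes, and 0 means "always ask for a password".
                                                                                                                                  Defaults timestamp_timeout=0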

                                                                                                                              It will be interesting to see what if anything tmux changes about this - I’m not sure, if that were my project, what I would do. Even reading the comments here, I’m finding my opinion sort of still in formation.

                                                                                                                              1. 1

                                                                                                                                Perhaps one option is to offer two binaries, one with send-keys, one without. I’ve never used that feature, and probably wouldn’t, even now that I’ve found out about it. A similar pair of options could be offered for source builders via ./configure switches, perhaps.

                                                                                                                    1. 12

                                                                                                                      My brother who studies maths just took an exam for the programming course at his uni, which was taught in C using a terrible old IDE and seemed to mostly focus on undefined behavior, judging from the questions in the exam. The high school programming class was similar, from what he told me.

                                                                                                                      I’m baffled that this is considered acceptable and even normal, and that Racket, with its beautiful IDE, its massive standard library and its abundance of introductory programming course material is not even considered. I know there’s a lot of understandable reasons for this, but it’s still so backwards.

                                                                                                                      1. 8

                                                                                                                        Ha! Yes. That reminds me how angry I used to get about mediocre, obsolete, industry-driven CS pedagogy as a student. I dealt with it in part by finding a prof who was willing to sponsor an independent study course (with one other CS student) where we worked through Felleisen’s How To Design Programs, using what was called DrScheme at the time. But eventually I gave up on CS as a major and switched to Mathematics. I encountered some backwardness there too, but I’ve never regretted it; much better value for my time and money spent on higher ed. The computer trivia can always be picked up as needed, like everybody does anyway.

                                                                                                                        From what I understand, my school now teaches the required intro CS courses in Python. This seems like a reasonable compromise to me, because average students can get entry-level Python jobs right out of school.

                                                                                                                        1. 7

                                                                                                                          As someone who has had to deal with a lot of code written by very smart non-computer-scientist academics, please be careful telling yourself things like “The computer trivia can always be picked up as needed”. Good design is neither trivial nor taught in mathematics classes.

                                                                                                                          It usually isn’t taught in CS classes either, I confess, but the higher-level ones I’ve experienced generally at least try.

                                                                                                                          1. 3

                                                                                                                            I agree completely, and I actually took most of the upper-division CS courses that seemed genuinely valuable, even though they didn’t contribute to my graduation requirements after I switched. (The “software engineering” course was… disappointing.) But I’ve learned a ton about good engineering practices on the job, which is where I strongly suspect almost everybody actually learns them.

                                                                                                                            I currently deal with a lot of code written by very smart CS academics, and most of it is pretty poorly engineered too.

                                                                                                                        2. 4

                                                                                                                          Racket is used in the intro course at Northeastern University, where several of the developers are faculty, so there’s at least one place where it’s possible to take that route. I think this might be one of the only major universities using a Lisp-related language in its intro course, though. MIT used Scheme in its intro course for years, but switched to Python a few years ago.

                                                                                                                          I haven’t been seeing much C at the intro level in years though (I don’t doubt it’s used, just not in the corners of academia I’ve been in). We use Python where I teach, and I think that’s overwhelmingly becoming the norm. C is used here only in the Operating Systems class. When I was a CS undergrad in the early 2000s, seemingly everywhere used Java.

                                                                                                                          1. 3

                                                                                                                            Sounds like the exam was designed to teach the sorts of things he’ll be asked in programming interviews. Now he has great “fundamentals”!

                                                                                                                            1. 3

                                                                                                                              Same here. Professors suck at my university, which happens to be one of the top universities in China (it’s sponsored by Project 985). Our C++ exams are mostly about undefined behavior from an infamous but widespread textbook, the SQL course still teaches SQL Server 2008, which reached its EoL over 5 years ago and cannot be installed on a MacBook, and it’s mandatory to learn SAS, the Legendary Enterprise Programming Language (SAS is mostly used in legacy software). Well, I’m cool with it because I’m a fair self-learner, but many of my fellow students are not.

                                                                                                                              I have a feeling that the professors are not really into teaching, and maybe they don’t care about the undergraduates at all. Spending time publishing more papers is probably more rewarding for them than picking up some shiny “new technologies” that could benefit their students. I guess they would be more willing to tutor graduate students, who can help build their academic careers.

                                                                                                                              1. 1

                                                                                                                                Our first three programming courses were also in C (the first two were a general intro; the third was intro to algorithms and data structures). After that, there was a C++ course. This was the first time I had an academic introduction to C++. I already knew it was a beast from personal use, but seeing it laid out in front of me over a few months of intense study really drove the point home. I was told this was the first year they were using C++11 (!)

                                                                                                                                Programming education in math departments seems to be aimed at making future math people hate it (and judging by my friends they’ve quite succeeded, literally everyone I ask says they’re relieved that they “never have to do any programming again”).

                                                                                                                                1. 2

                                                                                                                                  Programming education in math departments seems to be aimed at making future math people hate it

                                                                                                                                  Exactly! I can’t imagine how somebody with no background in programming would enjoy being subjected to C, let alone learn anything useful from such bad courses, especially at university age.

                                                                                                                                  1. 2

                                                                                                                                    I thought C was awesome when university gave us a 6-week crash course in it; we had to program these little car robots.

                                                                                                                                    1. 4

                                                                                                                                      “6-week crash course in it” “program these little car robots.”

                                                                                                                                      The choice of words is interesting given all the automotive C and self-driving cars. Is it your past or something prophetic you’re talking about?