1.  

    but I’ll focus on Vulkan because it’s the greatest programming API to ever exist

    I assume they’re being consciously hyperbolic, here, but it doesn’t inspire confidence in the rest of their analysis.

    1. 1

      From the linked video:

      The stack [interpreter] has done three instructions, whereas the register [interpreter] has only done two instructions

      Generally, stack machines need more instructions, in order to shuffle data around. You see this especially with getlocal and storelocal (or equivalent) instructions, which are completely obviated on a register machine.

      And regarding ‘reasonable encoding’: there are much more compact encodings. There are also much more extensive encodings (which you almost certainly want). ‘Increment’ and ‘add immediate’ are almost certainly things you want, on either style of VM.
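
      To make the instruction-count point concrete, here is a minimal sketch of hypothetical stack and register instruction sets (modeled as Haskell data types for brevity; not any particular VM, and the names are mine) and what an assignment like a = b + c compiles to in each:

      -- Hypothetical instruction sets; locals are referred to by slot index.
      data StackOp = GetLocal Int | StoreLocal Int | AddTop
      data RegOp   = AddReg { dst, srcA, srcB :: Int }

      -- a = b + c, with a, b, c in slots 0, 1, 2:
      stackCode :: [StackOp]
      stackCode = [GetLocal 1, GetLocal 2, AddTop, StoreLocal 0]  -- four instructions

      regCode :: [RegOp]
      regCode = [AddReg 0 1 2]  -- one instruction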

      [compiler is not cheap]

      Both Truffle/Graal and Jitter are able to automatically generate a compiler from a ‘cheap’ interpreter. I believe PyPy does similarly. Additionally, compilers are not actually as hard as the video makes them out to be; a very dumb compiler is similarly complex to a very smart interpreter, and similarly performant as well.
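
      As a sketch of that last claim (a toy expression language, nothing to do with the systems named above): a ‘compiler’ from syntax trees to closures is barely more code than the interpreter it replaces, but pays the tree-walking cost once instead of on every evaluation.

      data Expr = Lit Int | Var String | Add Expr Expr

      -- Interpreter: re-walks the tree on every call.
      eval :: [(String, Int)] -> Expr -> Int
      eval _   (Lit n)   = n
      eval env (Var x)   = maybe (error ("unbound " ++ x)) id (lookup x env)
      eval env (Add a b) = eval env a + eval env b

      -- Very dumb compiler: walks the tree once, yielding a closure to run many times.
      compile :: Expr -> ([(String, Int)] -> Int)
      compile (Lit n)   = \_ -> n
      compile (Var x)   = \env -> maybe (error ("unbound " ++ x)) id (lookup x env)
      compile (Add a b) = let fa = compile a
                              fb = compile b
                          in \env -> fa env + fb env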

      1. 1

        Would you have a link for Jitter? I searched, but not sure what I am finding is related.

        1. 2
          1. 1

            Thanks!

      1. 3

        Nice. Brings me back to writing the network stack for https://sneakysnake.io. The whole WebRTC data channel spec is a complicated mess, even when you’re just implementing enough of it for unreliable message transmission. But at least it’s fairly well specified. Of course, that doesn’t mean the browsers implement the spec accurately. They don’t.

        1. 7

          You can have type-level programming without distinguishing between type-level programming and value-level programming. Dependent types achieve this by making types and values live in the same namespace.

          1. 1

            Do you have any favorite examples?

            1. 6

              Dependent types make stuff like type families, GADTs and kinds obsolete, which are all special-cased in Haskell today.

              So since we’re talking about removing code, it is a bit weird to bring an example. But really, just take any arbitrary type family, and that can serve as an example.

              A function like this has to be written using a type family or GADT in Haskell:

              the (Bool -> Type) (\x => if x then Float else Int)
              

              (this is Idris syntax)
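
              For comparison, here is a rough Haskell rendering (a sketch; the type family name is mine) showing how the value-level if has to be mirrored by a separate, special-cased type-level construct:

              {-# LANGUAGE DataKinds, TypeFamilies #-}

              import Data.Kind (Type)

              -- The branch lives in a closed type family, which exists only at
              -- the type level; in Idris it is just an ordinary function.
              type family The (x :: Bool) :: Type where
                The 'True  = Float
                The 'False = Int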

              1. 2

                Does that mean Zig has some form of dependent types?

                // `the` maps a compile-time-known bool to a type; any function
                // returning `type` must be evaluated at comptime.
                fn the(x: bool) type {
                    return if (x) f64 else i64;
                }

                pub fn main() void {
                    // the(true) and @typeName both resolve at compile time, so the
                    // ++ concatenation happens before the program runs.
                    @import("std").debug.warn("wee " ++ @typeName(the(true)));
                }
                
                1. 4

                  I would say that Zig does have dependent types, but its dependent type system is “unsound”, in the sense that a function call can type check in one use case but might fail type checking in another, due to how types are only checked after use and monomorphization. It’s arguably better than not having them at all, though, and I think that’s borne out in practice.

                  It certainly limits the value of dependent types, though. You can’t reliably use Zig’s type system as a proof assistant in the way that you can use Coq’s, Agda’s, or Idris’s.

                  1. 2
            1. 17

              Very insightful. I do find type level programming in Haskell (and, to a lesser extent, Rust) to be a confusing nightmare. Nonetheless, Rust could not exist without traits (i.e. without bounded polymorphism). The Sync and Send marker traits (combined with borrow checking) are the basis of thread safety.

              I think Zig takes an interesting approach here, with its compile-time programming (i.e. type-level programming with the same syntax and semantics as normal programming), but it suffers from the same typing issues as C++ templates, i.e. types are only checked after use and monomorphization. Rust’s bounded polymorphism can be, and is, type checked before monomorphization, so you know whether there are type errors in general. In Zig (and C++), you only know whether there are type errors with a particular type, and only after using a function (template) on that type.

              I think there’s room for an approach that’s more like Zig’s, but with sound polymorphic typing, using dependent type theory. Coq, Agda, and Idris include type classes (aka implicits, bounded polymorphism), but it doesn’t seem like type classes should be necessary in a dependently typed language. In particular, it doesn’t seem like they should provide any increase in expressiveness, though perhaps they reduce verbosity.

              1. 5

                Fwiw, even in Haskell you only really need one extension to obviate type classes in terms of “expressiveness,” namely RankNTypes. See https://www.haskellforall.com/2012/05/scrap-your-type-classes.html

                …though it doesn’t solve the verbosity issues. But I suspect that a language with better support for records might make this a pretty good solution (I have a side project where I am working on such a language).

                1. 2

                  RankNTypes is my top pick for something to add to Haskell. However, for common cases type classes have the advantage of having decidable inference.

                  1. 3

                    Note that in the context of replacing type classes, the usual decidability problem with inference doesn’t really come up, because either way the higher rank types only show up in type definitions. E.g.

                    class Functor f where
                        fmap :: (a -> b) -> f a -> f b
                    

                    vs.

                    data Functor' f = Functor'
                        { fmap' :: forall a b. (a -> b) -> f a -> f b
                        }
                    

                    In the latter case, the problems with inference don’t come up, because the higher-rank quantifier is “hidden” behind the data constructor, so normal HM type inference can look at a call to fmap' and correctly infer that its argument needs to be a Functor' f, which it can treat opaquely, not worrying about the quantifier.

                    You can often make typechecking advanced features like this easier by “cheating” and either hiding them behind a nominal type, or bundling them with other features as a special case.

                    (N.B. I should say that for just Functor' you only need Rank2Types, which actually is decidable anyway – but I don’t think GHC actually decides it in practice, so it’s kind of a nitpick.)

                    Of course, this is all about type inference, whereas type classes are really more about inferring the values; as I said, this approach doesn’t solve the verbosity issues.
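
                    To illustrate the value-inference point, here is a sketch (the names are mine) of what explicit dictionary passing looks like at a call site, using the Functor' record above:

                    {-# LANGUAGE RankNTypes #-}

                    -- The dictionary for Maybe must be defined and passed by hand.
                    maybeFunctor :: Functor' Maybe
                    maybeFunctor = Functor' { fmap' = fmap }

                    -- Compare `Functor f => f Int -> f Int`, where the constraint
                    -- solver finds the dictionary; here the caller supplies it.
                    doubleAll :: Functor' f -> f Int -> f Int
                    doubleAll dict = fmap' dict (* 2)

                    -- e.g. doubleAll maybeFunctor (Just 21) == Just 42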

                2. 5

                  Type classes aren’t just about verbosity; global coherence is a very important aspect. Any decision on whether to use a type class vs. explicit dictionary passing needs to consider the implications of global coherence. I think Type Classes vs. the World is a must-watch in order to engage in productive discussion about type classes.
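
                  A minimal sketch of why coherence matters (hypothetical names): with explicit dictionaries, nothing stops two call sites from ordering the same data differently, a mismatch that the one-instance-per-type rule of type classes rules out.

                  -- Two valid but incompatible dictionaries for the same type.
                  data Ord' a = Ord' { cmp :: a -> a -> Ordering }

                  ascending, descending :: Ord' Int
                  ascending  = Ord' compare
                  descending = Ord' (flip compare)

                  -- Insertion into a sorted list, parameterized by the dictionary.
                  insertBy' :: Ord' a -> a -> [a] -> [a]
                  insertBy' _ x [] = [x]
                  insertBy' d x (y:ys)
                    | cmp d x y == GT = y : insertBy' d x ys
                    | otherwise       = x : y : ys

                  -- Building a list with `ascending` and later inserting with
                  -- `descending` silently breaks the sorted invariant.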

                1. 3

                    As seen in this tweet, it can run a full Linux emulator, inside a GPU shader, on an Oculus Quest in VRChat.

                  1. 1

                    Whaaaat? 🤯🤯🤯

                  1. 1

                    Bounced after the second time the page stole my focus away from the text. I barely got past reading the title.

                    1. 3
                      1. 1

                        Indeed, there are a couple of annoying pop-ups - but the content was interesting IMNHO.

                      1. 38

                        FWIW the motivation for this was apparently a comment on a thread about a review of the book “Software Engineering at Google” by Titus Winters, Tom Manshreck, and Hyrum Wright.

                        https://lobste.rs/s/9n7aic/what_i_learned_from_software_engineering

                          I meant to comment on that original thread, because I thought the question was misguided. Well, now that I look, it’s actually been deleted?

                          Anyway, the point is that the empirical question isn’t really actionable, IMO. You could “answer it” and it still wouldn’t tell you what to do.

                        I think you got this post exactly right – there’s no amount of empiricism that can help you. Software engineering has changed so much in the last 10 or 20 years that you can trivially invalidate any study.

                        Yaron Minsky has a saying that “there’s no pile of sophomores high enough” that is going to prove anything about writing code. (Ironically he says that in advocacy of static typing, which I view as an extremely domain specific question.) Still I agree with his general point.


                        This is not meant to be an insult, but when I see the names Titus Winters and Hyrum Wright, I’m less interested in the work. This is because I worked at Google for over a decade and got lots of refactoring and upgrade changelists/patches from them, as maintainer of various parts of the codebase. I think their work is extremely valuable, but it is fairly particular to Google, and in particular it’s done without domain knowledge. They are doing an extremely good job of doing what they can to improve the codebase without domain knowledge, which is inherent in their jobs, because they’re making company-wide changes.

                        However most working engineers don’t improve code without domain knowledge, and the real improvements to code require such knowledge. You can only nibble at the edges otherwise.

                          @peterbourgon said basically what I was going to say in the original thread – this advice is generally good in the abstract, but it lacks context.

                        https://lobste.rs/s/9n7aic/what_i_learned_from_software_engineering

                        The way I learned things at Google was to look at what people who “got things done” did. They generally “break the rules” a bit. They know what matters and what doesn’t matter.

                          Jeff Dean and Sanjay Ghemawat indeed write great code, and early in my career I exchanged a few CLs with them and learned a lot. I also referenced a blog post by Paul Buchheit in The Simplest Explanation of Oil.

                          For those who don’t know, he was the creator of Gmail, working on it for 3 years as a side project (and Gmail was amazing back then, faster than desktop MS Outlook, even though it’s rotted now). He mentions in that post how he prototyped some ads with the aid of some Unix shell. (Again, ads are horrible now, a cancer on the web – back then they were useful and fast. Yes, really. It’s hard to convey the difference to someone who wasn’t a web user then.)

                          As a couple of other anecdotes, I remember a coworker complaining that Guido van Rossum’s functions were too long. (Actually I somewhat agreed, but he did it in service of getting something done, and it can be fixed later.)

                          I also remember the Java readability review of Bram Moolenaar (author of Vim), where he basically broke all the rules and got angry at the system. (For a brief time I was one of the people who picked the Python readability reviewers, so I’m familiar with this style of engineering. I had to manage some disputes between reviewers and applicants.)

                          So you have to take all these rules with a grain of salt. These people can obviously get things done, and they all do things a little differently. They don’t always write as many tests as you’d ideally like. One of the things I tried to do as the readability reviewer was to push back against dogma and get people to relax a bit. There is value to global consistency, but there’s also value to local domain-specific knowledge. My pushing back was not really successful, and Google engineering has gotten more dogmatic and sclerotic over the years. It was not fun to write code there by the time I left (over 5 years ago).


                        So basically I think you have to look at what people build and see how they do it. I would rather read a bunch of stories like “Coders at Work” or “Masterminds of Programming” than read any empirical study.

                          I think there should be a name for this empirical fallacy (or it probably already exists?) Another area where science has roundly failed is nutrition and preventative medicine. Maybe not for the exact same reasons, but the point is that controlled experiments are only one way of obtaining knowledge, and not the best one for many domains. They’re probably better at what Taleb calls “negative knowledge” – i.e. disproving something, which is possible and valuable. Figuring out how to act in the world (how to create software) is much less amenable to them. All things being equal, more testing is better, but all things aren’t always equal.

                        Oil is probably the most rigorously tested project I’ve ever worked on, but this is because of the nature of the project, and it isn’t right for all projects as a rule. It’s probably not good if you’re trying to launch a video game platform like Stadia, etc.

                        1. 8

                            Anyway, the point is that the empirical question isn’t really actionable, IMO. You could “answer it” and it still wouldn’t tell you what to do.

                          I think you got this post exactly right – there’s no amount of empiricism that can help you.

                          This was my exact reaction when I read the original question motivating Hillel’s post.

                            I even want to take it a step further and say: Outside a specific context, the question doesn’t make sense. You won’t be able to measure it accurately, and even if you could, there would be such huge variance depending on other factors across teams where you measured it that your answer wouldn’t help you win any arguments.

                          I think there should be a name for this empirical fallacy

                            It seems especially to afflict the smart and educated. Having absorbed the lessons of science and the benefits of skepticism and self-doubt, you can ask of any claim “But is there a study proving it?”. It’s a powerful debate trick too. But it can often be a category error. The universe of useful knowledge is much larger than the subset that has been (or can be) tested with a randomized double-blind study.

                          1. 5

                              I even want to take it a step further and say: Outside a specific context, the question doesn’t make sense. You won’t be able to measure it accurately, and even if you could, there would be such huge variance depending on other factors across teams where you measured it that your answer wouldn’t help you win any arguments.

                            It makes a lot of sense to me in my context, which is trying to convince skeptical managers that they should pay for my consulting services. But it’s intended to be used in conjunction with rhetoric, demos, case studies, testimonials, etc.

                              It seems especially to afflict the smart and educated. Having absorbed the lessons of science and the benefits of skepticism and self-doubt, you can ask of any claim “But is there a study proving it?”. It’s a powerful debate trick too. But it can often be a category error. The universe of useful knowledge is much larger than the subset that has been (or can be) tested with a randomized double-blind study.

                              I’d say in principle it’s scientism; in practice it’s often an intentional sabotaging tactic.

                            1. 1

                              It makes a lot of sense to me in my context, which is trying to convince skeptical managers that they should pay for my consulting services. But it’s intended to be used in conjunction with rhetoric, demos, case studies, testimonials, etc.

                              100%.

                              I should have said: I don’t think it would help you win any arguments with someone knowledgeable. I completely agree that in the real world, where people are making decisions off rough heuristics and politics is everything, this kind of evidence could be persuasive.

                              So a study showing that “catching bugs early saves money” functions here like a white lab coat on a doctor: it makes everyone feel safer. But what’s really happening is that they are just trusting that the doctor knows what he’s doing. Imo the other methods for establishing trust you mentioned – rhetoric, demos, case studies, testimonials, etc. – imprecise as they are, are probably more reliable signals.

                              EDIT: Also, just to be clear, I think the right answer here, the majority of the time, is “well obviously it’s better to catch bugs early than later.”

                              1. 2

                                the majority of the time

                                And in which cases is this false? Is it when the team has lots of senior engineers? Is it when the team controls both the software and the hardware? Is it when OTA updates are trivial? (Here is a knock-on effect: what if OTA updates make this assertion false, but then open up a huge can of security vulnerabilities, which overall negates any benefit that the OTA updates add?) What does a majority here mean? I mean, a majority of 55% means something very different from a majority of 99%.

                                  This is the value of empirical software study. It adds precision to assertions (such as understanding that a 55% majority is a bit pathological but a 99% majority certainly isn’t). Diving into data and being able to understand and explore trends is another benefit. Humans are motivated to categorize their experiences around questions they wish to answer, but it’s much harder to answer questions that a human hasn’t posed yet. What if it turns out that catching bugs early or late is pretty much immaterial, and the real defect rate is simply a function of experience and seniority?

                                1. 1

                                    This is the value of empirical software study.

                                    I think empirical software study is great, and has tons of benefits. I just don’t think you can answer all questions of interest with it. The bugs question we’re discussing is one of those.

                                  And in which cases is this false? Is it when the team has lots of senior engineers? Is it when the team controls both the software and the hardware? Is it when OTA updates are trivial? (Here is a knock-on effect: what if OTA updates make this assertion false, but then open up a huge can of security vulnerabilities, which overall negates any benefit that the OTA updates add?)

                                  I mean, this is my point. There are too many factors to consider. I could add 50 more points to your bullet list.

                                  What does a majority here mean?

                                  Something like: “I find it almost impossible to think of examples from my personal experience, but understand the limits of my experience, and can imagine situations where it’s not true.” I think if it is true, it would often indicate a dysfunctional code base where validating changes out of production (via tests or other means) was incredibly expensive.

                                    What if it turns out that catching bugs early or late is pretty much immaterial, and the real defect rate is simply a function of experience and seniority?

                                  One of my points is that there is no “turns out”. If you prove it one place, it won’t translate to another. It’s hard even to imagine an experimental design whose results I would give much weight to. All I can offer is my opinion that this strikes me as highly unlikely for most businesses.

                                  1. 4

                                      Why is software engineering such an outlier when we’ve been able to measure so many other things? We can measure vaccine efficacy and health outcomes (among disparate populations with different genetics, diets, culture, and life experiences), we can measure minerals in soil, we can analyze diets, heat transfer, we can even study government policy, elections, and even personality, though it’s messy. What makes software engineering so much more complex and context-dependent than even a person’s personality?

                                    The fallacy I see here is simply that software engineers see this massive complexity in software engineering because they are software experts and believe that other fields are simpler because software engineers are not experts in those fields. Every field has huge amounts of complexity, but what gives us confidence that software engineering is so much more complex than other fields?

                                    1. 3

                                      Why is software engineering such an outlier when we’ve been able to measure so many other things?

                                      You can measure some things, just not all. Remember the point of discussion here is: Can you empirically investigate the claim “Finding bugs earlier saves overall time and money”? My position is basically: “This is an ill-defined question to ask at a general level.”

                                      We can measure vaccine efficacy and health outcomes (among disparate populations with different genetics, diets, culture, and life experiences)

                                      Yes.

                                      we can measure minerals in soil, we can analyze diets, heat transfer,

                                      Yes.

                                      we can even study government policy

                                      In some way yes, in some ways no. This is a complex situation with tons of confounds, and also a place where policy outcomes in some places won’t translate to other places. This is probably a good analog for what makes the question at hand difficult.

                                      and even personality

                                      Again, in some ways yes, in some ways no. With the big 5, you’re using the power of statistical aggregation to cut through things we can’t answer. Of which there are many. The empirical literature on “code review being generally helpful” seems to have a similar force. You can take disparate measures of quality, disparate studies, and aggregate to arrive at relatively reliable conclusions. It helps that we have an obvious, common sense causal theory that makes it plausible.

                                      What makes software engineering so much more complex and context dependent than even a person’s personality?

                                      I don’t think it is.

                                      Every field has huge amounts of complexity, but what gives us confidence that software engineering is so much more complex than other fields?

                                      I don’t think it is, and this is not where my argument is coming from. There are many questions in other fields equally unsuited to empirical investigation as: “Does finding bugs earlier save time and money?”

                                      1. 2

                                        In some way yes, in some ways no. This is a complex situation with tons of confounds, and also a place where policy outcomes in some places won’t translate to other places. This is probably a good analog for what makes the question at hand difficult.

                                          That hasn’t stopped anyone from performing the analysis and using these analyses to implement policy. That the analysis of this data is imperfect is beside the point; it still provides some amount of positive value. Software is in the data dark ages in comparison to government policy; what data-driven decision has been made among software engineering teams? I don’t think we even understand whether Waterfall or Agile reduces defect rates or time to ship compared to the other.

                                        With the big 5, you’re using the power of statistical aggregation to cut through things we can’t answer. Of which there are many. The empirical literature on “code review being generally helpful” seems to have a similar force. You can take disparate measures of quality, disparate studies, and aggregate to arrive at relatively reliable conclusions. It helps that we have an obvious, common sense causal theory that makes it plausible.

                                        What’s stopping us from doing this with software engineering? Is it the lack of a causal theory? There are techniques to try to glean causality from statistical models. Is this not in line with your definition of “empirically”?

                                        1. 5

                                            That hasn’t stopped anyone from performing the analysis and using these analyses to implement policy. That the analysis of this data is imperfect is beside the point; it still provides some amount of positive value.

                                            It’s not clear to me at all that, as a whole, “empirically driven” policy has had positive value. You can point to successful cases and disasters alike. I think in practice the “science” here is at least as often used as a veneer to push through an agenda as it is to implement objectively more effective policy. Just as software methodologies are.

                                          Is it the lack of a causal theory?

                                          I was saying there is a causal theory for why code review is effective.

                                          What’s stopping us from doing this with software engineering?

                                            Again, some parts of it can be studied empirically, and should be. I’m happy to see advances there. But I don’t see the whole thing being tamed by science. The high-order bits in most situations are politics and other human stuff. You mentioned it being young… but here’s an analogy that might help with where I’m coming from. Teaching writing, especially creative writing. It’s equally ad-hoc and unscientific, despite being old. MFA programs use different methodologies and writers subscribe to different philosophies. There is some broad consensus about general things that mostly work and that most people do (workshops), but even within that there’s a lot of variation. And great books are written by people with wildly different approaches. There are some nice efforts to leverage empiricism, like Steven Pinker’s book and even software like https://hemingwayapp.com/, but systematization can only go so far.

                                      2. 2

                                        We can measure vaccine efficacy and health outcomes (among disparate populations with different genetics, diets, culture, and life experiences)

                                        Good vaccine studies are pretty expensive from what I know, but they have statistical power for that reason.

                                        Health studies are all over the map. The “pile of college sophomores” problem very much applies there as well. There are tons of studies done on Caucasians that simply don’t apply in the same way to Asians or Africans, yet some doctors use that knowledge to treat patients.

                                        Good doctors will use local knowledge and rules of thumb, and they don’t believe every published study they see. That would honestly be impossible, as lots of them are in direct contradiction to each other. (Contradiction is a problem that science shares with apprenticeship from experts; for example IIRC we don’t even know if a high fat diet causes heart disease, which was accepted wisdom for a long time.)

                                        https://www.nytimes.com/2016/09/13/well/eat/how-the-sugar-industry-shifted-blame-to-fat.html

                                        I would recommend reading some books by Nassim Taleb if you want to understand the limits of acquiring knowledge through measurement and statistics (Black Swan, Antifragile, etc.). Here is one comment I made about them recently: https://news.ycombinator.com/item?id=27213384

                                          Key point: acting in the world, i.e. decision making under risk, is fundamentally different from scientific knowledge. Tinkering and experimentation are what drive real changes in the world, not planning by academics. He calls the latter “the Soviet-Harvard school”.

                                        The books are not well organized, but he hammers home the difference between acting in the world and knowledge over and over in many different ways. If you have to have scientific knowledge before acting, you will be extremely limited in what you can do. You will probably lose all your money in the markets too :)


                                          Update: after Googling the term I found in my notes, I’d say “Soviet-Harvard delusion” captures the crux of the argument here. One short definition is: the (unscientific) overestimation of the reach of scientific knowledge.

                                        https://www.grahammann.net/book-notes/antifragile-nassim-nicholas-taleb

                                        https://medium.com/the-many/the-right-way-to-be-wrong-bc1199dbc667

                                        https://taylorpearson.me/antifragile-book-notes/

                                        1. 2

                                          This sounds like empiricism. Not in the sense of “we can only know what we can measure” but in the sense of “I can only know what I can experience”. The Royal Society’s motto is “take nobody’s word for it”.

                                          Tinkering and experimentation are what drive real changes in the world, not planning by academics.

                                          I 100% agree but it’s not the whole picture. You need theory to compress and see further. It’s the back and forth between theory and experimentation that drives knowledge. Tinkering alone often ossifies into ritual. In programming, this has already happened.

                                          1. 1

                                            I agree about the back and forth, of course.

                                            I wouldn’t agree programming has ossified into ritual. Certainly it has at Google, which has a rigid coding style, toolchain, and set of languages – and it’s probably worse at other large companies.

                                            But I see lots of people on this site doing different things, e.g. running OpenBSD and weird hardware, weird programming languages, etc. There are also tons of smaller newer companies using different languages. Lots of enthusiasm around Rust, Zig, etc. and a notable amount of production use.

                                            1. 1

                                              My bad, I didn’t mean all programming has become ritual. I meant that we’ve seen instances of it.

                                          2. 1

                                            Good vaccine studies are pretty expensive from what I know, but they have statistical power for that reason.

                                            Oh sure, I’m not saying this will be cheap. In fact the price of collecting good data is what I feel makes this research so difficult.

                                            Health studies are all over the map. The “pile of college sophomores” problem very much applies there as well. There are tons of studies done on Caucasians that simply don’t apply in the same way to Asians or Africans, yet some doctors use that knowledge to treat patients.

                                              We’ve developed techniques to deal with these issues, though of course you can’t draw a conclusion from extremely low sample sizes. One technique frequently used in meta-studies to compensate for low-statistical-power studies is called post-stratification.

                                            Good doctors will use local knowledge and rules of thumb, and they don’t believe every published study they see. That would honestly be impossible, as lots of them are in direct contradiction to each other. (Contradiction is a problem that science shares with apprenticeship from experts; for example IIRC we don’t even know if a high fat diet causes heart disease, which was accepted wisdom for a long time.)

                                              I think medicine is a good example of empiricism done right. Sure, we can look at modern failures of medicine and nutrition and use those learnings to do better, but medicine is significantly more empirical than software. I still maintain that if we can systematize our understanding of the human body and medicine, then we can do the same for software, though, as in a soft science, definitive answers may stay elusive. Much work over decades went into the medical sciences to define what it even means to have an illness, to feel pain, to see recovery, or to combat an illness.

                                            I would recommend reading some books by Nassim Taleb if you want to understand the limits of acquiring knowledge through measurement and statistics (Black Swan, Antifragile, etc.). Here is one comment I made about them recently: https://news.ycombinator.com/item?id=27213384

                                              Key point: acting in the world, i.e. decision making under risk, is fundamentally different from scientific knowledge. Tinkering and experimentation are what drive real changes in the world, not planning by academics. He calls the latter “the Soviet-Harvard school”.

                                            I’m very familiar with Taleb’s Antifragile thesis and the “Soviet-Harvard delusion”. As someone well versed in statistics, these are theses that are both pedestrian (Antifragile itself being a pop-science look into a field of study called Extreme Value Theory) and old (Maximum Likelihood approaches to decision theory are susceptible to extreme/tail events which is why in recent years Bayesian and Bayesian Causal analyses have become more popular. Pearson was aware of this weakness and explored other branches of statistics such as Fiducial Inference). (Also I don’t mean this as criticism toward you, though it’s hard to make this tone come across over text. I apologize if it felt offensive, I merely wish to draw your eyes to more recent developments.)

                                              To draw the discussion to a close, I’ll try to summarize my position a bit. I don’t think software empiricism will answer all the questions, nor will we get to a point where we can rigorously determine that some function f exists that can model our preferences. However, I do think software empiricism together with standardization can offer us a way to confidently produce low-risk, low-defect software. I think modern statistical advances have offered us ways to understand more than the statistical approaches of the ‘70s, and that we can use many of the newer techniques used in the social and medical sciences (e.g. Bayesian methods) to prove results. I don’t think that, even if we start a concerted approach today to do this, our understanding will get there in a matter of a few years. To do that would be to undo decades of software practitioners creating systemic analyses from their own experiences, and to create a culture shift away from the individual as artisan toward a culture of standardization, both of communication of results (what is a bug? how does it affect my code? how long did it take to find? how long did it take to resolve? etc.) and of team conditions (our team has n engineers, our engineers have x years of experience, etc.), that we just don’t have now. I have hope that eventually we will begin to both standardize and understand our industry better, but in the near term this will be difficult.

                                2. 5

                                  Here’s a published paper that purposefully illustrates the point you’re trying to make: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC300808/. It’s an entertaining read.

                                  1. 1

                                    Yup I remember that from debates on whether to wear masks or not! :) It’s a nice pithy illustration of the problem.

                                  2. 2

                                    Actually I found a (condescending but funny/memorable) name for the fallacy – the “Soviet-Harvard delusion” :)

                                    An (unscientific) overestimation of the reach of scientific knowledge.

                                    I found it in my personal wiki, in 2012 notes on the book Antifragile.

                                    Original comment: https://lobste.rs/s/v4unx3/i_ing_hate_science#c_nrdasq

                                  3. 3

                                    I’m reading a book right now about 17th century science. The author has some stuff to say about Bacon and Empiricism but I’ll borrow an anecdote from the book. Boyle did an experiment where he grew a pumpkin and measured the dirt before and after. The weight of the dirt hadn’t changed much. The only other ingredient that had been added was water. It was obvious that the pumpkin must be made of only water.

                                    This idea that measurement and observation drive knowledge is Bacon’s legacy. Even in Bacon’s own lifetime, it’s not how science unfolded.

                                    1. 2

                                        Fun fact: Bacon is often considered the modern founder of the idea that knowledge can be used to create human-directed progress. Before him, while scholars and astronomers used to often study things and invent things, most cultures still viewed life and nature as a generally haphazard process. As with most things in history, the reality involves more than just Bacon, and there most certainly were non-Westerners who had similar ideas, but Bacon still figures prominently in the picture.

                                      1. 1

                                        Hm interesting anecdote that I didn’t know about (I looked it up). Although I’d say that’s more an error of reasoning within science? I realized what I was getting at could be called the Soviet-Harvard delusion, which is overstating the reach of scientific knowledge (no insult intended, but it is a funny and memorable name): https://lobste.rs/s/v4unx3/i_ing_hate_science#c_nrdasq

                                        1. 1

                                          To be fair, the vast majority of the mass of the pumpkin is water. So the inference was correct to first order. The second-order correction of “and carbon from the air”, of course, requires being much more careful in the inference step.

                                        2. 2

                                          So basically I think you have to look at what people build and see how they do it. I would rather read a bunch of stories like “Coders at Work” or “Masterminds of Programming” than read any empirical study.

                                          Perhaps, but this is already what happens, and I think it’s about time we in the profession raise our standards, both of pedagogy and of practice. Right now you can take a casual search on the Web and find respected talking heads explaining how their philosophy is correct, despite being in direct contrast to another person’s philosophy. This behavior is reinforced by the culture wars of our times, of course, but there’s still much more aimless discourse than there is consistency in results. If we want to start taking steps to improve our practice, I think it’s important to understand what we’re doing right and, more importantly, what we’re doing wrong. I’m more interested here in negative results than positive results. I want to know where software engineering as a discipline is going wrong. There’s also a lot at stake here purely monetarily; corporations often embrace a technology methodology and pay for PR and marketing about their methodology to both bolster their reputations and to try to attract engineers.

                                          I think there should be a name for this empirical fallacy (or it probably already exists?) Another area where science has roundly failed is nutrition and preventative medicine.

                                          I don’t think we’re even at the point in our empirical understanding of software engineering where we can commit this fallacy. What do we even definitively understand about our field? I’d argue that psychology and sociology have stronger well-known results than what we have in software engineering, even though those are very obviously soft sciences. I also think software engineers are motivated to think the problem is complex and impossible to study empirically for the same reason that anyone holds their work in high esteem; we believe our work is complicated and requires highly contextual expertise to understand. However, if psychology and sociology can make empirical progress in their fields, I think software engineers most definitely can.

                                          1. 2

                                            Do you have an example in mind of the direct contradiction? I don’t see much of a problem if different experts have different opinions. That just means they were building different things and different strategies apply.

                                            Again I say it’s good to “look at what people build” and see if it applies to your situation; not blindly follow advice from authorities (e.g. some study “proved” this, or some guy from Google who may or may not have built things said this was good; therefore it must be good).

                                            I don’t find a huge amount of divergence in the opinions of people who actually build stuff, vs. talking heads. If you look at what John Carmack says about software engineering, it’s generally pretty level-headed, and he explains it well. It’s not going to differ that much from what Jeff Dean says. If you look at their C++ code, there are even similarities, despite drastically different domains.

                                            Again the fallacy is that there’s a single “correct” – it depends on the domain; a little diversity is a good thing.

                                            1. 4

                                              Do you have an example in mind of the direct contradiction? I don’t see much of a problem if different experts have different opinions. That just means they were building different things and different strategies apply.

                                              Here are two fun ones I like to contrast: The Unreasonable Effectiveness of Dynamic Typing for Practical Programs (Vimeo) and The advantages of static typing, simply stated. Two separate authors who came to different conclusions from similar evidence. While, yes, their lived experience is undoubtedly different, these are folks who are espousing (mostly, not completely) contradictory viewpoints.

                                              I don’t find a huge amount of divergence in the opinions of people who actually build stuff, vs. talking heads. If you look at what John Carmack says about software engineering, it’s generally pretty level-headed, and he explains it well. It’s not going to differ that much from what Jeff Dean says. If you look at their C++ code, there are even similarities, despite drastically different domains.

                                              Who builds things, though? Lots of people build things. While we hear about John Carmack and Jeff Dean, there are folks plugging away at the Linux kernel, on io_uring, on capability object systems, and all sorts of things that many of us will never be aware of. As an example, Sanjay Ghemawat is someone who I wasn’t familiar with until you talked about him. I’ve also interacted with folks in my career who I presume you’ve never interacted with, and yet they have been an invaluable source of learnings for my own code. Moreover, these experience reports are biased by their authors’ reputations; I mean, of course we’re more likely to listen to John Carmack than some Vijay Foo (not a real person, as far as I’m aware), because he’s known for his work at id Software, even if this Vijay Foo may end up having as many or more actionable insights than John Carmack. Overcoming reputation bias and the lack of information about “builders” is another side effect I see of empirical research. Aggregating learnings across individuals can help surface lessons that otherwise would have been lost due to structural issues of acclaim and money.

                                              Again the fallacy is that there’s a single “correct” – it depends on the domain; a little diversity is a good thing.

                                              This seems to be a sentiment I’ve read elsewhere, so I want to emphasize: I don’t think there’s anything wrong with diversity, and I don’t think Empirical Software Engineering does anything to harm diversity. Creating complicated probabilistic models of spaces necessarily involves many factors. We can create a probability space which has all of the features we care about. Just condition against your “domain” (e.g. kernel work, distributed systems, etc.) and slot your result into that domain. I don’t doubt that a truly descriptive probability space will be very high dimensional here, but I’m confident we have the analytical and computational power to perform this work nonetheless.

                                              The real challenge I suspect will be to gather the data. FOSS developers are time and money strapped as it is, and excluding some exceptional cases such as curl’s codebase statistics, they’re rarely going to have the time to take the detailed notes it would take to drive this research forward. Corporations which develop proprietary software have almost no incentive to release this data to the general public given how much it could expose about their internal organizational structure and coding practices, so rather than open themselves up to scrutiny they keep the data internal if they measure it at all. Combating this will be a tough problem.

                                              1. 2

                                                Yeah, I don’t see any conflict there (and I’ve watched the first one before). I use both static and dynamic languages and there are advantages and disadvantages to each. I think any programmer should be comfortable using both styles.

                                                I think that the notion that a study is going to change anyone’s mind is silly, like “I am very productive in statically typed languages. But a study said that they are not more productive; therefore I will switch to dynamically typed”. That is very silly.

                                                It’s also not a question that’s ever actionable in reality. Nobody says “Should I use a static or dynamic language for this project?” More likely you are working on an existing codebase, OR you have a choice between, say, Python and Go. The difference between Python and Go would be a more interesting and accurate study, not static vs. dynamic. But you can’t do an “all pairs” comparison via scientific studies.

                                                If there WERE a study definitively proving that, say, dynamic languages are “better” (whatever that means), and you chose Python over Go for that reason, that would be a huge mistake. It’s just not enough evidence; the languages are different for other reasons.

                                                I think there is value to scientific studies on software engineering, but I think the field just moves very fast, and if you wait for science, you’ll be missing out on a lot of stuff. I try things based on what people who get things done do (e.g. OCaml), and incorporate it into my own work, and that seems like a good way of obtaining knowledge.

                                                Likewise, I think “Is catching bugs earlier less expensive” is a pretty bad question. A better scientific question might be “is unit testing in Python more effective than integration testing Python with shell” or something like that. Even that’s sort of silly because the answer is “both”.

                                                But my point is that these vague and general questions simply leave out a lot of subtlety of any particular situation, and can’t be answered in any useful way.

                                                1. 2

                                                  I think that the notion that a study is going to change anyone’s mind is silly, like “I am very productive in statically typed languages. But a study said that they are not more productive; therefore I will switch to dynamically typed”. That is very silly.

                                                  While the example of static and dynamic typing is probably too broad to be meaningful, I don’t actually think this would be true. It’s a bit like saying “Well, I believe that Python is the best language, and even though research shows that Go has properties <x, y, and z> that are beneficial to my problem domain, I’m going to ignore them and put a huge prior on my past experience.” That’s the state of the art right now: trust your gut and the guts of those you respect, not the other guts. If we can’t progress from here I would indeed be sad.

                                                  It’s also not a question that’s ever actionable in reality. Nobody says “Should I use a static or dynamic language for this project?” More likely you are working on an existing codebase, OR you have a choice between, say, Python and Go. The difference between Python and Go would be a more interesting and accurate study, not static vs. dynamic. But you can’t do an “all pairs” comparison via scientific studies.

                                                  Sure, as you say, static vs dynamic languages isn’t very actionable but Python vs Go would be. And if I’m starting a new codebase, a new project, or a new company, it might be meaningful to have research that shows that, say, Python has a higher defect rate but an overall lower mean time to resolution of these defects. Prior experience with Go may trump benefits that Python has (in this synthetic example) if project time horizons are short, but if time horizons are long Go (again in the synthetic example) might look better. I think this sort of comparative analysis in defect rates, mean time to resolution, defect severity, and other attributes can be very useful.

                                                  Personally, I’m not satisfied by the state of the art of looking at builders. I think the industry really needs a more rigorous look at its assumptions and even if we never truly systematize and Fordify the field (which fwiw I don’t think is possible), I certainly think there’s a lot of progress for us to make yet and many pedestrian questions that we can answer that have no answers yet.

                                                  1. 2

                                                    Sure, as you say, static vs dynamic languages isn’t very actionable but Python vs Go would be. And if I’m starting a new codebase, a new project, or a new company, it might be meaningful to have research that shows that, say, Python has a higher defect rate but an overall lower mean time to resolution of these defects.

                                                    Python vs Go defect rates also seem to me to be far too general for an empirical study to produce actionable data.

                                                    How do you quantify a “defect rate” in a way that’s relevant to my problem, for example? There are a ton of confounds: genre of software, timescale of development, size of team, composition of team, goals of the project, etc. How do I know that some empirical study comparing defect rates of Python vs. Go in, I dunno, the giant Google monorepo, is applicable to my context? Let’s say I’m trying to pick a language to write some AI research software, which will have a 2-person team, no monorepo or formalized code-review processes, a target 1-year timeframe to completion, and a primary metric of producing figures for a paper. Why would I expect the Google study to produce valid data for my decision-making?

                                                  2. 1

                                                    Nobody says “Should I use a static or dynamic language for this project?”

                                                    Somebody does. Somebody writes the first code on a new project and chose the language. Somebody sets the corporate policy on permissible languages. Would be amazing if even a tiny input to these choices were real instead of just perceived popularity and personal familiarity.

                                            2. 2

                                              I meant to comment on that original thread, because I thought the question was misguided. Well, now that I look, it’s actually been deleted?

                                              Too many downvotes this month. ¯\_(ツ)_/¯

                                              1. 1

                                                This situation is not ideal :(

                                            1. 20

                                                Confusing naming. There’s already a Rust project called warp, but it’s a web framework.

                                              1. 14

                                                Funny story: Warp (the terminal emulator) is actually listing warp (the web framework) as a dependency: https://github.com/warpdotdev/warp#open-source-dependencies

                                                1. 3

                                                  What does God need with a Starship^W^W^W^W^Wa terminal need a web framework for?

                                                  1. 3

                                                    How else are you going to make a GUI? Surely you’re not suggesting something as silly as, say, using a GUI toolkit, the way they did in the stone age!

                                                    1. 1

                                                        I’d love to know as well, but as of now Warp (the terminal emulator) is in closed beta.

                                                      1. 1

                                                          @fasterthanlime has a nice video on why you would ever want to use a web browser for a random program’s GUI.

                                                        1. 1

                                                          I appreciate the effort, but you’ve highlighted another generational difference: using a video to express a point that could hopefully, succinctly be made in the space of a few paragraphs, without requiring people to look at your face for ten minutes :)

                                                      2. 2

                                                        I assume that’s for the shared editing feature.

                                                        1. 10

                                                          Not to mention OS/2 Warp!

                                                          1. 5

                                                            Not to mention the good ol’ Warp drive

                                                            1. 10

                                                              I’m worried I’ll confuse it with the original warp lines controlled by the original punch cards.

                                                        2. 2

                                                          And OS/2 Warp.

                                                        1. 10

                                                          I hate how Go’s and Deno’s approach to dependencies, just pasting the URL to the web front-end of the git hosting service used by the library, seems to be taking off. I think it’s extremely useful to maintain a distinction between the logical identifier for a library and the physical host you talk to over the network to download it.

                                                          1. 4

                                                            I like the idea of using URL fragments for importing. There’s a beautiful simplicity and universality to it. You don’t need a separate distributed package system—any remote VCS or file system protocol can work. However, it needs to be combined with import maps, so that you can hoist the location and version info out of the code, when desired. And there should be support/tools for explicitly downloading dependencies to a local cache, and for enforcing offline running. This is the approach I plan to take for Dawn.
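
                                                            Concretely, the combination might look like this (a Deno-flavoured sketch; the module name, version, and URLs are all made up):

                                                                // main.ts: the source code refers to a logical name only.
                                                                import { camelCase } from "case/mod.ts";

                                                                console.log(camelCase("hello world"));

                                                                // deno.json (import map) pins the logical name to a physical host:
                                                                //
                                                                //   { "imports": { "case/": "https://deno.land/x/case@2.1.1/" } }
                                                                //
                                                                // Retargeting to a mirror or a local vendored copy means editing
                                                                // that one file, not every import site:
                                                                //
                                                                //   { "imports": { "case/": "./vendor/case/" } }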

                                                            1. 2

                                                              This strikes me as problematic as well. LibMan in .NET is the same way. npm audit may be flawed, but npm itself at least provides a mechanism for evaluating common dependency chains for vulnerabilities.

                                                              Ryan Dahl and Kit Kelly drew the opposite conclusion in their work on Deno. They believe that a central authority for package identity creates a false sense of security and that washing their hands of package identity altogether is the solution. Deno does at least have a registry for third party modules of sorts, but installation is still URL based.

                                                              1. 1

                                                                Think of it like this. The URI is just a slightly longer-than-usual package name. As a handy side-effect, you can also fetch the actual code from it. There’s nothing stopping you from having your build tools fetch the same package from a different host (say, an internal package cache) using that URI as the lookup key.

                                                                The big benefit is that instead of having to rely on a single source of truth like the npm repository, the dependency ecosystem is distributed by default. Instead of needing to jump through hoops to set up a private package repository it’s just… the git server you already have. Easy.

                                                                1. 4

                                                                  The problem is that it’s precisely not just a slightly longer than usual package name. It’s a package name which refers to working web infrastructure. If you ever decide to move your code to another git host, every single source file has to be updated.

                                                                  I have nothing against the idea of using VCS for distribution (or, well, I do have concerns there but it’s not the main point). But there has to be a mapping from logical package name to physical package location. I want my source code to refer to a logical package, and then some file (package.toml?) to map the logical names to git URIs or whatever.

                                                                  I don’t want to have to change every single source file in a project to use a drop-in replacement library (as happened with the satori/go.uuid -> gofrs/uuid thing in the Go world), or to use a local clone of the library, or to move a library to another git host.

                                                                  1. 1

                                                                    It’s a package name which refers to working web infrastructure.

                                                                    But that’s true about more classical packaging systems, like Cargo. If crates.io goes down, all dependency specifications become a pumpkin.

                                                                    It seems to me that deno’s scheme lets you have roughly the same semantics as cargo. You don’t have to point urls directly at repos, you can point them at some kind of immutable registry. If you want to, I think you can restrict, transitively, all the deps to go only via such a registry. So, deno allows, but does not prescribe, a specific registry.

                                                                    To me, it seems not a technical question of what is possible, but rather a social question of what such a distributed ecosystem would look like in practice.

                                                                    1. 1

                                                                      If you want to complain that rust is too dependent on crates.io then I agree of course. But nothing about a Rust package name indicates anything about crates.io; you’re not importing URLs, you’re importing modules. Those modules can be on your own filesystem, or you can let Cargo download them from crates.io for you.

                                                                      If your import statement is the URL to “some kind of immutable registry” then your source code still contains URLs to working web infrastructure. It literally doesn’t fix anything.

                                                                    2. 0

                                                                      Well, Go has hard-coded mappings for common code hosting services, but as a package author you can map a logical module name to a repository location using HTML meta tags. Regarding forks, you can’t keep the same logical name without potentially breaking backwards compatibility.

                                                                      1. 3

                                                                        The HTML meta tag solution is so ridiculous. It doesn’t actually fix the issue. There’s a hard dependency in the source code on actual working web infrastructure, be it a web front-end for your actual repo or an HTML page with a redirect. It solves absolutely none of the issues I have with Go’s module system.

                                                                1. 1

                                                                  Looks interesting. JavaScript isn’t a particularly great language for this, but I definitely love the general idea. I want/hope to build something like this in Dawn, once it’s a bit further along. Eventually, I want to be able to compile the source code down to FPGA or DSP code, though that’s even farther off.

                                                                  1. 1

                                                                    If by “JavaScript isn’t a particularly great language for this” you mean it isn’t sufficiently performant, the author addresses this on the website:

                                                                    Under the hood, Elementary is composed of a wide array of highly optimized, native audio processing blocks. On top, Elementary is built on Node.js, a technology proven across multiple domains for high performance applications.

                                                                    I take this to mean that the processing blocks are compiled binaries originally written in some other language like C. JavaScript is just used for the API. It’s a pity the Github repo doesn’t include any actual source code, though.

                                                                    1. 1

                                                                      No, I assumed the actual DSP code was in C++. I just meant that JavaScript isn’t the best language for writing pure functional code. Too much syntax, and a bit awkward.

                                                                  1. 3

                                                                    People joke about how we’re now going to need 128-bit integers…well, anyone who works with IPv6 addresses or UUIDs loves 128-bit integers.

                                                                    1. 3

                                                                      Yeah but IPv6 addresses and UUIDs are really opaque blobs of bits. You’re not doing arithmetic on them. Bitmasking for IPv6, maybe.
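
                                                                              E.g. a prefix check is just a mask-and-compare on the 128-bit value. A sketch using TypeScript’s BigInt, which is an easy way to get 128-bit-wide values (the prefix is an arbitrary choice):

                                                                                  // Is addr inside 2001:db8::/32? Pure bitmasking, no arithmetic.
                                                                                  const PREFIX = 0x20010db8n << 96n;        // 2001:db8:: as a 128-bit value
                                                                                  const MASK = ((1n << 32n) - 1n) << 96n;   // top 32 bits set (a /32 netmask)

                                                                                  function inPrefix(addr: bigint): boolean {
                                                                                    return (addr & MASK) === PREFIX;
                                                                                  }

                                                                                  inPrefix((0x20010db8n << 96n) | 0x1n);    // true: 2001:db8::1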

                                                                      1. 2

                                                                                You can do 64-bit multiplication without UB if you have 128-bit integers: do the multiply in the wider type, then check whether the result fits in 64 bits. That lets you detect overflow easily, without ever triggering UB.
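
                                                                                The widen-then-check idea, sketched with BigInt standing in for the wider type (in C this would be gcc/clang’s unsigned __int128):

                                                                                    const I64_MIN = -(2n ** 63n);
                                                                                    const I64_MAX = 2n ** 63n - 1n;

                                                                                    // Multiply in a wider representation, then range-check the result.
                                                                                    function checkedMul64(a: bigint, b: bigint): bigint | null {
                                                                                      const wide = a * b;                     // exact product, no wraparound
                                                                                      return wide < I64_MIN || wide > I64_MAX ? null : wide;
                                                                                    }

                                                                                    checkedMul64(2n ** 62n, 2n);              // null: doesn't fit in 64 bits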

                                                                        1. 4

                                                                          This seems like a strange way to check for overflow. Frankly, C should just have built-in checked arithmetic intrinsics. Rust got this right.

                                                                          1. 1

                                                                            I agree.

                                                                            1. 1

                                                                              gcc does have built-in checked arithmetic. Standard C doesn’t have add-with-overflow, but gcc alone is more portable than rustc.

                                                                              1. 2

                                                                                Yeah, gcc and clang have intrinsics.

                                                                                Fair point about Rust.

                                                                                Just checked, and Zig also has intrinsics for this, so at least newer languages are learning from the lack of these in the C standard.

                                                                                1. 1

                                                                                  Swift traps on overflow by default; if you want values to wrap, you use &+ &- etc operators.

                                                                        2. 1

                                                                                    32-bit GCC doesn’t provide 128-bit integers, which does indeed complicate things.

                                                                          1. 1

                                                                                      however, on an RV128I you just need a long ;)

                                                                        1. 29

                                                                            Well written; these were exactly my thoughts when I read this. We don’t need faster programmers. We need more thorough programmers.

                                                                            Software could be so much better (and faster) if the market valued quality software more highly than “more features”.

                                                                          1. 9

                                                                            We don’t need faster programmers. We need more thorough programmers.

                                                                            That’s just a “kids these days…” complaint. Programmers have always been fast and sloppy and bugs get ironed out over time. We don’t need more thorough programmers, like we don’t need more sturdy furniture. Having IKEA furniture is amazing.

                                                                            1. 12

                                                                              Source code is a blueprint. IKEA spends a lot of time getting their blueprints right. Imagine if every IKEA furniture set had several blueprint bugs in it that you had to work around.

                                                                              1. 5

                                                                                We’re already close though. We have mature operating systems, language runtimes, and frameworks. Going forward I see the same thing happening to programming that happens to carpentry or cars now. A small set of engineers develop a design (blueprint) and come up with lists of materials. From there, technicians guide the creation of the actual design. Repairs are performed by contractors or other field workers. Likewise, a select few will work on the design for frameworks, operating systems, security, IPC, language runtimes, important libraries, and other core aspects of software. From there we’ll have implementors gluing libraries together for common tasks. Then we’ll have sysadmins or field programmers that actually take these solutions and customize/maintain them for use.

                                                                                1. 7

                                                                                  I think we’re already completely there in some cases. You don’t need to hire any technical people at all if you want to set up a fully functioning online store for your small business. Back in the day, you would have needed a dev team and your own sysadmins, no other options.

                                                                                  1. 1

                                                                                    I see the same thing happening to programming that happens to carpentry or cars now. […] From there we’ll have implementors gluing libraries together for common tasks.

                                                                                    Wasn’t this the spiel from the 4GL advocates in the 80s?

                                                                                    1. 2

                                                                                      Wasn’t this the spiel from the 4GL advocates in the 80s?

                                                                                        No, it was the spiel of OOP/OOAD advocates in the 80s. Think “software IC”.

                                                                                2. 1

                                                                                  Maybe, maybe not. I just figured that if I work more thoroughly, I get to my goals quicker, as I have less work to do and rewrite my code less often. Skipping error handling might seem appealing at first, as I reach my goal earlier, but the price is that either I or someone else has to fix it sooner or later.

                                                                                  Also, mistakes or plain inefficiency in software have a huge impact nowadays, because software is so widespread.

                                                                                  One nice example I like to make:

                                                                                  The Wikimedia Foundation got 21,035,450,914 page views last month [0]. So if we optimize that web server by a single instruction per page view, assuming the CPU runs at 4 GHz with perfectly optimized code at 1.2 instructions per cycle, we shave off 4.382 CPU-seconds per month. Assuming Wikipedia runs average servers [1], that saves 1.034 watt-hours of energy per month. With an energy price of 13.24 euro cents per kWh [2], saving a single instruction per page view is worth roughly 0.013 euro cents per month.

                                                                                  Now imagine you can make the software run 1% faster. If each page view costs about one CPU-second (roughly 4,800,000,000 instructions), that’s 48,000,000 instructions saved per page view, which is suddenly about 6,240 € in savings per month. For a 1% overall speedup!
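
                                                                                  Spelled out as a calculation (the ~850 W per server and the one-CPU-second-per-page-view cost are the assumptions implied by the figures above):

                                                                                      \begin{aligned}
                                                                                      t &= \frac{21{,}035{,}450{,}914\ \text{instructions}}{4\ \text{GHz} \times 1.2\ \text{IPC}} \approx 4.38\ \text{s/month} \\
                                                                                      E &\approx 4.38\ \text{s} \times 850\ \text{W} \approx 1.03\ \text{Wh/month} \\
                                                                                      \text{cost} &\approx 1.03\ \text{Wh} \times 13.24\ \text{ct/kWh} \approx 0.013\ \text{ct/month} \\
                                                                                      \text{savings} &\approx 4.8 \times 10^{7} \times 0.013\ \text{ct} \approx 6{,}240\ \text{€/month}
                                                                                      \end{aligned}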

                                                                                  High-quality software is not only pleasant for the user. It also saves the planet by wasting less energy and goes easy on your wallet.

                                                                                  So maybe

                                                                                  Programmers have always been fast and sloppy and bugs get ironed out over time. We don’t need more thorough programmers,

                                                                                  this should change. For the greater good of everyone

                                                                                  [0] https://stats.wikimedia.org/#/all-projects/reading/total-page-views/normal|table|2-year|~total|monthly
                                                                                  [1] https://www.zdnet.com/article/toolkit-calculate-datacenter-server-power-usage/
                                                                                  [2] https://www.statista.com/statistics/1046605/industry-electricity-prices-european-union-country/

                                                                                3. 9

                                                                                  Software could be so much better (and faster) if the market would value quality software higher than “more features”

                                                                                    The problem is there just aren’t enough people for that. That’s basically been the problem for the last 30+ years. It’s actually better than it used to be; there was a time not so long ago when everyone who could sum up numbers in Excel was a programmer and anyone who knew how to defrag their C:\ drive was a sysadmin.

                                                                                    Yesterday I wanted to generate a random string in JavaScript; I knew Math.random() isn’t truly random and wanted to know if there’s something better out there. The Stack Overflow question is dominated by Math.random() in more variations than you’d think possible (not all equally good, I might add). This makes sense because for a long time this was the only way to get any kind of randomness in client-side JS. It also mentions the newer window.crypto API in some answers, which is what I ended up using.
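
                                                                                    Roughly the shape of what I ended up with, as a sketch (the alphabet and length are arbitrary choices, and the modulo step slightly biases the distribution unless the alphabet length divides 256; fine for non-secret IDs):

                                                                                        const ALPHABET =
                                                                                          "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

                                                                                        function randomString(length: number): string {
                                                                                          const bytes = new Uint8Array(length);
                                                                                          crypto.getRandomValues(bytes);  // CSPRNG, unlike Math.random()
                                                                                          return Array.from(bytes, (b) => ALPHABET[b % ALPHABET.length]).join("");
                                                                                        }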

                                                                                    I can make that judgment call, but I’m not an ML algorithm. And while on Stack Overflow I can add context, caveats, involved trade-offs, offer different solutions, etc., with an “autocomplete code snippet” that’s a lot more limited. And a novice or less experienced programmer wouldn’t necessarily know a good snippet from a bad one: “it seems to work”, and without the context a Stack Overflow answer has, you just don’t know. Stack Overflow (and related sites) are more than just “gimme teh codez”; they’re also teaching moments.

                                                                                    Ideally, there would be some senior programmer to correct them. In reality, due to the limited number of people, this often doesn’t happen.

                                                                                    We’ll have to wait and see how well it turns out in practice, but I’m worried about an even greater proliferation of programmers who can’t really program but instead just manage to cobble something together by trial-and-error. Guess we’ll have to suffer through even more ridiculous interviews to separate the wheat from the chaff in the future…

                                                                                  1. 2

                                                                                    We’ll have to wait and see how well it turns out in practice, but I’m worried for an even greater proliferation of programmers who can’t really program

                                                                                    I don’t see this as a problem. More mediocre programmers available doesn’t lower the bar for places that need skilled programmers. Lobste.rs commenters often talk of the death of the open web for example. If this makes programming more accessible, isn’t that better for the open web?

                                                                                  2. 6

                                                                                    We don’t need faster programmers. We need more thorough programmers.

                                                                                        Maybe we need more than programmers, and should aim to deserve the title of software engineers. Writing code should be the equivalent of nailing wood: whether you use a hammer or an AI-assisted nail gun doesn’t matter much if you are building a structure that can’t hold the weight it is designed for, or can’t deal with a single plank breaking or rotting.

                                                                                    1. 6

                                                                                      We don’t need faster programmers. We need more thorough programmers.

                                                                                      Not for everything, but given we spend so much time debugging and fixing things, thoroughness is usually faster.

                                                                                      1. 6

                                                                                        Slow is smooth and smooth is fast.

                                                                                    1. 17

                                                                                      I think this mini-renaissance of Ada awareness is in part due to the “mass discovery” of safe systems programming led by Rust.

                                                                                      I like everything I’ve read about Ada, and I think SPARK is its killer feature. That being said, in the realm of “safe systems programming” I don’t think Ada’s going to be able to withstand the onslaught of Rust – Rust has the modern tooling, the ecosystem, and all the momentum.

                                                                                      (This pains me to say because I’m not the biggest Rust fan and I really like what I’ve seen of Ada, but if I’m going to learn one strict language, it seems like the best one career-wise would be Rust. I’d love to be proven wrong, though; the world needs to ring with the keyboards of a thousand languages.)

                                                                                      1. 11

                                                                                            Ada’s been flying under the radar doing its own thing until AdaCore started open-sourcing things and pushing awareness a few years ago. GNAT is actually a mature ecosystem (gpr, gnat studio, gnatdoc, gnattest, gnatpretty, etc.) with libraries that go back decades; it’s just not well known. The issue is that there had been no unifying system for distributing libraries before, which hampered things, but Alire is fixing that now and streamlining working in the language as well.

                                                                                        Rust is doing well for good reason. It has an easy way to start, build, test and share things, which is exactly why Alire is based on the work that community (and the npm community) has done. The big philosophical difference I’ve found between writing Rust and Ada is that Rust focuses on traits/types, whereas Ada focuses on functions since it relies predominantly on function overloading and separation into packages (modules), though you can use ML-style signatures. It’s interesting because in Ada, in general (except for tasks/protected objects), types aren’t namespaces for functions, whereas Rust relies on this with traits. In this way, Ada feels more to me like a functional language like Haskell and Rust feels more like an OOP language like Java. I’ve always found this bizarre because it should be the opposite.

                                                                                            Ada’s big advantage is how its general conceptual models are much more similar to C or C++ (think type-safe C with function overriding, with classes if you want them), but it builds in a much stronger type system and concurrency types as well. Built-in pre/postconditions, invariants, and type range checks also go a long way towards correctness, even if you don’t want to fully commit to formal verification with SPARK. Note also that SPARK isn’t an all-or-nothing thing with Ada code; you can verify individual components.

                                                                                        1. 7

                                                                                              While I’m generally a huge user of Rust and a proponent of it for many CPU-bound workloads, Rust really doesn’t have much in the way of safety features except when you compare it strictly to C and C++. Rust has a lot of antipatterns that the ecosystem has broadly adopted around error handling and error-prone techniques in general, and the overall complexity of the language prevents a lot of subtle issues from being cheap to find in code review. Building databases in Rust is not all that different from building them in C++, except that the search scope is a bit narrower when I discover memory corruption bugs during development.

                                                                                              Ada (and much more so, SPARK) are completely different tools for a completely different class of problems. The proof functionality of why3 that you gain access to in SPARK has no usable analogue in Rust. The ability to turn off broad swaths of language features is one of the most powerful tools for reducing the cost of building confidence in implementations for safety-critical domains. There are so many Rust features I personally ban while working on correctness-critical projects (async, most usage of Drop, many logical patterns that have nothing to do with memory safety, many patterns for working with files that must be excruciatingly modeled and tested to gain confidence, etc.), but with Ada it’s so much easier to just flip off a bunch of problematic features using profiles.

                                                                                              Ada SPARK faces zero competition from Rust for real safety-critical work. It is simply so much cheaper to build confidence in implementations than it is with Rust. I love Rust for building things that run on commodity servers, but if I were building avionics etc… I think it would be a poor business choice TBH.

                                                                                          1. 3

                                                                                            The ability to turn off broad swaths of language features is one of the most powerful tools for reducing the cost to build confidence in implementations for safety critical domains

                                                                                                This is something I never thought about, but yeah, Ada is the only language I’ve ever heard of that lets you globally disable features.

                                                                                            1. 2

                                                                                              antipatterns that the ecosystem has broadly adopted around error handling and error-prone techniques […]

                                                                                              What kind of antipatterns are you thinking of? A lot of people appreciate the Result type and the use of values instead of hidden control flow for error handling.

                                                                                              1. 3

                                                                                                Try backed by Into is far, far worse than Java’s checked exceptions in terms of modeling error handling responsibilities. I’ve written about its risks here. Async increases bugs significantly while also increasing compile times and degrading performance to unacceptable degrees that make me basically never want to rely on broad swaths of essentially untested networking libraries in the ecosystem. It makes all of the metrics that matter to me worse. There are a lot of things that I don’t have the energy to write about anymore, I just use my own stuff that I trust and increasingly avoid relying on anything other than the compiler and subsets of the standard library.

                                                                                                1. 1

                                                                                                    I think your concerns are valid, but it’s basically like saying “don’t write int foo() throws Exception” in Java, which would be the equivalent.

                                                                                                  There is a strong tendency in the Rust community to throw all of our errors into a single global error enum.

                                                                                                    citation needed there, I guess? Each library does tend to have its own error type, which means you’re going to have to compose it somewhere, but it’s not like everyone returns Box<dyn Error>, which would be the single global error type (although that’s what anyhow is for applications). The right balance should be that functions return a very precise error type, but that there are Into implementations into a global foo::Error (for library foo) so that if you don’t care about which errors foo operations raise, you can abstract that. If you do care then don’t upcast.

                                                                                                  I think async is overused, but claiming it degrades performance is again [citation needed]?

                                                                                              2. 1

                                                                                                antipatterns that the ecosystem has broadly adopted around error handling and error-prone techniques in general

                                                                                                I’m interested to hear some of these problematic antipatterns around error handling and error-prone techniques in general?

                                                                                              3. 3

                                                                                                You’re right about the tooling, ecosystem and momentum, but Rust is a very long way from provability. I wonder if we’ll ever see something SPARK-like but for a subset of Rust.

                                                                                                1. 4

                                                                                                  The question is how much provability matters. My guess is not a whole lot in most cases.

                                                                                                  I feel like Rust’s momentum is largely a combination of, “C and C++ have too many footguns,” “Mutable shared state is hard to get right in a highly concurrent program,” and, “Everyone is talking about this thing, may as well use it for myself.”

                                                                                                  1. 5

                                                                                                    Provability is starting to matter more and more. There are very big companies getting involved in that space - Nvidia has started using SPARK, Microsoft has been funding work on things like Dafny and F* for a long time, Amazon uses TLA+… Provability is just getting started :).

                                                                                                    1. 3

                                                                                                      I think provability definitely has some applications, largely in the space that Ada is used for currently – safety-critical systems mostly. I want to be able to prove that the software in between my car’s brake pedal and my actual brakes is correct, for example.

                                                                                                      1. 18

                                                                                                        I want to be able to prove that the software in between my car’s brake pedal and my actual brakes is correct, for example.

                                                                                                        I think we’ve shown pretty conclusively you can’t solve the car halting problem.

                                                                                                        1. 17

                                                                                                          I want to be able to prove that the software in between my car’s brake pedal and my actual brakes is correct, for example.

                                                                                                          If you’re going to use 11,253 read/write global variables (yes, eleven thousand, not a typo) in your car software with very limited and broken failsafes against bit flips and other random faults while intentionally ignoring OS errors, then you’re going to run into problems no matter which tool you use.

                                                                                                          Just running codesonar on the Toyota software resulted in:

                                                                                                          • 2272 - global variable declared with different types
                                                                                                          • 333 - cast alters value
                                                                                                          • 99 - condition contains side-effect
                                                                                                          • 64 - multiple declaration of a global
                                                                                                          • 22 - uninitialized variable

                                                                                                          The throttle function was 1300 lines of code. With no clear tests.

                                                                                                          These issues are only partly technical problems IMO; something like Ada can help with those, but the far bigger problem isn’t technical. People choose to write software like this. Toyota chose not to take the reports seriously. They chose to lie about all sorts of things. They chose not to fix any of this. The US gov’t chose not to have any safety standards at all. Regulators chose not to take action until there really wasn’t any other option.

                                                                                                          And then it took a four-year investigation and an entire team of experts to independently verify the code before action finally got taken. You didn’t need any investigation for any of this. You just needed a single software developer with a grain of sense and responsibility to look at this for a few days to come to the conclusion that it was a horribly unacceptable mess.

                                                                                                          (And yes, you do need a proper investigation for legal accountability and burden of proof, and more in-depth analysis of what exactly needs fixing, but the problems were blatantly obvious.)

                                                                                                          I’d wager all of this would have failed with Ada as well. There weren’t even any specifications for large parts of the code (and other specifications didn’t match the code), there wasn’t even a bug tracker. If you’re going to cowboy code this kind of software then you’re asking for trouble. Do you think that Ada, TLA+, or SPARK would have fared better in this kind of culture? At the end of the day a tool is only as good as your willingness to use it correctly. Software people tend to think in terms of software to solve these problems. But if you look a bit beyond all the software bugs and WTFs, the core problem wasn’t really with the language they used as such. They knew it was faulty already.

                                                                                                          Does provability matter? I guess? But only if you’re willing to invest in it, and it seems abundantly clear that willingness wasn’t there; they willingly wrote bad software. It’s like trying to solve the VW emission scandal by using a safer programming language or TLA+. That’s not going to stop people from writing software to cheat the regulatory tests.

                                                                                                          It’s kind of amazing how few faults there were, because in spite of the many problems it did work, mostly, and with some extra care it probably would have always worked. More guarantees are always good, but to me this seems to indicate that maybe provability isn’t necessarily that critical (although it can probably replace parts of the current verification tools/framework, but I’m not so sure if it has a lot of additional value beyond economics).

                                                                                                          Summary of the reports: one, two.

                                                                                                          1. 1

                                                                                                            Does provability matter? I guess?

                                                                                                            Great, I’m glad we agree.

                                                                                                            More seriously: sure, culture is a problem; I certainly wasn’t trying to discount it.

                                                                                                            But this is an AND situation: to have proved-correct software you need a culture that values it AND the technology to support it. Writing off provability as a technology because the culture isn’t there yet is nonsensical. It actively undermines trying to get to the desirable state of proved-correct safety-critical software.

                                                                                                            I appreciate you wanted to have a rant about software culture, and believe me, I love to get angry about it too. You’re just aiming at something other than what I said.

                                                                                                            1. 1

                                                                                                              I appreciate you wanted to have a rant about software culture, and believe me, I love to get angry about it too. You’re just aiming at something other than what I said.

                                                                                                              It wasn’t intended as a rant, it was to demonstrate that at Toyota things were really bad and that this is why it failed at Toyota specifically. This wasn’t about “software culture”, it was about Toyota.

                                                                                                              Writing off provability as a technology because the culture isn’t there yet is nonsensical. It actively undermines trying to get to the desirable state of proved-correct safety-critical software.

                                                                                                              Of course you need to consider the actual reasons it failed before considering what the solution might be. Overall, the existing systems and procedures work pretty well, when followed. When is the last time your accelerator got stuck due to a software bug? It’s when you choose not to follow them that things go wrong. If Toyota had just followed their own procedures, we wouldn’t be talking about this now.

                                                                                                              Great, I’m glad we agree.

                                                                                                              I don’t think we agree at all, as I don’t think that provability will increase the average quality of this kind of software by all that much, if at all. The existing track record for this kind of software is already fairly good and when problems do arise it’s rarely because of a failure in the existing procedures as such, it’s because people decided to not follow them. Replacing the procedures with something else isn’t going to help much with that.

                                                                                                              It may replace existing test procedures and the like with something more efficient and less time-consuming, but that’s not quite the same.

                                                                                                              And the Toyota problem wasn’t even necessarily something that would have been caught, as it was essentially a hardware problem inadequately handled by software. You can’t prove the correctness of something you never implemented.

                                                                                                              1. 1

                                                                                                                Proving software correct is categorically better verification than testing it with examples. If we can achieve provably-correct software, we should not throw that away just because people are not even verifying with example testing.

                                                                                                                Programming languages that are provable increase the ceiling of potential verification processes. Whether a particular company or industry chooses to meet that ceiling is an entirely different discussion.

                                                                                                    2. 3

                                                                                                      Have you run across any good SPARK resources? I’m interested in learning not only about how to use it, but also the theory behind it and how it’s implemented.

                                                                                                      1. 4

                                                                                                        https://learn.adacore.com/ should be a good starting point to learn SPARK. If you’re interested in how it’s implemented, reading https://www.adacore.com/papers might be a good way to learn.

                                                                                                    1. 26

                                                                                                      These are all valid criticisms of certain patterns in software engineering, but I wouldn’t really say they’re about OOP.

                                                                                                      This paper goes into some of the distinctions of OOP and ADTs, but the summary is basically this:

                                                                                                      • ADTs allow complex functions that operate on many data abstractions – so the Player.hits(Monster) example might be rewritten in ADT-style as hit(Player, Monster[, Weapon]).
                                                                                                      • Objects, on the other hand, allow interface-based polymorphism – so you might have some kind of interface Character { position: Coordinates, hp: int, name: String }, which Player and Monster both implement (both styles are sketched below).
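
                                                                                                      A minimal TypeScript sketch of the contrast (all names are illustrative, following the paper’s example rather than any particular codebase):

                                                                                                          // ADT style: plain data, plus free functions that can see
                                                                                                          // several data abstractions at once.
                                                                                                          type Player = { name: string; hp: number; position: [number, number] };
                                                                                                          type Monster = { name: string; hp: number; position: [number, number] };

                                                                                                          function hit(player: Player, monster: Monster, damage: number): Monster {
                                                                                                            return { ...monster, hp: monster.hp - damage };
                                                                                                          }

                                                                                                          // Object style: interface-based polymorphism. Callers dispatch
                                                                                                          // through the interface without knowing the concrete type; both
                                                                                                          // Player and Monster satisfy it structurally.
                                                                                                          interface Character {
                                                                                                            position: [number, number];
                                                                                                            hp: number;
                                                                                                            name: string;
                                                                                                          }

                                                                                                          function describe(c: Character): string {
                                                                                                            return `${c.name} (${c.hp} hp)`;
                                                                                                          }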

                                                                                                      Now, interface-based polymorphism is an interesting thing to think about and criticise in its own right. It requires some kind of dynamic dispatch (or monomorphization), and hinders optimization across interface boundaries. But the critique of OOP presented in the OP is nothing to do with interfaces or polymorphism.

                                                                                                      The author just dislikes using classes to hold data, but a class that doesn’t implement an interface is basically the same as an ADT. And yet one of the first recommendations in the article is to design your data structures well up-front!

                                                                                                      1. 15

                                                                                                        The main problem I have with these “X is dead” type articles is that they are almost always straw-man arguments set up to prove a point. The other issue I have is that the definition or interpretation of OOP is so varied that I don’t think you can in good faith just say OOP as a whole is bad and be at all clear to the reader. As an industry, I actually think we need to get past these self-constructed camps of OOP vs. functional, because to me they are disingenuous and the truth, as it always does, lies in the middle.

                                                                                                        Personally, coming mainly from a Ruby/Rails environment, I use ActiveRecord classes almost exclusively to encapsulate data and abstract away the interaction with the database, and then move logic into a place where it really only cares about data in and data out. Is that OOP or functional? I would argue a combination of both, and I think the power lies in the middle, not in one versus the other as most articles stipulate. But a middle-ground approach doesn’t get the clicks, I guess, so here we are.

                                                                                                        1. 4

                                                                                                          the definition or interpretation of OOP is so varied that I don’t think you can in good faith just say OOP as a whole is bad and be at all clear to the reader

                                                                                                          Wholly agreed.

                                                                                                          The main problem I have with these “X is dead” type article is they are almost always straw man arguments setup in a way to prove a point.

                                                                                                          For a term that evokes such strong emotions, it really is poorly defined (as you observed). Are these straw man arguments, or is the author responding to a set of pro-OOP arguments which don’t represent the pro-OOP arguments with which you’re familiar?

                                                                                                          Just like these criticisms of OOP feel like straw men to you, I imagine all of the “but that’s not real OOP!” responses that follow any criticism of OOP must feel a lot like disingenuous No-True-Scotsman arguments to critics of OOP.

                                                                                                          Personally, I’m a critic, and the only way I know how to navigate the “not true OOP” dodges is to ask what features distinguish OOP from other paradigms in the opinion of the OOP proponent, and then debate whether that feature really is unique to OOP or pervasive in other paradigms as well. Once in a while a feature will actually pass through that filter such that we can debate its merits (e.g., inheritance).

                                                                                                          1. 4

                                                                                                            I imagine all of the “but that’s not real OOP!” responses that follow any criticism of OOP must feel a lot like disingenuous No-True-Scotsman arguments to critics of OOP.

                                                                                                            One thing I have observed about OOP is how protean it is: whenever there’s a good idea around, it absorbs it and then pretends it was an inherent part of it all along. Then it deflects criticism by crying “strawman”, or, if we point out the shapes and animals that are taught for real in school, by claiming that “proper” OOP is hard, while providing little to no help in how to design an actual program.

                                                                                                            Here’s what I think: in its current form, OOP won’t last, same as previous forms of OOP didn’t last. Just don’t be surprised if whatever follows ends up being called “OOP” as well.

                                                                                                        2. 8

                                                                                                          The model presented for monsters and players can itself be considered an OO design that misses the overarching problem in such domains. Here’s a well-reasoned, in-depth article on why it is folly; part five has the riveting conclusion.

                                                                                                          Of course, your point isn’t about OOP-based RPGs, but how the article fails to critique OOP.

                                                                                                          After Alan Kay coined OOP, he realized, in retrospect, that the term would have been better as message-oriented programming. Too many people fixate on objects, rather than the messages passed betwixt. Recall that the inspiration for OOP was based upon how messages pass between biological cells. Put another way, when you move your finger: messages from the brain pass to the motor neurons, neurons release a chemical (a type of message), muscles receive those chemical impulses, then muscle fibers react, and so forth. At no point does any information about the brain’s state leak into other systems; your fingers know nothing about your brain, although they can pass messages back (e.g., pain signals).

                                                                                                          (This is the main reason why get and set accessors are often frowned upon: they break encapsulation, they break modularity, they leak data between components.)

                                                                                                          Many critique OOP, but few seem to study its origins and how—through nature-inspired modularity—it allowed systems to increase in complexity by an order of magnitude over its procedural programming predecessor. There are so many critiques of OOP that don’t pick apart actual message-oriented code that beats at the heart of OOP’s origins.

                                                                                                          1. 1

                                                                                                            Many critique OOP, but few seem to study its origins and how—through nature-inspired modularity—it allowed systems to increase in complexity by an order of magnitude over its procedural programming predecessor.

                                                                                                            Of note, modularity requires neither objects nor message passing!

                                                                                                            For example, the Modula programming language was procedural. Modula came out around the same time as Smalltalk, and introduced the concept of first-class modules (with the data hiding feature that Smalltalk objects had, except at the module level instead of the object level) that practically every modern programming language has adopted today - including both OO and non-OO languages.

                                                                                                          2. 5

                                                                                                          I have to say, after reading the first few paragraphs, I skipped to ‘What to do Instead’. I am aware of many limitations of OOP and have no issue with the idea of learning something new, so: hit me with it. Then the article is like “hmm, well, datastores are nice. The end.”

                                                                                                            The irony is that I feel like I learned more from your comment than from the whole article so thanks for that. While reading the Player.hits(Monster) example I was hoping for the same example reformulated in a non-OOP way. No luck.

                                                                                                            If anyone has actual suggestions for how I could move away from OOP in a practical and achievable way within the areas of software I am active in (game prototypes, e.g. Godot or Unity, Windows desktop applications to pay the bills), I am certainly listening.

                                                                                                            1. 2

                                                                                                              If you haven’t already, I highly recommend watching Mike Acton’s 2014 talk on Data Oriented Design: https://youtu.be/rX0ItVEVjHc

                                                                                                              Rather than focusing on debunking OOP, it focuses on developing the ideal model for software development from first principles.

                                                                                                              1. 1

                                                                                                                Glad I was helpful! I’d really recommend reading the article I linked and summarised – it took me a few goes to get through it (and I had to skip a few sections), but it changed my thinking a lot.

                                                                                                              2. 3

                                                                                                                [interface-based polymorphism] requires some kind of dynamic dispatch (or monomorphization), and hinders optimization across interface boundaries

You needed to do dispatch anyway, though; if you wanted to treat players and monsters homogeneously in some context and then discriminate, then you need to branch on the discriminant.

                                                                                                                Objects, on the other hand, allow interface-based polymorphism – so you might have some kind of interface […] which Player and Monster both implement

                                                                                                                Typeclasses are haskell’s answer to this; notably, while they do enable interface-based polymorphism, they do not natively admit inheritance or other (arguably—I will not touch these aspects of the present discussion) malaise aspects of OOP.
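For concreteness, here is a minimal sketch of that typeclass approach (Damageable, Player, and Monster are hypothetical names, not taken from the article):

    -- A typeclass plays the role of the interface:
    class Damageable a where
      takeDamage :: Int -> a -> a

    data Player  = Player  { playerHp  :: Int }
    data Monster = Monster { monsterHp :: Int }

    instance Damageable Player where
      takeDamage n p = p { playerHp = playerHp p - n }

    instance Damageable Monster where
      takeDamage n m = m { monsterHp = monsterHp m - n }

    -- Polymorphic over anything Damageable, with no inheritance anywhere:
    hit :: Damageable a => a -> a
    hit = takeDamage 10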

                                                                                                                1. 1

You needed to do dispatch anyway, though; if you wanted to treat players and monsters homogeneously in some context and then discriminate, then you need to branch on the discriminant.

                                                                                                                  Yes, this is a good point. So it’s not like you’re saving any performance by doing the dispatch in ADT handling code rather than in a method polymorphism kind of way. I guess that still leaves the stylistic argument against polymorphism though.
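For illustration, here is roughly what that explicit dispatch looks like with a plain sum type (the Entity type is made up for this sketch):

    -- The constructor tag is the discriminant; the case is the dispatch.
    data Entity = PlayerE Int | MonsterE Int   -- payloads are hit points

    damage :: Int -> Entity -> Entity
    damage n e = case e of
      PlayerE  hp -> PlayerE  (hp - n)
      MonsterE hp -> MonsterE (hp - n)

Whether the branch lives in a case expression or behind a method table, it still has to happen somewhere.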

                                                                                                                2. 2

                                                                                                                  Just to emphasize your point on Cook’s paper, here is a juicy bit from the paper.

                                                                                                                  Any time an object is passed as a value, or returned as a value, the object-oriented program is passing functions as values and returning functions as values. The fact that the functions are collected into records and called methods is irrelevant. As a result, the typical object-oriented program makes far more use of higher-order values than many functional programs.
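To make the paper’s point concrete, here is a minimal Haskell sketch (the Shape record is a hypothetical example, not from Cook’s paper) in which an “object” is literally a record containing a function, so passing one around is passing higher-order values:

    data Shape = Shape
      { area  :: Double
      , scale :: Double -> Shape  -- a "method": a function stored in the record
      }

    circle :: Double -> Shape
    circle r = Shape { area = pi * r * r, scale = \k -> circle (k * r) }

    -- area (scale (circle 1.0) 2.0) == pi * 4.0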

                                                                                                                  1. 2

                                                                                                                    Now, interface-based polymorphism is an interesting thing to think about and criticise in its own right. It requires some kind of dynamic dispatch (or monomorphization), and hinders optimization across interface boundaries.

After coming from java/python, where dynamic dispatch and methods essentially go hand in hand, I found that go’s approach, which clearly differentiates between regular methods and interface methods, really opened my eyes to the overuse of dynamic dispatch in designing OO APIs. Extreme late binding is super cool and all… but so is static analysis and jump to definition.

                                                                                                                  1. 1

                                                                                                                    “Total type” might be the phrase to use, like “total functions”.

                                                                                                                    1. 1

Hmmm. I like the idea of using an existing function description, but “total” doesn’t seem quite right. A total function is one that returns a valid output for every input, but it could map any number of input values onto a single output value or vice versa. A “one-to-one” function, though, maps each input value to a distinct output value. So perhaps these “tight” types could be described as “one-to-one”?
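For a quick Haskell illustration of the distinction (toy functions, nothing more): both of the following are total, but only the second is one-to-one:

    isEven :: Int -> Bool       -- total, but many-to-one: 2 and 4 both map to True
    isEven n = n `mod` 2 == 0

    double :: Int -> Int        -- total and one-to-one: distinct inputs
    double n = 2 * n            -- always give distinct outputs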

                                                                                                                    1. 3

                                                                                                                      Honest question: Why is curl so large? Is it mostly dealing with TLS? A lot of configurability? Obscure intricacies of HTTP? All of the above?

                                                                                                                      1. 3

                                                                                                                        All of the above?

                                                                                                                        Plus, it’s not just HTTP: cURL can send and read email, speak SMB, FTP, and much more.

                                                                                                                        Any one of those protocols is full of intricate pain points worked out over years - decades, even. Curl goes after them all.

                                                                                                                        Whether you think that’s awesome or terrible is for you to decide. :-)

                                                                                                                        1. 2

                                                                                                                          As @owen points out, libcurl does a huge amount more than fetch files over HTTP. If that’s all you need, libfetch is far smaller and has had a lot more security review (which is possible, because it’s a lot smaller). If you’re expecting to deal with URLs from untrusted sources, libfetch is far more likely to fail closed (i.e. just reject a URL and not know how to deal with it) whereas libcurl is more likely to either work or fail open (have a security vulnerability).

                                                                                                                        1. 3

                                                                                                                          Ah, I’ve been wondering what people mean when they say that Clojure has a bad license. Thanks for this.

On a separate note, I don’t see the benefit of using the MIT license over an even more permissive license, like the Unlicense. I would love to hear some arguments for using the former, as I personally am quite unsure of how to license my own projects (they’ve either not been licensed at all, or use the Unlicense).

                                                                                                                          1. 6

I think Google and other big corps don’t allow contributing to or using unlicensed projects because public domain is not legally well defined in some states (read: lawyer pedantry), which to me seems like a positive thing :^)

Personally I go with the Unlicense for one-off things and projects I don’t really want/need to maintain. I pick MIT or ISC (a variant of MIT popular in the OCaml ecosystem) if I’m making a library or something I expect people to actually use, because of the legal murkiness of the Unlicense. And if I were writing something like the code to a game or some other end-user application, I’d probably use the GPLv3; for example, if it was a mobile app, to discourage people from just repackaging it, adding trackers or ads, and dumping it on the play store.

                                                                                                                            1. 4

                                                                                                                              Yes! “Copyleft is more appropriate to end-user apps” is my philosophy as well. Though actually I end up using the Unlicense for basically all the things.

                                                                                                                              legal murkiness of the Unlicense

                                                                                                                              Isn’t that kinda just FUD? The text seems good to me, but IANAL of course.

                                                                                                                              1. 2

                                                                                                                                Isn’t that kinda just FUD?

                                                                                                                                Reading the other comments seems like it is, I guess I was just misinformed. I still prefer MIT because, as others have said, it’s more well known.

                                                                                                                              2. 2

                                                                                                                                This is somewhat off-topic, but I never thought the ISC license was really popular in the OCaml ecosystem. For a crude estimate:

                                                                                                                                $ cd ~/.opam/repo/default/
                                                                                                                                $ grep -r 'license: "ISC"' . | wc -l
                                                                                                                                1928
                                                                                                                                $ grep -r 'license: "MIT"' . | wc -l
                                                                                                                                4483
                                                                                                                                
                                                                                                                                
                                                                                                                                1. 2

                                                                                                                                  I think it’s more popular than in most other ecosystems at least.

                                                                                                                                  1. 3

                                                                                                                                    Might be. It would be interesting to get some stats about language/package ecosystem and license popularity.

                                                                                                                                    1. 2

                                                                                                                                      Here it is for Void Linux packages; not the biggest repo but what I happen to have on my system:

                                                                                                                                      $ rg -I '^license' srcpkgs |
                                                                                                                                        sed 's/license="//; s/"$//; s/-or-later$//; s/-only$//' |
                                                                                                                                        sort | uniq -c | sort -rn
                                                                                                                                         1604 GPL-2.0
                                                                                                                                         1320 MIT
                                                                                                                                          959 GPL-3.0
                                                                                                                                          521 LGPL-2.1
                                                                                                                                          454 BSD-3-Clause
                                                                                                                                          392 Artistic-1.0-Perl, GPL-1.0
                                                                                                                                          357 Apache-2.0
                                                                                                                                          222 BSD-2-Clause
                                                                                                                                          150 GPL-2
                                                                                                                                          133 ISC
                                                                                                                                          114 LGPL-3.0
                                                                                                                                          104 Public Domain
                                                                                                                                           83 LGPL-2.0
                                                                                                                                           83 GPL-2.0-or-later, LGPL-2.1
                                                                                                                                           63 GPL-3
                                                                                                                                           50 MPL-2.0
                                                                                                                                           47 OFL-1.1
                                                                                                                                           41 AGPL-3.0
                                                                                                                                           36 Zlib
                                                                                                                                           31 BSD
                                                                                                                                           26 GPL-2.0-or-later, LGPL-2.0
                                                                                                                                           23 Unlicense
                                                                                                                                           21 Artistic, GPL-1
                                                                                                                                           20 Apache-2.0, MIT
                                                                                                                                           19 ZPL-2.1
                                                                                                                                           19 BSL-1.0
                                                                                                                                      [...]
                                                                                                                                      

                                                                                                                                      It groups the GPL “only” and “-or-later” in the same group, but doesn’t deal with multi-license projects. It’s just a quick one-liner for a rough indication.

                                                                                                                                2. 1

This sounds like a nice scheme for choosing a license. Thanks for your explanation of the reasoning behind each choice.

                                                                                                                                3. 4

                                                                                                                                  On a separate note, I don’t see the benefit of using the MIT license over an even more permissive license, like the Unlicense

                                                                                                                                  It’s impossible to answer the question without context. No license is intrinsically better or worse than another without specifying what you want to achieve from a license. With no license, you prevent anyone from doing anything, so any license is a vector away from this point, defining a set of things that people can do with your code. For example:

                                                                                                                                  • Do you want to allow everyone to modify and redistribute your code? If not, then you don’t want a F/OSS license.
• Do you want to allow people to modify and redistribute your code without giving their downstream[1] the code and rights to do the same? If not, you want a copyleft license of some kind.
                                                                                                                                  • Do you want to allow people to modify and redistribute your code linked to any other open source code? If so, then you want either a permissive license or a copyleft license with specific exemptions (making something that is both copyleft and compatible with both GPLv2 and Apache 2 is non-trivial, for example).
                                                                                                                                  • Do you want people who are not lawyers to be able to understand what they can do with your code, when composed with whatever variations on copyright law apply in their particular jurisdiction? Then you want a well-established license such as BSD/MIT, one of the Creative Commons family, Apache, or GPL, for which there are a lot of human-readable explanations.
                                                                                                                                  • Do you want to be able to take contributions from other folks and still use the code in other projects under any license? If so, then you want a permissive license or to require copyright assignment.
                                                                                                                                  • Do you intend to sue people for violating your license? If not, then you probably won’t gain anything from a license with terms that are difficult to comply with because unscrupulous people can happily violate them, unless you assign copyright to the FSF or a similar entity[2].
                                                                                                                                  • Do you want to allow people to pretend that they wrote your code? If so, then you want to avoid licenses with an attribution requirement and go for something like Unlicense.

                                                                                                                                  From this list, there are two obvious differences between the MIT license and Unlicense: MIT is well-established and everyone knows what it means, so there’s no confusion about what it means and what a court will decide it means, and it requires attribution and so I can’t take an MIT-licensed file, put it in my program / library and pretend that I wrote it. Whether these are advantages depends on what you want to allow or disallow with your license.

                                                                                                                                  [1] There’s a common misconception that the GPL and similar licenses require people to give back. They don’t, they require people to give forwards, which amounts to the same thing for widely-distributed things where it’s easy to get a copy but is not so helpful to the original project if it’s being embedded in in-house projects.

                                                                                                                                  [2] Even then, YMMV. The FSF refused to pursue companies that were violating the LGPL for GNUstep. Being associated with the FSF was a serious net loss for the project overall.

                                                                                                                                  1. 2

I should have clarified that I don’t care about attribution. Thank you for the informative and well structured overview.

                                                                                                                                    1. 3

                                                                                                                                      Looking at the text of Unlicense, it also does not contain the limitations of liability or warranty. That’s probably not a problem - when the BSD / MIT licenses were written there was a lot of concern about implied warranty and fitness for purpose, but I think generally that’s assumed to be fine for things that are given away for free.

You might want to rethink the attribution bit, though. When you’re looking for a job, it can be really useful to have your name associated with something that a prospective employer is able to look at. It is highly unlikely that anyone will choose to avoid a program or library because it has an attribution clause in the license, so the cost to you of requiring attribution is negligible whereas the benefits can be substantial.

                                                                                                                                      If you’re looking for people to contribute to your projects, that can have an impact as well.

                                                                                                                                      1. 3

                                                                                                                                        I don’t care about attribution mainly for philosophical reasons. I dislike copyright as a concept and want my software to be just that, software. People should be able to use it without attributing the stuff to me or anyone else.

                                                                                                                                        1. 2

                                                                                                                                          Attribution is more closely related to moral rights than IP rights, though modern copyright has subsumed both. The right of a creator to be associated with their work predates copyright law in Europe. Of course, that’s not universal: in China for a long time it was considered rude to claim authorship and so you got a lot of works attributed to other people.

                                                                                                                                          1. 2

                                                                                                                                            Right, I don’t want to claim authorship of much of the stuff I create. I simply want to have it be a benefit to the people who use it. I don’t have a moral issue with not crediting myself, so I won’t.

                                                                                                                                      2. 2

                                                                                                                                        Perhaps you would like the ZLib license, then? Unlike MIT, it does not require including the copyright and license text in binary distributions.

                                                                                                                                      3. 2

                                                                                                                                        I’m no lawyer, but as I understand it, authorship is a “natural right” that cannot be disclaimed at least within U.S. law. It is separate from copyright. The Great Gatsby is in the public domain, but that doesn’t mean that I get to say that I wrote it. You probably can’t press charges against me for saying so as an individual, but plagiarism is a serious issue in many industries, and may have legal or economic consequences.

My point is that the Unlicense waives copyright, but someone claiming to have created the work themselves may still face consequences of a kind. Whether that is sufficient protection of your attribution is a matter of preference.

                                                                                                                                        1. 3

                                                                                                                                          My understanding is that it’s a lot more complex in the US. Authorship is under the heading of ‘moral rights’, but these are covered by state law and not federal. There are some weird things, such as only applying to statues in some states.

                                                                                                                                      4. 3

Not licensing it makes the product proprietary: even when the source is publicly visible, no one can use it without your permission. IANAL, but the Unlicense (just like CC0) isn’t really legally binding in some countries (you cannot make your work public domain without dying and waiting). So MIT is not that bad a choice, as the only difference is that you need to be mentioned by the authors of derivative works.

                                                                                                                                        1. 3

                                                                                                                                          0-BSD is more public domain-like as it has zero conditions. It’s what’s known as a “public-domain equivalent license”.

                                                                                                                                          https://en.wikipedia.org/wiki/Public-domain-equivalent_license

                                                                                                                                          1. 3

                                                                                                                                            The Unlicense is specifically designed to be “effectively public domain” in jurisdictions that don’t allow you to actually just put something in the public domain, by acting as a normal license without any requirements.

                                                                                                                                            That’s, like, the whole point of the Unlicense :) Otherwise it wouldn’t need to exist at all.

                                                                                                                                            1. 2

                                                                                                                                              I have heard that the Unlicense is still sometimes not valid in certain jurisdictions. 0-BSD is a decent alternative as it’s “public-domain equivalent”, i.e. it has no conditions.

                                                                                                                                            2. 2

                                                                                                                                              Right, I’ve heard there’s some legal issues with it before, thanks for reminding me.

                                                                                                                                              EDIT: Looks like there’s no public domain problems with the Unlicense after all, so I’m not worried about this.

                                                                                                                                              1. 1

                                                                                                                                                Looks like there’s no public domain problems with the Unlicense after all

                                                                                                                                                Where did you see this?

                                                                                                                                              2. 2

The whole point of the CC0 is to fully disclaim all claims and rights inherent to copyright to the fullest extent possible in jurisdictions where the concept of Public Domain does not exist or cannot be voluntarily applied. There’s very little reason to suspect that the CC0 is less legally enforceable than MIT.

                                                                                                                                                1. 3

                                                                                                                                                  CC0 seems fine but is somewhat complex. I prefer licenses that are very simple and easy to digest.

                                                                                                                                              3. 3

                                                                                                                                                I don’t see the benefit of using the MIT license over an even more permissive license, like the Unlicense

                                                                                                                                                Purely pragmatically, the MIT license is just better known. Other than that: the biggest difference is that the MIT requires attribution (“The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.”), and the Unlicense doesn’t.

As for the concerns over “public domain”, IMHO this is just lawyer pedantry. The text makes it plenty clear what the intent is, and I see no reason why it shouldn’t be upheld in court. The gist of the first paragraph is pretty much identical to MIT’s, except with the “the above copyright notice and this permission notice shall be included in all copies” requirement omitted. If it said only “this is public domain”, then sure, you could land in a conflict over what exactly “public domain” means, considering this isn’t a concept that exists everywhere. But that’s not the case with the Unlicense.

                                                                                                                                                1. 2

                                                                                                                                                  This is comforting to know. Thank you for the clarification!

                                                                                                                                                2. 1

                                                                                                                                                  I’ve released a lot of code under a dual MIT/Unlicense scheme. ripgrep is used by millions of people AFAIK (through VS Code) and is under this scheme. I have heard zero complaints. The purpose of such a thing is to make an ideological point with the Unlicense, while also providing the option to use something like the MIT which is a bit more of a known quantity. Prior to this, I was releasing code with just the Unlicense and I did receive complaints about it. IIRC, from big corps but also individuals from jurisdictions that don’t recognize public domain. It wasn’t so much that they definitively said they couldn’t use the Unlicense, but rather, that it was too murky.

                                                                                                                                                  IANAL although sometimes I play one on TV. In my view, the Unlicense is just fine and individuals and big corps who avoid it are likely doing it because of an overly conservative risk profile. Particularly with respect to big corps, it’s easy to see how the incentive structure would push them to do conservative things with respect to the law for something like software licenses.

While the dual licensing scheme seems to satisfy all parties from a usage perspective, I can indeed confirm that it prevents certain big corps from contributing changes back to my projects, because those changes need to be licensable under both the MIT and the Unlicense. To date, I do not know the specific reasons for this policy.

                                                                                                                                                  1. 2

                                                                                                                                                    I never really understood how dual-licensing works, can you explain a bit? Do users pick the license that they want or can they even cherry pick which clauses of each license they want to abide by?

                                                                                                                                                    1. 2

                                                                                                                                                      AIUI, you pick one of the licenses (or cascade the dual licensing scheme). That’s how my COPYING file is phrased anyway.

                                                                                                                                                      But when you contribute to a project under a dual licensing scheme, your changes have to be able to be licensed under both licenses. Otherwise, the dual license choice would no longer be valid.

                                                                                                                                                  2. 1

                                                                                                                                                    As I state in the article, I don’t think the EPL is a “bad license.” Clojure Core uses the EPL for very good reasons — it’s just that most of those reasons are unlikely to apply to Random Clojure Library X.

                                                                                                                                                    EDIT: I had replied regarding The Unlicense, but I see other folks have done a more thorough job below, so I’m removing that blurb. Thanks all.

                                                                                                                                                    1. 1

                                                                                                                                                      I should have expressed myself more clearly. I’ve heard people mention Clojure’s license as a downside to the language, and now that I’ve read your article I have an idea of what they’re talking about.

                                                                                                                                                    2. 0

                                                                                                                                                      Unlicense

                                                                                                                                                      I recommend against using this license, because making the ambiguous license name relevant anywhere makes everyone’s life harder. It makes it hard to distinguish between “CC0’d” and “in license purgatory”:

                                                                                                                                                      “What is this code’s license?”
“It’s Unlicensed.”
                                                                                                                                                      <Person assumes it’s not legally safe to use, because it’s unlicensed>

I wish that license would either get renamed or die.

                                                                                                                                                    1. 4

                                                                                                                                                      This is awesome. When combined with a formally verified RISC-V core, this could form the basis of a probably correct and secure computing platform.

                                                                                                                                                      1. 6

                                                                                                                                                        Did you mean provably correct? If something is provably correct it is probably correct, but still.

                                                                                                                                                        1. 10

                                                                                                                                                          Perhaps it’s a tongue-in-cheek way of reminding us all that the proof always has a boundary. Is the proof checker and logic sound? Does the theorem state what we think it does? Has the hardware synthesis toolkit been proven correct, or its output been checked? Does the proof model electromagnetic interference (think Rowhammer)? Does it model timing sidechannels (think Spectre/Meltdown)? Do you trust your chip foundry (supply chain attacks)?

                                                                                                                                                          Not trying to detract from the value of formal methods, but it’s important to keep the limitations in mind.

                                                                                                                                                          1. 2

                                                                                                                                                            Is the proof checker and logic sound?

                                                                                                                                                            I think there has been a single bug in Coq … ever.

                                                                                                                                                            Does the theorem state what we think it does?

                                                                                                                                                            ZCash and Monero have had unlimited inflation bugs but no one noticed because vanishingly few people are capable of finding those types of bugs. Most software doesn’t bother with even informal security models.

                                                                                                                                                            Has the hardware synthesis toolkit been proven correct, or its output been checked?

When seL4 came out, everyone was quick to complain about CPU and compiler bugs. They got around to verifying those layers as well. I’m sure they will get to this.

                                                                                                                                                            Does the proof model electromagnetic interference (think Rowhammer)?

                                                                                                                                                            ECC + improvements to memory manufacturing has got that covered.

                                                                                                                                                            Does it model timing sidechannels (think Spectre/Meltdown)?

                                                                                                                                                            YUP! Gernot had been complaining to Intel/AMD/ARM that this could happen before Spectre was discovered, outlined how to fix it shortly thereafter, and led the effort to enhance the RISC-V spec to prevent similar issues.

                                                                                                                                                            Do you trust your chip foundry (supply chain attacks)?

                                                                                                                                                            Are you up against a nation state attacker who is willing to blow millions of dollars to compromise your system?

                                                                                                                                                            Not trying to detract from the value of formal methods, but it’s important to keep the limitations in mind.

                                                                                                                                                            🙄 This is the most common (and naive) critique of formal methods work. The industry has a defect rate of 1-10 bugs per 1,000 lines of code. Stating that “Nothing is a panacea!!!” just detracts from the radical benefits of formal verification. It’s like commenting on a Rust article that we have had memory safe languages for decades.

                                                                                                                                                            1. 1

                                                                                                                                                              🙄 This is the most common (and naive) critique of formal methods work. The industry has a defect rate of 1-10 bugs per 1,000 lines of code. Stating that “Nothing is a panacea!!!” just detracts from the radical benefits of formal verification. It’s like commenting on a Rust article that we have had memory safe languages for decades.

                                                                                                                                                              I’m not saying it to discourage the usage of formal methods. I spend a lot of time in a proof assistant myself (mostly because I find it enjoyable…). I just want to set the right expectations - even when the software has been proven correct, things can go wrong, because there will always exist failure modes the theorem that has been proven doesn’t exclude.

                                                                                                                                                              I do agree that formal methods allow defect rates much, much lower than attainable otherwise.

                                                                                                                                                              1. 1

                                                                                                                                                                I just want to set the right expectations - even when the software has been proven correct, things can go wrong, because there will always exist failure modes the theorem that has been proven doesn’t exclude.

                                                                                                                                                                I just don’t think this type of commentary sets the right expectations. A common issue with security analysis (of which I myself am guilty) is constructing an esoteric threat model that invalidates the security provided. But infosec engineering is built on inflating the cost of an attack beyond the payoff.

The seL4 and RISC-V proofs ensure that the memory model is not violated. It is possible that there is a mistake in the proof somewhere, but it would take an enormously expensive research effort to find it. And even if one is ever found, the proof system would be patched to ensure that specific type of error could never occur again.

This gives us the foundation we need to make cracking commodity systems beyond the budget of ransomware gangs. I’m too lazy to find the reference right now, but analysis of critical Linux zero days showed that the vast majority of them would be downgraded to DoS attacks if they used an seL4 kernel with a microservices approach. In seL4, each process can have protections beyond what a conventional Linux machine can provide via virtual machines.

                                                                                                                                                                From there we can start to build high assurance microservices (filesystem, networking, etc) that are formally verified. Network switches and routers wouldn’t require security patches post deployment. And even for software that is too complex to verify, capabilities make it so that a zero day in an end-user application can’t be used to do anything other than manipulate the file that’s already open.

                                                                                                                                                                I do agree that formal methods allow defect rates much, much lower than attainable otherwise.

A zero-click Windows RCE zero day already costs ~$1 million. It’s possible to push that price to the moon, if we can convince people that the protections gained are significant.

                                                                                                                                                            2. 2

                                                                                                                                                              Does it model timing sidechannels (think Spectre/Meltdown)?

                                                                                                                                                              In this case, the article links this: https://ts.data61.csiro.au/publications/csiro_full_text/Wistoff_SGBH_21.pdf

                                                                                                                                                              Data61 is doing good work on that.

                                                                                                                                                              1. 2

                                                                                                                                                                I <3 this comment. It would be nice if more programmers acknowledged these boundaries. Cosmic rays, race conditions, backhoes tearing up fiber are all non-malicious IRL things that are difficult to prove your way out of too.

                                                                                                                                                                1. 2

                                                                                                                                                                  Race conditions can be handled just fine. See: concurrent separation logic.

                                                                                                                                                                  1. 2

“Just” is doing a lot of work there. It’s possible, but very hard!

                                                                                                                                                                    1. 2

                                                                                                                                                                      Well, but compared to cosmic rays and backhoes tearing up fiber, it’s something you can model fully…

                                                                                                                                                                  2. 2

You’re right that there are often concerns outside of proofs. I figured I’d chime in to let you know the three you mentioned have been addressed by proofs and/or better design.

                                                                                                                                                                    For cosmic rays, see Tandem NonStop or Rockwell-Collins AAMP7G CPU. They both use redundancy with voters to deal with stuff like that.

                                                                                                                                                                    For races, see Concurrent Pascal or Eiffel’s SCOOP. Recently Rust.

                                                                                                                                                                    For backhoes, one can model loss of connectivity in many ways. Many protocols already address it.

                                                                                                                                                                    At least one on each list is used in commercial applications.

                                                                                                                                                                2. 6

The time from the open source release of seL4 to the first serious security advisory being announced was about 18 hours. Formal verification is always limited by the set of things that you treat as axioms and the set of properties that you prove. A group that I work with had a problem with some verified code a couple of years back: it was verified as memory safe. Temporal safety was expressed as a proof that no object was referenced after it was deallocated. This was trivially verified mechanically: the code never freed any memory. Adding the additional constraint that all memory allocated during a particular sequence of operations was deallocated at the end made the proofs much harder.
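To see how a spec can be satisfied vacuously like that, here is a toy sketch in Lean 4 (all names hypothetical; `freed` models the deallocated objects, `used` the objects referenced afterwards):

    -- If the program never deallocates, `freed` is empty and the
    -- temporal-safety theorem is vacuously true: it holds, yet says
    -- nothing useful about the program.
    theorem no_use_after_free (freed used : List Nat)
        (h : freed = []) : ∀ o, o ∈ freed → o ∉ used := by
      intro o ho
      simp [h] at ho  -- membership in the empty list is absurd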

                                                                                                                                                                  Folks who work on formal verification are open about these limitations: verified code is not bug-free code, it’s code that doesn’t have bugs of specific categories that you can enumerate and express. Folks outside the field tend to hype it a lot more.

                                                                                                                                                                  In the seL4 case, a bunch of the things that it does to make the forward-progress guarantees in the kernel possible make writing software that runs on top of it vastly harder (everything can fail spuriously, everything needs you to handle back-off and retry, good luck if you want forward-progress guarantees for anything outside of the microkernel).

                                                                                                                                                                  1. 3

                                                                                                                                                                    In the seL4 case, a bunch of the things that it does to make the forward-progress guarantees in the kernel possible make writing software that runs on top of it vastly harder (everything can fail spuriously, everything needs you to handle back-off and retry, good luck if you want forward-progress guarantees for anything outside of the microkernel).

                                                                                                                                                                    This is interesting, and the first time I heard about it. Can you elaborate or otherwise source?

                                                                                                                                                                    1. 2

                                                                                                                                                                      This is mostly from talking to folks on the project that have written software on seL4.

                                                                                                                                                                    2. 2

The time from the open source release of seL4 to the first serious security advisory being announced was about 18 hours.

                                                                                                                                                                      That certainly sounds interesting. I briefly tried to find the advisory you mention, but wasn’t successful. Could you point me in the right direction?

                                                                                                                                                                      1. 1

                                                                                                                                                                        It was in the press a lot at the time of the release. It was in part of the system call interface, which was outside of the part that was verified.

                                                                                                                                                                    3. 1

                                                                                                                                                                      🤦‍♂️ yes, I meant provably. My phone auto-incorrected me.