1. 25

  2. 5

    It seems odd to spend so much space discussing the complexities of software development, only to conclude that the answer is empiricism. Surely the number of variables and their non-linear effects make experimentation difficult. I think $10M is a tiny fraction of what a reproducible experiment would cost, and it would take a long time to run. You need huge scale in the number of projects, and also lengthy longitudinal studies of their long-term impacts. And after you did all that, the gigantic experiment would be practically certain to perturb what it’s trying to measure: programmer behavior. Because no two projects I’m on get the same me. I change, mostly by accumulating scar tissue.

    Empiricism works in a tiny subset of situations where the variables have first been cleaned up into orthogonal components. I think in this case we have to wait for the right perspective to find us. We can’t just throw money at the problem.

    1. 6

      What else can we do? Our reason is fallible, our experiences are deceitful, and we can’t just throw our hands up and say “we’ll never know”. Empiricism is hard and expensive, but at least we know it works. It’s gotten us results about things like n-version programming and COCOMO and TDD and formal methods. What would you propose we do instead?

      1. 4

        Empiricism is by no means the only thing that works. Other things that work: case studies, taxonomies, trial and error with motivation and perseverance. Other things, I’m sure. All of these things work some of the time – including empiricism. It’s not like there’s some excluded middle between alchemy and science. Seriously, check out that link in my comment above.

        I’m skeptical that we have empirically sound results about any of the things you mentioned, particularly TDD and formal methods. Pointers? For formal methods I find some of @nickpsecurity’s links kinda persuasive. On some mornings. But those are usually case studies.

        Questions like “static or dynamic typing” are deep in not-even-wrong territory. Using empiricism to try to answer them is like a blind man in a dark room looking for a black cat that isn’t there.

        Even “programming” as a field of endeavor strikes me as a false category. Try to understand that domain you’re interested in well enough to automate it. Try a few times and you’ll get better at it – in this one domain. Leave the task of generalization across domains to future generations. Maybe we’ll eventually find that some orthogonal axis of generalization works much better. Programming in domain X is like ___ in domain X more than it is like programming in domain Y.

        You ask “what else is there?” I respond in the spirit of Sherlock Holmes: “when you have excluded the impossible, whatever is left, however unlikely, is closer to the answer.” So focus on your core idea that the number of variables is huge, and loosen your grip on empiricism. See where that leads you.

        1. 5

          I think we’re actually on the same page here. I consider taxonomies, ethnographies, case studies, histories, and even surveys empirical. It’s not just double-blind clinical studies: as Making Software put it, qualitative findings are just as important.

          I reject the idea that these kinds of questions are “not even wrong”, though. There’s no reason to think programming is any more special than the rest of human knowledge.

          1. 2

            Ah ok. If by empiricism you mean, “try to observe what works and do more of that”, sure. But does that really seem worth saying?

            It can be hard psychologically to describe a problem well in an article and then not suggest a solution. But sometimes that may be the best we can do.

            I agree that programming is not any more special than the rest of human knowledge. That’s why I claim these questions are not even wrong. Future generations will say, “sure, static typing is better than dynamic typing by about 0.0001% on average, but why did the ancients spend so much time on that?” Consider how we regard ancient philosophers who worried at silly questions like whether truth comes from reason or the senses.

            Basically no field of human endeavor had discovered the important questions to ask in its first century of existence. We should spend more time doing and finding new questions to ask, and less time trying to generalize the narrow answers we discover.

            1. 3

              Ah ok. If by empiricism you mean, “try to observe what works and do more of that”, sure. But does that really seem worth saying?

              It’s not quite that. It’s all about learning the values and limitations of all the forms of knowledge-collection. What it means to do a case study and how that differs from a controlled study, where ethnographies are useful, etc. It’s not “observe what works and do more of that”, it’s “systematically collect information on what works and understand how we collect and interpret it.”

              Critical to that is the fact that the information we collect by using “reason” alone is minimal and often faulty, yet it’s how almost everybody interprets software. That and appealing to authority, really.

              Basically no field of human endeavor had discovered the important questions to ask in its first century of existence. We should spend more time doing and finding new questions to ask, and less time trying to generalize the narrow answers we discover.

              The difference is that we’ve already given software control over the whole world. Everything is managed with software. It guides our flights and runs our power grid. Algorithms decide whether people go to jail or go free. Sure, maybe code will look radically different in a hundred years, but right now it’s here and present and we have to understand it.

              1. 2

                It is fascinating that we care about the same long-term problem but prioritize sub-goals so differently. Can you give an example of a more important question than static vs dynamic typing that you want to help answer by systematically collecting more information?

                Yes, we have to deal with the code that’s here and present. My answer is to reduce scale rather than increase it. Don’t try to get better at running large software projects. Run more small projects; those are non-linearly more tractable. Gradually reduce the amount of code we rely on, and encourage more people to understand the code that’s left. A great example is the OpenBSD team’s response to Heartbleed. That seems far more direct an attack on existing problems than any experiments I can think of. Experiments seem insufficiently urgent, because they grow non-linearly more intractable with scale, while small-scale experiments don’t buy you much: if you don’t control for all variables you’re still stuck using “reason”.

                1. 2

                  Can you give an example of a more important question than static vs dynamic typing that you want to help answer by systematically collecting more information?

                  Sure. Just off the top of my head:

                  • How much does planning ahead improve error rate? What impacts, if any, does agile have?
                  • What are the main causes of cascading critical failures in systems, and what can we do about them?
                  • When it comes to maximizing correctness, how much do intrinsic language features matter vs processes?
                  • I don’t like pair programming. Should I be doing it anyway?
                  • How do we audit ML code?
                  • How much do comments help? How much does documentation help?
                  • Is goto actually harmful?

                  Obviously each of these has ambiguity and plenty of subquestions in it. The important thing is to treat them as things we can investigate, and to recognize that investigating them is important.

                  1. 0

                    Faced with Cthulhu, you’re trying to measure how the tips of His tentacles move. But sometimes you’re conflating multiple tentacles! Fthagn!

                    1. 1

                      As if you can measure tentacles in non-Euclidean space without going mad…

                      1. 1

                        Now you’re just being rude. I made a good faith effort to answer all of your questions and you keep condescending to me and insulting me. I respect that you disagree with me, but you don’t have to be an asshole about it.

                        1. 2

                          Not my intention at all! I’ll have to think about why this allegory came across as rude. (I was more worried about skirting the edge when I said, “is that really worth talking about?”) I think you’re misguided, but I’m also aware that I’m pushing the less likely theory. It’s been fun chatting with you precisely because you’re trying to steelman conventional wisdom (relative to my more outré idea), and I find a lot to agree with. Underneath it all I’ve been hoping that somebody will convince me to return to the herd so I can stop wasting my life. Anyway, I’ll stop bothering you. Thanks for the post and the stimulating conversation.

                2. 2

                  “try to observe what works and do more of that”

                  That is worth saying, because it can easily get lost when you’re in the trenches at your job, and can be easy to forget.

            2. 2

              What else can we do?

              If we’re describing it in terms of philosophies, then there’s also reductionism and logic. The hardware field turning analog into digital Legos, and Oberon/Forth/Simula for software, come to mind for that. Maybe model-driven engineering. They break software into fundamental, well-understood primitives which then compose into more complex things. This knocks out tons of problems, but not all.

              Then there’s the logical school that I’m always posting about, as akkartik said, where you encode what you want, the success/failure conditions, and how you’re achieving them, and prove you are. Memory safety, basic forms of concurrency safety, and type systems in general can be done this way. Two of those have eliminated entire classes of defects in enterprise and FOSS software using such languages. The CVE list indicates the trial-and-error approach didn’t work as well. ;) Failure detection/recovery algorithms, done as protocols, can be used to maintain reliability in all kinds of problematic systems. Model-checking and proof have been most cost-effective at finding protocol errors, especially deep ones. Everything being done with formal methods also falls into this category. I’m just highlighting the high-impact stuff. Meyer’s Eiffel Method might be said to combine reductionism (language design/style) and logic (contracts). Cleanroom, too. Experimental evidence from case studies showed Cleanroom was very low-defect, even on first use by amateurs.

              Googled a list of philosophies. Let’s see. There’s the capitalism school that says the bugs are OK if profitable. The existentialists say it only matters if you think it does. The phenomenologists say it’s more about how you perceived the failure from the color of the screen to the smell of the fire in the datacenter. The emergentists say throw college grads at the problem until something comes out of it. The theologists might say God blessed their OS to be perfect with criticism not allowed. The skeptics are increasingly skeptical of the value of this comment. The… I wonder if it’s useful to look at it in light of philosophy at all given where this is going so far. ;)

              I look at it like this. We have most of what we want out of a combo of intuition, trial-and-error, logic, and peer review. This is a combo of individuals’ irrational activities with rational activity on the generating and review sides of ideas. I say apply it all, with empirical techniques used basically just to catch nonsense from errors, bias, deception, etc. The important thing for me is whether something is working, for what problems, at what effort. If it works, I don’t care at all whether there are studies about it with enough statistical algorithms or jargon in them. However, at least how they’re tested and vetted… the evidence they work… should have rigor of some kind. I also prefer ideological diversity and financial independence in reviewers, to reduce the collusion problem science doesn’t address enough. A perfectly empirical study with 100,000+ data points refuting my logic that Windows is insecure is less trustworthy when the people who wrote it are Microsoft employees wanting an NDA for the data they used, eh?

              I’ll throw out another example that illustrates it nicely: CompCert. Most empiricists might tell you that compiler is an outlier that proves little to nothing about formal verification in general. Partly true. Skepticism’s followers might add that we can’t prove this problem, and only this problem, was the one they could express correctly with logic, if they weren’t misguided or lying to begin with. ;) Well, they used the logical school of specifying stuff they prove is true. We know from analyses of testing vs formal verification that testing or trial-and-error can’t ensure the invariants, due to state-space explosion. Even that is a mathematical/logical claim, because otherwise you gotta test it, haha. The prior work with many formal methods indicates they reduce defects a lot in a wide range of software, at high cost, with simplicity of the software required. Those generalizations have evidence. The logical methods seem to work within some constraints. CompCert pushes those methods into new territory in specification but reuses a logical system that worked before. Can we trust the claim? Csmith throws CPU-years of testing against it and other compilers. Its defect rate bottoms out, mainly at spec errors, unlike just about any compiler ever tested that I’ve seen in the literature. That matches the prediction of the logical side, where errors in proven components, about what’s proven, should be rare to nonexistent.
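              The state-space point above can be made concrete with some back-of-envelope arithmetic (my own illustration, not from the thread; the test rate is an assumed, optimistic figure): even one tiny component has far too many inputs to test exhaustively, which is exactly why the logical school proves invariants over all inputs instead.

```python
# Rough sketch: why trial-and-error can't ensure invariants by exhaustion.
# Count all input pairs for a single 64-bit binary operation, then estimate
# how long testing them all would take at an assumed (optimistic) rate.

pairs = 2 ** 64 * 2 ** 64        # every pair of 64-bit operands
tests_per_second = 10 ** 9       # assumption: a billion tests per second
seconds_per_year = 60 * 60 * 24 * 365

years = pairs / (tests_per_second * seconds_per_year)
print(f"Exhaustive testing would take roughly {years:.1e} years")
```

              At around 10^22 years, even wildly faster hardware doesn’t change the conclusion; that asymmetry is what a proof over all inputs exploits.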

              So, the empirical methods prove certain logical systems work in specific ways, like ensuring the proof is at least as good as the specs. We should be able to reuse logical systems proven to work to do what they’re proven to be good at. We can put less testing into components developed that way when resources are constrained. Each time something truly new is done like that, we review and test the heck out of it. Otherwise, we leverage it, since things that logically work for all inputs to do specific things will work for the next input with high confidence: we vetted the logical system itself already. Logically or empirically, we can therefore trust methods grounded in logic as another tool. Composable black boxes connected in logical ways, plus rigorous testing/analysis of the boxes and composition methods, are the main ways I advocate doing both programming and verification. You can keep applying those concepts over and over, regardless of the tools or paradigms you’re using. Well, so far in what I’ve seen anyway…

              @derek-jones, tag you’re it! Or, I figure you might have some input on this topic as a devout empiricist. :)

              1. 2

                Googled a list of philosophies. Let’s see. There’s the capitalism school that says the bugs are OK if profitable. The existentialists say it only matters if you think it does. The phenomenologists say it’s more about how you perceived the failure from the color of the screen to the smell of the fire in the datacenter. The emergentists say throw college grads at the problem until something comes out of it. The theologists might say God blessed their OS to be perfect with criticism not allowed. The skeptics are increasingly skeptical of the value of this comment. The… I wonder if it’s useful to look at it in light of philosophy at all given where this is going so far. ;)

                Awesome, hilarious paragraph.

                We have most of what we want out of a combo of intuition, trial-and-error, logic, and peer review. This is a combo of individuals’ irrational activities with rational activity on the generating and review sides of ideas. I say apply it all, with empirical techniques used basically just to catch nonsense from errors, bias, deception, etc. The important thing for me is whether something is working, for what problems, at what effort. If it works, I don’t care at all whether there are studies about it with enough statistical algorithms or jargon in them.

                Yes, totally agreed.

              2. 2

                What else can we do? Our reason is fallible, our experiences are deceitful, and we can’t just throw our hands up and say “we’ll never know”.

                Why not? The only place these arguments are questioned or really matter is when it comes to making money, and software has ridiculous margins, so maybe it’s just fine not knowing. I know high-risk activities like writing airplane and spaceship code matter, but those folks seem not to have much contention about whether their methods work. It’s us folks writing irrelevant web services that get all uppity about these things.

            3. 4

              Empirical Software Engineering: I followed a course at my university on this. It was an eye-opener. I can publish some of my reviews and summaries sometime. For now, let me give you some basic ideas:

              1. Use sensible data. Not just SLOC to measure “productivity,” but also budget estimates, man-hours per month, commit logs, bug reports: as much stuff as you can find.
              2. Use basic research tools. Check whether your data makes sense and filter outliers. Gather groups of similar projects and compare them.
              3. Use known benchmarks to your advantage. What are known failure rates, and how can these be influenced? When is a project even considered a success?
              4. Know about “time compression” and “operational cost tsunamis”: phenomena such as the increase in total cost from “death march” projects, and how operational costs are incurred already during development.
              5. Know about the quality of estimates, and estimates of quality, and how these can improve over time. Estimates of the kind “this is what my boss wants to hear” are harmful. Being honest about total costs allows you to manage expectations: some ideas (let’s build a fault-tolerant, distributed, scalable, and adaptable X) are more expensive than others (use PHP and MySQL to build a simple prototype for X).
              6. Work together with others in the business. Why does another department need X? What is the reason for deadline X? What can we deliver, and how can we decrease cost so we can develop opportunity X or research Y instead?
              7. Optimize on the portfolio level. Why does my organization have eight different cloud providers? Why does every department build its own tools to do X? What are strategic decisions and what are operational decisions? How can I convince others of doing X instead? What is organizational knowledge? What are the risks involved when merging projects?
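              As a minimal sketch of points 1 and 2 above, one might start with something like the following (the project records, field names, and the five-times-median outlier rule are all invented for illustration):

```python
import statistics

# Hypothetical per-project records; names and numbers are made up.
projects = [
    {"name": "A", "kloc": 12, "man_months": 30,  "bug_reports": 45},
    {"name": "B", "kloc": 8,  "man_months": 20,  "bug_reports": 30},
    {"name": "C", "kloc": 10, "man_months": 26,  "bug_reports": 38},
    {"name": "D", "kloc": 9,  "man_months": 400, "bug_reports": 35},  # suspect
]

def flag_outliers(rows, field, factor=5):
    """Crude sanity check: flag records more than `factor` times the median."""
    med = statistics.median(r[field] for r in rows)
    return [r["name"] for r in rows if r[field] > factor * med]

def bugs_per_kloc(rows):
    """Compare similar projects on a normalized measure, not raw SLOC."""
    return {r["name"]: r["bug_reports"] / r["kloc"] for r in rows}

print(flag_outliers(projects, "man_months"))  # ['D']
print(bugs_per_kloc(projects))
```

              Crude as it is, even this catches the suspect man-month entry before it skews a “productivity” comparison, and it compares projects on a normalized measure rather than raw SLOC.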

              Finally, I became convinced that for most organizations software development is a huge liability. But, as a famous theoretical computer scientist said back in the day: we have to overcome this huge barrier, because without software some things are simply impossible to do. So keep making the trade-off: how much are you willing to lose, for a slight chance of high rewards?

              1. 2

                Any books you want to recommend?

                1. 1

                  I’m also interested in that, though in online articles instead of books. The very nature of empiricism is to keep looking for more resources or angles, as new work might catch what others miss. Might as well apply it to itself, in terms of what methods/practices to use for empirical investigation. Let’s get meta with it. :)

              2. 1

                Oh yeah, @hwayne, the other thing I was going to tell you in support of empirical investigation is that we already have a group doing it with over a hundred million in funding: the Software Engineering Institute. It’s hard to say how much high-impact work they’ve done, truth they’ve discovered, or brilliant methods they’ve invented. There was so much BS coming out of them that I mostly stopped following. They do good stuff, too, though. If anything, they seem to ride the waves of real innovators or, on verification/testing, lag behind groups with less funding. That might say something about using empirical investigation alone, if that’s what they’re doing. I know there would be argument about that.

                However, that organization would be a good start on trying to get empirical projects done or funded if their politics don’t get in the way. Attention to those projects over others might increase funding for it over time. The money is not a problem for sure. :)