I think the author completely missed the point of people’s complaints.
Nobody expects academics to write shipping “production” code that can be dropped into an existing codebase and used immediately.
People want to see the source code used to draw conclusions from research so that they can verify the conclusions actually hold and see exactly what’s being done. It doesn’t even matter if it’s half-assed, one-off code tied to a specific environment.
Announcing results based on homemade software and not releasing the code is like a mathematician releasing results but skipping the proofs and saying, “Trust me, I’ve done the math and proved this, don’t worry about it!”
Asking everybody to trust your research because you’ve run the numbers through your secret software and verified it is hand-wavy and not very open. It’s far too easy to write software that really only tells you what you want to hear.
Your point is just so, so important.
I recently attempted to reproduce a result published in an academic paper as a first step to extending it and validating my results against it. The author provided no code (though he did, for some reason, point out that he used Lisp). I wasn’t too worried upon first reading the paper because the important algorithms were described in some detail. Once I started, however, I realized that there were critical holes in the descriptions, transformations that must have occurred but weren’t documented. If the author had simply published the code all my questions would have been answered (even if the code was ugly). Instead I’m left guessing.
The source code is the only complete description of a CS research project (obviously if your work is pure theory then proofs take the place of code, and those have always been published), and it blows my mind that people are just now starting to admit it.
One of the difficulties with publishing code is that sometimes research projects are the work of semi-proprietary research. In these situations the holes in the description are purposeful, as they don’t want another organization to be able to replicate the parts of a system they consider proprietary.
That wasn’t the case in my situation, and frankly, if that is the case then publishing in a peer-reviewed journal doesn’t seem appropriate to me. How can someone possibly review a project properly if they can’t even know how it works? If a reader can’t reasonably reproduce the work based on the paper, then the paper really makes no contribution to science. It’s the academic equivalent of leaving a comment on a blog post with nothing but “FIRST!”
Note that the parts of the system they consider proprietary are not the parts that the research is focused on (if they are, then you are right, the research is anti-scientific, as it actively limits replication). Rather, the authors of these papers are often engaging in a careful dance to describe the research-relevant portions of the system without giving away proprietary information adjacent to the research. The degree to which these researchers are successful varies, but the general inclination is to err on the side of giving away less information, not more.
Perhaps public money shouldn’t go to unreplicable research. It seems like it only benefits the holder of the proprietary technology. It’s not actually science if it can’t be tested…
What it seems to be saying to me is almost “People expect research to matter more than it’s meant to.”
Which is a thought-provoking thing to hear, because I suppose I have indeed been assuming that the driving motivation behind academic research is to do something that’s important to somebody. I’m not clear on what worldview it would take to dedicate one’s life to it if one didn’t believe that, and I’m not convinced I understood correctly.
I would argue that a number of academics are in academia not because they find their work important (to themselves or to the world), but because they find it interesting.
That would certainly explain it.
To me it’s more like a chemist running experiments, describing how to run them, but not shipping the original test apparatus and chemicals with the paper. The paper should contain enough detail that you can, independently, build a similar apparatus and replicate the results. Of course, the chemist doesn’t ship the original apparatus for practicality reasons, but it’s also ideal if someone trying to replicate the results doesn’t use the original material and apparatus anyway, because independent replication can catch hidden dependencies that would be glossed over if you just reused the original apparatus.
I agree it’s a big problem if CS papers don’t include (at least in some extended tech-report version) enough details to independently implement and replicate the results. Releasing the code can be a stop-gap to paper that over, but I think not a real fix. That’s more like the chemist just having an open-house day where you can come re-run the experiment on their own equipment. Better than nothing, but not independent replication. It’s especially not an independent replication if the code release is (as some people are now advocating) just some big VM image. The fact that you can re-run the bit-identical VM image to get bit-identical results doesn’t really say a lot about whether the paper’s claimed result is actually true, in most cases.
If I have the time, I find it pretty educational to try to make my own small implementations of papers even when there is a code release, and in doing so I avoid reading their code in order to keep my implementation as independent as possible. It illuminates where there are unarticulated assumptions etc.
I think your analogy to chemistry is exactly wrong, but wrong in a very useful way!
The great thing about software is that it can be shipped and duplicated without loss to any number of people. For research in any field, it would be tremendously useful to be able to start from a known (or claimed) working apparatus and then tweak until the same results are being generated by different pathways.
In CS research, that is actually possible. It is because of that fact that the omission of source code (and whatever other details are necessary to reproduce a paper) is so very unforgivable.
I do see your point about the usefulness of parallel implementations, but those can be reached much faster if the reference implementation is open-source.
This is the part I’m skeptical of; I think there’s a pretty big risk of not-really-parallel implementations that are cribbing too much uncritically from the source, in the style of people cargo-culting from StackOverflow. Why does this code do X? Well this R package that so-and-so wrote 15 years ago did X and I didn’t know why so I just copied it. That’s one reason I prefer not to look at source when reimplementing papers if at all possible, or at least do it as a last resort, because it’s quite hard to keep yourself honest if you do.
And I think the reason I disagree is that I believe this is not right,
it would be tremendously useful to be able to start from a known (or claimed) working apparatus and then tweak until the same results are being generated by different pathways.
This is useful as an approach to hacking or tinkering, perhaps even engineering sometimes, but I think not science, and the exact opposite of reproducibility. A useful paper usually abstracts something; not just that this pile of stuff I have happened to churn out some numbers that I reported, but the reason I put this pile of stuff together is that I believe I’m testing X, or proposing theory Y. And one test that you’ve done this abstraction successfully, rather than actually not testing the right thing or derailed by various confounds, is that someone can independently achieve the same result without starting from your original apparatus and all its baggage.
Some of this may depend on whether you’re more worried about false negatives or false positives. Maximally independent replication is intended to guard against false confirmation of results: if someone claims a certain chemical process does something, the gold standard replication is to try that process, as they describe, but on totally different equipment, from different manufacturers, in a differently configured lab with different experimenters, with chemicals sourced from different suppliers, etc. If you get the same result, you have fairly good reason to be confident that it’s correct. If you don’t, then it’s time to go hunting… maybe there’s a hidden dependence on specific impurities in a specific supplier’s product, so the original work wasn’t testing what it thought it was (this is pretty common), maybe there’s a dependence on some specific characteristic of one piece of equipment that didn’t seem important but turns out to be critical, etc. If you had just started from the exact original setup, it’s far more likely you’d miss some of this and think you confirmed the results, even though they may not really be correct.
Very much agree about independent verification. Presumably your code implements some technique to achieve a certain effect. It should be possible for me to implement that technique.
But publishing the original code helps a great deal in identifying unintended effects. Sometimes code works for the wrong reasons. As you allude to at the end, we might blindly confirm the result by copying too much. Nevertheless, that’s also worthwhile so that somebody can extract the working part. Failure to independently replicate doesn’t necessarily mean it didn’t happen the first time. It’s a balancing act.
The code is the steps to reproduce. Without it the paper cannot be peer reviewed conclusively, and the “expected results” are merely conjecture. I agree the code shouldn’t be the only method of reproduction, but it is an extremely helpful screen.
Actually, I like Racket and Haskell because the scientists in those communities do ship the code.
Not many people argue that society shouldn’t allocate a certain amount of capital to invest in research on the 10-100 year horizon. Nation states are the only entities that really can take that risk and history has shown that it pays off. The argument about the benefits of long-term fundamental research is not really something most rational people would debate, the only thing we do debate is how much capital nation states should allocate and who they should allocate it to.
Over the years, I’ve seen people evaluate research by how closely the paper translates into a startup idea. I’ve seen people evaluate research by how easily the work can get media attention.
This is where the argument loses people like me, who, in industry, are constantly forced to justify our work to investors and shareholders because that’s how our system works. If you can’t justify your work to the funding agencies, then you have to adjust your ideas to be more relevant to people who will allocate capital (like industry). That’s not ideal, but that’s life in a society based on capitalism.
In an ideal post-scarcity world, we’d be able to give every person enough resources to go off and build projects that benefit society on their own terms. But we don’t live in that world and that means tactical allocations of resources to people who can justify returns on investment, be they societal or economic.
I don’t know if it’s fair to claim this is due to capitalism. I’m not sure decision makers in a communist system would be any more likely to allocate funding for crazy, out-in-left-field ideas with no clear real-world application.
There’s a never ending supply of ideas that will never result in any practical real world benefit to anybody, or will never go anywhere no matter how much time and money get thrown at them.
People are free to do all the research they want, but if they’re depending on other people’s resources they’ll always have to justify why the resources should be spent on that project instead of another one.
In case anybody missed the earlier piece, this is a followup to https://lobste.rs/s/i1qv9r/myth_cs_researchers_dont_publish_code