I regularly email researchers asking for a copy of their data. The most common reply is no reply, followed by “I no longer have the laptop that had the data on it”.
If the data is really interesting I will make an effort to extract it.
The article makes a big leap between “shippable products” and the much simpler goal of just having code that is available and compilable. A shippable product, by industry standards, means at the very minimum that the code is on some public VCS, permissively licensed, documented, has a public bug tracker, and is tested on major platforms. That’s not what’s expected of researchers.
For code distributed for research, it is very common for it to not even be a) public or b) accompanied by directions on how to build it. What’s needed is not a well-supported product, just the minimum viable effort to reproduce the results. And in this day and age of free code hosting, free continuous integration, and abundant open source, if academics aren’t providing minimal reproducible artifacts, I have little sympathy when their work gets judged as impractical and tossed out.
As I’ve moved away from academic work, I’m quite bitter about how many academics are readily willing to complain about the lack of industry uptake but are not willing to spend 30 minutes learning Git or writing a short list of install directions for a README file. There’s a vast asymmetry in the time it takes to reproduce builds: 10 minutes on your end can literally save 10,000 community hours.
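To be concrete, the kind of minimal README I mean could be as short as this (the repository URL and script names here are made up for illustration):

```
# Building
git clone https://example.com/yourlab/paper-artifact
cd paper-artifact
make

# Reproducing the paper's results
./reproduce.sh    # regenerates the figures and tables
```

That’s it: how to get the code, how to build it, and which command produces the numbers in the paper.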
I find the whole debate unsettling.
On the one hand: it is amusing that software has a problem with reproducibility. There’s no reason this should be a problem.
On the other hand, expecting that research should come prepackaged for industry use strikes me as presumptuous. It wasn’t made for you to npm install and forget about.
It’s not necessarily that it should be prepackaged for industry, but the code that produced the results should be available in a form that would let me examine it and verify that the results hold for other workloads or systems. That’s part of checking whether it would be useful in industry, rather than producing useful results only in a small set of scenarios.
Often, papers are not sufficient for a useful implementation.
Library scientists have a lot to say on the human factors of why this turns out to be a problem. A lot of what they study these days is even specifically about preserving software and data.
I have not seen the debate cast in terms of academia versus industry before this article, and I would be surprised if there were a large volume of complaints coming from prospective industry users. Software written for research purposes is unmaintained the moment it’s finished. I can’t count how often I’ve thought “hmmm, no commits in the past year, there’s no way my project is using this library”, and not even the most popular academic code would pass that test.
While I know that it’s common for industrial users to want free support for their free software, it’s hard to picture anyone writing to a researcher who hasn’t released code and asking them to do so for the purpose of using it commercially. I suppose anyone who does do that deserves to be called out for how presumptuous it is.
The discussion I wish we were having is about how important it is, science-wise, to be able to reproduce results, especially when researchers have relied on empirical performance analysis and haven’t done any theoretical complexity analysis. I’m certainly glad to hear about these efforts to improve reproducibility, even though they’re clearly not there yet.
The other hand is a bit of a straw man, since the article is not espousing that. Obviously, if a paper could not be reproduced by a competent individual because of NEI, the research is worthless and should never have been done. Documentation and replication are the backbone of science; without them you have only conjecture. Math-based proofs without code are fine, but if you write code as a public servant, that code should be available.
The artifact evaluations the author talks about are a step in the right direction, but they aren’t perfect. First, I don’t believe the submitted code is required to be published publicly; it’s just run by the committee. And second, there’s nothing stopping an unethical researcher from submitting an artifact that just does sleep(8000); printData(); return 0;.
> sleep(8000); printData(); return 0;
I’m not sure what a better solution would look like.