Meh. This is partly a joke (only partly), but it comes out of a belief in some parts of academia that writing bad code is the norm, because “it’s only a prototype”. I think that writing bad code is generally a bad idea, including in academia, and that it does not actually stem from some core truth about prototyping or scientific research; it is an instance of the general decrease in research quality suffered under publish-or-perish pressure. If you keep asking people to produce more papers (which include code), you are going to get more code, and worse code. You are also going to get worse (less reliable) mathematical proofs, worse benchmarks, worse measurements, etc.
I’m not sure that there are differences between research fields. But in the context where this was originally written (PL academia), I believe this is mostly a bad idea, coming out of wrong institutional incentives. I think of the CRAPL or similar semi-jokes as an attempt to normalize or provide alternative justifications for what is, I think, just lower-quality work.
Idk, it takes a long time designing and writing software as a full-time job to get actually good at writing maintainable, robust software, and even then it can be a fractal of complexity, with practices and techniques from one domain not translating well to others.
Researchers might write a lot of code, but it’s under an unbelievably different set of circumstances, and it’s certainly not the only thing they do full time. I think it is actually reasonable for their code to be kinda shit from the perspective of professional software engineering.
That said, I do agree that the publish-or-perish model is fucked up and leads to all sorts of perverse incentives. I just don’t think that shaming researchers for their crap code will help make anything better.
I recognize that many different practices exist, but here are some intuitions I have against this idea that “bad-practices code” is somehow a natural outcome of scientific prototyping:
Prototyping is not at all unique to research; people prototype in industry or as hobbyists all the time. All the reasons you can think of for writing shit code in academia also exist somewhere in industry, often many times worse. So the idea that academic code would somehow be special in how shittily it is written doesn’t sound very convincing to me. (One thing that is special about research code is that it tends to be solving difficult problems. I don’t see a relation between the problem domain being technical/difficult and the idea of giving up on good implementation practices.)
You can save a lot of work in a research environment because your code does not have users, or you are the only user. A decent user interface, error messages, etc. take a lot of time in real life (if you aim to produce good software), and you can drop them. We often say there is an 80/20 principle where 20% of the software is the core logic and 80% is the interface layers around it; a research prototyping environment lets you focus on the 20% and mostly ignore the 80%, as the sketch below tries to illustrate. I think there are other properties of research environments that let us cut corners, and note that this is completely orthogonal to code quality / implementation practices.
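To make the 80/20 split concrete, here is a hypothetical Python sketch (names, numbers, and structure are invented, not taken from any real project). The function is the “20%” of core logic a prototype keeps; the comments list the “80%” of interface work a released tool would also need:

```python
import numpy as np

# The "20%": the core computation the experiment actually needs.
def estimate_effect(samples: np.ndarray, iterations: int = 1000) -> float:
    """Toy bootstrap estimate, standing in for the interesting algorithm."""
    rng = np.random.default_rng(0)  # fixed seed is fine for a one-off run
    means = [rng.choice(samples, size=len(samples)).mean()
             for _ in range(iterations)]
    return float(np.mean(means))

if __name__ == "__main__":
    # The "80%" a released tool would add around the same core:
    # argument parsing, input validation, helpful error messages for
    # malformed data, logging, progress reporting, docs, packaging, ...
    # A real prototype would likely hardcode a path such as
    # np.loadtxt("results/run3.csv"); synthetic data keeps the sketch runnable.
    data = np.random.default_rng(1).normal(loc=0.1, size=500)
    print(estimate_effect(data))
```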
There are several things to keep in mind when discussing software coming from academia:
If PhD students had chosen industry instead, they would have started as juniors
most PhD advisors are not great coders themselves, nor do they have industry experience
the goal of PhD studies is to explore a topic in depth, in a limited time, and publish something others have not. This translates to very little overlap (kudos to advisors who can carve out pieces of a bigger project for different students), which further translates to very little reviewing and collaboration on the code; teams are small.
No one will deliberately jeopardize the quality of results or ship code that spits out wrong results, be it benchmarking or whatnot. But the incentives are not there to produce maintainable or understandable code, nor does the system reward it.
The argument that “academics can produce good code” is quite similar to “one can write memory-safe code in C”. Given enough time and support, yes. But there is simply no structure to bring that quality front and center.
Finally, this license is a pretty crappy one, and the joke is not even that funny.
As someone who worked (briefly) in a role supporting academic software, what you’re describing is what I experienced.
Essentially, there were a lot of people who were very skilled at their particular field of research, but would have been junior developers in the software industry - often despite having worked for many decades writing software for academia. A lot of them were aware that their software was bad, but more because they knew that they were inexperienced than because they knew what specifically was wrong. I remember one project where a reviewer had demanded that the code used in a paper be rewritten because it was so bad. The team kept on pointing to things like the lack of tests or not following naming conventions, but fundamentally the whole thing was a chaotic mess, right down to the architectural roots. It worked (probably), but it was written by people who were just trying to get it to work at each step, and didn’t understand how to think more globally.
One of the reasons I ended up leaving relatively quickly (<1y, iirc?) was that our team, supporting these academics, just had no idea what support to offer. Fundamentally, the people we were supporting needed either to take several years off to work on real software alongside experienced developers, or to have someone write their software for them; neither was a realistic option. Instead, we offered a lot of courses in how to use git and how to run pylint+black - both useful skills, but both essentially putting lipstick on a pig.
In fairness, I have not worked with academics in software research, and I would hope things are a little better there. In the rest of academia, though, programming is essentially just a tool you use to beat the HPC system until it gives you results that look right. And there’s no reason or real means to improve your software development skills, because like you say, there aren’t the incentives, nor the culture or structure that could support such an improvement. And I don’t think something like this license adds to that at all.
Or maybe it’s also a matter of culture and shared values, and we can actually teach this to students (and arrange, for example, for people to review each other’s code), and emphasize that it matters for good professional practice and helps further research. Encouraging people to use a CRAPL license is another way to influence a culture and transmit values; my point is that this move probably goes in the wrong direction.
Yes, absolutely agree on both points. The point I was trying to make is that there is no such widespread culture, nor an incentive structure, that would surface good programming practices.
I mostly agree, but it’s worth remembering that there are a lot of outliers.
I’ve worked with a lot of PhD students who either had a pile of open-source experience prior to their start or took a few years between undergrad and PhD in industry. The latter is often a good idea because a bit of time in industry gives you a better understanding of the interesting unsolved problems. Without that, it’s easy to spend your entire PhD solving a problem that doesn’t really exist.
Even without that, it’s common for PhD students to do one or more three-month internships with engineering groups. This gives them an intensive crash course in how to build real things.
Beyond that, academics are increasingly measured on ‘impact’ and, in applied sciences, that means ‘do people actually use the results of your work?’. It’s common for people who supervise PhDs to spend some time consulting and doing other forms of technology-transfer work. Internships for their students can form part of this (send the person who did the work to a company for a bit to work with them on getting it into production). Releasing code in a way that makes it easy for people to pick it up and build on it is great for this. One colleague recommended putting grant numbers in license headers so that you can just do a GitHub search and find all of the places your code for a particular project has ended up when you go to ask the same funding body for more money.
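The grant-number trick just means keeping the grant ID in every file header. A hedged example of what such a header could look like (the grant number, funder, and institution are invented):

```python
# Copyright (c) 2024 Example University systems research group.
# SPDX-License-Identifier: MIT
#
# Developed under grant EX-123456 (hypothetical) from the Example
# Research Council. Keeping the grant ID in the header means a plain
# GitHub code search for "EX-123456" later turns up every fork and
# downstream project that picked this code up.
```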
There are incentives to produce readable and maintainable code, they just aren’t the only incentives. The fact that (in computer science) there’s often one top-tier place to publish research and it has a single deadline per year is the biggest counter. More rolling deadlines would help a lot: if missing a deadline meant submitting to the same place a couple of months later, that would greatly reduce the pressure to ship something that works right now.
A thing you are not mentioning is the academic definition of novelty, which means that a lot of the effort of improving code for the benefit of future work cannot pay off within academia (not even for a different research group). Industry is free to put code-quality expectations into the grant agreements they propose.
And when the problem is hard, there is a real cost to deviating from whatever shape the authors conceptualise the code in, which is hopefully somewhat similar to the manuscript structure, but not to whatever structure people call good practice this year. Especially given that the best practices people are talking about target a kind of maintainability that is not a relevant consideration here anyway.
Also, the amount of hardcoded inputs/parameters that is justified for a one-off experiment is already enough to call the code unacceptably low-quality for a deployment with multiple users…
There is of course yet another part of the problem, although its strength varies across fields: review practices push some subfields towards a shared narrative of «applications» of their theoretical research that is completely disconnected from, and with a bit of bad luck can be directionally opposite to, what would actually be worth rewriting into high-quality code from an applied point of view. This is just sad, but true. I guess the product-development side of industrial software development employs enough PhDs to approach conferences and offer a handful of developer-days (with education in the relevant domains) per paper for reviewing from their perspective.
worse benchmarks, worse measurements
The way to improve it is to show what is used now, so that «well, we could also do X for the same effort as Y» can be discussed usefully.
Anyway, overall the only way to change incentives is to (a) do the cheap things that are clear small positives, and (b) make clear that any other improvements require resource investment into changing the incentives. Yes, (b) can be called normalising the drawbacks. I would say that CRAPL is a part of both (a) and (b), making an effort to scope the code publication in a way that makes it cheap and encourages it, while stressing that other desirable things are expensive.
A thing you are not mentioning is the academic definition of novelty, which means that a lot of the effort of improving code for the benefit of future work cannot pay off within academia.
I don’t agree. I think that what you mean is that people cannot reuse their code later, because if they did it would not be novel. But this is false: it is pretty common to reuse something and then extend it with something new, or to use something in a new way that warrants publication. For example, specifically in PL research (which is the scientific context in which the CRAPL was written), there has been an explosion in the last decade or so of work building on Iris, a framework for building program logics on top of separation logic, mechanized in the Coq/Rocq proof assistant. Iris is a freely available library, and people have written dozens of papers reusing it, extending it, etc. This is an example of a sub-sub-domain being created by the distribution and reuse of good-quality research code, and there are many similar examples – for example the work on top of the egg library.
Industry is free to put code-quality expectations into the grant agreements they propose.
I don’t know which form of research you are familiar with, but in our world “industry” does not offer research grants, the vast majority of research funding is of public origin. (Industrial companies fire their own research or research-and-development subgroups, except if they do machine-learning, and hope that buying startups regularly will suffice to keep innovating.)
when the problem is hard, there is a real cost to deviating from whatever shape the authors conceptualise the code in, [..] but not to whatever structure people call good practice this year.
The sort of code that could want to use the CRAPL is the code that really is “crap”, as the acronym says. We’re not talking about following good practices that are outdated. We are talking about:
code that doesn’t compile with the current HEAD
there are no tests
variable names are shit
there are no comments
there is no packaging information, no information about what the dependencies might be, etc.
This sort of thing.
I guess product development side of industrial software development employs enough PhDs to approach conferences and offer a handful of developer-days (with education in the relevant domains) per paper for reviewing from their perspective.
Something not entirely different happens in my field, called “artifact evaluation”. If a paper at a SIGPLAN conference is accepted, the authors are offered the possibility to upload their “artifacts” (software code, mechanized proofs, benchmark scripts), and there is an “artifact evaluation committee” (AEC) that reviews them. AEC reviewers check that they can build the code locally themselves, that the proofs are valid, that the benchmark results on their machine are consistent with the paper’s qualitative claims, etc. (This review step is optional and does not affect acceptance of the paper.) See for example the POPL 2024 AEC pages. Artifact reviewers are typically PhD students, post-docs, and people who semi-recently migrated from academia to industry.
Note that this is a change that was decided within the research community, by researchers, who are trying to move the needle in common expectations in the community. To my knowledge there was no involvement of grant funding agencies or the industry in shaping this process to improve things.
I would say that CRAPL [makes] an effort to scope the code publication in a way that makes it cheap and encourages it.
I fail to see how CRAPL is better than just telling people: “Releasing your code is easy, just slap the MIT license on it and then make a public repo on {github,gitlab}. Oh, and write a three-paragraph README that summarizes the context of this code and clarifies that it is un-maintained.”
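For concreteness, the kind of three-paragraph README meant here might look like this (the project name, venue, and file names are all made up):

```
# frob-prototype

This repository contains the prototype used for the experiments in
"Frobnication at Scale" (HypotheticalConf 2024). It implements the
measurement pipeline described in Section 5 of the paper.

The code was written for one set of experiments and is released as-is
under the MIT license. It is not maintained, the inputs in `configs/`
are the exact ones used for the paper, and we do not plan to accept
patches or answer support requests.

To reproduce the headline numbers, run `run_all.sh` with Python 3.11
and the packages pinned in `requirements.txt`; results land in `out/`
and should match Table 2 up to run-to-run noise.
```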
while stressing that other desirable things are expensive.
Are they, though? It’s not clear to me that writing shit code lets you do good research better than if you write good code. The cost of following semi-decent practices is not very high, and it’s easy to make result-invalidating mistakes or to find yourself blocked in your explorations if your code is really bad.
I think that what you mean is that people cannot reuse their code later, because if they did it would not be novel.
In itself, this is not blocking. But for a lot of advances, things which make sense to build on top of the code of an article will get labelled «incremental advancements» and pushed far enough down the conference ranking to not be worth starting in the first place. This is often a problem: there are quite a few follow-up papers in different fields that would get cited but won’t get published high enough to be worth the effort; better code won’t solve that.
Sometimes people manage to structure the research project in a way compatible with adding parts to a framework. If it is Coq/Rocq at INRIA it can even work out better than SageMath…
I don’t know which form of research you are familiar with, but in our world “industry” does not offer research grants, the vast majority of research funding is of public origin.
They do create research collaborations with universities, but rarely. And as long as they do not do it more often, their interests are not academia’s interests; that’s my point exactly.
Note that this is a change that was decided within the research community, by researchers, who are trying to move the needle in common expectations in the community. To my knowledge there was no involvement of grant funding agencies or the industry in shaping this process to improve things.
Also note that the current artifact evaluation is exactly «can we reproduce the exact same result without looking into the code», not a review of code quality under the hood. Which is indeed useful for reproducibility / comparisons (but reuse is not anywhere near the priorities).
code that doesn’t compile with the current HEAD
there is no packaging information, no information about what the dependencies might be, etc.
Usually the former is because of the latter. Yeah, artifact evaluation has kind of solved specifying the dependency versions. Usually at least one dependency is pinned to an obsolete version, though, because without a good chance of stable resource investment there is no point in upgrading just to demonstrate that the algorithm works.
there are no tests
The test suite is literally the code that generates what is used in the article. More tests could be useful for refactoring, but the chances of that were evaluated as low.
Comprehensive test suites tend to be more code than the thing tested, so they need a lot of code evolution to pay off.
variable names are shit
there are no comments
Half of the article is usually, more or less, the de-facto comments for the algorithm. Variable names often mirror the article’s notation, too: long formulas with long variable names would be less readable in the article, so nope.
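To illustrate that trade-off, here is a hypothetical snippet (the equation and names are invented, not from any particular paper) where the variable names deliberately mirror the manuscript’s symbols:

```python
import numpy as np

def step(u, dt, nu):
    """One explicit Euler step of a diffusion update written the way a
    (hypothetical) paper would: u_i <- u_i + nu*dt*(u_{i+1} - 2*u_i + u_{i-1}),
    assuming unit grid spacing and periodic boundaries.

    Renaming u -> temperature_field and nu -> diffusion_coefficient would make
    the code more self-describing, but harder to check against the equations.
    """
    return u + nu * dt * (np.roll(u, -1) - 2 * u + np.roll(u, 1))

# Example one-off run with hardcoded parameters, prototype-style.
u = np.sin(np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False))
for _ in range(100):
    u = step(u, dt=0.1, nu=0.1)
```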
The cost of following semi-decent practices is not very high, and it’s easy to make result-invalidating mistakes or to find yourself blocked in your explorations if your code is really bad.
You know when you switch from explore to exploit, which allows you to cut a ton of corners. The code most thoughtfully structured for reuse that I have seen is very prone to hiding some kinds of result-invalidating mistakes (because of all the extra connections between the well-defined and well-isolated parts).
I fail to see how CRAPL is better than just telling people:
Because people are more likely to go «wait what» about a non-standard license than finish reading a README.
I remember years ago trying to use some academic code for image processing that turned out to have an enormous memory leak, to the point that my little work laptop could not run it successfully and the only answer was “get a bigger machine” because trying to fix this mess of spaghetti C++ was not worth the time.
Is it normalization of deviance, or addressing an elephant in the room? There seems to be an unspoken assumption that sloppy and ad-hoc methods (including but hardly limited to code artifacts) may nonetheless support good (or at least adequate) science. This is clearly true in some cases (as pathfinding is not highway engineering!) and clearly false in others. There has been a long ongoing “reproduction crisis” in the experimental sciences at large, and I’m not aware of it getting any better.
Given that engineering standards are sometimes in fact so low in research, regardless of the cause or any proposed remedy, is it not better to at least allow one’s work to be made public and inspected, rather than hidden for shame or fear of having to support it? That’s the spirit that I read in this. I’m not sure how realistic it is, as a proposal, but it’s at least a gesture.
It varies a lot between research groups, but generally ‘research quality’ code will cut corners in places that won’t affect the experiment, or which can be explained in the evaluation. For example, there were some things in CHERI Clang where you’d get a compiler crash instead of an error message. We also just disabled autovectorisation entirely (which made no difference on our early prototypes because they didn’t have vector units). These were known limitations but things that would need fixing before a real release (the Arm folks fixed a lot of these things for Morello).
That said, I discovered that there were some product groups at Microsoft who had lower standards for production code than I did for research-quality code, so there’s probably more variation between individual teams than there is between academic and industrial code.