Scientific code that produces correct-looking results as a result of software bugs is neither correct nor defensible. Yet we’re seeing claims that the low quality of scientific code is okay because it passes the shallow smell tests common in scientific circles.
Of course, there are outliers like LAPACK and BLAS, which have had many eyes refining them. But they’re outliers precisely because they’re not the one-off codes that are commonly criticized.
My sister is a graduate student, and I periodically help her with her code. I set up all the code hygiene tools, set her up with Sentry, parallelized her simulation, and implemented writing all results to SQLite. Throughout this process she’s gotten better and better at writing software, enough to spot subtle bugs by eye, because the rest of the structure is clean enough not to distract. Because of better software development processes, she’s been able to test more inputs and has discovered and fixed more issues at a faster rate. SQLite alone has been such a positive influence that she has learned complex SQL just to explore faster.
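The results-to-SQLite pattern above can be sketched in a few lines; the table and column names here are my own invention, not taken from her project:

```python
import sqlite3

def save_results(con, rows):
    """Append simulation results and return the total stored so far.
    Schema is illustrative: one row per (run_id, param, output)."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS results (run_id INTEGER, param REAL, output REAL)"
    )
    con.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    con.commit()
    return con.execute("SELECT COUNT(*) FROM results").fetchone()[0]

con = sqlite3.connect(":memory:")  # use a file path for real runs
n = save_results(con, [(1, 0.5, 1.23), (1, 0.6, 1.31)])
```

Once results live in a table, exploring them is a SELECT away instead of another pile of ad hoc scripts.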
Do materially better results improve the scientific cultural valuation (more like devaluation) of well-established software development processes? So far, the answer is “No”. She’s gotten used to saying less about it, because software development is seen as a lesser activity unbefitting a proper scientist. Here we have marked improvements with minor assistance from an outside engineer (me), yet it’s seen as a cultural negative to sully oneself with the nitty-gritty of engineering.
It’s small wonder that a lot of scientific code is absolute shit. The culture is the source of the fault. It can also be the source of the fix.
I think increased criticism of poor software is a good influence, pushing the scientific community to value its software a little bit more.
I was in roughly your position as a job for the last three years, though I am not a software engineer by training. I learned to program on the side, and it got me jobs in academia as a research assistant, where I was plugged into different projects to handle the tech-heavy parts (data processing/analysis), to take pressure off the two main software engineers, and to fill gaps in the scientists’ programming knowledge. I can understand the scientific side of a project as well as the pre-/post-processing and back-end constraints on the “IT” side. My main experience is in two fields, Remote Sensing/GIS and Epidemiology, and I was lucky to work with good scientists.
What I have mainly learned is that the IT infrastructure available to you is one of the main chokepoints. In the remote sensing lab, we had access to our own compute and storage servers with a lot of power. Scientists would get a prototype script working on a local machine, then go see the IT folks (or, for the more tech-savvy among them, learn Slurm), and could do a lot of work before hitting any wall. In the epidemiology lab, you have your local machine and maybe access to the university cluster, where you compete with a lot of people to get your job running. The difference between a homogeneous work environment you can master and a heterogeneous one where debugging is a plague can totally change the learning experience for a scientist.
The way your PI sees and understands software changes everything too. Some will put money where it’s needed to make sure software is never an issue (hiring a dev who is part hotline, part bridge between prototype and “production” code). Others have no understanding of how it works and will never be able to offer you or your peers any help with “how to do it”. So many times I have listened with a smile to academics saying “just put it in the software like I did in this basic example, et voilà”, when in reality your dataset may blow out the RAM on your local workstation.
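The usual workaround when a dataset won’t fit in RAM is to stream it instead of loading it whole. A minimal standard-library sketch (the column name and data are hypothetical):

```python
import csv
import io

def streaming_mean(lines, column):
    """Mean of one CSV column, read row by row: memory use stays
    flat no matter how large the file is."""
    total, count = 0.0, 0
    for row in csv.DictReader(lines):
        total += float(row[column])
        count += 1
    return total / count

# An open file handle works the same way; StringIO stands in for one here.
m = streaming_mean(io.StringIO("id,value\n1,2.0\n2,4.0\n3,6.0\n"), "value")
```

The same one-pass idea extends to variances, histograms, and group-by aggregates; the point is that the “basic example” rarely survives contact with a dataset larger than memory.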
And finally, time is an asset that not all PhD students or researchers have. I am working on a project where using an SQLite database could have been an option, but my coworker is still learning the basics of R, how to perform spatio-temporal analysis, and everything in between. I dropped back to using CSV files and split the scripts across multiple files so he can learn from them.
It would be a big gain for a lot of scientists to learn some basic tools (even git, damn it), but it is so hard, and sometimes it just doesn’t fit, either with the infrastructure or with the tools/data in use. It may be a good time to push harder to integrate the tools that are common practice in each field directly during the bachelor’s/master’s, instead of expecting graduate students to speed-learn and unlearn what they know. In my opinion, that would push scientists to value their code more, because they would know earlier what they are doing.
Counterpoint: whenever I’ve reviewed science code for grad school friends I’ve found subtle flaws that compromise the results, things that are less likely to have happened with better formatting and unit tests and such. It frustrates me that an important and legitimate criticism of modern scientific practice is going to get sidelined because it’s now the rallying cry of crazy people.
I used to do/maintain scientific code for a living.
I now do commercial and/or open source.
I’m horrified by my earlier standards (and even more so by those of my ex-colleagues).
Yes, cross-checks take you quite far. (I.e., if the result is unphysical, it’s crap, end of story. So most good scientific code will have some sort of conservation-law cross-check in it.)
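A toy version of such a cross-check, assuming a unit-mass harmonic oscillator stepped with symplectic Euler (any real code would check its own conserved quantities instead):

```python
def step(x, v, dt):
    # Symplectic Euler for a unit-mass, unit-stiffness oscillator:
    # cheap, and it keeps energy drift bounded over long runs.
    v = v - x * dt
    x = x + v * dt
    return x, v

def energy(x, v):
    # Kinetic plus potential energy: the conserved quantity to watch.
    return 0.5 * v * v + 0.5 * x * x

x, v = 1.0, 0.0
e0 = energy(x, v)
for _ in range(10_000):
    x, v = step(x, v, 1e-3)

# The cross-check: if energy has drifted, the result is unphysical.
drift = abs(energy(x, v) - e0) / e0
assert drift < 1e-2, f"unphysical energy drift: {drift:.3e}"
```

The check costs a few lines and catches a whole class of integration and sign bugs that look fine in any single printout.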
But most scientific code is a spaghetti of configuration options and comment-in/comment-out/#ifdef/… switches; checks for one configuration in no way give me confidence in another.
Yes, there are different use cases and different pressures on scientific code. But, by gorrah, the scientists could really learn a lot from good modern software design, build, and deployment processes.
I know I have.
I think you can make your point without calling anyone “crazy”. Does that really help anything about the situation?
Some of the claimed code reviews do appear to be all about point scoring.
Writing code has a low status in academia and the people doing it are essentially recent graduates, so most of the (scientific) code is awful. The most worrying aspect is the lack of tests.
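The missing tests don’t have to be elaborate. For numerical code, comparing against a case with a known analytic answer already catches a lot; the integrator below is a stand-in of my own, not anyone’s actual code:

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule: the kind of routine that often ships untested."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)

# The integral of sin(x) over [0, pi] is exactly 2. The tolerance matches
# the method's O(h^2) error, so the test is tight but not flaky.
approx = trapezoid(math.sin, 0.0, math.pi, 1000)
assert abs(approx - 2.0) < 1e-5
```

One such test per routine is a small tax compared with retracting a result.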
My take on the “all source in one file” issue.
It’s even worse than that, in my experience. Many scientific coders haven’t even graduated at all, and are working under unhealthy levels of pressure in environments where they have very little autonomy or professional standing. Few have any training in even the rudimentary software engineering practices that CS students are drilled in, and we know how inadequate those can be.
It’s pretty bad out there. Egotism and political machinations are woven through the culture of science. Poor engineering certainly increases the risk of bad results. Bad results increase the risk of bad policy. Political bias (internal, or otherwise) often drives funding in competitive fields, incentivizing sloppy practices that increase publication velocity and further the policy agendas of (often decidedly non-disinterested) funders, closing the loop on a vicious cycle.
Science is hard enough, even in boring disciplines that don’t attract much outside attention. Software is also hard enough already. We should be able to work together in good faith to improve the quality of research, but when we can’t, there are some pretty deep structural problems that become a little more visible.
It is not surprising that reproducibility is so hit-and-miss these days when so many scientists are this arrogant.
I think mankind would be further along if we had more Knuths and DJBs in the natural sciences–guys who’ll write a check instead of point a finger.
It was tough to take this seriously while watching ads for removing loose arm skin with the appropriate cream. I’m going to ignore the specific example he was talking about (criticisms of a lockdown model) and share my anecdotes about scientific code.
The article raises a bit of a strawman by painting a picture of commercial developers concerned about “style” vs. scientific folks who just care whether it’s “right”. Scientific folks don’t need no unit tests, because they are very familiar with the science. They don’t need input validation, because their expert users (other scientists) “are expected to determine what correct and valid inputs are themselves”.
My first programming job as an undergrad (before becoming a “commercial developer”) was writing copious amounts of MATLAB code for sensor analysis and incorporating findings from scientists. Some of what I did ended up published. There’s absolutely no way in hell that software didn’t contain glaring errors.
Honestly, I feel like various science disciplines are kind of screwed. They publish results based on computer models, and yet there’s not much rigor around making sure those model implementations are correct (or even publishing them). A line like “users are expected to figure out which inputs will produce garbage and which won’t” should strike fear into the heart of anyone who’s ever written something trickier than hello world. Which basically means the results are wrong. I realize some might consider this a jump, but AFAIK Knuth is about the only human alive to write correct code from specs. Every other mortal is going to have a fairly high defect rate. Any scientific advance based in part on software, without extensive review of that software by experts, is probably wrong.
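Even without full expert review, failing fast on garbage inputs is cheap. A hypothetical model entry point, with names and checks invented purely for illustration:

```python
def run_model(beta, gamma, population):
    """Guard the obvious garbage inputs instead of expecting users to
    know which ones silently produce nonsense."""
    if beta < 0.0 or gamma <= 0.0:
        raise ValueError("beta must be >= 0 and gamma must be > 0")
    if population <= 0:
        raise ValueError("population must be positive")
    # ... the actual model would go here; return a toy ratio for now ...
    return beta / gamma

r = run_model(0.3, 0.1, 1000)
```

A rejected input produces a loud error at the call site rather than a plausible-looking but garbage result three plots later.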
So, I have not read the original critiques yet, but I haven’t seen anyone detail whether the code is actually wrong, or whether these are just style complaints and a lack of unit tests.
Blugh. These are valid concerns from all parties. At least, if they are made in good faith.
I’ve done both scientific and commercial work.
Scientific code needs work, badly, even given its requirements, which are different, as the article makes clear. I’ve worked with scientific code even in commercial settings.
Now, does that mean the code/models/critiques are bogus in this scenario? Whatever. No dog in this fight. Know the context of the place you’re criticizing; that’s all I have to say.
I think the core issue with academic code is that those writing it aren’t programmers first, and are often overworked grad students learning as they go.
Doesn’t this just set up a holy war, since by this logic commercial software engineers are wholly justified in ignoring the reviews of scientific coders?
Little fiefdoms aside, scientific code that is poorly written horrifies me mainly because it contributes to scientific illiteracy and provides ammunition to those who are politically inclined to reject or resist evidence. All we need is for one of these socially significant models to be truly but subtly buggered by the sort of coding errors that the best practices of commercial SE have evolved to eliminate, and we can kiss meaningful movement on climate change or vaccines goodbye for another decade. The most free-form virtuoso scientific coder alive would have to admit that IF their code quality were unassailable by a commercial SE, in addition to being scientifically valid, THEN they’d have done a better job.
I wonder if it isn’t time for a new type of major. Back when I started college, Computer Science as a major didn’t exist—it was Information Technologies under the Business college. It was about a year later when we actually got a Computer Science department, and later, Computer Science and Engineering (splitting the major into two tracks—software and hardware, while IT still existed under the Business college). Perhaps a major specializing in scientific computing [1], where students in this major can work with majors in other scientific disciplines to write their code. Just an idea.
[1] When Computer Science formally became A Thing, a friend and I knew that Comp Sci 101 was in Fortran, so we signed up for the first Fortran class we saw—which turned out to be the wrong course. It was actually “Numerical Analysis in Fortran”, where the problems of floating-point computation were stressed. Halfway through the class we realized our mistake and were able to transfer out of it (mainly because we were easily passing the class). We never had a class like that in the actual Computer Science department. A shame, because in retrospect it was an interesting class.
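Two of the classic pitfalls a course like that hammers home, shown here in Python rather than Fortran:

```python
import math

# 1. Decimal fractions are not exact in binary floating point.
a = 0.1 + 0.2
exact = (a == 0.3)  # False: a is actually 0.30000000000000004

# 2. Naive summation accumulates rounding error; math.fsum tracks
#    the lost low-order bits and returns the correctly rounded sum.
naive = sum([0.1] * 10)         # 0.9999999999999999
robust = math.fsum([0.1] * 10)  # exactly 1.0
```

These two lines of surprise are a decent litmus test for whether a scientific coder has had any numerical-analysis exposure at all.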
A bit tongue-in-cheek, but it’s called physicists, as far as I know. There is a lot of bias because this is based only on my personal experience, but some of the physicists I have known were hired as programmers in other fields, to help other scientists go from prototypes to more robust/fast programs for their research.