I can be fairly sure that none of those 42 somewhat serious issues were deliberately planted, because just about every one of them were found in code that I personally authored…
I love the honesty here.
The listing of CVEs is great; I wish more projects would do this, because it gives you a more informed look at the severity than you would get from NVD, which often publishes wildly inflated numbers. It also reminds me of Lua’s bug listing, where they go over every single bug that’s ever been found on one page: https://www.lua.org/bugs.html
In the curl project we generate several files as part of the release process and those files end up in the release tarball. This means that not all files in the tarball are found in the git repository.
He doesn’t explain the rationale behind this, so I can only guess, but what I typically hear about this is that this is so people can compile curl from a tarball without installing autotools; is that right? What I’ve never been able to figure out is why this is considered a good thing.
Are there really any potential users out there who are like “well, OK, I’ll install all the other dependencies like gcc and zlib and stuff, but asking me to install autotools is too much; I’m going to go use wget instead”? Is there any comparable situation with any other kind of build dependency where people regularly say “yes, compiling this project depends on having X, Y, and Z installed, well, except if you download the source code in a different way, then you only need X and Y”. It sounds utterly bizarre to me. Why not consistently require all the dependencies be present in order to compile it?
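For the record, the difference in question is roughly this for a typical autotools-based project (a generic sketch; curl’s own bootstrap steps may differ):

    # Building from a release tarball: the generated configure script is included.
    ./configure && make

    # Building from a plain git checkout: configure has to be generated first,
    # which is where the extra autotools dependency comes in.
    autoreconf -i && ./configure && make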
It used to matter, decades ago, when inodes were at a premium. We have to constantly remind ourselves that the system under analysis was last designed in the mid-80s; what’s a gigabyte of disk?
I’ve been trying to do a Chesterton’s Fence here and make a good-faith effort to understand the problem first, but the more I learn about it the more I feel like I should probably just be OK with my ignorance, because actually learning why it’s done that way will incur some amount of psychic damage.
He doesn’t explain the rationale behind this, so I can only guess, but what I typically hear about this is that this is so people can compile curl from a tarball without installing autotools; is that right? What I’ve never been able to figure out is why this is considered a good thing.
I’m of the opposite view. I’ve never been able to figure out why this isn’t a good thing.
I have “fond” memories of downloading a tar file, and then realizing I have to install autotools in order to build it. And then cursing because of autotools incompatibility. If the author had only committed the files to revision control (or at least put them in the release tar file), then it would have saved me time and pain.
I feel like this is more an issue with the still-terrible status quo of dependency management and build systems for C.
You want the developer to distribute a partially built version of the source code because that’s easier for you to use with your toolchain. That’s fine and practical, but it also shows that the tools are bad: you wouldn’t do this with a Go project because you wouldn’t need to.
With a Go project you’d either download a complete binary or you’d download the source and build it yourself. You wouldn’t need or want this halfway house.
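For comparison, the “build it yourself” path for a Go project is a single toolchain command (the module path below is just a placeholder):

    # Fetch the module source and build/install the binary in one step.
    go install example.org/sometool@latest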
You want the developer to distribute a partially built version of the source code
I really think that’s an ideological statement, and not an engineering one. I’ve explained why above, but let me explain another way.
As the programmer, I don’t ship “configure” or “make” files to the end user. The end user sees a binary, and doesn’t really care what magic happens behind the scenes.
As a programmer, I ship “configure” and “make” files to the builder who creates the binaries. One goal as the programmer is to have more people use my software. If I ship software which is impossible to build when there are autotools version conflicts (as I’ve run into), then that’s a major problem.
I should instead ship the “configure” scripts as part of the release tarball, and avoid that problem entirely. And incidentally solve many other problems, too.
So what engineering problem is solved by not including “configure” scripts in the release tarballs? Disk space?
I explicitly said that I think this is fine and practical. I think it’s also a weird artefact that comes from C code being a pain in the ass to distribute.
I think it’s fine to keep generated files in version control if there’s a reason for it (they’re snapshot tests, or they’re a pain to generate because the dependencies are tricky or you need to use unshareable resources or they take ages to generate, or you’re never gonna modify their source or whatever).
What I’ve never been able to figure out is why this is considered a good thing.
It’s to work around problems in autoconf: the version used by the upstream developers might be slightly incompatible with other versions, leading to obscure build failures.
Couldn’t you say the same thing of basically any dependency?
Depends if it happens frequently like it does with autoconf. (Well, except when they stopped cutting new releases for several years.)
You could check in the generated files to git. I think there is a way to mark them as ignored in GitHub. That way attempts to backdoor them would be more visible and there would be a history in git. Unfortunately, that would then also make it difficult to trust pull requests from non-maintainers (the files would need to be regenerated by a trusted maintainer and verified by a trusted CI, and you’d have to make sure that the CI is not changed in the same PR that modifies generated code).
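Something like this, assuming GitHub’s linguist-generated attribute is the mechanism meant here (a sketch; the file list is illustrative, not anything the curl project does):

    # Mark checked-in generated files so GitHub collapses their diffs by default.
    printf '%s\n' \
        'configure   linguist-generated=true' \
        'Makefile.in linguist-generated=true' \
        'aclocal.m4  linguist-generated=true' >> .gitattributes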
He doesn’t explain the rationale behind this, so I can only guess, but what I typically hear about this is that this is so people can compile curl from a tarball without installing autotools; is that right? What I’ve never been able to figure out is why this is considered a good thing.
Having the generated files in the repo was frowned upon because of a variant of the “Don’t Repeat Yourself” mantra. Not having the generated files in the tarball was frowned upon because it made autotools a hard dependency for building the project.
Having the generated files in the repo also meant that slight deviations in build environments led to those files flip-flopping in version control (e.g. the version string of autoconf, as seen in https://github.com/nmap/nmap/blob/db9a5801d0d883b078d5d408e242760330ec37af/configure#L3) when you weren’t careful which files you committed (and on the other hand: if you had to enumerate files one by one, you might miss a file to commit).
It’s all a bunch of trade-offs. Having an auditable process that reproducibly creates a tarball (incl. generated files and timestamps) so that you can verify the author’s cryptographic signature against the tarball you created from a tag resolves the quandary without having to revisit all the problems that led to the current set of customs.
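A sketch of what such verification could look like (hypothetical names throughout; the make-release-tarball script and the pinned tool versions are assumptions, not curl’s actual tooling):

    # Check the maintainer's signature on the published tarball.
    gpg --verify project-1.2.3.tar.gz.asc project-1.2.3.tar.gz

    # Recreate the tarball from the corresponding git tag, using pinned tool
    # versions and fixed timestamps, then compare digests with the published one.
    git clone --depth 1 --branch v1.2.3 https://example.org/project.git
    (cd project && ./scripts/make-release-tarball)   # hypothetical reproducible-tarball script
    sha256sum project-1.2.3.tar.gz project/project-1.2.3.tar.gz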
I don’t understand why this is frowned upon. Isn’t it normal to make your users install your dependencies if they want to compile your program?
It’s a build-time-only dependency that was avoidable by shipping the scripts. More than that, prior to autotools, some variations of “here’s a configuration script, and there’s a Makefile” already existed, so autotools was more of a developer tool to streamline the creation of that scaffold (and, on the way, create a regular UI for builders/users of the software).
Since some software packages had a build script instead of Makefiles, if there had been a mode for make to generate a portable shell script that does what make does (check dependencies, call commands), it might have become customary to ship that shell script instead of Makefiles, too.
Such habits die with their tools, so autotools based projects will continue to do what they’ve been doing since the 90s, while projects using other systems like meson aren’t adding any “precompiled” build systems (e.g. ninja files) into their release tarballs.
The point of autotools is to find out what tools / libraries you have on your system. It doesn’t really make sense to say “in order to figure out what’s on your system, you have to install a whole suite of tools”. Especially when those tools (configure scripts) could have been committed to git.
It’s just one more step to punish your users, and that never made sense to me. It got worse when the autotools used by the programmer were incompatible with the autotools you have locally. And then you can’t just run “configure; make; make install” to build the package you want. Instead you have to spend time fighting with *#$! autotools.
All that nonsense could have been avoided if the auto-generated files were committed to git.
For many years, my software needed nothing more than libc, CC, and GNU Make to build. It would just Figure It Out, and remove features where the underlying libraries weren’t available.
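Something in that spirit, as a sketch (not the actual project’s build; zlib is just an example feature):

    # Probe for zlib with the compiler the build already requires; if the
    # probe fails, the feature is simply left out.
    if echo 'int main(void){return 0;}' | ${CC:-cc} -x c - -lz -o /dev/null 2>/dev/null
    then
        echo 'CFLAGS += -DHAVE_ZLIB' >> config.mk
        echo 'LDLIBS += -lz'         >> config.mk
    fi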
I recognize this isn’t a common opinion. Many people believe that “generated files don’t belong in revision control”. I’ve never seen arguments which convinced me on this subject. That belief seems based on ideology to me, and not on engineering.
There are many reasons for putting generated files into revision control. Those reasons make me happy, so I do it. Daniel Stenberg disagrees, so he does what he wants with his project. That’s freedom.
It doesn’t really make sense to say “in order to figure out what’s on your system, you have to install a whole suite of tools”.
I do things like this all the time. I don’t need all of gcc to compile this program, but I install it anyway, because it’s declared as a dependency. It makes perfect sense to me; this is almost by definition what a dependency is.
Ah, this is about reproducible builds, not about curl | bash.