The section on compatibility seems unfair. In the maximal (Cargo) approach, it is posited that since CI runs on everything, you have strong evidence that they work together. But in the minimal (modules) approach, no such consideration is given. The same argument applies there, but better: in the maximal approach, you have to rerun CI for every library that transitively depends on you when you release a new version, invalidating all previous runs. But, in the minimal approach, every CI run stays valid until someone explicitly updates to the new version AND pushes a new tagged version of their own library. Even then, only that new tagged version requires a CI run. No other libraries are affected. I think it’s reasonable to assume people won’t publish new tagged versions of their libraries with broken dependencies, and so you’re much more likely to get a compatible set.
In other words, assuming authors test their releases, the only way to get an untested configuration in the minimal world is if you combine multiple libraries together that share a transitive dependency. Even then, you know that at least one subset of libraries is a tested combination. The maximal world contains this failure mode and more: every time a package is published, every transitive dependent may get a broken combination, and you aren’t guaranteed that any of your libraries have been tested in combination.
Sharing transitive dependencies is pretty common, at least in the Rust world. As you pointed out, this cancels most of the “you get a configuration tested by the author” advantage.
There is a tradeoff between getting a configuration tested by the author and getting a configuration tested by the ecosystem. Another tradeoff is between silently getting new bugs and silently getting new bugfixes.
I don’t believe it cancels out most of the advantage. The versions in the transitive dependencies also have to be different, which I believe will be rare due to how upgrades work. As for testing by the author vs testing by the ecosystem, all it takes is one library in the ecosystem depending on the same two libraries as you, and you get just as much ecosystem testing.
I agree with the bug tradeoff. Personally, I prefer stability, but I can understand that others may have a different preference. I think in the minimal world, people who want updates can explicitly ask for that, and in the maximal world, it seems people are starting to add flags to allow the other direction (--minimal-versions).
Sharing transitive dependencies with different versions is also common in Rust. Instead of making assertions, I probably should whip up a script to count and publish statistics, but exa/Cargo.lock should illustrate my point. exa is a rather popular Rust command-line application.
How to read Cargo.lock: the top section is a serialization of the dependency graph, so it is hard to read. The bottom section lists checksums, sorted by package name and version. exa transitively depends on both num-traits 0.1 and 0.2, and winapi 0.2 and 0.3. This is typical.
num-traits 0.1 actually depends on 0.2. None of the transitive dependencies there actually require 0.1 in a way that excludes 0.2 (only datetime requires ^0.1.35) as far as I can tell, so I see no reason it needs to be included in the build. Perhaps it’s included in the checksum for some other reason?
edit: I have since learned that ^ on v0 dependencies only allows patch level updates. So ^0.1.35 means >=0.1.35, <0.2.
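To make the caret rule concrete, here is a minimal sketch of how a `^` requirement can be checked against a candidate version. It assumes full three-part versions and follows the rule that the left-most non-zero component must not change; the names `Version`, `caret_upper_bound`, and `caret_matches` are made up for this illustration and are not Cargo's own API.

```rust
/// A (major, minor, patch) version triple. Tuples compare lexicographically,
/// which matches semver ordering when pre-release tags are ignored.
type Version = (u64, u64, u64);

/// Exclusive upper bound implied by a caret requirement `^req`:
/// the left-most non-zero component is the one that must not change.
fn caret_upper_bound(req: Version) -> Version {
    match req {
        // ^0.0.z allows only that exact patch.
        (0, 0, patch) => (0, 0, patch + 1),
        // ^0.y.z allows patch updates within 0.y only.
        (0, minor, _) => (0, minor + 1, 0),
        // ^x.y.z allows minor and patch updates within x.
        (major, _, _) => (major + 1, 0, 0),
    }
}

/// True if `candidate` satisfies the caret requirement `^req`.
fn caret_matches(req: Version, candidate: Version) -> bool {
    candidate >= req && candidate < caret_upper_bound(req)
}
```

With this sketch, `caret_matches((0, 1, 35), (0, 2, 0))` is false, which is why a `^0.1.35` requirement cannot be satisfied by num-traits 0.2.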
winapi 0.2 and 0.3 do appear to both be required in the build. This is due to the term_size crate using a ~0.2 constraint. While I do not have a Windows machine to test right now, this commit bumped the version to 0.3. It was only a reorganization of imports, and I believe that all of the pub use entries in 0.3 would cover all of the old imports. I will test this out on a Windows machine later tonight.
None of the dependencies on v1 or greater require multiple versions. People tend to attempt to respect semver, so this is expected. Also note that out of 55 transitive dependencies, only those two libraries had multiple versions, and only one would possibly require any changes, and it was a v0 dependency. I believe this is also typical: I have surveyed a large corpus of Go packages that use dep and found the same thing. Even where the tools allow stricter constraints, they typically weren’t needed.
edit edit: To be clear, currently in the Go community there is no easy or supported way to include multiple versions of the same package into your library or binary. The Rust community can, and so perhaps some norms around what types of code transformations are possible differ, causing it to happen more often. I think the fact that Go has been getting along fine without multiple versions is evidence that Go doesn’t need that feature as much, but does not imply that for Rust. I don’t mean to argue that minimal selection would be a fit for Rust, but I don’t think it has the problems that the post describes in the context of Go.
Oops, you are right. num-traits is using the so-called semver trick, and the write-up on that trick explains it better than I ever can. For crates using the semver trick, it is indeed normal for num-traits 0.1 to depend on num-traits 0.2. A good way to think about it is that, once 0.2 is released, a later 0.1.x release deletes the 0.1 implementation and instead provides a 0.1-interface-compatible shim around the 0.2 implementation.
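A rough single-file sketch of the idea, with modules standing in for the two crate versions. The trait shown is a simplified, hypothetical stand-in and not the real num-traits API; the only point is that the "0.1" path re-exports the "0.2" item, so both names refer to the same trait.

```rust
// Stand-in for the 0.2 release: the real implementation lives here.
mod v0_2 {
    pub trait Zero {
        fn zero() -> Self;
    }

    impl Zero for i32 {
        fn zero() -> Self {
            0
        }
    }
}

// Stand-in for a later 0.1.x release: no implementation of its own,
// only a re-export of the 0.2 item under the old path.
mod v0_1 {
    pub use super::v0_2::Zero;
}

// A dependent written against the "0.1" interface.
fn zero_via_old_api<T: v0_1::Zero>() -> T {
    T::zero()
}

// A dependent written against the "0.2" interface.
fn zero_via_new_api<T: v0_2::Zero>() -> T {
    T::zero()
}
```

Because `v0_1::Zero` and `v0_2::Zero` are the same trait here, a type that implements one automatically works with code written against the other, which is what lets both versions sit in one build without conflict.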
lalrpop/Cargo.lock is probably a better example. LALRPOP is a parser generator and transitively depends on regex-syntax 0.4, 0.5, and 0.6, without the semver trick. I admit this is not typical, but it is also not rare. Support for multiple versions has been in Cargo since forever.
Your dep analysis is fascinating. Thanks a lot for letting us know.
Thanks for another example. In this case, the Cargo.lock is shared for a number of workspaces, so many of the duplications are not actually present in the same artifact. Additionally, many of the different versions are present only in build-dependencies for some artifact. I analyzed the dependencies and reduced the set of real duplications down to these:
The first two duplications were fixed by upgrading docopt from 0.8 to 1.0. No code changes required, the tests passed, and would happen automatically with MVS. The third was fixed by upgrading string_cache from 0.7.1 to 0.7.3. Again, no code changes were required, the tests passed, and this would happen automatically. This also fixed the fifth duplication. The fourth duplication is the only one that caused any problems, as there were significant changes to regex-syntax between 0.4 and 0.5, and it directly depends on 0.4.
So in this case, there was only one dependency issue that would not have been solved by just picking the higher version out of about 70 dependencies, and the one failure was in a v0 dependency. So, while I agree they exist, I just don’t think they will be frequent, nor a significant source of pain.
In fact, the only times duplications happened were when “breaking” changes happened. The default version selector in Cargo exacerbates this by considering any minor version change in a v0 crate to be “breaking”. In only one example was it actually breaking, and in every other example, just using the largest version worked. In the MVS world, breaking changes require updating the major version, which will allow both copies to exist in the same artifact. So while sharing transitive dependencies is frequent, sharing transitive dependencies that do not respect semver is infrequent, and sharing transitive dependencies with incompatibilities within compatible semver versions is also infrequent, causing this to not be a problem in practice.
“and sharing transitive dependencies with incompatibilities within compatible semver versions is also infrequent, causing this to not be a problem in practice.”
I don’t think you can reach this conclusion. If someone were to do this analysis, time is a critical dimension that must be accounted for. I also think you aren’t doing a correct treatment of semver. Namely, if I were in the Go world, regex-syntax would be at v6 rather than v0.6, to communicate its breaking changes. Each one of those minor version bumps had breaking changes. It simply may be the case that some breaking changes are bigger than others, and therefore, some dependents may not be affected.
With respect to time, there is often a period after a core crate publishes a new semver release during which large parts of the ecosystem depend on both the new version (because some folks are eager to upgrade) and the older version. For example, there was a period about a year ago when some projects were building both regex 0.1 and regex 0.2, even though there were significant breaking changes in the 0.2 release. You wouldn’t observe this now because people have moved on and upgraded. So the collection of evidence to support your viewpoint is quite a bit more subtle than just analyzing a particular snapshot in time.
(To comment on the larger issue, my personal inclination is that I’d probably like a world with minimal version selection better just because it suits my sensibilities, but that I’m also quite happy with Cargo’s approach, and really haven’t experienced much if any pain with Cargo that could be attributed to maximal version selection.)
Thanks for explaining the v6 vs v0.6 distinction better than I was able to. I was trying to get at that with the “breaking” paragraph. Cargo implicitly treats all minor version changes in the v0 major range as “breaking” by making the valid limit only in the minor range, in the same way it treats major versions in the v1 and above range as “breaking”. I think this is a great idea, but muddies the waters a bit on comparing ecosystems with respect to multiple versions of transitive dependencies. Like you said, in a Go world, it would be regex-syntax at v4 and v6, which would both be allowed in the binary at the same time.
About your point on time: in a Go world, those would be regex v1 and regex v2, again not causing any issues. I am claiming that it is rare that multiple versions of some package need to exist in the same artifact when they are within the same compatible semver range, for example when both v1.2 and v1.3 are required in the binary at the same time. I agree an analysis through time is valuable, but rarity also depends on time, so sampling any snapshot will help estimate how often it happens.
In order to get an estimate for how often multiple semver compatible dependencies occur, I went through the git history of the above projects and their Cargo.locks, but only counting duplicates if they are of the form v0.X.Y and v0.X.Z or vX.Y.Z and vX.S.T. Again, v0 gets this special consideration because of the way that Cargo applies the default constraint. In order to make sure that the authors of these libraries weren’t pinning to some possibly older but semver compatible range, I checked their Cargo.tomls for any constraints that were not of the default form.
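The counting rule described above can be sketched roughly as follows. This is not the script that was actually used; it is an illustration that assumes the lockfile has already been parsed into (name, version) pairs, and `compat_key` and `compatible_duplicates` are names invented here.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Version = (u64, u64, u64);

/// Compatibility bucket following the form in the text: v0.X.* versions share
/// a bucket per minor version, vX.*.* versions share a bucket per major
/// version. (It does not special-case 0.0.z.)
fn compat_key(v: Version) -> (u64, u64) {
    if v.0 == 0 {
        (0, v.1)
    } else {
        (v.0, 0)
    }
}

/// Returns every package that appears with more than one distinct version
/// inside the same compatibility bucket, together with those versions.
fn compatible_duplicates(lock: &[(&str, Version)]) -> Vec<(String, Vec<Version>)> {
    let mut buckets: BTreeMap<(String, (u64, u64)), BTreeSet<Version>> = BTreeMap::new();
    for (name, version) in lock {
        buckets
            .entry((name.to_string(), compat_key(*version)))
            .or_default()
            .insert(*version);
    }
    buckets
        .into_iter()
        .filter(|(_, versions)| versions.len() > 1)
        .map(|((name, _), versions)| (name, versions.into_iter().collect()))
        .collect()
}
```

Under this rule, a lockfile containing regex-syntax 0.4, 0.5, and 0.6 reports nothing, because each of those falls into a different bucket; only something like 0.7.1 next to 0.7.3 would be counted.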
LALRPOP had no such conflicts in 15 revisions back to 2015. Every constraint was of the default form.
exa had no such conflicts in 115 revisions back to 2014. Every constraint was either default or "*".
There is no evidence in either of these repositories that at any time Cargo had to do anything other than pick the highest compatible semver version for any shared transitive dependencies.
This discussion has helped me understand better that the v0 range is going to be problematic for the Go modules system if people treat it as I expect, and as is encouraged: as a spot for breaking changes and experimentation. Cargo handles this gracefully by allowing breakage to be signaled in the minor version, but Go has no such design consideration. I hope that either a change is made to make this easier, or guidance is made in the community to avoid the problems.
“I am claiming that it is rare that multiple versions of some package need to exist in the same artifact when they are within the same compatible semver range.”
Oh interesting, OK. I think I missed this! I think I would indeed say that this is consistent with my experience in the ecosystem. While I can definitely remember many instances at which two semver incompatible releases are compiled into the same binary, I can’t remember any scenario in which two semver compatible releases were compiled into the same binary. I imagine Cargo probably tries pretty hard to avoid that from ever happening, although truthfully, I can’t say that I know whether that’s a hard constraint or not!
“This discussion has helped me understand better that the v0 range is going to be problematic for the Go modules system if people treat it as I expect, and as is encouraged: as a spot for breaking changes and experimentation. Cargo handles this gracefully by allowing breakage to be signaled in the minor version, but Go has no such design consideration. I hope that either a change is made to make this easier, or guidance is made in the community to avoid the problems.”
Yeah that’s a good point. I can’t think of any libraries I’ve ever published (aside from maybe a few niche ones that nobody uses) that haven’t had to go through some kind of breaking changes before I was ready to declare an API as “stable.” Usually they only happen because other people start to actually use it. The Go ecosystem could technically just reform their conventions around what v1 means. IIRC, the npm ecosystem kind of pushes toward this by starting folks at 1.0.0 by default I think? But that may be tricky to pull off!
I think this article has a small misconception about vgo (aka go modules): it doesn’t take the minimum version. go get always downloads the latest version. Thereafter the MVS algorithm picks the maximum of all the constraints.
EDIT: also I notice that it confuses the terms minimal and minimum. The Go algorithm is minimal because Russ Cox feels that nothing else can be taken away.
“The key to minimal version selection is its preference for the minimum allowed version of a module.” –Russ Cox
The maximum of the values of the constraints is the minimum of the versions allowed by the constraints.
It is clearer to call it the minimum, since the algorithm gives lower and lower versions as constraints are removed; it can only be pushed towards higher values by adding constraints. Conversely, the Cargo algorithm “wants” the maximum version, and can only be dissuaded from it by adding constraints (or lockfiles).
It does take the minimum version. Yes, the name is minimal, not minimum, but one of the properties of that minimal algorithm is that it takes the minimum version.
An example should be clarifying. B is available in versions from 1.0 to 1.10. A declares a dependency on B >= 1.5. vgo resolves B 1.5; Cargo (and other package managers) resolve B 1.10.
Yep, I understand that. My point was that if A requires 1.5, C requires 1.2 and D requires 1.6 then the maximum of those is selected, i.e. 1.6. This has the side effect of requiring a deliberate upgrade act to get version 1.10. However the benefit is that if I run the resolution algorithm today then you run it next week when version 1.11 is released, we both get exactly the same set of dependencies and can reproduce one another’s builds.
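The two behaviours being compared can be sketched side by side. This is a simplified illustration for a single module within one compatible major version; it ignores lockfiles and upper bounds, and the function names are invented for this sketch rather than taken from vgo or Cargo.

```rust
type Version = (u64, u64, u64);

/// Minimal version selection for one module: each dependent states a minimum,
/// and the result is the largest of those minimums. The set of published
/// versions is not an input, so new releases cannot change the answer.
fn select_minimal(minimums: &[Version]) -> Option<Version> {
    minimums.iter().copied().max()
}

/// "Maximal" selection: the newest published version that satisfies every
/// stated minimum. The answer moves whenever a newer version is published.
fn select_maximal(minimums: &[Version], published: &[Version]) -> Option<Version> {
    let floor = minimums.iter().copied().max()?;
    published.iter().copied().filter(|v| *v >= floor).max()
}
```

With minimums of 1.5, 1.2, and 1.6, `select_minimal` yields 1.6 both before and after 1.11 is published, while `select_maximal` moves from 1.10 to 1.11. That difference is the reproducibility property described above, obtained without a lockfile.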
Yes, I think we are all in agreement about what happens. The question is whether it is good. The drawback of vgo argued in the article is that B will inevitably get bug reports for 1.6 already fixed in 1.10. Another is that real world testing of B is spread along all versions from 1.0 to 1.10, while in Cargo most testing is against 1.10 while 1.10 is the latest.
Cargo (and other package managers) solve reproducibility with a lockfile. A lockfile is admittedly not “minimal”, but apart from minimality it solves the technical problem equally well.
edit: Here’s a link to my analysis of these multiple-version issues in Go packages that use dep: https://github.com/zeebo/dep-analysis