1. 45

    1. 30

      Post co-author here, AMA.

      What we did:

      1. Scan Rust’s most popular 1000 crates with cargo-semver-checks
      2. Triage & verify 3000+ semver violations
      3. Build better tooling instead of blaming human error

      Around 1 in 31 releases had at least one semver violation.

      More than 1 in 6 crates violated semver in at least one release.

      These numbers aren’t just “sum up everything cargo-semver-checks reported.” We did a ton of validation through a combination of automated and manual means, and a big chunk of the blog post is dedicated to talking about that.

      Here’s just one of those validation steps. For each breaking change, we constructed a “witness,” a program that gets broken by it. We then verified that it:

      • fails to compile on the release with the semver-violating change
      • compiles fine on the previous version
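
      To make the idea concrete, here is a minimal sketch of what a witness could look like. Everything in it is hypothetical: the `widget` crate, its two "releases" (modeled as modules so the example fits in one file), and the removed method are illustrations, not one of the cases from the study.

      ```rust
      // Hypothetical baseline release of a crate called `widget`.
      #[allow(dead_code)]
      mod widget_v1_0_0 {
          pub struct Widget {
              size: u32,
          }

          impl Widget {
              pub fn new(size: u32) -> Self {
                  Widget { size }
              }

              pub fn size(&self) -> u32 {
                  self.size
              }
          }
      }

      // Hypothetical "patch" release that dropped `Widget::size`.
      #[allow(dead_code)]
      mod widget_v1_0_1 {
          pub struct Widget {
              size: u32,
          }

          impl Widget {
              pub fn new(size: u32) -> Self {
                  Widget { size }
              }
          }
      }

      // The witness is written against the baseline, where it compiles.
      // Pointing this alias at `widget_v1_0_1` instead makes `witness` fail
      // to compile, because `Widget` no longer has a `size` method.
      use widget_v1_0_0 as widget;

      pub fn witness() -> u32 {
          widget::Widget::new(3).size()
      }
      ```

      A real witness would be a separate crate built twice, once per published version, rather than two modules in one file.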

      Along the way, we discovered multiple rustc and cargo-semver-checks bugs, and found out a lot of interesting edge cases about semver. Also, now you know another reason why it was so important to us to add those huge performance optimizations from a few months ago: https://predr.ag/blog/speeding-up-rust-semver-checking-by-over-2000x/

      1. 5

        Very cool! Happy to hear this might be integrated into cargo. I really like the fact that the Elm package manager enforces semver at publish time.

        1. 8

          Thank you! I was inspired by the Elm package manager when building cargo-semver-checks. It was one of those “once you know it’s possible, it turns out the problem isn’t super hard” kinds of fortuitous situations. I happened to be in the right place at the right time with the right tools at my disposal.

      2. 4

        Right now I’m working on similar tooling for a different API ecosystem (fuchsia.dev).

        Three questions:

        1. Do you think crates.io should reject detectable semver violations?
        2. Did you try building packages and then varying the micro version numbers of their dependencies to find practical, in-the-field breakages that your static analysis might not catch yet?
        3. How do you handle macros? Proc macros probably get you into the halting problem, but macro_rules might be tractable.
        1. 2

          Great questions!

          1. No, I explicitly think it shouldn’t. Semver violations are explicitly allowed when they serve the broader interest of the community, such as rolling back unintentionally-public functionality (has happened in Rust!), or fixing unsoundness errors or critical security issues (the Rust language itself does this!). I think maintainer judgment is final, and these tools should inform rather than enforce.
          2. I wasn’t sure which of two possible interpretations this was: is it “find crates that re-export 3rd party dependencies, where wiggling the 3rd party dependency can cause the re-exporting crate to violate semver” or is it “attempt to find semver violations in a different way, by looking for crates that might break when their dependency versions are wiggled in allowable ways”? The former would definitely produce more semver violations, but rustdoc JSON is currently not fully reliable for linking cross-crate data so our analysis currently is crate-local only. We’re working on it, the rustdoc and Rust folks are working on it, it’ll happen sooner or later. The latter isn’t that interesting to me personally, because I think it will be vastly more computationally expensive, and will also find lots of cases where crates declare dependencies like ^1 but actually use a feature added in 1.2. This “minimal versions” problem is well-known, and so not that interesting. And if we never try downgrading versions, only upgrading, we’re only likely to find the same issues everyone has already likely found by doing the same upgrade. In other words, I found our work interesting precisely because it had very high potential of discovering previously-unknown semver issues, and wouldn’t have been excited about it if it only told us what we mostly already knew.
          3. We currently don’t handle macros at all, beyond macro use in one’s own crate (at which point the macro is just a semver-irrelevant internal implementation detail). Rustdoc doesn’t give us a lot of information on macros (proc or otherwise), so we’ve been going after lower-hanging fruit elsewhere thus far. I also agree that macro_rules may be tractable, and I’ve been trading ideas with a few maintainers of macro-heavy crates on what would be most useful to them and most likely to find and prevent issues that might otherwise slip by. Their #1 request, btw, is proper handling of #[doc(hidden)] which macro-heavy crates use very often and now is our biggest source of false-positives since we currently (incorrectly) treat those items as being public API.
          1. 2

            For 2 I meant the latter but now I understand why it’s unappealing. For my own similar project I too am focused on static analysis of metadata rather than trying to compile things.

            For 3, does #[doc(hidden)] make macros inaccessible or just indicate to other crates that there’s no stability guarantee?

            1. 1

              For 2, makes total sense.

              For 3, the latter.

              Usually, the macros themselves are public API and not #[doc(hidden)]. The challenge is that macro expansion produces code that is in the crate where the macro is used, not the crate where it is defined. So anything the macro-expanded code uses from the macro-definition crate must be public — it’s being used cross-crate. But you often don’t want it to be public API because then you have to uphold a stability guarantee on “internal” implementation details.

              The usual answer then is that the macro is not #[doc(hidden)], but uses code that is public and #[doc(hidden)]. That way, the macro internals are public (therefore callable by the macro) but not subject to a stability guarantee because they are explicitly annotated as “not public API.”
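
              A minimal sketch of that pattern, assuming a made-up macro `safe_div` and a made-up helper module `__private` (neither comes from a real crate):

              ```rust
              // Must be `pub` so that macro-expanded code in *other* crates can
              // reach it, but hidden from docs to signal "not public API".
              #[doc(hidden)]
              pub mod __private {
                  pub fn div_impl(a: i64, b: i64) -> Option<i64> {
                      // Returns None on division by zero or overflow.
                      a.checked_div(b)
                  }
              }

              // The macro itself is ordinary, documented public API.
              #[macro_export]
              macro_rules! safe_div {
                  ($a:expr, $b:expr) => {
                      // Expands in the caller's crate, so this path has to resolve
                      // from outside the defining crate: hence `$crate` and `pub`.
                      $crate::__private::div_impl($a, $b)
                  };
              }
              ```

              Downstream code only ever writes `safe_div!(6, 3)`. Renaming `div_impl` later would then not count as breaking, as long as the macro keeps working.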

      3. 3

        cargo-semver-checks is really nice but for more complex crates it’s difficult (at least for me) to figure out what exactly is broken.

        Is there a forum for people to discuss the output and the concrete lint warnings?

        An example:

        $ git clone https://gitlab.com/sequoia-pgp/sequoia
        $ cd sequoia/openpgp/
        $ cargo semver-checks --default-features
             Parsing sequoia-openpgp v1.16.0 (current)
             Parsing sequoia-openpgp v1.16.0 (baseline, cached)
            Checking sequoia-openpgp v1.16.0 -> v1.16.0 (no change)
           Completed [   0.378s] 48 checks; 46 passed, 2 failed, 0 unnecessary
        --- failure trait_method_missing: pub trait method removed or renamed ---
        A trait method is no longer callable, and may have been renamed or removed entirely.
                ref: https://doc.rust-lang.org/cargo/reference/semver.html#major-any-change-to-trait-item-signatures
               impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.23.0/src/lints/trait_method_missing.ron
        Failed in:
          method map of trait ValidAmalgamation, previously in file /home/wiktor/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sequoia-openpgp-1.16.0/src/cert/amalgamation.rs:401

        AFAIK this method was not touched for literally years (git blame shows 2020-04-11 22:52:18 +0200). Full output at https://paste.debian.net/1291329/

        Thanks for this incredibly useful tool!

        1. 2

          Thank you for checking it out, and for your candid feedback! It’s very much appreciated, and helps us make the tool better.

          If an output message is confusing or suspected false-positive like this, feel free to open an issue and we can triage. Just make sure to mention what commit you’re using when running cargo-semver-checks, to make reproduction easier — otherwise branches drift and that can sometimes make it hard.

          That error is complaining that it cannot find the trait method ValidAmalgamation::map anymore, but it existed in 1.16.0 published on crates.io.

          Given you’re saying that method still exists and hasn’t been modified recently, this is a possible false-positive so definitely good to have in the issue tracker.

          1. 2

            Just for continuity for people that want to follow the discussion I’ve filed an issue there: https://github.com/obi1kenobi/cargo-semver-checks/issues/536

            1. 1

              Thanks for following up and opening the issue, I appreciate it. I’m about to head to RustConf and will be travelling for the next couple of weeks, but I’ll take a look as soon as I can!

      4. 2

        This sounds like really good work, congratulations!

        A suggestion: briefly describe cargo-semver-checks in the introduction, e.g. “cargo-semver-checks examines the JSON output of rustdoc for different versions of a crate and reports possible semver violations”.

        1. 1

          Thank you!

      5. 2

        I like the approach! Especially in identifying the gulf that can be closed by tools, instead of blaming humans (regardless of fault or lack thereof).

        Thanks for taking the time to write this up and share it.

        1. 1

          Thank you! In general I feel the size of that gulf is commonly underestimated, so I’m very interested in building better tools. As one example: databases, compilers, and distributed systems have advanced massively over the last 10 years, but for the most part the tooling story is generally the same today as it was 10 years ago.

          If that sounds interesting and you’d like to keep an eye on my work, you might find subscribing to my blog worthwhile. It supports RSS and also has an email notification option: https://predr.ag/subscribe/

    2. 3

      I don’t want to sound dismissive, but this sounds like a luxury problem to have. I’m thinking of the time crate, which seems to have been basically rewritten from 0.1 to 0.2 (or 0.3). Then I think regex broke all my use cases at one point, and I want to say it was the 1.0 release, but I’m not so sure.

      What I mean is that my default stance is more like “According to semver it shouldn’t break so I only think it will break half of the time”.

      That said, I’m mostly happy with Rust here, but I don’t trust semver at all.

      Also, great article and interesting initiative. If only projects had enough tests to simply notice before a new release. I always liked those “we tried this on every crate” posts by the Rust team, though that doesn’t seem feasible to do repeatedly for simple crates.

      1. 4

        The escape hatch in Semver is that if the major version is 0, then no Semver rules apply to the minor or patch version.

        1. 6

          This is true for standard semver, but Rust / cargo’s implementation is slightly different: it ignores leading zeroes, so 0.1 -> 0.2 is considered major, and 0.1.0 -> 0.1.1 is minor, etc. This is technically not compliant with semver, but … that’s above my pay grade :)

          cargo-semver-checks implements the same thing that Rust / cargo do, because its role is to prevent you from getting in trouble in Rust, under Rust’s rules.
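
          A rough sketch of that rule, for illustration only: the leftmost non-zero component of the old version decides what counts as breaking. The names `parse_version`, `classify_bump` and `Bump` are made up here, and the sketch ignores pre-release and build metadata and assumes the new version is not older than the old one.

          ```rust
          #[derive(Debug, PartialEq)]
          enum Bump {
              Same,
              Compatible,
              Breaking,
          }

          // Parses a plain "MAJOR.MINOR.PATCH" string; anything else yields None.
          fn parse_version(v: &str) -> Option<[u64; 3]> {
              let mut parts = v.split('.').map(|p| p.parse::<u64>().ok());
              let major = parts.next()??;
              let minor = parts.next()??;
              let patch = parts.next()??;
              if parts.next().is_some() {
                  return None;
              }
              Some([major, minor, patch])
          }

          // Classifies old -> new the way the comment above describes:
          // a change at or left of the first non-zero component is breaking.
          fn classify_bump(old: [u64; 3], new: [u64; 3]) -> Bump {
              if old == new {
                  return Bump::Same;
              }
              // Index of the leftmost non-zero component (patch if all are zero).
              let pivot = old.iter().position(|&c| c != 0).unwrap_or(2);
              if old[..=pivot] != new[..=pivot] {
                  Bump::Breaking
              } else {
                  Bump::Compatible
              }
          }
          ```

          With this, 0.1.0 -> 0.2.0 and 1.4.0 -> 2.0.0 both classify as breaking, while 0.1.0 -> 0.1.1 and 1.4.0 -> 1.5.0 classify as compatible.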

        2. 3

          That is what google is doing with the popular guava java library and it has been extremely painful. In my last java job I actively replaced guava with apache-commons-* or wrote things myself to get out of this madness.

          These days I am doing mostly golang and things are a lot more predictable over there.

      2. 4

        I understand the sentiment. An equally valid standpoint (common in large projects with many dependencies) is that it’s a luxury to not have this problem. If one’s work depends on 50 crates, it’s fine if running cargo update breaks half the time. If one’s work depends on 500 crates and crates break half the time, one ends up doing no work other than fixing breakage from dependencies. At some point, projects just hit a “level cap” of sorts and cannot take on any more dependencies due to this effect.

        In a real sense cargo-semver-checks is a replacement for those “enough tests” that would allow maintainers to notice accidental breakage. Even if we can’t eliminate 100% of breakage, the data in this post shows we can eliminate a large portion of it. That can make all the difference in empowering maintainers and raising that level cap.

    3. [Comment removed by author]

    4. 1

      Bravo, thanks for the study and the data!

      I do want to poke on a false dichotomy in the post: That semver violations are either human error or a tooling problem.

      It’s great that the Rust community and ecosystem has aspirations here, and even greater that tooling can make assumptions on what most software in crates.io will adhere to. That said… Some projects may tactically violate semver if they know a change is valuable and also has no/low probability of breaking consumers. Some projects may choose to follow different conventions that look like semver but are not actually semver. (See: https://calver.org) Some projects may choose to just not do semver at all (See: http://sentimentalversioning.org and http://unconventions.org)

      The Rust community has had more than one “burn the heretic” moment… Please consider Semver as a worthy goal to aspire to, but not as a religious or moral duty. As tooling improves, and I believe it will, I just hope people keep in mind that a project that violates semver anyway may have good reasons for doing it, just like people who use unsafe might have a reason for it.

      1. 3

        Bravo, thanks for the study and the data!

        Thank you 😁

        Some projects may tactically violate semver if they know a change is valuable and also has no/low probability of breaking consumers.

        Agreed! This is why cargo-semver-checks aims to inform not enforce. We don’t want maintainers to violate semver by accident and without knowing it’s happening, that’s all. There are definitely “tree falls in the forest” situations where tactically breaking semver is the right thing to do, and we leave it to maintainers to decide when that is the case. (As I’m sure you already saw in the post.)

        Please consider Semver as a worthy goal to aspire to, but not as a religious or moral duty.

        Unfortunately, between the compiler and the cargo build tool, Rust already assumes that all crates follow semver. cargo update by default upgrades all dependencies to their largest non-major-bump versions, and the compiler only allows multiple major versions of the same crate to live side-by-side, not minor ones. While binaries may have more freedom, libraries that don’t follow semver can be quite difficult to use in Rust given that core assumption.
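
        As a simplified illustration of that default, here is a sketch of "pick the largest version that is not a major bump". The function names are invented, and real cargo resolution does much more (it works from the requirement in Cargo.toml, and handles pre-releases, yanked versions, and the rest of the dependency graph).

        ```rust
        type Version = (u64, u64, u64);

        // True if `candidate` is acceptable for a default (caret-style)
        // requirement on `required`: not older, and no change in the leftmost
        // non-zero component.
        fn caret_compatible(required: Version, candidate: Version) -> bool {
            if candidate < required {
                return false;
            }
            match required {
                (0, 0, _) => candidate == required,
                (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
                (major, _, _) => candidate.0 == major,
            }
        }

        // Picks the highest published version that is still compatible.
        fn pick_update(required: Version, published: &[Version]) -> Option<Version> {
            published
                .iter()
                .copied()
                .filter(|&v| caret_compatible(required, v))
                .max()
        }
        ```

        If any of the versions this picks contains an unannounced breaking change, the build breaks without anyone having edited Cargo.toml, which is why the semver assumption matters so much for libraries.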

        I don’t think it’s a religious or moral duty. But I also wouldn’t use a Rust library that doesn’t at least attempt to adhere to semver, simply because it would be quite difficult to use it given the predispositions of the language tooling.

        I just hope people keep in mind that a project that violates semver anyway may have good reasons for doing it.

        100% agreed! This is precisely why we didn’t publish a list of the specific semver violations we found, nor name which crates or versions they are in. We don’t want any abuse aimed at maintainers on the basis of our data, because that would be misguided in addition to being wrong. If crate maintainers reach out directly to us, we’re of course happy to share the results with them.

        1. 3

          Unfortunately, between the compiler and the cargo build tool, Rust already assumes that all crates follow semver.

          Given the fact that A) reasonable people can and often do disagree about what it means to “follow semver”, and B) even with agreement on what it means, reasonable and well-meaning people nonetheless still often fail to “follow semver”, this feels like a choice that merits the “unfortunately” tag.

          1. 2

            A) reasonable people can and often do disagree about what it means to “follow semver”

            I’m not saying it was reasonable, but much disagreement also has been had over issues such as how to indent code and where to put braces. These formatting squabbles are largely absent from modern Rust: one can run rustfmt, and whatever it produces is the “right” formatting, even if some resent it. Maybe we (in Rust) can similarly largely end disagreement over what it means to follow SemVer by encoding an acceptable-to-most-people definition into a tool (which should also help with part B, like the title says).

            1. 4

              +1 to the parent comment. Rust’s definitions about what is and is not acceptable in which kind of release are much more well-defined than the vast majority of other programming languages. The definitions aren’t necessarily obvious or identical to SemVer-as-defined-on-semver-dot-org, but they are well-thought-out and handle the vast majority of commonly-encountered cases — I’ve been using them to develop cargo-semver-checks and even contributed to them in small parts, so this is first-hand experience.

              So while not everyone in every programming language might agree on what goes in a minor version vs in a patch, in Rust this discussion is largely settled and the rules are written down. That makes it relatively easy for a tool like cargo-semver-checks to scan for violations of those rules, and then cite the rule while displaying the evidence of it being violated.

              1. 2

                See my other comment, but a good summary is that Go has already tried this, and while they’ll proclaim it a success in that technically they haven’t violated their policy, that’s no consolation to people whose code has nonetheless been broken by “well, technically that was defined as non-breaking” changes. And that’s kind of the whole problem here: no amount of explanation of the precise definition of “breaking” will un-break someone’s code or make them happy about having their code broken.

            2. 1

              I have much less faith in “just write it down and make a tool” than you do. In part because I expect people will just work around it (say, by bumping major on every release). In part because I think defining what is and isn’t “breaking” is at least as contentious as code formatting, and arguably more so.

              But mostly because people largely aren’t interested in the actual hair-splitting discussion about what does and doesn’t count as “breaking”. They’re interested in not having their code break, and their code is still going to break. I got yelled at a lot for pointing out something similar in a recent thread about Go’s backwards-compatibility approach, but: when someone’s code gets broken by a new release, and they’re told that it’s not actually a breaking change because the definition of “breaking” technically says so, it’s no consolation to the person whose code is now broken, and they’re likely to feel betrayed by the language/ecosystem.

              1. 2

                Rust has similar issues. In particular, there are breaking changes that are not considered semver-major. Two things give me hope for a better outcome:

                • Those breaking changes are possible to resolve automatically with a tool using only the information currently available. It’s just that we as a community haven’t yet invested in making that tool.
                • Rust allows using more than one major version of the same library at once in the same project. So bumping major every time isn’t as painful as it might seem at first, and unlike in say Python, major version upgrades don’t have to be ecosystem-wide all-or-nothing.
                1. 2

                  Well, Hyrum’s Law demonstrates that literally every change is potentially a breaking change, for someone. Semantic versioning is well-defined as a specification (at semver.org), and that spec doesn’t precisely define a breaking change (because it can’t). That means semver, generally speaking, isn’t an expression of an objective and verifiable property of software — it’s an expression of a subjective and best-effort intent from authors.

                  It’s cool that Rust has defined a set of things that constitute verifiable breaking changes, and built tools to detect and flag those things for authors. But I guess that makes Rust-semver a specific subclass of semver in general, and I suspect that unless that distinction is made painfully clear, many (most) people will understand “semver” as the general kind, not the Rust-specific kind.

                  1. 2

                    I think most languages have a set of things that are verifiable breaking changes. If I remove a function from the public API, that’s a breaking change in Python or Java or JS just as much as it is in Rust. These are the kinds of issues cargo-semver-checks detects and reports.

                    We still can’t catch and report general-case breaking changes, a la Hyrum’s Law. But as the blog post shows, even the small subset we can report today is useful, in that over a large real-world sample of code, it catches a lot of things that previously went unnoticed and were problems waiting to happen.

                    Neither cargo-semver-checks nor Rust claims to have “solved” semver. I’ve even refused to build a “suggest what version bump I need” feature for cargo-semver-checks because of that, even though it’s probably a top 5 most-requested feature for the tool. All we claim is that tools like this are useful in real life, even though they won’t (and can’t possibly) catch everything.
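
                    As a toy illustration of the "removed a public function" case, and not how cargo-semver-checks is actually implemented (it works from rustdoc JSON, not from lists of names), a check like this already flags the most obvious kind of break. The item names in the usage note are invented.

                    ```rust
                    use std::collections::BTreeSet;

                    // Returns the public item paths that exist in the baseline
                    // release but are missing from the current one.
                    fn removed_items<'a>(
                        baseline: &BTreeSet<&'a str>,
                        current: &BTreeSet<&'a str>,
                    ) -> Vec<&'a str> {
                        baseline.difference(current).copied().collect()
                    }
                    ```

                    Given a baseline of `Config::new`, `Config::retries`, `connect` and a current API of `Config::new`, `connect`, `connect_with_timeout`, this reports `Config::retries` as removed. The newly added function is not flagged, since additions are not breaking in this simple model.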

                    1. 3

                      Absolutely! No doubt this tool provides real value.

              2. 1

                I have much less faith in “just write it down and make a tool” than you do.

                I no longer have such optimism, and I don’t know why I did. Though I still think the tool itself sounds worthwhile, for what it does, I’ve both thought about it more and realized that (being based on a JSON description of an API) this tool fundamentally can’t see changes in the implementation of functions, either of which I think would have tamped down my optimism. I agree with your second paragraph.

                1. 1

                  My 2 cents:

                  As disruptive as breaking implementation changes are, they simply aren’t that common compared to accidental breaking changes in APIs. Forget our semver survey. For every GitHub issue in a Rust project you can find where the reporter complains about a breaking implementation change, I bet I can find 5 (maybe even 10) GitHub issues opened by someone other than me where the complaint is a semver-violating API change.

                  Breaking implementation changes of any significance aren’t anywhere near happening in 3%+ of all releases, like our semver study finds for currently-detectable accidental breaking changes in APIs. If they were, I bet we’d always treat any version bump as a major version, and we’d also have much smaller dependency graphs than we do now.

                  In other words, I think revealed preference shows that breaking implementation changes sound more scary than they are, while breaking API changes sound less problematic than they really are.

                  1. 1

                    What distinguishes API breaking changes from implementation breaking changes? Is it whether that breaking change is detected at compile time vs. at run time? In either case, does semver’s definition of breaking change make any such distinction? (AFAIK it doesn’t?)

                    1. 1

                      Semver itself does not make any such distinction.

                      Everyone, including library authors, is much more worried about implementation breaking changes. So your question comes up a lot. But because everyone is worried about them, folks also spend a lot more time and effort on protecting their code from such issues, for example using test suites.

                      Much less thought and effort is given to API breaking changes. Compared to how often this actually seems to happen in practice, very few people think “someone might mistakenly delete a public method and we might not catch it in code review.”

                      Given that data shows this happens a lot, and considering that API breaking changes are much easier to catch with static analysis, it’s one of those “we really should be doing this also, why weren’t we already?” kinds of situations.

                      1. 1

                        OK, but then even in your framing, what differentiates API breaking changes from implementation breaking changes? What’s the definition?

                        1. 1

                          If rustc can catch it, cargo-semver-checks should be able to as well — we’re a long way from that sort of completeness, but that’s the end goal. If a breaking change cannot cause a compile error, then our current approach has no hope of catching it either and maintainers should use other tools at their disposal to catch and prevent those changes.

                          There might be small edge cases where this definition is imperfect in one way or another, but it’s directionally correct.
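
                          To illustrate the other side of that line, here is a hypothetical implementation-only break: the signature is identical in both "releases" (modeled as modules), so every caller still compiles and only the behavior differs. The function and the rounding change are invented for the example.

                          ```rust
                          // Hypothetical release 1.0.0: converts cents to whole
                          // currency units by truncating.
                          mod v1_0_0 {
                              pub fn to_units(cents: u64) -> u64 {
                                  cents / 100
                              }
                          }

                          // Hypothetical release 1.0.1: same signature, but now
                          // rounds to the nearest unit. No compile error anywhere,
                          // so only tests or runtime behavior reveal the change.
                          mod v1_0_1 {
                              pub fn to_units(cents: u64) -> u64 {
                                  (cents + 50) / 100
                              }
                          }
                          ```

                          Here `to_units(199)` goes from 1 to 2 between releases. A caller that depended on truncation is broken, yet nothing about the public API changed, so this kind of change needs a test suite rather than an API-level check.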

                          1. 2

                            Gotcha, so API breaking changes are compile errors, and implementation breaking changes are runtime (or maybe testing) errors, I guess? 👍

                            1. 2

                              As a rule of thumb, yes 👍

      2. 2

        I do want to poke on a false dichotomy in the post: That semver violations are either human error or a tooling problem.

        Another example: sometimes the semver violation is a deliberate response to a human error: when someone accidentally makes a release that exposes some functions that were meant to be internal-only, and then immediately makes another release that removes them.

        1. 1

          The first release is both human error and a SemVer violation, but I don’t think the second release is either. SemVer considers the possibility of accidental violations, says “Use your best judgement”, and says that the correcting release may be either a major release or a patch release.

          1. 2

            Another example: sometimes the semver violation is a deliberate response to a human error: when someone accidentally makes a release that exposes some functions that were meant to be internal-only, and then immediately makes another release that removes them.

            In fact, we found at least one exactly such case — it’s mentioned in the maintainer interviews portion of the post.

            Use your best judgement

            This is why cargo-semver-checks seeks to inform not enforce. We don’t want to prevent maintainers from publishing something. We merely want them to be fully informed about the contents they are publishing, since often (but not always!) publishing a breaking change outside of a major version can be unintentional.

          2. 2

            I think the first release isn’t a semver violation because it doesn’t break anything?

      3. 1

        If I were to take a moralizing position on this, it would be “If one wants not to use SemVer, fine, but then one mustn’t use Cargo, because, if one uses Cargo, one’s users are justified in expecting one to adhere to SemVer, as Cargo stipulates.”

        1. 1

          Perhaps more like “mustn’t use crates.io.” Even if one doesn’t use cargo for one’s own project, if the code is uploaded to crates.io, other people can use cargo to depend on it. But the point stands: semver is about communication and managing expectations. The section on #[non_exhaustive] in the blog post covers another example of this, where something that was not a semver-major change until 2019 has since become one, due to changing norms and expectations.

        2. 1

          I don’t know if you’re intentionally creating a very stupid strawman position to bolster my argument, but as a trivial counterpoint, both Rust and Cargo do not (cannot) strictly follow Semver and have a different (valuable) idea of what a major version should communicate.