Static analysis tool for breaking public API changes?

I've been reading the excellent Rust for Rustaceans, and the chapter on API design contains a lot of useful tips on what counts as part of your public API, some of which are super non-obvious (You added a non-pub field to a pub struct? Surprise! One of your library users just had their CI break).
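To make the struct example concrete, here is a minimal sketch. The `Point` type, the `v1` module standing in for a library crate, and the `cache` field are all made up for illustration; they are not from the book.

```rust
// Stand-in for version 1.0 of a hypothetical library crate.
pub mod v1 {
    #[derive(Debug, PartialEq)]
    pub struct Point {
        pub x: i32,
        pub y: i32,
    }
}

// Downstream code: a struct literal is legal because every field is public.
pub fn origin() -> v1::Point {
    v1::Point { x: 0, y: 0 }
}

// Downstream code: an exhaustive pattern is legal for the same reason.
pub fn sum(p: v1::Point) -> i32 {
    let v1::Point { x, y } = p;
    x + y
}

// Suppose version 1.1 of the library changes the struct to:
//
//     pub struct Point { pub x: i32, pub y: i32, cache: Option<i32> }
//
// Nothing "public" was added, yet both downstream functions above stop
// compiling: the literal cannot supply the private field, and the pattern
// no longer covers every field.
```

Marking the struct `#[non_exhaustive]` from the start would have forbidden both downstream uses, so adding a field later would not break anyone.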

It got me thinking that it would be absolutely awesome to have a static analysis tool that could flag all of these changes for you. I believe that by definition they should all be discoverable statically. Bonus awesome if it clearly gave you a motivated example of why each is a breaking change (in the grand tradition of helpful Rust error messages).

A quick search has made me aware of: rust-semverver, which appears to be in the nursery; RFC 1105 API Evolution; and cargo-public-items. Obviously none of these have the polish of, say, clippy, but they may be good places to continue work or draw from.

Opinion time (weakly held):

Right now I'm not convinced that such a tool focussing on semver rules per se is particularly desirable -- at least not initially. Reasonable people can and do disagree on the specifics of what sort of changes warrant what sort of version bump and not everybody uses semver. Rather, I'd suggest that a lint-style approach of "hey, here are all of the breaking changes you've made, now make up your own mind about what to do with this information" would be preferable. For example, breaking changes to #[doc(hidden)] pub items are breaking changes, but some authors will not mind making them because such items "shouldn't be" relied upon, while others will want to care about them.

As far as implementation goes, I'm going to write some thoughts here just so I don't forget them. My initial thought was to use a similar architecture to clippy, but with rules that define some feature as part of the public API rather than lint rules. If (big if) the result is output in some diff-able format, then you can "simply" have the tool run the analysis twice, diff the outputs, and construct a report from the changes. However, I see rust-semverver has taken an interesting approach by actually compiling the rlibs, and I'd be interested to see how well that works in practice.
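A rough sketch of the "run twice, diff the outputs" idea, assuming each analysis run emits one line of text per public item. The `ApiDiff` type, the `diff_api` function, and the item strings are all hypothetical and do not reflect the output format of any existing tool.

```rust
use std::collections::BTreeSet;

/// Result of comparing two public-API summaries (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct ApiDiff {
    /// Items present before but not after: candidates for breaking changes.
    pub removed: Vec<String>,
    /// Items present after but not before: usually additive changes.
    pub added: Vec<String>,
}

/// Diff two summaries, where each entry is one rendered public item.
/// An item whose signature changed shows up as one removal plus one addition.
pub fn diff_api(old: &[&str], new: &[&str]) -> ApiDiff {
    let old: BTreeSet<&str> = old.iter().copied().collect();
    let new: BTreeSet<&str> = new.iter().copied().collect();
    ApiDiff {
        // BTreeSet iterates in sorted order, so the report is deterministic.
        removed: old.difference(&new).map(|s| (*s).to_owned()).collect(),
        added: new.difference(&old).map(|s| (*s).to_owned()).collect(),
    }
}
```

The hard part is the summary step, not the diff: every rule about what counts as public API has to show up as a line in the summary, otherwise the diff cannot see it.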

I just realised I never really asked a question here. I guess my questions are:

  • Have I missed any existing efforts in this direction?
  • What are people's opinions on how such a tool should be designed to be useful?
  • What are people's opinions on how such a tool would best work?
  • Does anybody want to help develop such a tool (it's definitely a bit much for me on my own)?

As a library author, it would be amazing if such a tool could integrate with cargo-release. That way I could run cargo release patch and it'll warn me if new functionality was added, or I might run cargo release minor and be warned of any breaking changes.
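As a sketch of what that check might look like: none of this is cargo-release's actual API. `Bump`, `required_bump`, and `check_release` are hypothetical names, and the rule assumes a crate at 1.0 or later, where only a major release may break downstream code.

```rust
/// Release levels, declared in increasing order so the derived `Ord`
/// gives Patch < Minor < Major.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Bump {
    Patch,
    Minor,
    Major,
}

/// Smallest bump that covers the detected changes (hypothetical rule):
/// any removed or changed item needs a major release, any added item
/// needs at least a minor release.
pub fn required_bump(removed_or_changed: usize, added: usize) -> Bump {
    if removed_or_changed > 0 {
        Bump::Major
    } else if added > 0 {
        Bump::Minor
    } else {
        Bump::Patch
    }
}

/// Returns a warning when the requested release level is too small.
pub fn check_release(requested: Bump, required: Bump) -> Option<String> {
    if requested < required {
        Some(format!(
            "warning: requested a {:?} release, but the API changes call for at least {:?}",
            requested, required
        ))
    } else {
        None
    }
}
```

Returning a warning rather than an error leaves the final decision with the author, which fits the lint-style approach suggested earlier in the thread.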

It's true that it won't detect more nuanced changes (e.g. going from O(1) to O(n), or an upstream crate adding an impl so that your generic function now accepts more/fewer types), but as per the 80-20 rule it should be quite possible to catch most of the trivial semver violations with some basic heuristics.

I really like this "summarize + diff" approach. Having lots of simple characteristics that you can pass to a simple diffing algorithm feels more scalable than comparing rlibs.

Note that there's an RFC for this:

This RFC proposes a comprehensive set of guidelines for which changes to stable APIs are considered breaking from a semver perspective, and which are not. These guidelines are intended for both the standard library and for the crates.io ecosystem.

https://rust-lang.github.io/rfcs/1105-api-evolution.html#summary

And cargo uses semver in dependency resolution, so if you're using cargo you need to follow its rules.

So I think it's far more defined than your post implies, though there are still some points of disagreement (like MSRV).

A few ways to break downstream not mentioned in RFC 1105 are

Yeah, I saw that RFC and it looks like a really good resource. To be fair, I haven't done a thorough review of it and other resources. However, my statement still seems likely to be true: just because there's an RFC doesn't mean all reasonable users actually want to use those definitions (or that they are complete for everyone).

Hi! Author of cargo-public-items here (Edit: Now renamed to cargo-public-api). I think basing it on rustdoc JSON seems promising so far. Why would you start developing a new tool? Come join me instead :wink: My tool already supports diffing in a way that can be used to determine breaking changes. Obviously there is still work to be done, but it seems a lot more fruitful to join forces than to start from scratch. I would also love to receive any kind of feedback, question, or feature request.

Hey @Enselic , great work on cargo-public-items! It certainly seems like the best way to auto track API changes right now.

I guess my concern with basing it off rustdoc JSON is that you're using a tool (rustdoc) in a way it wasn't designed for (detecting breaking API changes, when it's designed primarily for human-readable documentation). Obviously what you're getting for that is a huge bootstrap in functionality (you don't have to parse the code yourself or write rules defining what the public API is), but I worry that what you're lacking is a complete definition of what the public API is, and rustdoc upstream may be uninterested in some of the details I'm interested in here.

For example, see quitedot's reply for some examples of subtle public API breakages. Perhaps rustdoc tracks all of those, perhaps it doesn't, but I guess my concern is that if and when you find a breaking change in the category of "technically and subtly part of the public API, but not appropriate for human-readable documentation" rustdoc will not want to add that to their parser (or may even remove features during refactors over time).

I'm probably over-thinking it, but that's where my head's at.

Thanks a lot for sharing your thoughts! Not sure I agree with you that rustdoc is not designed to detect changes in public API. One major use case for rustdoc is that it shows you the public API surface of a crate, which means you can detect changes to the public API.

However, I totally agree that a tool based on rustdoc JSON will not be able to detect ALL breaking changes. But I would like to propose that it is theoretically impossible to detect ALL breaking changes statically. Some breaking changes are in the runtime domain. How would you statically detect sharkdp/bat issue #2013 ("Bat segfaults when trying to view any file"), for example, without re-architecting? (I am comfortable sharing that example since it was my fault it happened, which by the way is the biggest blunder of my career, both professionally and in open source.)
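To illustrate what a runtime-domain break can look like, here is a small made-up sketch (unrelated to the bat bug itself). The two modules stand in for two releases of a hypothetical crate; the public signatures are identical, so any tool that only compares API surfaces sees no change.

```rust
// Stand-in for version 1 of a hypothetical crate.
pub mod v1 {
    /// Returns the first whitespace-separated word, or "" if there is none.
    pub fn first_word(s: &str) -> &str {
        s.split_whitespace().next().unwrap_or("")
    }
}

// Stand-in for version 2: same name, same signature, different behavior.
pub mod v2 {
    /// Returns the first whitespace-separated word.
    /// Panics if the input contains no words.
    pub fn first_word(s: &str) -> &str {
        s.split_whitespace()
            .next()
            .expect("input must contain at least one word")
    }
}
```

A caller that passes an empty string works with `v1` and panics with `v2`, even though a signature diff between the two versions is empty.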

It is true that there is a risk rustdoc JSON is or becomes too limiting. I mitigate that by actively helping to change the rustdoc JSON format to support the use-cases of cargo-public-items.

I am confident that my tool can or will detect like 99% of breaking changes to crates, which I consider useful enough.

Never limit yourself.
:sunglasses: