Blog post: A Formal Verification of Rust's Binary Search Implementation

After my project was leaked on r/rust, I scrambled to finish my blog post about it (which meant creating a blog in the first place (and even having to think of a name)), so enjoy :smile: :

While I haven't said so in the blog post, I should concede that this project still feels quite crazy to me. It doesn't really build on any prior research, which is usually not a very trust-inducing sign. So I would be very happy about any feedback from people actually working on Rust's semantics, and from the community at large :slight_smile: .

9 Likes

Really neat project. Thanks for sharing.

One thing I'm always skeptical about with these formal verification projects is their ability to continue delivering useful results over time. Once somebody submits a patch to binary_search, the analysis you've done will be invalidated. Have you put any thought into how a test suite of formal verifications might be maintained over time? Is this something we could add to Rust's CI so that we know binary_search is correct for all time?

5 Likes

[quote="brson, post:2, topic:6644"]
One thing I'm always skeptical about with these formal verification projects is their ability to continue delivering useful results over time. Once somebody submits a patch to binary_search the analysis you've done will be invalidated. Have you put any thought into how a test suite of formal verifications might be maintained over time?[/quote]
Ah, excellent question! There's nothing I can do about bigger changes like altering the control flow, but it should already be resilient against smaller ones like renaming a variable or extracting an expression into a variable (except that loops are currently identified by the ID of their header block; a simple incrementing counter might be more stable).

I should also mention that while my current proof is quite manual, the next version of Lean will focus heavily on automation, which should only increase resilience against changes.

Actually, now that I think about it, the reliance on rustc's internal API may be the more unstable of the two. Compiling electrolysis against an updated rustc will frequently fail, and not updating will eventually prevent you from transpiling core because of changed intrinsics or language features.

If the stability is deemed acceptable, sure! All you need are binaries of electrolysis and Lean. Most of the execution time is spent on compiling core up to MIR, which isn't all that long.

1 Like

From the post:

I haven't decided yet what to prove next, which is the real reason I started writing this post: to get suggestions from the Rust community about code in or outside of std that may be interesting to verify, without using too many dependencies or crazy unsafe tricks.

I've always wished for std::str::contains to be verified, should a method to verify Rust code become available. As far as I know it is unsafe-free. I worry it may be too challenging, though.

It implements two-way string matching, which is worst-case linear time like Knuth-Morris-Pratt (KMP) but, unlike KMP, does not allocate. It is quite a tricky algorithm, and Rust had an algorithmic bug in it in the past. Somewhat embarrassingly, "bananas".contains("nana") returned false.
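To make the regression concrete, here's a quick check one can run today (periodic needles like "nana" are exactly the tricky part of two-way matching):

```rust
fn main() {
    // The historically buggy case: "nana" is a substring of "bananas".
    assert!("bananas".contains("nana"));
    // More periodic-needle cases, the hard part of two-way matching.
    assert!("abababab".contains("abab"));
    assert!(!"bananas".contains("nanan"));
}
```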

(By the way, is there any formal verification of a two-way string-matching implementation?)

So it would be great to check whether the current implementation is correct.

2 Likes

Ha, I found that same bug report when looking for potential candidates. Unfortunately, looking at the length of the implementation and particularly of the comments, I agree with you that the algorithm is probably too complex. The abstract algorithm not even having a formalization, together with sentences like

The algorithm hinges on the following theorem, which is stated without proof: [...]

doesn't sound like a good sign :sweat_smile: .

4 Likes

How does your strategy compare with refinement typing as seen in Haskell and TypeScript (Liquid Haskell: http://goto.ucsd.edu/~rjhala/liquid/haskell/blog/about/ , and the "Refinement types for TypeScript (and Typed Racket) [OT]" topic) regarding the assurance of code correctness it gives, simplicity of use, the time needed to apply it, and so on?

Extra note: if you have some way to make sure a function will never be called with wrong inputs (thanks to a full proof of the functions that call it, refinement typing, or something else), returning an Option<> is just a waste. That's why having an unsafe version of most std library functions is useful.

Well, one of the goals of my project is to verify real-world programs without the need for annotations or modifications. Otherwise you either need to push your changes upstream (unlikely to happen unless your whole tool is adopted) or you have an obsolete fork in no time.

Regarding usage: in theory I could use the same SMT backends to achieve pretty much the same experience. But I'm focusing on interactive proofs right now, which are definitely more work than push-button proving, but aren't limited to the narrow confines of SMT-decidability. It seems to me that these kinds of refinement typing languages are mostly used for verification 'in the small', like showing the absence of runtime errors, rather than for complete correctness proofs.

Actually, I couldn't tell from the Refined TypeScript paper how, or whether, they handle aliasing, which is of course the other big part of my project.

I'm not too sure what you mean by your extra note regarding Option and unsafety.

The "tool" needs to be part of the language or its standard default ecosystem, if you want it to be used. Annotations are OK as long as the standard library is fully annotated, large common libraries are annotated, etc.

I see: refinement typing is more like a type system that rules out some run-time errors but often doesn't give you full correctness assurance, while with proofs you try to verify the complete correctness of the function(s). I guess the two things don't compete, then; in theory, in one program you could use regular static typing, refinement types, and full proofs according to how critical the correctness of each part of the code is.

It was a simple note about run-time performance. Assume you have a "LiquidRust", that is, a type system (refinement typing) that you can optionally put on top of your regular Rust code. If your refined types prove that an array A of Ord values of type T is never empty, then calling A.iter().max() is always going to give you a value. But currently in the Rust standard library iter().max() doesn't return a value of type T; it returns an Option<T>. So you waste run-time data on the enum tag, and you have to perform an unwrap() even when you are statically sure it will never be None.

So a standard library designed for a future implementation of refinement typing (or full proofs) should offer functions like iter().unsafe_max() or iter().proved_max() that return an item T instead of an Option<T>. And somehow the type system should not require wrapping iter().unsafe_max() in an unsafe {} block where your refined types prove that A is never empty.
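For concreteness, here is a minimal sketch of what such an API could look like as an extension trait; the names UnsafeMaxExt and unsafe_max are made up for illustration, and Option::unwrap_unchecked merely stands in for an implementation that would skip the Option entirely:

```rust
/// Hypothetical extension trait adding an unchecked variant of `max`
/// for callers that can prove non-emptiness by external means.
trait UnsafeMaxExt: Iterator {
    /// # Safety
    /// The caller must guarantee the iterator yields at least one item.
    unsafe fn unsafe_max(self) -> Self::Item
    where
        Self: Sized,
        Self::Item: Ord,
    {
        // Reuses the safe `max` internally; a real implementation would
        // avoid constructing the `Option` in the first place.
        unsafe { self.max().unwrap_unchecked() }
    }
}

impl<I: Iterator> UnsafeMaxExt for I {}

fn main() {
    let a = [3, 1, 4, 1, 5];
    // Fine here because `a` is visibly non-empty; a "LiquidRust" would
    // discharge this obligation statically instead of via `unsafe`.
    let m = unsafe { a.iter().unsafe_max() };
    assert_eq!(*m, 5);
}
```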

I am not sure how this can be done... but I think having unsafe versions of all library functions is a start.

A simpler solution is for the programmer to just call unwrap() where the refined types prove that it can never panic; the compiler could then be modified to strip away the unwrap()s it has proven can never fail, and the IDE could color the two kinds of unwrap() differently (or you could use some module annotation that forbids unproven unwrap()s). This way the LiquidRust code is still valid Rust code. The problem is, iter().max() still returns more data than it strictly needs...
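In code, that simpler approach is just the following idiom; the hypothetical checker's job would be to mark this unwrap() as panic-free so the compiler could erase the check:

```rust
fn main() {
    let a = [3, 1, 4, 1, 5];
    // `a` is visibly non-empty, so `max()` returns `Some(_)` and this
    // `unwrap()` can never panic; a refinement checker could prove that
    // and the compiler could then compile the check away.
    let m = *a.iter().max().unwrap();
    assert_eq!(m, 5);
}
```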

I've been (and still am) quite busy with completing the thesis and its write-up, so instead of new full-length blog posts, here's a quick list of prominent features I've implemented since the original post:

  • A Lean theory of asymptotic analysis, on top of which I proved the asymptotic runtime of [T]::binary_search. If you thought the previous specification was nasty, look again. In less formal words, this proves that (an upper bound on) binary_search's runtime is not only logarithmic in the slice length (as expected) but also linear in the maximum comparison cost; see the sketch after this list. (That second fact would probably have been overlooked in an informal proof and isn't all that interesting in the end, but formal proofs can't lie even if they wanted to.)
  • Enforced fixed-size integer arithmetic. If you didn't trust the binary_search correctness proof before, this should hopefully change your mind :slight_smile: .
  • An electrolysis reference, modeled after the Rust Reference. This should clarify my current coverage of the Rust language and some translation details.
  • To test the new lenses representation and the bitwise operators, I've verified core parts of @bluss's fixedbitset crate. Why fixedbitset, you ask? Because the original goal of verifying petgraph turned out to be slightly too ambitious, so I stopped at its first dependency :smile: .
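As a rough rendering of the binary_search bound from the first bullet (my informal paraphrase, not the exact Lean statement), where n is the slice length and c_max bounds the cost of a single comparison:

```latex
T_{\mathtt{binary\_search}}(n) \in O\bigl((1 + \log_2 n) \cdot c_{\max}\bigr)
```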

Finally, if you'll be at Rust Belt Rust and are interested in my project, or just in formal methods in general, I'd love to talk with you!

6 Likes