Bug classes & Rust

What classes of bugs are there that Rust (though not necessarily uniquely) has mitigations for?

I think I have a pretty good idea of what undefined behavior is: It means that even though your code is accepted by the compiler, you can't really know what code it will generate, and what (global) effects it will have. As far as I understand, the goal is that Rust (sans unsafe) should be unable to express undefined behavior. Is this correct? I've seen this claim, but is it an expressed goal of the Rust project?

Logic bugs cover a lot of things, but they are basically cases where the compiler generates the correct machine code for the source code, and the result is still wrong because the source code itself is wrong (albeit correct enough to compile). These are "programmer mistakes" like failing to implement checks and state changes properly (for instance, accidentally checking the "read access" permission flag instead of the "admin privileges" flag). Strongly typed languages can help mitigate logic bugs (anecdotally, they definitely have in my case). And (type) abstractions (strings, containers) definitely help eliminate logic bugs. Can one reason that the borrow checker helps against certain logic bugs? (Early on in my Rust adventures I started implementing reentrancy checks within a method, only to realize that by virtue of it taking &mut self I had compile-time guarantees that the object would not call the same method twice [at the same time].)

Race conditions are mitigated by the type system (Send, Sync, and hiding the protected data inside a Mutex, only allowing access to it through a guard).
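
As a minimal sketch of the "data hidden inside a Mutex" point (the function name `parallel_count` is made up for illustration), the counter below can only be reached through the guard returned by `lock()`:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers that each add 1 to a shared counter `per_thread` times.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    // The u64 lives *inside* the Mutex; there is no way to name it without locking.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // `lock()` hands out a guard; the data is reachable only through it.
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                    // The guard is dropped at the end of each iteration, which unlocks.
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Calling `parallel_count(4, 1000)` yields 4000 on every run, because each increment happens while the lock is held.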

What do "sound" and "unsound" mean in this context? I've always read "sound" as "code will do what you expressed in source" and "unsound" as "don't do this, it will have unintended side effects", but I've never really known what the proper definition is; and a while back I read someone claim that "soundness" has different definitions for different programming languages, which muddied the waters.

Are undefined behaviors by definition "unsound"? (One of those "All undefined behaviors are unsound, but not all unsoundness is undefined behaviors" things?).

Are there other bug/problem classes that Rust features (whether in the language or the ecosystem at large) help mitigate (or even eliminate)?

Searching for previous questions and finding a thread with some useful links suggests you might want to look into these two links:

"Unsound" has a very specific definition in Rust. An API is unsound if it can cause Undefined Behavior without the consumer of the API writing unsafe.

Yes.

No. If no undefined behavior is involved in a bug, then it can't be unsound. It can still be logically incorrect, but "(un)soundness" is specifically used for the presence/lack of UB.
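
To make the definition concrete, here is a small sketch with two made-up functions. Both use unsafe internally and both have a safe signature; only the first one is unsound, because a caller who writes no unsafe at all can reach UB through it:

```rust
/// UNSOUND: the signature is safe, yet `first_unchecked(&[])` is undefined behavior.
pub fn first_unchecked(values: &[u8]) -> u8 {
    // No valid SAFETY argument exists here: nothing guarantees the slice is non-empty.
    unsafe { *values.get_unchecked(0) }
}

/// Sound: the emptiness check makes the UB unreachable from safe callers.
pub fn first_checked(values: &[u8]) -> Option<u8> {
    if values.is_empty() {
        None
    } else {
        // SAFETY: index 0 is in bounds because the slice is non-empty.
        Some(unsafe { *values.get_unchecked(0) })
    }
}
```

Note that `first_unchecked` is unsound even in a program that only ever passes it non-empty slices; soundness is a property of the API, not of one particular run.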

Non-reentrancy is not (merely) a logic bug in Rust, since racy shared mutability is considered Undefined Behavior. So the distinction between shared/mutable borrows (and their corresponding guarantees around lifetimes, unicity of mutable borrows, and constraints regarding interior mutability) helps avoid unsoundness/UB, not "logic errors".
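
A rough illustration of the reentrancy point, using a hypothetical `Accumulator` type: while a `&mut self` method is running, a callback handed to it cannot call back into the same object, because that would need a second mutable borrow.

```rust
/// Hypothetical type, invented for this sketch.
pub struct Accumulator {
    total: u32,
}

impl Accumulator {
    pub fn new() -> Self {
        Self { total: 0 }
    }

    /// Adds the callback's result to the total.
    /// `&mut self` is an exclusive borrow for the whole duration of the call.
    pub fn add_with(&mut self, produce: impl FnOnce() -> u32) {
        self.total += produce();
    }

    pub fn total(&self) -> u32 {
        self.total
    }
}

fn demo() -> u32 {
    let mut acc = Accumulator::new();
    acc.add_with(|| 2);
    // Re-entering from inside the callback is rejected by the borrow checker:
    // acc.add_with(|| { acc.add_with(|| 1); 1 });
    acc.add_with(|| 3);
    acc.total()
}
```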

Logic bugs that are easy to write in a weakly-typed language are usually prevented in Rust by means of the trait system and wrapper types. Traits allow the programmer to abstract over capabilities of types, which is exactly where most logic bugs happen, while wrapper types can confer such capabilities onto other types. I.e., you expected a type to have some capability that it doesn't in fact have; this is not necessarily the existence of a method but some other internal invariant, e.g. a validated e-mail address or a guaranteed-positive integer. You might want to google "parse, don't validate" for a subset of the relevant problems and the related conventional wisdom in the FP community.
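
A minimal "parse, don't validate" sketch, assuming a deliberately simplistic notion of what an e-mail address is (the type and function names are illustrative only). The check runs once, in `parse`, and every function that takes an `EmailAddress` can rely on it:

```rust
/// A string that has passed a (deliberately simplistic) e-mail check.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EmailAddress(String);

impl EmailAddress {
    /// The only way to construct an `EmailAddress` from outside this module.
    pub fn parse(raw: &str) -> Result<Self, String> {
        match raw.split_once('@') {
            Some((local, domain)) if !local.is_empty() && domain.contains('.') => {
                Ok(Self(raw.to_owned()))
            }
            _ => Err(format!("not an e-mail address: {raw:?}")),
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Accepts only validated addresses; passing a plain `&str` is a type error.
pub fn welcome_message(to: &EmailAddress) -> String {
    format!("Welcome, {}!", to.as_str())
}
```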

You might be aware of this, but to other readers, a quick clarification on this point.

Race conditions are a good example of logic bugs that are harder to make in Rust. Race conditions in Rust are very much possible; Send and Sync exist “merely” to prevent something different, called “data races”. Those are always undefined behavior (UB). Data races are a specific case of race condition involving plain (unsynchronized and non-atomic) memory reads and writes, but not every race condition is a data race. Since the explicit goal of Rust’s claim to be a “safe language” is to make UB impossible to trigger in safe Rust code, only data races and the UB they create are what needs to be prevented. “Race condition” in general is way too broad of a concept for a compiler to solve anyways, but since many (inadvertent) race conditions are similar to or involving data races, Rust does prevent some mere logic errors here, too.
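
Here is a sketch of a race condition that is not a data race, written entirely in safe Rust (the helper names are made up). Every memory access is atomic, so there is no UB, but `racy_increment` can still lose updates because another thread may run between its load and its store:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Runs `op` `per_thread` times on each of `threads` threads and returns the final count.
fn run(threads: usize, per_thread: u64, op: fn(&AtomicU64)) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    op(&counter);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

/// Race condition, not a data race: the load and the store are each atomic,
/// but the read-modify-write as a whole is not.
fn racy_increment(counter: &AtomicU64) {
    let seen = counter.load(Ordering::SeqCst);
    counter.store(seen + 1, Ordering::SeqCst);
}

/// The whole read-modify-write is a single atomic operation.
fn atomic_increment(counter: &AtomicU64) {
    counter.fetch_add(1, Ordering::SeqCst);
}
```

`run(4, 1000, atomic_increment)` always returns 4000, while `run(4, 1000, racy_increment)` may return less. The compiler accepts both; only the second has a logic bug.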


As an aside, writing this motivated me to look into the story of data races in so-called “memory safe” languages that don’t have UB in the first place. The answer commonly tends to be, roughly paraphrased: “the behavior is defined, but so weakly defined that you should absolutely avoid data races anyway”, at least for Java, AFAICT. Without having tried to understand its memory model in any detail, I would imagine that data races make it really easy for your code to generate objects that are in a horrendously inconsistent state; but since the language manages all memory through the same garbage-collected heap, no matter how horrendously broken your data is, you still won’t have double-frees or use-after-frees. Of course, such an approach of making data races no longer UB could never work in Rust: there’s no way to correctly track ownership in a data race, and since in Rust ownership==memory-management, correctly tracking ownership is a fundamental necessity.


I think, the only correct answer to this is: you are comparing terms that cannot be compared in this manner in the first place. “Undefined behavior” is a possible program run-time behavior, or behavior of a function when executed with a specific input. On the other hand “unsound” is a property applicable to some whole API, or some programming language feature, or maybe to the entirety of safe Rust, or any combination of these. Well… so maybe in the special case of “soundness” of a single function without any input or access to global state, this boils down to a single question about “undefined behavior”, so that in this case “has undefined behavior” and “is unsound” are the same.

In some more detail… actually, this became rather lengthy, so I’ll make it collapsible…

Undefined Behavior

If a particular program is run with a particular set of inputs (and a particular set of choices for any nondeterminism/“randomness” that comes up during execution), then this program run can behave as “undefined behavior”.

The term “unsound” can be used as soon as we start to generalize the things we are talking about. Or… well… arguably…

…as a first step, we could consider a single program with a single input, but all possible choices of nondeterministic/random behavior during execution. If any such choice of nondeterminism leads to “undefined behavior”, arguably, the correct term to use is still “undefined behavior”. The point being that the behavior of this program on this input is undefined, since there’s no way to avoid the possibility of undefined behavior, and any behavior where UB is one possible outcome is UB itself. In my opinion, it still makes sense to distinguish these cases somewhat, if for nothing other than describing the usefulness and limitations of miri, which can, as far as I understand, generalize over certain kinds of nondeterminism/randomness, but of course not over all of them. If you write a function that generates a random u32 number and triggers UB only if the number is 42, realistically, miri is not going to catch that case. The same goes for code that triggers UB only when certain allocations happen to be randomly more aligned or less aligned in certain ways.

As a second step, we could also consider more factors that might be beyond control of the program, so you could still speak of “undefined behavior”. E.g. the version of rustc: Code might violate safety requirements of the Rust standard library, in ways that lead to actual UB only in future versions of Rust. This is sometimes called “library UB” and contrasted with the term “language UB” for the “actual UB” I was talking about in the previous sentence.

Now going away from “undefined behavior” terminology. If we consider still only a single program, but all possible inputs, it’s reasonable to ask “does the program avoid UB for all possible inputs?” I’m not actually sure what the correct terminology for this concept is. Maybe “sound”? Or “memory-safe”? Just “safe”? As in “the program is safe” / “… sound” / “… memory-safe”. Here, “safe” sounds a bit like it’s referring to “safe vs unsafe Rust code”. We can avoid this terminology problem by ignoring programs with input. Every input could possibly be hard-coded anyways, once we generalize over safe programs, so let’s only focus on programs without any input. Such programs would still simply have either “undefined behavior” or not.

So, as discussed above, “(has) undefined behavior” is a property of a program, possibly with some specific input. Similarly, in order to determine where this undefined behavior comes from, we can talk about individual functions. The principle is the same. A concrete function for a concrete set of inputs (and while observing some concrete global state) can trigger undefined behavior, so that a program [for simplicity, program with no input] that executes that function with those inputs in that global state will have undefined behavior.

Unsoundness

As mentioned above, (un)sound is mostly a property of whole APIs (however small or large); which could be a single function, but usually involves many ways of calling such a function or API, with various inputs, multiple times, in various orders, doing various other operations in-between.

Let’s quote the UCG’s (Unsafe Code Guidelines) definition:

we say that a library (or an individual function) is sound if it is impossible for safe code to cause Undefined Behavior using its public API

This means, that for soundness, we not only need to understand what “undefined behavior” is, which was discussed above, but also what an API is, and what “safe code” is. The TL;DR of course simply is that “sound” is like a universally-quantified “has no UB”: Similarly to the discussion above where we considered all possible inputs to a program, we can consider all possible usages of a function or an API, and when there’s some possibility to create UB this way, it’s called unsound. For a single function, that’s essentially “consider all possible inputs and all possible (relevant) global state(s)”.

But the “all possible usages of a function” involves the concept of “safe code” in Rust. Since, due to language bugs, it has never actually been true that it’s impossible to cause Undefined Behavior using safe [Rust] code in the first place, the above definition cannot be taken literally in a mathematical sense, otherwise, all APIs would be trivially unsound. Maybe an intuitive fix to this problem could be by instead requiring that “the API will most likely be sound once all language-level soundness-issues are fixed”. We need to weaken the statement to “most likely”, because we cannot actually know, how Rust’s soundness issues are going to be fixed – we can only guess, and (reasonably) hope we didn’t rely on any assumptions that those fixes will break.

Also “safe code” includes usage of the standard library, typically, which contains “safe” abstractions over unsafe code used internally. But it might include even more. Perhaps we also consider all sound Rust functions, potentially using unsafe internally, as “safe” Rust? No that won’t work; the definition would become cyclical, we’d be asked to already know what sound Rust code is in order to define what “soundness” means in the first place. But there’s also a problem if we don’t consider any third-party sound Rust functions that might use unsafe internally. The problem is that if we don’t consider them, then two APIs that are both individually sound by definition, could become unsound when used together.

One example of this problem is the API offered by the crate replace_with. The basic idea is to allow application of a fn(T) -> T on a &mut T reference, while correctly handling the case that the function might panic (by various strategies such as writing back a default/fallback value or aborting the program).

This API could be considered sound because it cannot cause UB using safe code. But then, imagine another API which has one type MyType and two functions fn provide_ref(callback: fn(&mut MyType)) and fn must_not_be_called(MyType), where must_not_be_called causes UB when called. This API, too, could be considered sound, since there is no safe Rust code that allows you to obtain an owned value MyType by calling provide_ref. (Assume MyType has only private fields, no constructors, really there’s no further API here at all.)

But together, these APIs suddenly become unsound. You can call the replace_with API inside of the callback passed to provide_ref to obtain an owned MyType value after all; then call must_not_be_called and 💥.

The right interpretation in my opinion is that – somehow – the abovementioned APIs should both be deemed unsound. Until some official source (or some large consensus) decides (more or less arbitrarily) to define (at most) one of them to be sound. Which in this case, as far as I’m aware, has happened in some form, with people generally being of the opinion that replace_with is sound, and consequently, the other API definitely isn’t.

Only now, while writing this answer, has it occurred to me that there might be some difficulty in formally defining the concept of “soundness” in a way that does have the effect that, as intended, the two APIs described above would initially both be considered unsound. Maybe someone else has ideas how that could be accomplished, mathematically? Anyway… ignoring this problem, a precise definition of “soundness” should then presumably also mention a set of API patterns that are defined to be sound, which includes the standard library, but possibly also additional, external things such as replace_with.

Although I replied with a simple "yes" above, I agree with this sentiment generally (but I guess I was too lazy to elaborate 😅). To clarify, my "yes" given as an answer to this question is to be taken more precisely as "if you can ever observe your code cause UB, then something, somewhere is definitely unsound".

Apart from the memory safety and UB protections, Rust has other features that help with common programming mistakes:

  • Use of Option and Result makes it impossible to use the returned value of a failed function call as if it succeeded. Unlike e.g. golang where value, err := can both exist at the same time, and the language won't stop you from using value unconditionally without checking err. Or functions that return int where >= 0 is some useful value, but -1 is an error. That wouldn't stop you from taking -1 and putting it where the success value should be. The laziest .unwrap() error "handling" in Rust still stops the program or thread before the wrong value propagates.

  • The convention to use Mutex<T> instead of separate T and Mutex makes it impossible to use T without going through the Mutex first.

  • Exclusive ownership can be used to enforce that some method calls can only be made once, or are the last use of an object. For example if you have a database connection with fn close(self), then db.close(); db.query(…) is impossible. It's like preventing use-after-free, but on a higher level.

  • Collections typically require &mut self for mutation, which is an exclusive loan. This prevents the iterator invalidation problem, because an in-progress iteration will conflict with any attempt to borrow the same object exclusively for mutation. There are languages where this isn't unsafe in terms of memory safety, but it usually has unexpected results and can lead to bugs.

  • Ownership makes it clear when collections are shared and can be mutated or not. In a GC language that allows many mutable references to the same object, if you keep a reference to a vector/list/array that has been provided by a caller or exposed to 3rd party code, you can't know for sure that it won't be mutated by the caller sometime later. This typically leads to defensive programming (e.g. you'll copy it to be sure nobody else can mutate it), but in Rust that's unnecessary — you clearly know when you own something exclusively.
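
As a small sketch of the first bullet (function names invented for illustration): with `Option` there is no `-1` sentinel that could be mistaken for an index, and the `?` operator forces the failure case to be dealt with before the success value even exists.

```rust
/// Returns the position of `needle`, or `None` if it is absent.
fn find_index(haystack: &[i32], needle: i32) -> Option<usize> {
    haystack.iter().position(|&x| x == needle)
}

/// Doubles the element equal to `needle`, reporting failure if it is absent.
fn double_matching(values: &mut [i32], needle: i32) -> Result<(), String> {
    // `index` only comes into existence on success; on `None` we return early.
    let index = find_index(values, needle).ok_or_else(|| format!("{needle} not found"))?;
    values[index] *= 2;
    Ok(())
}
```

Even the lazy variant, `find_index(values, needle).unwrap()`, would panic on a missing element instead of indexing with a bogus value.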

Everything that requires reading and writing the same thing at once.

For example, in C++ v.emplace_back(std::move(v[0])) will work fine if there's existing capacity, but is a UB use-after-free if it needs to reallocate the vector so the first element is no longer in the same place. (I think v.push_back(v[0]) also has that problem, but I've forgotten a whole bunch of details on exactly what the argument types end up being in C++ for these things.)

Or even in not-full-of-UB languages, something like seq.AddRange(seq.Reverse()); in C# will compile just fine, but will (if you're lucky) result in an exception getting thrown because you're modifying and iterating the same sequence at the same time. Whereas in Rust, v.extend(v.iter().copied().rev()); will fail to compile.
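
For completeness, one way to get the intended effect in Rust without reading and writing the vector at once; a sketch with a made-up helper name, using `Vec::extend_from_within` from the standard library:

```rust
/// Appends a reversed copy of `v` to itself.
fn append_reversed(v: &mut Vec<i32>) {
    let original_len = v.len();
    // v.extend(v.iter().copied().rev()); // rejected: `v` is borrowed mutably and immutably at once
    v.extend_from_within(..); // duplicate the existing elements in place
    v[original_len..].reverse(); // reverse only the appended half
}
```

Collecting the reversed elements into a temporary `Vec` first and then calling `extend` with it would work as well, at the cost of an extra allocation.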

I don't know if this has been mentioned, but leveraging the ownership model makes it possible to create APIs that make some types of bugs impossible, by consuming Self.

For example, just a few weeks ago at $myDailyJob one of my colleagues faced a nasty issue with Akka HTTP when a seemingly safe change that required deserializing a payload twice resulted in the error:

Substream Source cannot be materialized more than once

Of course there are ways around it, but catching it at compile time is just one of the several reasons why I love Rust.
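
A rough Rust analogue of that situation, using a hypothetical `PayloadSource` type (this is not Akka's API, just an illustration of consuming self): because `materialize` takes `self` by value, a second call is a compile error rather than a runtime one.

```rust
/// Hypothetical one-shot payload source, invented for this sketch.
pub struct PayloadSource {
    chunks: Vec<Vec<u8>>,
}

impl PayloadSource {
    pub fn new(chunks: Vec<Vec<u8>>) -> Self {
        Self { chunks }
    }

    /// Consumes the source; afterwards the value can no longer be used.
    pub fn materialize(self) -> Vec<u8> {
        self.chunks.concat()
    }
}

fn read_payload() -> Vec<u8> {
    let source = PayloadSource::new(vec![b"ab".to_vec(), b"cd".to_vec()]);
    let bytes = source.materialize();
    // let again = source.materialize(); // rejected: use of moved value `source`
    bytes
}
```

Code that needs the payload twice has to keep the materialized bytes around and reuse them, and the compiler points that out up front.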

Another of my favourites: Rust helps avoid the "oops someone changed the shared object" bugs.

For example, in C# there's Container.GetItemQueryIterator (Microsoft.Azure.Cosmos) for CosmosDB. It takes a QueryRequestOptions object, which is great -- way better than 15 optional parameters.

But it's a mutable object. So it's very easy for someone to write a wrapper that takes a QueryRequestOptions and sets one of the option fields in the wrapper. And then that can have surprising impacts on everything else that uses those options later. Thus you're stuck cloning the object to be safe, which because it's a C# class means allocating another one. (And it's used over async calls, so it escapes, and thus can't be stack-optimized either.)

In Rust, having options: &QueryRequestOptions (or options: Arc<QueryRequestOptions>) is amazing because you don't need to worry about that. Pass-through code that won't modify the options says so, in a way that you can rely on.
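
A sketch of what that looks like, assuming a Rust struct named after the C# class. The fields and both functions are invented for illustration and do not correspond to any real SDK:

```rust
/// Hypothetical stand-in for the options object; the fields are made up.
#[derive(Debug, Clone, PartialEq)]
pub struct QueryRequestOptions {
    pub max_item_count: u32,
    pub partition_key: Option<String>,
}

/// A pass-through wrapper: `&` guarantees it cannot modify the caller's options.
pub fn describe_query(sql: &str, options: &QueryRequestOptions) -> String {
    // options.max_item_count = 1; // rejected: cannot assign through a `&` reference
    format!("{sql} (max {} items)", options.max_item_count)
}

/// A wrapper that needs a different setting has to make its own copy, visibly.
pub fn describe_single_item_query(sql: &str, options: &QueryRequestOptions) -> String {
    let single = QueryRequestOptions {
        max_item_count: 1,
        ..options.clone()
    };
    describe_query(sql, &single)
}
```

After calling `describe_single_item_query("SELECT *", &opts)`, the caller's `opts.max_item_count` is unchanged, and the signature alone is enough to know that.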

(Of course, you could have QueryRequestOptionsBuilder in C# that's mutable, which can generate immutable QueryRequestOptions instances, but that's not idiomatic in C# as evidenced by Microsoft's APIs not doing it. And doing it yourself is a whole bunch of boilerplate compared to "hey I typed & and it did what I wanted".)

In my current project I have a serialization API that looks like this (in essence):

struct SerializeDocument { ... }
struct SerializePage<'a> { ... }
struct Element { ... }
impl SerializeDocument {
    fn page(&mut self, id: &str) -> SerializePage<'_> { ... }
}
impl<'a> SerializePage<'a> {
    fn element(&mut self, element: Element) { ... }
}

The idea is, if your document has a structure like

page1:
    element1
    element2
page2:
    element3

then you express that by doing

let mut doc = SerializeDocument::new();
let mut page1 = doc.page("page1");
page1.element(element1);
page1.element(element2);
let mut page2 = doc.page("page2");
page2.element(element3);

Each page is implicitly ended when the next SerializeDocument::page call is made on the parent document. Now, what would be really bad is if someone did this:

let mut page1 = doc.page("page1");
page1.element(element1);
let mut page2 = doc.page("page2");
page2.element(element2);
page1.element(element3); // BAD

It's important that all the elements of each page be presented before the next page is started, because the serializer is going to be emitting bytes for every element and pages are laid out sequentially in the output format.

The great thing about this API is that the bad pattern is a compile-time error!

error[E0499]: cannot borrow `doc` as mutable more than once at a time
  --> src/main.rs:21:21
   |
19 |     let mut page1 = doc.page("page1");
   |                     ----------------- first mutable borrow occurs here
20 |     page1.element(Element);
21 |     let mut page2 = doc.page("page2");
   |                     ^^^^^^^^^^^^^^^^^ second mutable borrow occurs here
22 |     page2.element(Element);
23 |     page1.element(Element);
   |     ---------------------- first borrow later used here

The exclusivity semantics of &mut under NLL are exactly what's needed to prevent interleaving elements from multiple pages, which I think is pretty neat.
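
The snippets above elide the struct bodies, so here is one way to fill them in to get something that compiles. The internals (a Vec<String> of output lines, Element as a tuple struct, the into_lines method) are assumptions made for this sketch, not the actual code from the project:

```rust
/// Assumed internals: the "output format" is just a list of text lines.
pub struct SerializeDocument {
    out: Vec<String>,
}

/// A page handle keeps the document exclusively borrowed while it is in use.
pub struct SerializePage<'a> {
    doc: &'a mut SerializeDocument,
}

pub struct Element(pub &'static str);

impl SerializeDocument {
    pub fn new() -> Self {
        Self { out: Vec::new() }
    }

    /// Starts a new page, implicitly ending the previous one.
    pub fn page(&mut self, id: &str) -> SerializePage<'_> {
        self.out.push(format!("{id}:"));
        SerializePage { doc: self }
    }

    pub fn into_lines(self) -> Vec<String> {
        self.out
    }
}

impl<'a> SerializePage<'a> {
    pub fn element(&mut self, element: Element) {
        self.doc.out.push(format!("    {}", element.0));
    }
}

fn serialize_example() -> Vec<String> {
    let mut doc = SerializeDocument::new();
    let mut page1 = doc.page("page1");
    page1.element(Element("element1"));
    page1.element(Element("element2"));
    let mut page2 = doc.page("page2"); // fine: page1 is not used again (NLL)
    page2.element(Element("element3"));
    // page1.element(Element("late")); // uncommenting this brings back error E0499
    doc.into_lines()
}
```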

The same module also uses ownership and an opaque type "token" to ensure that a fallible finalization method (not suitable for Drop) on SerializeDocument is called exactly once.
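
The post does not show how that token works, so the following is only a guess at the general shape of such a pattern, with all names invented. `finish` takes `self` by value, so it cannot be called twice, and the callback must return a `Finished` value, which only `finish` can create, so it cannot be skipped either:

```rust
mod ser {
    pub struct SerializeDocument {
        lines: Vec<String>,
    }

    /// Opaque token: the private field means only this module can construct it.
    pub struct Finished {
        output: String,
    }

    #[derive(Debug)]
    pub struct SerializeError(pub String);

    impl SerializeDocument {
        pub fn line(&mut self, text: &str) {
            self.lines.push(text.to_owned());
        }

        /// Fallible finalization; consuming `self` rules out a second call.
        pub fn finish(self) -> Result<Finished, SerializeError> {
            if self.lines.is_empty() {
                return Err(SerializeError("empty document".to_owned()));
            }
            Ok(Finished { output: self.lines.join("\n") })
        }
    }

    /// The only way to obtain a document. The callback has to hand back a
    /// `Finished` token, which it can only get by calling `finish`.
    pub fn serialize(
        body: impl FnOnce(SerializeDocument) -> Result<Finished, SerializeError>,
    ) -> Result<String, SerializeError> {
        body(SerializeDocument { lines: Vec::new() }).map(|done| done.output)
    }
}
```

A caller would write something like `ser::serialize(|mut doc| { doc.line("a"); doc.finish() })`. Forgetting the `finish` call leaves the closure with no `Finished` value to return, so it does not type-check.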

Perhaps "has defined behavior"? (Since behavior is by definition defined for all possible inputs…)