For Beginners: An interesting article about Ownership and Borrowing

I have struggled a lot with Rust's borrow checker. Coming from C and C++, I often wondered why the compiler is so conservative and restrictive, sometimes rejecting code that is 100% safe. It took me a while to realize that there are built-in rules that need to be followed, and why they need to be followed: it's simply a different paradigm from that of C or C++.

Therefore: don't stick to the paradigms of other languages, but clear your head.

I recently found an article of three parts which can help you as a beginner to make friends with Rust's ownership model.

Edit: I forgot the link. Here it is:

And here is a comprehensive and helpful post about non-lexical lifetimes:

Cheers

5 Likes

This entire comment is about the articles in the first link.

These articles are pretty good on the whole. In general I was more satisfied the further I went. Their presentation of ownership is a bit overgeneralized without noting that it's generalized (which is common). They present references as immutable/mutable (in contrast with shared/exclusive) without pointing out that shared mutability is possible (also common). They have some bigger misconceptions about reference lifetimes or perhaps lifetimes generally.

Almost all these gripes are from part 1. By the time I finished reading I got the impression that the author is aware of at least some of the overgeneralizations. Personally I would prefer if they were explicit about generalizing upfront, preferably with some note about what the exceptions are -- I have a negative reaction to counterfactual statements and strongly dislike having to go back and relearn things after I find out I wasn't given the whole truth.

I also think this was written from a particular perspective (without calling it out), coming from Java maybe? Some of their points[1] would probably be better served by being explicit about the comparison / POV assumptions.

More detailed notes follow. Everything after part 1 is a nit or clarification.

Part 1

A core aspect of Rust is that every piece of data has one explicit owner.
[...]
the information that a garbage collector tracks at runtime is known to Rust at compile time

Shared ownership can be modeled in Rust. Not everything is known at compile time.

Arguably static and leaked data has no owner.
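To make those two points concrete, here is a minimal sketch (the function names are made up for illustration, they are not from the articles). Rc gives one allocation several owners and tracks the count at run time, and Box::leak hands back a &'static reference to data that no binding owns.

```rust
use std::rc::Rc;

/// Returns the owner count while two handles share one String,
/// and the count after one of the handles is dropped.
fn shared_owner_counts() -> (usize, usize) {
    let first = Rc::new(String::from("shared"));
    let second = Rc::clone(&first); // a second owner of the same allocation
    let while_shared = Rc::strong_count(&first);
    drop(second); // the String is freed only when the last owner goes away
    let after_drop = Rc::strong_count(&first);
    (while_shared, after_drop)
}

/// Leaks a heap allocation. The returned reference is valid for the rest
/// of the program, and no variable owns the data behind it.
fn leaked() -> &'static str {
    Box::leak(String::from("leaked").into_boxed_str())
}
```

Here shared_owner_counts() returns (2, 1): the count is a run-time value, not something the compiler knows statically.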

A borrow might be implemented using a reference, but it might not.

I mean, if you say to use a reference, it's going to use a reference. Maybe this note was here due to coming from a language where "reference" means something other than a Rust reference, and the author wanted to warn against assuming a Rust reference is the same as what "reference" means in some other language.

Similarly, Rust makes no guarantees about move's implementation.

I don't know what they're getting at with this either. There are no move constructors. A move is notionally a memcpy:

It’s important to note that in these two examples, the only difference is whether you are allowed to access x after the assignment. Under the hood, both a copy and a move can result in bits being copied in memory, although this is sometimes optimized away.

(The above quote is from here.)
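A small sketch of what that quote describes (the function name is hypothetical). Both assignments are notionally a bitwise copy; the only difference is whether the source may be used afterwards.

```rust
/// Contrasts a copy with a move.
fn move_and_copy() -> (i32, i32, String) {
    let x: i32 = 5;
    let y = x; // i32 is Copy: `x` stays usable after the assignment

    let s = String::from("hi");
    let t = s; // String is not Copy: `s` is moved and can no longer be used
    // println!("{s}"); // would be rejected: use of a moved value

    (x, y, t)
}
```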

If we intend to mutate data we must declare our variables using mut

mut bindings allow directly overwriting and taking a &mut to the bound variable, to be precise. There are various ways to mutate things even if there is no mut binding (see "interior mutability" below for one way).

They do a good job covering how you can just move to a mut binding in part 2.

They do a good job of covering how the lack of a mut binding doesn't mean the underlying data has some immutable property in part 3.

Borrows, mutable or not, exist for the lifetime of their scope. In all of the above cases, the borrows exist until the end of main, which is why we violate the requirements of multiple readers OR one writer.

No, that's incorrect. References haven't been limited to scopes since NLL landed over 5 years ago.[2] The requirements are violated in the examples because they try to interleave the &mut borrows with the & borrows and/or use of the borrowed struct.

Shared borrows and exclusive borrows are better names than immutable borrows and mutable borrows, despite the mut keyword. In particular, mutating through a & is possible in some cases. This is called interior mutability in the documentation, but another description is shared mutability.
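As a minimal sketch of shared mutability (the names bump and bumped_twice are made up for illustration), std::cell::Cell allows mutation through a & with no mut binding anywhere:

```rust
use std::cell::Cell;

/// Mutates the counter through a shared reference.
fn bump(counter: &Cell<u32>) {
    counter.set(counter.get() + 1);
}

fn bumped_twice() -> u32 {
    let counter = Cell::new(0); // note: not a `mut` binding
    let a = &counter;
    let b = &counter; // two shared borrows alive at the same time
    bump(a);
    bump(b);
    counter.get()
}
```

bumped_twice() returns 2, even though only & references were ever taken.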

Here's their "limited" code block, with no function boundaries
fn main() {
    let a1 = 1;
    let a2 = &a1;
    let a3 = &a1;
    println!("{a1:?} {a2:?} {a3:?}");

    let mut b1 = 1;
    let b2 = &mut b1;
    println!("{b2}");
    let b3 = &mut b1;
    println!("{b3}");
    println!("{b1}");

    let mut c1 = 1;
    let c2 = &c1;
    println!("{c1} {c2}");
    let c3 = &mut c1;
    println!("{c3}");

    let mut d1 = 1;
    let d2 = &mut d1;
    println!("{d2}");
    let d3 = &d1;
    println!("{d3} {d1}");
}

Playground.

Part 2

I didn't have any nits here worth writing out.

Part 3

Most of my nits here would be repeats from part 1 (e.g. shared ownership). Exactly when deallocations occur is sometimes determined at run-time. Rust does determine where values do or may possibly drop at compile time.
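Here is a sketch of a drop whose timing depends on a run-time value (Noisy and drop_order are hypothetical names for illustration). The compiler knows the two places where the value may be dropped, but which one actually runs the destructor is decided while the program executes.

```rust
use std::cell::RefCell;

/// Records its name in a shared log when it is dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Returns the order of events for the given run-time choice.
fn drop_order(drop_early: bool) -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let value = Noisy { name: "value", log: &log };
        if drop_early {
            drop(value); // moved (and dropped) on this path only
        }
        log.borrow_mut().push("end of block");
        // If `value` was not moved above, it is dropped here instead.
    }
    log.into_inner()
}
```

drop_order(true) yields ["value", "end of block"], while drop_order(false) yields ["end of block", "value"].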

Maybe they're getting around to pointing these things out...

Everything we've discussed here is the default behavior. It's the set of rules that most of your Rust code will live under. It provides the strongest safety guarantees and the lowest runtime overhead. However, Rust provides constructs to change these rules on a case by case basis. This often involves deferring compile-time checks to the runtime. Hopefully we'll get to talk about these more advanced cases soon.

Destructors are more general than Drop.

Part 4

Literal &strs are &'static strs.

Not the point of the code but here's how I might write the word counter.

A &mut str on the other hand, is something you'll rarely, if ever, use. It doesn't own the underlying data so it can't really change it.

It definitely can, there are just relatively few cases where you can validly mutate UTF-8 data (like a str is) in-place, since it's a variable-width encoding. (ASCII bytes are always one UTF-8 byte, which is why &mut str can be used to perform the lowercasing.)
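A minimal sketch of in-place mutation through &mut str, using the standard make_ascii_lowercase method (the wrapper function names are made up for illustration):

```rust
/// Lowercases the ASCII letters in place. The byte length never changes,
/// so the data stays valid UTF-8.
fn lowercase_in_place(text: &mut str) {
    text.make_ascii_lowercase();
}

fn lowercased_demo() -> String {
    let mut owned = String::from("Hello, WÖRLD");
    lowercase_in_place(&mut owned); // &mut String deref-coerces to &mut str
    owned
}
```

lowercased_demo() returns "hello, wÖrld": the non-ASCII Ö is left alone, because changing it could change the number of bytes.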

Technically, str is the slice and &str is the slice with an added length value. But str is so infrequently used that people just refer to &str as a slice.

More generally, a str is basically a [u8] with more invariants. Slices like str and [u8] and any other [T] are dynamically sized types (DSTs). References to slices like &[T] and &mut [T] are very often also just called slices. So unless the material you're reading is being very explicit/pedantic/careful, you just have to figure out from context whether a "slice" means the DST or some sort of pointer to the DST.
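One way to see the difference between the DST and a pointer to it is to compare reference sizes. This is only a sketch (widths_in_words is a made-up name), and the exact layout is what current compilers do in practice rather than something I'd rely on as a hard guarantee:

```rust
use std::mem::size_of;

/// Sizes of `&u8`, `&[u8]` and `&str`, measured in machine words.
fn widths_in_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<&u8>() / word,   // thin pointer: address only
        size_of::<&[u8]>() / word, // wide pointer: address + length
        size_of::<&str>() / word,  // wide pointer: address + length in bytes
    )
}

// `size_of::<str>()` does not compile: a `str` has no size known at compile time.
```

On the usual targets this gives (1, 2, 2).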

Since it doesn't need ownership of the parameter, &str is the correct choice.

&String doesn't give ownership either. It's more that

But since this is so common, the Rust compiler will also let us call this function with a &String:

The technical term is "deref coercion". (String dereferences to str; &String deref-coerces to &str.)
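A short sketch of deref coercion at a call site (count_words is a hypothetical function for illustration, not the one from the article):

```rust
/// Takes a borrowed string slice; it never needs a `String`.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn coercion_demo() -> (usize, usize) {
    let owned = String::from("one two three");
    (
        count_words(&owned),       // &String is coerced to &str
        count_words("just a few"), // a literal is already a &'static str
    )
}
```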


  1. e.g. "think interfaces" ↩︎

  2. Even before then some borrows only lasted for a statement. ↩︎

  3. check capacity ↩︎

3 Likes

This entire comment is about the stackoverflow answer in the second link.

Lifetimes, especially intrafunction lifetimes (those unnameable things '_ the compiler calculates within a function body), are complicated to define. The answer does explain in general how NLL is an improvement over the lexical lifetimes we used to have.

But the part where they try to define lifetimes in terms of value liveness or validity isn't accurate, which is what almost all of my comment is about.

The name "non-lexical lifetimes" doesn't sound right to me

The lifetime of a value is the time span during which the value stays at a specific memory address (see Why can't I store a value and a reference to that value in the same struct? for a longer explanation). The feature known as non-lexical lifetimes doesn't change the lifetimes of any values, so it cannot make lifetimes non-lexical. It only makes the tracking and checking of borrows of those values more precise.

This confusion goes away if you don't conflate Rust lifetimes (those '_ things) with the liveness of values -- what they call "lifetime of a value". The liveness of a value doesn't correspond to a Rust lifetime. (The lifetime of a borrow of some value is naturally limited by the liveness of the value.)

The answer in the link is getting closer to the idea...

Rust lifetimes are not the time period between when an object is created and when it is destroyed!

...but Rust lifetimes don't correspond to "validity at a specific address" either. Moreover, a borrow (which does have a lifetime) can conflict with some use of the value that doesn't invalidate the value at its current address.

A more accurate name for the feature might be "non-lexical borrows".

This too hints at differentiating borrows from value liveness, but I still prefer not using the word "lifetime" for both concepts, similar to the NLL RFC. The "lifetime"s of NLL were never referring to value liveness (or value validity at an address).

I suggest thinking of it like so: Borrows have lifetimes, and uses of places ("lvalues") can conflict with those borrows. Uses include overwriting, moving, copying, being destructed, going out of scope, having a reference taken, etc.

NLL refined lifetimes so that borrows conflict with fewer uses.


(An optional section attempting to demonstrate why the "validity at an address" notion falls short in more detail.)

Lifetimes are the output of an analysis of when borrows of places (a path to some memory) are used, along with things like explicit outlives annotations. The analysis then calculates when each place needs to be considered borrowed, and how it's borrowed (shared or exclusive primarily). Finally the analysis checks every use of a place to see if there's a conflict with its borrows.

Moving or going out of scope or being overwritten, etc, are just specific cases of a place being used.

Moving out of a place conflicts with being borrowed, for example. So if you try to use any borrow of a place after moving out of that place,[1] you get a borrow error.

As another example, taking a &mut to a place also conflicts with being borrowed, like moving does. But the value is still valid.

As another example, copying out of a place conflicts with being exclusively borrowed, but not shared borrowed. If you try to use an exclusive borrow of a place after copying, you'll get a borrow error. But the value is still valid, and you won't get an error if you use a shared borrow of the place after the copy.

As another example, taking a & to a place conflicts with being exclusively borrowed, but not shared borrowed, like copying does. But if you have shared mutability, you can still mutate data behind the &. This means that &mut conflicting with being borrowed at all (as mentioned above) isn't about mutability or the possibility of replacing the value at the address the & points to -- we can do those things with shared mutability, without invalidating other shared borrows. (The conflict above is about &mut being exclusive.)
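The last two examples can be sketched as follows (the function names are made up for illustration). The value stays valid at its address in every case; what matters is which use conflicts with which kind of borrow.

```rust
use std::cell::Cell;

/// Copying out of a place does not conflict with a shared borrow.
fn copy_while_shared_borrowed() -> (i32, i32) {
    let x = 1;
    let shared = &x;
    let copy = x; // a use of `x`, but compatible with `shared`
    (*shared, copy) // `shared` is still usable after the copy
}

/// With shared mutability, data behind a `&` can change while other
/// shared borrows stay usable.
fn mutate_behind_shared() -> (u32, u32) {
    let cell = Cell::new(1);
    let first = &cell;
    let second = &cell; // taking another `&` does not conflict with `first`
    second.set(2); // mutation through a shared reference
    (first.get(), second.get())
}

// The exclusive case is rejected with a borrow error:
//     let mut y = 1;
//     let exclusive = &mut y;
//     let copy = y;        // conflicts with the exclusive borrow...
//     *exclusive += 1;     // ...because the borrow is used afterwards
```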

Now, you can have some mental model where things like "value validity at this address" correspond to some notional lifetime, and borrows are capped by that lifetime. But in my opinion, and as I hope the above examples demonstrate, you have to add a lot of complications and probably multiple types of notional lifetimes, etc, to make it jibe with what actually does or does not compile.

The root reason being, that's not actually how the compiler works.


(n.b. some other comment)

Perhaps a better refinement on the naming would be, sub-lexical lifetimes, considering that it specifically shortens the lifetime estimates of binds.

Rust lifetimes can cross scope boundaries, so non-lexical is more accurate.


  1. i.e. do something that causes the place to still be borrowed during the move ↩︎

2 Likes

There is one problem I have with the idea:
Ownership and mutability interact thusly: you can have multiple immutable borrows **or** one mutable borrow...
Why isn't it like this?
you can have one mutable borrow (or owner) **and** multiple immutable borrows?
Why should immutable 'references' be limited that way?

And why (if the statement is true) should this code example in article 1 work?
The statement is confusing for me, because we have here 1 mut and 2 immutable borrows.

  let mut a1 =  1;
  let a2 = &a1;
  let a3 = &a1;  // No Problem. Can have multiple borrows

Because if the item being borrowed is something like:

    struct S {
        a: u32,
        b: u32,
    }

And it is being written from one thread while being read by another thread then it is possible that the following sequence of events happens:

  1. First thread writes S.a
  2. Second thread reads S.a and S.b
  3. First thread writes S.b

Notice that at step 2) the second thread now has an incomplete/inconsistent update of S. The a is a new value but the b has not been written yet.

Of course this applies to multiple readers (immutable borrowers) as well.
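For comparison, here is a sketch of how you would share such a struct between a writer thread and a reader in safe Rust (the function name and the loop counts are made up for illustration). Since plain & and &mut can't be used this way, the struct goes behind a Mutex, and both fields are updated under one lock so the reader never sees a half-written S.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Copy)]
struct S {
    a: u32,
    b: u32,
}

/// Returns true if the reader only ever observed `a == b`.
fn writer_and_reader() -> bool {
    let shared = Arc::new(Mutex::new(S { a: 0, b: 0 }));

    let writer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            for i in 1..=1000 {
                let mut s = shared.lock().unwrap();
                s.a = i; // both writes happen while the lock is held,
                s.b = i; // so no reader can run between them
            }
        })
    };

    let mut consistent = true;
    for _ in 0..1000 {
        let snapshot = *shared.lock().unwrap(); // copy out under the lock
        consistent &= snapshot.a == snapshot.b;
    }

    writer.join().unwrap();
    consistent
}
```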

Note that a mutable borrower is not necessarily an owner contrary to what you wrote. Having an &mut reference does not make one the owner.

Ok. That is why I wrote "or" owner in brackets.
I can understand the 'datarace' in your example. And have to think about it.

(what I really like very much is "Having a single explicit owner means that Rust doesn't need (and thus doesn't have) a garbage collector." The implications I still have to learn: how to write my code).

1 Like

In the code, this is a mut binding (variable)....

let mut a1 = 1;

It is not an exclusive (&mut) borrow. It's not a borrow at all. A binding being mut just means you're allowed to take a &mut to it and you're allowed to overwrite it. It's basically a non-optional, compiler-enforced lint. All bindings could be mut and it wouldn't change the soundness or borrow checking results of compiling programs.

If you own something, you can move it. If you can move it, you can always do this:

let a1 = String::new(); // Non-`Copy` type
let mut a1 = a1;
a1.push_str("mutated!");

In contrast, &mut T and &T are two different types with very different semantics.

It's more that &mut being exclusive is a core part of Rust's memory safety and data racing story. For example, if you have a &mut Vec<i32>, you can push into that Vec -- perhaps causing a reallocation and move of the entire contents -- without worrying about making a bunch of references dangle, because you know that no other references exist. Data races are similarly avoided.
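A small sketch of the Vec case (append_first is a made-up name). Holding a reference into the Vec across the push is rejected; copying the element out first is fine, because no reference is left to dangle.

```rust
/// Appends a copy of the first element to the end of the Vec.
fn append_first(values: &mut Vec<i32>) {
    // Rejected by the borrow checker:
    //     let first = &values[0];
    //     values.push(*first); // push needs exclusive access while `first` is alive
    let first = values[0]; // copy the element out; no reference outlives this line
    values.push(first); // may reallocate and move every element
}
```

For example, calling it on vec![1, 2, 3] leaves [1, 2, 3, 1].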

There are also a number of optimizations that can happen if you know two references don't cover the same memory (because at least one is exclusive), or that the contents behind a shared reference without shared mutability won't change.

Here's an article about the exclusivity concept. And another, though be warned it's from pre-Rust-1.0 and so is a little dated. It has one of my favourite quotes regarding the exclusive/shared vantage point (in contrast with immutable/mutable):

[I]t’s become clear to me over time that the problems with data races and memory safety arise when you have both aliasing and mutability. The functional approach to solving this problem is to remove mutability. Rust’s approach would be to remove aliasing. This gives us a story to tell and helps to set us apart.

3 Likes

Aha: it's clearer to me now. No dangling pointers allowed.
And I suspect that the strict rules are one of the reasons the compiler can produce performant code.

It's something extremely fundamental: shared ownership and mutability don't mix well.

They don't mix well in programming: how often have you needed to track “who changed that damn variable behind my back”?

They don't mix well in real life: how often have you had to yell at your neighbour's kids that they have to return toys to their proper place and at least ask for permission before taking them?

The mix of shared ownership and mutability is an awful, terrible, evil thing… yet it's also a necessary evil: this forum would be terribly boring if no one were able to see each other's messages, for one thing.

We have RWLocks to handle the issue (and they were invented decades before Rust was). There are also Mutexes, the RWLock's simpler (and faster) cousin. Rust also has a single-threaded version, called RefCell, and there are many ways to allow shared mutability where it's needed.
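As a rough sketch of those tools in Rust (the function names are made up for illustration): RefCell moves the "many readers or one writer" check to run time on a single thread, and std::sync::RwLock does the same job with locking.

```rust
use std::cell::RefCell;
use std::sync::RwLock;

/// RefCell: the borrow rules are checked at run time instead of compile time.
fn refcell_demo() -> (bool, Vec<i32>) {
    let shared = RefCell::new(vec![1, 2]);
    let reader = shared.borrow();
    // While a reader exists, asking for a writer is refused (no panic with try_*).
    let write_refused = shared.try_borrow_mut().is_err();
    drop(reader);
    shared.borrow_mut().push(3); // fine now: no other borrow is active
    (write_refused, shared.into_inner())
}

/// RwLock: readers and writers take turns through a lock.
fn rwlock_demo() -> i32 {
    let lock = RwLock::new(1);
    let seen = *lock.read().unwrap(); // read guard is released at the end of this statement
    *lock.write().unwrap() += seen; // exclusive access for the update
    lock.into_inner().unwrap()
}
```

refcell_demo() returns (true, [1, 2, 3]) and rwlock_demo() returns 2.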

Functional programming solves that by banning mutability outright, but then they have to add various kludges, because the world we live in is, of course, shared and mutable.

Rust uses different tools to achieve a similar goal, but the problem and solutions come from the same source: shared mutability is something truly evil, awful, dangerous… yet, sometimes, desirable and needed.

If you think about it… almost every problem that you have ever faced in your programming career may be traced to that duality between danger and desirability.

2 Likes