Blog post series: After NLL -- what's next for borrowing and lifetimes?


#61

When I was a Rust beginner I had a hard time with this, and now that I’m competent I still run into the “disjoint field borrow” problem on a regular basis.

Of course I know all the tricks and workarounds, and I use them to fix the problem and move on with my life, but it never feels nice.

I’ve often used both the sub-struct and free function workarounds. I know about view structs but I consider them too heavy-weight. If there was an easier way to create view structs (perhaps a macro?) I think they would be a good workaround.

I’ve internalized the workarounds enough that I don’t directly run into this problem very often (i.e. I rarely get actual compile errors), but it does add extra friction when writing code (because it essentially requires some unnatural refactoring to work around what feels like a bug in the compiler, even though I know it’s not a bug).
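For readers who have not met the two workarounds mentioned above, here is a minimal sketch of both. The types `Cache` and `Stats` and the function `count_even` are hypothetical names made up for this illustration; they do not come from the blog post.

```rust
// Hypothetical types, for illustration only.
struct Stats {
    hits: usize,
}

impl Stats {
    // Borrows only the `Stats` sub-struct, not the whole `Cache`.
    fn record_hit(&mut self) {
        self.hits += 1;
    }
}

struct Cache {
    entries: Vec<u32>,
    // Sub-struct workaround: the counters live in their own struct.
    stats: Stats,
}

// Free-function workaround: take exactly the fields that are needed.
fn count_even(entries: &[u32], stats: &mut Stats) {
    for e in entries {
        if e % 2 == 0 {
            stats.record_hit();
        }
    }
}

impl Cache {
    fn scan(&mut self) {
        // If `record_hit(&mut self)` were a method on `Cache` itself, calling
        // it inside this loop would be rejected: the call would borrow all of
        // `self` while `self.entries` is still borrowed by the iterator.
        for e in &self.entries {
            if e % 2 == 0 {
                self.stats.record_hit(); // disjoint field, so this is accepted
            }
        }
        // Same idea, expressed through the free function.
        count_even(&self.entries, &mut self.stats);
    }
}
```

Both forms compile because the borrow checker can see, within one function body, that `entries` and `stats` are different fields.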

The most annoying thing is when working with Pinned types (like Futures), because the Pin API requires you to use methods to access the fields (you cannot access fields directly), but that immediately runs into the disjoint field borrow problem.

And unlike most situations, you cannot refactor it using sub-structs or free functions, so in some situations you’re forced to do really hacky things like unsafely borrow the entire struct and then carefully avoid mutating/moving certain fields.
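A rough sketch of what that kind of hack can look like is shown here. This is not the code the post originally pointed to; `Task` and its fields are hypothetical, and the safety argument in the comment is an assumption of this sketch that you would have to verify for your own type.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical pinned type; `PhantomPinned` makes it `!Unpin`.
struct Task {
    count: usize,
    log: Vec<String>,
    _pin: PhantomPinned,
}

impl Task {
    fn note(log: &mut Vec<String>, count: usize) {
        log.push(format!("step {}", count));
    }

    fn step(self: Pin<&mut Self>) {
        // SAFETY (assumed for this sketch): nothing below moves out of
        // `this` or replaces the whole value, so the pinning guarantee
        // is not violated.
        let this: &mut Task = unsafe { self.get_unchecked_mut() };
        this.count += 1;
        // With a plain `&mut Task`, disjoint fields borrow independently again.
        Task::note(&mut this.log, this.count);
    }
}
```

The cost is that the compiler no longer checks the "never move the pinned fields" rule; it only holds because the author was careful.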

View structs (which have been specialized to work with Pin types) could be a good solution for this.

I would love to have a general solution for disjoint field borrowing, but I understand it’s an extremely tricky thing to solve, with a lot of design space to explore. So I think it’s a good long-term goal, but I don’t expect it to happen anytime soon. And it may never happen; we may be stuck with workarounds forever.

Regardless of what the outcome is, I trust the Rust lang team will make the right decision. They’ve definitely earned my respect (and then some) over the years. Thank you for working so hard and achieving so much.


#62

Thank you for the example.

For example because I’m writing code that solves a specific graph problem, so it shouldn’t go into the impl of my more general graph struct. But I also want to have functions that remove a vertex and all associated edges, so I do want to have a single struct containing both vertices and edges.

That makes sense. Would it help to extend the rule so that implicit partial borrows are accepted within a crate? (Since I assume semver considerations only apply at the crate interface, right?)

I’m thinking of something like this:

  • Mygraphcrate
struct Graph {
    vertices: Vec<Vertex>,
    edges: Vec<Edge>,
}
  • MyGraphSolvingProblem
use Mygraphcrate::{Graph, Vertex, Edge};

fn process_graph(g: &mut Graph) {
   // vertices borrowed here
   let v = &mut g.vertices[0];

   // Call delete_vertex_edges within the same crate with `g` partially borrowed.
   // Since this is a private call and `delete_vertex_edges` only borrows `g.edges`, this is fine.
   delete_vertex_edges(g, v);
}

fn delete_vertex_edges(g: &mut Graph, v: &Vertex) {
   // Borrows `edges`
   for e in g.edges.iter() {
      //...
   }
}

The problem would only arise if process_graph and delete_vertex_edges were defined in separate crates, since in that situation you wouldn’t be able to call the function with a partially borrowed g.


#63

As a beginner-ish, I’ve had some difficulties with the given blog post and the View pattern. It is clearly a very interesting pattern, and I think I’m gonna try to work on a code-generating macro to ease its usage.

I think that easing its usage is definitely needed, since it “allows” a construct widely used in other languages and since doing the pattern by hand is not obvious: actually, everybody overlooked a typo in the blog code! (It drove me crazy until I tested the code and fixed the typo.)
The working view code is the following:

struct CheckWidgetsView<'me> {
  widgets: &'me Vec<MyWidget>,
  counter: &'me mut usize,
  listener: &'me mut Sender<()>,
}

impl<'me> CheckWidgetsView<'me>  {
    fn signal_event (
        self: &mut Self,
    )
    {
        *self.counter += 1;
        self.listener.send(()).unwrap();
    }

    fn check_widgets (
        self: &mut Self,
    )
    {
        for widget in /*&*/self.widgets { // <--------------- HERE
            if widget.check() {
                self.signal_event();
            }
        }
    }
}

As you can see, the code works because &T (where T = Vec<MyWidget> in the example) is Copy, so you can copy it out through a &mut &T (this happens at the self.widgets expression). Now @nikomatsakis’s explanation makes sense: thanks to the view trick, the iterated widgets reference is now “decorrelated” from self and thus does not conflict with self.signal_event.
(Whereas with the erroneous &self.widgets, we are &*-ing a &mut &_, resulting in a &&Vec<MyWidget>, which does not implement IntoIterator.)
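To try the view code end to end, it needs an owning struct and a call site that builds the view. The following sketch adds those around the view. `WidgetFactory`, the `ok` field, and the body of `MyWidget::check` are assumptions made for illustration, and the view methods use the shorter `&mut self` spelling.

```rust
use std::sync::mpsc::Sender;

// Hypothetical widget; `check` just reports a stored flag.
struct MyWidget {
    ok: bool,
}

impl MyWidget {
    fn check(&self) -> bool {
        self.ok
    }
}

// Hypothetical owner of the three fields the view borrows.
struct WidgetFactory {
    counter: usize,
    widgets: Vec<MyWidget>,
    listener: Sender<()>,
}

struct CheckWidgetsView<'me> {
    widgets: &'me Vec<MyWidget>,
    counter: &'me mut usize,
    listener: &'me mut Sender<()>,
}

impl<'me> CheckWidgetsView<'me> {
    fn signal_event(&mut self) {
        *self.counter += 1;
        self.listener.send(()).unwrap();
    }

    fn check_widgets(&mut self) {
        // `self.widgets` is a `&'me Vec<_>` and is copied out here, so the
        // loop does not keep `self` borrowed.
        for widget in self.widgets {
            if widget.check() {
                self.signal_event();
            }
        }
    }
}

impl WidgetFactory {
    fn check_widgets(&mut self) {
        // Each field is borrowed separately, in one function body, which the
        // borrow checker accepts.
        let mut view = CheckWidgetsView {
            widgets: &self.widgets,
            counter: &mut self.counter,
            listener: &mut self.listener,
        };
        view.check_widgets();
    }
}
```

The boilerplate in `WidgetFactory::check_widgets` is the part a code-generating macro would be expected to write.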


#64

Just wanted to say :heart: Thank you! :heart: to all the folks leaving comments in this thread. I’ve not had the time to respond much, but I’ve been reading them with great interest.

I just finished a new blog post, this time on moving out of borrowed references and the sentinel pattern:

Sometimes when we have &mut access to a struct, we have a need to temporarily take ownership of some of its fields. Usually what happens is that we want to move out from a field, construct something new using the old value, and then replace it.

You can read all about it here. :blush:
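As a quick illustration of the general idea (a sketch, not the code from the blog post), one can swap a cheap placeholder value into the field, work with the old value by ownership, and then store the result back. `Buffer` and `merge` are hypothetical names.

```rust
use std::mem;

// Hypothetical struct whose field we want to rebuild by value.
struct Buffer {
    chunks: Vec<String>,
}

// Consumes the old chunks and produces a new list with a single chunk.
fn merge(chunks: Vec<String>) -> Vec<String> {
    vec![chunks.concat()]
}

impl Buffer {
    fn compact(&mut self) {
        // `self.chunks` cannot be moved out from behind `&mut self` directly,
        // so an empty Vec is left behind as a sentinel while we own the old value.
        let old = mem::replace(&mut self.chunks, Vec::new());
        self.chunks = merge(old);
    }
}
```

If `merge` panicked, the struct would be left holding the empty sentinel rather than an invalid value, which is why the placeholder has to be a legitimate value of the field's type.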


#65

There are however some other cases where similar sorts of “permission juggling” might be nice to express. For example, people sometimes want the ability to have a variant on insert – basically a function that inserts a T into a collection and then returns a shared reference &T to inserted data. The idea is that the caller can then go on to do other “shared” operations on the map (e.g., other map lookups). So the signature would look a little like this:

impl SomeCollection<T> {
  fn insert_then_get(&mut self, data: T) -> &T {
    //
  }
}

This signature is of course valid in Rust today, but it has an existing meaning that we can’t change. The meaning today is that the function requires unique access to self – and that unique access has to persist until we’ve finished using the return value. It’s precisely this interpretation that makes methods like Mutex::get_mut sound.
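A small sketch of how that meaning shows up for a caller today. The `items` field, the `len` method, and the `demo` function are assumptions added for illustration.

```rust
// Hypothetical Vec-backed collection.
struct SomeCollection<T> {
    items: Vec<T>,
}

impl<T> SomeCollection<T> {
    fn insert_then_get(&mut self, data: T) -> &T {
        self.items.push(data);
        self.items.last().expect("an item was just pushed")
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

fn demo() -> (i32, usize) {
    let mut c = SomeCollection { items: vec![1, 2] };
    let r = c.insert_then_get(3);
    // Calling `c.len()` here would be rejected by the borrow checker:
    // `c` stays mutably borrowed for as long as `r` is still in use.
    let value = *r; // last use of `r`, the exclusive borrow ends here
    let n = c.len(); // shared access is fine again
    (value, n)
}
```

So even though `r` is only a shared reference, the caller cannot do other "shared" operations on `c` until `r` is dead.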

What is the reason that the compiler can’t make the returned borrow unique/shared based on whether the returned reference is itself &/&mut?


#66

You can, but then you lose the method call syntax.

One way of doing it would be to utilize something like

enum Ref<'a, T> {
    Share(&'a T),
    Mut(&'a mut T)
}

But the compiler shouldn’t make that decision for us.


#67

Other types than just & and &mut can hold a reference. How would you determine if they should hold an exclusive or shared lock?

Currently, if I write

fn foo(&mut self) -> Foo<'_> {
  ..
}

I know that the returned Foo is tied to an exclusive lock on self. Unsafe code can rely on that guarantee. Because of that, “weakening” the lock needs to be explicit.


#68

Doesn’t the compiler have to assume that Ref<'a,T> could always be a Ref::Mut, and therefore that returning it would always be a unique borrow (even after an attribute for weakening returned borrows to be shared is added)?

Edit: Assuming that the lifetime comes from a &'a mut _ parameter.

The compiler has to assume that both of these return a unique borrow:

fn from_mut_to_share<'a, T>(mut_: &'a mut T) -> Ref<'a, T> {
    Ref::Share(mut_)
}

fn from_mut_to_mut<'a, T>(mut_: &'a mut T) -> Ref<'a, T> {
    Ref::Mut(mut_)
}

#69

Oh, I see what you mean now: you want the compiler to magically know that because the arg is shared the output should be shared. That may work.

Also, I was thinking that you can use the Ref type as both the argument and the return type, so that should not be a problem.

fn get<'a,T>(_: Ref<'a, Lock<'a, T>>) -> Ref<'a,T> { ... }

Note: I’m on mobile rn, so it’s hard to code, so I left out the func def.


#70

Just to clarify, when I said unique/shared borrow I was talking about the ability to access the borrowed variable after calling a method/function returning a reference to it. A unique borrow disallows access to that variable until the reference (or any copy of it) is no longer used, and a shared borrow only allows using the borrowed variable with methods/functions taking shared references.

Taking the blog post example, this is a unique borrow, even though it returns a shared reference:

fn insert_then_get(&mut self, data: T) -> &T

Edit:
I think I know the reason why the current behavior can’t be changed; this compiles:

use std::marker::PhantomData;

// Takes a shared reference but returns a marker that claims a unique borrow.
fn empty_borrow<'a, T>(_: &'a T) -> PhantomData<&'a mut ()> {
    PhantomData
}

// Holds a raw pointer; the lifetime is tracked only through the marker.
struct PtrBorrow<'a, T> {
    ptr: *mut T,
    _borrow: PhantomData<&'a mut ()>,
}

fn ptr_borrow<'a, T>(ptr: &'a mut T) -> PtrBorrow<'a, T> {
    PtrBorrow {
        ptr: ptr as *mut _,
        _borrow: PhantomData,
    }
}

fn main() {
    let mut a = "hello".to_string();
    {
        let _empty_0 = empty_borrow(&a);
        let _empty_1 = empty_borrow(&a);
        let _empty_2 = empty_borrow(&a);
        println!("{:?},{:?},{:?}", _empty_0, _empty_1, _empty_2)
    }
    let _b = ptr_borrow(&mut a);

    // Can't call the function again for as long as `_b` is still used afterwards
    // let other_b = ptr_borrow(&mut a);
}

#71

The key point is this:

A function with a signature like fn(&'a mut self) -> &'a K says something like this:

  • You give me exclusive access to self for the lifetime 'a .
  • In exchange, I give you shared access to a K for the remainder of that lifetime.

This does not imply that you get shared access to anything else – and in particular, it may well be that the only reason you can have access to the K during 'a is because we know that nobody else is accessing self.

This is precisely what Mutex::get_mut relies on: it allows you to get access to the contents of the mutex during 'a without actually holding the lock. This only works because we know that we have an &mut reference to the lock, so nobody else can try to lock it in the meantime.

Basically, the point is: if we inferred from a signature like fn(&mut Foo) -> &Bar that shared access to the Foo is ok, that would be very limiting on unsafe code (it would be fine for safe code), and would in fact rule out existing APIs.
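To make the `Mutex::get_mut` case concrete, here is a minimal sketch. The function `bump` is a hypothetical helper; `Mutex::get_mut` itself is the standard library method.

```rust
use std::sync::Mutex;

// Has the shape `fn(&'a mut Foo) -> &'a Bar` discussed above.
fn bump(m: &mut Mutex<Vec<u32>>) -> &Vec<u32> {
    // No lock is taken here: holding `&mut Mutex<_>` already proves that
    // nobody else can be locking it right now.
    let v = m.get_mut().expect("mutex poisoned");
    v.push(1);
    v // returned as a shared reference, but still tied to the exclusive borrow of `m`
}
```

If the returned `&Vec<u32>` only kept `m` borrowed as shared, a caller could call `m.lock()` and mutate the contents while that reference was still alive, which is exactly what the current rule prevents.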


#72

Yeah, I independently realized the point about unsafe code (which does not use a &mut reference) in my previous comment (a while after asking the question).


#73

Would it be possible to work around the insert-then-get issue by using a signature fn(&'a mut self) -> (&'a Self, &'a T)? That is, return an immutable borrow of the container as well as the inserted element.
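A sketch of how such a signature could be implemented in today's Rust, assuming a hypothetical Vec-backed collection (the `items` field and the `len` method are made up for illustration):

```rust
// Hypothetical Vec-backed collection.
struct SomeCollection<T> {
    items: Vec<T>,
}

impl<T> SomeCollection<T> {
    fn insert_then_get(&mut self, data: T) -> (&Self, &T) {
        self.items.push(data);
        // Downgrade the exclusive borrow to a shared one, then derive both
        // results from that shared borrow.
        let this: &Self = self;
        let inserted = this.items.last().expect("an item was just pushed");
        (this, inserted)
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}
```

The caller still cannot touch the original variable while either result is alive, but it can keep doing shared operations through the returned `&Self`.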


#74

Free variables as a general, but extreme solution

[…] Obviously, this is a significant ergonomic regression.

I beg to differ. I don’t see why using a free function with some arguments should be considered “extreme”. God bless Rust for having free functions and not forcing us to shove absolutely everything inside a class like Java does. That’s a feature, not a bug. (And in this case, it solves a real problem.)

As to the “ergonomic regression”: well, it is one, but I wouldn’t even say “significant”. This problem is not that common, and pulling out a free function once or twice doesn’t seem that much of a burden. Sometimes solving a problem in a programming language consists of coming up with a different way of using existing language features (I wouldn’t even consider it “clever” or “hard”), instead of rushing to extend the language.


#75

I have this problem much more often. I wouldn’t have a problem with doing this once or twice with two or three references. But when implementing some algorithms, every other routine has this problem and requires many more parameters (because you have to thread through the parameters required for calling other functions). I find that such code becomes unmaintainable.

View structs help a bit, but I consider this to be a significant ergonomic regression from implementing the same algorithms in most other languages. But those are pretty much the algorithms where I consider it very difficult to not introduce accidental data races in other languages, which is the reason I still stick with Rust for this. Being able to express the code in an uncluttered way and having guaranteed data race freedom would be wonderful.


#76

Not with this particular example, as the fields of Graph shouldn’t be public in my example. There are many different ways to represent graphs and in some cases I want that to be an implementation detail. Or have code that is generic wrt. the Graph representation.


#77

The problem isn’t with the free functions themselves. The problem is that you need to borrow each individual field (both at the function definition and call site).

In other words, rather than this:

fn foo(&mut self) {
    // ...
}

self.foo();

You instead have this:

fn foo(bar: &mut Bar, qux: &mut Qux, corge: &mut Corge, foobar: &mut Foobar) {
    // ...
}

foo(&mut self.bar, &mut self.qux, &mut self.corge, &mut self.foobar);

I would indeed call that “significant” loss of ergonomics. And that was with just 4 simple fields, imagine with more fields (or more complex types for the fields)! And now imagine that foo is called multiple times (and you have to repeat the same &mut self.<field> borrow code each time). Those sorts of things do happen in real code.

Functional and OOP languages don’t have this problem because they have a GC, so borrowing isn’t an issue. So it doesn’t really have anything to do with “classes vs functions”: the exact same borrowing issues also happen with Rust functions (not just methods).

In other words, this is a significant ergonomics downside of Rust compared to both OOP and functional languages. And it’s an ergonomics downside that happens with really fundamental mechanisms in Rust (structs + disjoint field access + calling a method/function inside of a method/function).


#78

The assumption here – the whole reason you’re even bothering to write the second version – is that the first version is incorrect. There’s always going to be a cost to taking incorrect code and rewriting it. These examples express fundamentally different things, and the difference is expressed in the simplest and most obvious way. It’s not the shortest way, but there’s nothing unclear happening here.

Most of the inconvenience encountered here is a result of architectural decisions. Do you actually need to borrow all members? Do you need them all &mut? If not, then the example is exaggerated. If so, then you should take into account that all the shorter versions of this code are also all useful and express unique things. The fact that you need a lot of “bits” here is only indicative of the fact that you’re trying to encode a larger number.

The language does give you ways to shorten this stuff. (One of which is even implied by the use of self in this example.) Of course, to do it, you have to write more code, but that’s par for the course. Anything that is not a cookie-cutter standardized solution requires the user to write code.


#79

That is false: as described in Niko’s blog post, the first version is not incorrect, it is in fact completely correct.

It’s just that the compiler isn’t smart enough to realize it’s correct (but it is smart enough to realize it if everything is contained in a single function/method).

I never said it borrows all the fields: it only passes in the fields that the function (and any transitive functions) need.

And yes, in that example they all need to be &mut (otherwise I would have made them &). But it doesn’t matter, changing them to & doesn’t make it any more convenient.

I’m not sure what you’re talking about here: there aren’t any shorter versions, which is the whole point of this discussion.

No, it doesn’t, which is exactly the point of the blog post (and this discussion): you are forced to use the longer version, even though you should be able to use the shorter version.

I suggest you re-read the blog post, it sounds like you are misunderstanding what people’s complaints are about. The complaints are not from newbies who just don’t understand Rust, these are issues that affect even expert Rust programmers.


#80

You’d like it to be correct, but it currently is not. That’s why you have to use the second version: because it is the version that is currently flexible enough to express the behavior you want. (Unfortunately, it is often too flexible.)

Perhaps &self could be made to borrow only what is needed. This would allow the first version to be correct, but if that is being done automatically, care needs to be taken to ensure code which emulates borrows can still function properly. Plenty of code exists which assumes &self borrows the entire struct.

The shorter versions are the signatures that do not borrow the same number of fields, or do not borrow them mutably. Those signatures express a useful difference. The fact that you want to borrow 4 fields rather than 2 means you have to write twice as much of something. If you want all the marbles or none of the marbles, that’s fundamentally easier to express (requires less information) than if you want 2 black marbles and 3 red marbles. There will always be a cost in expressing the latter.

You are forced to use it internally. You are not forced to expose it to your users. Separate field borrows are strictly more flexible than &self borrows and you can recover intermediate levels of flexibility by composing the borrows however you like. E.g., fn split(&self) -> (&A, &B).
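A minimal sketch of that kind of composed borrow follows. `Pair` and its fields are hypothetical, and the `split_mut` variant is an addition for illustration, since the mutable case is where the whole-struct borrow actually gets in the way.

```rust
// Hypothetical struct with two independently useful parts.
struct Pair {
    a: Vec<u32>,
    b: String,
}

impl Pair {
    // One `&self` borrow handed out as two shared field borrows.
    fn split(&self) -> (&Vec<u32>, &String) {
        (&self.a, &self.b)
    }

    // The mutable counterpart: both halves can be mutated independently.
    fn split_mut(&mut self) -> (&mut Vec<u32>, &mut String) {
        (&mut self.a, &mut self.b)
    }
}
```

Callers never need to know the field names; they only see the two halves, so the representation can stay private.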

Of course, you have to write some boilerplate to get this level of specificity and control, but that’s what code is for.

May I request that you not assume that I haven’t been following the discussion or understand the core issue? And that you not put words in my mouth pertaining to presumptions of incompetence on the part of the people in this thread?

I’m simply trying to point out that ‘ergonomic cost’ doesn’t exist in a vacuum: if you want a language which enables flexibility through the use of composition of primitives, then it will necessarily become verbose when the number of composed primitives grows. If you want to express something as specific as the exact borrowing behaviors for subsets of an object, there is either going to be a syntactical cost to it, a redundancy, or a loss of flexibility. The fact that &self borrows all fields even when you don’t need them all is a perfect example of all three.