Why doesn't Rust just always borrow every variable and return ownership when done?

Is it a way to reduce memory? Some kind of safety check?

3 Likes

You're going to have to explain what you mean by this to get a meaningful response. In one sense, you've just described "reborrowing", which has been in the language forever. In another sense, this would prevent all Rust code ever written from passing the borrow checker ever again.

1 Like

Well sometimes you want to give away ownership. You should probably give an example in which you think Rust acts weirdly.

3 Likes

What I mean is why doesn't every variable passed into a function act as if it has a "&" before it? That way no one would have to think about whether or not they'll get that variable back.

1 Like

The basic answer is that this would require garbage collection, and Rust is not a garbage collected language.

1 Like

When a value is moved into a function, it's dropped at the end of that function. If all functions always borrowed their arguments, they might be alive for much longer, which increases memory usage.

Apart from that, borrowing only works if the borrow doesn't outlive the scope of the owner. That's why we can't return a temporary value from a function:

fn hello(name: &str) -> &str {
    &format!("Hello, {}", name)  // error!
}

This seems simple, but as soon as you have structs borrowing other structs borrowing other data, we need lots of lifetimes to satisfy the borrow checker, and we don't want that.

6 Likes

Yet another reason is that many functions need to "consume" one of their arguments to do what they're supposed to do at all (without implicit cloning or garbage collection or whatever). Most into_* functions are like this.

4 Likes

Knowing when "done" is for every value in an arbitrary program is very difficult problem.

It may seem easy at the first glance, but there are lots of complicated edge cases (see escape analysis problem).

Rust solves finding "when done" by limiting where variables can be borrowed, basically forbidding you from writing difficult cases of the problem, so that it can statically discover places in the program where values are "done".

Other languages that allow borrowing any variable anywhere and free them when done, either use reference counting (knowing that done is when refcount is 0) or use a mark-and-sweep garbage collector and periodically search all of program's data for objects that are no longer used.

Rust doesn't want the cost of garbage collection, so it doesn't "just" allow programs to borrow data in an arbitrary complicated ways that would require garbage collection.

8 Likes

In addition to what others have said, some useful programming patterns in Rust rely on the value not being available anymore after it was passed-by-value to a function.

For example Type Level state machines (ie using a type rather than a value to represent a state in a state machine) build on that and would not be nearly as useful without it.

3 Likes

@Aloso
I agree that's the best argument I've heard for borrow checking.. needs a lot of memory that otherwise could be freed once the function returns. Was thinking that probably still needs this memory anyway.. but no, you're right that could be freed.

@jjpe and @Ixrec
Variables being consumed or otherwise available seems to me like it could be solved with the 'mut' more than borrow?

@kornel
I know that escape analysis is tricky, though actually think I have an algorithm that solves this. There is one out there that tries to do 'compile time reference counting', I think a better solution is to create a graph of all function calls that is tree like and starting from the functions that don't call any other function, work your way up. Just trying to make sure I fully understand the problem then will put up a forum post explaining the idea to see if it makes sense.

If I'm understanding any of this: you're well on your way to reinventing the borrow checker :slight_smile:

One of the biggest reasons why Box/Rc/Arc exist is to deal with objects whose lifetimes do not fit into a neat tree structure like this, e.g. because cycles are possible or because threads are involved.

This is also why everyone tells you not to put references in your structs if you can avoid it: that effectively forces the structs to have strictly "tree-shaped use" where you create it, pass it to some functions, then destroy it all within a single lexical scope.

Unfortunately no, since type level state machines rely on the type being affine i.e. they rely on a value of that type being capable of transitioning exactly once to another state (which is encoded as another type). With a mutable borrow this cannot be guaranteed, nor can a value of the new state be constructed without copying, cloning or swapping with some default value; But by passing the value itself the exactly-used-once property becomes inescapable.

1 Like

If you squint, this is basically how "move semantics" work in C++. Types that can be moved out of have an empty state, and when you move out of them you have to put the moved-from object in the empty state (nulling out pointers, etc.) so that calling the destructor won't cause any conflicts with the moved value. This is quite different from Rust where moving from an object doesn't require doing anything special (drop flags are related, but they are stored on the stack frame where the object is owned, not with the type).

C++ move semantics are clunky because they were bolted on a quarter century late. Rust's design is a lot cleaner. Of course, you can write code like C++ in Rust, too: wrap everything in Options and use &mut Option<T> instead of just T. But why would you?

7 Likes

What if one of you need some kind of runtime loadable extensions / plugins? You cannot analyse at compile-time what they do with data that is passed to them.

Also, how do you know when to deallocate a piece of memory? Do you compile that into your function based on static analysis results?
Then what about other dynamic usage patterns? Sometimes your function may be called with a reference that is not used anymore and can be deallocated after your last use in your function. Another time, it is still referenced.
Do you compile the function multiple times, for each case?

That will get very messy for often-used functions in library crates.

Libraries can be compiled to know that these are the functions in their API that can be called and these variables need to be borrowed, those can have ownership consumed and it's automatically compiled into the library's function calls.

That's the beauty of building it from the bottom up. It would not be possible to do this if you built it starting from the top.

@Ixrec
Why wouldn't Box work this way? And why not just say the entire pool of memory can be used and stick them on the stack instead of the heap.. I still haven't seen an example where Box is needed. But even assuming there is something with threads allocating something that can't be allocated before the thread is called, why can't Box still work even with this model? Other compilation method could still throw an error in cases where box is needed.

Isn't this exactly how Rust works today? I think we've lost track of what language change you're trying to suggest.

This series of questions is so confused that some of them are literally answered by each other, so I'm really unsure what you're looking for here. Let's try this:

If you made a language where all ownership had to be strictly tree-like, then yes you could put everything on the stack and not have a heap at all and there'd be no need for Box/Rc/Arc. In such a language it would also be trivially impossible to support dynamically sized containers like Vec, recursive types like linked lists, or most typical uses of runtime polymorphism through trait objects (dyn Traits can be created on the stack, but that's rarely useful). The latter two examples usually require a Box.

This is also pretty much exactly what current Rust would be like if we removed all dynamic memory allocation from it. I suppose we could call that "MISRA Rust".

2 Likes

There are cases like:

if random() {
    return new();
} else {
    return borrowed_arg;
}

which make ownership of the return value dynamic and impossible to know at compile time. Collect these in a Vec and now your Vec's destructor can't statically know what to free and what not.

Graphs can have single ownership if they're trees, and require shared ownership if they're a DAG. You'll probably run into halting problem trying to figure out at compile time which one the code is building.

And there are function pointers and polymorphism which can make different functions called dynamically based on runtime conditions. You won't be able to allow different implementations have different kinds of ownership.

Java and Go have conservative escape analysis. If they can figure out which objects don't escape, they can keep them on stack. But when they can't, all bets are off, and they have to resort to GC.


Rust's borrow checker doesn't analyze function bodies. If you did whole-program analysis you could remove need for naming specific lifetimes.

You couldn't remove distinction between borrowed and owned values, unless you started pessimistically wrapping things in Cow or Rc in places where code is impossible to analyze. But that's going towards being a garbage collector, or something like Swift which starts with Rc on everything, and then optimizes refcounts out where possible.

2 Likes

Sorry, I was referring to automating borrow checking and removing the need to explicitly do it outside of lifetimes and Boxes.

haha

I don't see what makes this impossible?

borrowed_arg must be borrowed because it appears in the function and is used again by a function that called this function.

I'm not seeing the problem you are trying to point out?

And yes, I would treat all different polymorphisms of a function the same, not seeing a scenario where they wouldn't be? Still the same function.

Nah, a Dag seems to me like it would take roughly as long as the current borrow checker to run. Perhaps I'm not explaining it well. You could still analyze every function only once. I'll write something up specifically detailing the algorithm I'm seeing in a separate forum post. Would probably help this discussion.

let borrowed_arg = new();
for i in 0...10000 {
  vec.push(if random() {
    new()
  } else {
    borrowed_arg
  });
}

drop(vec); 
consume(borrowed_arg);

How does the Vec::drop is supposed to know which indices are borrowed or owned? This is a dynamic property now. This program created 10000 new bits of information about its types, which aren't in the source code. You can store these bits (creating a flavor of Rc or Cow), or you can deduce them from the data (creating a GC), but you can't compile a correct non-leaking drop without having this information.

Code could also do vec[random()] = vec[random()] a few times to make unpredictable number of self-references.


Another way to look at this problem is by analogy to static and dynamic types. Borrowed vs Owned are types. Rust in this way is a statically-owned language.

A language where you can "just" borrow things is dynamically owned. Figuring out ownership in such program is the same problem as finding types in a dynamically typed program.

In dynamically typed languages you can analyze the program, data flow to get some idea of the types. If you allow flexibility, you keep it dynamically typed, and have endless edge cases. If you're strict, you're inventing type inference, and have static typing strictness, but without explicit syntax for it.

13 Likes