What is the formal definition of ownership?

Hello!

I'm very new to Rust, and I found myself confused about the notion of ownership for quite some time after reading the book up to chapter 10, and completing a bunch of basic exercises.

The book states that the 1st rule of ownership is:

By this rule, what a variable "owns" is some "value" (not "data").

Consider this example:

let arr: Vec<u8> = vec![48, 49, 50, 51, 52];

What does arr directly own? My best guess is that arr directly owns the value created by the value expression, which is a "vector header", i.e. an instance of the struct std::vec::Vec.

By this time my mental picture of ownership is all mushy. Instinctively I think there must be a notion of "ownership chain", which starts from the variable arr, down to the struct std::vec::Vec, ...... , all the way down to the actual "data", which in this case are five primitive scalar u8s.

With a clear definition of "ownership", we should be able to answer these questions:

  1. What is ownership? (what owns what?)
  2. What action will introduce ownership between a pair what and what?
  3. Does every variable own something? (The variable will own something if it has been subjected to an action described in question number 2)

Thanks in advance!

4 Likes

Rust doesn't have a specification at this time, so I don't have a formal spec to point you to. That said, thinking of ownership as a tree is a good way to think of it. Every value is owned by either another value, or a variable, and if you follow the chain, you will always eventually reach a variable.
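To make the tree picture a bit more concrete, here is a minimal sketch. The types Leaf and Branch and the function drop_order are made up for illustration; they are not from the book or the standard library. A variable owns a Branch, the Branch owns two Leaf values, and when the variable goes out of scope the whole tree is torn down from the root.

```rust
use std::cell::RefCell;

// Hypothetical types for illustration only.
struct Leaf<'a> {
    name: &'static str,
    // A borrowed log, so the order of drops can be observed.
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Leaf<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// The leaf fields are only held, never read, hence the allow.
#[allow(dead_code)]
struct Branch<'a> {
    left: Leaf<'a>,
    right: Leaf<'a>,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Branch<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("branch");
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        // The variable `_root` owns the Branch; the Branch owns both leaves.
        let _root = Branch {
            left: Leaf { name: "left", log: &log },
            right: Leaf { name: "right", log: &log },
            log: &log,
        };
    } // `_root` goes out of scope: Branch::drop runs, then its fields are dropped
    log.into_inner()
}

fn main() {
    // Prints ["branch", "left", "right"]
    println!("{:?}", drop_order());
}
```

Following the chain upward from either leaf reaches the Branch and then the variable, which is the "you will always eventually reach a variable" part.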

Unfortunately, there are some types that complicate this. For example, the Rc type introduces reference counting, where you have many handles to a single value. In some sense all of these own the value, and in another sense none of them does.

Regarding references and borrowing, you might be interested in A unique perspective and Stacked borrows.

5 Likes

Ownership can be thought of in different ways, I think. One of them definitely is as a variable owning data. In that case, you could say arr does own the Vec<u8> as well as all the u8s inside that Vec.

If you have a second variable like this, though, what does it own?

let arr2: &[u8] = &arr;

Just borrowing doesn't change arr's ownership, so I don't think arr2 owns the Vec<u8>. But it must own something, right?

It might make sense to say that arr2 here owns the reference, but not the data behind the reference.

But this can get even more confusing if you have something like an Arc<Vec<u8>>, which is "shared" ownership. No one variable storing an Arc can take out the data behind it, instead it exists as long as any Arc pointing to it exists. In this sense you might say that no variables own the Vec<u8> inside an Arc<Vec<u8>>, and that each variable storing an Arc only owns that Arc (but not the data behind it).
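A small sketch of that "nobody can take the data out while it is shared" behaviour, using Arc::strong_count and Arc::try_unwrap from the standard library. The function name shared_ownership is just something I picked for the example.

```rust
use std::sync::Arc;

// Returns (number of handles, whether taking the Vec out was refused, the Vec itself).
fn shared_ownership() -> (usize, bool, Vec<u8>) {
    let first = Arc::new(vec![48u8, 49, 50]);
    let second = Arc::clone(&first);
    let handles = Arc::strong_count(&first);

    // While `second` exists, `first` cannot take the Vec out of the Arc.
    // On failure, try_unwrap hands the Arc back inside Err.
    let attempt = Arc::try_unwrap(first);
    let refused = attempt.is_err();
    drop(attempt); // drops the handle that was handed back

    // `second` is now the last handle, so it can take the Vec out.
    let data = Arc::try_unwrap(second).expect("second is the only remaining handle");
    (handles, refused, data)
}

fn main() {
    let (handles, refused, data) = shared_ownership();
    println!("handles = {handles}, refused = {refused}, data = {data:?}");
}
```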

Regardless, I don't know if that's helpful. I'll answer your questions:

My best intuition for this is that ownership is a property of scope/functions. The variables in a scope own data, but it's often useful to think of the whole scope owning the data, since then some of the other properties might make more sense, like:

  • the owner of some data is the scope which is responsible for freeing that data
  • at the end of the data's owning scope, the data no longer exists
  • the owner is the scope which can move the data around, and relinquish ownership.

I think the only thing which introduces ownership is creating new data or new primitives. Using an integer primitive, or using struct construction syntax, these things create new data, and thus create ownership.

At that point on, ownership is really only ever transferred. It moves from one place to another. From that perspective, vec![...] creates something, and returns the data. Then let arr = (...) transfers ownership of that Vec into arr.

An incomplete list of things which transfer ownership:

  • let statements: let x = ...; transfers ownership of ... into x
  • assignment statements: x = ...; transfers ownership of ... into x (similarly, a.b = ...; transfers ownership into b)
  • function call parameters: f(a, b, c) transfers ownership of a, b, c into f
  • method calls with self receiver: x.method() transfers ownership of x into method if and only if method is declared as
    fn method(self, ...)
    
  • function call results: f(...) transfers ownership of the result from the function into wherever you use the result
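The list above can be sketched in one small program. Wrapper, total, make_bytes and transfer_chain are hypothetical names invented for this example; each step hands the same Vec to a new owner, and the previous name can no longer be used afterwards.

```rust
// Hypothetical wrapper type for illustration.
struct Wrapper {
    bytes: Vec<u8>,
}

impl Wrapper {
    // `self` by value: calling this moves the Wrapper into the method.
    fn into_bytes(self) -> Vec<u8> {
        self.bytes
    }
}

// The parameter takes ownership of the vector passed in.
fn total(values: Vec<u8>) -> u32 {
    values.iter().map(|&v| u32::from(v)).sum()
}

// The returned Vec is owned by whoever receives the result.
fn make_bytes() -> Vec<u8> {
    vec![48, 49, 50]
}

fn transfer_chain() -> (u32, usize) {
    let a = make_bytes(); // function call result: into `a`
    let b = a; // let statement: from `a` into `b`
    let w = Wrapper { bytes: b }; // struct construction: from `b` into `w.bytes`
    let c = w.into_bytes(); // self receiver: `w` into the method, result into `c`
    let len = c.len();
    let sum = total(c); // function call parameter: `c` into `total`
    (sum, len)
}

fn main() {
    let (sum, len) = transfer_chain();
    println!("sum = {sum}, len = {len}");
}
```

Trying to use a, b, w or c after the line that moves out of them is rejected by the compiler.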

Every initialized variable does, yes. It might not be much, but every variable owns something.

Some examples:

// a owns the `Vec<i32>` and the data inside it
let a = vec![1, 2, 3];
// b owns the reference `&Vec<i32>`, but not the Vec or the data inside it
let b = &a;
// c owns 6
let c = 6;
// d also now owns 6 (note: c still also owns a different 6, since i32 is Copy)
let d = c;
// e does not own anything, as it is uninitialized.
let e: u32;

That's... probably not the most helpful overview. This is mostly my thoughts on it, and my mental model - @alice's links are probably a better canonical source than this. Hope it gives you some stuff to think about, though!

2 Likes

and I can assure you, you are not alone 🙂

In order to get a clearer picture, it helped me to differentiate between what the programmer (i.e you) or the compiler 'sees' and what happens in the compiled binary.

When you are writing source code:

let arr: Vec<u8> = vec![48, 49, 50, 51, 52];

you are introducing a "variable binding", i.e. a name which you are using further on in your source code in order to refer to some content, i.e. a name that is bound to some memory location. Starting with the definition of a variable binding, the compiler 'knows' which content has to be accessed each time you are using the same name.
In the finished compiled binary (especially with all debug information stripped off) this variable name isn't available anymore. It is not necessary, because the compiler 'replaces' all variable names with the actual reference to the memory location where the content is stored (in your example the Vec structure).

Rust's ownership mechanism now just ensures that in your source code only one variable-binding at a time can refer to the same content in memory. I.e. with

let arr_new = arr;

simply speaking, the Rust compiler invalidates the former variable-binding 'arr' because now the only allowed access to the memory content is by using the name 'arr_new'. Any further usage of the old variable-binding 'arr' in your source code will lead to the well-known error messages of the Rust compiler.

Of course this ownership handling is applied recursively down, e.g. in the case of the Vec structure, the structure member 'buf' is the variable-binding for the memory region that contains the u8-scalar-values.

2 Likes

Ownership is a property that allows values to be moved (aka transfer of ownership) or borrowed (aka shared, aka "passed by reference"). Without either moving or borrowing, I would say, ownership doesn't really mean anything.

Let's go over some samples, so we are on the same page;

fn main() {
    let a = vec![1, 2, 3];
    let b = a;
    println!("{:?} {:?}", a, b);
}

In this example, we have two variables, a, and b. First, a is given ownership over a Vec<integer>, and the vector is given ownership over three integers. Second, ownership of the vector is transferred to b. Third, both a and b are printed. This does not compile, of course, because the vector was moved from a, which is now unusable. This demonstrates ownership with move semantics.


fn main() {
    let a = "Hello world";
    let b = a;
    println!("{:?} {:?}", a, b);
}

This is slightly different: now we have given a ownership over a &'static str, and copied it to b. This time the program builds successfully. The assignment on line 3 does not transfer ownership because shared references such as &'static str are Copy types. Types implementing Copy are cheap to copy around; in this case the value is just a pointer and a length. This demonstrates ownership with copy semantics.

Some types are expensive to duplicate, so instead of Copy they may implement Clone, which can run arbitrary code and typically allocates new memory and copies the data into it. Cloning requires an explicit call, e.g. T::clone().
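As a rough sketch of what an explicit clone gives you (the function name clone_is_independent is mine, not anything standard): the clone of a Vec gets its own heap buffer, so both variables stay usable and changing one does not affect the other.

```rust
// Returns (the original, the modified clone, whether the two buffers are separate).
fn clone_is_independent() -> (Vec<u8>, Vec<u8>, bool) {
    let original = vec![1u8, 2, 3];
    // Explicit call: allocates a second buffer and copies the elements into it.
    let mut copy = original.clone();
    // Each Vec points at its own allocation.
    let separate = original.as_ptr() != copy.as_ptr();
    // Changing the clone leaves the original untouched.
    copy.push(4);
    (original, copy, separate)
}

fn main() {
    let (original, copy, separate) = clone_is_independent();
    println!("{original:?} {copy:?} separate buffers: {separate}");
}
```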


With move and Copy out of the way, there is still another use case to cover for ownership, and that is what most people will initially struggle with: borrowing. The borrowing rules are really simple, but I won't repeat them here. But I will show more code to demonstrate the rules.

// This is fine
fn foo() {
    let mut a = String::from("Hello world");
    a.push('!');
    let b = &a;
    println!("{:?} {:?}", a, b);
}

// This is not
fn bar() {
    let mut a = String::from("Hello world");
    let b = &a;
    a.push('!');
    println!("{:?} {:?}", a, b);
}

In function foo, we have mutated a by pushing a '!' character to the end of the string, followed by b borrowing a, and printing both. This is fine because no mutation is occurring after the borrow is taken.

In function bar, we have swapped the borrow and the push. In this case, the function fails to compile because it is in violation of the borrowing rules; namely that you cannot take an exclusive reference (mutable borrow) while a shared reference (immutable borrow) is alive. String::push() takes &mut self, which conflicts with b.

In both functions, there is only a single owner! a owns the String (and the string owns its chars, etc...)

This mental model of exclusive vs shared references is very fitting, much more than "mutable and immutable" anyway.


The last bit to cover, then, is shared ownership! The borrow rules can be too restrictive, in many cases. The standard library provides a means of sharing ownership in multiple places. This is done with smart pointers, which chapter 15 in The Book covers in more detail. Some smart pointers offer compile time guarantees (like the static borrow checker), and others only offer runtime guarantees.

use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let a = Rc::new(RefCell::new(String::from("Hello world")));
    let b = Rc::clone(&a);
    a.borrow_mut().push('!');
    b.borrow_mut().push('?');
    println!("{:?} {:?}", a, b);
}

Now this is an interesting example! We have a non-Copy type with shared ownership and interior mutability; both a and b share ownership of this value. This works because each variable owns a unique clone of a smart pointer that references the value. Both smart pointers can be used to mutate the interior value while ownership is shared. And both pointers can be used to print the value.

RefCell is a smart pointer that allows interior mutability by guaranteeing safety at runtime. With a minor adjustment, this can be made to panic, by violating the borrowing rules:

use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let a = Rc::new(RefCell::new(String::from("Hello world")));
    let b = Rc::clone(&a);
    let c = a.borrow_mut();
    let d = b.borrow_mut();
    println!("{:?} {:?} {:?} {:?}", a, b, c, d);
}

This example compiles, but panics at runtime on line 8, in which d attempts to take an exclusive reference to the inner value while c already has an exclusive reference.
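If you would rather observe the conflict than panic, RefCell also has try_borrow_mut, which returns an Err instead. A minimal sketch (the function name second_borrow_is_refused is made up for the example):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Returns (whether the second exclusive borrow was refused, the final text).
fn second_borrow_is_refused() -> (bool, String) {
    let a = Rc::new(RefCell::new(String::from("Hello world")));
    let b = Rc::clone(&a);
    let refused;
    {
        // First exclusive borrow, alive until the end of this block.
        let mut first = a.borrow_mut();
        // A second exclusive borrow through the other handle is refused at runtime.
        refused = b.try_borrow_mut().is_err();
        first.push('!');
    }
    // The first borrow has ended, so borrowing through `b` works again.
    b.borrow_mut().push('?');
    let text = a.borrow().clone();
    (refused, text)
}

fn main() {
    let (refused, text) = second_borrow_is_refused();
    println!("refused = {refused}, text = {text:?}");
}
```

Both handles see the same String, which is why the push through a and the push through b end up in the same text.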


Well, ok, I guess I do have one final thought on the discussion of ownership, and it has to do with lifetimes. This was implied in previous posts by Drop. Non-Lexical Lifetimes made some problems with ownership much easier to reason about. E.g. this example compiles and runs just fine, even though I used an example almost identical earlier to demonstrate how the borrow on line 3 violates the borrowing rules:

fn main() {
    let mut a = String::from("Hello world");
    let b = &a;
    println!("{:?}", b);
    a.push('!');
    println!("{:?}", a);
}

In this case, it works because with NLL the borrow held by b ends right after b is last used (nothing is actually dropped, since a reference has no destructor). This allows the exclusive reference on line 5 to be valid.


I hope that answers your questions about "what is ownership". I think if I had to succinctly describe ownership, I would explain it as something like a complex interplay between move semantics, copy semantics, borrowing, and smart pointers. A value can only ever have a single owner. But ownership can be transferred, some values can be copied (some others may be cloned), almost all values can be borrowed, and in more tricky situations shared ownership can be employed with smart pointers. Together, all of these things culminate as memory safety in Rust.

8 Likes

You have probably been immersed in this for so long that you no longer notice that those two simple phrases are incredibly confusing.

In the 'move' example nothing actually moved. The thing we are interested in, the actual bits comprising the elements of the vector, did not move anywhere; they stay in the same position in memory. Which is an important consideration when there are a huge number of elements.

In the "copy" example the thing we are interested in, the actual bits of the string "Hello world", did not get copied anywhere.

Often a primary concern is what actually gets moved or copied, one does not want to waste time memcpying bytes around. But "move" and "copy" here don't really talk about actual data movement.
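This can be checked by comparing addresses. A minimal sketch, with function names I made up: after a move the new binding points at the same heap buffer, and after copying a &str both variables point at the same text.

```rust
// After `let arr_new = arr;` the elements are still at the same heap address.
fn move_keeps_buffer() -> bool {
    let arr: Vec<u8> = vec![48, 49, 50, 51, 52];
    let before = arr.as_ptr(); // raw address of the heap buffer, not a borrow
    let arr_new = arr; // ownership moves; only the small Vec header is copied
    before == arr_new.as_ptr()
}

// Copying a &str duplicates the (pointer, length) pair, not the text itself.
fn copy_shares_text() -> bool {
    let a = "Hello world";
    let b = a;
    a.as_ptr() == b.as_ptr() && a.len() == b.len()
}

fn main() {
    println!("move keeps buffer: {}", move_keeps_buffer());
    println!("copy shares text: {}", copy_shares_text());
}
```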

If I understand C++ correctly it also has "move" semantics which mean "does not move".

Of course this has been confusing since time immemorial. Computers generally have instructions to move data from register to register or register to memory. Often they are called "move" and have mnemonics like "MOV". Of course they don't move, they copy. Or clone if you like.

3 Likes

Ownership moved. This is perhaps a subtle detail, but it's really the "aha! moment" of move semantics in Rust. Ownership is purely conceptual, it is not something you can see in a disassembler.

The pointer is copied.

I was very careful with the terminology I used, can we try to stick with that? Moves do not necessarily copy anything. You are probably thinking about Copy vs Clone; Copy should never use memcpy since it's usually reserved for pointer-sized types (e.g. they fit in one or two registers), but Clone does memcpy (which may be optimized to a series of SIMD instructions).

2 Likes

Yes, exactly. I was just pointing out that on first reading for a beginner this "meta" meaning is often not immediately obvious.

Yes. But my point is: Who cares? When I'm writing my program my primary concern is likely to be how much copying of huge arrays and structures is going on. The copying of that data is what matters not the implementation details of pointers.

Also any C programmer used to memcpy, a copy function that actually reads and writes lots of actual data, might rightly expect a language feature called "Copy" to do what it says: read and write lots of data.

Certainly, it's terminology that is used everywhere. It would only cause even more confusion to try and fix it everywhere now.

Fixing it would require "Copy" to be renamed as something more descriptive of what it does, "DupPtr" or some such. "Clone" would have to be renamed to "Copy" so as to indicate what it does, actually copy data around.

Clearly we are not going there.

¯\_(ツ)_/¯ Are we just talking semantics? I have no interest in bike shedding terminology. My intention is to inform and educate. I apologize that the examples I provided were confusing. And I understand that your response had good intentions to clear it up. I fear that it may have muddied the waters more than necessary.

There's an old saying, "forget everything you know about..." I highly suggest anyone coming to Rust from another language to give this an honest try.

When you see Copy but you hear memcpy, that's a bias that needs to be destroyed. We say Copy is cheap. If you want to do something expensive like memcpy a bunch of stuff around memory, you have to go out of your way to do it with Clone.

1 Like

I did not mean to bash on your explanation. Actually I think it's pretty good. Just pointing out a potential difficulty people have.

Certainly there are a lot of ideas and assumptions one carries over from past language experience to a new language, which don't apply or work out well. Mental models need adjusting.

However, given that moving data around is what most programs do most of the time, I don't consider thinking of memcpy or its equivalent mechanisms to be a bias. It's reality.

The word "copy" has a common meaning outside of programming. Is that concept of copy also a bias?

1 Like

Thanks for all the comprehensive explanations from different perspectives!

Both alice and daboross mentioned the complication introduced by the smart pointer types Rc<T> and Arc<T>, which made me realize that perhaps I'm not yet "qualified" enough to raise such a question, having only read the book up to chapter 10. I'll try to digest as much as possible, while taking more time to thoroughly study the book and do many more exercises.

Thank you very much! I think your description of the chain effectively broadened the concept of ownership by saying that values can also be owned by other values.

By using the concept of ownership chain, we can explain why we say String is an "owned" type, because there exists a chain from the String object all the way down to the underlying byte array, without going through any reference.
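A small sketch of that distinction, with hypothetical types Person and PersonRef: the first holds a String, so the chain down to the bytes passes through no reference and the value can be returned out of the function that built it; the second only refers to text that something else owns.

```rust
// Owned: Person -> String -> heap bytes, with no reference anywhere in the chain.
struct Person {
    name: String,
}

// Borrowed: the struct only refers to text owned elsewhere.
struct PersonRef<'a> {
    name: &'a str,
}

fn make_person() -> Person {
    let local = String::from("Ferris");
    // Ownership of the text moves into the struct, so it can outlive `local`'s scope.
    Person { name: local }
}

fn name_length(p: &PersonRef<'_>) -> usize {
    p.name.len()
}

fn owned_and_borrowed() -> (String, usize) {
    let owner = make_person();
    // `view` borrows from `owner`; it must not outlive it.
    let view = PersonRef { name: &owner.name };
    let len = name_length(&view);
    // The borrow has ended, so the String can be moved out of `owner`.
    (owner.name, len)
}

fn main() {
    let (name, len) = owned_and_borrowed();
    println!("{name} has {len} bytes");
}
```

A function like make_person could not return a PersonRef pointing at its local String, because the owner of the bytes would be gone by the time the caller looked at them.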

This is a very helpful concept! Currently I have little actual experience but I can intuitively feel that thinking in terms of the relationship between a scope and its associated owned data is beneficial to resource management, since as you described, an owned value will be dropped when its owning scope ends.

Thanks! I think I've found the "proof" of your claim from the Rust reference in the chapter describing assignment operators.

The sentence "...either copies or moves its right-hand operand to its left-hand operand." basically says that the LHS receives ownership either from a duplicated new value, or from another existing value.

Huge thanks for joining the forum and giving me the reply. I found your comment extremely helpful in terms of helping me establish another view into the ownership system by seeing the transfer of ownership simply as a re-establishment of a name binding.

Thank you very much for such a comprehensive analysis! I think after seeing all the different perspectives from alice, daboross and rust-less, your write-up presented me with the big picture that clears up my understanding significantly.

Also, don't worry about the wording of "move". I think I've understood the point from the conversation between you and @ZiCog: moving ownership is very much like moving a file in the filesystem, where the actual bits on the disk are not moved at all.

6 Likes

Don't worry about being qualified to ask questions! We're here to help, so if you have any questions or doubts, please ask.

This is a really good analogy!

5 Likes

The posts above pretty thoroughly cover what "ownership" means within Rust, but it may also be worth mentioning a little about the history and other languages.

AFAIK the modern notion of ownership originated in C and/or C++ as part of developing abstractions around dynamic memory allocation. Basically, C's malloc() would allocate some memory and return a pointer to it, with the understanding that free() would later be called exactly once on that pointer when the memory was no longer needed.

Naturally, there were many cases where a C library returned a pointer or accepted pointers as arguments where it was unclear whether the library would call free() or the caller was expected to call free(). The earliest notion of "having ownership" was basically "being responsible for calling free()", and only applied to pointers to heap memory.

Since C++ introduced destructors (similar to Rust's drop()), it became straightforward and idiomatic for a user-defined C++ type to "own" a pointer, and use its destructor to ensure it was consistently free()d exactly once. Wrapper types which did nothing but this came to be called "smart pointers". It then quickly became obvious that C++ copy constructors (similar to Rust's clone()) are only semantically correct if they perform deep copies / fresh reallocations of anything behind pointers, so that at the end of the copy there would still be "only one owner" for each allocation (look up "rule of three/five/zero" for more detail).

But deep copies are expensive and often unnecessary, so people started working on patterns for transferring ownership without a deep copy, and the term "move" caught on. The importance of copy vs move semantics turned out to be a very general and fundamental concept for all types, not just heap-allocated ones. So now with the benefit of hindsight, Rust "just" integrated all of those semantics into its core language so it can guarantee that you don't double-free anything by accident, even things on the stack that won't literally call free(), but may do some other important thing like close a file or unlock a mutex.
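As a small illustration of the "unlock a mutex" case, here is a sketch using std::sync::Mutex (the function name lock_is_released_by_drop is made up). The guard returned by lock() lives on the stack and owns the lock; when the guard goes out of scope, its destructor releases the lock, with no heap free() involved.

```rust
use std::sync::Mutex;

// Returns (lock was held while the guard was alive, lock was free after the scope).
fn lock_is_released_by_drop() -> (bool, bool) {
    let m = Mutex::new(0u32);
    let held_while_guard_alive;
    {
        // The guard owns the lock for as long as it is in scope.
        let mut guard = m.lock().unwrap();
        *guard += 1;
        // try_lock does not block; it fails because the lock is already held.
        held_while_guard_alive = m.try_lock().is_err();
    } // guard dropped here: the mutex is unlocked automatically
    let free_after_scope = m.try_lock().is_ok();
    (held_while_guard_alive, free_after_scope)
}

fn main() {
    let (held, free) = lock_is_released_by_drop();
    println!("held while guard alive: {held}, free after scope: {free}");
}
```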

(and this is only what I'm aware of; I'm sure other languages contributed to the "ownership" concept in ways I haven't heard of)

12 Likes

A little snippet of high level language design history I found once was about FORTRAN.

In the original FORTRAN all parameters were passed by value. So basically values were copied in to the arguments of the functions being called. In this scheme ownership is not a problem.

Then they came up with the idea of pass by reference. Now a function could modify a value in the caller through the passed reference.

This threw up an interesting ownership issue. You see in FORTRAN all those hard coded constants you spread around your program were not compiled into the code where they are used, as immediate values or whatever. Rather they were all placed in a common area called the "literal pool". So, if you used the numeric value of PI a hundred times in your code it would only appear once in the binary, in the literal pool.

Well, the issue was that you could now pass a reference to a constant like PI to a function, and the function could then change the value of PI through that reference. Which meant that all uses of PI in the program now used a different value!

Oops.

This is at least a fine example of how wedging all kinds of language features together can have unexpected and bad consequences.

6 Likes

If anyone thinks @ZiCog's example of changing the value of PI is made up, I managed to change the value of 3 to be 4 by that mechanism so that 3+3=8. Talk about a tough bug to track down.

6 Likes

If anyone should doubt what I said about FORTRAN the story is told in Jack Crenshaw's famous "Let's Build a Compiler" paper: https://compilers.iecc.com/crenshaw/tutorfinal.pdf

Jack's paper comprises everything I know about building a compiler. It's not much, but it did enable me to define a C/Pascal-like block structured language with statements, conditionals, loops, functions and data types (signed and unsigned bytes, words and longs), and to write a compiler for it that generated code for x86 and the Propeller micro-controller from Parallax Inc.

It was not optimized efficient code but I was over the moon that I could do that at all!

At least I learned that even defining a coherent programming language is hard, never mind writing a compiler for it.

Hmm... perhaps I should revisit my compiler effort and try writing it in Rust to generate WASM...

1 Like

I'm trying to answer your questions by reading the RustBelt formalization https://jhjourdan.mketjh.fr/pdf/jourdan2018rustbelt.pdf.

  • Jung, Ralf, et al. "RustBelt: Securing the Foundations of the Rust Programming Language." 2018.

Please pardon me that I didn't read the paper fully, and didn't understand it fully. Please correct me if my interpretation is wrong.

Also, note that this is not the only formalism possible for the Rust language. Different formalisms are possible.

  1. What is ownership? (what owns what?)

The RustBelt paper defined own relation between types and values, within a context.

Each type defines its own relation. For example, a product type (A, B) owns the value ab if and only if there are values a and b such that ab is the concatenation of a and b, and A owns a, B owns b.

This definition doesn't directly mention the nested structure of ownerships, but inference rules can be used to reason about the structure.

Thus, the answer to the question is: a type owns the set of possible values of the type.

  3. Does every variable own something? (The variable will own something if it has been subjected to an action described in question number 2)

The RustBelt paper uses a MIR-like model where local variables are explicitly tracked (allocation and deallocation) and are represented by a value of an owned pointer type.

The own relation definition of this type is as follows: an owned pointer own_T type owns a pointer value p if and only if p is a valid pointer to a location with value v such that T owns the value v.

Thus, when the stack location corresponding to a local variable is allocated and holds a valid value of type T, the local variable of type T, which is represented as a pointer to that stack location, owns the value.

  2. What action will introduce ownership between a pair what and what?

The own relation is defined for each type and thus the answer is dependent on the type.

For the owned pointer types (including local variables), ownership is introduced when the pointed location is allocated. When the pointed value is modified, ownership to the previous value is destroyed, and the new value is now owned (by the owned pointer type).

3 Likes

Huge thanks for pointing me to this!

I have no CS background and am hugely uneducated when it comes to formal PL theory. (Which made me realize that as an uneducated man I should probably ask less and silently learn more.) However, your interpretation is very helpful!

This sounds to me like a general depiction of type theory, for example a uint8 type "owns" the set of possible values [0, 255], which I've failed to relate to Rust's ownership model.

Thanks for the low-level explanation!

I can't really help with the OP's question here, but I do want to share an observation. After being immersed in Rust for a few weeks now, I find that in order to understand ownership you just have to fail at it spectacularly over and over again, write code that makes sense and have it not compile and then try to do what the compiler tells you, and when it tells you you need a ref you add '&' and then it says, no you don't want a borrow there you want a move, and yell at it "that's what I had before" and then walk away for a bit and come back and rewrite the whole function to be more like the code from that livestream you watched and it goes from 20 lines down to 3 and actually works. And the more you do this, little things gradually go "click, click, click" like the tumbler pins on a safe you're trying to crack. I don't think there's any way to force understanding here, you just have to drill the concepts over and over while attempting to apply them.

6 Likes

Is that really the language used in such formalism? Because to a layman it makes no sense.

It says that u8 owns all the values 0..=255

It says that u16 owns all the values 0..=65,535

But wait, they can't both own the values 0..=255

In my naive mind a type is defined by the set of possible values an object of the type can represent. That is not exclusive "ownership", other types can include those values as well.

Can I assume your use of the word "own" there is different than it is when we talk of the Rust ownership model? There the actual type is not important; we are concerned with the number of ways a value can be referenced by name or through reference. It's about aliasing of values, whatever type they may be.

1 Like