Rust References - Few points - Request for comments

I request your comments on my notes on Rust. Please correct my terminology where needed.
This is also my attempt to create material for people who are new to programming.

In Rust, a variable is a tag to a meta-data on some data. The variable is said to own
the "data". To keep the language simple, we short-circuit the terms variable and meta-data
and simply say that the variable contains data (or the memory location of data).

Depending on the type, the data can be embedded in the variable, or the variable contains
pointers to the memory location where the data is actually stored. For primitive types, the data is
stored in the variable itself.

For a complex data type such as String or a complex struct, the data is stored on the heap and
the variable contains pointers to that data in heap memory.

In Rust, a reference or reference variable is a special variable which refers to
(loosely, we can say it points to) the actual variable.

For example,
let i : i32 = 5 ;
Here i is a variable and holds the value 5. &i represents a reference variable pointing to i.

let j = &i ; 

Here the reference &i, which refers to the actual variable i, is assigned to j. Hence j is
also a reference variable. This reference variable j also refers to the actual variable i.

In Rust, references can be copied, as Rust implements the Copy trait for references.

In C we have pointers. If x is a pointer variable, then to access the value it points to, one
needs to de-reference it using *, i.e. *x.

In Rust, I can use a reference variable like a regular variable. Consider the following program.

struct Point {

    x: i32,
    y: i32,
}


fn main()
{

    let p1 = Point { x:5, y:5};

    let r_square = r_square(&p1);

    println!("Distance squared  from origin is {}",r_square);
}


fn r_square(p : &Point) ->i32
{

    //Even though p is a reference, I don't use
    //any de-referencing operator to access  the
    //structure.
    let r2 = (p.x) * (p.x) + (p.y) * (p.y);
    r2
}

I passed a complex data type called Point (which is defined using struct) as a reference.
Inside the function that reference variable is called p.

I access the data elements of the structure through reference variable p , but I never use
any de-referencing operator.
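
As a small sketch of what the compiler accepts here (the function names r_square_dot and r_square_explicit are made up for this illustration), the dot form and a form with the de-reference written out by hand give the same result:

```rust
struct Point {
    x: i32,
    y: i32,
}

// Field access through the reference, with no explicit de-reference.
fn r_square_dot(p: &Point) -> i32 {
    p.x * p.x + p.y * p.y
}

// The same computation with the de-reference written out by hand.
fn r_square_explicit(p: &Point) -> i32 {
    (*p).x * (*p).x + (*p).y * (*p).y
}

fn main() {
    let p1 = Point { x: 5, y: 5 };
    assert_eq!(r_square_dot(&p1), r_square_explicit(&p1));
    println!("{}", r_square_dot(&p1)); // prints 50
}
```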

If someone comes from a C background, this is one of the differences. In C, the pointer
contains a memory address.

In Rust the reference variable points to the actual variable. We try not to say it is just a pointer.

We can access the value directly from the reference variable.

Another example, consider the following code

fn main()
{

    let mut s1 = String::from("Hello");

    let t1 = &mut s1;

    t1.push_str(" World");

    println!("{}", s1);

    s1.push_str(" Rust");
    println!("{}", s1);
}

Here s1 is of type String and is initialized to hold the value "Hello". &s1 is a reference variable (let us forget about mut for now), and it is assigned to t1 (remember the Copy trait for references).
Now t1 is a reference variable referring to s1.

I could use t1 as if it were an actual variable. In the same program, the string literal " Rust" is appended
to the actual variable. Note that the usage is similar for both the actual variable and the reference variable.

Still, a reference variable is a reference variable.

Consider similar examples for primitive types like i32.

fn main()
{

    let i : i32 = 5;

    let j = &i;

    let temp: i32 = j +j ;
    let temp1 : i32 = j * j;

    println!("{}",temp); 

    println!("{}",temp1); 
}

Here also, j is a reference variable and we constructed an expression where j is
treated as an actual variable and not as a reference variable.

The Rust compiler knows how to de-reference the reference variable automatically and perform the computation.
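
A small sketch to check this (the function names are hypothetical). The standard library implements + and * for &i32 operands, so the expression is accepted as written, and it gives the same result as de-referencing by hand:

```rust
// Arithmetic written directly on the references.
fn sum_via_refs(j: &i32) -> i32 {
    j + j
}

fn product_via_refs(j: &i32) -> i32 {
    j * j
}

// The same sum with the de-reference written out explicitly.
fn sum_explicit(j: &i32) -> i32 {
    *j + *j
}

fn main() {
    let i: i32 = 5;
    let j = &i;
    assert_eq!(sum_via_refs(j), sum_explicit(j));
    println!("{}", sum_via_refs(j)); // prints 10
    println!("{}", product_via_refs(j)); // prints 25
}
```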

Another example with struct

#[derive(Debug)]

struct Point {

    x: i32,
    y : i32,
}

fn main()
{

    let mut p1 = Point { x:5, y:5};

    println!("{:?}",p1);
    let p2 = & mut p1 ;



    p2.x = 10;
    p2.y = 5;

    println!("{:?}",p1);
}

Here too, p2 is a reference variable referring to the actual variable p1. But I can use the reference variable p2 as if it were the actual variable.

The point is that one cannot compare C pointers with Rust's reference variables from the
programming perspective. Rust doesn't call them pointers. They are called references or reference variables.

To understand more, consider the following code

fn main()
{

    let mut i : i32 = 5;


    let mut j = &i;
    // A separate variable is created
    // j = 10; // Not possible: type mismatch (j is a &i32, 10 is an i32)
    j = &10;


    println!("{}",i);
    println!("{}",j);
    println!("{}",&i);
}

In the above code j is a reference variable. j = 10 is an assignment statement. That is, we are trying to change the value of i to 10 through the reference variable j. But Rust
reports a type mismatch error, as we are trying to assign the value 10 (an i32) to the reference variable j.

The point is that you can't always treat a reference variable as an actual variable. The assignment i = 10 is okay, but Rust reports a type mismatch error for j = 10.

In the statement j = &10;, the value 10 is created in memory as an unnamed variable, and j is updated (as j is mutable) to hold a reference to it. So j now refers
to 10 and no longer refers to the variable i.
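
A minimal sketch of that re-pointing, using a made-up helper named rebind_reference. It shows that i keeps its value while j ends up referring to something else:

```rust
// Returns (value of i, value j refers to) after j has been re-pointed.
fn rebind_reference() -> (i32, i32) {
    let i: i32 = 5;
    let mut j = &i; // j refers to i
    assert_eq!(*j, 5);
    j = &10; // j now refers to a different value; i is untouched
    (i, *j)
}

fn main() {
    let (i, j_target) = rebind_reference();
    println!("{} {}", i, j_target); // prints "5 10"
}
```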

The question now arises in our minds: how come the struct example worked when we changed the values through a reference? Recall the statement in that example,

p2.x = 10 ; where p2 is a reference variable.

It is subtle to note that p2 is a reference variable but p2.x is of type i32, as
defined in the Point struct! That's the reason we could assign the value

p2.x =10;

Consider the following code.

fn main()
{

    let mut i : i32 = 5;

    println!("{}",i);

    let j = & mut i;
    // A separate variable is created

    *j = 10; //De-reference operator


    println!("{}",i);
  
}

In this code, the value of i is changed from 5 to 10 through the reference variable j.
The * is a de-reference operator (similar to C). When I use the de-reference operator,
the compiler always refers to the actual variable the reference points to and treats it like the actual
variable, even for assignment (provided the types match).
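
The same idea can be sketched with a function that receives the mutable reference (set_to_ten is a hypothetical name chosen for this example):

```rust
// Writes through the mutable reference using the de-reference operator.
fn set_to_ten(n: &mut i32) {
    *n = 10;
}

fn main() {
    let mut i: i32 = 5;
    set_to_ten(&mut i);
    println!("{}", i); // prints 10
}
```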

Consider another example code with struct.

#[derive(Debug)]

struct Point {

    x: i32,
    y : i32,
}

fn main()
{

    let mut p1 = Point { x:5, y:5};

    println!("{:?}",p1);

    let p2 = & mut p1 ;

    // p2 = Point { x:10, y:15}; // Type mismatch: p2 is a &mut Point, not a Point
    *p2 = Point { x:10, y:15};

    println!("{:?}",p1);
}

Here p2 is a reference variable to the struct p1. So *p2 is like p1. Its type is the same as the type of p1, which is Point.

In a nutshell, in Rust, a reference variable can be used like an actual variable. However, if we want assignment through a reference variable then we have to use the de-reference operator as illustrated above.

This is in contrast with C pointers, where a pointer variable always has to be de-referenced.

TL;DR. Maybe later.

However, if I were new to programming, the opening sentence "In Rust, a variable is a tag to a meta-data on some data." would have floored me immediately and it's unlikely I would have read further. Or at least that would have been the case when I was introduced to programming, with BASIC, at age 16.

If this is intended for those new to programming, you need to remove all mention of C, or any other language; they won't know what you are talking about.

You have used terms like "pointer", "struct", "heap" and "trait" without defining them first. Those new to programming won't know what you are talking about.

References are not pointers

They happen to be implemented using pointers on machine level, the same way C pointers happen to be implemented using integers on machine level. But pointers aren't integers. References aren't pointers.

  • Box<i32> is also a pointer. Its in-memory representation is identical to &i32. From C perspective, both pass data "by reference" exactly in the same way, and both would even work if passed to a C int* function.
  • Box<str> and &str are structs containing data pointer and a length. They can't be cast to a pointer.
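
A rough way to observe this, assuming a made-up helper named words that reports a type's size in machine words. The two-word size of &str and Box<str> is what current compilers produce; treat it as an observation rather than a promise:

```rust
use std::mem::size_of;

// Hypothetical helper: size of T measured in machine words.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // Thin pointers: a single address.
    assert_eq!(words::<&i32>(), 1);
    assert_eq!(words::<Box<i32>>(), 1);
    // Data pointer plus a length.
    assert_eq!(words::<&str>(), 2);
    assert_eq!(words::<Box<str>>(), 2);
}
```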

Only shared borrows (&) are copyable. Exclusive borrows (&mut) are not (Rust does what it calls "reborrowing" to make them copyable-ish, but it's intentionally very limited).
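
A small sketch of that difference (the function names are hypothetical):

```rust
// Shared references are Copy: both copies stay usable.
fn sum_through_copies(x: &i32) -> i32 {
    let a = x;
    let b = a; // copies the reference; a is still valid
    *a + *b
}

// Exclusive references are not Copy, but can be reborrowed for a while.
fn bump_twice(m: &mut i32) {
    {
        let n = &mut *m; // explicit reborrow; m is unusable while n is live
        *n += 1;
    }
    *m += 1; // m is usable again once the reborrow has ended
}

fn main() {
    let x = 5;
    assert_eq!(sum_through_copies(&x), 10);

    let mut y = 0;
    bump_twice(&mut y);
    assert_eq!(y, 2);
}
```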

References are about borrowing. They are about borrowing values, not variables, e.g. you can borrow a field of a struct. Some types even "fake" borrow things with zero-sized types like PhantomData just to enforce scopes or exclusive access to a shared resource.
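
For instance, borrowing a single field could look like the sketch below. Point and add_y_into_x are names invented for this example:

```rust
struct Point {
    x: i32,
    y: i32,
}

// Borrows the field `x` exclusively while the field `y` is borrowed shared.
// The borrows are of two values inside the struct, not of a variable.
fn add_y_into_x(p: &mut Point) {
    let x = &mut p.x;
    let y = &p.y;
    *x += *y;
}

fn main() {
    let mut p = Point { x: 1, y: 2 };
    add_y_into_x(&mut p);
    assert_eq!(p.x, 3);
    assert_eq!(p.y, 2);
}
```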

Borrowing is a completely different angle that has no direct counterpart in C. If you try to think about it by analogy to C pointers, you're going to struggle with the borrow checker forever.

Not sure what qualifies as "complex" here, but structs don't automatically imply heap allocation.

No, that's not how it works. A Rust reference is a pointer (albeit with extra constraints on uniqueness and lifetimes). It's just that the member access (dot) operator performs automatic dereferencing when the left-hand side is a reference type, so if T has a field f, then t.f works for both t: T and t: &T.

Therefore, the following is also incorrect:

because assignment through a reference-to-mutable also works with the dot operator:

struct S {
    f: i32
}

fn main() {
    let mut s = S { f: 0 };
    let r = &mut s;
    r.f = 1; // this compiles, and now s.f is 1
    assert_eq!(s.f, 1);
}

They are also about indirection though. That's crucial. It's perfectly fine to think of a reference as a (smart) pointer, because it's just what it is.

I think you're leaning too heavily on the "specialness" of references and explaining everything as if "references" and "data" are different things.

People do this in C, too, and they say things like "C is pass-by-value for primitive types, and pass-by-reference for pointers." But this is more complicated than the truth, which is just that C is pass-by-value for everything (except arguably arrays), and pointers are values just like everything else. Of course passing a pointer-to-X by value is passing X by reference; that's what a pointer is.

Rust is the same way. In my opinion, drawing a dichotomy like this

is rather missing the point, and making something appear to be special when in fact it isn't. There's no language-level distinction here.

A variable contains data. That data might be of type i32, or it might be of type Box<i32>, or &'a i32, etc. You can take that data out of the variable, and put it in another variable, or inside an array or a struct or something, but it's still an i32 or Box<i32> or &'a i32. That's why Rust allows you to make vectors of references and still checks that the vector doesn't outlive any of the contained references.
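
A minimal sketch of a vector of references, assuming a hypothetical helper named longest. The lifetime 'a ties the returned reference to the strings the vector's elements point at:

```rust
// Returns the longest string among the references in the slice.
fn longest<'a>(words: &[&'a str]) -> &'a str {
    let mut best = "";
    for &w in words {
        if w.len() > best.len() {
            best = w;
        }
    }
    best
}

fn main() {
    let a = String::from("reference");
    let b = String::from("data");
    // The vector stores references; it is ordinary data of type Vec<&str>.
    let words: Vec<&str> = vec![a.as_str(), b.as_str()];
    assert_eq!(longest(&words), "reference");
    // Dropping `a` at this point while `words` is still used afterwards
    // would be rejected by the borrow checker.
}
```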

If you go down the route of thinking "a variable owns data, except when the variable is a reference, in which case it references data" you're drawing an arbitrary distinction between "references" and "data" which doesn't exist in real life. References are data. Not recognizing this fact is likely to cause confusion when analyzing more complicated lifetime relationships, like variable-containing-reference-to-slice-containing-references-to-structs-containing-references, because your mental model doesn't scale.

There aren't "reference variables" and "actual variables". There are variables, which may contain references. Or not.

No, I think that's unhelpful framing, and it leads to people trying to build tree structures out of &.

Arc<T> is also an indirection. Box is also a smart pointer. Calling references these things is technically true, but fails to capture what is unique about them.

The closest analog is IMHO compile-time locks for shared read or exclusive write: you lock a data structure to ensure it doesn't go anywhere, and then you pass that lock's "handle" to let others operate on the data while it is locked.
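
A rough sketch of that lock analogy (total_len and push_one are made-up names): several shared "read locks" can coexist, while the exclusive "write lock" has to be the only live borrow.

```rust
// Two shared borrows of the same data at once: read-only access.
fn total_len(a: &[i32], b: &[i32]) -> usize {
    a.len() + b.len()
}

// One exclusive borrow: the callee may modify the data.
fn push_one(v: &mut Vec<i32>, value: i32) {
    v.push(value);
}

fn main() {
    let mut data = vec![1, 2, 3];
    assert_eq!(total_len(&data, &data), 6);

    push_one(&mut data, 4);
    assert_eq!(data.len(), 4);

    // The following would not compile, because the shared borrow `r`
    // is still in use when the exclusive borrow is requested:
    // let r = &data;
    // push_one(&mut data, 5);
    // println!("{}", r.len());
}
```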

My mother is a special someone to me: she's my mother, after all. She is also unique in her DNA and stuff.

But she is one of 8 billion people on the planet. She is a human being. She is a person. She doesn't cease to be so just because she has other, additional special meaning.

(For making a more technical point, I could invoke the Liskov substitution principle at this point, but now that would be unhelpful.)

Anyway my point is that not knowing that references are pointers also leads to confusion. I intentionally didn't assert that they are "just like in C" or anything like that. Pointers are a much more general concept in computing, they aren't necessarily associated with any particular language.

First, there's a distinction between what is true and technically correct, and what is useful:

A guy flying in a balloon is lost, and when flying over a village shouts to a villager below: "Where am I?" and the villager shouts back "You're in a balloon!"

And then, references are not pointers in exactly the same way pointers are not integers. They do fail the Liskov substitution principle. They support some pointer-like operations, but not all. Specifically, a C pointer can either own or borrow, and such pointers can be used interchangeably. References can't, on principle. You can't even say the other way round that a pointer is a kind of reference, because then you get the circle-ellipse problem.

Ideally I would like to remove those

Please don't imply that I'm trying to be pedantic. I'm not – I genuinely think that knowing why references are pointers is useful. I'd go as far as saying that it's necessary for building the right mental model. (Also, one can build a tree out of &Nodes, and I'm pretty sure I would do just that if I ever had the patience to build the AVR Rust toolchain and give it a shot on my ATmega328 board.)

I see this differently. To me, C pointers can't own, because C pointers don't have the ownership semantics aspect to their type. Do I get a type error in C if I try to move out from a value behind a pointer? No. Does a pointer I got from malloc() get automatically free()d when it goes out of scope? Also no. Do I get a compiler error when I copy a struct with a bunch of pointers which are supposed to be unique/"owning"? Still no. C doesn't even have the concepts of ownership or moving, so everything associated with these concepts is merely a question of convention.
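
For contrast, a minimal sketch of the Rust side, assuming a made-up Tracked type that counts how often it is dropped. The owning Box is moved once and cleaned up exactly once, with no explicit free:

```rust
use std::cell::Cell;

// Hypothetical type that records each drop in a shared counter.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns how many times a Tracked value was dropped.
fn drops_after_scope() -> u32 {
    let drops = Cell::new(0);
    {
        let boxed = Box::new(Tracked { drops: &drops });
        // Ownership moves; using `boxed` after this line would not compile.
        let _moved = boxed;
    } // `_moved` goes out of scope here and the allocation is freed
    drops.get()
}

fn main() {
    assert_eq!(drops_after_scope(), 1);
}
```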

Analogously, I'd say they can't borrow either – again, it's just not a thing in C, nothing enforces borrowing rules. So C pointers don't behave like a combination of owning and borrowing entities in Rust – they are simply completely ignorant of this dimension of typing and semantics.

To draw another analogy using human beings: I wouldn't call someone who's neither a doctor nor a lawyer an "expert in medicine and law," just because this person could perform both kinds of jobs equally badly.

What I mean to say is that Rust does auto de-referencing, so from the programmer's point of view
I can use t.f whether t : T or t : &T.

But I need to remember that t is a reference and I cannot do t = q where q : T.

So based on the discussion, can I conclude that
Rust references are "Pointers", or perhaps "Super Pointers", and that Rust does auto-de-referencing?

I'm not talking about mut or borrowing now.

You're using analogies of additive features (subclassing, being a doctor), but references are about restrictions (thus causing circle-ellipse problem in LSP). It's important what they don't allow compared to pointers.

I think this is-a pointer thing comes up because people really care about not copying, and this use-case overlaps with C's:

              Borrowed     Owned
              &i32         Box<i32>
By reference  &i32/int*
By value                   int

but equating that one case gives the wrong mental model for what is not a reference. In C "not pointer" implies direct value and copying. In Rust "not reference" implies owning, saying nothing about indirection or copying.

Saying references are pointers suggests they exist on the Y axis (by-ref vs by-val) in the table above. But references in Rust exist on the other axis of ownership (by-ref vs owned). It happens that the by-ref case looks very similar, but they're part of different concepts, on different axes.

That's a brilliant insight. The differences between Rust's references and C's pointers are multi-dimensional. Viewing those differences via projection into just one dimension is never going to provide sufficient understanding.

Thanks for your explanation. I guess I am getting the point.

yes,...

On reflection, let's not let this discussion go too far off the rails.

discussion about whether references are pointers

By the same token, saying references are not pointers suggests the opposite: that they are copies. I would rather say "references are not just pointers", or "references are different from pointers", rather than simply saying "references are not pointers", which to my mind is even more likely to cause confusion. After all, if I were in a balloon and I shouted down to a person on the ground, "Where am I?" I certainly wouldn't want the answer to come back "You're not six miles west of Abilene!"

(Particularly if the intended meaning was "You're actually six miles west of Abilene and one hundred meters off the ground".)

But that is twisting my words. I never asserted that Rust references don't have additional properties to them. Those are exactly why they are useful, of course.

I disagree with this. These are inherently interrelated. How does owning not say anything about indirection or copying? An owning value means you can only move it, i.e. copying is not allowed. (Except for the very special case of Copy, which is just a subtype of Clone, and now that is really an orthogonal dimension.) It also means that the value in question itself is not behind indirection.

You are using Box as a counterexample, but I don't get why it should be a counterexample: it works perfectly consistently with other types. When you say let b = Box::new(42);, the Box-typed value in the variable b itself is definitely not behind indirection. It contains the indirection, but that is a completely different thing.

Or do you mean that ownership doesn't imply that a type has or has not pointers embedded in it? That doesn't work out as a difference, either. I could just as well create a typedef struct { void *inner; } Box in C, and then say the exact same things about it: when given Box b = Box { .inner = ptr }, the variable b is a value without indirection, and it contains a level of indirection, and that level of indirection is supposed to be unique, and it is an error to copy it.

The difference between C and Rust is "only" that C lets you do the wrong thing with its pointers. A pretty big difference in usability, but it doesn't imply that the two pieces of code do different things when used correctly, nor that they require a different mental model when it comes to understanding memory layout.

But in the Rust book, the diagram implies that it is referring, and there is a pointer in the diagram. As you said, the important point is that references are also about not owning.
So is the dual view required?