Weirdness of '&' and '*' (for an old C programmer)

#1

First of all: to me, Rust is currently the most fascinating programming language I know of. It solves, in astonishingly innovative ways, several key problems I have been fighting against for 3 decades as a ‘real world programmer’.

But one thing that is quite opaque to an old C(xx) programmer is the semantics of the symbol &, especially:

  • when it is needed,
  • when it is optional,
  • when it has no effect at all,
  • when it has surprising effects.

Here is an example:

struct Vector {
    x: f64,
    y: f64,
}

fn length(p: &Vector) -> f64 {
    (p.x * p.x + p.y * p.y).sqrt()
}

impl Vector {
    fn length(&self) -> f64 {
        length(self)
    }
}

fn doit() {
    let p = Vector { x: 3., y: 4. };

    let l = length(&p); // & required, omitting yields compiler error

    let l = p.length(); // & NOT required
    let l = &p.length(); // but allowed
}

When calling the function length, the & is required; otherwise the compiler rejects the code.

When calling the method length, the & is completely optional. One may even write an arbitrary number of & characters without harm (or * characters, for dereferencing).

Another somewhat related feature that really surprises me: when you pass a reference to some object to a pattern-matching construct that contains variable bindings, then all these bindings are ‘magically’ converted to references. I desperately searched for an explanation of this behaviour in ‘The Book (Version 2)’, and finally found this little paragraph at the end of chapter 18:

Anyway, today’s Rust doesn’t work like this. If you try to match on something borrowed, then all of the bindings you create will attempt to borrow as well.

I ask myself: what does ‘attempt’ mean in this context? What happens if ‘attempting’ fails?
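A minimal sketch of what that paragraph describes, assuming Rust 2018 or later (the function names `name_len` and `name_len_explicit` are made up for illustration). Matching on a `&Option<String>` binds `name` as a `&String` instead of moving the `String` out; the second function spells out the older explicit form with `*` and `ref`, which is what the default binding mode effectively does for you. As far as I can tell, the upshot is that nothing is moved out of the borrowed value.

```rust
// Default binding modes ("match ergonomics") versus the explicit pre-2018 form.
fn name_len(opt: &Option<String>) -> usize {
    match opt {
        // `opt` is a reference, so `name` is bound by reference
        Some(name) => {
            let n: &String = name; // compiles: the binding really is a &String
            n.len()
        }
        None => 0,
    }
}

fn name_len_explicit(opt: &Option<String>) -> usize {
    // Explicit form: dereference the scrutinee and request a reference with `ref`
    match *opt {
        Some(ref name) => name.len(),
        None => 0,
    }
}

fn main() {
    let owner = Some(String::from("Ferris"));
    println!("{}", name_len(&owner));
    println!("{}", name_len_explicit(&owner));
    // `owner` was only borrowed by both matches, so it is still usable here
    println!("{:?}", owner);
}
```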

The big problem with Rust’s & (and *) for an old C(xx) programmer like me: in these ancient languages, it is absolutely clearly defined when you have to use some keyword or operator (&, *, ref, ->, etc.) to create a reference (or a pointer, which is semantically equivalent) or to dereference a reference. The &, * or ‘ref’ is NEVER optional, and you can NEVER add these operators arbitrarily without breaking the code.

By contrast, in Rust this area looks somewhat ‘fuzzy’ to me. Of course, this may stem from the fact that my brain still strongly associates ‘&’ with ‘pointer’, whereas in Rust ‘&’ actually stands for ‘borrowing’, which is a related, but in its depth different and perhaps more advanced, concept.

Anyway, my suggestion for an upcoming revised version of ‘The Book’ would be a chapter of its own about the actual semantics and usage of ‘&’ and ‘*’. This would help dinosaurs like me a lot to acquire a ‘Rustical’ sense of coding.

#2

Your confusion is understandable. I’ll try to give a bit more detail about what’s going on in each of those cases.

let l = length(&p); // & required, omitting yields compiler error

Hopefully this part doesn’t need too much explaining. The function takes a &Vector, so you need to borrow p when calling it.

let l = p.length(); // & NOT required

It’s true that you don’t see a & being used at the call site, but remember that the method’s signature fn length(&self) -> f64 states that the method borrows self immutably. You can also call methods in a fully qualified form, which for this method looks like this: let l = Vector::length(&p). As you can see, a reference is present in this expanded form even though you don’t see it in the “sugary” syntax that’s normally used.
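To make the hidden borrow visible, here is a small sketch (with `length` written as an inherent method, as in the original example) showing that the method-call form and the fully qualified form compute the same thing:

```rust
struct Vector {
    x: f64,
    y: f64,
}

impl Vector {
    fn length(&self) -> f64 {
        (self.x * self.x + self.y * self.y).sqrt()
    }
}

fn main() {
    let p = Vector { x: 3.0, y: 4.0 };
    let a = p.length();         // method-call syntax: the borrow of `p` is implicit
    let b = Vector::length(&p); // fully qualified syntax: the borrow is written out
    assert_eq!(a, b);
    println!("{}", a); // prints 5
}
```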

let l = &p.length(); // but allowed

Now this is where things get more interesting. When you call a method, Rust will implicitly dereference as many times as it needs to make the method call work. This is why you could also write this as let l = &&&&&&&&&&&&&&&&p.length() and the method call will still succeed. You could even do the same with the fully qualified syntax like so: let l = &&&&&&&&&&&&&&&&Vector::length(&p)

I forget where in the Book this implicit dereferencing stuff is discussed though or what the feature is even called. It’s probably something that should be called out explicitly somewhere if it isn’t already.

#3

Note: &p.length() is evaluated as &(p.length()). Still, method resolution also works fine for (&p).length() or (&&&&&&&p).length(). If you care to read about the internals, see:
https://rust-lang.github.io/rustc-guide/method-lookup.html
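A small sketch of that method-resolution behaviour, reusing the `Vector` type from the original post: the receiver is automatically dereferenced (and borrowed, where needed) until a matching `length(&self)` is found, so every one of these calls compiles.

```rust
struct Vector {
    x: f64,
    y: f64,
}

impl Vector {
    fn length(&self) -> f64 {
        (self.x * self.x + self.y * self.y).sqrt()
    }
}

fn main() {
    let p = Vector { x: 3.0, y: 4.0 };
    let rrr: &&&Vector = &&&p;

    assert_eq!(p.length(), 5.0);          // auto-borrow: behaves like (&p).length()
    assert_eq!((&p).length(), 5.0);       // receiver already has the exact type &Vector
    assert_eq!((&&&&&&&p).length(), 5.0); // auto-deref peels off the extra references
    assert_eq!(rrr.length(), 5.0);        // same for a variable of type &&&Vector
    assert_eq!((*&p).length(), 5.0);      // * and & cancel, then auto-borrow applies
    println!("all calls agree");
}
```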

#4

This post is great, but you’ve made one error:

That’s not what’s happening here. Here, l is a &f64. That is, the order of operations is:

let l = &(p.length()); // is this

let l = (&p).length(); // not this
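The precedence point can be checked by letting the compiler verify the types. This sketch (reusing `Vector` from the first post) only compiles because `&p.length()` really is a `&f64`:

```rust
struct Vector {
    x: f64,
    y: f64,
}

impl Vector {
    fn length(&self) -> f64 {
        (self.x * self.x + self.y * self.y).sqrt()
    }
}

fn main() {
    let p = Vector { x: 3.0, y: 4.0 };
    let l: &f64 = &p.length();  // parsed as &(p.length()): a reference to the f64 result
    let m: f64 = (&p).length(); // parentheses borrow the receiver; the result is a plain f64
    assert_eq!(*l, m);
    println!("{} {}", l, m); // prints "5 5"
}
```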

@megamac, it can be a bit of an adjustment, but honestly, in my day-to-day work, I just type stuff and let the compiler tell me if I need an & somewhere or not. The messages are pretty clear, and then you know what to do in a given place. Learning the exact rules is, of course, a good way to level up in understanding, and I don’t want to discourage you from doing so, but it’s better for trivia and less for “how do I get my job done”, IMHO.

I would recommend checking out https://stackoverflow.com/questions/28519997/what-are-rusts-exact-auto-dereferencing-rules, which gets into this topic in detail. The only thing that question is missing is the match stuff, for which you can find the rules-lawyer-y bits at https://doc.rust-lang.org/stable/reference/patterns.html#binding-modes. You might also enjoy https://github.com/rust-lang/rfcs/blob/master/text/2005-match-ergonomics.md, which covers the “why” of this design a bit better.

Hope that helps!

#5

Ah right, good catch

#6

💯

Especially when you’re comparing two nearly identical types (say, an array and a &Vec), just add & until the compiler stops complaining.
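As a sketch of that tip, assuming the comparison in question is between a `&Vec<i32>` and an array literal: as far as I know, the standard library has no `PartialEq` impl that compares a `&Vec<i32>` with a `[i32; 3]` directly, so both sides have to be brought to the same level of indirection by adding a `&` on one side or a `*` on the other.

```rust
fn main() {
    let v: Vec<i32> = vec![1, 2, 3];
    let r: &Vec<i32> = &v;

    // `r == [1, 2, 3]` is rejected: the two sides differ by one level of reference.
    // Adding `&` on the right compares &Vec<i32> with &[i32; 3] ...
    assert!(r == &[1, 2, 3]);
    // ... and adding `*` on the left compares Vec<i32> with [i32; 3].
    assert!(*r == [1, 2, 3]);
    println!("both comparisons hold");
}
```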

#7

No. References are pointers only in the sense that pointers are integers.

  • &str is a struct passed by value. It’s equivalent to struct { char *data; size_t len }.
  • Box<u8> passes a byte by reference, equivalent to unsigned char *.
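The layout claims above can be checked with `std::mem::size_of`. This sketch assumes a typical target where a data pointer has the same size as `usize`:

```rust
use std::mem::size_of;

fn main() {
    // &str is a "fat" reference: data pointer + length, like struct { char *data; size_t len; }
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    // &u8 and Box<u8> are both a single thin pointer; they differ in ownership, not layout
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
    assert_eq!(size_of::<Box<u8>>(), size_of::<usize>());
    println!("ok");
}
```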

In Rust the distinction is about ownership. Even though borrowing of most types compiles to pointers, if you try to use borrows as pointers, you’ll often end up with wrong semantics and fight the borrow checker.
The same as if you said about C that pointers are integers, and had a fight with the type checker about (void*)2 + (void*)2.

#8

Here’s another thought: there are some situations where an object can be auto-dereferenced. For example, if I have a struct:

struct Foo {
    data: Vec<u8>
}

and I hold a reference to it:

{
    let another_foo = Foo { data: Vec::new() };
    let my_foo = &another_foo;
}

then there are some instances where I can autoderef. Take the case with this impl for Foo:

impl Foo {
    pub fn byref(&self) -> Self {
        Foo { data: self.data.clone() }
    }
    pub fn bymut(&mut self) -> Self {
        self.data.push(1u8);
        Foo { data: self.data.clone() } 
    }
    pub fn byown(self) -> Self {
        //here we can invalidate `self` if we want. We own it now
        drop(self);
        Foo { data: Vec::new() }
    }
}
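To see which receivers each method accepts, here is a sketch that repeats `Foo` and its impl (the initial `vec![7]` contents and the variable names are arbitrary). The calls that would not compile are left in as comments:

```rust
struct Foo {
    data: Vec<u8>,
}

impl Foo {
    pub fn byref(&self) -> Self {
        Foo { data: self.data.clone() }
    }
    pub fn bymut(&mut self) -> Self {
        self.data.push(1u8);
        Foo { data: self.data.clone() }
    }
    pub fn byown(self) -> Self {
        drop(self); // we own `self`, so we may consume it
        Foo { data: Vec::new() }
    }
}

fn main() {
    let mut another_foo = Foo { data: vec![7] };

    {
        let my_foo = &another_foo;  // my_foo: &Foo
        let copy = my_foo.byref();  // fine: byref only needs a &Foo
        assert_eq!(copy.data, vec![7]);
        // my_foo.bymut(); // would not compile: bymut needs a &mut Foo, my_foo is only a &Foo
        // my_foo.byown(); // would not compile: cannot move the Foo out of a shared reference
    }

    let grown = another_foo.bymut(); // auto-borrows &mut another_foo
    assert_eq!(grown.data, vec![7, 1]);

    let fresh = another_foo.byown(); // moves another_foo; it cannot be used afterwards
    assert!(fresh.data.is_empty());
    println!("ok");
}
```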

Then here we see a few cases of interesting things going on with autoderefs:

In the case of byref:

We can access one of self's fields without needing to write something like this:

(*self).data

or this

self->data

why? Well, let’s first clear something up that might confuse. This:

fn bar(&self);

is actually this:

fn bar(self: &Self);

where Self is whatever the type is that is implementing this function. self just so happens to have some syntactic sugar attached. So with that out of the way, I’ll start by quoting the deref page:

If T implements Deref<Target = U>, and x is a value of type T, then:

  • In immutable contexts, *x on non-pointer types is equivalent to *Deref::deref(&x).
  • Values of type &T are coerced to values of type &U
  • T implicitly implements all the (immutable) methods of the type U .

Therefore something like what we do in byref works: &T implements Deref<Target = T>, and field access automatically dereferences as many times as needed, so we can access T's (in this case Foo's) fields directly, without explicit dereferencing.
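The quoted rules also apply to your own types. In this sketch, `Labeled` is a hypothetical wrapper (not from the original post) that implements `Deref<Target = Vector>` for the `Vector` type from the first post; method calls, field access, and the `&Labeled` to `&Vector` coercion all go through `deref`:

```rust
use std::ops::Deref;

struct Vector {
    x: f64,
    y: f64,
}

impl Vector {
    fn length(&self) -> f64 {
        (self.x * self.x + self.y * self.y).sqrt()
    }
}

// Hypothetical wrapper, only for illustration
struct Labeled {
    label: &'static str,
    inner: Vector,
}

impl Deref for Labeled {
    type Target = Vector;
    fn deref(&self) -> &Vector {
        &self.inner
    }
}

fn takes_vector(v: &Vector) -> f64 {
    v.length()
}

fn main() {
    let labeled = Labeled { label: "p", inner: Vector { x: 3.0, y: 4.0 } };
    assert_eq!(labeled.length(), 5.0);      // method lookup finds Vector::length via Deref
    assert_eq!(labeled.x, 3.0);             // field access also goes through Deref
    assert_eq!(takes_vector(&labeled), 5.0); // deref coercion: &Labeled becomes &Vector
    println!("{} has length {}", labeled.label, labeled.length()); // prints "p has length 5"
}
```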

In the case of bymut:

The case of bymut is pretty similar. It happens to be written like this:

fn bymut(&mut self) -> Self { /* ... */ }

and desugars to this:

fn bymut(self: &mut Foo) -> Foo { /* ... */ }

So one thing to keep in mind is that, superficially, anything you can do with a &T (here &Foo) you can also do with a &mut T (here &mut Foo). The mut part means that you essentially own it for a moment: you can mutate it and have full, exclusive access to it. Note that this does not copy the value; instead, it ensures that nothing can run into this situation (which the compiler rejects):

struct Container<'a> {
    data: &'a usize,
}
impl<'a> Container<'a> {
    pub fn print(&self) {
        println!("{}", self.data);
    }
}
fn main() {
    let mut mydata = 32usize;
    let cont = Container { data: &mydata };
    cont.print(); // Or Container::print(&cont)
    // The above will output "32"
    mydata = 3; // error: cannot assign to `mydata` because it is borrowed
    cont.print(); // Uh-oh: the borrow is still in use here, so Rust rejects the program
}

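The problem is that while you hold a &T, you are guaranteed that the value behind it will not change (barring unsafe code or interior mutability such as Cell), so the compiler refuses the assignment to mydata while cont still borrows it and is used afterwards.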
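For contrast, here is a sketch that the compiler accepts, assuming Rust 2018 or later with non-lexical lifetimes: the borrow held by `cont` ends at its last use, so `mydata` may be assigned afterwards. The `value` method and the `before_and_after` function are made up for illustration.

```rust
struct Container<'a> {
    data: &'a usize,
}

impl<'a> Container<'a> {
    fn value(&self) -> usize {
        *self.data // explicit deref to copy the usize out of the reference
    }
}

fn before_and_after() -> (usize, usize) {
    let mut mydata = 32usize;
    let cont = Container { data: &mydata };
    let before = cont.value(); // last use of `cont`, so its borrow of `mydata` ends here
    mydata = 3;                // accepted: nothing borrows `mydata` any more
    (before, mydata)
}

fn main() {
    println!("{:?}", before_and_after()); // prints (32, 3)
}
```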

In the case of byown:

In this case, we take literal ownership of self:

fn byown(self) {}
//Desugars into
fn byown(self: Foo) {}

Meaning that now we have full ownership of self and can do whatever we want with it. The actual data of Self is now stored in the stack space of byown (for this explanation; compilers have their magic dust that they can use to change this), so the data doesn't live elsewhere and we don't have to worry about breaking the rule I described under bymut. In fact, we can consume it, meaning that it gets dropped/deallocated/deleted, roughly the equivalent of this:

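#include <stdlib.h>
struct Foo { unsigned char *data; size_t len; size_t cap; };
void drop_foo(struct Foo self) {
    free(self.data); /* free the heap buffer that the Vec owned */
}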

(If I understood that correctly from a few minutes of googling.) This may well be wrong; I’m no C dev, and I have no experience designing a functional (not the paradigm) program.
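In Rust, the closest thing to that explicit free is the Drop machinery, which the compiler invokes for you when an owned value goes out of scope. The sketch below adds a hypothetical `Drop` impl and a global `DROPS` counter (both only for illustration, not part of the original code) to show that a `byown`-style method consumes `self` and drops it before returning:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter of dropped Foo values, only for illustration
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Foo {
    data: Vec<u8>,
}

impl Drop for Foo {
    fn drop(&mut self) {
        // Runs automatically when a Foo goes out of scope; the Vec's buffer is freed afterwards
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

impl Foo {
    fn byown(self) -> usize {
        self.data.len()
        // `self` is owned by this call, so it is dropped when the call returns
    }
}

fn main() {
    let f = Foo { data: vec![1, 2, 3] };
    let before = DROPS.load(Ordering::SeqCst);
    let len = f.byown(); // `f` is moved into byown and dropped there
    let after = DROPS.load(Ordering::SeqCst);
    // println!("{}", f.data.len()); // would not compile: `f` was moved
    println!("len = {}, drops during byown = {}", len, after - before); // len = 3, drops during byown = 1
}
```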


Anyway, I hope that that cleared some things up for you!

#9

Thanks again, folks, for reading and answering my newbie questions!

closed #10

This topic was automatically closed after 34 hours. New replies are no longer allowed.