How to properly use the dereferencing mechanism?

Hello, everyone :wave:

I've just started learning Rust, and I'm having trouble understanding some of the content in Chapter 4 of the Rust Book. My problem is with referencing and dereferencing rather than with moves or borrowing.

The full example is below and also on a Playground.

To me, as a Rust beginner, a &String is a reference that can be used in a read-only fashion. A reference lives on the stack. Dereferencing, on the other hand, is an operation that retrieves a value from the heap.

Question 1:
push_str is a method on String, so (*ref_to_string).push_str makes sense to me: we dereference (obtain the value from the heap) and modify it. Why, then, does ref_to_string.push_str also work? I picture this as invoking push_str on a pointer (the ptr, length, capacity struct), which does not have this method.

Question 2:
If (*ref_to_string).push_str works, why can't I obtain the value from the heap and assign it to a new variable q via let q = *ref_to_string?

Question 3:
Why does push_str (a method of the String type) work on both a reference (ref_to_string) and a String value, while addition does not work the same way on a reference to an int (ref_to_int) and on the int value itself?

fn main() {
    fn take_string_ref(ref_to_string: &mut String) {
        // Question 1
        ref_to_string.push_str("without dereferencing |");
        // above is syntax sugar for String::push_str(ref_to_string, "without dereferencing |");

        (*ref_to_string).push_str("| with dereferencing");

        // Question 2
        let s = ref_to_string.to_owned();
        // let q = *ref_to_string; // <- cannot do a dereference
    }

    fn take_int_ref(ref_to_int: &mut i32) {
        // Question 3
        // ref_to_int += 1;
        *ref_to_int += 1;
        println!("-> {}", ref_to_int);
        println!("-> {}", *ref_to_int);
    }

    let mut x = 5;
    let y = &mut x;

    let mut some_string = String::from("");

    take_int_ref(y);
    take_string_ref(&mut some_string);

    println!("value: {}", some_string);
}

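Question 1. Generally, the dot operator will dereference the receiver as many times as needed, and add at most one & or &mut on top, to match the signature of the method. So, for example, this also works (provided the parameter is declared as mut ref_to_string, because the innermost &mut borrows the binding itself):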

(&mut &mut &mut ref_to_string).push_str("without dereferencing |");
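
A minimal sketch of the call forms discussed for Question 1, assuming nothing beyond the standard library. The function names (append_three_ways, through_nested_refs, demo_method_calls) are made up for this illustration and do not appear in the original example.

```rust
/// Appends to the same String through three equivalent call forms.
fn append_three_ways(target: &mut String) {
    // Auto-deref: the compiler looks through `&mut String` and finds `String::push_str`.
    target.push_str("a");
    // Explicit dereference; the compiler re-borrows the place as `&mut *target` for the call.
    (*target).push_str("b");
    // Fully qualified form with no method-call sugar at all.
    String::push_str(target, "c");
}

/// Calls `push_str` through extra layers of `&mut`.
/// The binding is `mut` because `&mut target` mutably borrows the binding itself.
fn through_nested_refs(mut target: &mut String) {
    (&mut &mut target).push_str("x");
}

/// Hypothetical helper that runs both functions so the result can be inspected.
fn demo_method_calls() -> String {
    let mut s = String::new();
    append_three_ways(&mut s);
    through_nested_refs(&mut s);
    s
}
```

Calling demo_method_calls() from main and printing the result shows that all four calls modified the same String.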

Question 2. When you write (*ref_to_string).push_str, the compiler finds the push_str method, which takes an &mut String, and automatically adds an &mut so the signature matches, turning the call into (&mut *ref_to_string).push_str. Because the reference you just dereferenced away is added back, the String is only mutably borrowed by this operation, never moved.

By contrast, let q = *ref_to_string would give q the type String, i.e. q would be a (ptr, length, capacity) struct that owns the data pointed at by ptr. The variable behind ref_to_string already owns that allocation, and the compiler refuses to create two variables that own the same memory, since that would lead to a double free once both String values are dropped. This is why the line is rejected with a "cannot move out of `*ref_to_string` which is behind a mutable reference" error.

To get an owned copy of the data, you must explicitly clone it by calling the clone method. Rust never performs expensive clones automatically.
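
As a rough sketch of the options here: copy_out clones as described above, while take_out uses std::mem::take, which is not mentioned in this thread but is a standard-library way to move the contents out by leaving an empty String behind. Both function names are hypothetical.

```rust
/// Returns an independent copy; the original String stays owned by the caller.
fn copy_out(source: &mut String) -> String {
    // let q = *source; // error[E0507]: cannot move out of `*source`
    source.clone()
}

/// Moves the contents out without copying, leaving `String::default()` (empty) behind.
fn take_out(source: &mut String) -> String {
    std::mem::take(source)
}
```

The clone allocates a second buffer, which is why Rust makes you ask for it. std::mem::take avoids the copy, but the original String is empty afterwards.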

Question 3. Unlike the dot operator, the += operator does not perform any automatic dereferencing, so you must write the right number of dereferences yourself so that the left-hand side is an i32 and not, e.g., an &mut i32.
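
A small sketch of the difference, using a hypothetical function named increment. It also shows why both println! lines in the question print the same thing: formatting goes through the Display trait, which the standard library implements for references to Display types.

```rust
/// Increments the integer behind the reference and returns both formatted views of it.
fn increment(counter: &mut i32) -> String {
    // counter += 1; // does not compile: `+=` is not defined with `&mut i32` on the left
    *counter += 1;
    // `counter` and `*counter` format identically, because `&mut i32` implements `Display`
    // by forwarding to `i32`.
    format!("{} {}", counter, *counter)
}
```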


Thank you for the fast and in-depth answers!

I would like to dig into a few things further.

Question(s) 4:

Generally the dot operator will add a reference or dereference as many times as it takes to match the signature of the function

I will split this into a few separate questions:
4.1. Is it specific to the dot operator only?
4.2. Is it "built-in" behaviour, or is it based on a trait implementation and thus something that can be shared with other operators / structs?
4.3. Is it somehow related to "pointing to the stack" vs "pointing to the heap" (i32 vs String)?
4.4. Is it safe to implicitly inject a reference / dereference into the code without the programmer's knowledge?

4.1. Yes. Well, actually, the auto-dereferencing part of the story also applies in coercions, as I mention below.

4.2. Yep

4.3. Nope

4.4. Interesting question. I guess the Rust programming language has decided that it is considered "safe". Automatic referencing in particular is super common: you wouldn't want to always have to write (&mut x).foo() if x is some struct with a foo(&mut self) method. On the other hand, it makes it a bit harder to keep track of whether a variable directly contains a struct or just a reference to it. Also, dereferencing can call arbitrary code through the Deref trait. This plays into your question about whether automatic referencing and dereferencing is built into the language: the dereferencing logic itself can be customized with the traits Deref and DerefMut.

Then again, common Deref implementations don't do much more than actually dereference a pointer/reference. And then there's the type system: most of the time there is only one way to insert referencing/dereferencing operations so that the result type-checks anyway, so the implicitly injected code does "what the developer wants" and is usually inexpensive and free of side effects.
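
To make the Deref/DerefMut point concrete, here is a minimal sketch with a hypothetical Wrapper type invented for this example. Its Deref implementation does nothing more than hand out a reference to the inner String, which is typical.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical wrapper type; it has no methods of its own.
struct Wrapper {
    inner: String,
}

impl Deref for Wrapper {
    type Target = String;
    fn deref(&self) -> &String {
        &self.inner
    }
}

impl DerefMut for Wrapper {
    fn deref_mut(&mut self) -> &mut String {
        &mut self.inner
    }
}

/// Calls String methods directly on a Wrapper and returns the result.
fn demo_custom_deref() -> (String, usize) {
    let mut w = Wrapper { inner: String::from("abc") };
    // Wrapper has no `push_str`, so the compiler inserts `deref_mut` and calls `String::push_str`.
    w.push_str("def");
    // `len` takes `&self`, so the shared `deref` is used here.
    let len = w.len();
    (w.inner, len)
}
```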


Coercions also play into the auto-deref discussion. Note how you can also write x.push_str(&some_string) where some_string: String, even though push_str expects a &str.
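
A short sketch of that coercion, with a hypothetical function append_all. The parameter is deliberately typed &String (idiomatic code would take &str) so that the coercion is visible at the call to push_str.

```rust
/// Appends `extra` three times, once per equivalent way of producing a `&str`.
fn append_all(target: &mut String, extra: &String) {
    // `push_str` expects `&str`; `&String` coerces to it because `String: Deref<Target = str>`.
    target.push_str(extra);
    // The same conversion written explicitly with a method.
    target.push_str(extra.as_str());
    // And spelled out by hand: `*extra` is a `String`, `**extra` is a `str`.
    target.push_str(&**extra);
}
```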

Feel free to read through the details in the Rust Reference.

In particular:

Method call expressions - The Rust Reference

and

Type coercions - The Rust Reference


I think it's perfectly fine. The compiler's borrow checker will tell you if you mess up the references.

There's one situation, however, when you absolutely do want to know your exact levels of references: in unsafe code interacting with raw pointers. So in unsafe, it's probably best to ascribe full types at least to raw pointers, and likely to references which are going to be turned into raw pointers as well. Utility methods implemented on only a single, specific type, like [T]::as_ptr() help a lot, too.
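
A rough sketch of what ascribing full types can look like, assuming two hypothetical helper functions. The annotations on ptr are not required by the compiler; they are there so the level of indirection is visible to the reader.

```rust
/// Increments through a raw pointer derived from an exclusive reference.
fn bump_through_raw_pointer(value: &mut i32) -> i32 {
    // The annotation makes the `&mut i32` to `*mut i32` coercion explicit.
    let ptr: *mut i32 = value;
    // SAFETY: `ptr` comes from a valid, exclusive reference that outlives this block.
    unsafe {
        *ptr += 1;
        *ptr
    }
}

/// Reads the first element of a slice through a raw pointer, if there is one.
fn first_element(items: &[i32]) -> Option<i32> {
    // `as_ptr` is defined on slices, so it cannot accidentally be called on the wrong layer.
    let ptr: *const i32 = items.as_ptr();
    if items.is_empty() {
        None
    } else {
        // SAFETY: the slice is non-empty, so `ptr` points at a valid first element.
        Some(unsafe { *ptr })
    }
}
```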


Also note that String itself is a kind of smart pointer with a size known at compile time, so the String value itself (pointer, length, capacity) usually lives on the stack. Dereferencing a &String therefore doesn't necessarily obtain a value from the heap; you need to dereference one more time to reach the actual string data (of type str) that lives on the heap. That data is usually not useful unless you access it behind a pointer, because its size is not known at compile time.
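
The two layers can be sketched like this, with a hypothetical function peel_layers. It avoids asserting a specific size for String and only shows that the size is a compile-time constant, independent of how long the text is.

```rust
/// Returns the compile-time size of the String handle and a view of the heap data.
fn peel_layers(text: &String) -> (usize, &str) {
    // Usable in a `const`, so the size of the handle is known at compile time.
    const HANDLE_SIZE: usize = std::mem::size_of::<String>();
    // One dereference reaches the `String` handle (pointer, length, capacity).
    let handle: &String = &*text;
    // A second dereference reaches `str`, which is unsized and only usable behind a pointer.
    let data: &str = &**handle;
    (HANDLE_SIZE, data)
}
```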

For beginners who read this in the future:

I've also found a description of automatic referencing and dereferencing in chapter 5.3 of The Rust Programming Language. It just comes a little later than the point where I first ran into references and methods called on referenced values.
