Tricks to avoid?: cannot return value referencing local variable

Anytime I see the subject error, I know it's because a function is trying to return a reference to data it created itself, and that data is dropped (its memory freed) as soon as it goes out of scope.

So how does this work? Isn't it technically returning a reference to data the local function owns?

```rust
fn rem_first_and_last(value: &str) -> &str {
    let mut chars = value.chars();
    chars.next();
    chars.next_back();
    chars.as_str()
}
```

Why don't higher-level string functions work this way in Rust? Why can't `replace` return `&str` instead of `String`? My new obsession is avoiding `String`, because I like the pureness of knowing how memory is being stored and what the performance implications are. Is there merit to my assertions, or does this all really not matter that much?

In your example, the underlying string is owned by the caller.
You are working only on the slice, not the underlying data.

Good question! The reason this can return a `&str` is that the returned value is a substring of `value`, so the returned value can point into the same storage that `value` points into.
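To see this concretely, here is a small sketch (the `main` and its pointer arithmetic are only for illustration) that calls the question's function on a caller-owned `String` and checks that the returned slice starts one byte into the caller's buffer. In other words, it points into the same storage rather than into anything the function created.

```rust
fn rem_first_and_last(value: &str) -> &str {
    let mut chars = value.chars();
    chars.next();
    chars.next_back();
    chars.as_str()
}

fn main() {
    let owned = String::from("ABCDE"); // the caller owns the bytes
    let inner = rem_first_and_last(&owned);

    // Address arithmetic: `inner` starts one byte after `owned`,
    // because 'A' is a single byte in UTF-8.
    let offset = inner.as_ptr() as usize - owned.as_ptr() as usize;
    println!("{inner:?} starts at byte offset {offset}");
    assert_eq!(inner, "BCD");
    assert_eq!(offset, 1);
}
```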

This is different from replace because after replacing stuff in the string, you get some new string data that does not appear in the input string, so you cannot just point to a substring of the input.
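Because `str::replace` has to produce text that isn't in the input, it returns an owned `String`. If the goal is just to avoid allocating when nothing actually changes, one common pattern is to return `std::borrow::Cow<str>`, which holds either a borrow or an owned `String`. A minimal sketch; `replace_if_needed` is a hypothetical helper, not something std provides:

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of std): borrows the input when there is
/// nothing to replace, and only allocates a new `String` when it must.
fn replace_if_needed<'a>(input: &'a str, from: &str, to: &str) -> Cow<'a, str> {
    if input.contains(from) {
        // The result contains text that exists nowhere in `input`: must be owned.
        Cow::Owned(input.replace(from, to))
    } else {
        // Nothing changes, so a borrow of the caller's data is enough.
        Cow::Borrowed(input)
    }
}

fn main() {
    let unchanged = replace_if_needed("hello world", "xyz", "abc");
    let changed = replace_if_needed("hello world", "world", "there");

    assert!(matches!(unchanged, Cow::Borrowed(_)));
    assert!(matches!(changed, Cow::Owned(_)));
    assert_eq!(changed, "hello there");
}
```

The caller still pays for an allocation whenever a replacement happens; `Cow` only removes it for the no-op case.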

4 Likes

If your example code didn't work, the following would be invalid, too:

```rust
fn identity(x: &str) -> &str {
    x
}
```

After all, you are returning a local variable, right?

The point is, it's not about whether a particular variable is local or not. The lifetime associated with a reference is not the lifetime of the variable that happens to store the reference itself. Instead, it is a lifetime for which the referred-to value is guaranteed to remain valid (the value may well live longer).
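A minimal sketch of `identity` with the elided lifetime written out: the returned reference borrows the caller's data for `'a`, so it doesn't matter that it passes through a local variable. The commented-out `make_owned` is a hypothetical counterexample that hits the same class of error as the title (E0515), because the `String` it references really is owned by the function.

```rust
// `identity` with the lifetime that elision infers written out explicitly.
fn identity<'a>(x: &'a str) -> &'a str {
    // `local` is a local variable, but the str it refers to is borrowed
    // from the caller for `'a`, so returning it is allowed.
    let local: &'a str = x;
    local
}

// By contrast, this would NOT compile (error E0515): the String is owned
// by the function and dropped when it returns.
//
// fn make_owned(x: &str) -> &str {
//     let s = x.to_uppercase();
//     &s
// }

fn main() {
    let caller_owned = String::from("hello");
    let r = identity(&caller_owned);
    assert_eq!(r, "hello");
    // Same address: no copy of the string data was made.
    assert_eq!(r.as_ptr(), caller_owned.as_ptr());
}
```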

You can tell this by looking at the signature of Chars::as_str():

```rust
impl<'a> Chars<'a> {
    pub fn as_str(&self) -> &'a str { … }
}
```

You can see that the lifetime of the returned string is explicitly annotated `'a`, and it is not tied to the lifetime of the `&self` borrow of the iterator. Now if you also look up the signature of `str::chars()`:

```rust
pub fn chars(&self) -> Chars<'_>
```

then you can see that – by lifetime elision – the lifetime parameter that occurs in Chars<'a> is tied to the lifetime of the underlying string slice.
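Putting those two signatures together, here is the question's function with every lifetime spelled out. This is a sketch for illustration only, since elision infers the same signature; the explicit `drop(chars)` shows that the returned slice doesn't depend on the local iterator at all.

```rust
use std::str::Chars;

// All lifetimes written out; elision would infer exactly this signature.
fn rem_first_and_last<'a>(value: &'a str) -> &'a str {
    // `str::chars` ties the iterator's lifetime parameter to `value`'s data.
    let mut chars: Chars<'a> = value.chars();
    chars.next();
    chars.next_back();
    // `Chars::as_str(&self) -> &'a str`: the result borrows the original
    // str for `'a`, not the local `chars`, so it may outlive `chars`.
    let rest: &'a str = chars.as_str();
    drop(chars); // the iterator is gone, but `rest` is still valid
    rest
}

fn main() {
    assert_eq!(rem_first_and_last("ABCDE"), "BCD");
    assert_eq!(rem_first_and_last("A"), "");
}
```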

6 Likes

Let's walk through how this function works to see how we're not returning a reference to locally created data.

rem_first_and_last takes a &str, which looks something like...

```text
value: &str          (n.b. these are UTF-8 bytes, not actually chars)
+---+---+                     +---+---+---+---+---+
| 5 | s | --- pointer to ---> |'A'|'B'|'C'|'D'|'E'|
+---+---+                     +---+---+---+---+---+
```

The 5 is the length and the s is a pointer to your string, which is a bunch of bytes somewhere in memory. This function doesn't own the str (on the right); it only has a reference to it (value). You can freely copy value around. When you create the iterator, it stores its own copy of this reference, too. The str data is still owned by something outside of this function.
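A small sketch of the "length + pointer" picture above, using only std: a `&str` is two machine words, and copying it copies those words, not the string bytes. `parts` is a hypothetical helper used only to print and compare the two words.

```rust
use std::mem::size_of;

/// Hypothetical helper: the two words that make up a `&str`.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let owned = String::from("ABCDE");
    let value: &str = &owned;

    // A `&str` is a "fat pointer": a data pointer plus a length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    println!("{:?}", parts(value));

    // Copying the reference copies only those two words, never the bytes.
    let copy = value;
    assert_eq!(parts(copy), parts(value));
    assert_eq!(parts(copy).1, 5);
}
```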

When you call next on the iterator, it figures out what the next char is, decodes it from UTF-8 into a char, returns it, and "chops it off" the beginning of its copy of the &str.

```text
+---+---+                         +---+---+---+---+
| 4 |s+1| --- pointer to -------> |'B'|'C'|'D'|'E'|
+---+---+                         +---+---+---+---+
```

It changed the length and pointed a little further into the str, but it didn't change anything on the right. Then, when you call next_back, it figures out the last char in a similar manner and just shortens the length:

```text
+---+---+                         +---+---+---+
| 3 |s+1| --- pointer to -------> |'B'|'C'|'D'|
+---+---+                         +---+---+---+
```

The other parts of the str are still there in memory, but they're no longer covered by this &str. No part of the str was modified, and the only new thing created was the copy of value. (The iterator did alter the length and pointer in its copy of value, but not the str itself, i.e. not the characters "ABCDE".)
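The three diagrams can be checked with a short sketch that records the (offset, length) of the iterator's remaining slice after each step. `offset_in` is a hypothetical helper that assumes its second argument is a subslice of the first.

```rust
/// Hypothetical helper: byte offset of `part` within `whole`.
/// Assumes `part` is a subslice of `whole`.
fn offset_in(whole: &str, part: &str) -> usize {
    part.as_ptr() as usize - whole.as_ptr() as usize
}

fn main() {
    let owned = String::from("ABCDE");
    let value: &str = &owned;
    let mut chars = value.chars();

    // Start: offset 0, length 5, i.e. "ABCDE"
    assert_eq!((offset_in(value, chars.as_str()), chars.as_str().len()), (0, 5));

    chars.next(); // decodes 'A' and moves the start forward
    assert_eq!((offset_in(value, chars.as_str()), chars.as_str().len()), (1, 4));

    chars.next_back(); // decodes 'E' and only shortens the length
    assert_eq!((offset_in(value, chars.as_str()), chars.as_str().len()), (1, 3));
    assert_eq!(chars.as_str(), "BCD");

    // The underlying bytes were never modified.
    assert_eq!(owned, "ABCDE");
}
```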

When you call chars.as_str(), it just returns this internal &str, which now has length 3, and that gets returned to the caller. No local data is pointed to by the return value: the str it points to is still part of the original str that something outside this function owns.
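One more sketch under the same assumptions, this time with a non-ASCII input: because the iterator decodes whole UTF-8 characters, the remaining slice always starts and ends on a character boundary, and it still borrows from the caller's `String`. The commented-out lines show the borrow checker keeping the caller's data alive for as long as the result is used.

```rust
fn rem_first_and_last(value: &str) -> &str {
    let mut chars = value.chars();
    chars.next();
    chars.next_back();
    chars.as_str()
}

fn main() {
    // "éab€": 'é' is 2 bytes and '€' is 3 bytes in UTF-8.
    let owned = String::from("\u{e9}ab\u{20ac}");
    let inner = rem_first_and_last(&owned);

    // Whole characters were skipped, so the slice lands on char boundaries.
    assert_eq!(inner, "ab");
    assert_eq!(inner.as_ptr() as usize - owned.as_ptr() as usize, 2);
    assert_eq!(owned.len(), 2 + 1 + 1 + 3);

    // This would be rejected (E0505): `inner` still borrows `owned`.
    // drop(owned);
    // println!("{inner}");
}
```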

It's all similar to this in the end:

```rust
fn f(slice: &[i32]) -> &[i32] {
    &slice[1..slice.len() - 1]
}
```
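One difference worth noting: unlike the `chars()` version, which returns `""` for inputs shorter than two characters, `&slice[1..slice.len() - 1]` panics when the slice has fewer than two elements. A sketch of a non-panicking variant using a slice pattern (`f_checked` is a hypothetical name):

```rust
/// Hypothetical non-panicking variant of `f`.
fn f_checked(slice: &[i32]) -> &[i32] {
    match slice {
        // `middle` borrows from the caller's slice, just like `&slice[1..len - 1]`.
        [_, middle @ .., _] => middle,
        // Fewer than two elements: return an empty slice instead of panicking.
        _ => &[],
    }
}

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    assert_eq!(f_checked(&data), &[2, 3, 4][..]);
    assert!(f_checked(&[7]).is_empty());
    assert!(f_checked(&[]).is_empty());
}
```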
3 Likes

@geebee22 True, but my underlying question is still why std doesn't provide a way to do `replace` on a `&str` owned by the caller.

I guess they decided to return String from replace as an optimization?

@alice thank you and everyone too for giving such informative answers. I guess I'm marking yours as the solution because of its conciseness.

My last question should probably be a new question ("does it matter?"). I'll post separately, thanks!

I can think of a few possible reasons:

  1. Returning a new value instead of mutating in-place is more idiomatic
  2. You can't replace parts of a &mut str in-place anyway because any changes in size will leave "holes" of uninitialized data or require you to write into memory past the end of the reference
  3. You can't replace a &mut String in place, because when you replace a short sequence with a longer one, you'll either overwrite text that hasn't been checked yet or be forced to do multiple scans through the string and a re-allocation anyway
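Points 2 and 3 can be illustrated with a short sketch built on real std methods (`str::make_ascii_uppercase`, `String::replace_range`); the wrapper names `shout` and `expand_first` are hypothetical. An in-place edit through a `&mut str` only works when the byte length doesn't change, while a length-changing edit needs the owning `String`, which has to shift the tail of its buffer and may reallocate.

```rust
/// Hypothetical wrapper for point 2: a same-length edit through `&mut str`.
/// ASCII case changes never alter the byte length, so this is in place.
fn shout(s: &mut str) {
    s.make_ascii_uppercase();
}

/// Hypothetical wrapper for point 3: a length-changing edit needs the
/// `String`, which shifts the bytes after the range and may reallocate.
fn expand_first(text: &mut String, from: char, to: &str) {
    if let Some(i) = text.find(from) {
        text.replace_range(i..i + from.len_utf8(), to);
    }
}

fn main() {
    let mut owned = String::from("hello world");
    shout(owned.as_mut_str());
    assert_eq!(owned, "HELLO WORLD");

    let mut text = String::from("a-b");
    expand_first(&mut text, '-', " minus ");
    assert_eq!(text, "a minus b");
}
```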
1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.