Return Reference to Local Variable

Hello!

The title is a bit misleading, please bear with me.

Is something of the following form possible, for example with Pin?

fn f() -> (Vec<u8>, &[u8]) {
    let s = receive_data();
    let important_data = &s[3..5];
    (s, important_data)
} 

This would obviously be possible by returning the start and end indices of important_data instead of a slice.
However, returning a slice comes with some ergonomic benefits.
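For reference, a minimal sketch of the index-based alternative. It assumes a stand-in `receive_data` that returns a fixed buffer, since the real function is not shown in the thread. The function hands back the owned `Vec` together with a `Range<usize>`, and the caller does the slicing itself.

```rust
use std::ops::Range;

// Hypothetical stand-in for the thread's receive_data(); returns a fixed buffer.
fn receive_data() -> Vec<u8> {
    vec![10, 20, 30, 40, 50, 60]
}

// Return the owned buffer plus the range of the interesting bytes.
fn f() -> (Vec<u8>, Range<usize>) {
    let s = receive_data();
    (s, 3..5)
}

fn main() {
    let (buf, range) = f();
    // The slice borrows from `buf`, which the caller now owns.
    let important_data: &[u8] = &buf[range];
    println!("{:?}", important_data); // [40, 50]
}
```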

I presume that this should be sound, since the memory region backing the Vec isn't moved when the Vec is returned from the function. Therefore any pointers into this region should still be valid?
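The first half of this intuition does hold. Moving a `Vec` copies only its pointer, capacity and length, so the address of the elements stays the same. The small sketch below shows this; the helper name `make` is hypothetical. What the sketch cannot express is a lifetime tied to that heap buffer. Borrows are tracked against the `Vec` value itself, which is why a plain `&[u8]` cannot be returned next to it.

```rust
// Hypothetical helper: builds a Vec and reports where its elements live.
fn make() -> (Vec<u8>, *const u8) {
    let v = vec![1u8, 2, 3, 4, 5];
    let p = v.as_ptr();
    // Returning `v` moves the (pointer, capacity, length) triple,
    // not the heap allocation it points to.
    (v, p)
}

fn main() {
    let (v, p_before) = make();
    let p_after = v.as_ptr();
    // Same address: the heap buffer was not moved.
    assert_eq!(p_before, p_after);
    println!("buffer stayed at {:p}", p_after);
}
```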

Thanks for any help!

Best,
ambiso

There is the solution

You can also use crates like owning_ref or rental, but I don't think that is worth it. If you are tempted to use these crates, you should really rethink your data structures, as that points to more pervasive problems with them.

Don't use Pin, it is incredibly subtle and will require unsafe to do anything useful with it.

This is incorrect: nothing in the lifetimes links the slice to the Vec. The caller could drop the Vec before the slice, leaving the slice dangling.


Thanks for your response!
Owning ref looks great!

Why do you think I should rethink my data structure design?
The point of this is to avoid copying the returned slice into a new Vec (removing an unnecessary allocation). It's meant to be used in this way:

fn g() {
    // Destructure the tuple, ignoring the owning Vec.
    let (_, data) = f();
    do_something_with_data(data)
}

The use case would be reading a file or from the network, parsing/tokenizing some data from it and never copying information out of the original allocation where it can be avoided.

The actual underlying Vec<u8> isn't meant to be used for anything other than storage and being parsed in f().

The alternative implementation I can think of would collect the slice into a new Vec and return it, but this of course copies data:

fn f_prime() -> Vec<u8> {
    let s = receive_data();
    // Copies the selected bytes into a fresh allocation.
    s[3..5].to_vec()
}

This becomes costly when f_prime() does some actual parsing and returns a Vec of Vec<u8> instead of a single Vec<u8>.
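To make the cost difference concrete, here is a sketch that compares the two styles for a toy tokenizer. The tokenizer splits on spaces, and both it and its function names are hypothetical, not from the thread. `tokens_owned` allocates a `Vec<u8>` per token. `tokens_borrowed` returns slices into a buffer that the caller keeps alive.

```rust
// Copying version: every token gets its own heap allocation.
fn tokens_owned(buf: &[u8]) -> Vec<Vec<u8>> {
    buf.split(|&b| b == b' ')
        .filter(|t| !t.is_empty())
        .map(|t| t.to_vec())
        .collect()
}

// Zero-copy version: tokens borrow from `buf`, so `buf` must outlive them.
fn tokens_borrowed(buf: &[u8]) -> Vec<&[u8]> {
    buf.split(|&b| b == b' ')
        .filter(|t| !t.is_empty())
        .collect()
}

fn main() {
    let buf = b"let x = 42".to_vec(); // stand-in for received data
    let owned = tokens_owned(&buf);
    let borrowed = tokens_borrowed(&buf);
    assert_eq!(owned.len(), borrowed.len());
    println!("{:?}", borrowed);
}
```

The borrowed version only works because `buf` is owned by a caller further up the stack. That is exactly the constraint discussed in the rest of the thread.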

A nice work-around for this kind of situations is using the Continuation-Passing Style pattern:

fn with_data<R, F>(ret: F) -> R
where
    F: FnOnce(&'_ [u8]) -> R,
{
    let s: Vec<u8> = receive_data();
    let important_data = &s[3 .. 5];
    ret(important_data)
}

which can be used in the following fashion:

fn g() {
    with_data(|data| {
        // do something with data
    })
}

or, if you happen to have a fn do_something (data: &[u8]) function already defined:

fn g() {
    with_data(do_something)
}

For instance, if someone really wanted to own important_data by .to_owned()-ing it, they could:

let owned_data: Vec<u8> = with_data(|data| data.to_vec());
let owned_data: Vec<u8> = with_data(<[_]>::to_vec);

let owned_data: Vec<u8> = with_data(|data| data.to_owned());
let owned_data: Vec<u8> = with_data(ToOwned::to_owned);
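Putting these pieces together, here is a self-contained version of the continuation-passing sketch with `receive_data` stubbed out. The stub's contents and the `sum` helper are assumptions for illustration. Because the closure can return any `R`, the caller decides whether anything gets copied out of the buffer.

```rust
// Hypothetical stand-in for receive_data().
fn receive_data() -> Vec<u8> {
    vec![0, 1, 2, 3, 4, 5]
}

// The Vec lives inside with_data; the borrowed slice is only valid
// for the duration of the call to `ret`.
fn with_data<R, F>(ret: F) -> R
where
    F: FnOnce(&[u8]) -> R,
{
    let s: Vec<u8> = receive_data();
    let important_data = &s[3..5];
    ret(important_data)
}

// Hypothetical consumer that only needs to read the data.
fn sum(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    // Borrow only: compute something without copying.
    let total = with_data(sum);
    // Copy out explicitly when ownership is really needed.
    let owned: Vec<u8> = with_data(<[u8]>::to_vec);
    println!("{} {:?}", total, owned);
}
```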

I like your solution, but I fear this could result in something similar to callback hell.

Maybe this could be ameliorated with async?

Another solution would be to take a &mut Vec<u8> and return a subslice of it.

fn f(v: &mut Vec<u8>) -> &[u8] {
    *v = receive_data();
    &v[3..5]
}
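Here is a usage sketch for this signature, again with a hypothetical `receive_data` stub. The caller declares the storage, and lifetime elision ties the returned slice to the `&mut` borrow of that storage.

```rust
// Hypothetical stand-in for receive_data().
fn receive_data() -> Vec<u8> {
    vec![9, 8, 7, 6, 5, 4]
}

// Elision ties the returned slice to the borrow of `v`.
fn f(v: &mut Vec<u8>) -> &[u8] {
    *v = receive_data();
    &v[3..5]
}

fn main() {
    let mut storage = Vec::new(); // owned by the caller
    let data = f(&mut storage);
    println!("{:?}", data); // [6, 5]
    // `storage` cannot be mutated or dropped while `data` is in use.
}
```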

Alternatively make it a method on an object with a Vec inside and return a slice of that.
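Here is a minimal sketch of that suggestion. The `Message` type and its methods are hypothetical names chosen for illustration. The callee constructs the owner and returns it by value. Borrowed views are handed out by methods, so a slice can never outlive the buffer.

```rust
// Hypothetical owner type wrapping the received buffer.
struct Message {
    buf: Vec<u8>,
}

impl Message {
    // The callee builds the owner and returns it by value.
    fn receive() -> Message {
        Message { buf: vec![1, 2, 3, 4, 5, 6] } // stand-in for receive_data()
    }

    // The returned slice borrows from `self`.
    fn important_data(&self) -> &[u8] {
        &self.buf[3..5]
    }
}

fn main() {
    let msg = Message::receive();
    println!("{:?}", msg.important_data()); // [4, 5]
}
```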

Yep, this might work, although it's giving me some C vibes that I personally don't like.

This is essentially the same as what @Hyeonu suggested. I think what I don't like about the solution is that the caller has to worry about the data structure created and used by f(). This may be justified if f were capable of handling different types of data structures but it's always just a Vec.

I don't think it's a viable solution, in a non-GC language, for the caller not to worry about the data structure created and used by f(). Because you want to return a reference (&[u8]), someone else has to "own" the backing value and its storage for as long as the reference lives. That can be the caller, or the caller's caller, but not some non-local system that automatically collects unreachable resources.


See if this could be a valid workaround for you:

use std::ops::Range;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Partial {
    data: Vec<u8>,
    range: Range<usize>,
}

impl Partial {
    fn new(data: Vec<u8>, range: Range<usize>) -> Self {
        let len = data.len();
        assert!(range.end <= len);
        assert!(range.start <= range.end);
        
        Self {
            data,
            range,
        }
    }
}

impl AsRef<[u8]> for Partial {
    fn as_ref(&self) -> &[u8] {
        unsafe { self.data.get_unchecked(self.range.clone()) }
    }
}

fn receive_data() -> Vec<u8> {
    unimplemented!()
}

fn f() -> Partial {
    let s = receive_data();
    Partial::new(s, 3..5)
}

I am pretty sure the code is sound (blame me if I am wrong) and it does not need to Pin the owning structure. Obviously, as others said, it is necessary to have an owning object around, in this case it is the Partial structure.

EDIT: @RustyYato pointed out that a decreasing range is a valid Range value, so range.start has to be checked against range.end, not against len.

You need to change,

assert!(range.start <= len);
// to
assert!(range.start <= range.end);

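Because 2..0 is a valid Range value that passes both of the old checks (range.start <= len and range.end <= len). Yet it is not a valid index range, so passing it to get_unchecked would be undefined behavior.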
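For comparison, here is a sketch of the same idea without `unsafe`. It assumes that validating once in the constructor and using ordinary, checked slicing in `as_ref` is acceptable. One adaptation differs from the post above: the constructor returns `Option`. This makes decreasing or out-of-bounds ranges a recoverable error rather than a panic.

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Partial {
    data: Vec<u8>,
    range: Range<usize>,
}

impl Partial {
    // Reject decreasing or out-of-bounds ranges up front.
    fn new(data: Vec<u8>, range: Range<usize>) -> Option<Self> {
        if range.start <= range.end && range.end <= data.len() {
            Some(Partial { data, range })
        } else {
            None
        }
    }
}

impl AsRef<[u8]> for Partial {
    fn as_ref(&self) -> &[u8] {
        // Checked indexing; `new` already validated the range, so this cannot panic.
        &self.data[self.range.clone()]
    }
}

fn main() {
    let p = Partial::new(vec![1, 2, 3, 4, 5], 3..5).unwrap();
    let s: &[u8] = p.as_ref();
    println!("{:?}", s); // [4, 5]
    assert!(Partial::new(vec![1, 2, 3], 2..0).is_none());
}
```

The bounds check in `as_ref` costs very little, and the type stays sound even if someone later adds a method that mutates `data` or `range`.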


Yes, the caller has to clean up the Vec, however the difference is that I don't want to explicitly construct the backing Vec in the caller. The callee should construct the Vec and return it, handing over management of its lifetime to the caller.

This works great in this case, but I feel it doesn't generalize well:
Say I have an f() that's supposed to return a single &[u8] and an h() returning a Vec<&[u8]>.
Do I have to manually write a type such as Partial and emulate the behavior of the returned type for every potentially returned structure?
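One way to avoid writing a bespoke type for every return shape is sketched below. It assumes that storing offsets instead of references is acceptable. The owned buffer sits next to a collection of ranges, and the ranges are resolved into slices on demand. `Parsed`, `slices` and this `h` are hypothetical names. This is the index-based approach, not a true self-referential struct.

```rust
use std::ops::Range;

// Owned buffer plus any number of ranges into it.
struct Parsed {
    data: Vec<u8>,
    ranges: Vec<Range<usize>>,
}

impl Parsed {
    // Resolve every stored range into a slice borrowed from `self`.
    // Panics if a range is decreasing or out of bounds.
    fn slices(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.ranges.iter().map(move |r| &self.data[r.clone()])
    }
}

// Hypothetical h(): returns the owner instead of Vec<&[u8]>.
fn h() -> Parsed {
    let data = vec![1, 2, 3, 4];
    Parsed { data, ranges: vec![0..2, 1..3] }
}

fn main() {
    let p = h();
    let views: Vec<&[u8]> = p.slices().collect();
    println!("{:?}", views); // [[1, 2], [2, 3]]
}
```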

I think this is what a solution with the rental crate might look like, however I can't get it to compile. It seems the crate hasn't been updated for a long time.

use rental::rental;
 
rental! {
    mod my_rentals {
        use super::*;

        #[rental]
        pub struct MyRental {
            data: Vec<u8>,
            refs: Vec<&[u8]>
        }
    }
}

fn receive_data() -> Vec<u8> {
    vec![1,2,3,4]
}

fn f() -> my_rentals::MyRental {
    let x = receive_data();
    my_rentals::MyRental::try_new(
        x,
        vec![&x[0..2], &x[1..3]]
    )
}

Note that this code won't compile anyway, since x is first moved and then borrowed. However, compilation fails before that point, because rental's macro doesn't appear to be working:

error: cannot find derive macro `__rental_structs_and_impls` in this scope
  --> src/main.rs:3:1
   |
3  | / rental! {
4  | |     mod my_rentals {
5  | |         use super::*;
6  | |
...  |
12 | |     }
13 | | }
   | |_^
   |

Unfortunately I don't think owning_ref is capable of what I want.

I thought returning just one slice could ease the complexity of the question while obtaining the same answers, but there seems to be a different quality to what I'm asking.
I'd like a general solution (i.e. a crate or a method requiring little manual labor) for returning two values, where one can hold references to the other.

It may be that what I am asking is impossible in safe Rust (or at all).

Essentially I'd like this playground to compile.

However, the function moves the Vec upon returning it, thus any references to it become invalid.
While the references to the data contained inside the Vec should still be valid, since the backing memory region containing the actual elements isn't moved when the Vec is moved, the lifetime of the Vec and the lifetime of its backing array aren't tracked independently.

To me, it looks like you are asking for a "general" way of expressing a very specific situation. For instance, you were writing about a situation "specific to a Vec", but with your new example you are now talking about a vector of slices. What level of genericness do you want? If you are trying to solve your problem for "anything that can own data", I think that it could be impossible to write.

Just a silly question: why are you trying to create this kind of "abstraction over ownership and borrowing"? Even if you find a generic way of expressing your intent, I am not sure that hiding the reference to a slice behind a custom struct will help in a codebase of more than a few hundred lines. Maybe if you show us your practical needs, we can try to help you a little more.

Sure, I agree, at some point lifetime checks may become undecidable.
Honestly, there is not as much practical need as there is curiosity.
I'd like to find out the limits of what is efficiently checkable using Rust's lifetimes, and at what point we need a more sophisticated prover to ensure that a program's lifetimes are correct.

I'm not entirely sure what level of genericness I want, but let's try the following definition:
I want to return a struct of 2 values: a data value and another value of arbitrary type that contains references to the data. Concretely, I don't want the compiler to verify cyclic references, nor any recursive references within the struct. However, the references and data value should be able to have any internal structure they want. (But I'm not asking the compiler to verify this part).
Potentially, this illustration might help:

[Illustration not reproduced]

In the top-right example of the image, the references form a binary tree. I'm not asking the compiler to check the lifetimes within the binary tree. The example only demonstrates that the references could have a structure other than a Vec.
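The binary-tree case can be handled the same way in safe Rust if the tree stores ranges rather than references. The `Tree` and `Document` types below are hypothetical and only sketch the idea. The owner holds both the buffer and the tree, and borrowing happens only while the tree is walked.

```rust
use std::ops::Range;

// A binary tree whose leaves point into the buffer by range.
enum Tree {
    Leaf(Range<usize>),
    Node(Box<Tree>, Box<Tree>),
}

// Owner of both the data and the tree of ranges into it.
struct Document {
    data: Vec<u8>,
    tree: Tree,
}

impl Document {
    // Collect leaf slices in left-to-right order, borrowing from `self.data`.
    fn leaves(&self) -> Vec<&[u8]> {
        fn walk<'a>(t: &Tree, data: &'a [u8], out: &mut Vec<&'a [u8]>) {
            match t {
                Tree::Leaf(r) => out.push(&data[r.clone()]),
                Tree::Node(l, r) => {
                    walk(l, data, out);
                    walk(r, data, out);
                }
            }
        }
        let mut out = Vec::new();
        walk(&self.tree, &self.data, &mut out);
        out
    }
}

fn main() {
    let doc = Document {
        data: b"hello world".to_vec(),
        tree: Tree::Node(Box::new(Tree::Leaf(0..5)), Box::new(Tree::Leaf(6..11))),
    };
    println!("{:?}", doc.leaves());
}
```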

As you can see I have some difficulty expressing my ideas - if you are able to reformulate them in a better way, please do!

  • or use ::rental::* rather than just use ::rental::rental.