Return Reference to Local Variable

Let's see if I understand correctly what you are trying to achieve. In the bottom left situation you would like to use Rc (or Arc) and Weak, in order to create the cyclic relationship. In the bottom right situation, you necessarily need a Pinned structure.

Both situations have problems, and for a valid reason. The first case has an intrinsic runtime cost, but this is necessary both to guarantee soundness and to avoid leaking data. In the second case, as already said by @RustyYato, it is extremely hard to create a safe and ergonomic abstraction because of the nature of Pin.
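
To make the first case concrete, here is a minimal sketch of a cyclic relationship built with Rc and Weak. The Owner and Child types and the build function are made up for illustration; they are not taken from the diagram. The strong pointers go one way and the back-pointer is Weak, so the cycle cannot leak, at the price of reference counts that are maintained at run time.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical types for illustration only.
struct Owner {
    children: RefCell<Vec<Rc<Child>>>,
}

struct Child {
    id: u32,
    // Weak back-pointer: it does not keep the owner alive, so the
    // owner -> child -> owner cycle cannot leak.
    owner: Weak<Owner>,
}

fn build() -> Rc<Owner> {
    let owner = Rc::new(Owner { children: RefCell::new(Vec::new()) });
    for id in 0..2 {
        let child = Rc::new(Child { id, owner: Rc::downgrade(&owner) });
        owner.children.borrow_mut().push(child);
    }
    owner
}

fn main() {
    let owner = build();
    // The runtime cost: counts are stored and updated while the program runs.
    assert_eq!(Rc::strong_count(&owner), 1);
    assert_eq!(Rc::weak_count(&owner), 2);

    let child = Rc::clone(&owner.children.borrow()[0]);
    assert_eq!(child.id, 0);
    // Following the back-pointer requires a fallible upgrade.
    assert!(child.owner.upgrade().is_some());

    drop(owner);
    // The owner is gone; the weak pointer reports that instead of dangling.
    assert!(child.owner.upgrade().is_none());
}
```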

You could also try to use raw pointers inside your structures, but you will find out that at that point you would need to implement something similar to what is already available in std in order to create safe abstractions. Unfortunately, the reason you cannot express your intent is that it is extremely hard to do so safely.

Just a small note: this is hard to do for a specific data structure. Doing it for a "generic value" is almost impossible. Simple example: how can you handle an inner self-reference &T if the owning data is Rc<RefCell<T>>? Someone could hold a clone of that Rc and mutate the contents through the RefCell, creating a soundness hole with your reference. Maybe in that case you could store a std::cell::Ref as a smart reference, but this means that you cannot generically handle different data layouts with different properties using the same approach. At least I don't think that it is possible in a sound way.
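
A small sketch of the Rc<RefCell<T>> point, using only std. The function name is made up for illustration. It shows why a std::cell::Ref guard is sound where a plain &T would not be: while the guard is alive, a mutable borrow through another clone of the Rc is refused at run time.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Returns true if a mutable borrow through a second handle is refused
/// while a `Ref` guard obtained from `shared` is still alive.
/// (Hypothetical helper; expects a non-empty vector.)
fn mutation_refused_while_borrowed(shared: &Rc<RefCell<Vec<u8>>>) -> bool {
    let other = Rc::clone(shared);
    let guard = shared.borrow();
    // A reference into the data, tied to the lifetime of `guard`.
    let first: &u8 = &guard[0];
    // The RefCell tracks the outstanding shared borrow at run time.
    let refused = other.try_borrow_mut().is_err();
    assert_eq!(*first, 1);
    refused
}

fn main() {
    let shared = Rc::new(RefCell::new(vec![1u8, 2, 3]));
    assert!(mutation_refused_while_borrowed(&shared));
    // Once the guard is gone, mutation through any handle works again.
    shared.borrow_mut().clear();
    assert!(shared.borrow().is_empty());
}
```

This only works because the guard type is specific to RefCell, which is exactly the obstacle to one generic approach for every kind of owner.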

Such a prover is under development; it's called polonius and you can read about it on IRLO, the Rust internals forum.

use ::owning_ref::OwningRef;

fn gen_vec () -> Vec<u8>
{
    (0 .. 10).collect()
}

fn f() -> OwningRef<Vec<u8>, [u8]>
{
    let v = OwningRef::new(gen_vec());
    v.map(|it| &it[3 .. 5])
}

fn main ()
{
    assert_eq!(
        dbg!(&f()[..]),
        [3, 4],
    );
}

That helps a bit!
Although I still can't figure out how to make the more complex example work with rental:

#[macro_use]
extern crate rental;

use rental::rental;

pub struct Slices<'a> { slices: Vec<&'a [u8]> }

impl<'a> std::ops::Deref for Slices<'a> {
    type Target = Vec<&'a [u8]>;
    fn deref(&self) -> &Self::Target {
        &self.slices
    }
}
 
rental! {
    mod my_rentals {
        use super::*;

        #[rental(deref_suffix)]
        pub struct OtherRental {
            data: Vec<u8>,
            refs: Slices<'data>,
        }
    }
}

fn f() -> my_rentals::OtherRental {
    let v = vec![0,1,2,3];
    my_rentals::OtherRental::new(
        v,
        |v| Slices{slices: vec![ &v[0..3], &v[1..4] ]}
    )
}

fn main() {
    let v = f();
    println!("{:?}", (*v)[0]);
}

yields the following error message:

error[E0495]: cannot infer an appropriate lifetime for lifetime parameter `'a` due to conflicting requirements
  --> src/main.rs:15:1
   |
15 | / rental! {
16 | |     mod my_rentals {
17 | |         use super::*;
18 | |
...  |
24 | |     }
25 | | }
   | |_^

Yes, but I don't think OwningRef accepts any more complex datatypes as its second type argument (such as Vec<&[u8]>). I'm sorry, my original question was not adequately detailed, as I've explained in a previous post.

That's great!

Note that the bottom situations aren't what I'm concerned with (that's what the red X in the top right corners were supposed to symbolize. I should have written it in prose as well, I'm sorry)
I'm only concerned with the upper two.

I'm not sure about the example in your last paragraph, but that is a good point!
The rental crate does also seem to have mutability support so maybe they have some solution for it?
For now I'm assuming that the data is frozen and never changes.

Ok, I totally got it the other way around! :sweat_smile:

Now I think I am missing something: in the two examples shown in the upper part of the image, why do you want to bind together the owned data with a reference to part of it? Don't take this as criticism, I am really trying to understand your use case.

Let's take a practical and simple example: std::collections::btree_map::Entry is a sort of smart pointer to one element of a complex structure (in the sense that it is not just a slice of memory). You can pass an Entry around and use it like a sort of smart reference, and it cannot outlive the BTreeMap it borrows from. From what I understood, in this particular case you would like a structure that holds both the BTreeMap and the Entry, giving access only to the Entry.

I understand that you have a function that takes the owned data and returns a portion of it, but wouldn't it be more ergonomic to have a method on the owned data that returns the portion? Like tree.entry() returns an Entry.

Sorry if I am still misunderstanding your needs, but I still have the feeling that you are trying to follow a hard path, and that you could make your life easier with a different approach :smiley:

I need them together because I want to be able to have ownership of the entire thing at once. I want to be able to move it around or lend it to others.
Otherwise I can't return it as a package.

That's a great comparison!
Note some things in your example:

  • as long as the Entry lives, you cannot mutate the BTreeMap either!
    (Since the Entry is in a way borrowing the BTreeMap?)
    See this playground.
    This is the same thing as I want, actually I never want to modify the underlying data, since that might invalidate references to it.
  • Entry borrows the BTreeMap mutably, therefore you cannot even obtain an immutable reference on the map while the Entry lives - see here
  • It's not (trivially?) possible to move/return/borrow the BTreeMap and Entry combination in one go!
    (What lifetime would you assign to Entry?)
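
The points above can be sketched with std alone. The bump function is a made-up helper; the commented-out line marks where the borrow checker would object if it were enabled.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Increments the counter stored under `key` and returns the new value.
/// (Hypothetical helper for illustration.)
fn bump(map: &mut BTreeMap<&'static str, u32>, key: &'static str) -> u32 {
    // `entry` holds a mutable borrow of `map` for as long as it is alive.
    let entry: Entry<'_, &'static str, u32> = map.entry(key);

    // Uncommenting the next line is rejected by the borrow checker, because
    // `entry` is still used below and already borrows `map` mutably:
    // let n = map.len();

    let slot = entry.or_insert(0);
    *slot += 1;
    *slot
}

fn main() {
    let mut map = BTreeMap::new();
    assert_eq!(bump(&mut map, "a"), 1);
    assert_eq!(bump(&mut map, "a"), 2);
    // After the Entry is gone, the map can be read again.
    assert_eq!(map.len(), 1);
}
```

Note that bump can only take the map by reference: there is no lifetime it could name in a return type that carries both the map by value and an Entry borrowing from it.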

If you can return BTreeMap and Entry as a single entity (with some level of ~genericness~), I think you've also solved my problem.

To your second proposal.
This is essentially equivalent to what @alice and @Hyeonu proposed, correct?
Then it's still not possible to move the data around easily.

The point is that if you bind together the owner of the data and a reference to it, you cannot move the new thing around, because it needs to be pinned!

Let's make this a bit clearer. Imagine that you have a struct S containing your owned data and a raw pointer to the portion that you need. S implements get_ref, which creates a reference from the inner pointer. Now, whenever you pass an instance of S around, the compiler is free to memcpy your data, invalidating the inner pointer. Crates that help with self-referential structures use some sort of workaround (e.g. saving the relative offset of the referenced inner object) to avoid this kind of problem, but it has a runtime cost.
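
As a rough illustration of the offset idea (not how any particular crate implements it), the struct below stores an index range instead of a pointer. OwnedSlice and make are hypothetical names. The value can be moved freely because nothing inside it points at itself; the runtime cost is the bounds-checked slicing on every access.

```rust
use std::ops::Range;

// Hypothetical type: owned data plus a position inside it.
struct OwnedSlice {
    data: Vec<u8>,
    range: Range<usize>,
}

impl OwnedSlice {
    /// Returns None if `range` does not fit inside `data`.
    fn new(data: Vec<u8>, range: Range<usize>) -> Option<Self> {
        if range.start <= range.end && range.end <= data.len() {
            Some(OwnedSlice { data, range })
        } else {
            None
        }
    }

    /// Re-creates the borrowed view on demand from the stored offsets.
    fn get(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }
}

fn make() -> OwnedSlice {
    OwnedSlice::new((0..10).collect(), 3..5).expect("range is in bounds")
}

fn main() {
    let s = make();
    // Moving the whole package (here into a Box) does not invalidate anything.
    let moved = Box::new(s);
    assert_eq!(moved.get(), &[3u8, 4][..]);
}
```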

Yes, exactly. My point (and the reason why I am still missing something) is that:

  • if all the work on the data is performed in functions linearly down in the stack, then you don't need to pass the owned data
  • if you get the owned data and you just want to return the data and a reference to it up in the stack, maybe it is better to use the suggestion of the method, in order to separate responsibilities about getting the data and working on it
  • if you need to send your data and your ref to another thread, you must create the ref after the data is sent (because the object is moved like in the example above)
  • if you need to reference the data from multiple places, then Rc is the way to go (but I don't think that this is what you need)

I figured out how to quote!

I'm not sure if I understand you correctly, but let me give it a try!
Let's get even more concrete. Say S is:

struct S {
    backing_data_array: *const i32
}

Are you saying the compiler is free to memcopy the backing_data_array? But how would it know its length?

I think what would be unsound, is having a reference to the backing_data_array pointer itself, since S may be moved.

That's not the use case, I want to pass data up the stack (without any extra allocations).

Are you referring to the callback method?

I see how this is a slick solution, but still you're not owning the value.

Say I have 3 functions:
tokenize receives the data and does for example tokenization,
build_ast uses the tokenization to construct an AST
constant_prop evaluates certain select things in the AST (e.g. constant propagation)

Now the callback style would very explicitly propagate through the entire hierarchy:

fn tokenize(f: FnType) -> R {
  let data = receive_data();
  // ...
  f(tokenized_data_with_references_to_data)
}

fn build_ast(f: FnType) -> R {
  tokenize(|tokens| {
     // ...
     f(tree_with_refs_to_tokens)
  })
}

fn constant_prop(f: FnType) -> R {
  build_ast(|ast| {
    // ...
    f(tree_with_refs_to_tokens_but_constants_propagated)
  })
}
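
For reference, a compilable miniature of the callback style, under made-up assumptions: receive_data returns a fixed string, tokens are whitespace-separated, and count_numbers stands in for a later stage. The borrowed tokens never leave the stack frame that owns data; only the callback's result R travels back up.

```rust
// Stand-in for the real input source (assumption for this sketch).
fn receive_data() -> String {
    String::from("let x = 1 + 2")
}

/// Owns the data and lends the tokens to `f` for the duration of the call.
fn tokenize<R>(f: impl FnOnce(&[&str]) -> R) -> R {
    let data = receive_data();
    let tokens: Vec<&str> = data.split_whitespace().collect();
    f(tokens.as_slice())
}

/// A later stage (hypothetical): counts the integer literals among the tokens.
fn count_numbers<R>(f: impl FnOnce(usize) -> R) -> R {
    tokenize(|tokens| {
        let n = tokens.iter().filter(|t| t.parse::<i64>().is_ok()).count();
        f(n)
    })
}

fn main() {
    // Owned results can be returned to the caller ...
    assert_eq!(count_numbers(|n| n), 2);
    assert_eq!(tokenize(|tokens| tokens.join(",")), "let,x,=,1,+,2");
    // ... but a closure like `|tokens| tokens[0]` is rejected, because the
    // borrowed `&str` cannot outlive the call to `tokenize`.
}
```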

Now this works, and it works great if you structure your code this way.
This style of responsibility can also be used if you used the owning style that I'm aiming for:

fn tokenize() -> Tokenized {
  let data = receive_data();
  // ...
  Tokenized {
    data: data,
    parsed: tokenized_data_with_references_to_data,
  }
}

fn build_ast() -> AST {
  let tokens = tokenize();
  // ...
  AST {
    tokens: tokens,
    ast: tree_with_refs_to_tokens,
  }
}

fn constant_prop() -> AST {
  let ast = build_ast();
  // ...
  AST {
    tokens: ast.tokens,
    ast: tree_with_references_to_tokens
  }
}

But what I'm aiming for could also pass data up the stack and thus allow you to structure your code completely differently:

fn load_syntax() {
  let tokens = tokenize();
  let ast = build_ast(tokens);
  let constant_propd_ast = constant_prop(ast);
}

With the functions defined in the following way:

fn tokenize() -> Tokenized {
  let data = receive_data();
  // ...
  Tokenized {
    data: data,
    parsed: tokenized_data_with_references_to_data,
  }
}

fn build_ast(tokens: Tokenized) -> AST {
  // ...
  AST {
    tokens: tokens,
    ast: tree_with_refs_to_tokens,
  }
}

fn constant_prop(ast: AST) -> AST {
  // ...
  AST {
    tokens: ast.tokens,
    ast: tree_with_references_to_tokens
  }
}

I believe this responsibility style (I think there's also a name for these two styles, but I don't remember) isn't achievable using callbacks, since you can never actually pass data up the stack without extra allocations (i.e. cloning the token slices).

No, it memcpys the structure, which in this case just means that it copies the pointer. But in this hypothetical situation:

struct S<T> {
    data: [T; 10],
    self_ref: &'self [T],
}

moving S could copy data and self_ref to a new location, meaning that data is then stored at a different position in memory (almost always true if S is returned from a function and cannot fit into registers, because if I am not wrong there is no guaranteed RVO or NRVO), while self_ref still points at the old location. That leaves self_ref dangling and leads to UB.
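
The address change can be observed in safe code without creating any dangling reference. A minimal sketch with made-up names: an inline array gets a new address when its owner is moved into a Box, whereas the heap buffer of a Vec stays where it is when the Vec handle is moved.

```rust
// Hypothetical struct with the data stored inline.
struct Inline {
    data: [u8; 16],
}

/// Moves an `Inline` from the stack into a Box and compares the address
/// of `data` before and after the move.
fn inline_address_changes() -> bool {
    let s = Inline { data: [7; 16] };
    let before = s.data.as_ptr() as usize;
    let boxed = Box::new(s); // the bytes of `s` are copied to the heap
    let after = boxed.data.as_ptr() as usize;
    before != after
}

/// Moves a Vec handle and compares the address of its heap buffer.
fn heap_address_is_stable() -> bool {
    let v: Vec<u8> = vec![7; 16];
    let before = v.as_ptr() as usize;
    let moved = v; // only pointer, length and capacity are moved
    let after = moved.as_ptr() as usize;
    before == after
}

fn main() {
    assert!(inline_address_changes());
    assert!(heap_address_is_stable());
}
```

A self_ref into the inline array would be stale after the first move, while a pointer into the Vec buffer would still be valid after the second, as long as the Vec is never reallocated.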

Why not something like:

struct Tokenizer {
    data: Vec<u8>,
}

struct Tokens<'a> {
  data: &'a [u8],
}

struct AST<'a> {
    tokens: Tokens<'a>,
    ast: &'a [u8],
}

impl Tokenizer {
    fn receive() -> Self { todo!() }
    fn tokenize(&self) -> Tokens { todo!() }
}

impl<'a> Tokens<'a> {
    fn build_ast(self) -> AST<'a> { todo!() }
}

impl<'a> AST<'a> {
    fn constant_prop(self) -> Self { todo!() }
}

fn load_syntax() {
  let tokenizer = Tokenizer::receive();
  let tokens = tokenizer.tokenize();
  let ast = tokens.build_ast();
  let constant_propd_ast = ast.constant_prop();
}

There is a clear separation between the owned data (handled by Tokenizer) and all the references used by the other structures.

I want to get something like this working:

use std::marker::PhantomData;

pub struct Extractor<'a, T: 'a, R: ?Sized + 'a, F: Fn(&'a T) -> &'a R> {
    data: T,
    cb: F,
    marker: PhantomData<&'a R>
}

impl<'a, T, R: ?Sized + 'a, F: Fn(&'a T) -> &'a R> Extractor<'a, T, R, F> {
    pub fn get(&self) -> &R {
        let cb = &self.cb;
        cb(&self.data)
    }
    
    pub fn from(data: T, cb: F) -> Self {
        Extractor {
            data, 
            cb,
            marker: PhantomData
        }
    }
}

fn f<'a>() -> Extractor<'a, Vec<u8>, [u8], impl Fn(&'a Vec<u8>) -> &'a [u8]> {
    let s = (0..10u8).collect::<Vec<_>>();
    Extractor::from(s, |vec|{
        &vec[3..5]
    })
}

fn main() {
    println!("It works: {:?}", f().get());
}

A struct Extractor that owns a value of type T and a callback that acts on &T and produces &R, but I cannot get the lifetimes to work.

You have to use HRTB, something like this (playground).
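
A minimal sketch of what an HRTB-based version could look like (not necessarily what the playground contains). The idea is to drop the lifetime parameter from the struct and require the callback to work for every lifetime with a for<'a> bound. The PhantomData field only exists so that R counts as used, and make is a made-up name.

```rust
use std::marker::PhantomData;

pub struct Extractor<T, R: ?Sized, F> {
    data: T,
    cb: F,
    // Marks `R` as used without owning a value of type `R`.
    marker: PhantomData<fn(&T) -> &R>,
}

impl<T, R: ?Sized, F> Extractor<T, R, F>
where
    // Higher-ranked trait bound: `cb` must accept a borrow of ANY lifetime
    // and return a reference that lives equally long.
    F: for<'a> Fn(&'a T) -> &'a R,
{
    pub fn new(data: T, cb: F) -> Self {
        Extractor { data, cb, marker: PhantomData }
    }

    /// Runs the callback against the owned data on every call.
    pub fn get(&self) -> &R {
        (self.cb)(&self.data)
    }
}

fn make() -> Extractor<Vec<u8>, [u8], impl for<'a> Fn(&'a Vec<u8>) -> &'a [u8]> {
    let s: Vec<u8> = (0..10).collect();
    Extractor::new(s, |vec| &vec[3..5])
}

fn main() {
    println!("It works: {:?}", make().get());
}
```

In this sketch nothing borrowed is stored: the struct holds only the data and the callback, and the reference is recomputed each time get is called.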

True! This is a case that must be avoided.
I think rental requires the type you're holding references to to impl std::ops::Deref; you only get access to the dereferenced item, not the actual value on the stack/registers.
This of course assumes Deref is implemented in a way to return you a non moving memory region (i.e. something on the heap).

Hmm okay, now I do see a bit of the appeal of this structure!
Yet still I find it weird that I can't just own the value - This should be something that's possible!

This would require immovable types in general, something Rust doesn't have.

While you're correct that Rust doesn't have immovable types, they can be emulated with macros, unsafe traits, and Pin, as attempted in my stackpin crate. Then, I used even more macros and unsafe traits to define the Transfer trait, which allows executing user-defined code on "moving" immovable types, much like C++'s move constructor. It would be interesting to see if these crates could solve OP's use case.

Yes, but this doesn't help with creating immovable types that can safely exploit the fact that they are immutable.

@changhe3

Now this looks super weird and interesting!
I'm not sure where the value is even stored now?
The computation to obtain the vec of references seems to be performed every time you want to obtain them, right? It's not actually storing the result of the computation alongside the data?

Additionally I have difficulty thinking about how to return a vec![&vec[3..5], &vec[2..4]] from the callback.

Isn't this what Pin is for? Could you elaborate on this or link some relevant resources?
The std::pin module starts out with the following text:

It is sometimes useful to have objects that are guaranteed not to move, in the sense that their placement in memory does not change, and can thus be relied upon.

So I'm a bit confused about your statement.
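
For what it's worth, here is a minimal sketch of what Pin provides in safe code, as far as I understand it. The Immovable type and the helper names are made up. Pin wraps a pointer and promises that the pointee stays put; the Pin<Box<_>> handle itself is still an ordinary value that can be moved around.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical type that opts out of `Unpin`.
struct Immovable {
    data: [u8; 4],
    _pin: PhantomPinned,
}

fn address_of(p: &Pin<Box<Immovable>>) -> usize {
    p.data.as_ptr() as usize
}

/// Moves the pinned handle and checks that the pointee did not move.
fn pinned_address_is_stable() -> bool {
    let pinned: Pin<Box<Immovable>> = Box::pin(Immovable {
        data: [1, 2, 3, 4],
        _pin: PhantomPinned,
    });
    let before = address_of(&pinned);
    // This moves the pointer, not the heap allocation behind it.
    let moved = pinned;
    before == address_of(&moved)
}

fn main() {
    assert!(pinned_address_is_stable());
    // Because `Immovable` is not `Unpin`, safe code cannot obtain a plain
    // `&mut Immovable` from the Pin, so it cannot swap or move the value out.
}
```

So the type is not immovable by itself; the guarantee attaches to one particular pinned value behind a pointer, which may be what the statement above was getting at.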