The title is a bit misleading, please bear with me.
Is something of the following form possible, for example with Pin?
fn f() -> (Vec<u8>, &[u8]) {
    let s = receive_data();
    let important_data = &s[3..5];
    (s, important_data)
}
This would obviously be possible by returning the start and end indices of important_data instead of a slice.
However, returning a slice comes with some ergonomic benefits.
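For comparison, the index-based alternative could look roughly like the sketch below. The body of `receive_data()` and the helper `important_sum()` are made up for illustration; only the shape of `f()` matters.

```rust
use std::ops::Range;

// Hypothetical stand-in for the `receive_data()` used in the question.
fn receive_data() -> Vec<u8> {
    vec![10, 20, 30, 40, 50, 60]
}

// Return the owning buffer together with the *position* of the
// interesting bytes instead of a reference to them.
fn f() -> (Vec<u8>, Range<usize>) {
    let s = receive_data();
    (s, 3..5)
}

// The caller re-creates the slice once the buffer has its final owner.
fn important_sum() -> u32 {
    let (s, range) = f();
    let important_data: &[u8] = &s[range];
    important_data.iter().map(|&b| u32::from(b)).sum()
}
```

The ergonomic cost is visible in `important_sum()`: every caller has to index into the buffer again before it can use the data.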
I presume that this should be sound since the memory region backing the Vec shouldn't be moved when returning it from the function. Therefore any pointers into this region should still be valid?
You can also use crates like owning_ref or rental, but I don't think that this is worth it. If you are tempted to use these crates, you should really rethink your data structures, as the temptation usually points to a more pervasive design problem.
Don't use Pin; it is incredibly subtle and requires unsafe to do anything useful with it.
This is incorrect: nothing in the lifetimes links the slice to the Vec. The caller could drop the Vec before the slice, making the slice invalid.
Why do you think I should rethink my data structure design?
The point of this is to avoid copying the returned slice into a new Vec (removing an unnecessary allocation). It's meant to be used in this way:
fn g() {
    let (_, data) = f();
    do_something_with_data(data)
}
The use case would be reading a file or from the network, parsing/tokenizing some data from it and never copying information out of the original allocation where it can be avoided.
The actual underlying Vec<u8> isn't meant to be used for anything other than storage and being parsed in f().
The alternative implementation I can think of would collect the slice into a new Vec and return it, but this of course copies data:
fn f_prime() -> Vec<u8> {
    let s = receive_data();
    // Copies the selected bytes into a fresh allocation.
    s[3..5].to_vec()
}
This becomes costly when f_prime() does some actual parsing, and returns a Vec of Vec<u8> instead of just a single Vec<u8>.
This is essentially the same as what @Hyeonu suggested. I think what I don't like about the solution is that the caller has to worry about the data structure created and used by f(). This may be justified if f were capable of handling different types of data structures but it's always just a Vec.
I don't think it's a viable solution in a non-GC language to keep the caller from worrying about the data structure created and used by f(). Because you want to return a reference (&[u8]), someone else has to "own" the backing value and its storage for as long as the reference lives. It can be the caller, or the caller's caller, but not some non-local system that automatically collects unreachable resources.
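A minimal sketch of that arrangement, where the caller owns the storage and the parsing function only borrows it. The names `important_data()` and `g()` and the contents of `receive_data()` are placeholders, not anything from the original code.

```rust
// Hypothetical stand-in for the `receive_data()` used in the question.
fn receive_data() -> Vec<u8> {
    vec![10, 20, 30, 40, 50, 60]
}

// The returned slice borrows from `buf`, and the signature says so:
// lifetime elision ties the output lifetime to the input reference.
fn important_data(buf: &[u8]) -> &[u8] {
    &buf[3..5]
}

fn g() -> usize {
    let buf = receive_data(); // the caller owns the storage
    let data = important_data(&buf); // `data` cannot outlive `buf`
    data.len()
}
```

Here the compiler can check everything, because the owner is a named local that visibly outlives the slice.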
I am pretty sure the code is sound (blame me if I am wrong) and it does not need to Pin the owning structure. Obviously, as others said, it is necessary to have an owning object around, in this case it is the Partial structure.
EDIT: @RustyYato highlighted that a decreasing range is valid, therefore it is necessary to check range.start against range.end, not against len.
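For readers without the original snippet at hand, here is one possible shape for such an owning type. This is a guess written in safe Rust, not the code from the post above: the field names, the `Option` return type and `into_inner()` are all assumptions. It does include the range check from the EDIT.

```rust
use std::ops::{Deref, Range};

// Owns the buffer and remembers which part of it is "important".
pub struct Partial {
    data: Vec<u8>,
    range: Range<usize>,
}

impl Partial {
    // Returns `None` if the range is decreasing or runs past the buffer.
    pub fn new(data: Vec<u8>, range: Range<usize>) -> Option<Self> {
        if range.start <= range.end && range.end <= data.len() {
            Some(Partial { data, range })
        } else {
            None
        }
    }

    // Hands the whole buffer back to the caller.
    pub fn into_inner(self) -> Vec<u8> {
        self.data
    }
}

// Lets a `Partial` be used wherever a `&[u8]` is expected.
impl Deref for Partial {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }
}
```

Because the slice is rebuilt from indices on every access, this version needs neither unsafe nor Pin, at the cost of a bounds check per deref.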
Yes, the caller has to clean up the Vec, however the difference is that I don't want to explicitly construct the backing Vec in the caller. The callee should construct the Vec and return it, handing over management of its lifetime to the caller.
This works great in this case, but I feel it doesn't generalize well:
Say I have an f() that's supposed to return a single &[u8] and an h() returning a Vec<&[u8]>.
Do I have to manually write a type such as Partial and emulate the behavior of the returned type for every potentially returned structure?
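For the specific case of many slices into one buffer, a single extra type may be enough if it stores ranges and hands out slices on demand. The sketch below is only an illustration: the name `Tokens` and its methods are invented, and it does not cover arbitrary reference-holding structures.

```rust
use std::ops::Range;

// Owns the buffer plus the positions of all parsed pieces.
pub struct Tokens {
    data: Vec<u8>,
    ranges: Vec<Range<usize>>,
}

impl Tokens {
    // Returns `None` if any range is decreasing or out of bounds.
    pub fn new(data: Vec<u8>, ranges: Vec<Range<usize>>) -> Option<Self> {
        let ok = ranges
            .iter()
            .all(|r| r.start <= r.end && r.end <= data.len());
        if ok {
            Some(Tokens { data, ranges })
        } else {
            None
        }
    }

    // Yields the slices lazily; each one borrows from `self`.
    pub fn iter(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.ranges.iter().map(move |r| &self.data[r.clone()])
    }
}
```

A caller that really wants a `Vec<&[u8]>` can write `tokens.iter().collect::<Vec<_>>()`, and that vector then borrows from the `Tokens` value the caller holds.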
I think this is what a solution with the rental crate might look like, however I can't get it to compile. It seems the crate hasn't been updated for a long time.
use rental::rental;

rental! {
    mod my_rentals {
        use super::*;

        #[rental]
        pub struct MyRental {
            data: Vec<u8>,
            refs: Vec<&[u8]>
        }
    }
}

fn receive_data() -> Vec<u8> {
    vec![1, 2, 3, 4]
}

fn f() -> my_rentals::MyRental {
    let x = receive_data();
    my_rentals::MyRental::try_new(
        x,
        vec![&x[0..2], &x[1..3]]
    )
}
Note that this solution won't compile anyway, since x is first moved and then borrowed, but compilation fails even before that, since rental's macro doesn't appear to be functioning:
Unfortunately I don't think owning_ref is capable of what I want.
I thought returning just one slice could ease the complexity of the question while obtaining the same answers, but there seems to be a different quality to what I'm asking.
I'd like a general solution (i.e. a crate or a method with little manual labor) to return two values where one can hold references to the other.
However, the function moves the Vec upon returning it, thus any references to it become invalid.
While the references to the data contained inside the Vec should still be valid, since the backing memory region containing the actual elements isn't moved when the Vec is moved, the lifetime of the Vec and the lifetime of its backing array aren't tracked independently.
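The claim about the backing memory can be checked with a small experiment. This is only a sketch: `buffer_addresses()` is a made-up helper and the contents of `receive_data()` are arbitrary.

```rust
// Hypothetical stand-in for the `receive_data()` used in the question.
fn receive_data() -> Vec<u8> {
    vec![10, 20, 30, 40, 50, 60]
}

// Returns the address of the heap buffer as seen before and after
// moving the Vec. The pointers are only compared, never dereferenced.
fn buffer_addresses() -> (*const u8, *const u8) {
    let s = receive_data();
    let before = s.as_ptr();
    let moved = s; // moves the (pointer, length, capacity) handle only
    let after = moved.as_ptr();
    (before, after)
}
```

Both addresses are the same, since a move copies the handle and leaves the allocation alone. The borrow checker still rejects the original f(), because `&s[3..5]` is a borrow of the local variable `s`, and that borrow ends when `s` is moved out.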
To me, it looks like you are asking for a "general" way of expressing a very specific situation. For instance, you were writing about a situation "specific for a Vec", but with the example you made you are now talking about a vector of slices. What level of genericness do you want? If you are trying to solve your problem for "anything that can own data", I think that it could be impossible to write.
Just a silly question: why are you trying to create this kind of "abstraction over ownership and borrowing"? Even if you find a generic way of expressing your intent, I am not sure that hiding the reference to a slice behind a custom struct will help with a more-than-hundreds-LoC codebase. Maybe if you show us your practical needs, we can try to help you a little more.
Sure, I agree, at some point lifetime checks may become undecidable.
Honestly, there is not as much practical need as there is curiosity.
I'd like to find out the limits of what is efficiently checkable using Rust's lifetimes, and at what point we need a more sophisticated prover to ensure that a program's lifetimes are correct.
I'm not entirely sure about what level of genericness I want, but let's try the following definition:
I want to return a struct of 2 values: a data value and another value of arbitrary type that contains references to the data. Concretely, I don't want the compiler to verify cyclic references, nor any recursive references within the struct. However, the references and data value should be able to have any internal structure they want. (But I'm not asking the compiler to verify this part).
Potentially, this illustration might help:
In the top-right example of the image, you can see the references forming a binary tree. I'm not asking the compiler to check the lifetimes of the binary tree; rather, this is to demonstrate that the references could also have a different structure than a Vec.
As you can see I have some difficulty expressing my ideas - if you are able to reformulate them in a better way, please do!