Returning both the owned data and a slice of it from a function

Hi, I recently started looking at Rust and was experimenting with the borrow checker to see how it works. When I create a function like this:

```rust
// Does not compile: error[E0106]: missing lifetime specifier
fn create_string() -> &str {
    let s = String::from("testing");
    let slice = &s[0..5];

    slice
}
```

This code is obviously buggy, because the variable s is dropped at the end of the function's scope. But as far as I know, variables that are returned from a function are not dropped. So I tried returning the original variable s as well:

```rust
// Also does not compile: error[E0106]: missing lifetime specifier
fn create_string() -> (String, &str) {
    let s = String::from("testing");
    let slice = &s[0..5];

    (s, slice)
}
```

Here I am returning a slice that points to data created in the local scope, which would normally be freed. But I also return the local variable s, which owns that data. So when this function is called, s should not be freed when the scope ends; it should be returned to the caller instead. This seemed to me like it would work, but it doesn't: the code fails to compile with a missing lifetime specifier error, and since I am new to Rust I can't tell how to fix it.

I searched online but didn't find any examples that return both the owner and a slice of the string itself.

No, but they do get moved. When a value is moved, every existing reference to it is invalidated.


Is there a way to tell Rust that the code above is valid? The underlying allocation never gets freed, so the error seems to exist only at the level of Rust's semantics. Technically, the slice is valid for as long as the original data is valid.

This isn't possible in safe Rust, because the borrow checker doesn't know that there is a heap allocation. It just sees that you have a reference to an object and then move the object, which would normally invalidate the reference.
There are a couple of crates that try to support self-referential structs (which is basically what this is). I haven't tried them.
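To see that the allocation itself really does survive the move, here is a small experiment (a sketch; the string contents are arbitrary). It returns the String together with the address of its heap buffer as a raw pointer, which is safe to create as long as it is never dereferenced. Moving a String copies only its pointer, length and capacity, so the address is unchanged in the caller. The borrow checker still rejects a `&str` into that buffer, because it reasons about the move of `s`, not about the allocation.

```rust
/// Builds a String and also reports where its heap buffer lives.
fn create_string() -> (String, *const u8) {
    let s = String::from("testing");
    // Address of the heap buffer, not of the local `s` itself.
    let buffer_addr = s.as_ptr();
    // Moving `s` copies only its (pointer, length, capacity) header.
    (s, buffer_addr)
}

fn main() {
    let (s, old_addr) = create_string();
    // The heap buffer was neither freed nor reallocated by the move.
    assert_eq!(s.as_ptr(), old_addr);
    println!("same heap buffer after the move: {}", s.as_ptr() == old_addr);
}
```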


Does the unsafe approach introduce any overhead? I looked at the source code of the crates you linked, but I don't know enough Rust yet to understand exactly what they are doing.

Also, how would I do this with unsafe Rust, without a third-party library?

There are two ways to approach this. The easy but very risky one is to return a raw pointer, which has no overhead; but I think you should think more about your API before doing it that way. Alternatively, you could just return the range (0..5) along with the String and create the &str reference after the function returns.
Those libraries try to be safe (they provide a safe API), and I don't think they add overhead, but I'm not sure; I don't really know them, only that they exist. They use macros that generate helper structs to "explain" to the borrow checker what is happening, so they are fairly complex.
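A minimal sketch of the range-based version, reusing the `create_string` name from the question. The string and the 0..5 range are placeholders; the range must fall on UTF-8 character boundaries, otherwise the slicing in the caller panics.

```rust
use std::ops::Range;

/// Returns the owned data plus the byte range of the interesting part,
/// instead of a &str that would borrow from a local variable.
fn create_string() -> (String, Range<usize>) {
    let s = String::from("testing");
    let range = 0..5; // must lie on char boundaries to be sliceable later
    (s, range)
}

fn main() {
    let (s, range) = create_string();
    // Create the reference only once the owner has reached its final place.
    let slice: &str = &s[range];
    println!("{slice}");
}
```

Because the String is not moved again after the slice is taken, the borrow checker accepts this without any lifetime annotations.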


Even the relatively battle-tested crates dedicated to self-referential structs have had, or still have, soundness issues, so the prudent answer is "don't".

References are a poor fit for the job. Return something like a range instead, perhaps wrapped in a newtype that can hand the slice back to you ergonomically, and perhaps even maintains the invariant that the range lies within the allocation.
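One possible shape for such a newtype, sketched under the assumption that the owned string never needs to be mutated afterwards. `OwnedSlice` is a hypothetical name. The invariant is checked once in the constructor with `str::get`, which returns `None` for ranges that are out of bounds or not on character boundaries, so `as_str` can slice without panicking.

```rust
use std::ops::Range;

/// Owns a string together with the byte range of a sub-slice of it.
/// Invariant (checked in `new`): `range` is a valid slice range of `data`.
struct OwnedSlice {
    data: String,
    range: Range<usize>,
}

impl OwnedSlice {
    /// Returns None if `range` cannot be used to slice `data`.
    fn new(data: String, range: Range<usize>) -> Option<Self> {
        if data.get(range.clone()).is_none() {
            return None;
        }
        Some(OwnedSlice { data, range })
    }

    /// Hands back the slice; cannot panic thanks to the invariant.
    fn as_str(&self) -> &str {
        &self.data[self.range.clone()]
    }

    /// The full owned string.
    fn full(&self) -> &str {
        &self.data
    }
}

fn create_string() -> OwnedSlice {
    OwnedSlice::new(String::from("testing"), 0..5).expect("range is valid")
}

fn main() {
    let owned = create_string();
    println!("{} / {}", owned.as_str(), owned.full());
}
```

Because the fields are private and no method mutates `data`, the range checked in `new` stays valid for as long as the value exists.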


Yeah, it seems like returning an integer range is the best option in this case. Thanks for the help. I'll keep this thread open in case someone shares a different approach.


It's an unfortunate limitation. The safe, possibly zero-overhead solution to this is to split the API and structs into data-loading and data-referencing parts:

```rust
let s = create_string();
let slice = get_string_slice(&s);
```

Once the owning data is kept somewhere, you can pass down immutable references to both the full data and the parts.
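A sketch of that split, keeping the function names from the post. The string and the 0..5 range are placeholders, and `describe` is a hypothetical downstream function that takes borrows of both the full data and a part. The owner lives in the caller, and lifetime elision ties the returned `&str` to the borrowed input.

```rust
/// Data-loading step: produces and hands over ownership of the full data.
fn create_string() -> String {
    String::from("testing")
}

/// Data-referencing step: borrows from data the caller already owns.
fn get_string_slice(s: &str) -> &str {
    &s[0..5]
}

/// Hypothetical consumer: receives both borrows, which point into the same String.
fn describe(full: &str, part: &str) -> String {
    format!("{part} is part of {full}")
}

fn main() {
    let s = create_string();          // the owner stays in this scope
    let slice = get_string_slice(&s); // &String coerces to &str
    println!("{}", describe(&s, slice));
}
```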


The other thing you can do here is use a crate like bytes to share ownership of the buffer between the full string and its substrings.
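The `bytes` crate's API is not reproduced here. As a std-only sketch of the same shared-ownership idea, a hypothetical `SharedStr` type keeps the string behind a reference-counted `Arc<str>` plus a byte range, so the full string and each substring are independent owned values that share one allocation. This is an illustrative analogue, not a description of how `Bytes` itself is implemented.

```rust
use std::ops::Range;
use std::sync::Arc;

/// A cheaply clonable view into a shared, immutable string (std-only stand-in).
#[derive(Clone)]
struct SharedStr {
    buf: Arc<str>,
    range: Range<usize>,
}

impl SharedStr {
    fn new(s: String) -> Self {
        let len = s.len();
        SharedStr { buf: Arc::from(s), range: 0..len }
    }

    /// Sub-view sharing the same allocation; `r` is relative to this view.
    /// Returns None if `r` is out of bounds or not on char boundaries.
    fn slice(&self, r: Range<usize>) -> Option<SharedStr> {
        if self.as_str().get(r.clone()).is_none() {
            return None;
        }
        let start = self.range.start;
        Some(SharedStr {
            buf: Arc::clone(&self.buf), // bumps the refcount, no copy
            range: (start + r.start)..(start + r.end),
        })
    }

    fn as_str(&self) -> &str {
        &self.buf[self.range.clone()]
    }
}

fn main() {
    let whole = SharedStr::new(String::from("testing"));
    let part = whole.slice(0..5).expect("valid range");
    // Dropping the original view does not free the buffer `part` points into.
    drop(whole);
    println!("{}", part.as_str());
}
```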


I have been trying some other approaches, and I tried this one as well; it works. But if create_string has a complicated way of computing the ranges, then you have to split that logic into its own function, call it from get_string_slice, and rely on common subexpression elimination. It's a bit cumbersome.
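A sketch of the refactoring described above, where `find_range` is a hypothetical stand-in for the complicated range logic (here, the byte range of the first space-separated word). The range is recomputed on every `get_string_slice` call; whether the compiler merges repeated calls depends on inlining and is not guaranteed. If that matters, storing the computed range next to the String, as in the newtype idea earlier in the thread, avoids relying on the optimizer.

```rust
use std::ops::Range;

/// Hypothetical "complicated" range computation, factored out of
/// create_string so the referencing step can reuse it.
fn find_range(s: &str) -> Range<usize> {
    // ' ' is ASCII, so its byte index is always a char boundary.
    let end = s.find(' ').unwrap_or(s.len());
    0..end
}

/// Loading step: only produces the owned data.
fn create_string() -> String {
    String::from("hello world")
}

/// Referencing step: recomputes the range from the borrowed data.
fn get_string_slice(s: &str) -> &str {
    &s[find_range(s)]
}

fn main() {
    let s = create_string();
    let slice = get_string_slice(&s);
    println!("{slice}");
}
```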