Safety of the slice lifetime extending

I'm trying to wrap my head around rules for references. Here's a simplified example of code I've run into in my own project:

fn main() {
    // We have some owned value - in this example, `Vec<u8>`.
    let mut v = b"Hello bytes world!".to_vec();
    // Now, here's a slice of data backed by the Vec.
    // It can't be 'static, obviously.
    let refer: &[u8] = &v;
    // Then, we want to run some processing on this slice.
    // Not that this function does not return anything, so, after it returns,
    // `refer` is not used anymore.
    process_slice(refer);
    // And here we want to mutate the backing Vec. Is it safe?
    // On the one hand, all shared references to it are gone.
    // On the other hand, there was a `'static [u8]` pointing to it -
    // can the compiler assume from its mere existence, 
    // that the corresponding memory is forever immutable?
    v[0] = b'Y';
    println!("{}", String::from_utf8(v).unwrap());
}

fn process_slice(refer: &[u8]) {
    // Here, we extend the lifetime of existing reference,
    // since `accepts_static` requires 'static for some reason.
    let static_refer = unsafe { std::slice::from_raw_parts(refer.as_ptr(), refer.len()) };
    let remade = String::from_utf8(accepts_static(static_refer)).unwrap();
    // Use the return value, so that the call to `accepts_static`
    // cannot be optimised away.
    println!("{}", remade);
}

// This is just an example of function requiring 'static slice.
// In real world, there are some other borrow-checking issues
// forcing me to use some signature like this, but the gist is the same:
// function accepts 'static slice, but its return value is fully owned,
// so the slice is not really borrowed after `accepts_static` returns.
fn accepts_static(data: &'static [u8]) -> Vec<u8> {
    data.to_vec()
}

(Playground)

The question is in the comments: is this code safe? Miri doesn't say anything, but we know that this is not a guarantee.

AFAIK this should be safe since livetime annotations don't affect the result of compilation and you are not retaining the slice longer than its actual lifetime.

I would question the API though! If accepts_static does not need access to the passed reference for longer than the duration of its call, why does it take a 'static in the first place? You say there are some other borrow checking issues. What kind of issues?

Even if these "issues" really can't be solved why not use the unsafe operation inside of accepts_static and keep having its signature reflect its true behavior? That way, it is e.g. easier to not forget that accepts_static does guarantee not accessing its argument after the fact, and also you don't need to invoke unsafe on every single call to it that depends on that guarantee.


Edit: Or are the borrow-checking issues that require the 'static outside of the accepts_static function, at the use-site? I even in this case a more general accepts_static, not needing 'static should be usable in every place where one with the 'static requirement can be used,

after all, 'static is a valid life-time to be substituted for 'a in

accepts_static_or_nonstatic<'a>(&'a [u8]) -> Vec<u8> { /* ... */ }

Even a T: 'static bound is met by the type T == (fn(&[u8]) -> Vec<u8>). You must be doing some pretty crazy stuff here, please give us a hint at what’s actually going on here "in real world".

1 Like

Well, it's a little complicated, but I'll try to make the problem clear enough. Please feel free to ask if something doesn't make sense.

I'm trying to make some kind of "type-erased generic callbacks", with the following logic:

  • slice is deserialized into some type In,
  • user-provided function converts In to Out,
  • Out is serialized to bytes.

The first attempt, without (de)serialization, only with conversion logic, was here. Of course, it's possible to do this all based on Box<dyn Fn> (or some other smart pointer), but I'd like to explore other possibilities.

First of all, here's a sketch of the working code with similar functionality:

use serde::de::Deserialize;
use serde::Serialize;
use serde_json::from_slice;
use serde_json::to_vec;

// This structure is provided by library user,
// types In and Out are also chosen by them.
struct Action<In, Out>(fn(In) -> Out);

impl<In, Out> Action<In, Out> {
    // Action is a three-step process:
    fn act<'a>(&self, data: &'a [u8]) -> Vec<u8>
    where In: Deserialize<'a>, Out: Serialize
    {
        // Deserializing data from slice...
        let data: In = from_slice(data).unwrap();
        // ...processing it...
        let data: Out = (self.0)(data);
        // ...serializing result to vector
        let data: Vec<u8> = to_vec(&data).unwrap();
        data
    }
}

fn str_identity(s: &str) -> &str {
    s
}

fn main() {
    let mut v = b"\"Hello bytes world!\"".to_vec();
    let action = Action(str_identity);
    let processed = action.act(&v);
    println!("{}", String::from_utf8(processed).unwrap());
    v[1] = b'Y';
    println!("{}", String::from_utf8(v).unwrap());
}

Playground

However, in real code I have to make Action non-generic, in other words, to make its type independent of In and Out, so that these actions can be stored in the same way (for example, in the same Vec). Here's the attempt I've done for now:

use serde::de::Deserialize;
use serde::Serialize;
use serde_json::from_slice;
use serde_json::to_vec;

struct Action {
    wrapper: for<'a> fn(&'a Self, &'a [u8]) -> Vec<u8>,
    // This pointer is, in fact, the function pointer of kind `fn(T) -> U`.
    action: *const (),
}

impl Action {
    fn new<'a, In: Deserialize<'a>, Out: Serialize>(action: fn(In) -> Out) -> Action {
        Action {
            wrapper: Action::wrapper::<In, Out>,
            action: action as *const (),
        }
    }
    
    // I can't here accept `data: &'a [u8]`, since this will conflict with
    // setting the `wrapper` field in `new`
    fn wrapper<'a, In: Deserialize<'a>, Out: Serialize>(this: &Self, data: &[u8]) -> Vec<u8> {
        // commenting out this line triggers error
        let data: &'a [u8] = unsafe { std::slice::from_raw_parts(data.as_ptr(), data.len()) };
        let data: In = from_slice(data).unwrap();
        // Safety: `this.action` can only be cast from `fn(In) -> Out`
        let action: fn(In) -> Out = unsafe { std::mem::transmute(this.action) };
        let data: Out = action(data);
        let data: Vec<u8> = to_vec(&data).unwrap();
        data
    }

    fn act<'a>(&self, data: &'a [u8]) -> Vec<u8> {
        (self.wrapper)(self, data)
    }
}

fn str_identity(s: &str) -> &str {
    s
}

fn main() {
    let mut v = b"\"Hello bytes world!\"".to_vec();
    let action = Action::new(str_identity);
    let processed = action.act(&v);
    println!("{}", String::from_utf8(processed).unwrap());
    v[1] = b'Y';
    println!("{}", String::from_utf8(v).unwrap());
}

Playground

The line I'm asking for in this topic is the first one inside wrapper:

let data: &'a [u8] = unsafe { std::slice::from_raw_parts(data.as_ptr(), data.len()) };

The problem is, as far as I can tell, is the following:

  • When creating function pointer to fill the wrapper field, I have to set the fixed type.
  • However, in different calls to wrapper I will deserialize provided slices into possibly different types, because they will have different lifetime parameters - bound to the lifetime of the passed-in slice.

If I understand correctly, this might be possible to do with GATs (by making In the generic associated type of some trait and binding the trait's lifetime to the slice), but, again, I'm exploring the stable features for now. Is there any more clear workaround?

Here's a program that uses your Action to dereference a dangling pointer in another thread (playground):

#[derive(Deserialize)]
struct Text<'a> {
    data: &'a str,
}

fn print_text(t: Text<'static>) { // print_text requires `Text<'static>`, which implements `Deserialize<'static>`
    use std::thread;
    thread::spawn(move || {
        thread::sleep_ms(500);
        println!("{}", t.data);
    });
}

fn main() {
    let action = Action::new(print_text); // 'a is inferred to be 'static here
    {
        let v = b"{ \"data\": \"Hello bytes world!\" }".to_vec();
        let _processed = action.act(&v);  // but here it's shorter than 'static
    } // v is dropped here
    std::thread::sleep_ms(1000); // wait for the other thread to do its thing
}

The multithreading isn't really important to the lifetime issue here; thread::spawn was just the first thing that sprang to mind that you can call in a fn and requires 'static.

2 Likes

Aha. Thanks, I see the problem, will think it through.

I think this is sound:

struct Action<'a> {
    wrapper: fn(&Self, &'a [u8]) -> Vec<u8>,
    // This pointer is, in fact, the function pointer of kind `fn(T) -> U`.
    action: *const (),
}

impl<'a> Action<'a> {
    fn new<In: Deserialize<'a>, Out: Serialize>(action: fn(In) -> Out) -> Self {
        // making wrapper a closure -- although it doesn't capture anything and can therefore be
        // coerced to a function pointer -- still helps with safety because it lets us reuse the
        // definitions of `In`, `Out` and `'a` that are already in scope.
        let wrapper = |this: &Self, data: &'a [u8]| {
            let data: In = from_slice(data).unwrap();
            // Safety: `this.action` can only be cast from `fn(In) -> Out`
            let action: fn(In) -> Out = unsafe { std::mem::transmute(this.action) };
            let data: Out = action(data);
            let data: Vec<u8> = to_vec(&data).unwrap();
            data
        };
        Action {
            wrapper,
            action: action as *const (),
        }
    }

    fn act(&self, data: &'a [u8]) -> Vec<u8> {
        (self.wrapper)(self, data)
    }
}

At first glance you might think that putting 'a as a parameter on Action will make it impossible to call act() with a short-lived buffer. But that's not quite right. You can still write things like this:

    let action = Action::new(str_identity);
    let v = b"\"Hello bytes world!\"".to_vec();
    let _processed = action.act(&v);
    let processed = {
        let w = b"\"Hello again world!\"".to_vec();
        action.act(&w)
    };
    println!("{}", String::from_utf8(processed).unwrap());

But moving let _processed to the end will cause an error (​`w` does not live long enough).

I think this might be the best you can do without creating unsoundness, requiring In and Out be 'static, or needing GATs.

1 Like