What lifetime annotation should be used if the returned value refers to a manually managed object created in the function?

In the e-book I found the following passage:

If the reference returned does not refer to one of the arguments, the only other possibility is that it refers to a value created within this function, which would be a dangling reference since the value will go out of scope at the end of the function.

However, there ARE cases where the value will NOT go out of scope at the end of the function, for example because I've used mem::forget() or Box::into_raw() to make Rust give up managing some objects.

Now, if a reference to those objects should be returned, how can I fix the compiler error "expected lifetime parameter"? I don't know what lifetime annotation I should put in the function signature.

An illustration:

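```rust
struct Point {
    x: i32,
    y: i32,
}

// Does not compile: the returned reference needs a lifetime (error E0106).
fn f() -> &Point {
    let b = Box::new(Point { x: 1, y: 2 });
    // Rust gives up managing the Point here; it is never freed.
    let ptr: *mut Point = Box::into_raw(b);
    unsafe { &*ptr }
}
```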

You can tell that to the Rust compiler using the 'static lifetime and a bit of unsafe code.

The latter is needed here because you are asserting that the value will live forever from that point on, which the compiler cannot verify, and which can have horrifying consequences if you end up breaking this contract. Here is an example of how to do it manually:

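```rust
struct Point {
    x: i32,
    y: i32,
}

fn f() -> &'static Point {
    let b = Box::new(Point { x: 1, y: 2 });
    // Rust stops managing the allocation; it will never be freed automatically.
    let ptr = Box::into_raw(b);
    // SAFETY: `ptr` is non-null, aligned, and points to a Point that is
    // never freed, so the reference stays valid for 'static.
    unsafe { &*ptr }
}
```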

Since Rust 1.26 you can use the safe Box::leak() function (still unstable when this was first written) to shorten this code and avoid writing unsafe code yourself:

```rust
fn f() -> &'static Point {
    let b = Box::new(Point { x: 1, y: 2 });
    Box::leak(b)
}
```

Of course, I should point out that purposely leaking memory like this, especially in code that runs repeatedly, can grow your RAM consumption until the application or the whole system crashes, so it should be done with extreme care.
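If a leaked value ever does need to be freed, the standard library allows turning the reference returned by Box::leak back into a Box with Box::from_raw, after which it is dropped normally. A minimal sketch, assuming the caller keeps the reference and never touches it again after freeing; make_point and free_point are hypothetical names used only for illustration:

```rust
struct Point {
    x: i32,
    y: i32,
}

// Hypothetical helper: leaks a heap-allocated Point and hands out a 'static reference.
fn make_point() -> &'static mut Point {
    Box::leak(Box::new(Point { x: 1, y: 2 }))
}

// Hypothetical helper: gives the allocation back to Rust so it is freed.
// Safety: `p` must have come from `make_point`, and it must not be used afterwards.
unsafe fn free_point(p: &'static mut Point) {
    // `&mut Point` coerces to `*mut Point`; Box::from_raw takes ownership back.
    let boxed: Box<Point> = unsafe { Box::from_raw(p) };
    drop(boxed); // the Point's memory is released here
}
```

The unsafe part moves from creating the reference to destroying it: the compiler cannot check that nobody still holds the reference when free_point runs.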

Thanks a lot!

I’m curious about the use case for this. If I see a fn returning a static reference, I wouldn’t assume it’s also leaking the value. I’d return the Box itself, or at least a raw pointer, to signal to the caller that this isn’t an “ordinary” static reference.
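The two alternatives suggested above could look roughly like this; it is only a sketch, and make_point_boxed / make_point_raw are hypothetical names. Returning the Box hands ownership to the caller, who can still opt into leaking with Box::leak; returning a raw pointer makes it explicit that someone must eventually free the allocation with Box::from_raw.

```rust
struct Point {
    x: i32,
    y: i32,
}

// The caller owns the Point; it is freed when the Box is dropped,
// unless the caller deliberately leaks it with Box::leak.
fn make_point_boxed() -> Box<Point> {
    Box::new(Point { x: 1, y: 2 })
}

// The caller receives an unmanaged allocation and is responsible for
// passing it back to Box::from_raw exactly once.
fn make_point_raw() -> *mut Point {
    Box::into_raw(Box::new(Point { x: 1, y: 2 }))
}
```

Either signature tells the reader that ownership is involved, whereas a plain &'static Point hides the leak.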

I'm actually writing JNI code. The native function returns a JByteBuffer, which is created by new_direct_byte_buffer(). The argument passed to new_direct_byte_buffer() (a mut slice) is created within the native function; that is, the owner of the slice is created inside the function itself.

Usually you use unbounded lifetimes for that sort of FFI stuff. You shouldn't use 'static in this case if you can avoid it, because it deliberately misleads the caller: the memory actually belongs to the JVM, and its lifetime should really be bound to it.

To indicate this, you can either use unbounded lifetimes or give the function a signature like fn name<'jvm>(data: *mut u8, len: usize, _: &'jvm JVM) -> &'jvm mut [u8]. That binds the lifetime of the slice to the JVM, even if you don't actually need the JVM for anything else.
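Both options can be sketched in plain Rust. Jvm below is a hypothetical stand-in for whatever handle type represents the JVM (the jni crate has its own types), and the function names are made up for illustration; std::slice::from_raw_parts_mut is the real standard-library call that turns a pointer and a length into a slice. With the unbounded version the caller picks the lifetime, so nothing stops the slice from outliving the JVM; with the bound version the slice can live only as long as the borrow of the Jvm handle.

```rust
/// Hypothetical stand-in for a JVM handle.
pub struct Jvm;

/// Unbounded lifetime: 'a is chosen by the caller and is tied to nothing.
/// Safety: `data` must be non-null and valid for reads and writes of `len`
/// bytes for all of 'a, with no other live references to that memory.
pub unsafe fn slice_unbounded<'a>(data: *mut u8, len: usize) -> &'a mut [u8] {
    unsafe { std::slice::from_raw_parts_mut(data, len) }
}

/// Bound lifetime: the slice cannot outlive the borrow of `_jvm`.
/// Safety: same requirements as `slice_unbounded`, for the lifetime 'jvm.
pub unsafe fn slice_bound_to_jvm<'jvm>(
    data: *mut u8,
    len: usize,
    _jvm: &'jvm Jvm,
) -> &'jvm mut [u8] {
    unsafe { std::slice::from_raw_parts_mut(data, len) }
}
```

Because the Jvm argument is never read, binding the lifetime costs nothing at run time; it only lets the borrow checker reject uses of the slice after the handle is gone.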

Basically what @Michael-F-Bryan said. NewDirectByteBuffer is a tricky API: are you handing the buffer off to Java code afterwards? If so, you lose track of when it is no longer in use, unless you have a clear contract in the code. If the JVM lives for the entire duration of your program, this will leak memory. I don’t recall whether there’s a way to associate a deleter with the DBB so that the JVM can call it when the DBB is dead in Java land.

It might be easier (if this works with your scenario) to let Java code allocate the DBB off the C heap (i.e. ByteBuffer.allocateDirect()) and give you a pointer to it instead.

I'm writing a Rust key-value store, with a Java client for set and get. For the "get()" part, I need to return a JByteBuffer built from a Vec<u8>, say v. So basically I want to return env.new_direct_byte_buffer(v.as_mut_slice()).unwrap(), where env is the first parameter of the native function.

Maybe I should use new_byte_array() instead? Is it less tricky?

I initially chose DBB because, when implementing "set()", I could use get_direct_buffer_address to get the address of the buffer's data. But now I realize that I don't need that feature to implement "get()".

Besides, I don't really understand why this would leak memory, or why Rust would need to implement a deleter (if that is even possible).

This is what I thought: new_direct_byte_buffer() receives a slice (a kind of reference) from the Rust side and uses its contents to create a DBB object in the JVM. Then execution returns to the Java side, with the DBB already residing in Java's off-heap memory. From then on, the Java side fully manages the DBB, so the Rust side needn't bother anymore.

new_direct_byte_buffer() is effectively borrowing your allocation: it takes your allocation and wraps it in a DirectByteBuffer Java object. This DBB is then exposed to Java code, which may hold on to it for an arbitrary amount of time. When the DBB is no longer in use on the Java side, the GC detects it. For DBBs backed by C heap data (i.e. ByteBuffer.allocateDirect(...)), there's a "cleaner" associated with the DBB that runs as part of finalization (that's changing in a future Java version, but that's immaterial to the point here). This cleaner calls free() on the allocation, and the native allocator reclaims it.

In your case, you're attaching a "foreign" allocation to the DBB - Java cannot call free() on it because (a) it doesn't own this memory and (b) it doesn't know which allocator is backing the allocation. So when the DBB is garbage collected, the native memory is not reclaimed.
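If you do keep new_direct_byte_buffer(), one possible contract (a sketch under stated assumptions, not something the JVM or the jni crate does for you) is for Rust to leak the bytes explicitly when handing them out, and to expose a separate function that Java must call once it is done with the buffer, so that Rust frees the memory with the same allocator that created it. leak_bytes and free_bytes are hypothetical names, and how Java would call free_bytes (for example through another native method) is left out.

```rust
// Moves ownership of the bytes out of Rust as a raw (pointer, length) pair.
// into_boxed_slice() drops any spare capacity, so the length alone is
// enough to rebuild the allocation later.
pub fn leak_bytes(v: Vec<u8>) -> (*mut u8, usize) {
    let boxed: Box<[u8]> = v.into_boxed_slice();
    let len = boxed.len();
    let ptr = Box::into_raw(boxed) as *mut u8;
    (ptr, len)
}

// Returns the memory to Rust's allocator.
// Safety: `ptr` and `len` must come from one call to `leak_bytes`, this must
// be called at most once for them, and nothing (including Java code) may
// access the memory afterwards.
pub unsafe fn free_bytes(ptr: *mut u8, len: usize) {
    let slice: *mut [u8] = std::ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(slice) });
}
```

This still depends on the Java side calling the free function exactly once, after its last use of the buffer.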

new_byte_array() should be easier because that's a JVM-managed byte[], and the JVM will reclaim it as part of normal Java heap GC.

Thanks a lot. Your explanation makes this whole process much clearer to me.