Validity of memory area after `std::mem::forget`

Hello everybody,

I am wondering how "safe" std::mem::forget is, when it comes to relying on the forgotten structure being still valid ("still there") after the call.

The documentation on std::mem::forget says on values used as arguments:

Any resources the value manages, such as heap memory or a file handle, will linger forever in an unreachable state. However, it does not guarantee that pointers to this memory will remain valid.

I also thought about using Box::leak, but if I understand it correctly this will result in a copy of my structure, which I am trying to prevent.

Why I want to use it

I want to provide the structure to a device and will clean up resources later.

Definitely won't be safe. Once the value has been moved away, rustc is fully within its rights to put other data in that location.

If the structure is already in a Box, leak will not make a copy: it will destroy the box and return a ’static reference to the memory it was managing.
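A minimal sketch of that claim, assuming a 64-byte buffer as the payload (the function name is made up for illustration): the address seen through the Box and the address returned by Box::leak are the same, so nothing is copied.

```rust
/// Returns the address of a boxed buffer before and after `Box::leak`.
pub fn leak_keeps_address() -> (usize, usize) {
    let boxed: Box<[u8; 64]> = Box::new([0u8; 64]);
    // Address of the heap allocation while the Box still owns it.
    let before = &*boxed as *const [u8; 64] as usize;

    // `Box::leak` consumes the Box and returns a reference to the same allocation.
    let leaked: &'static mut [u8; 64] = Box::leak(boxed);
    let ptr: *mut [u8; 64] = leaked;
    let after = ptr as usize;

    // Rebuild the Box so this demo does not actually leak memory.
    // SAFETY: `ptr` came from `Box::leak` above and is not used again afterwards.
    drop(unsafe { Box::from_raw(ptr) });

    (before, after)
}
```

Both numbers in the returned pair are equal; only Box::new, which moves the value onto the heap in the first place, involves a copy.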

forget() moves the value, so its old address becomes invalid. It is like drop(), which destroys the immediate value, only without running its destructor. It's tricky to use safely, which is why more specific tools such as ManuallyDrop, Box::leak, and the various into_raw methods are usually preferred.

Box::leak doesn't copy anything. Box::new does.

Calling std::mem::forget is just an ordinary function call that moves the value somewhere else (into the function). The actual forgetting happens once it's been moved somewhere you don't control, and the place where it was stored before is governed by the same rules as for any other move.
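As a rough illustration, here is a sketch with a hypothetical CountsDrops type that counts destructor runs. Both forget and drop take their argument by value; the only observable difference is whether the destructor runs.

```rust
use std::cell::Cell;
use std::mem;
use std::rc::Rc;

/// Hypothetical helper type: bumps a shared counter when its destructor runs.
pub struct CountsDrops(Rc<Cell<u32>>);

impl Drop for CountsDrops {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the destructor count after `mem::forget`, then after `drop`.
pub fn destructor_runs() -> (u32, u32) {
    let counter = Rc::new(Cell::new(0));

    let forgotten = CountsDrops(Rc::clone(&counter));
    // Moved into `forget` like any other argument; the destructor is skipped.
    // (This also leaks the cloned Rc, which is exactly what forgetting means.)
    mem::forget(forgotten);
    // `forgotten` cannot be named any more here: using it would not compile.
    let after_forget = counter.get();

    let dropped = CountsDrops(Rc::clone(&counter));
    // Moved into `drop`; the destructor runs when the callee returns.
    drop(dropped);
    let after_drop = counter.get();

    (after_forget, after_drop)
}
```

The function returns (0, 1): nothing ran for the forgotten value, one destructor ran for the dropped one. In neither case does the caller keep a usable location for the value.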

Thanks for the quick replies.

No, the value is not boxed. The value is coming from something like:

fn do_something<T> (data: T) {
    // Expose address to device and
    // leak memory here.
}

I don't wanna restrict the interface to using Box<T>. But might consider it, if I don't find another solution.

I just looked into ManuallyDrop. Thanks for the hint. I think I will go for this one, as it also doesn't automatically dereference T if T is &T like mem::forget does.

Edit for the solution:

fn do_something<T> (data: T) {
    // Expose address of data here to device

    // After this call, data will NOT be cleaned up.
    // Memory must be cleaned up manually.
    ManuallyDrop::new(data);
}

This should work fine even when T is of type &T.

It sounds like you're looking for a way not just to avoid calling destructors, but to "freeze" the memory used for data indefinitely so that the device (whatever it is) can use the address after do_something returns. You cannot do this, neither with mem::forget nor ManuallyDrop. When do_something returns, references to its local variables cannot be dereferenced without causing undefined behavior. (That includes anything you could do to manually clean it up.)

The only option you have, if you want the memory address to remain valid so your device can use it / you can clean it up after do_something returns, is to put it somewhere that isn't on the stack -- for example, in a Box.
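A minimal sketch of the Box route, assuming hypothetical names expose_to_device and reclaim (neither is an existing API): the value is moved to the heap once, its address stays valid after the function returns, and the same address is later used to take ownership back.

```rust
/// Hypothetical hand-off: moves `data` to the heap and returns its stable address.
pub fn expose_to_device<T>(data: T) -> *mut T {
    Box::into_raw(Box::new(data))
}

/// Hypothetical cleanup: takes ownership back so the value is dropped normally.
///
/// # Safety
/// `ptr` must come from `expose_to_device`, must not have been reclaimed
/// before, and the device must no longer be using it.
pub unsafe fn reclaim<T>(ptr: *mut T) -> T {
    unsafe { *Box::from_raw(ptr) }
}

/// Round trip: hand off, let the "device" write, then reclaim.
pub fn round_trip() -> [u8; 4] {
    let ptr = expose_to_device([1u8, 2, 3, 4]);
    // Stand-in for the device writing through the address it was given.
    // SAFETY: `ptr` points to a live heap allocation that only we access.
    unsafe { (*ptr)[0] = 9 };
    // SAFETY: `ptr` came from `expose_to_device` and is reclaimed exactly once.
    unsafe { reclaim(ptr) }
}
```

Here round_trip returns [9, 2, 3, 4]. The pointer stays valid between the two calls because the allocation lives on the heap, not in do_something's stack frame.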

You could also put it in a static.
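A small sketch of the static variant, assuming a four-byte buffer named DEVICE_BUFFER (a made-up name). A static has one fixed address for the whole program; atomics are used here only so the buffer can be written without unsafe code.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical device buffer: lives for the whole program at a fixed address.
static DEVICE_BUFFER: [AtomicU8; 4] = [
    AtomicU8::new(0),
    AtomicU8::new(0),
    AtomicU8::new(0),
    AtomicU8::new(0),
];

/// Address that could be handed to the device; identical on every call.
pub fn device_buffer_addr() -> usize {
    DEVICE_BUFFER.as_ptr() as usize
}

/// Writes one byte into the buffer and reads it back.
pub fn write_and_read(index: usize, value: u8) -> u8 {
    DEVICE_BUFFER[index].store(value, Ordering::SeqCst);
    DEVICE_BUFFER[index].load(Ordering::SeqCst)
}
```

The trade-off is that the size and number of such buffers must be known at compile time, which may or may not fit a generic driver interface.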

Yes, you are so right, I only tested it on a toy example, with a main and a single function, where the variable from the main of course stayed valid till the end. A more complex example was, as you said, undefined behaviour.

To understand my use-case better, I will describe it below.

Use-case

Description
I am implementing an object (some sort of queue with control functionality) that will provide an interface for device drivers. The interface should give the drivers a function that takes an object and provides the device with the object's memory address. After the device has processed the object, it can be passed back to the driver as a Vec<u8> or a *mut u8; I am still not sure which.

Based on the suggestions, I think I will split the interface into two functions:

// This function needs the driver to take care of memory.
// I need something which takes raw pointers, as some drivers 
// are getting memory from somewhere else.
fn do_something_with_raw<T: IntoU8Slice>(data: *mut T) {...}

fn do_something<T: IntoU8Slice>(data: Box<T>) {
   // leak memory and work with raw from here on
   let raw_ptr = Box::into_raw(data);
   
   // ... do the rest
}

Problems with this solution

  • How to exclude T = &T?

If you see any other problems or have any suggestion, I am happy to hear them.
But first of all, thank you for the help.

Current Solution

In order to keep the structures in memory, the structure needs to be kept on the heap.

Solution

Box the value and leak the box via Box::leak() or via Box::into_raw().

You can add a T: 'static bound to forbid any type that holds references with limited lifetimes:

fn do_something<T: IntoU8Slice + 'static>(data: Box<T>)
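A sketch of how that bound behaves. The IntoU8Slice definition below is a guess (the thread never shows the real trait), and Packet and Borrowed are hypothetical types added only for the demonstration.

```rust
/// Hypothetical stand-in for the thread's `IntoU8Slice` trait.
pub trait IntoU8Slice {
    fn as_u8_slice(&self) -> &[u8];
}

/// Owns its bytes, so it satisfies `'static`.
pub struct Packet {
    pub bytes: [u8; 4],
}

impl IntoU8Slice for Packet {
    fn as_u8_slice(&self) -> &[u8] {
        &self.bytes
    }
}

/// Borrows its bytes; only `Borrowed<'static>` satisfies `'static`.
pub struct Borrowed<'a>(pub &'a [u8]);

impl<'a> IntoU8Slice for Borrowed<'a> {
    fn as_u8_slice(&self) -> &[u8] {
        self.0
    }
}

pub fn do_something<T: IntoU8Slice + 'static>(data: Box<T>) -> *mut T {
    // Leak the box and work with the raw pointer from here on.
    Box::into_raw(data)
}

pub fn demo() -> Vec<u8> {
    let raw = do_something(Box::new(Packet { bytes: [1, 2, 3, 4] }));

    // Rejected by the compiler: `local` does not live for `'static`.
    // let local = [5u8, 6];
    // do_something(Box::new(Borrowed(&local)));

    // SAFETY: `raw` came from `Box::into_raw` and is reclaimed exactly once.
    let packet = unsafe { Box::from_raw(raw) };
    packet.as_u8_slice().to_vec()
}
```

Note that the bound does not exclude every reference: a Borrowed wrapping a string literal or other &'static data is still accepted, which is fine here because such data never goes away.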

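What you've done here is actually just reinvent forget: the standard library implements mem::forget by wrapping its argument in ManuallyDrop::new and discarding the result.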

So if forget was the wrong choice for this use, then this is wrong too.
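To make the equivalence concrete, here is a sketch in which forget_by_hand mirrors the shape of the do_something "solution" from earlier in the thread. The CountsDrops type is a hypothetical helper used only to observe destructors.

```rust
use std::cell::Cell;
use std::mem::{self, ManuallyDrop};
use std::rc::Rc;

/// Hypothetical helper type: bumps a shared counter when its destructor runs.
pub struct CountsDrops(Rc<Cell<u32>>);

impl Drop for CountsDrops {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Same shape as the earlier "solution": wrap the argument and throw the wrapper away.
pub fn forget_by_hand<T>(data: T) {
    // `data` is moved into the wrapper, and the wrapper is discarded right away.
    let _ = ManuallyDrop::new(data);
}

/// Returns the destructor counts seen after `forget_by_hand` and after `mem::forget`.
pub fn drops_observed() -> (u32, u32) {
    let by_hand = Rc::new(Cell::new(0));
    forget_by_hand(CountsDrops(Rc::clone(&by_hand)));

    let std_forget = Rc::new(Cell::new(0));
    mem::forget(CountsDrops(Rc::clone(&std_forget)));

    (by_hand.get(), std_forget.get())
}
```

Both counters stay at 0. In both cases the value has been moved into the callee, so any address taken of the caller's copy beforehand no longer refers to a live value.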

EDIT: PR up so the compiler will tell you that itself: https://github.com/rust-lang/rust/pull/75912

While thinking about this I wondered if objects are moved on the stack.

Example:
An owned, non-Copy object is passed from function to function.
Does the object stay at the same stack memory address the whole time, with the functions "only" getting a reference to that location?

Can anyone explain to me how this is handled, and whether it is Rust-specific or generalizable?

The thing that describes how functions pass arguments and return values to each other is called an ABI (not to be confused with an API). ABIs are specific to a platform (compiler + operating system + target architecture), so there's no universal answer.

In general it's often both. Objects that are "large enough" may be stored in the caller's stack space and passed by reference internally, whereas "small" ones might be copied back and forth between caller and callee or simply stored in registers. You cannot rely on the address of anything being the same across function calls. Even if the address does happen to be the same, I don't think you can usefully (and correctly) use that information.

The above applies to any compiled language, like Rust, C, C++. Bytecode-compiled languages like Python, Perl, anything on the JVM also follow the same general principle but they don't generally have platform-specific variations because the bytecode is platform independent.

Rust generally expects objects to be relocatable: that they’ll continue to work properly if they are bit-for-bit moved to another location. When you change the ownership of an object by passing it as a parameter, returning it from a function, or storing it in a struct field, it gets copied into that location and the compiler treats its old location as a chunk of uninitialized memory. The only difference that Copy makes is that this last step of invalidating the old memory is skipped.
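A short sketch of both halves of that description, using hypothetical Point and Holder types: a Copy value stays usable after being assigned elsewhere, while moving a Vec copies only its small header (pointer, length, capacity) and leaves the heap buffer where it was.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

pub struct Holder {
    pub items: Vec<u8>,
}

/// A `Copy` type: the old location is still considered initialized.
pub fn copy_stays_usable() -> (Point, Point) {
    let a = Point { x: 1, y: 2 };
    let b = a; // bitwise copy; `a` remains valid because `Point` is `Copy`
    (a, b)
}

/// A non-`Copy` type: the header is moved, the heap buffer is not.
pub fn heap_buffer_survives_move() -> (usize, usize) {
    let items = vec![1u8, 2, 3];
    let before = items.as_ptr() as usize;
    let holder = Holder { items }; // moves the Vec header into the struct field
    // `items` is now treated as uninitialized: using it here would not compile.
    let after = holder.items.as_ptr() as usize;
    (before, after)
}
```

The two addresses returned by heap_buffer_survives_move are equal, which is why heap data reached through a Box or Vec keeps its address across moves of the owner (as long as the Vec does not reallocate).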

This strategy does have drawbacks. For example, an object can never hold a pointer to a location inside itself: if it gets moved, the pointer no longer refers to the right place.
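The drawback can be observed with a sketch like the following. SelfRef is a hypothetical type that records the address of its own field as a plain integer, so a stale address is never actually dereferenced.

```rust
/// Hypothetical self-referential struct: remembers where `value` used to live.
pub struct SelfRef {
    value: u32,
    addr_of_value: usize,
}

impl SelfRef {
    /// True if the recorded address still matches the field's current address.
    pub fn is_consistent(&self) -> bool {
        self.addr_of_value == &self.value as *const u32 as usize
    }
}

/// Returns the consistency check before and after moving the struct.
pub fn stale_after_move() -> (bool, bool) {
    let mut s = SelfRef { value: 7, addr_of_value: 0 };
    s.addr_of_value = &s.value as *const u32 as usize;
    let before = s.is_consistent();

    // Moving from the stack into a fresh heap allocation changes the address.
    let moved = Box::new(s);
    let after = moved.is_consistent();

    (before, after)
}
```

The function returns (true, false): the bytes were copied faithfully, including the recorded address, which now points at the old stack slot.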

These problems caused some trouble with the design for async, and this is where Pin came from: it guarantees that none of the operations that might move an object around in memory can ever be done to the pinned value, so it can safely use non-relocatable patterns.

Thank you both. This answered all my questions.

Summarizing I could say:

  • Passing owned objects around is possibly much more expensive than using references
  • Memory addresses of objects are only constant when stored on the heap and leaked
    • Excluding growing and shrinking data structures like Vec, which probably need extra caution when mutated.

This is a reasonable first approximation of the situation, but there’s a little bit of nuance that you may be missing. This may be more detail than you need/are interested in, but I want to make sure you understand that your takeaways are more guidelines than hard-and-fast rules:

The cost has much more to do with the size of the value being moved than with owned vs. borrowed: an & reference is small (1 or 2 words), but the corresponding Box is the same size. Similarly, it’s possible to have a struct with a lifetime annotation (which means it’s a borrowed form of something else) that is large and unwieldy.
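The sizes can be checked directly. This is a sketch assuming a mainstream target where a thin pointer is one machine word; BigView is a hypothetical borrowed struct made large on purpose.

```rust
use std::mem::size_of;

/// Hypothetical "borrowed" struct that is nonetheless large: 64 references.
pub struct BigView<'a> {
    pub items: [&'a u8; 64],
}

/// Sizes, in machine words, of a few owned and borrowed handles.
pub fn sizes_in_words() -> (usize, usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<&u64>() / word,             // thin reference
        size_of::<Box<u64>>() / word,         // Box to a sized type
        size_of::<&[u8]>() / word,            // slice reference: pointer + length
        size_of::<BigView<'static>>() / word, // borrowed, but 64 words to move
    )
}
```

On such a target this yields (1, 1, 2, 64), so moving a Box<u64> costs the same as copying a &u64, while moving a BigView copies 64 words even though it owns nothing.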

That’s one situation where an object’s memory location is constant, but not the only one. As long as an & reference to an object exists, it’s also guaranteed not to move, for example.

When you have an &mut reference you can swap two values using mem::swap.
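For instance, in this small sketch (the function name is made up) both values change location purely through &mut references, without either variable being moved by name:

```rust
use std::mem;

/// Swaps two strings through mutable references and returns the result.
pub fn swap_through_refs() -> (String, String) {
    let mut left = String::from("left");
    let mut right = String::from("right");

    // Each value is relocated into the other variable's storage.
    mem::swap(&mut left, &mut right);

    (left, right)
}
```

After the call, the variable left holds "right" and vice versa, so an outstanding &mut is no guarantee that the value behind it stays put.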

You’re right, of course. I’ve removed the &mut from my post; it also provides some guarantees re: movement, but they’re more complicated than simply ‘it doesn’t’.