Leak memory for garbage collection

I am trying to build a simple garbage-collected VM in Rust. Right now I am trying to figure out how to actually get memory to leak. Rust is so good at ownership that I can't seem to get around it. In this example I am trying to make a HashMap of pointers to strings. Any time a string is updated, I want to leak the old value (so that it won't break any code that has a reference to it) and put in the new value. Now anything that tries to access that string will get the new value, and the old one can be garbage collected once no one is using it. This is what I have so far (I know it is a mess).

use std::collections::HashMap;
use std::pin::Pin;
    
fn main() {
    let mut map: HashMap<u64, Pin<Box<Pin<Box<String>>>>> = HashMap::default();
    
    map.insert(0, Box::pin(Box::pin("zero".to_owned())));
    map.insert(1, Box::pin(Box::pin("one".to_owned())));
    map.insert(2, Box::pin(Box::pin("two".to_owned())));
    
    let mut boxed_value = map.get_mut(&0).unwrap().as_mut().get_mut(); // This addr should be constant
    println!("box addr = {:p}", boxed_value);
    
    let string_inside: *mut String = boxed_value.as_mut().get_mut();
    println!("addr before = {:p}", string_inside);
    println!("value before = {}", unsafe{&*string_inside});

    // std::mem::forget(*boxed_value); // borrow checker won't let me move into forget
    
    std::mem::replace(&mut *boxed_value, Box::pin("zilch".to_owned()));
    
    println!("addr after = {:p}", string_inside);
    println!("value after = {}", unsafe{&*string_inside});
}

The problem with this code is that drop is being called on the old string "zero" (so the value after is garbage). Whenever I try to use Box::leak or mem::forget, it tells me that it can't move into those functions. How can I force a leak in Rust?

Note that you can't simultaneously keep the address of the new and the old value constant. You can replace the old value with a new one, but either you allocate a new box for the new value, or you find a new place for the old value. You can't store two objects at the same address. So there's definitely something wrong with your expectations: you can't have already-existing references to the place keep referring to the old string while expecting newly created references to the same place to refer to the new value.

If you want pointers to the old value to be valid after the operation, you'll have to go with the first option, i.e. allocate a new place for the new object.

The second option is possible too, but based on your description, it's probably not what you want.
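
A minimal sketch of the first option, assuming it is acceptable for the map to hold `&'static str` values instead of owned Strings. The helper name `update` is hypothetical; `Box::leak` and `String::into_boxed_str` are standard library calls. Every update leaks a fresh allocation, so references handed out earlier keep seeing the old text while new lookups see the new one.

```rust
use std::collections::HashMap;

// Hypothetical helper: leak `value` and store the resulting `&'static str`
// under `key`. Returns the previously stored string, which is still valid
// because it was leaked too and is never freed by this code.
fn update(
    map: &mut HashMap<u64, &'static str>,
    key: u64,
    value: String,
) -> Option<&'static str> {
    // Box::leak gives up ownership of the allocation and returns a
    // reference that lives for the rest of the program.
    let leaked: &'static str = Box::leak(value.into_boxed_str());
    map.insert(key, leaked)
}

fn main() {
    let mut map = HashMap::new();
    update(&mut map, 0, "zero".to_owned());

    // A reference taken before the update...
    let old: &'static str = map[&0];

    update(&mut map, 0, "zilch".to_owned());

    // ...still points at the old text, while the map yields the new one.
    println!("old = {}, new = {}", old, map[&0]);
}
```

Nothing in this sketch ever reclaims the leaked strings; deciding when an old value is unreachable and freeing it would be the collector's job.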


You don't need Pin for this, or Box, even. String is already a pointer (it points to a str), so you can just do something like this:

use std::collections::HashMap;
use std::mem;

fn main() {
    let mut map: HashMap<u64, String> = HashMap::default();

    map.insert(0, "zero".to_owned());
    map.insert(1, "one".to_owned());
    map.insert(2, "two".to_owned());

    let boxed_value: &mut String = map.get_mut(&0).unwrap(); // This addr should be constant
    println!("box addr = {:p}", boxed_value);

    let string_inside: *mut str = boxed_value.as_mut_str();
    println!("addr before = {:p}", string_inside);
    println!("value before = {}", unsafe { &*string_inside });

    mem::forget(mem::replace(&mut *boxed_value, "zilch".to_owned()));
    //   ^^^^^^ forget the String that was taken out of the map

    println!("addr after = {:p}", string_inside);
    println!("value after = {}", unsafe { &*string_inside });
}

The part that prevents the String from being dropped and invalidating string_inside is calling mem::forget on the value returned from mem::replace -- that, after all, is the String that owns *string_inside, so that's what needs to be forgotten. You can't forget it before mem::replace because it's still part of the map.
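
To make the difference between dropping and forgetting the replaced value visible, here is a small sketch that counts destructor runs. The `Tracked` type, the `DROPS` counter, and both helper functions are made up for illustration; only `mem::replace`, `mem::forget`, and `drop` come from the standard library.

```rust
use std::mem;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have had their destructor run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a heap value owned by the VM.
struct Tracked(String);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Swaps in `new` and drops the old value, as a plain assignment would.
// Returns how many destructors ran during the call.
fn replace_and_drop(slot: &mut Tracked, new: Tracked) -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    drop(mem::replace(slot, new));
    DROPS.load(Ordering::SeqCst) - before
}

// Swaps in `new` and forgets the old value, so its destructor never runs
// and its heap buffer is leaked. Returns how many destructors ran.
fn replace_and_forget(slot: &mut Tracked, new: Tracked) -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    mem::forget(mem::replace(slot, new));
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    let mut slot = Tracked("zero".to_owned());
    let dropped = replace_and_drop(&mut slot, Tracked("one".to_owned()));
    println!("destructors run when dropping: {}", dropped);
    let forgotten = replace_and_forget(&mut slot, Tracked("two".to_owned()));
    println!("destructors run when forgetting: {}", forgotten);
    println!("slot now holds {}", slot.0);
}
```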

Pin is mostly useful for exposing a safe API when you're doing unsafe internally. It doesn't help you with the unsafe part.


Most of the time Pin comes up, it’s to point out things it doesn’t do, so it can be hard to get a coherent understanding of what it’s actually for. I think I’ve got my head wrapped around it now, but I could still have something wrong:

The purpose of Pin is only to allow an object to obtain a stable pointer to itself; if you need a stable pointer to some other object, you can get it from &, &mut, or Box.

Pin<impl Deref<Target = T>> is a contract between T and its owner that it will never be moved again (which is enforced by the compiler). Crucially, any type that doesn’t care about its own memory address implements Unpin, which completely opts out of this contract. If your code needs the location of an Unpin type to not change, you must ensure it yourself by holding one of the reference types above.

As it’s generally not a breaking change for types to get new trait implementations, you should treat all types outside your control as potentially Unpin, even if they don’t implement it right now: it could be added in a point release of even a post-1.0.0 crate, which could trigger UB in any code that assumed otherwise.
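
As a rough illustration of the Unpin opt-out described above: String is Unpin, so safe code can get a plain `&mut` out of a `Pin<&mut String>` and move the value. A type containing `PhantomPinned` is not Unpin, and the same calls would be rejected by the compiler. The names `swap_unpin`, `AddressSensitive`, and `address_of` are invented for this sketch.

```rust
use std::marker::PhantomPinned;
use std::mem;
use std::pin::Pin;

// String is Unpin, so Pin::get_mut is available and the two values can be
// swapped (that is, moved) even though they are nominally "pinned".
fn swap_unpin(a: Pin<&mut String>, b: Pin<&mut String>) {
    mem::swap(a.get_mut(), b.get_mut());
}

// Hypothetical type that does NOT implement Unpin.
struct AddressSensitive {
    value: u32,
    _pin: PhantomPinned,
}

// Returns the address of the value behind the pinned box.
fn address_of(p: &Pin<Box<AddressSensitive>>) -> *const AddressSensitive {
    &**p as *const AddressSensitive
}

fn main() {
    let mut a = "left".to_owned();
    let mut b = "right".to_owned();
    swap_unpin(Pin::new(&mut a), Pin::new(&mut b));
    println!("a = {}, b = {}", a, b);

    let pinned = Box::pin(AddressSensitive { value: 7, _pin: PhantomPinned });
    let before = address_of(&pinned);
    // Moving the Pin<Box<_>> handle moves the pointer, not the pointee.
    let moved_handle = pinned;
    println!("same address: {}", before == address_of(&moved_handle));
    println!("value: {}", moved_handle.value);
    // Neither Pin::new(&mut some_address_sensitive) nor Pin::get_mut would
    // compile for AddressSensitive, because it is not Unpin.
}
```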


The way I like to see it is as a tool to transfer promises through unknown code.

      unknown user code
       Λ             \
      /               V
library 1           library 2

Both libraries have unsafe code, but can only communicate through unknown user code that might do arbitrary weird things. How can library 1 promise library 2 that an object will never again move? The unknown user code probably doesn't have an unsafe block, so the unsafe code in the two libraries must be correct regardless of what crazy things the user code does.

The Pin type allows you to transfer such a promise by having library 1 give a pinned pointer to the user code, which then passes it on to library 2. Since a pinned pointer to a non-Unpin type is opaque without the use of unsafe code, the user code can't do anything to the value behind it.

Some examples of libraries: Library 1 might be the implementation of Box::pin, tokio::pin!, or the inners of tokio::spawn. The most common example of library 2 is the auto-generated code inside an async function, but it also includes custom streams (example) or futures.
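
A small sketch of that hand-off, using only the standard library. Here Box::pin plays library 1, a hand-written future called `Answer` plays library 2, and `forward` plays the user code in the middle. `Answer`, `NoopWake`, and `forward` are all hypothetical names, and the no-op waker exists only so that `poll` can be called without an async runtime.

```rust
use std::future::Future;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// "Library 2": a hand-written future that is not Unpin. Its poll method
// receives Pin<&mut Self>, i.e. the promise that it will not be moved.
struct Answer {
    value: u32,
    _pin: PhantomPinned,
}

impl Future for Answer {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        // Reading through the pinned pointer is always allowed.
        Poll::Ready(self.value)
    }
}

// A waker that does nothing; just enough to build a Context.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// "User code": safe code that cannot move the value out of the pinned
// pointer and can only pass the pointer along.
fn forward<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

fn main() {
    // "Library 1": Box::pin makes the promise that the value stays put.
    let mut pinned = Box::pin(Answer { value: 42, _pin: PhantomPinned });
    println!("{:?}", forward(pinned.as_mut()));
}
```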


This all makes sense, but it seems to be a weaker-than-expected promise: Library 1 can't rely at all on a non-Unpin type in Library 2 remaining non-Unpin in the future. As long as L2 doesn't implement Unpin, it knows that it can rely on nothing surprising happening. If it ever does implement Unpin (because the implementation doesn't care about the address anymore, for example), that unknown user code now has permission to do whatever it wants.

This isn't necessarily a bad thing, but it's worth noting that the guarantee really only goes one way: the owning library can guarantee that it never attempts to move the object, but can't itself rely on downstream code not doing anything unexpected.

Sure, Pin restricts what library 1 can do, and relaxes what library 2 can do. Ultimately all promises boil down to restricting what can happen in one place, so that more things can happen somewhere else.

Implementing Unpin means that the type does not reference itself, so the "surprising" things that can happen are limited to simple mutation (which you could also do with interior mutability, or projection).

If Library 1 only uses the part of the Pin interface that does not require Unpin, it will continue to work exactly as before when Library 2 implements Unpin. Library 1 can't be "surprised" by a type unexpectedly being Unpin.


Note that this "simple mutation" includes things like the user code calling mem::swap on the value.

True, but again, Unpin types cannot be self-referential, so mem::swap in that case is a true swap: you don't get stuff like a referencing b and b referencing a like you could if you weren't using Pin. (If you could, it wouldn't be sound to implement Unpin.)

Another way to think about it: If library 1 makes the promise that it won't move an Unpin type, then library 1 can still rely on that promise, because there's no unknown user code in the way of the promise. After all, a mem::swap doesn't invalidate pointers to the value. You don't need Pin to promise yourself not to move something.

Another thing: Pin allows more than self-referential types. It also allows you to build linked lists on the stack, which are in some sense not self-referential. Tokio's broadcast channel does this.


Unpin isn't an unsafe trait, so it should be OK for L2 to implement if it removes all other unsafe code from the library. If it's a container, though, L1 might be incorrectly relying on it being !Unpin to store a back pointer, making an overall self-referential struct that could be broken by mem::swap.


Edit: Or is it always unsafe to manually implement an auto trait?

If this container you talk about stores the value directly inside itself, then the container is L2, not L1. If the container stores it behind a Box, then sure, it's L1, but you can always promise yourself not to move something, so the container doesn't need Pin to be safe.

Edit: In some sense, if it is stored directly inside itself, it occupies a dual role of both L1 and L2. Someone has to promise the container that it doesn't move, but the container can forward that promise to its field (this is called a pin projection). The stream example I linked earlier does this.

A call to mem::swap does not invalidate pointers to that value. In fact, I just remembered that Pin::set exists, which makes this fact even clearer.

No, it's safe to do so.
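
A short sketch of the Pin::set point made above: even for a type that is not Unpin, safe code may overwrite the value behind a pinned pointer, and the storage stays at the same address. `Slot` and `address_of` are hypothetical names used only for this example.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical type that is not Unpin.
struct Slot {
    label: &'static str,
    _pin: PhantomPinned,
}

// Returns the address of the value behind the pinned box.
fn address_of(p: &Pin<Box<Slot>>) -> *const Slot {
    &**p as *const Slot
}

fn main() {
    let mut pinned = Box::pin(Slot { label: "zero", _pin: PhantomPinned });
    let before = address_of(&pinned);

    // Pin::set drops the old value in place and writes the new one into
    // the same storage, so nothing is moved out of the pinned location.
    pinned.set(Slot { label: "zilch", _pin: PhantomPinned });

    println!("same address: {}", before == address_of(&pinned));
    println!("label: {}", pinned.label);
}
```

Note that the old value is dropped here, so this keeps the address valid but not the old contents.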


You don't need Pin for this, or Box, even. String is already a pointer (it points to a str), so you can just do something like this:

What will happen when the HashMap has to be resized? If there is no room to grow in its current location, it will be reallocated somewhere else (if I understand correctly). This means that the pointer will become invalid. Is that not true?

Resizing the HashMap doesn't reallocate the Strings. The Strings will be moved, but a String is just a pointer (plus length and capacity). Its backing buffer (the str) remains in the same place as long as the String itself doesn't need to reallocate. (This means you have to avoid exposing &mut String in the public API, since that would permit resizing the String, which would invalidate string_inside.)
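
This can be checked with safe code by comparing buffer addresses before and after forcing the map to grow. The function name `buffer_survives_growth` and the figure of 10,000 extra entries are arbitrary choices for this sketch.

```rust
use std::collections::HashMap;

// Returns (buffer address unchanged, map capacity grew).
fn buffer_survives_growth() -> (bool, bool) {
    let mut map: HashMap<u64, String> = HashMap::new();
    map.insert(0, "zero".to_owned());

    // Address of the heap buffer that holds the bytes of "zero".
    let before: *const u8 = map[&0].as_ptr();
    let capacity_before = map.capacity();

    // Insert enough entries that the table has to reallocate. The String
    // structs (pointer, length, capacity) are moved to the new table.
    for key in 1..10_000u64 {
        map.insert(key, key.to_string());
    }

    // The String for key 0 now lives elsewhere in the table, but it still
    // points at the same heap buffer.
    let after: *const u8 = map[&0].as_ptr();
    (before == after, map.capacity() > capacity_before)
}

fn main() {
    let (same_buffer, grew) = buffer_survives_growth();
    println!("map grew: {}, buffer address unchanged: {}", grew, same_buffer);
}
```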

If I understand your implementation, you are not taking a pointer to the buffer (the str in this case); you are taking a pointer to the String (which is a struct of length, capacity, and pointer to buffer). While the buffer should never move, the String allocation will move. So we can't hold a pointer to the String and assume it's valid. Am I misunderstanding this?

In my example, boxed_value: &mut String is a reference to the String inside the HashMap, which, after the mem::replace, now contains "zilch".

On the other hand, string_inside: *mut str refers to the str, which has a static address, and therefore still contains "zero" even after boxed_value is replaced.

Neither of these pointers is invalidated in the example. boxed_value remains valid, although its contents have been replaced with a different String. If you tried to make the HashMap reallocate, boxed_value would become invalid, but Rust's normal borrowing rules would prevent you from using it. string_inside remains valid because its referent has been leaked by calling mem::forget.

If you need a String with a static address, you could use Box<String>. I can't think of any reason to double up the Box, though. More pointers, more problems.
