TrashPool to retain temporaries past the end of a scope

Hello Everyone,

This is my implementation for TrashPool, which is an object into which you can shove ownership of any temporary objects you want, and hang onto a reference, and they will live for the life of the TrashPool. It is conceptually along the same vein as ObjectiveC's (and Swift's and I'm sure other languages') AutoReleasePool.

I made it to specifically address the ugliness here: Help improving ergonomics

The unsafe coercion effectively destroys any knowledge of the internal references, which I believe is ok in this case because nothing can leave the TrashPool. It's my first foray into unsafe code though, so I may have missed something important. I did notice that dyn object references were 128 bits under the hood... so I'm basically relying on the first 64 bits of that being an ordinary reference. I figure, as long as the code that is putting the ownership into the TrashPool is safe, the right thing will happen... But I might be missing something big.

If the concept is sound, improvements could be made to improve allocation efficiency, etc.

Anyway, here's the implementation:

impl<T: ?Sized> Trashable for T {}
pub trait Trashable {}

pub struct TrashPool<'a> {
    trash_vec : Vec<Box<dyn Trashable + 'a>>
}

impl <'a>TrashPool<'a> {
    pub fn new() -> TrashPool<'a> {
        TrashPool{trash_vec : vec![]}
    }
    pub fn dump<T:Trashable + 'a>(&'_ mut self, input : T) -> &'a T {

        self.trash_vec.push(Box::new(input));
        let trash_ref = &**self.trash_vec.last().unwrap();

        unsafe { &*(trash_ref as *const dyn Trashable as *const T) }
    }   
}

And here is an example usage:

use std::cell::{RefCell};

pub struct Container {
    data : RefCell<RefCell<Vec<u32>>>
}

impl Container {
    pub fn borrow_iterator<'b, 'a : 'b>(&'a self, trash_pool : &'b mut TrashPool<'a>) -> impl Iterator<Item=&'b u32> {
        let inner_cell_ref = trash_pool.dump(self.data.borrow());
        let vec_ref = trash_pool.dump(inner_cell_ref.borrow());
        vec_ref.iter()
    }
}

fn main() {
    let container = Container{data : RefCell::new(RefCell::new(vec![1u32; 2]))};

    let mut trash_pool = TrashPool::new();
    let iter = container.borrow_iterator(&mut trash_pool);
    
    for value in iter {
        println!("{}", value);
    }
}

Maybe I'm heavily under the influence of the Ikea effect (IKEA effect - Wikipedia), but I feel like this is a reasonably nice solution to a whole class of problems that have been plaguing my use of Rust from the get-go. I understand it will always be better to structure your functions not to require a trash pool, but at least all the ugliness is contained in situations where there is no choice.

Thanks everyone for all your help and hand-holding over the past few days.

2 Likes

Sorry, this is unsound. The <'a> lifetime can be chosen by the caller, so if I choose it to be static, I can transmute lifetimes to 'static.

fn main() {
    let mut pool: TrashPool<'static> = TrashPool::new();
    
    let s = "abc".to_string();
    let s = pool.dump(s);
    
    drop(pool);
    
    // printing deallocated memory
    println!("{}", s);
}

playground

3 Likes

If we can fix that soundness issue, the idea is nice! It's like a mini multi-type arena allocator with a nice concise implementation.


We can't rely on the fact that the first 64 bytes of a dyn Trait pointer are the pointer, but we can definitely rely on the fact that there is a pointer in there, and that *const dyn Trait as *const ConcreteType extracts it.

2 Likes

I don't understand why you would need unsafe or interior mutability, etc. for this. If you want a bunch of objects to survive together, and hand out references, then why don't you just put them in a standard collection type that supports getting its values by reference (pretty much any sane collection should allow this)?

let vec = vec![1, 2, 3]; // 1, 2, and 3 survive too as long as `vec` survives

// hand out references to the objects
let ref_1 = &vec[0];
let ref_3 = &vec[2];

(And if you need simultaneous mutable access to several elements, that can be wrapped too, with clever use of <[T]>::split_at_mut() and the like.)

1 Like

Damn! Thanks a lot for looking at it.

I could bound the lifetimes like this:

impl <'b, 'a : 'b>TrashPool<'a> {
    pub fn dump<T:Trashable + 'a>(&'b mut self, input : T) -> &'b T {

but then I can't take a reference to something that's already been dumped, do something with the reference, and then dump the next intermediate that gets created with the new lifetime, into the same TrashPool. With this change, it really isn't any more useful than just a vanilla collection. (Aside from automatic type indifference because of the coercion, but there are already plenty of ways to handle that)

One possible fix is to use a separate lifetime for each dumped object, but the syntax gets insane quickly, and then it's not much better than requiring multiple separate TrashBag<T>s like I did in the straw-man case of this thread: (Help improving ergonomics)

What would be awesome is if the language had a capability for dynamically created lifetimes (still at compile time). Something along the lines of

<'mylifetime(x), 'a : 'mylifetime(all)>
&'mylifetime(x) MyType

where 'mylifetime(x) could refer to a single lifetime in a set of lifetimes where new lifetimes are created as needed, and 'mylifetime(all) could be the most bounded of all lifetimes in that set. Essentially 'mylifetime(0) : 'mylifetime(1), 'mylifetime(1) : 'mylifetime(2), ... , 'mylifetime(n) : 'mylifetime(all)

Perhaps it would be possible to do this with macros. Also this would fall apart if the calling code tried to dump anything inside a loop. Is there a better way?

Thanks everyone.

This feels a lot like the StringPool I wrote to help work with ASTs which borrow from the source text it came from (e.g. so you don't make unnecessary copies when storing identifiers).

The problem I had is that you'll dynamically load source code from disk and then parse them into memory. My load_from_disk() function needed a way to load text and then "extend" the lifetime of the loaded text so we can return the parsed data without running into use-after-move (e.g. the returned value references a local variable) or self-referencing structs.

Here's the StringPool:

use std::{cell::RefCell, collections::HashSet};

/// A simple string pool.
///
/// The typical use case for a [`StringPool`] is when dealing with values that
/// borrow from their original text instead of making their own copies.
///
/// By placing the source text into the string pool and deferring its cleanup
/// until the [`StringPool`] is destroyed, you can avoid annoying lifetime
/// issues or self-referential structs.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct StringPool(RefCell<HashSet<Box<str>>>);

impl StringPool {
    pub fn empty() -> Self { StringPool::default() }

    /// Adds the text to the string pool, returning a reference which will live
    // as long as the [`StringPool`] itself.
    pub fn intern<'pool>(&'pool self, text: &str) -> &'pool str {
        let mut pool = self.0.borrow_mut();

        let interned_string: &str = match pool.get(text) {
            Some(existing_value) => &existing_value,
            _ => {
                let boxed_copy: Box<str> = text.into();
                pool.insert(boxed_copy);
                &pool.get(text).unwrap()
            },
        };

        // SAFETY: by construction, it is safe to expand the string's
        // lifetime to that of the StringPool.
        //
        // While the Box may move around when our hash set gets resized, the
        // bytes making up the string will stay in the same place somewhere
        // on the heap.
        //
        // Additionally, once a string is added to the pool it can never be
        // removed.
        //
        // This means any &'pool pointers returned from this function will
        // be valid until the StringPool is dropped.
        unsafe {
            return std::mem::transmute(interned_string);
        }
    }
}

And how it gets used:

use crate::{StringPool, TestCase};
use anyhow::{Context, Error};
use glob::Pattern;
use std::path::{Path, PathBuf};

pub fn load_from_disk<'s>(
    test_root: &Path,
    string_pool: &'s StringPool,
) -> Result<Vec<TestCase<'s>>, Error> {
    log::debug!("Loading test fixtures from \"{}\"", test_root.display());

    let mut test_cases = Vec::new();
    let candidate_pattern = Pattern::new("*.input.ftl")?;

    for entry in test_root.read_dir()? {
        let entry = entry?;
        let path = entry.path();

        if candidate_pattern.matches_path(&path) {
            let tc = load_test_case(path, test_root, string_pool)?;
            test_cases.push(tc);
        }
    }

    Ok(test_cases)
}

fn load_test_case<'s>(
    input_file: PathBuf,
    test_root: &Path,
    string_pool: &'s StringPool,
) -> Result<TestCase<'s>, Error> {
    let name = file_name(&input_file).ok_or_else(|| {
        Error::msg(format!(
            "Unable to get the filename for \"{}\"",
            input_file.display()
        ))
    })?;

    let input = std::fs::read_to_string(&input_file).with_context(|| {
        format!("Unable to read \"{}\"", input_file.display())
    })?;
    let input = string_pool.intern(&input);

    let fixture = fluent_syntax::parser::parse(input).unwrap();
    let output_file_name = test_root.join(&name).with_extension("output.ftl");

    let expected_output = std::fs::read_to_string(&output_file_name)
        .with_context(|| {
            format!("Unable to read \"{}\"", input_file.display())
        })?;

    Ok(TestCase {
        input_file,
        name,
        fixture,
        expected_output,
    })
}

fn file_name(path: &Path) -> Option<String> {
    let stem = path.file_stem()?.to_str()?;

    // we only want the text up to the first dot
    let first_bit = match stem.find(".") {
        Some(ix) => &stem[..ix],
        None => stem,
    };

    Some(first_bit.to_string())
}
3 Likes

If you want multiple dumped items, I guess you would have to use interior mutability and take self by &self.

2 Likes

I feel like I'm missing something, but I don't really see how that would help? I mean, two different lifetimes which cover the same or similar code area don't behave differently from using the same lifetime twice (or calling the same function with two different lifetimes).

Going off of this suggestion, what about something like this? It uses UnsafeCell for interior mutability since I believe that an be made fully sound, but it could also be done more safely with a RefCell:

use std::cell::UnsafeCell;

impl<T: ?Sized> Trashable for T {}
pub trait Trashable {}

#[derive(Default)]
pub struct TrashPool<'a> {
    trash_vec: UnsafeCell<Vec<Box<dyn Trashable + 'a>>>,
}

impl<'a> TrashPool<'a> {
    pub fn new() -> TrashPool<'a> {
        Self::default()
    }
    pub fn dump<'b, T>(&'b self, input: T) -> &'b T
    where
        T: Trashable + 'a,
    {
        let boxed = Box::new(input) as Box<dyn Trashable + 'a>;
        
        // SAFTEY: this is the only function where we access trash_vec, and it
        // cannot be called recursively as there are no calls into user code from
        // this point on.
        let trash = unsafe { &mut *self.trash_vec.get() };
        trash.push(boxed);
        let trash_ref: &dyn Trashable = &**trash.last().unwrap();

        // SAFETY: we know this is a T as we just stuck it in the vec.
        // we know we can extend the lifetime from (within this method) to
        // 'b because we never remove items from the trash_vec, so the only
        // time it's dropped is when we are dropped.
        unsafe { &*(trash_ref as *const _ as *const T) }
    }
}
1 Like

I think it is ok, but it could not be done with a RefCell. That said, I think you will find that its lifetimes does not allow it to be used like you intended it.

1 Like

I think there might be a problem in the code I posted: inside dump, we get a mutable reference to self.trash_vec while immutable references to internal elements exist. I think the mutable reference to the Vec covers all the elements inside, so this is UB?

If this is indeed a problem we could fix it by using Box::into_raw, storing *mut pointers inside the Vec instead, and manually implementing Drop to drop them, I think...

First off I am very appreciative for all the help and advice.

Sadly, using @daboross's version, (regardless of possible internal UB) calling it with this:

pub fn borrow_iterator<'b, 'a : 'b>(&'a self, trash_pool : &'b TrashPool<'a>) -> impl Iterator<Item=&'b u32> {
    let inner_cell_ref = trash_pool.dump(self.data.borrow());
    let vec_ref = trash_pool.dump(inner_cell_ref.borrow());
    vec_ref.iter()
}

I get the same error:

143 |         let vec_ref = trash_pool.dump(inner_cell_ref.borrow());
    |                                  ^^^^ ...but data from `trash_pool` flows into `self` here

I'm still trying to understand how exactly using an &str instead of a generic is working for @Michael-F-Bryan, but I believe the crux is that his pool never takes ownership of the object. I made something work with &T as an argument to dump(), but that left me back where I started with a borrow from a temporary needed in order to call dump() in the first place. :frowning:

The Vec can grow while its elements are borrowed. I don't think this data structure can be made sound without reserving capacity up-front and disallowing the Vec from growing beyond that capacity.

1 Like

Isn't that a non-problem when you box individual elements and only give out references to the data inside each box? I'm pretty sure that's why they aren't directly stored in the Vec.

1 Like

Yes, you're right! I wasn't thinking through it clearly. The boxing appeared to be used only for dynamic dispatch, to store heterogeneous types. But it also acts as indirection for the elements.

Odd. Actually, this works too with an unmodified version of @daboross's function prototype:

let temp_str = trash_pool.dump("1234567890");
let substring = trash_pool.dump(&temp_str[2..5]);
println!("{}", substring);

I wonder if that's because the compiler is being extra clever and copying something so we don't have a dependent reference... Because doing it with a Ref is no good. Or if the fact that we have a 'static lifetime on the "1234567890" &str is what's making everything ok.

In your earlier function, I feel like the lifetime bounds are requiring something that you don't actually need to require. I always get confused about what they mean, so I don't actually know what's wrong, but taking out the relationship between 'a and 'b seems to allow it to compile:

(playground: Rust Playground)

Edit: cancel that, seems I misread the playground output or something... Looked like it had compiled successfully but I think I must not have actually recompiled it.

I think the 'static probably helps here. This similar example with an owned string fails to compile:

let trash = TrashPool::default();
let s: &str = trash.dump(String::from("hi!"));
let s2: &&str = trash.dump(s);
error[E0597]: `trash` does not live long enough
  --> src/main.rs:52:19
   |
52 |     let s: &str = trash.dump(String::from("hi!"));
   |                   ^^^^^ borrowed value does not live long enough
53 |     let s2: &&str = trash.dump(s);
54 | }
   | -
   | |
   | `trash` dropped here while still borrowed

This is incorrect.

The reason my StringPool works and is sound is because I've actually got two levels of indirection. First there's a HashSet as the top-level container, but because things can move when the HashSet resizes I need to have the string itself allocated on the heap as a Box<str>.

That means when you add more items and the set resizes, you're just shuffling a bunch of pointers around and it's sound to take a reference to the string on the heap and extend its lifetime to that of the StringPool.

1 Like

I see. I failed to appreciate that let boxed_copy: Box<str> = text.into(); was indeed making a copy. Even though you plainly wrote it in the variable's name. :man_facepalming:

You wrote a very cool bit of code for solving the use case you outlined of avoiding tons of copies of the same string being created. :smiley: Thanks for sharing it!

In my use case, some of the objects I want to hand off are actually Ref<T>s and other reference-containing types so they can't be made self-contained within the pool. Objects inside the pool will always need to reference things outside the pool, so the returned ref needs to be bounded by the lifetime of the input in addition to being bounded by the lifetime of the pool.

I was able to generalize your code to a generic <T>, but only if T had a 'static lifetime. As soon as T's lifetime was bounded, I couldn't get the addition of dependent objects to work anymore.

For my use case to be possible, I think I need to bound the lifetime of an individual object in the pool, without imputing the same lifetime boundary to the lifetime of the whole pool. I don't think Rust's lifetime syntax has a way to do this.

I'm trying some experiments with Weak<T> because that essentially punts the ownership checking to runtime.

Thanks again for all your help, @Michael-F-Bryan, @alice & @daboross!

Dumb question:

Is there a way to say 'lifetime_c is bounded by the shorter of 'lifetime_a and 'lifetime_b without saying anything about the relative bounds of 'lifetime_a or 'lifetime_b? I haven't seen anything like this in the documentation or other people's code, but perhaps I'm not using the right search terms or looking in the right place.

Thanks!