How to cache a vector's capacity?

Here is a situation using this: Rust Playground

The problem is: I use that vector as a buffer, clean it up after use. But rustc doesn't know the vector is empty when moving it somewhere, it requires me to give a lifetime annotation, but it is actually unnecessary. Is that a crate has achieved this, or how to achieve it(I know this must need unsafe code)? Thanks.

You only need the Vec inside print_short_name, so declare it inside that function and remove the field.

    fn print_short_name(&self, times: usize) {
        let mut cache_vec: Vec<&str> = vec![];
        // we needs this vector as a buffer to temporarily 
        // store `&str`s, and removes them at the end.
        self.node.find_short_in(&mut cache_vec);
        for _ in 0..times {
            for string in cache_vec.iter() {
                println!("{}", string);
            }
        }
    }
2 Likes

But if I call this function very frequently, it may cause many allocate operations. I want to do some optimization here. sorry I forgot to say this before.

You are not going to be able to put short-lived references in a long-lived type. As far as the type system is concerned, that would create a dangling reference.

Just supply the cache as a separate argument and let the compiler reason about local variables separately.

4 Likes

You can also define an Iterator for Node that can yield the values one-by-one, so you don't need to allocate anything.

1 Like

Yes, but the workaround is the same to @jendrikw 's: if the main(assume it is not a main but a common function) be called frequently, the frequent allocation appears again.

In real case, the struct like ShortNamePrinter is deeply nested, giving a mutable reference of a buffer vector will greatly increase the complexity of the api. To do so, I have to add a &mut Vec<...> in each function's arguments. I hope there is a way to keep its allocation in memory.

As I commented in the code, calling function find_short_in is time-consuming. To simplify the example, I wrote finding strings that is short. find_short_in is called as many times as the iterators are traversed. I need a vector to decrease the overhead.

Disregarding of whether this is a good practice or not, I wonder if this would work:

fn reuse_vec<'a, 'b, T: ?Sized>(mut vec: Vec<&'a T>) -> Vec<&'b T> {
    vec.clear();
    // SAFETY: `vec` is an empty `Vec` and transmutation only changes lifetime.
    unsafe {
        std::mem::transmute::<Vec<&'a T>, Vec<&'b T>>(vec)
    }
}

fn main() {
    let mut reused = {
        let hello = "Hello World".to_string();
        let v: Vec<&str> = vec![&hello];
        println!("v = {v:?}");
        reuse_vec(v)
    };
    {
        let again = "Hi Again".to_string();
        reused.push(&again);
        println!("reused = {reused:?}");
    }
}

(Playground)

Output:

v = ["Hello World"]
reused = ["Hi Again"]


Or even more daring:

Edit: The example below is likely unsound see this response below.

use std::alloc::Layout;

fn reuse_vec<T, U>(mut vec: Vec<T>) -> Vec<U> {
    vec.clear();
    // Can the following assertion be checked at compile-time?
    assert_eq!(Layout::new::<T>(), Layout::new::<U>());
    // SAFETY: `vec` is an empty `Vec` and the
    // elements' memory layout doesn't change.
    unsafe {
        std::mem::transmute::<Vec<T>, Vec<U>>(vec)
    }
}

fn main() {
    let mut reused: Vec<&str> = { // note we need a type annotation here!
        /* … */
    };
    /* … */
}

(Playground)

This second example will fail to detect bad conversions at compile-time and throw a runtime error, which isn't nice. You'll see that if you remove the type annotation in the first line of main.

1 Like

Please don't transmute stuff behind indirection – this includes pointers and pointer-containing collections. If you need to "cheat" with the lifetimes, do it at the point where it is required. (Your transmute-wrapping fn lacking unsafe also makes it unsound, btw.)

The above example could be better implemented by storing a Vec<&'static str> and then using it for storing temporary elements, then clearing it. Needless to say, it's also dangerous and bad practice.

2 Likes

You don't need any unsafe code to reuse the buffer. There's a feature in the standard library that can be exploited for this purpose: Vec iteration is specialized so that if you consume a Vec and produce a Vec, the allocation is reused if possible, even if the type is different.

Thus, you can change the lifetime to 'a by a dummy map(), as long as the len() of the vector is 0 when you do it:

struct ShortNamePrinter {
    cache_vec: Vec<&'static str>,
    node: Node,
}

...

let mut results = mem::take(&mut self.cache_vec)
    .into_iter()
    .map(|_| -> &str { unreachable!() })
    .collect();

Then put it back in the cache the same way:

results.clear();
self.cache_vec = results
    .into_iter()
    .map(|_| -> &'static str { unreachable!() })
    .collect();

In this playground program I've modified the original code to use this technique. Debug prints show that the allocation is reused after it is created the first time:

[src/main.rs:28] self.cache_vec.as_ptr() = 0x0000000000000008
[src/main.rs:28] self.cache_vec.as_ptr() = 0x000055d9d4837a60
[src/main.rs:28] self.cache_vec.as_ptr() = 0x000055d9d4837a60
[src/main.rs:28] self.cache_vec.as_ptr() = 0x000055d9d4837a60
[src/main.rs:28] self.cache_vec.as_ptr() = 0x000055d9d4837a60
13 Likes

An amazing workaround! But will the capacity be kept?

The capacity is just a number that describes the allocation. If the allocation is kept, as it is, then the capacity must be kept.

1 Like

Yes, that is "reusing the allocation".

Thanks

:melting_face: I know why this works but it still feel like abuse dark magic.

I think the magician is rustc optimizer, it's hard to look into the optimizer. :smile:

Thanks for explaining. But if the generic type T Vec<T> has is opaque, we don't know if T is a reference or not, how to deal with it?
I modified Vec<&'static str> to Vec<u8>, the debug info said the pointer changed to 0x1, which declares the allocation is not reused.

Thanks, your workaround inspired me. I made a new workaround and passed Miri: Rust Playground

What does "behind indirection" mean? Because the elements of the Vec are behind a pointer (that points to the heap)?

I can't see why my first example should be unsound. Afterall, it doesn't change types but only lifetimes and these are existing only during compile-time (to reject programs).

What do you mean? Do the transmutation of each element in the Vec? That sounds more dangerous to me.

What exactly is unsound and why?

If that example you gave works, then it's certainly favorable to anything that uses unsafe, but I feel like it's not very obvious that .into_iter().map(…).collect() preserves the allocation. (I believe it does, but it's still a bit mysterious to me.)

I feel like a lifetime transmutation (apart from being unsafe) is more clear in regard to what happens (but happy if I'm proven wrong or if someone can explain to me what's unsound about my code example).

Relying on the optimizer, in contrast, seems to be more unreliable to me.

I didn't say it was unsound. But transmuting pointers is dangerous, because if transmute<T, U> is valid, then it does not imply that transmute<&T, &U> or transmute<*T, *U> is valid, or vice versa.

Yes.

reuse_vec is unsound, because it is not unsafe, but the annotated lifetimes are unconstrained, so you can use this function to create dangling pointers in safe code.