How to cache a vector's capacity?

Fancyflame · May 27, 2023, 9:38am

Here is a situation using this: Rust Playground

The problem is: I use that vector as a buffer, clean it up after use. But rustc doesn't know the vector is empty when moving it somewhere, it requires me to give a lifetime annotation, but it is actually unnecessary. Is that a crate has achieved this, or how to achieve it(I know this must need unsafe code)? Thanks.

jendrikw · May 27, 2023, 9:44am

You only need the Vec inside print_short_name, so declare it inside that function and remove the field.

    fn print_short_name(&self, times: usize) {
        let mut cache_vec: Vec<&str> = vec![];
        // we needs this vector as a buffer to temporarily 
        // store `&str`s, and removes them at the end.
        self.node.find_short_in(&mut cache_vec);
        for _ in 0..times {
            for string in cache_vec.iter() {
                println!("{}", string);
            }
        }
    }

Fancyflame · May 27, 2023, 10:26am

But if I call this function very frequently, it may cause many allocate operations. I want to do some optimization here. sorry I forgot to say this before.

H2CO3 · May 27, 2023, 10:37am

You are not going to be able to put short-lived references in a long-lived type. As far as the type system is concerned, that would create a dangling reference.

Just supply the cache as a separate argument and let the compiler reason about local variables separately.

jendrikw · May 27, 2023, 10:47am

You can also define an Iterator for Node that can yield the values one-by-one, so you don't need to allocate anything.

Fancyflame · May 27, 2023, 11:06am

Yes, but the workaround is the same to @jendrikw 's: if the main(assume it is not a main but a common function) be called frequently, the frequent allocation appears again.

In real case, the struct like ShortNamePrinter is deeply nested, giving a mutable reference of a buffer vector will greatly increase the complexity of the api. To do so, I have to add a &mut Vec<...> in each function's arguments. I hope there is a way to keep its allocation in memory.

Fancyflame · May 27, 2023, 11:13am

As I commented in the code, calling function find_short_in is time-consuming. To simplify the example, I wrote finding strings that is short. find_short_in is called as many times as the iterators are traversed. I need a vector to decrease the overhead.

jbe · May 27, 2023, 1:51pm

Disregarding of whether this is a good practice or not, I wonder if this would work:

fn reuse_vec<'a, 'b, T: ?Sized>(mut vec: Vec<&'a T>) -> Vec<&'b T> {
    vec.clear();
    // SAFETY: `vec` is an empty `Vec` and transmutation only changes lifetime.
    unsafe {
        std::mem::transmute::<Vec<&'a T>, Vec<&'b T>>(vec)
    }
}

fn main() {
    let mut reused = {
        let hello = "Hello World".to_string();
        let v: Vec<&str> = vec![&hello];
        println!("v = {v:?}");
        reuse_vec(v)
    };
    {
        let again = "Hi Again".to_string();
        reused.push(&again);
        println!("reused = {reused:?}");
    }
}

(Playground)

Output:

v = ["Hello World"]
reused = ["Hi Again"]

Or even more daring:

Edit: The example below is likely unsound see this response below.

use std::alloc::Layout;

fn reuse_vec<T, U>(mut vec: Vec<T>) -> Vec<U> {
    vec.clear();
    // Can the following assertion be checked at compile-time?
    assert_eq!(Layout::new::<T>(), Layout::new::<U>());
    // SAFETY: `vec` is an empty `Vec` and the
    // elements' memory layout doesn't change.
    unsafe {
        std::mem::transmute::<Vec<T>, Vec<U>>(vec)
    }
}

fn main() {
    let mut reused: Vec<&str> = { // note we need a type annotation here!
        /* … */
    };
    /* … */
}

(Playground)

This second example will fail to detect bad conversions at compile-time and throw a runtime error, which isn't nice. You'll see that if you remove the type annotation in the first line of main.

H2CO3 · May 27, 2023, 3:18pm

Please don't transmute stuff behind indirection – this includes pointers and pointer-containing collections. If you need to "cheat" with the lifetimes, do it at the point where it is required. (Your transmute-wrapping fn lacking unsafe also makes it unsound, btw.)

The above example could be better implemented by storing a Vec<&'static str> and then using it for storing temporary elements, then clearing it. Needless to say, it's also dangerous and bad practice.

kpreid · May 27, 2023, 3:35pm

You don't need any unsafe code to reuse the buffer. There's a feature in the standard library that can be exploited for this purpose: Vec iteration is specialized so that if you consume a Vec and produce a Vec, the allocation is reused if possible, even if the type is different.

Thus, you can change the lifetime to 'a by a dummy map(), as long as the len() of the vector is 0 when you do it:

struct ShortNamePrinter {
    cache_vec: Vec<&'static str>,
    node: Node,
}

...

let mut results = mem::take(&mut self.cache_vec)
    .into_iter()
    .map(|_| -> &str { unreachable!() })
    .collect();

Then put it back in the cache the same way:

results.clear();
self.cache_vec = results
    .into_iter()
    .map(|_| -> &'static str { unreachable!() })
    .collect();

In this playground program I've modified the original code to use this technique. Debug prints show that the allocation is reused after it is created the first time:

[src/main.rs:28] self.cache_vec.as_ptr() = 0x0000000000000008
[src/main.rs:28] self.cache_vec.as_ptr() = 0x000055d9d4837a60
[src/main.rs:28] self.cache_vec.as_ptr() = 0x000055d9d4837a60
[src/main.rs:28] self.cache_vec.as_ptr() = 0x000055d9d4837a60
[src/main.rs:28] self.cache_vec.as_ptr() = 0x000055d9d4837a60

Fancyflame · May 27, 2023, 3:41pm

An amazing workaround! But will the capacity be kept?

kpreid · May 27, 2023, 3:48pm

The capacity is just a number that describes the allocation. If the allocation is kept, as it is, then the capacity must be kept.

H2CO3 · May 27, 2023, 3:48pm

Yes, that is "reusing the allocation".

Fancyflame · May 27, 2023, 3:55pm

Thanks

zirconium-n · May 27, 2023, 3:59pm

I know why this works but it still feel like ~~abuse~~ dark magic.

Fancyflame · May 27, 2023, 4:03pm

I think the magician is rustc optimizer, it's hard to look into the optimizer.

Fancyflame · May 27, 2023, 4:09pm

Thanks for explaining. But if the generic type T Vec<T> has is opaque, we don't know if T is a reference or not, how to deal with it?
I modified Vec<&'static str> to Vec<u8>, the debug info said the pointer changed to 0x1, which declares the allocation is not reused.

Fancyflame · May 27, 2023, 4:11pm

Thanks, your workaround inspired me. I made a new workaround and passed Miri: Rust Playground

jbe · May 27, 2023, 4:12pm

What does "behind indirection" mean? Because the elements of the Vec are behind a pointer (that points to the heap)?

I can't see why my first example should be unsound. Afterall, it doesn't change types but only lifetimes and these are existing only during compile-time (to reject programs).

What do you mean? Do the transmutation of each element in the Vec? That sounds more dangerous to me.

What exactly is unsound and why?

If that example you gave works, then it's certainly favorable to anything that uses unsafe, but I feel like it's not very obvious that .into_iter().map(…).collect() preserves the allocation. (I believe it does, but it's still a bit mysterious to me.)

I feel like a lifetime transmutation (apart from being unsafe) is more clear in regard to what happens (but happy if I'm proven wrong or if someone can explain to me what's unsound about my code example).

Relying on the optimizer, in contrast, seems to be more unreliable to me.

H2CO3 · May 27, 2023, 4:15pm

I didn't say it was unsound. But transmuting pointers is dangerous, because if transmute<T, U> is valid, then it does not imply that transmute<&T, &U> or transmute<*T, *U> is valid, or vice versa.

Yes.

reuse_vec is unsound, because it is not unsafe, but the annotated lifetimes are unconstrained, so you can use this function to create dangling pointers in safe code.

Topic		Replies	Views
Pattern: how to reuse a `Vec<&str>` across loop iterations? help	13	2884	September 26, 2021
Another lifetime problem help	6	220	June 11, 2024
What is the best way to cache method return value in struct field? help	8	2699	July 7, 2021
What's the most efficient way to make a "small" vector?	7	1743	January 12, 2023
How to completely overwrite an Option mut Vec? help	6	701	February 6, 2022

How to cache a vector's capacity?

Related topics