Pattern: how to reuse a `Vec<&str>` across loop iterations?

duck_tape · June 28, 2021, 4:03pm

A previous question of mine, "Review of unsafe usage", was only partially answered. I've reduced the problem a bit more and really feel like there is a pattern for doing this that I'm missing.

I don't want to re-allocate the vec each loop. The answer in the link above is good, but really constrains things like propagating errors and storing the cache on a struct etc.

Is there a simpler way?

use std::{
    error::Error,
    io::{BufRead, BufReader},
};

use regex::Regex;

const DATA: &[u8] = b"
This is a line
this is a second line
";

// Does not compile
// How do I reuse `cache` in a loop and tell the compiler that it is in fact safe
// because I've cleared all references to buffer?
fn main() -> Result<(), Box<dyn Error>> {
    let r = Regex::new(r"\s+").unwrap();
    let mut reader = BufReader::new(DATA);
    let mut buffer = String::new();

    let mut cache: Vec<&str> = vec![];
    loop {
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }

        // Drop the newline
        buffer.pop();

        for value in r.split(&buffer) {
            cache.push(value);
        }

        // Do stuff with cache and clear it
        for value in cache.drain(..) {
            println!("{}", value);
        }

        buffer.clear();
    }

    Ok(())
}

(Playground)

Yandros · June 28, 2021, 4:32pm

Hmm, yeah, one of the limitations of lifetimes in the type system is that it is almost impossible to make Rust understand that an empty collection such as Option<…'lt> or Vec<…'lt> does not depend on 'lt.

Here is an ad-hoc solution:

-   let mut cache: Vec<&str> = vec![];
+   let mut cached_capacity: Vec<&str> = vec![];
    loop {
+       let mut cache: Vec<&str> = cached_capacity;
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }

        // Drop the newline
        buffer.pop();

        for value in r.split(&buffer) {
            cache.push(value);
        }

        // Do stuff with cache and clear it
        for value in cache.drain(..) {
            println!("{}", value);
        }
+       debug_assert!(cache.is_empty()); // guaranteed by `.drain(..)`
+       cached_capacity = unsafe {
+           // Safety: layout of types which only differ in lifetimes is the same
+           ::core::mem::transmute(cache)
+       };

        buffer.clear();
    }

alice · June 28, 2021, 4:36pm

The following utility function could be used for that:

// `T: ?Sized` lets this accept unsized referents such as `str` (i.e. `Vec<&str>`)
fn reset_vector<'a, 'b, T: ?Sized>(mut vec: Vec<&'a T>) -> Vec<&'b T> {
    let cap = vec.capacity();
    let ptr = vec.as_mut_ptr();
    
    std::mem::forget(vec);
    
    unsafe {
        Vec::from_raw_parts(ptr.cast(), 0, cap)
    }
}
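
For illustration, here is a minimal sketch of how such a function might slot into a loop like the one in the question. It assumes a `T: ?Sized` bound so that `Vec<&str>` is accepted, swaps the regex split for `split_whitespace` to stay dependency-free, and uses a hypothetical `count_words` helper that is not part of the thread.

```rust
// Variant of `reset_vector` from the post above; the `?Sized` bound is an
// assumption added here so that `T = str` is accepted.
fn reset_vector<'a, 'b, T: ?Sized>(mut vec: Vec<&'a T>) -> Vec<&'b T> {
    let cap = vec.capacity();
    let ptr = vec.as_mut_ptr();
    std::mem::forget(vec);
    // Safety: `&'a T` and `&'b T` have the same layout, and the length is 0,
    // so no reference with the old lifetime can be read through the new Vec.
    unsafe { Vec::from_raw_parts(ptr.cast(), 0, cap) }
}

// Hypothetical helper: counts whitespace-separated words per line while
// reusing one allocation for the per-line word list.
fn count_words(input: &str) -> Vec<usize> {
    let mut counts = Vec::new();
    let mut buffer = String::new();
    let mut spare: Vec<&'static str> = Vec::new();
    for line in input.lines() {
        buffer.clear();
        buffer.push_str(line);
        // Take the spare allocation under a lifetime tied to `buffer`.
        let mut cache: Vec<&str> = reset_vector(spare);
        cache.extend(buffer.split_whitespace());
        counts.push(cache.len());
        // Hand the allocation back; no borrow of `buffer` survives this call.
        spare = reset_vector(cache);
    }
    counts
}
```

Because `reset_vector` always returns a vector of length 0, the borrow of `buffer` ends when `cache` is passed back, which is what lets `buffer.clear()` compile on the next iteration.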

SkiFire13 · June 28, 2021, 4:37pm

Here's a weird hack that doesn't use unsafe but instead relies on inplace collection:

use std::{
    error::Error,
    io::{BufRead, BufReader},
};

use regex::Regex;

const DATA: &[u8] = b"
This is a line
this is a second line
";

fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}

// Compiles: `reuse_vec` hands the emptied allocation back under a new lifetime,
// so no reference into `buffer` outlives an iteration.
fn main() -> Result<(), Box<dyn Error>> {
    let r = Regex::new(r"\s+").unwrap();
    let mut reader = BufReader::new(DATA);
    let mut buffer = String::new();

    let mut outer_cache: Vec<&'static str> = vec![];
    loop {
        let mut cache = reuse_vec(outer_cache);
        
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }

        // Drop the newline
        buffer.pop();

        for value in r.split(&buffer) {
            cache.push(value);
        }

        // Do stuff with cache and clear it
        for value in cache.drain(..) {
            println!("{}", value);
        }
        outer_cache = reuse_vec(cache);

        buffer.clear();
    }

    Ok(())
}

Edit: you can see how it optimizes when T and U are references to strs on Compiler Explorer.

duck_tape · June 28, 2021, 4:52pm

Thanks all for the help!

Here's what I ended up with after adding back in a bit of the real life complexity. I don't love that it involves a second pass over cache to reuse values but that's not too awful.

use std::{
    error::Error,
    io::{BufRead, BufReader},
};

use regex::Regex;

const DATA: &[u8] = b"
This is a line
this is a second line
";

fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}

fn reuse_cache<T, U>(cache: Vec<Vec<T>>) -> Vec<Vec<U>> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    cache.into_iter().map(|c| reuse_vec(c)).collect()
}

fn main() -> Result<(), Box<dyn Error>> {
    let r = Regex::new(r"\s+").unwrap();
    let mut reader = BufReader::new(DATA);
    let mut buffer = String::new();

    // In real code I have a vec of vecs, this is contrived just to work for the example
    let mut outer_cache: Vec<Vec<&'static str>> = vec![vec![]; 100];
    loop {
        let mut cache: Vec<Vec<&str>> = reuse_cache(outer_cache);
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }

        // Drop the newline
        buffer.pop();

        for (i, value) in r.split(&buffer).enumerate() {
            if let Some(inner) = cache.get_mut(i) {
                inner.push(value)
            } else {
                panic!("out of bounds")
            }
        }

        // Do stuff with cache and clear it
        for inner in cache.iter_mut() {
            for value in inner.drain(..) {
                println!("{:?}", value);
            }
        }
        outer_cache = reuse_cache(cache);

        buffer.clear();
    }

    Ok(())
}

duck_tape · June 28, 2021, 4:55pm

So this actually works really nicely when cache gets more complex. Based on my reading of transmute and your safety comment, it seems like transmuting a Vec<Vec<&str>> is safe as well because it's really only lifetimes we are changing.
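
A minimal sketch of what that nested transmute could look like, under the assumption that every inner vector is emptied first (the name `recycle_nested` is hypothetical, not from the thread). The soundness argument is the same as for the flat case: the two types differ only in a lifetime, and no reference with the old lifetime is left stored anywhere.

```rust
// Hypothetical helper: returns the allocations of `cache` under a new lifetime.
fn recycle_nested<'a, 'b>(mut cache: Vec<Vec<&'a str>>) -> Vec<Vec<&'b str>> {
    // Empty every inner vector so that no `&'a str` survives the conversion.
    for inner in cache.iter_mut() {
        inner.clear();
    }
    // Safety: the source and target types differ only in a lifetime parameter,
    // so they share a layout, and every inner vector is empty at this point.
    unsafe { std::mem::transmute::<Vec<Vec<&'a str>>, Vec<Vec<&'b str>>>(cache) }
}
```

Clearing inside the helper, rather than only asserting emptiness, keeps the function sound even if a caller forgets to drain one of the inner vectors.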

nbaraz · June 28, 2021, 5:29pm

I tried wrapping this behaviour in a struct, but failed (I couldn't resolve the lifetime issues): Playground

Is this approach possible?

steffahn · June 28, 2021, 5:39pm

Similar to my answer in the other thread, you can use variance and remove the reuse_cache call in the line

let mut cache: Vec<Vec<&str>> = reuse_cache(outer_cache);

instead writing just

let mut cache = outer_cache;
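
As a rough illustration of the variance point, here is a flat `Vec<&str>` sketch: shortening the lifetime (`'static` to a borrow of `buffer`) is a free coercion, and only the trip back needs a helper. The `longest_word_lengths` function is a made-up example, and `split_whitespace` stands in for the regex split.

```rust
use std::mem::{align_of, size_of};

// Safe helper from earlier in the thread; it relies on in-place collection
// to keep the allocation, which is an optimization rather than a guarantee.
fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    assert_eq!(size_of::<T>(), size_of::<U>());
    assert_eq!(align_of::<T>(), align_of::<U>());
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}

// Hypothetical example: length of the longest word on each line.
fn longest_word_lengths(input: &str) -> Vec<usize> {
    let mut out = Vec::new();
    let mut buffer = String::new();
    let mut outer_cache: Vec<&'static str> = Vec::new();
    for line in input.lines() {
        buffer.clear();
        buffer.push_str(line);
        // Covariance: `Vec<&'static str>` coerces to a shorter lifetime for free.
        let mut cache: Vec<&str> = outer_cache;
        cache.extend(buffer.split_whitespace());
        out.push(cache.iter().map(|w| w.len()).max().unwrap_or(0));
        // Lengthening the lifetime again is the step that needs the helper.
        outer_cache = reuse_vec(cache);
    }
    out
}
```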

duck_tape · June 28, 2021, 5:40pm

Now that I've seen how all of this comes together, your original answer makes even more sense.

steffahn · June 28, 2021, 5:59pm

I would probably replace these with compile-time checks. ~~E.g. using static_assertions::{assert_eq_align, assert_eq_size}.~~

SkiFire13 · June 28, 2021, 6:37pm

I don't think those work with generic types.

steffahn · June 28, 2021, 7:06pm

Good point. Unfortunately there seems to be no clean way around this yet. The only thing I could come up with is to trigger a const-evaluation error (`error: any use of this value will cause an error` followed by `error: erroneous constant encountered`), but that only fires with cargo build, not cargo check, which isn’t nice at all.
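
One way such a check might be written is with an associated constant that panics during const evaluation. This is only a sketch: it assumes Rust 1.57 or later (where `assert!` is allowed in constants), and the `SameLayout` type is a hypothetical name. The error is raised per concrete `(T, U)` pair at monomorphization time, so the caveat about cargo check applies here too.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

// Hypothetical helper type: evaluating `OK` fails compilation when the
// size or alignment of `T` and `U` differ.
struct SameLayout<T, U>(PhantomData<(T, U)>);

impl<T, U> SameLayout<T, U> {
    const OK: () = assert!(
        size_of::<T>() == size_of::<U>() && align_of::<T>() == align_of::<U>()
    );
}

fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    // Referencing the constant forces the check for each concrete (T, U).
    let () = SameLayout::<T, U>::OK;
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}
```

With this variant, a call such as `reuse_vec::<u8, u64>(..)` would be expected to fail the build instead of panicking at run time.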

mbrubeck · June 28, 2021, 7:26pm

There are some library crates with implementations of this pattern:

and a proposal to add it to the standard library:

https://github.com/rust-lang/rfcs/pull/2802

