Pattern: how to reuse a `Vec<&str>` across loop iterations?

A previous question of mine got a partial answer: Review of unsafe usage. I've reduced the problem further, and I feel like there is a pattern for this that I'm missing.

I don't want to re-allocate the vec on every loop iteration. The answer in the link above is good, but it makes things like propagating errors and storing the cache on a struct awkward.

Is there a simpler way?

use std::{
    error::Error,
    io::{BufRead, BufReader},
};

use regex::Regex;

const DATA: &[u8] = b"
This is a line
this is a second line
";

// Does not compile
// How do I reuse `cache` in a loop and tell the compiler that it is in fact safe
// because I've cleared all references to buffer?
fn main() -> Result<(), Box<dyn Error>> {
    let r = Regex::new(r"\s+").unwrap();
    let mut reader = BufReader::new(DATA);
    let mut buffer = String::new();

    let mut cache: Vec<&str> = vec![];
    loop {
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }

        // Drop the newline
        buffer.pop();

        for value in r.split(&buffer) {
            cache.push(value);
        }

        // Do stuff with cache and clear it
        for value in cache.drain(..) {
            println!("{}", value);
        }

        buffer.clear();
    }

    Ok(())
}

(Playground)


Hmm, yeah, one of the limitations of lifetimes in the type system is that it is almost impossible to make Rust understand that an empty collection such as Option<…'lt> or Vec<…'lt> does not depend on 'lt.

Here is an ad-hoc solution:

-   let mut cache: Vec<&str> = vec![];
+   let mut cached_capacity: Vec<&str> = vec![];
    loop {
+       let mut cache: Vec<&str> = cached_capacity;
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }

        // Drop the newline
        buffer.pop();

        for value in r.split(&buffer) {
            cache.push(value);
        }

        // Do stuff with cache and clear it
        for value in cache.drain(..) {
            println!("{}", value);
        }
+       debug_assert!(cache.is_empty()); // guaranteed by `.drain(..)`
+       cached_capacity = unsafe {
+           // Safety: layout of types which only differ in lifetimes is the same
+           ::core::mem::transmute(cache)
+       };

        buffer.clear();
    }

The following utility function could be used for that:

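// `T: ?Sized` lets this accept unsized referents such as `str`.
fn reset_vector<'a, 'b, T: ?Sized>(mut vec: Vec<&'a T>) -> Vec<&'b T> {
    let cap = vec.capacity();
    let ptr = vec.as_mut_ptr();

    // Give up ownership without freeing the allocation.
    std::mem::forget(vec);

    unsafe {
        // Safety: `&'a T` and `&'b T` have the same layout, `ptr` and `cap`
        // come from a live `Vec`, and a length of 0 hides all old references.
        Vec::from_raw_parts(ptr.cast(), 0, cap)
    }
}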
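
As a rough sketch of how such a helper might slot into the loop from the question: the version here carries a `T: ?Sized` bound so that it accepts `&str`, and it swaps the regex split for the standard library's `split_whitespace` to stay dependency-free. The names `words_per_line` and `spare` are made up for this illustration.

```rust
use std::io::{self, BufRead, BufReader};

/// Returns an empty vec over the same allocation, under a new lifetime.
fn reset_vector<'a, 'b, T: ?Sized>(mut vec: Vec<&'a T>) -> Vec<&'b T> {
    let cap = vec.capacity();
    let ptr = vec.as_mut_ptr();
    std::mem::forget(vec);
    // Safety: `&'a T` and `&'b T` have the same layout, `ptr` and `cap` come
    // from a live `Vec`, and a length of 0 exposes none of the old references.
    unsafe { Vec::from_raw_parts(ptr.cast(), 0, cap) }
}

/// Hypothetical helper: counts whitespace-separated words per line while
/// reusing a single `Vec<&str>` for every line.
fn words_per_line(data: &[u8]) -> io::Result<Vec<usize>> {
    let mut reader = BufReader::new(data);
    let mut buffer = String::new();
    let mut counts = Vec::new();
    // Holds the spare allocation between iterations; never holds borrows.
    let mut spare: Vec<&'static str> = Vec::new();

    loop {
        let mut cache: Vec<&str> = reset_vector(spare);
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }

        cache.extend(buffer.split_whitespace());
        counts.push(cache.len());

        // Hand the allocation back before `buffer` is mutated again.
        spare = reset_vector(cache);
        buffer.clear();
    }

    Ok(counts)
}

fn main() -> io::Result<()> {
    let counts = words_per_line(b"This is a line\nthis is a second line\n")?;
    println!("{:?}", counts); // prints [4, 5]
    Ok(())
}
```

Because `?` and `break` simply drop the current `cache`, error propagation does not need any special handling in this shape.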

Here's a weird hack that doesn't use unsafe but instead relies on in-place collection (an optimization in the standard library, not a documented guarantee):

use std::{
    error::Error,
    io::{BufRead, BufReader},
};

use regex::Regex;

const DATA: &[u8] = b"
This is a line
this is a second line
";

fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}

// Compiles: `reuse_vec` converts the emptied vec to a fresh lifetime each
// iteration, so `cache` never holds a reference to `buffer` across a mutation.
fn main() -> Result<(), Box<dyn Error>> {
    let r = Regex::new(r"\s+").unwrap();
    let mut reader = BufReader::new(DATA);
    let mut buffer = String::new();

    let mut outer_cache: Vec<&'static str> = vec![];
    loop {
        let mut cache = reuse_vec(outer_cache);
        
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }

        // Drop the newline
        buffer.pop();

        for value in r.split(&buffer) {
            cache.push(value);
        }

        // Do stuff with cache and clear it
        for value in cache.drain(..) {
            println!("{}", value);
        }
        outer_cache = reuse_vec(cache);

        buffer.clear();
    }

    Ok(())
}

Edit: you can see how it optimizes when T and U are references to strs on Compiler Explorer.


Thanks all for the help!

Here's what I ended up with after adding back a bit of the real-life complexity. I don't love that it involves a second pass over cache to reuse the values, but that's not too awful.

use std::{
    error::Error,
    io::{BufRead, BufReader},
};

use regex::Regex;

const DATA: &[u8] = b"
This is a line
this is a second line
";

fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}

fn reuse_cache<T, U>(cache: Vec<Vec<T>>) -> Vec<Vec<U>> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    cache.into_iter().map(|c| reuse_vec(c)).collect()
}

fn main() -> Result<(), Box<dyn Error>> {
    let r = Regex::new(r"\s+").unwrap();
    let mut reader = BufReader::new(DATA);
    let mut buffer = String::new();

    // In real code I have a vec of vecs, this is contrived just to work for the example
    let mut outer_cache: Vec<Vec<&'static str>> = vec![vec![]; 100];
    loop {
        let mut cache: Vec<Vec<&str>> = reuse_cache(outer_cache);
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }

        // Drop the newline
        buffer.pop();

        for (i, value) in r.split(&buffer).enumerate() {
            if let Some(inner) = cache.get_mut(i) {
                inner.push(value)
            } else {
                panic!("out of bounds")
            }
        }

        // Do stuff with cache and clear it
        for inner in cache.iter_mut() {
            for value in inner.drain(..) {
                println!("{:?}", value);
            }
        }
        outer_cache = reuse_cache(cache);

        buffer.clear();
    }

    Ok(())
}

So this actually works really nicely when cache gets more complex. Based on my reading of transmute and your safety comment, it seems like transmuting a Vec<Vec<&str>> is safe as well because it's really only lifetimes we are changing.


I tried wrapping this behaviour in a struct, but failed (I couldn't resolve the lifetime issues): Playground

Is this approach possible?

Similar to my answer in the other thread, you can rely on variance (a Vec<Vec<&'static str>> is a subtype of Vec<Vec<&'a str>> for any 'a) and remove the reuse_cache call in the line

let mut cache: Vec<Vec<&str>> = reuse_cache(outer_cache);

instead writing just

let mut cache = outer_cache;
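
A small sketch of the variance point, with a made-up function name (`fill_cache`): moving a `Vec<Vec<&'static str>>` into a binding of type `Vec<Vec<&'a str>>` compiles without any conversion, because `Vec<T>` is covariant in `T`. Only the opposite direction, going back to `'static`, still needs something like `reuse_cache`.

```rust
/// Hypothetical helper: stores the i-th word of `line` in the i-th inner vec.
/// Words beyond the number of inner vecs are ignored in this sketch.
fn fill_cache<'a>(outer_cache: Vec<Vec<&'static str>>, line: &'a str) -> Vec<Vec<&'a str>> {
    // Plain move: the `'static` lifetime shrinks to `'a` through covariance.
    let mut cache: Vec<Vec<&'a str>> = outer_cache;

    for (i, word) in line.split_whitespace().enumerate() {
        if let Some(inner) = cache.get_mut(i) {
            inner.push(word);
        }
    }

    cache
}

fn main() {
    let outer_cache: Vec<Vec<&'static str>> = vec![vec![]; 3];
    let line = String::from("alpha beta");

    let cache = fill_cache(outer_cache, &line);
    println!("{:?}", cache); // prints [["alpha"], ["beta"], []]
}
```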

:+1: now that I've seen how all of this comes together your original answer makes even more sense.

I would probably replace the runtime assert_eq! size and alignment checks in reuse_vec with compile-time checks, e.g. using static_assertions::{assert_eq_align, assert_eq_size}.

I don't think those work with generic types


Good point. Unfortunately there seems to be no clean way around this yet. The only thing I could come up with triggers the errors "error: any use of this value will cause an error" and "error: erroneous constant encountered", and only with cargo build, not cargo check, which isn’t nice at all.
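
One way such a check could be written on newer toolchains is an associated constant on a generic helper type, sketched below. This is not taken from the thread: it assumes Rust 1.57 or later (where `assert!` is allowed in constants), and the names `SameLayout` and `CHECK` are made up. The constant is evaluated per concrete `(T, U)` pair, so a mismatch should surface as a build error at monomorphization time, which is the same `cargo build` versus `cargo check` caveat described above.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

/// Hypothetical helper type that carries the layout check for a `(T, U)` pair.
struct SameLayout<T, U>(PhantomData<(T, U)>);

impl<T, U> SameLayout<T, U> {
    /// Evaluated at compile time for each concrete `(T, U)` that uses it.
    const CHECK: () = assert!(
        size_of::<T>() == size_of::<U>() && align_of::<T>() == align_of::<U>(),
        "T and U must have the same size and alignment"
    );
}

fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    // Referencing the constant forces its evaluation for this `(T, U)`.
    let () = SameLayout::<T, U>::CHECK;
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}

fn main() {
    let words: Vec<&'static str> = vec!["a", "b", "c"];
    let reused: Vec<&str> = reuse_vec(words);
    println!("len after reuse: {}", reused.len()); // prints len after reuse: 0

    // A mismatched pair such as `reuse_vec::<u8, u64>(vec![1])` is expected
    // to fail the build instead of panicking at run time.
}
```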

There are some library crates with implementations of this pattern:

and a proposal to add it to the standard library:

