In a previous question (Review of unsafe usage) I got a partial answer. I've reduced the problem a bit more and really feel like there is a pattern for doing this that I'm missing.
I don't want to reallocate the Vec on every loop iteration. The answer in the link above is good, but it really constrains things like propagating errors, storing the cache on a struct, etc.
Is there a simpler way?
use std::{
    error::Error,
    io::{BufRead, BufReader},
};
use regex::Regex;
const DATA: &[u8] = b"
This is a line
this is a second line
";
// Does not compile
// How do I reuse `cache` in a loop and tell the compiler that it is in fact safe
// because I've cleared all references to buffer?
fn main() -> Result<(), Box<dyn Error>> {
    let r = Regex::new(r"\s+").unwrap();
    let mut reader = BufReader::new(DATA);
    let mut buffer = String::new();
    let mut cache: Vec<&str> = vec![];
    loop {
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }
        // Drop the newline
        buffer.pop();
        for value in r.split(&buffer) {
            cache.push(value);
        }
        // Do stuff with cache and clear it
        for value in cache.drain(..) {
            println!("{}", value);
        }
        buffer.clear();
    }
    Ok(())
}
Hmm, yeah, one of the limitations of lifetimes in the type system is that it is almost impossible to make Rust understand that an empty collection such as Option<…'lt> or Vec<…'lt> does not depend on 'lt.
Here is an ad-hoc solution:
-    let mut cache: Vec<&str> = vec![];
+    let mut cached_capacity: Vec<&str> = vec![];
     loop {
+        let mut cache: Vec<&str> = cached_capacity;
         if reader.read_line(&mut buffer)? == 0 {
             break;
         }
         // Drop the newline
         buffer.pop();
         for value in r.split(&buffer) {
             cache.push(value);
         }
         // Do stuff with cache and clear it
         for value in cache.drain(..) {
             println!("{}", value);
         }
+        debug_assert!(cache.is_empty()); // guaranteed by `.drain(..)`
+        cached_capacity = unsafe {
+            // Safety: layout of types which only differ in lifetimes is the same
+            ::core::mem::transmute(cache)
+        };
         buffer.clear();
     }
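For reference, here is the same transmute trick packaged as a standalone function that propagates I/O errors with ? instead of living in main. It is a sketch under a few assumptions: the regex dependency is swapped for str::split_whitespace (so an empty line yields no words rather than one empty string), and the name words_per_line is made up for illustration.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: returns the number of whitespace-separated words on
/// each line, reusing a single Vec allocation across all lines.
fn words_per_line<R: BufRead>(mut reader: R) -> io::Result<Vec<usize>> {
    let mut buffer = String::new();
    let mut counts = Vec::new();
    // Spare capacity; it is only ever stored here while empty.
    let mut cached_capacity: Vec<&'static str> = Vec::new();
    loop {
        // Covariance lets the 'static Vec be used with a shorter lifetime.
        let mut cache: Vec<&str> = cached_capacity;
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }
        // Swapped in for the regex split used in the question.
        cache.extend(buffer.split_whitespace());
        counts.push(cache.len());
        cache.clear();
        cached_capacity = unsafe {
            // Safety: `cache` is empty, and the two types differ only in
            // lifetimes, so they have the same layout.
            std::mem::transmute::<Vec<&str>, Vec<&'static str>>(cache)
        };
        buffer.clear();
    }
    Ok(counts)
}
```

With the DATA constant from the question, words_per_line(DATA) should return [0, 4, 5]: the byte string starts with a newline, so the first line is empty.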
Here's a weird hack that doesn't use unsafe but instead relies on in-place collection:
use std::{
    error::Error,
    io::{BufRead, BufReader},
};
use regex::Regex;
const DATA: &[u8] = b"
This is a line
this is a second line
";
fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}
// Compiles: `reuse_vec` hands the spare allocation to a fresh `cache` on each
// iteration, so no borrow of `buffer` outlives the iteration.
fn main() -> Result<(), Box<dyn Error>> {
    let r = Regex::new(r"\s+").unwrap();
    let mut reader = BufReader::new(DATA);
    let mut buffer = String::new();
    let mut outer_cache: Vec<&'static str> = vec![];
    loop {
        let mut cache = reuse_vec(outer_cache);
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }
        // Drop the newline
        buffer.pop();
        for value in r.split(&buffer) {
            cache.push(value);
        }
        // Do stuff with cache and clear it
        for value in cache.drain(..) {
            println!("{}", value);
        }
        outer_cache = reuse_vec(cache);
        buffer.clear();
    }
    Ok(())
}
Edit: here you can see how it optimizes when T and U are references to strs: Compiler Explorer
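The same reuse_vec helper also covers the "store the cache on a struct" case from the question: the struct keeps the spare allocation as a Vec<&'static str> and lends it out for the duration of one call. A minimal sketch, assuming a hypothetical WordJoiner type and again using split_whitespace in place of the regex. Reusing the allocation depends on std's in-place collect optimization, which is an implementation detail rather than a guarantee; if it does not apply, the code is still correct and simply starts each call from an empty Vec.

```rust
/// Moves the (cleared) allocation of `v` into a Vec with a different element
/// type; std's in-place collect usually keeps the allocation.
fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}

/// Hypothetical type that keeps the spare capacity between calls.
struct WordJoiner {
    spare: Vec<&'static str>,
}

impl WordJoiner {
    fn new() -> Self {
        WordJoiner { spare: Vec::new() }
    }

    /// Splits `line` on whitespace and joins the words with `sep`.
    fn join(&mut self, line: &str, sep: &str) -> String {
        // Borrow the spare allocation with the lifetime of `line`.
        let mut cache: Vec<&str> = reuse_vec(std::mem::take(&mut self.spare));
        cache.extend(line.split_whitespace());
        let joined = cache.join(sep);
        // Hand the cleared allocation back, detached from `line`.
        self.spare = reuse_vec(cache);
        joined
    }
}
```

For example, WordJoiner::new().join("This is  a line", "|") returns "This|is|a|line", and the struct can be kept around across many lines without tying its type to any one buffer.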
Here's what I ended up with after adding back in a bit of the real-life complexity. I don't love that it involves a second pass over cache to reuse values, but that's not too awful.
use std::{
    error::Error,
    io::{BufRead, BufReader},
};
use regex::Regex;
const DATA: &[u8] = b"
This is a line
this is a second line
";
fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}
fn reuse_cache<T, U>(cache: Vec<Vec<T>>) -> Vec<Vec<U>> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    cache.into_iter().map(reuse_vec).collect()
}
fn main() -> Result<(), Box<dyn Error>> {
    let r = Regex::new(r"\s+").unwrap();
    let mut reader = BufReader::new(DATA);
    let mut buffer = String::new();
    // In real code I have a vec of vecs, this is contrived just to work for the example
    let mut outer_cache: Vec<Vec<&'static str>> = vec![vec![]; 100];
    loop {
        let mut cache: Vec<Vec<&str>> = reuse_cache(outer_cache);
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }
        // Drop the newline
        buffer.pop();
        for (i, value) in r.split(&buffer).enumerate() {
            if let Some(inner) = cache.get_mut(i) {
                inner.push(value)
            } else {
                panic!("out of bounds")
            }
        }
        // Do stuff with cache and clear it
        for inner in cache.iter_mut() {
            for value in inner.drain(..) {
                println!("{:?}", value);
            }
        }
        outer_cache = reuse_cache(cache);
        buffer.clear();
    }
    Ok(())
}
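A regex-free variant of this vec-of-vecs version is sketched below. It grows the column cache on demand instead of panicking on a line with too many words, and it returns an io::Result so errors propagate to the caller. The function column_counts and its counting logic are hypothetical, only there to give the cache something to do.

```rust
use std::io::{self, BufRead};

fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
    assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}

fn reuse_cache<T, U>(cache: Vec<Vec<T>>) -> Vec<Vec<U>> {
    // Each inner Vec is recycled individually; the outer collect is in-place too.
    cache.into_iter().map(reuse_vec::<T, U>).collect()
}

/// Hypothetical example: counts how many words land in each column position
/// (first word, second word, ...) across all lines.
fn column_counts<R: BufRead>(mut reader: R) -> io::Result<Vec<usize>> {
    let mut buffer = String::new();
    let mut counts: Vec<usize> = Vec::new();
    let mut outer_cache: Vec<Vec<&'static str>> = Vec::new();
    loop {
        let mut cache: Vec<Vec<&str>> = reuse_cache(outer_cache);
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }
        for (i, word) in buffer.split_whitespace().enumerate() {
            // Grow the cache by one column instead of panicking.
            if i == cache.len() {
                cache.push(Vec::new());
            }
            cache[i].push(word);
        }
        if counts.len() < cache.len() {
            counts.resize(cache.len(), 0);
        }
        // "Do stuff with cache": drain every column, leaving it empty.
        for (count, column) in counts.iter_mut().zip(cache.iter_mut()) {
            *count += column.drain(..).count();
        }
        outer_cache = reuse_cache(cache);
        buffer.clear();
    }
    Ok(counts)
}
```

On the DATA constant above it should return [2, 2, 2, 2, 1], since only the second line has a fifth word.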
So this actually works really nicely when cache gets more complex. Based on my reading of transmute and your safety comment, it seems like transmuting a Vec<Vec<&str>> is safe as well because it's really only lifetimes we are changing.
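A sketch of that nested transmute, assuming (as in the code above) that every inner Vec is emptied before the lifetime is extended. The emptiness is what makes it sound: if any inner Vec still held a &str into the old buffer, the 'static type would let that reference outlive the buffer. The function name recycle is hypothetical.

```rust
/// Hypothetical helper: detaches a nested cache from the buffer its
/// references pointed into, keeping every allocation.
fn recycle<'a>(mut cache: Vec<Vec<&'a str>>) -> Vec<Vec<&'static str>> {
    // Empty every inner Vec first; clear() keeps the capacity.
    for inner in cache.iter_mut() {
        inner.clear();
    }
    // Safety: the types differ only in lifetimes (same layout), and no inner
    // Vec holds a reference any more, so nothing can dangle.
    unsafe { std::mem::transmute::<Vec<Vec<&'a str>>, Vec<Vec<&'static str>>>(cache) }
}
```

Doing the clearing inside the function, rather than relying on the caller's drain loop, keeps the safety argument local to the unsafe block.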
Good point. Unfortunately there seems to be no clean way around this yet. The only thing I could come up with triggers errors such as "error: any use of this value will cause an error" and "error: erroneous constant encountered", but only with cargo build, not cargo check, which isn't nice at all.
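One way to move the size/alignment check to compile time is an associated constant that panics during const evaluation. Like the errors described above, it is presumably reported only when the generic function is monomorphized, so cargo build catches it while cargo check may not. A sketch under those assumptions; SameLayout and OK are made-up names.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

/// Hypothetical marker type; evaluating `OK` fails to compile when T and U
/// differ in size or alignment.
struct SameLayout<T, U>(PhantomData<(T, U)>);

impl<T, U> SameLayout<T, U> {
    const OK: () = assert!(
        size_of::<T>() == size_of::<U>() && align_of::<T>() == align_of::<U>(),
        "T and U must have the same size and alignment"
    );
}

fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    // Referencing the constant forces its evaluation for each (T, U) used.
    let () = SameLayout::<T, U>::OK;
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}
```

With this version, an instantiation such as reuse_vec::<u8, u32> fails at build time instead of panicking at run time, while the &'static str to &str case used above compiles as before.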