Hi, I came across some code logic where I read each line from a file as a String, then store each &str segment in a map for later use.
```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};

use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let file = File::open("input.txt").unwrap();
    let reader = BufReader::new(file);
    let mut record: HashMap<&str, u8> = HashMap::new();
    for line in reader.lines() {
        match line {
            Ok(l) => {
                UnicodeSegmentation::graphemes(l.as_str(), true)
                    .for_each(|x| *record.entry(x).or_insert(0) += 1);
                for (s, count) in record.drain() {
                    // do something fancy
                }
            }
            Err(_) => (),
        }
    }
}
```
The above code won't compile, because the &str stored into the map from the outer scope does not live long enough.
Is there a way I can prove to the compiler that the map will be empty after each iteration? That way I could avoid moving the map initialization into the loop and reuse the same map.
It's because the graphemes are taken from l, which comes from line, which is discarded on every iteration of the loop, so the record hash map would become invalid after one loop iteration.
You can collect all lines into a Vec first, so that they have a permanent place to live for longer than one loop iteration.
But how else do you imagine this to work? e.g. how would you do it in C?
Without keeping track of what strings have been created, it's literally impossible to free them later. If you do Box::leak(line.into_boxed_str()) it'll be safe and it'll work, but you won't have the strings later to free their memory.
You can also read the whole file into memory with fs::read_to_string() and use s.split('\n') to iterate over the lines. Then the line slices will be borrowed from the file's one big String.
```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};

use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let file = File::open("input.txt").unwrap();
    let reader = BufReader::new(file);
    for line in reader.lines() {
        match line {
            Ok(l) => {
                let mut record: HashMap<&str, u8> = HashMap::new();
                UnicodeSegmentation::graphemes(l.as_str(), true)
                    .for_each(|x| *record.entry(x).or_insert(0) += 1);
                for (s, count) in record.drain() {
                    // do something fancy
                }
            }
            Err(_) => (),
        }
    }
}
```
This way I can avoid reading the entire file into memory, but I allocate a map per loop iteration.
That .drain() is not taken into account by the borrow checker; it doesn't know what it does.
You can:

1. Hack it with unsafe: transmute the &str to &'static str, so that the borrow checker will not prevent you from storing potentially-unsafe pointers.
2. Create the temporary hashmap on each iteration of the loop. Perhaps use with_capacity if you have a good estimate, to reduce the number of reallocations. Hopefully a free followed by an allocation of a same-sized block will pick the same block again from some hot cached free list, so the overhead will be small.
Are you sure you need to, though? Did you measure? Just a small anecdote: my code iterates over millions of entities, and for each iteration step I need a Vec that will hold up to 2 elements. I also thought that I should reuse the same Vec, so I made a proper benchmark, then hoisted the Vec out of the loop to reuse it... and it made no difference! I'm not sure why (maybe the allocator was smart, or the compiler, or both), but that shows that maybe you don't need to do what you're trying to do right now.