Reuse map from outer scope?

rockmen1 · February 8, 2019, 3:23pm

Hi, I came across some code where I read each line from a file as a String, then store each str segment in a map for later use.

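use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};
use unicode_segmentation::UnicodeSegmentation;

fn main() {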
    let file = File::open("input.txt").unwrap();
    let reader = BufReader::new(file);

    let mut record: HashMap<&str, u8> = HashMap::new();

    for line in reader.lines() {
        match line {
            Ok(l) => {
                UnicodeSegmentation::graphemes(l.as_str(), true)
                    .for_each(|x| *record.entry(x).or_insert(0) += 1);
                for (s, count) in record.drain() {
                    // do something fancy
                }
            },
            Err(_) => (),
        }
    }
}

The above code won't compile because the &str from the inner loop, which is stored in the map from the outer scope, does not live long enough.
Is there a way I can prove to the compiler that the map will be empty after each iteration? That way I could avoid moving the map initialization into the loop and reuse the same map.

kornel · February 8, 2019, 3:33pm

It's because graphemes are taken from l, which is taken from line, which is discarded on every iteration of the loop, so the record hash map would become invalid after one loop iteration.

You can collect all lines into a Vec first, so that they have a permanent place to live for longer than one loop iteration.
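A minimal sketch of this suggestion, under a few assumptions: the hypothetical helper `segments` stands in for `UnicodeSegmentation::graphemes` (it splits per `char`, not per grapheme cluster, so the example needs no external crate), and `distinct_per_line` is an illustrative name. Because the `Vec<String>` outlives the loop, the map can live outside it and be emptied with `clear()`, which keeps its allocation.

```rust
use std::collections::HashMap;
use std::io::{BufRead, Cursor};

// Hypothetical stand-in for UnicodeSegmentation::graphemes:
// yields one &str slice per char rather than per grapheme cluster.
fn segments<'a>(s: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    s.char_indices().map(move |(i, c)| &s[i..i + c.len_utf8()])
}

// Returns the number of distinct segments found on each line.
fn distinct_per_line<R: BufRead>(reader: R) -> Vec<usize> {
    // The Vec owns every line, so borrows into it stay valid for the whole loop.
    let lines: Vec<String> = reader.lines().filter_map(Result::ok).collect();

    let mut record: HashMap<&str, u8> = HashMap::new();
    let mut distinct = Vec::with_capacity(lines.len());

    for line in &lines {
        for seg in segments(line) {
            *record.entry(seg).or_insert(0) += 1;
        }
        distinct.push(record.len());
        record.clear(); // removes the entries but keeps the map's allocation
    }
    distinct
}

fn main() {
    let input = Cursor::new("aab\nccc\n");
    println!("{:?}", distinct_per_line(input));
}
```

The trade-off is the one raised in the next reply: the map is reused, but every line is held in memory at once.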

rockmen1 · February 8, 2019, 3:37pm

But that will need to allocate a Vec; the reason I would like to reuse the map is to avoid allocation as much as possible.

kornel · February 8, 2019, 3:37pm

But how else do you imagine this to work? e.g. how would you do it in C?

Without keeping track of what strings have been created, it's literally impossible to free them later. If you do Box::leak(line.into_boxed_str()) it'll be safe and it'll work, but you won't have the strings later to free their memory.
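To illustrate the `Box::leak` point, here is a rough sketch. The names `count_leaked` and `segments` are hypothetical, and `segments` splits per `char` as a stand-in for grapheme segmentation. Each line's buffer is leaked, so the keys really are `&'static str` and the map may outlive the loop, at the cost of never freeing that memory.

```rust
use std::collections::HashMap;
use std::io::{BufRead, Cursor};

// Hypothetical stand-in for UnicodeSegmentation::graphemes:
// yields one &str slice per char rather than per grapheme cluster.
fn segments<'a>(s: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    s.char_indices().map(move |(i, c)| &s[i..i + c.len_utf8()])
}

// Counts segments across all lines; keys borrow from leaked line buffers.
fn count_leaked<R: BufRead>(reader: R) -> HashMap<&'static str, u32> {
    let mut record: HashMap<&'static str, u32> = HashMap::new();
    for line in reader.lines().filter_map(Result::ok) {
        // The line's heap buffer is intentionally never freed from here on.
        let leaked: &'static str = Box::leak(line.into_boxed_str());
        for seg in segments(leaked) {
            *record.entry(seg).or_insert(0) += 1;
        }
    }
    record
}

fn main() {
    let counts = count_leaked(Cursor::new("ab\nba\n"));
    println!("{:?}", counts.get("a"));
}
```

Memory use grows with the size of the input, so this only suits short-lived programs or small files.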

kornel · February 8, 2019, 3:41pm

You can also read the whole file into memory with fs::read_to_string() and use s.split('\n') (or s.lines()) to iterate over the lines. Then the lines will be held in the file's memory.
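A sketch of the whole-file approach, assuming a hypothetical temp-file name and a hypothetical `segments` helper that splits per `char` in place of grapheme segmentation. All keys borrow from the one `String` returned by `fs::read_to_string`, so the map can sit outside the loop.

```rust
use std::collections::HashMap;
use std::fs;

// Hypothetical stand-in for UnicodeSegmentation::graphemes:
// yields one &str slice per char rather than per grapheme cluster.
fn segments<'a>(s: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    s.char_indices().map(move |(i, c)| &s[i..i + c.len_utf8()])
}

// Returns the number of distinct segments found on each line of `text`.
fn distinct_per_line(text: &str) -> Vec<usize> {
    let mut record: HashMap<&str, u8> = HashMap::new();
    let mut distinct = Vec::new();
    for line in text.lines() {
        for seg in segments(line) {
            *record.entry(seg).or_insert(0) += 1;
        }
        distinct.push(record.len());
        record.clear(); // same map, same allocation, next line
    }
    distinct
}

fn main() -> std::io::Result<()> {
    // Hypothetical demo file so the example runs anywhere.
    let path = std::env::temp_dir().join("reuse_map_demo.txt");
    fs::write(&path, "aab\nccc\n")?;

    let text = fs::read_to_string(&path)?;
    println!("{:?}", distinct_per_line(&text));

    fs::remove_file(&path)?;
    Ok(())
}
```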

rockmen1 · February 8, 2019, 3:42pm

Each l is of no use after one iteration; I put record.drain() there just to clear the map's contents.

kornel · February 8, 2019, 3:44pm

Your code does the equivalent of this:

for line in reader.lines() {
    record.entry(&line);
    drop(line);
}

which is like:

record.entry(&line);
drop(line);
// search in second iteration of the loop to compare against previous line
record.keys().any(|x| *using x crashes*)

rockmen1 · February 8, 2019, 3:45pm

The best I can do now is:

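use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};
use unicode_segmentation::UnicodeSegmentation;

fn main() {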
    let file = File::open("input.txt").unwrap();
    let reader = BufReader::new(file);

    for line in reader.lines() {
        match line {
            Ok(l) => {
                let mut record: HashMap<&str, u8> = HashMap::new();
                UnicodeSegmentation::graphemes(l.as_str(), true)
                    .for_each(|x| *record.entry(x).or_insert(0) += 1);
                for (s, count) in record.drain() {
                    // do something fancy
                }
            },
            Err(_) => (),
        }
    }
}

This way I avoid reading the entire file into memory, but I allocate a map per iteration.

rockmen1 · February 8, 2019, 3:47pm

My intent was like:

record.entry(&line);
// do something in current iteration
record.clear();
drop(line);

kornel · February 8, 2019, 3:49pm

That .drain() is not taken into account by the borrow checker. It doesn't know what it does.

You can:

- Hack it with unsafe. Transmute &line to &'static line, so that the borrow checker will not prevent you from storing potentially-unsafe pointers.
- Create that temporary hashmap each iteration of the loop. Perhaps use with_capacity if you have a good estimate to reduce number of reallocations. Hopefully freeing followed by allocation of the same-sized block will pick the same block again from some hot cached free list, so the overhead will be small.
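The second option might look roughly like this. The `expected_segments` parameter and the `segments` helper are hypothetical: the helper splits per `char` as a stand-in for grapheme segmentation, and the capacity estimate is whatever the caller considers typical for a line.

```rust
use std::collections::HashMap;
use std::io::{BufRead, Cursor};

// Hypothetical stand-in for UnicodeSegmentation::graphemes:
// yields one &str slice per char rather than per grapheme cluster.
fn segments<'a>(s: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    s.char_indices().map(move |(i, c)| &s[i..i + c.len_utf8()])
}

// Returns the number of distinct segments found on each line.
// `expected_segments` is a caller-supplied estimate (an assumption of this sketch).
fn distinct_per_line<R: BufRead>(reader: R, expected_segments: usize) -> Vec<usize> {
    let mut distinct = Vec::new();
    for line in reader.lines().filter_map(Result::ok) {
        // Fresh map per line, pre-sized so it rarely has to grow while filling.
        let mut record: HashMap<&str, u8> = HashMap::with_capacity(expected_segments);
        for seg in segments(&line) {
            *record.entry(seg).or_insert(0) += 1;
        }
        distinct.push(record.len());
        // `record` is dropped here, before `line`, so the borrow checker is satisfied.
    }
    distinct
}

fn main() {
    let input = Cursor::new("aab\nccc\n");
    println!("{:?}", distinct_per_line(input, 16));
}
```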

rockmen1 · February 8, 2019, 4:00pm

Hack it with unsafe. Transmute &line to &'static line, so that the borrow checker will not prevent you from storing potentially-unsafe pointers.

How can I manually drop the transmuted variable afterwards?

kornel · February 8, 2019, 4:01pm

You don't. References are never freed by definition. The compiler will drop the owned line regardless of what you unsafely do to its references.
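For illustration only, the unsafe route could be sketched as below. The `segments` helper is hypothetical and splits per `char` in place of grapheme segmentation. The sketch relies on an invariant the compiler cannot check: the map must be emptied before the owned line is dropped, and no key may escape the loop body. Nothing "frees" the transmuted reference; the owned `line` is dropped as usual at the end of each iteration.

```rust
use std::collections::HashMap;
use std::io::{BufRead, Cursor};

// Hypothetical stand-in for UnicodeSegmentation::graphemes:
// yields one &str slice per char rather than per grapheme cluster.
fn segments<'a>(s: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    s.char_indices().map(move |(i, c)| &s[i..i + c.len_utf8()])
}

// Returns the number of distinct segments found on each line,
// reusing one map whose keys only pretend to be 'static.
fn distinct_per_line<R: BufRead>(reader: R) -> Vec<usize> {
    let mut record: HashMap<&'static str, u8> = HashMap::new();
    let mut distinct = Vec::new();

    for line in reader.lines().filter_map(Result::ok) {
        // SAFETY (assumed invariant of this sketch): every key derived from
        // `fake_static` is removed by `record.clear()` below, before `line`
        // is dropped, and no key is copied out of this loop body.
        let fake_static: &'static str =
            unsafe { std::mem::transmute::<&str, &'static str>(line.as_str()) };

        for seg in segments(fake_static) {
            *record.entry(seg).or_insert(0) += 1;
        }
        distinct.push(record.len());

        record.clear(); // must run before `line` goes out of scope
    } // `line` is dropped here; the map no longer refers to it
    distinct
}

fn main() {
    let input = Cursor::new("aab\nccc\n");
    println!("{:?}", distinct_per_line(input));
}
```

If a later change forgets the `clear()` or returns a key from the loop, the result is a dangling reference, so the safe variants are usually preferable unless measurement shows the difference matters.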

KillTheMule · February 8, 2019, 4:05pm

Are you sure you need to, though? Did you measure? Just a small anecdote: My code iterates over millions of entities, and for each iteration step, I need a Vec that will hold up to 2 elements. I also thought that I should reuse the same Vec, so I made a proper benchmark, then hoisted the Vec out of the loop to reuse it... and it made no difference! I'm not sure why, maybe the allocator was smart or the compiler or both of them, but that shows that maybe you don't need to try to do what you want right now.

rockmen1 · February 9, 2019, 2:39am

In my case, it's roughly 5% performance difference.
