Par_iter borrows an object and can't be returned

I have code that uses osmpbfreader to iterate over objects in files. The problem is that I need to process it in 2-3 passes, and there's a lot of overhead in it to write the cycle every time.

// pass 1
let file = std::fs::File::open(&std::path::Path::new(osm_file_path))?;
let mut reader = OsmPbfReader::new(file)?;
for obj in reader.par_iter().map(Result::unwrap) {
	match obj {
		OsmObj::Way(way) => {
			if !way.tags().contains_key("highway") { continue; }
			// do useful work
		}
		_ => {}
	}
}

// pass 2
let file = std::fs::File::open(&std::path::Path::new(osm_file_path))?;
let mut reader = OsmPbfReader::new(file)?;
for obj in reader.par_iter().map(Result::unwrap) {
	match obj {
		OsmObj::Way(way) => {
			if !way.tags().contains_key("highway") { continue; }
			// do another kind of work
		}
		_ => {}
	}
}

// pass 3...

So I wanted to abstract this into a struct with Iterator trait. I checked Iterator trait, and it hasn't an init method to be called when for loop starts, so I understand that I have to initialize the internal iterator myself and then get its items in next.

Something like:

use osmpbfreader::reader::ParIter;
struct WaysIterator(ParIter);

impl WaysIterator {
	fn new(osm_file_path) -> WaysIterator {
		// create reader and do par_iter
	}
}

impl Iterator for WaysIterator {
	fn next() -> Option<Way> {
		loop {
			match self.0.next() {
				Some(OsmObj::Way(way)) => {
					if way.tags().contains_key("highway") {
						return way
					}
				Some(x) => {}
				None => return None
				}
			}
		}
	}
}

Unfortunately, I'm stuck at implementing new.

The problem is that par_iter borrows the reader, not takes ownership, and upon returning the reader will be lost. So the compiler resists:

struct RoadIter<'a>(ParIter<'a, std::fs::File>);

impl RoadIter<'_> {
	fn new(osm_file_path: &std::ffi::OsStr) -> Result<RoadIter, Box<dyn Error + 'static>> {
		let file = std::fs::File::open(&std::path::Path::new(osm_file_path))?;
		let mut reader = Box::new(OsmPbfReader::new(file));


		let pi = reader.par_iter();
				 ^^^^^^
		         error: src/main.rs:29: `*reader` is borrowed here

		Ok(RoadIter(pi))
					^^
		error: src/main.rs:30: returns a value referencing data owned by the current function
		error: src/main.rs:30: cannot return value referencing local data `*reader`

	}
}

I tried to keep both the reader and ParIter in the struct, but the same error persists.

struct RoadIter<'a>(OsmPbfReader<std::fs::File>, ParIter<'a, std::fs::File>);

Box-ing it didn't help either.

struct RoadIter<'a>(Box<OsmPbfReader<std::fs::File>>, ParIter<'a, std::fs::File>);
...
	let mut reader = Box::new(OsmPbfReader::new(file));

The same error: cannot move out reader, because it's borowed.

What can be done about this?

Not familiar with this library, but as a general rule, in cases like this, you need to implement Iterator on a separate struct that borrows from the struct that owns the prepared data. This way, the latter keeps the data alive and the former can return borrows of it from next().

That's what I was doing. This struct was made to keep the reader object, and the iterator.

struct RoadIter<'a>(ParIter<'a, std::fs::File>);
// or version 2:
struct RoadIter<'a>(OsmPbfReader<std::fs::File>, ParIter<'a, std::fs::File>);

The problem is that when I call .par_iter on OsmPbfReader instance, par_iter borrows the reader, and I can't return it alone or in the struct.

You can't put OsmPbfReader<std::fs::File> and ParIter<'a, std::fs::File> in the same struct. You have to construct and return the Reader, then in a separate function construct your Iterator that borrows it.

That makes sense, but how should they call each other?

I guess if I compose them this way, the problem will be the same?

fn make_reader(...) {
    OsmPbfReader(path...)
}

fn make_me_an_iter(path: ...) -> ... {
    r = make_reader(path);
    r.par_iter()
}

So it should be

   fn make_reader(...) {
    OsmPbfReader(path...)
}

fn make_me_an_iter(reader: &OsmPbfReader ...) -> ... {
    r = make_reader(path);
    r.par_iter()
}

Is this correct?

I thought a bit and rewrote it inside-out. Instead of getting an iterator like I'd do in Python, I tried passing a callable:

fn reduce<T>(path: &OsStr, init: T, func: &dyn Fn(OsmObj, T) -> T) -> Result<T, Box<dyn Error>> {
	let file = std::fs::File::open(&std::path::Path::new(path))?;
	let mut reader = OsmPbfReader::new(file);
	let mut val = init;
	for obj in reader.par_iter().map(Result::unwrap) {
		if obj.tags().contains_key("highway") {
			val = func(obj, val);
		}
	}
	Ok(val)
}


fn count_ways_in_node(obj: OsmObj, mut counter: WaysInNodesCounter) -> WaysInNodesCounter {
	match obj {
		OsmObj::Way(w) => {
			for node_id in w.nodes {
				*counter.entry(node_id).or_insert(0) += 1;
			}
		}
		_ => {}
	}
	counter
}


fn function_that_uses_it_all(osm_file: &std::ffi::OsStr) -> Result<(), Box<dyn Error>> {
	let mut counter: WaysInNodesCounter = HashMap::new();
	counter = reduce(osm_file, counter, &count_ways_in_node)?;
            ...
     }

What I don't like is this match block still necessary in the callable:

match obj {
	OsmObj::Way(w) => { // useful work here }
	_ => {}
}

This is because I can't pass Enum variant as an accepted type.

Here's the key fact you need to work with: OsmPbfReader::par_iter returns a type (ParIter) which borrows the OsmPbfReader. This means that, in order to use the ParIter you must separately keep the OsmPbfReader alive. You are free to write wrapper types around both of these, but you cannot combine them into one type or one function. You can reduce the code duplication by adding functions and/or types, as long as you keep the two parts separate.

Working off the code sample you provided in your first post, here's some helper functions that remove the duplicated code (not tested):

use std::fs::File;
use std::path::Path;
use std::io;
use osmpbfreader::reader::OsmPbfReader;
use osmpbfreader::objects::{OsmObj, Way};

fn new_reader(path: &Path) -> Result<OsmPbfReader<File>, io::Error> {
    let file = std::fs::File::open(&std::path::Path::new(osm_file_path))?;
    Ok(OsmPbfReader::new(file))
}

fn filter_iter<'a>(reader: &'a OsmPbfReader<File>) -> impl Iterator<Item = Way> + 'a {
    reader.par_iter()
        .filter_map(|result| {
            match result.unwrap() {
                OsmObj::Way(way) => {
                    if !way.tags().contains_key("highway") { return None; }
                    Some(way)
                }
                _ => None
            }
        })
}

fn main() {
    let path = ...;
    for way in filter_iter(&new_reader(path)) {
        // pass 1
    }
    for way in filter_iter(&new_reader(path)) {
        // pass 2
    }
}

If you want to write an explicit struct instead of using -> impl Iterator you can still do that, but you have to make it do only the parts that are in filter_iter, and not the parts that are in new_reader, so that the struct borrows the OsmPbfReader — it cannot own it.

2 Likes

Interesting. Thanks a lot! Will try this. I've never seen Impl in the return type, or don't recall reading it.

Here's what I finally made of this code. Doing it, encountered the same problem with retruning a reference to an internal var (OsmObj::way() returns a Option<&Way>, not Option<Way>), but found a way around.


fn get_reader(path: &OsStr) -> Result<OsmPbfReader<File>, Box<dyn Error>> {
	let file = File::open(&Path::new(path))?;
	Ok(OsmPbfReader::new(file))
}

fn iterate_objs<'a, F, Y>(reader: &'a mut OsmPbfReader<File>, filter_fn: &'a mut F) -> impl Iterator<Item = Y> + 'a
	where F: FnMut(OsmObj) -> Option<Y> {
	reader.par_iter().map(|r| r.unwrap()).filter_map(filter_fn)
}

fn main() {
	// ...
	let mut counter: HashMap<NodeId, i32> = HashMap::new();
	let mut reader = get_reader(osm_file)?;

	for way in iterate_objs(&mut reader, &mut |o: OsmObj| o.way().cloned()) {
		for node_id in &way.nodes {
			*counter.entry(*node_id).or_insert(0) += 1;
		}
	}
}

Now, this is really compact code! Thanks for help!

If you use a match (or if let) in the filter as you did in your original code, instead of .way(), then you will not need to clone the Way as o.way().cloned() does. (This matters only if you care about processing speed.)

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.