Saving a struct and its .into_iter counterpart into my struct

A pattern I see in some packages is that they have a FormatReader struct and a FormatIterator that handles iteration. In osmpbfreader, for example, the OsmPbfReader struct returns OsmObjs as an iterator.

The XML reader I use is pretty clear:

pub struct EventReader<R: Read> {
    source: R,
    parser: PullParser
}

impl<R: Read> IntoIterator for EventReader<R> {
    type Item = Result<XmlEvent>;
    type IntoIter = Events<R>;

    fn into_iter(self) -> Events<R> {
        Events { reader: self, finished: false }
    }
}

pub struct Events<R: Read> {
    reader: EventReader<R>,
    finished: bool
}

(Edit: this is not my code; it's from the xml-rs crate.)

I'm trying to implement a wrapper around this in order to parse XML and return OSM objects. I need both the EventReader (it has position(), which gives the position in the text for error reporting) and Events (which is the actual iterator). But I can't figure out how.

pub struct OsmReader {
    er: EventReader<Box<dyn Read>>,
    ev: Events<Box<dyn Read>>, // iterator over XmlEvents
    // Events implements Iterator: next() -> Option<Result<XmlEvent, xml::reader::Error>>
}
impl OsmReader {
    pub fn new(path: String) -> Result<Self, Box<dyn Error>> {
        // a wrapper for flat/gzipped/bzipped files
        let rd = ...;
        let er = EventReader::new(rd);
        Ok(OsmReader { er: er, ev: er.into_iter() })
    }
}

The error is that er: er moves er into the struct, so the er.into_iter() right after it uses a moved value. And if I do let ev = er.into_iter() earlier instead, that call moves er too.

Trying to wrap these in Box changed nothing:

pub struct OsmReader {
	er: Box<EventReader<Box<dyn Read>>>,
	ev: Box<Events<Box<dyn Read>>> // iterator over XMLEvents,
	// Events implements Iter: next() -> Option<Result<XmlEvent, dyn std::error::Error>>
}

...
	let er = Box::new(EventReader::new(rd));
	let ev = Box::new((*er).into_iter());
	Ok(OsmReader { er: er, ev: ev })

Same error:

55  |         let ev = Box::new((*er).into_iter());
    |                                 ----------- `*er` moved due to this method call
56  |         Ok(OsmReader { er: er, ev: ev })
    |                            ^^ value used here after move

I could store just one of them, but that is inconvenient when I implement an iterator and means writing a ton of extra code (in particular, EventReader exposes position() but Events does not, and EventReader has its own next(), but with a differently nested return type).

Is there any hope?

Your into_iter(self) here takes self by value, which is a move.

The pattern is usually as follows:

  • into_iter(self): consumes self returning an iterator over T
  • iter(&self): uses a reference to self, returning an iterator over &T
  • iter_mut(&mut self): uses a mutable reference to self, returning an iterator over &mut T

So, as things stand, calling into_iter(self) is going to consume self, whether it's boxed or not.
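To make the convention concrete, here is a small sketch using Vec<T> from the standard library (demo_conventions is just an illustrative name). It also shows that boxing doesn't help: (*boxed).into_iter() moves the Vec out of the Box, just like the compiler error in the question.

```rust
// The usual naming convention, shown on Vec<T>:
//   iter()      borrows the collection and yields &T
//   iter_mut()  borrows it mutably and yields &mut T
//   into_iter() consumes it and yields T
fn demo_conventions() -> (i32, Vec<i32>, Vec<String>) {
    let mut numbers = vec![1, 2, 3];

    // iter(&self): `numbers` is only borrowed, so it is still usable afterwards.
    let sum: i32 = numbers.iter().sum();

    // iter_mut(&mut self): modify the elements in place through &mut i32.
    for n in numbers.iter_mut() {
        *n *= 10;
    }

    // into_iter(self): the Vec is moved into the iterator. The Box changes
    // nothing; `*boxed` moves the Vec out of the Box and consumes it.
    let boxed: Box<Vec<i32>> = Box::new(numbers.clone());
    let strings: Vec<String> = (*boxed).into_iter().map(|n| n.to_string()).collect();
    // Using `boxed` after this line would fail with "use of moved value".

    (sum, numbers, strings)
}
```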

You have a couple options:

  • Use iter(&self) instead, but then you need to worry about reference lifetimes
  • Clone the thing before you call into_iter(self)
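For the by-reference route, here is a minimal sketch with a made-up WordReader (not the xml-rs API). The iterator borrows the reader instead of consuming it, and once the borrow ends the reader is usable again. A parser has to advance its state, so the method takes &mut self here (an iter_mut-style method in the naming above). One caveat: the borrowing iterator can't be stored in the same struct as the reader it borrows, since that would be a self-referential struct, which safe Rust doesn't support directly.

```rust
// Hypothetical pull reader (not xml-rs): splits text into words and
// remembers how many it has handed out.
struct WordReader {
    words: Vec<String>,
    pos: usize,
}

impl WordReader {
    fn new(text: &str) -> Self {
        WordReader {
            words: text.split_whitespace().map(String::from).collect(),
            pos: 0,
        }
    }

    fn offset(&self) -> usize {
        self.pos
    }

    // Borrowing iterator: holds &mut self for the lifetime of the returned
    // WordEvents instead of consuming the reader like into_iter(self).
    fn events(&mut self) -> WordEvents<'_> {
        WordEvents { reader: self }
    }
}

struct WordEvents<'a> {
    reader: &'a mut WordReader,
}

impl WordEvents<'_> {
    // The reader is still reachable through the borrow, so its offset can be exposed.
    fn offset(&self) -> usize {
        self.reader.offset()
    }
}

impl Iterator for WordEvents<'_> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let word = self.reader.words.get(self.reader.pos)?.clone();
        self.reader.pos += 1;
        Some(word)
    }
}
```

Usage: call reader.events() wherever you need to iterate, and reader.offset() once the iterator has been dropped.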

I didn't make it clear, but both EventReader and its into_iter(self) are in the xml-rs crate. And EventReader doesn't implement Clone either.

If this were my own code, sure, I'd have done one of these.

In that case your options are one (or more) of:

  • Modify the xml-rs crate to provide the feature you want
  • Parse the data twice so you can have the 2 separate handles you want
  • Use a different XML crate (if possible)

I think in this case the easiest thing is to just parse it twice, which is probably not as painful as it sounds unless we're talking gigabytes of XML or something.
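A rough sketch of the parse-twice idea, using line-oriented text instead of XML so it runs with only the standard library. check_twice, open and is_valid are hypothetical names: open stands in for whatever reopens the (plain/gzipped/bzipped) file, and is_valid for real parsing. The first pass streams with no bookkeeping; only on failure is the source reopened and scanned again to locate the problem.

```rust
use std::io::{BufRead, BufReader, Read};

// Two-pass check over a re-openable source. `open` and `is_valid` are
// hypothetical stand-ins for reopening the file and for real XML/OSM checks.
fn check_twice<F>(open: F, is_valid: fn(&str) -> bool) -> Result<(), usize>
where
    F: Fn() -> Box<dyn Read>,
{
    // Pass 1: plain streaming, no position bookkeeping.
    let all_ok = BufReader::new(open())
        .lines()
        .all(|line| line.map(|l| is_valid(l.as_str())).unwrap_or(false));
    if all_ok {
        return Ok(());
    }

    // Pass 2: reopen from the start and find the first bad line (1-based).
    // Returns Err(0) if the second pass unexpectedly finds nothing wrong.
    let line_no = BufReader::new(open())
        .lines()
        .position(|line| !line.map(|l| is_valid(l.as_str())).unwrap_or(false))
        .map_or(0, |idx| idx + 1);
    Err(line_no)
}
```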

I'd submit a PR to implement Position for Events. In the meantime, you could write your own iterator modeled on Events. Or you could use into_inner to get the EventReader back when you need the position.
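A sketch of "write your own iterator", assuming a stand-in PullReader instead of the real xml-rs EventReader so it compiles without the crate. OsmEvents owns the reader, stops after EndDocument or the first error (the same shape as the Events code quoted in the question), and additionally forwards position(). With xml-rs itself, the reader field would presumably be an EventReader<R> and position() would come from its Position implementation; treat that mapping as an assumption to check against the crate docs.

```rust
// Hypothetical stand-ins for xml-rs types (not the real API): next()
// returns Result directly (no Option) and the reader tracks its position.
#[derive(Debug, PartialEq)]
enum Event {
    Item(String),
    EndDocument,
}

struct PullReader {
    tokens: Vec<String>,
    pos: usize,
}

impl PullReader {
    fn new(text: &str) -> Self {
        PullReader {
            tokens: text.split_whitespace().map(String::from).collect(),
            pos: 0,
        }
    }

    fn position(&self) -> usize {
        self.pos
    }

    // A "!" token plays the role of malformed input.
    fn next(&mut self) -> Result<Event, String> {
        match self.tokens.get(self.pos) {
            None => Ok(Event::EndDocument),
            Some(t) if t == "!" => Err(format!("bad token at {}", self.pos)),
            Some(t) => {
                let ev = Event::Item(t.clone());
                self.pos += 1;
                Ok(ev)
            }
        }
    }
}

// Our own iterator: owns the reader, so one struct field is enough,
// and it can expose position() as well as iterate.
struct OsmEvents {
    reader: PullReader,
    finished: bool,
}

impl OsmEvents {
    fn new(reader: PullReader) -> Self {
        OsmEvents { reader, finished: false }
    }

    fn position(&self) -> usize {
        self.reader.position()
    }
}

impl Iterator for OsmEvents {
    type Item = Result<Event, String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.finished {
            return None;
        }
        let ev = self.reader.next();
        // Stop after the end of the document or the first error.
        if matches!(ev, Ok(Event::EndDocument) | Err(_)) {
            self.finished = true;
        }
        Some(ev)
    }
}
```

Because OsmEvents owns the reader, it can go into OsmReader as a single field, which sidesteps the er/ev split.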

As said, the insurmountable issue is that you can't have er and ev at the same time. BUT I have often found that most things that consume their initial structure have an into_inner() function that gets you back to the starting point; Cursor, for example. I was confounded by the same sort of problem until I saw the intended pattern.

let f = Foo::new();
let mut f_cursor = f.into_iter();
// use f_cursor, e.g. f_cursor.next()
let f = f_cursor.into_inner();
// now operate on the outer object again

In general, this style of state transition is pretty robust, if frustrating at times. Make sure you get everything you need from the outer object (file size, number of lines, or what have you) before transitioning into the iterable/cursor state.
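Here is that round trip as a runnable sketch, with a hypothetical TokenReader/TokenEvents pair shaped like EventReader/Events (these are not the xml-rs types). scan_until consumes some tokens through the iterator, then turns it back into the reader to ask where it stopped.

```rust
// Hypothetical reader/iterator pair: into_iter() consumes the reader,
// into_inner() gives it back.
struct TokenReader {
    tokens: Vec<String>,
    pos: usize,
}

impl TokenReader {
    fn new(text: &str) -> Self {
        TokenReader {
            tokens: text.split_whitespace().map(String::from).collect(),
            pos: 0,
        }
    }

    fn position(&self) -> usize {
        self.pos
    }
}

struct TokenEvents {
    reader: TokenReader,
}

impl TokenEvents {
    // Consumes the iterator and hands back the reader, position intact.
    fn into_inner(self) -> TokenReader {
        self.reader
    }
}

impl IntoIterator for TokenReader {
    type Item = String;
    type IntoIter = TokenEvents;

    fn into_iter(self) -> TokenEvents {
        TokenEvents { reader: self }
    }
}

impl Iterator for TokenEvents {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let tok = self.reader.tokens.get(self.reader.pos)?.clone();
        self.reader.pos += 1;
        Some(tok)
    }
}

// Outer object -> iterator -> outer object again.
fn scan_until(reader: TokenReader, stop: &str) -> (Vec<String>, usize) {
    let mut events = reader.into_iter();
    let mut seen = Vec::new();
    // by_ref() lets the loop borrow the iterator instead of consuming it.
    for tok in events.by_ref() {
        if tok == stop {
            break;
        }
        seen.push(tok);
    }
    let reader = events.into_inner();
    (seen, reader.position())
}
```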

Yes, it does have into_inner(), but that in turn consumes the iterator structure. Still, I can probably live with this, because I only need position() for a post-mortem when my program finds the XML invalid and unprocessable. (Though it will still be impossible to report a position for warnings, if I ever need them.)

    /// Note that this operation is destructive; unwrapping the reader and wrapping it
    /// again with `EventReader::new()` will create a fresh reader which will attempt
    /// to parse an XML document from the beginning.
    pub fn into_inner(self) -> R {
        self.source
    }
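std::io::Cursor shows the same destructive behavior in the standard library: into_inner() returns the underlying buffer, the cursor's position is dropped, and wrapping the buffer again starts from offset 0. rewrap_demo is just an illustrative name.

```rust
use std::io::{Cursor, Read};

// Unwrapping a Cursor keeps the data but loses the position; a new
// Cursor over the same buffer starts reading from the beginning again.
fn rewrap_demo() -> std::io::Result<(u64, u64, Vec<u8>)> {
    let mut cursor = Cursor::new(b"hello world".to_vec());
    let mut first = [0u8; 5];
    cursor.read_exact(&mut first)?;
    let pos_before = cursor.position(); // 5 bytes consumed

    let buffer: Vec<u8> = cursor.into_inner(); // position is discarded here
    let mut fresh = Cursor::new(buffer);
    let pos_after = fresh.position(); // back to 0

    let mut again = [0u8; 5];
    fresh.read_exact(&mut again)?; // reads "hello" a second time
    Ok((pos_before, pos_after, again.to_vec()))
}
```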

Apparently the crate has been unmaintained for a year now, and our forum fellow kornel has offered to help. My respect, and good luck with the initiative!

Still, I'd prefer to stick with xml-rs, because rewriting everything for quick-xml looks scary.

(Existing issue for what I suggested.)