I'm working with the zip crate to extract files from ZIP archives. The library provides a way to get each file in the archive by an index, with the signature
Note that ZipFile takes an explicit lifetime, which is tied to the lifetime of the ZipArchive it came from. That makes sense.
I'm trying to wrap this API into an iterator that implements Iterator<Item=Result<ZipFile, ...> so that I can compose it with all the regular suspects (map(), filter(), etc.). Unfortunately, I can't seem to appease the borrow checker.
struct ZipIter<'a> {
z: &'a mut ZipArchive<BufReader<File>>,
idx: usize
}
impl<'a> Iterator for ZipIter<'a> {
// Fallible is Result<T, failure::Error>, but that doesn't seem relevant here...
type Item = Fallible<ZipFile<'a>>;
fn next(&mut self) -> Option<Self::Item> {
if self.idx < self.z.len() {
let ret = self.z.by_index(self.idx);
self.idx += 1;
Some(ret.map_err(|e| Error::from(e)))
}
else {
None
}
}
}
gives (on nightly, with edition = "2018")
error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements
--> src/zip_mod.rs:51:30
|
51 | let ret = self.z.by_index(self.idx);
| ^^^^^^^^
|
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the method body at 48:5...
--> src/zip_mod.rs:48:5
|
48 | / fn next(&mut self) -> Option<Self::Item> {
49 | | if self.idx < self.z.len() {
50 | | // TODO: Filter out directories?
51 | | let ret = self.z.by_index(self.idx);
... |
57 | | }
58 | | }
| |_____^
note: ...so that reference does not outlive borrowed content
--> src/zip_mod.rs:51:23
|
51 | let ret = self.z.by_index(self.idx);
| ^^^^^^
note: but, the lifetime must be valid for the lifetime 'a as defined on the impl at 45:6...
--> src/zip_mod.rs:45:6
|
45 | impl<'a> Iterator for ZipIter<'a> {
| ^^
= note: ...so that the types are compatible:
expected std::iter::Iterator
found std::iter::Iterator
error: aborting due to previous error
From what I can make of the error, I need to add more explicit lifetimes, since my iterator type needn't have the exact same lifetime as the underlying ZipArchive. But even after reading the relevant Rust Book chapter, I'm having trouble, and decorating &mut self with a different lifetime only makes the errors multiply.
Let’s pause for a moment and look at the following code:
let mut archive = ...; // create ZipArchive
let first = archive.by_index(0).unwrap();
let second = archive.by_index(1).unwrap();
// use `first` and `second`
The compiler wouldn’t allow this code because archive remains mutably borrowed while first is alive. The question then is why does ZipArchive require a mutable borrow of self for what otherwise looks like a readonly interface? I don’t know without looking at its code in detail . It’s possible, for example, that it’s reusing some internal buffer/state to materialize a ZipFile and this buffer/state can only support a single zipfile at a time. If that’s the case, then calling by_index the second time in the above snippet, if allowed, would invalidate first's state.
Now, if you look at the snippet above, it’s essentially what your iterator would look like:
let mut iter = ZipIter::new(&mut archive);
let first = iter.next().unwrap();
let second = iter.next().unwrap();
So the question is why does by_index take &mut self? If it’s actually doing something like the buffering mentioned (aka streaming iterator), then it’s rightfully written. If it’s not relying on that and it’s otherwise safe to have code like the above, then you could implement next() with unsafe code inside that bypasses the compiler-enforced borrow checking. But then you’re on your own in terms of ensuring things are sound.
Poking around, I believe it's because manipulating the returned ZipFile causes the actual seeks and reads on the underlying Read, which require mutability.
Thanks for breaking that down! Two quick follow-up questions:
I'm trying to get a sense for when it's a good idea to use interior mutability. Might a RefCell<ZipArchive> be appropriate here if I was confident that only one iterator item was used at a time? (I probably won't go this route, but it's helpful to think about.)
It seems like errors become much less clear as soon as explicit lifetimes get involved. Even though mutability was the actual problem, the ones here don't mention it at all! Is there something I can do to help the compiler give more helpful errors, or is parsing these just something that will have to come with more experience?
This is generally a problem I've noticed as well. The issue detected by borrowck is that it cannot guarantee the mutable borrow is unique over the anonymous lifetime #1. But the error message doesn't actually mention mutability anywhere. This is probably one of those cases where the error message is generic to lifetime constraints, and has nothing to do with mutability. It would be very nice if borrowck had a little bit more knowledge that it could provide, though.
Yeah. The bigger issue, so to speak, is it reuses the underlying reader to read the ZipFile content.
RefCell would be very awkward, if not impossible, in this particular case. You’d need to use RefCell::borrow_mut() which returns a RefMut smart pointer. But then you need to call by_index through it, but that gives you a value back that’s of the “wrong” lifetime. It’s a mess
Moreover, even if it could work, it’s a footgun because if you accidentally retain multiple references (easy to do if you plan on using iterator combinators), it would panic.
I’d just punt on the iterator impl and use normal streaming iteration.
Lifetimes are ok in and of themselves, it’s mutable references that make things more complicated in Rust. A lot of code slinging immutable refs around “just works” for most people, but as soon as mutable ones enter the picture, things get more dicey due to their move-only behavior (or reborrow) and affect on variance of other lifetime’d types.
I don’t think there’s anything you can do to get better error messages in these cases. NLL comes with better error messages (from what I’ve seen) for some cases, and the compiler folks are always looking at improving them still. But I think a large part will be just getting used to them - at some point you’ll “just know” what they mean