How would I expand some items from an iterator?

I'm just learning Rust (coming from a Python background) and I want to try to learn to write idiomatic Rust, and not just translate approaches from other languages.

At the moment, I'm working on a function to "glob-expand" an argument list, according to the rules:

  • Literal strings get passed through unchanged
  • Patterns are glob-expanded and the list of matched files passed through.

In Python, I'd write this as a generator:

def expand(args):
    for arg in args:
        if is_literal(arg):
            yield arg
        else:
            for path in glob(arg):
                yield path

I've written a Rust version that uses an explicit vector:

fn main() {
    let mut result: Vec<PathBuf> = Vec::new();
    for arg in env::args().skip(1) {
        if is_literal(&arg) {
            result.push(arg.into());
        } else {
            for p in glob(&arg).unwrap() {
                result.push(p.unwrap());
            }
        }
    }
    println!("{:?}", result);
}

but I feel as though I should be able to do something much cleaner - probably using flat_map(). But when I tried, I got into a mess trying to make one-element lists that could be re-flattened, and handling the possible errors. It felt clumsy and inefficient.

I have something working, but I don't want to leave it at a point where I'm feeling "this would have been so much cleaner in Python", as that means I've not really understood the right way of doing things in Rust. So could someone give me some pointers on what an idiomatic Rust solution would look like, and what ideas I need to understand properly to make this feel more natural? I already get the strong feeling that iterators are very much the right thing to use, but it's the details of how to use them effectively that I'm less sure of.

I think my biggest stumbling block was when I reached a point where something is too complex to write as a sub-expression. At that point in Python, I naturally reach for a generator function, but I don't know what the equivalent tool would be in Rust.

Thanks for any help!

You can just do the equivalent of what you'd do in Python: if the input is an iterator, then make the output an iterator, too! Although there's no yield sugar for it AFAICT, writing an iterator is pretty simple in Rust, too:

struct FlattenGlob<T> {
    inner: T,
    glob: Vec<String>,
}

impl<T: Iterator<Item=String>> Iterator for FlattenGlob<T> {
    type Item = String;

    fn next(&mut self) -> Option<Self::Item> {
        if !self.glob.is_empty() {
            return self.glob.pop();
        }
        let arg = self.inner.next()?;

        if is_literal(&arg) {
            return Some(arg);
        } else {
            self.glob = glob(&arg).unwrap()
                .into_iter()
                .map(Result::unwrap)
                .collect();
            self.next() // recurse for correct behavior if glob is empty
        }
    }
}

This is a quick sketch, so it still collects the elements of the glob into a Vec. I was assuming that's what your glob() function returns, although I'm not sure. If it returns an iterator, you can adapt the code accordingly; it won't be any more complicated. Furthermore, if the order of items after expansion matters, you could store a std::vec::IntoIter instead of the Vec to preserve it. (Here I was lazy and used pop() for demonstration purposes, which yields each glob's matches in reverse order.)
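To make the order-preserving variant concrete, here is a minimal sketch that stores a std::vec::IntoIter instead of popping from a Vec. Note that is_literal and fake_glob below are hypothetical stand-ins invented for the example (the real glob() is not used), so treat this as an illustration of the shape rather than a drop-in implementation:

```rust
use std::vec;

// Hypothetical stand-in: treat anything without a `*` as a literal.
fn is_literal(arg: &str) -> bool {
    !arg.contains('*')
}

// Hypothetical stand-in for a glob call: returns two made-up matches, in order.
fn fake_glob(pattern: &str) -> Vec<String> {
    let stem = pattern.trim_end_matches('*');
    vec![format!("{}1", stem), format!("{}2", stem)]
}

struct FlattenGlob<T> {
    inner: T,
    // Matches of the most recent pattern, yielded front to back.
    pending: vec::IntoIter<String>,
}

impl<T: Iterator<Item = String>> FlattenGlob<T> {
    fn new(inner: T) -> Self {
        FlattenGlob { inner, pending: Vec::new().into_iter() }
    }
}

impl<T: Iterator<Item = String>> Iterator for FlattenGlob<T> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        loop {
            // Drain the current expansion first, in its original order.
            if let Some(path) = self.pending.next() {
                return Some(path);
            }
            let arg = self.inner.next()?;
            if is_literal(&arg) {
                return Some(arg);
            }
            // Start a new expansion; loop again in case it is empty.
            self.pending = fake_glob(&arg).into_iter();
        }
    }
}
```

The loop replaces the recursive self.next() call; it behaves the same way when an expansion turns out to be empty.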

You shouldn’t feel too bad about this actually, as this is currently a weak point for Rust. There is a proposal for generator support, but it is very much a work in progress:

https://doc.rust-lang.org/beta/unstable-book/language-features/generators.html

So patterns that fit really well with generators, like your example, probably will be more verbose until this feature is implemented.
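In the meantime, std::iter::from_fn on stable Rust can get reasonably close to a small generator: the closure's captured variables play the role of the generator's local state. A rough sketch, again with hypothetical is_literal and fake_glob stand-ins rather than the real functions from the question:

```rust
use std::iter;
use std::vec;

// Hypothetical stand-in: treat anything without a `*` as a literal.
fn is_literal(arg: &str) -> bool {
    !arg.contains('*')
}

// Hypothetical stand-in for a glob call: returns two made-up matches.
fn fake_glob(pattern: &str) -> Vec<String> {
    let stem = pattern.trim_end_matches('*');
    vec![format!("{}1", stem), format!("{}2", stem)]
}

fn expand(mut args: impl Iterator<Item = String>) -> impl Iterator<Item = String> {
    // State captured by the closure: matches of the pattern being expanded.
    let mut pending: vec::IntoIter<String> = Vec::new().into_iter();
    iter::from_fn(move || loop {
        if let Some(path) = pending.next() {
            return Some(path);
        }
        // `?` ends the iteration once the arguments run out.
        let arg = args.next()?;
        if is_literal(&arg) {
            return Some(arg);
        }
        pending = fake_glob(&arg).into_iter();
    })
}
```

It is still more manual than yield, since the position inside the inner loop has to be kept in an explicit variable.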


Thanks, that's really helpful and I'll study it. I don't really have a good feel yet for writing my own iterator, so this is a great example. My apologies for not saying so, but the glob function I was using comes from the (standard?) glob crate, which returns an iterator. My main question was actually about how I could avoid building a Vec of the results (on the assumption that, as with Python, it's more idiomatic to use iterators to reduce memory usage and avoid the overheads of potentially large temporary data structures).

Thanks. I had seen that, so I was aware that if I needed to write my own iterator, it would be a little more verbose than Python, and I don't have a problem with that. What I was probably more interested in is how to use existing iterators and iterator methods more effectively.

My almost-working version was something like this (from memory, so probably wrong!)


env::args()
    .skip(1)    /* Drop the program name */
    .flat_map(
        |arg| {
            if is_literal(arg) {
                iter::once(arg)
            } else {
                glob(arg)  /* This is where I get stuck - see below */
            }
    })

That looks OK to me (first time round I hadn't been aware of iter::once - that helps a lot here!) but the nested glob call needs work, because of error handling. It returns a Result which can error (invalid pattern) or return an iterator of Results, which contain either a PathBuf or an IO error. So I need to deal with an outer error if glob fails, and an inner error if one of the glob results is a failure. And ideally I'd like to do that cleanly - without just letting the code panic. I don't really know how to do that.

Also, and this is where I was interested about what is idiomatic, is a chained iterator like this, with a multi-line closure in the middle, idiomatic Rust? Or is it going to look like it was written by a kid playing with a shiny new toy (which, to be fair, is what I feel like - Rust's iterators are really cool :slightly_smiling_face:)?

Here's how you can do it with flat_map. One thing I'm making use of is the Either type from itertools, which is needed because flat_map requires you to return a specific concrete type of iterator, but you want to return two different kinds. Either simply allows combining them into one type.

I have also added conversions of the two kinds of errors the glob crate returns into a single type with the into() and map_err() calls.

fn foo() -> impl Iterator<Item = Result<PathBuf, Error>> {
    env::args()
        .skip(1)
        .flat_map(
            |arg| {
                if is_literal(&arg) {
                    Either::Left(iter::once(Ok(PathBuf::from(arg))))
                } else {
                    match glob(&arg) {
                        Err(err) => Either::Left(iter::once(
                            Err(err.into()))),
                        Ok(glob) => Either::Right(
                            glob.map(|res| res.map_err(|err| err.into()))
                        ),
                    }
                }
            }
        )
}
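If it helps to see why Either solves the "two different iterator types" problem, here is a hand-rolled miniature of the idea. This is only an illustrative stand-in, not the itertools implementation, and the odd/even expand function is a made-up example:

```rust
use std::iter;

// Minimal stand-in for an Either type, for illustration only.
enum Either<L, R> {
    Left(L),
    Right(R),
}

// If both sides are iterators over the same item type,
// the enum as a whole can be an iterator too.
impl<L, R> Iterator for Either<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Either::Left(left) => left.next(),
            Either::Right(right) => right.next(),
        }
    }
}

// Hypothetical example: odd numbers pass through, even numbers expand to 0..n.
fn expand(values: Vec<u32>) -> impl Iterator<Item = u32> {
    values.into_iter().flat_map(|n| {
        if n % 2 == 1 {
            Either::Left(iter::once(n))
        } else {
            Either::Right(0..n)
        }
    })
}
```

Both branches of the closure now have the single type Either<Once<u32>, Range<u32>>, which is what flat_map needs.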

playground


Perfectly idiomatic. Nothing wrong with that.

I didn't remember glob handling in std and a quick search on doc.rust-lang.org/std reveals no glob function. I'm assuming we're talking about this crate then?

That wasn't really clear from your original question. In this case you need to come up with an error type that can accommodate both kinds of errors. Box<dyn Error> is such a type (a lazy choice, but it works). Like this (playground):

struct FlattenGlob<T> {
    inner: T,
    paths: Paths,
}

impl<T: Iterator<Item=String>> Iterator for FlattenGlob<T> {
    type Item = Result<PathBuf, Box<dyn Error>>;

    fn next(&mut self) -> Option<Self::Item> {
        if let Some(result) = self.paths.next() {
            return Some(result.map_err(|error| Box::new(error) as _));
        }

        let arg = self.inner.next()?;

        if is_literal(&arg) {
            return Some(Ok(arg.into()));
        } else {
            match glob(&arg) {
                Err(error) => return Some(Err(Box::new(error))),
                Ok(paths) => {
                    self.paths = paths;
                    self.next()
                }
            }
        }
    }
}

By the way, I don't really see the need for manually checking for a literal (by which I assume you mean non-glob) path. If you don't specify any wildcard characters, glob() will return a single item with the same path you gave it.

Yes. Sorry for the confusion, I find it hard to be clear what's "standard" and what's not. I assume anything under std:: is the standard library, but beyond that some crates (like glob) seem to be maintained by the Rust maintainers, while others are 3rd-party. Most things I find come from a Google search leading me to docs.rs, and getting from there to the project homepage is quite difficult for someone new to the ecosystem (I found it now: there's a link to the crate, from there to crates.io, and from there to the repository).

Just to be clear, that's not a complaint, I'm still finding my way around, but it might be a useful perspective for someone looking at discoverability.

No, it wasn't. I'm still fumbling round trying to understand what precisely my problem is. Thanks for your patience in replying. Yes, it does seem like creating a custom error type that unifies the various error types that can be produced is the best option. Do people routinely do that in Rust (looking at various crates, all of which have their own result type, I guess the answer is "yes")? From my Python background, where everything is more dynamic, there's not as much need, so that's a good insight for me into how strict typing affects things.

I thought from my experiments, that didn't happen if the file didn't exist - glob("foo.txt") returns an empty iterator if foo.txt doesn't exist. I'll re-check. But it's crucial to me (for tedious reasons that aren't particularly relevant here - I can explain if you're interested, though) that non-wildcard values that don't name an existing file are passed through unchanged.

And regardless, I'm learning more from trying to incorporate both possibilities, so it's a win even so :slightly_smiling_face:

Yes, and crates like thiserror even codify the pattern. Of course this is not needed in Python because it is dynamically typed so you can just throw whatever exception you want to throw.

Ah, so that's the trick! I see, in this case manual checking is indeed needed.


Regarding the manual error enum, please see the playground in my previous post, which has an example of that.

As for whether people routinely do this: I would say yes for libraries, but I tend to use a box error in applications. The reason is that an enum lets your users more easily handle the errors your library produces, but this is not as important in an application where you can change the error type if needed.
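For readers who haven't seen the enum pattern written out, a minimal hand-written sketch might look like the following. BadPattern, UnreadablePath, check_pattern and read_entry are all hypothetical stand-ins made up for the example; they are not the glob crate's types or functions:

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-ins for two unrelated error types from a library.
#[derive(Debug, PartialEq)]
struct BadPattern(String);
#[derive(Debug, PartialEq)]
struct UnreadablePath(String);

// One enum that can hold either error.
#[derive(Debug, PartialEq)]
enum ExpandError {
    Pattern(BadPattern),
    Path(UnreadablePath),
}

impl fmt::Display for ExpandError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExpandError::Pattern(e) => write!(f, "invalid pattern: {}", e.0),
            ExpandError::Path(e) => write!(f, "unreadable path: {}", e.0),
        }
    }
}

impl Error for ExpandError {}

// These conversions are what let `?` turn each source error into ExpandError.
impl From<BadPattern> for ExpandError {
    fn from(e: BadPattern) -> Self {
        ExpandError::Pattern(e)
    }
}

impl From<UnreadablePath> for ExpandError {
    fn from(e: UnreadablePath) -> Self {
        ExpandError::Path(e)
    }
}

// Made-up fallible steps, each with its own error type.
fn check_pattern(p: &str) -> Result<&str, BadPattern> {
    if p.contains("***") { Err(BadPattern(p.to_string())) } else { Ok(p) }
}

fn read_entry(name: &str) -> Result<String, UnreadablePath> {
    if name.starts_with('!') { Err(UnreadablePath(name.to_string())) } else { Ok(name.to_string()) }
}

fn expand_one(pattern: &str) -> Result<String, ExpandError> {
    let checked = check_pattern(pattern)?; // BadPattern -> ExpandError
    let entry = read_entry(checked)?; // UnreadablePath -> ExpandError
    Ok(entry)
}
```

A caller can then match on ExpandError to treat the two failure kinds differently, which is the advantage over a boxed error.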


This is sort of a sidenote, but from a docs.rs page, you can find links to the "Homepage" (which is often the repository) or the repository by poking around in the drop-down menu available at the top of the page. (It took me a little while to realize that the entry packed full of github icons was a repo link, too.)

This is really interesting. Rather than simply using what you wrote, I'm trying to understand the ideas behind it. I've written a version of the code that returns a vector rather than an iterator, so that I can focus on the error handling. The following works:

fn gen() -> Result<Vec<PathBuf>, PatternError>  {
    let mut paths : Vec<PathBuf> = Vec::new();

    for arg in env::args().skip(1) {
        if is_literal(&arg) {
            paths.push(arg.into());
        } else {
            paths.extend(glob(&arg)?.map(|p| p.unwrap()))
        }
    }
    Ok(paths)
}

But I've still got that unwrap in there. I'd like to use the approach you used, using impl Trait (which I'd never heard of before - thanks for that! I suspect it's newer than the book I have been reading). But I can't get it right: I want to return a Result<Iterator, Error>, but that's not a trait, and yet if I try to just return it, the fact that Iterator and Error are traits rather than concrete types is messing things up.

At this point, I feel like I'm just changing the code arbitrarily, and seeing what errors the compiler throws out. I don't feel like I understand what's going on here at all :slightly_frowning_face: Is there anything I can read that would help me understand how to structure error propagation and error handling code?

To be honest, I can just move on at this point. I have two or three variations that work fine for my purposes, and I could just use one and write my application with that. But I'm having fun learning, and the help I've got here has been really useful, so I'm hoping to squeeze as much information out of the exercise as I can :slight_smile:

You can use impl Iterator here too, but the main challenge is that to return a Result wrapping the iterator, you need to know whether an error will occur before the iterator has actually been iterated.

In some cases, e.g. when using collect(), you can make use of the fact that you can collect an Iterator<Item = Result<A, B>> into a Result<Vec<A>, B>, but I don't know of an equivalent for extend. In this case I believe you would have to just use a for loop:

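// glob() itself returns a Result, so unwrap the outer layer with `?` first.
// The function's error type must be convertible from both error types.
for val in glob(&arg)? {
    paths.push(val?);
}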

Notice the question mark: it unwraps the result, and on error it returns that error from the enclosing function immediately instead of panicking.

Ultimately the flat_map example of mine still produces an Iterator<Item=Result<A, B>>, so I've just given the problem of figuring out if it failed to the caller. The caller can make use of the collect() magic, or otherwise match or question mark them in a loop.
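As a rough illustration of those two caller-side options, here is a small sketch. It uses string parsing as a stand-in source of Results (an assumption made to keep the example self-contained; it has nothing to do with glob itself):

```rust
use std::num::ParseIntError;

// Option 1: collect an iterator of Results into a Result of a Vec.
// Collection stops at the first Err, which becomes the return value.
fn all_or_first_error(inputs: &[&str]) -> Result<Vec<u32>, ParseIntError> {
    inputs.iter().map(|s| s.parse::<u32>()).collect()
}

// Option 2: handle each item in a loop and bail out with `?`.
fn sum_with_question_mark(inputs: &[&str]) -> Result<u32, ParseIntError> {
    let mut total = 0u32;
    for item in inputs.iter().map(|s| s.parse::<u32>()) {
        total += item?;
    }
    Ok(total)
}
```

Either way the iterator stays lazy, and the decision about what to do with an error is left to whoever consumes it.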
