Returning an iterator

Hello!

I'm very new to Rust, and am trying to make a small command line tool. I want to make my program either read in a space-separated list of strings as arguments from the command line, or read lines from stdin if no arguments were given. So my thought was to make a function that checks if the number of arguments is larger than 0, if it is, return at iterator over the arguments, if not return an iterator over lines from stdin.

However, when I'm trying to do this I run in to mismatched type errors. I have tried several different ways, but below is my latest attempt.

#[derive(Debug)]
struct Input<'a> {
    input_args: &'a Vec<String>
}

impl<'a> Input<'a> {
    fn from(input_arg: &'a Vec<String>) -> Input {
        Input { input_args: input_arg }
    }

    fn iterator(&self) -> impl Iterator<Item=&String> {
        if self.input_args.len() != 0 {
            self.input_args.iter()
        }
        stdin().lock().lines().into_iter()
            .map(|line| &line.expect("IO error"))
    }
}

This gives me this error:

error[E0308]: mismatched types
  --> src/main.rs:36:13
   |
36 |             self.input_arg.iter()
   |             ^^^^^^^^^^^^^^^^^^^^^- help: try adding a semicolon: `;`
   |             |
   |             expected (), found struct `std::slice::Iter`
   |
   = note: expected type `()`
              found type `std::slice::Iter<'_, std::string::String>`

I don't understand why it is expecting type (), when I am trying to return a type that implements Iterator<Item=&String>?

Can someone give me a hint to what I'm not getting?

() is expected because this if lacks the else branch. You can either change this if to early return:

        if self.input_args.len() != 0 {
            return self.input_args.iter()
        }

or put the rest of the function in the else branch.


There are also some other issues with the code. Here you try to borrow a temporary String (line):

        stdin().lock().lines().into_iter()
            .map(|line| &line.expect("IO error"))

this will result in "does not live long enough" error. The simplest solution would be to change Item=&String to Item=String (in general, if you return a reference, the true owner needs to already exist before calling the function).

Also, Rust won't allow you to return two different iterators in a -> impl Iterator function. You can work around that by returning Box<dyn Iterator> or using enum approach, eg. by using the either crate. Here's a thread describing a similar issue. And another one.

2 Likes

Thank you, I solved it with the either crate. Unfortunately I wasn't able to get it exactly the way I wanted to, I had to collect the stdin data in a Vec and return an iterator over that. As the stdin().lock()-method took a reference to the stdin struct and therefore the reference didn't live long enough when doing stdin().lock().lines().into_iter().map(|line| line.expect("IO error"))

You could have the Input create the locked stdin at construction time and keep it as a field on itself.

And depending on requirements that could be loosened to any type implementing BufRead.

There are two problems here:

  • You're trying to use a lazy iterator on something that you must lock to gain access to. IE. You would have to keep a lock to stdin until the caller has exhausted the iterator and dropped it.
  • You're also trying to send references to the strings instead of owned objects which causes even more lifetime issues. IE a similar problem to #1

Try something like this:

fn iterator(&self) -> Vec<String> {
    if self.input_args.len() != 0 {
        self.input_args.iter().collect::<Vec<String>>()
    }
    else {
        stdin().lock().lines().into_iter()
            .map(|line| line.expect("IO error"))
            .collect::<Vec<String>>()
    }
}

Where you get rid of the lazy iterator and the references and just calculate it on the function call. This would be somewhat slower if you don't use the entire contents of the Vec<String> but it wouldn't be too large.

I could only get it working when I made the constructor take the lock as a parameter, but this seems to work. Maybe a bit unfortunate that I have to lock stdin even if I'm using the command line arguments.

Edit: I realised I could take a reference to the Stdin and only lock it if I need to. That's probably better.

You’re trying to use a lazy iterator on something that you must lock to gain access to. IE. You would have to keep a lock to stdin until the caller has exhausted the iterator and dropped it.

Since my program is just iterating through all the inputs and calling a function on each of them, I figured it would be preferable to just iterate once and not collect and then iterate again. Is this still not a good idea?

If it were a matter of really optimizing it, then I'd rethink the system or implement an iterator to do the job, but otherwise I think that the collect calls is alright.