Is there a way to write an iterator that yields a struct with a &str (string slice) field?

I've been learning Rust for a few months, but I occasionally run into this puzzle:

// This fails
use std::marker::PhantomData;

fn main() {
    let s = S { s: std::str::from_utf8(&[65]).unwrap(), };
    println!("{:?}", s);
    let yield_s = YieldS { vec: (65..70).collect(),
                           pos: 0,
                           pht: PhantomData, };
    yield_s.map(|S { s }| println!("{}", s)).last();
}

#[derive(Debug, Clone)]
struct S<'a> {
    s: &'a str,
}

struct YieldS<'a> {
    vec: Vec<u8>,
    pos: usize,
    pht: PhantomData<&'a str>,
}

impl<'a> Iterator for YieldS<'a> {
    type Item = S<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos < self.vec.len() {
            self.pos += 1;
            Some(Self::from_bytes(&self.vec[self.pos - 1..self.pos]))
        } else {
            None
        }
    }
}

impl<'a> YieldS<'a> {
    pub fn from_bytes(b: &'a [u8]) -> S { S { s: std::str::from_utf8(b).unwrap(), } }
}

The code above fails with:

error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements
  --> src/main.rs:36:36
   |
36 |             Some(Self::from_bytes(&self.vec[self.pos - 1..self.pos]))
   |                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: first, the lifetime cannot outlive the anonymous lifetime defined on the method body at 33:13...
  --> src/main.rs:33:13
   |
33 |     fn next(&mut self) -> Option<Self::Item> {
   |             ^^^^^^^^^
note: ...so that reference does not outlive borrowed content
  --> src/main.rs:36:36
   |
36 |             Some(Self::from_bytes(&self.vec[self.pos - 1..self.pos]))
   |                                    ^^^^^^^^
note: but, the lifetime must be valid for the lifetime `'a` as defined on the impl at 30:6...
  --> src/main.rs:30:6
   |
30 | impl<'a> Iterator for YieldS<'a> {
   |      ^^
note: ...so that the types are compatible
  --> src/main.rs:36:18
   |
36 |             Some(Self::from_bytes(&self.vec[self.pos - 1..self.pos]))
   |                  ^^^^^^^^^^^^^^^^
   = note: expected `YieldS<'_>`
              found `YieldS<'a>`

For more information about this error, try `rustc --explain E0495`.
error: could not compile `rustdx` due to previous error

I know that using the owned type String solves the issue at the cost of a little overhead. But I'm not sure whether I'm missing something...

// This works
fn main() {
    let s = Sowned { s: std::str::from_utf8(&[65]).unwrap().into(), };
    println!("{:?}", s);

    let yield_sowned = YieldSowned { vec: (65..70).collect(),
                                     pos: 0, };
    let res = yield_sowned.enumerate()
                          .fold("".into(), |acc, (i, Sowned { s })| format!("{}\n{}: {}", acc, i, s));
    println!("{}", res);
}

#[derive(Debug, Clone)]
struct Sowned {
    s: String,
}

struct YieldSowned {
    vec: Vec<u8>,
    pos: usize,
}

impl Iterator for YieldSowned {
    type Item = Sowned;

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos < self.vec.len() {
            self.pos += 1;
            Some(Self::from_bytes(&self.vec[self.pos - 1..self.pos]))
        } else {
            None
        }
    }
}

impl YieldSowned {
    pub fn from_bytes(b: &[u8]) -> Sowned { Sowned { s: std::str::from_utf8(b).unwrap().into(), } }
}

The full code is on the playground.

I prefer to use &str whenever the UTF-8 bytes never need to be modified.
But now I'm frustrated that &str doesn't work as I expected.


BTW, .fold("".into(), |acc, (i, Sowned { s })| format!("{}\n{}: {}", acc, i, s)) seems to reallocate many times. Is this true?
If so, how can we trust Rust to perform efficiently? If the iterator yields thousands of times, does format! really reallocate thousands of times?

As far as I know, a String or Vec whose length varies is likely to reallocate many times, but what about one with a fixed length: does "a long str... maybe thousands of bytes".to_string() allocate only once?
Personally speaking, when can I use String fearlessly...
:frowning:
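On the fold question: format! always returns a brand-new String, so the closure above allocates at least once per item and copies the whole accumulator every time, which makes the total copying grow quadratically with the number of items. A common alternative is to append into a single String through the std::fmt::Write trait. This is a sketch that assumes the items are already &str values; the function names build_with_format and build_with_write are hypothetical.

```rust
use std::fmt::Write; // lets `write!` append to a String

// The approach from the question: `format!` allocates a fresh String on
// every step and copies the entire accumulator into it.
fn build_with_format(items: &[&str]) -> String {
    items
        .iter()
        .enumerate()
        .fold(String::new(), |acc, (i, s)| format!("{}\n{}: {}", acc, i, s))
}

// Alternative: keep one String and append to it in place. Its buffer grows
// geometrically, so it reallocates only occasionally, not once per item.
fn build_with_write(items: &[&str]) -> String {
    items.iter().enumerate().fold(String::new(), |mut acc, (i, s)| {
        // Writing into a String cannot fail, so `unwrap` never panics here.
        write!(acc, "\n{}: {}", i, s).unwrap();
        acc
    })
}

fn main() {
    let items = ["A", "B", "C", "D", "E"];
    // Both versions build the same text.
    assert_eq!(build_with_format(&items), build_with_write(&items));
    println!("{}", build_with_write(&items));
}
```

If the final size is roughly known in advance, starting from String::with_capacity avoids even the occasional reallocation. For a fixed-size literal, "...".to_string() copies the bytes into a buffer of exactly the right length, which is a single allocation.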

I'm not sure what you are trying to do. You can't hand out references of arbitrary lifetime (chosen by the caller) when they point to data owned by your YieldS struct. You could only hand out references tied to the lifetime of their owner, YieldS, but that's not possible with an Iterator, because the signature of that trait doesn't allow for a relation between the lifetime of self and the lifetime of Item.
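To make the point about the trait signature concrete: std's Iterator declares fn next(&mut self) -> Option<Self::Item>, and Item cannot mention the lifetime of that &mut self borrow. The sketch below uses a hypothetical LendingIterator trait (not part of std; it relies on generic associated types, available since Rust 1.65) to show what such a relation would look like. Because each item borrows the iterator itself, it cannot be used in a for loop or with adapters like map; you drive it with while let instead.

```rust
// Hypothetical trait, not in std: the item type is generic over the
// lifetime of the `&mut self` borrow taken by `next`.
trait LendingIterator {
    type Item<'b>
    where
        Self: 'b;

    fn next<'b>(&'b mut self) -> Option<Self::Item<'b>>;
}

// Owning version of `YieldS`, as in the question (minus PhantomData).
struct YieldS {
    vec: Vec<u8>,
    pos: usize,
}

impl LendingIterator for YieldS {
    type Item<'b> = &'b str where Self: 'b;

    fn next<'b>(&'b mut self) -> Option<Self::Item<'b>> {
        if self.pos < self.vec.len() {
            self.pos += 1;
            // The returned &str borrows from `self` for `'b`.
            Some(std::str::from_utf8(&self.vec[self.pos - 1..self.pos]).unwrap())
        } else {
            None
        }
    }
}

fn main() {
    let mut yield_s = YieldS { vec: (65..70).collect(), pos: 0 };
    // Each `s` borrows `yield_s` only until the end of the loop body.
    while let Some(s) = yield_s.next() {
        println!("{}", s);
    }
}
```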

What you could do is make a separate Iter type that borrows from the owning type, like this:

// This works

fn main() {
    let s = S { s: std::str::from_utf8(&[65]).unwrap(), };
    println!("{:?}", s);
    let yield_s = YieldS { vec: (65..70).collect() };
    yield_s.iter().map(|S { s }| println!("{}", s)).last();
}

#[derive(Debug, Clone)]
struct S<'a> {
    s: &'a str,
}

struct YieldS {
    vec: Vec<u8>,
}

struct Iter<'a> {
    yields: &'a YieldS,
    pos: usize,
}

impl<'a> Iterator for Iter<'a> {
    type Item = S<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos < self.yields.vec.len() {
            self.pos += 1;
            Some(S::from_bytes(&self.yields.vec[self.pos - 1..self.pos]))
        } else {
            None
        }
    }
}

impl YieldS {
    fn iter(&self) -> Iter<'_> {
        Iter { yields: self, pos: 0 }
    }
}

impl<'a> S<'a> {
    pub fn from_bytes(b: &'a [u8]) -> S<'a> {
        S { s: std::str::from_utf8(b).unwrap(), }
    }
}
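The same borrow-from-the-owner idea can also be expressed without a hand-written Iter struct, by returning an iterator built from std adapters. This is a sketch that assumes, as in the question, that each item is a single byte, so chunks(1) mirrors the vec[pos - 1..pos] slicing; like the original, it panics on invalid UTF-8.

```rust
struct S<'a> {
    s: &'a str,
}

struct YieldS {
    vec: Vec<u8>,
}

impl YieldS {
    // Borrows `self`; every yielded `S` may live as long as that borrow.
    fn iter(&self) -> impl Iterator<Item = S<'_>> + '_ {
        self.vec
            .chunks(1) // one-byte slices, like `vec[pos - 1..pos]`
            .map(|b| S { s: std::str::from_utf8(b).unwrap() })
    }
}

fn main() {
    let yield_s = YieldS { vec: (65..70).collect() };
    for S { s } in yield_s.iter() {
        println!("{}", s);
    }
}
```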

You know that the string won't change while someone's got a reference to it. All you have to do is convince Rust that it won't change, which you do by changing YieldS::vec from a Vec<u8> to a reference:

struct YieldS<'a> {
    vec: &'a [u8], // data can't change while this non-mut reference exists
    pos: usize,
    pht: PhantomData<&'a str>,
}

With that change, the example works as you originally wrote it: Rust Playground
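For reference, a complete version of the original example with that change applied might look like the sketch below. Besides the field type, main now has to create the bytes first and pass a reference to them. The PhantomData field is dropped here because 'a is now used by a real field (keeping it would also compile).

```rust
#[derive(Debug, Clone)]
struct S<'a> {
    s: &'a str,
}

// `vec` is borrowed, so the bytes are guaranteed to outlive the iterator.
struct YieldS<'a> {
    vec: &'a [u8],
    pos: usize,
}

impl<'a> Iterator for YieldS<'a> {
    type Item = S<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos < self.vec.len() {
            self.pos += 1;
            // Slicing through the `&'a [u8]` gives a `&'a [u8]`, not a slice
            // tied to the shorter `&mut self` borrow.
            Some(Self::from_bytes(&self.vec[self.pos - 1..self.pos]))
        } else {
            None
        }
    }
}

impl<'a> YieldS<'a> {
    pub fn from_bytes(b: &'a [u8]) -> S<'a> {
        S { s: std::str::from_utf8(b).unwrap() }
    }
}

fn main() {
    // Keep the owned bytes in a binding so they outlive `yield_s`.
    let bytes: Vec<u8> = (65..70).collect();
    let yield_s = YieldS { vec: &bytes, pos: 0 };
    for S { s } in yield_s {
        println!("{}", s);
    }
}
```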


Wow... You've enlightened me! Thanks :heart:

Many thanks :slight_smile:
This is straightforward.
But I have one more question about lifetimes;
see the playground:

fn main() {
    let yield_s = YieldS { vec: &(65..70).collect::<Vec<u8>>(),
                           pos: 0,
                           chk: [0; 10], };
    yield_s.map(|s| println!("{:?}", s)).last();
}

#[derive(Debug, Clone)]
struct S<'a> {
    s: &'a str,
    i: u8,
}

struct YieldS<'a> {
    vec: &'a [u8],
    pos: usize,
    chk: [u8; 10], // chk is related to vec, but let's skip the irrelevant details
}

impl<'a> Iterator for YieldS<'a> {
    type Item = S<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos < self.vec.len() {
            self.pos += 1;
            let a = self.to_s(); // right lifetime parameter
            // Some(a) // failed due to lifetime, if only return this line
            // Some(self.to_s()) // failed due to lifetime, if only return this line
            Some(Self::from_bytes(&self.vec[self.pos - 1..self.pos])) // pass
        } else {
            None
        }
    }
}

impl<'a> YieldS<'a> {
    pub fn from_bytes(b: &'a [u8]) -> S { // pass, if only call this fn
        S { s: std::str::from_utf8(b).unwrap(),
            i: 0, }
    }

    pub fn to_s(&'a self) -> S<'a> { // pass, if only call this fn
        S { s: std::str::from_utf8(&self.vec[self.pos - 1..self.pos]).unwrap(),
            i: outer_op(&self.chk), }
    }
}

// some common but complicated ops in my crate
// here is a simplified version
fn outer_op(chk: &[u8]) -> u8 { chk[0] }

Both failing cases produce:

error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements
  --> src/main.rs:28:23
   |
28 |             Some(self.to_s()) // failed, if only return this line
   |                       ^^^^
   |
note: first, the lifetime cannot outlive the anonymous lifetime defined on the method body at 23:13...
  --> src/main.rs:23:13
   |
23 |     fn next(&mut self) -> Option<Self::Item> {
   |             ^^^^^^^^^
note: ...so that reference does not outlive borrowed content
  --> src/main.rs:28:18
   |
28 |             Some(self.to_s()) // failed, if only return this line
   |                  ^^^^
note: but, the lifetime must be valid for the lifetime `'a` as defined on the impl at 20:6...
  --> src/main.rs:20:6
   |
20 | impl<'a> Iterator for YieldS<'a> {
   |      ^^
note: ...so that the types are compatible
  --> src/main.rs:23:46
   |
23 |       fn next(&mut self) -> Option<Self::Item> {
   |  ______________________________________________^
24 | |         if self.pos < self.vec.len() {
25 | |             self.pos += 1;
26 | |             let a = self.to_s(); // right lifetime parameter
...  |
33 | |         }
34 | |     }
   | |_____^
   = note: expected `Iterator`
              found `Iterator`

Well, I managed to figure it out, and the issue seems to be solved for now:

    pub fn to_s(&self) -> S<'a> {
        S { s: std::str::from_utf8(&self.vec[self.pos - 1..self.pos]).unwrap(),
            i: outer_op(&self.chk), }
    }

playground
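As a sketch, here is the fixed to_s slotted back into the second example, with next now calling it directly. outer_op is the simplified stand-in from the question.

```rust
#[derive(Debug, Clone)]
struct S<'a> {
    s: &'a str,
    i: u8,
}

struct YieldS<'a> {
    vec: &'a [u8],
    pos: usize,
    chk: [u8; 10],
}

impl<'a> Iterator for YieldS<'a> {
    type Item = S<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos < self.vec.len() {
            self.pos += 1;
            // Compiles now: `to_s` borrows `self` briefly but returns `S<'a>`.
            Some(self.to_s())
        } else {
            None
        }
    }
}

impl<'a> YieldS<'a> {
    // `&self` gets its own, shorter lifetime; only the output is tied to `'a`.
    pub fn to_s(&self) -> S<'a> {
        S {
            s: std::str::from_utf8(&self.vec[self.pos - 1..self.pos]).unwrap(),
            i: outer_op(&self.chk),
        }
    }
}

// Simplified stand-in for the crate's real operation, as in the question.
fn outer_op(chk: &[u8]) -> u8 {
    chk[0]
}

fn main() {
    let bytes: Vec<u8> = (65..70).collect();
    let yield_s = YieldS { vec: &bytes, pos: 0, chk: [0; 10] };
    yield_s.for_each(|s| println!("{:?}", s));
}
```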

Sometimes lifetimes in Rust feel so unobvious, just when I thought I had grasped them.

pub fn from_bytes(b: &'a [u8]) -> S
pub fn to_s(&self) -> S<'a>

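Well, that's exactly the point of lifetime annotations. fn to_s(&'a self) -> S<'a> means that the lifetime inside S will be bound to the lifetime of the YieldS instance: self must be borrowed for the whole of 'a. Inside next, however, self is only borrowed for the shorter, anonymous lifetime of &mut self, so that requirement cannot be met. Again, it's wrong for the very same reason I stated earlier.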

You don't want to tie the return value to the lifetime of the YieldS, because (now that you changed it from owning to borrowing) it might go out of scope well before the borrow of lifetime 'a ends. So you simply want to assert that no matter what the lifetime of self is, the returned reference has a lifetime of 'a, i.e. a lifetime which is the same as that of the original borrow. This is what fn to_s(&self) -> S<'a> means: self has no explicit lifetimes, so it gets an implicit, fresh lifetime variable, which is not the same as 'a.

This is the exact same issue that I mentioned above. In particular, fn to_s<'b>(&'b self) -> S<'b> means exactly the same thing as fn to_s(&self) -> S due to lifetime elision. Lifetime elision produces working and correct lifetime annotations most of the time; in the rare cases when it does not, the solution is not to turn the incorrect implicit annotations into explicit, hand-written ones, because that changes nothing about the meaning of the code. In particular, you can rarely, if ever, solve lifetime problems by annotating &'a self. That's usually a mistake.
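A minimal sketch of the difference, using a hypothetical Holder type: the elided signature ties the result to the borrow of the Holder, while the explicit &'a return type lets the result outlive the Holder itself.

```rust
// Hypothetical type for illustration: it only borrows its data.
struct Holder<'a> {
    data: &'a str,
}

impl<'a> Holder<'a> {
    // Elided: same as `fn tied<'s>(&'s self) -> &'s str`.
    // The result lives only as long as this borrow of the Holder.
    fn tied(&self) -> &str {
        self.data
    }

    // Explicit: the result borrows for `'a`, independent of how long
    // `self` is borrowed, so it can outlive the Holder itself.
    fn detached(&self) -> &'a str {
        self.data
    }
}

fn first_word(text: &str) -> &str {
    let holder = Holder { data: text };
    let _short = holder.tied(); // fine while `holder` is alive
    // Using `tied()` here instead would not compile: the result would
    // borrow the local `holder`, which is dropped when the function returns.
    holder.detached().split(' ').next().unwrap_or("")
}

fn main() {
    println!("{}", first_word("hello world"));
}
```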

