Stack of string iterators

I've been toying around with some simple macro expansions (as in strings, not language), and I encountered a situation where I end up with a lifetime issue that I don't know how to solve.

I want to do iterative/recursive macro expansions; each time I encounter a macro I expand it but I also keep parsing the expanded text for new macros.

I'm trying to use a stack (Vec<Chunk>) where each node keeps track of:

struct Chunk {
  text: String,          // The text segment
  it: std::string::Drain // Current offset in segment
}

The Drain iterator seems like an interesting fit for what I want to do. Each time I encounter a new macro, I push a new Chunk node onto the stack and process it. Once its drain iterator returns None, I pop the node and resume the previous one.

The problem is that std::string::Drain requires an explicit lifetime. I understand why, but how do I tell it "It's ok, the buffer you are referring to lives in the same struct"?

Edit: Turns out this is more complicated than I thought. See the Stack Overflow question "Why can't I store a value and a reference to that value in the same struct?"

Perhaps you could get away with something like this?

You can satisfy both requirements (owned text plus iteration) by storing a byte index into the text and implementing Iterator on Chunk itself, like this:

struct Chunk {
  text: String,
  i: usize,
}

impl std::iter::Iterator for Chunk {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        // Slice the unread text starting at byte offset `i`,
        // and record how many bytes remain before reading.
        let (rest_len, mut chs) = {
            let rest = &self.text[self.i..];
            (rest.len(), rest.chars())
        };
        // Extract a single character and
        // get the resulting remaining # bytes.
        let ch = chs.next();
        let remaining_len = chs.as_str().len();
        ch.map(|ch| {
            // Increment by # bytes traversed.
            self.i += rest_len - remaining_len;
            ch
        })
    }
}

This avoids any intermediate heap allocations, but it does perform a UTF-8 char-boundary check every time we slice with self.i. If that's a performance concern, you can use the unsafe get_unchecked instead, since self.i is only ever advanced by whole characters and therefore always lands on a char boundary.
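For completeness, here is a rough sketch of how the index-based Chunk could drive the stack described in the question. The `{name}` macro syntax, the `expand` function, and the `HashMap` macro table are assumptions made up for this example, since the question doesn't say what the macros look like. The `next` body is a shorter equivalent that advances by `char::len_utf8`.

```rust
use std::collections::HashMap;

/// One text segment plus the byte offset of the next unread char.
struct Chunk {
    text: String,
    i: usize,
}

impl Iterator for Chunk {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        let ch = self.text[self.i..].chars().next()?;
        // Advance by the UTF-8 width of the char we just read.
        self.i += ch.len_utf8();
        Some(ch)
    }
}

/// Expands `{name}` macros (hypothetical syntax for this sketch),
/// rescanning every expansion for further macros.
fn expand(input: &str, macros: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut stack = vec![Chunk { text: input.to_string(), i: 0 }];

    while let Some(top) = stack.last_mut() {
        match top.next() {
            // Chunk exhausted: drop it and resume the one below.
            None => {
                stack.pop();
            }
            // Start of a macro: read the name up to (and consuming) '}'.
            // Assumes the closing brace is in the same chunk.
            Some('{') => {
                let name: String = top.by_ref().take_while(|&c| c != '}').collect();
                match macros.get(name.as_str()) {
                    // Known macro: its body becomes the new top of the stack.
                    Some(body) => stack.push(Chunk { text: (*body).to_string(), i: 0 }),
                    // Unknown macro: emit it unchanged.
                    None => {
                        out.push('{');
                        out.push_str(&name);
                        out.push('}');
                    }
                }
            }
            // Ordinary character: copy it through.
            Some(c) => out.push(c),
        }
    }
    out
}
```

Because each Chunk owns its String and only stores an offset, there is no borrow tying the struct to itself, so the Vec<Chunk> can grow and shrink freely. Note that this sketch has no guard against a macro that expands to itself, so it would loop forever on such input.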
