Parsing various ranges from &str and creating Iterators

Hello 🙃,
I'm currently writing a PDF command-line tool, and for it I'd like to allow range patterns such as

  • 3..15 for pages from 3 to 14 (as well as 3.., ..15, ..)
  • 3.2.15 for pages from 3 to 14 with step 2 (as well as 3.2., .2.15, .2.)
    and furthermore
  • 3.2.15*3 for pages from 3 to 14 with step 2 and an amount of 3 for each of those pages (as well as all the other variants).

This would let me write stupdf example.pdf addpage 3.2.15*3, meaning

add 3 blank pages at each index of (3..15).step_by(2) in the PDF example.pdf.

Now the problem I'm having is that I don't know exactly how to implement the parsing and the resulting Iterator.

use std::ops::{Range, RangeFrom, RangeTo, RangeFull};

enum Ranges {
    One(u32),
    RNormal(Range<u32>),
    RFrom(RangeFrom<u32>),
    RTo(RangeTo<u32>),
    RFull(RangeFull),
}

struct PageIter {
    range: Ranges,
    step: u32,
    amount: u32,
}

impl PageIter {
    fn iter(&self) -> impl Iterator<Item = (u32, u32)> { //yields an error since the types are not the same
        match self.range {
            Ranges::One(i) => std::iter::once((i, self.amount)),
            Ranges::RNormal(r) => r.step_by(self.step).map(|i| (i, self.amount)),
            Ranges::RFrom(r) => r.step_by(self.step).map(|i| (i, self.amount)),
            Ranges::RTo(r) => r.step_by(self.step).map(|i| (i, self.amount)),
            Ranges::RFull(r) => r.step_by(self.step).map(|i| (i, self.amount)),
        }
    }
}

impl std::str::FromStr for PageIter {
    type Err = Error; //TODO change Error Type

    /// Parses a &str of the form 'a.b.c*n', which stands for '(a..c).step_by(b)' with amount 'n'. Every value can be omitted, and so can the '*n' suffix.
    fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
        match &s.split('*').collect::<Vec<&str>>()[..] {
            [rangestr, amountstr] => {
                let amount = match amountstr.parse::<u32>() {
                    Ok(i) if i > 0 => i,
                    _ => return Err(Error::Parse {offset: 0}),
                };
                let (range, step) = if let Some((p, s)) = parse_range_and_step(rangestr) { (p, s) } else { return Err(Error::Parse {offset: 0}) };
                Ok(PageIter{ range, step, amount })
            },
            [rangestr] => {
                let (range, step) = if let Some((p, s)) = parse_range_and_step(rangestr) { (p, s) } else { return Err(Error::Parse {offset: 0}) };
                Ok(PageIter{ range, step, amount: 1 })
            }
            _ => Err(Error::Parse {offset: 0})
        }
    }
}

fn parse_range_and_step(s: &str) -> Option<(Ranges, u32)> {
    match &s.split('.').collect::<Vec<&str>>()[..] {
        ["", stepstr, ""] => Some((
            Ranges::RFull(..),
            if stepstr == &"" { 1 } else if let Ok(i) = stepstr.parse::<u32>() { i } else { return None }
        )),
        [begstr, stepstr, ""] => {
            let beg = if let Ok(i) = begstr.parse::<u32>() { i } else { return None };
            Some((
                Ranges::RFrom(beg..),
                if stepstr == &"" { 1 } else if let Ok(i) = stepstr.parse::<u32>() { i } else { return None }
            ))
        },
        ["", stepstr, endstr] => {
            let end = if let Ok(i) = endstr.parse::<u32>() { i } else { return None };
            Some((
                Ranges::RTo(..end),
                if stepstr == &"" { 1 } else if let Ok(i) = stepstr.parse::<u32>() { i } else { return None }
            ))
        },
        [begstr, stepstr, endstr] => {
            let beg = if let Ok(i) = begstr.parse::<u32>() { i } else { return None };
            let end = if let Ok(i) = endstr.parse::<u32>() { i } else { return None };
            Some((
                Ranges::RNormal(beg..end),
                if stepstr == &"" { 1 } else if let Ok(i) = stepstr.parse::<u32>() { i } else { return None }
            ))
        },
        [numstr] => Some((Ranges::One(if let Ok(i) = numstr.parse::<u32>() { i } else { return None }), 1)),
        _ => return None
    }
}

I hope to reduce and simplify the parsing part of the code. I don't know whether the enum Ranges is necessary, and I'm not sure how to build the Iterator yielding (u32, u32).

Any help would really be appreciated!
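
For reference, one common way to make a method like iter compile when the match arms produce different iterator types is to box them as trait objects. The following is only a minimal sketch under a few assumptions that are not in the code above: the step is stored as usize (which is what step_by expects), the enum keeps only the ranges that have a start (RangeTo and RangeFull are not iterators), and the variant names are shortened for illustration.

```rust
use std::ops::{Range, RangeFrom};

// Assumption: only ranges with a known start are kept, because
// `RangeTo` and `RangeFull` do not implement `Iterator`.
enum Ranges {
    One(u32),
    Normal(Range<u32>),
    From(RangeFrom<u32>),
}

struct PageIter {
    range: Ranges,
    step: usize, // `step_by` takes a usize and panics if it is 0
    amount: u32,
}

impl PageIter {
    // Boxing erases the differing concrete iterator types of the match arms.
    fn iter(&self) -> Box<dyn Iterator<Item = (u32, u32)>> {
        let amount = self.amount;
        match &self.range {
            Ranges::One(i) => Box::new(std::iter::once((*i, amount))),
            Ranges::Normal(r) => {
                Box::new(r.clone().step_by(self.step).map(move |i| (i, amount)))
            }
            Ranges::From(r) => {
                Box::new(r.clone().step_by(self.step).map(move |i| (i, amount)))
            }
        }
    }
}

fn main() {
    let pages = PageIter { range: Ranges::Normal(3..15), step: 2, amount: 3 };
    for (page, amount) in pages.iter() {
        println!("page {page}: add {amount} blank pages");
    }
}
```

The box costs one allocation per call to iter, which should not matter for a handful of command-line arguments. An enum of concrete iterator types avoids the allocation at the price of more code.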

Okay, I have simplified everything a little and solved some problems. Any tips on how to improve the code would be great. The new (and working) code is

struct PageIter {
    range: RangeIter,
    amount: u32,
}

enum RangeIter {
    Once(   std::iter::Once<u32>),
    RNormal(std::iter::StepBy<std::ops::Range<u32>>),
    RFrom(  std::iter::StepBy<std::ops::RangeFrom<u32>>),
}

impl Iterator for PageIter {
    type Item = (u32, u32);

    fn next(&mut self) -> Option<Self::Item> {
        match &mut self.range {
            RangeIter::Once(r) => Some((r.next()?, self.amount)),
            RangeIter::RNormal(r) => Some((r.next()?, self.amount)),
            RangeIter::RFrom(r) => Some((r.next()?, self.amount)),
        }
    }
}

impl std::str::FromStr for PageIter {
    type Err = Error; //TODO change Error Type

    /// Parses a &str of the form 'a.b.c*n', which stands for '(a..c).step_by(b)' with amount 'n'. Every value can be omitted, and so can the '*n' suffix.
    fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
        match &s.split('*').collect::<Vec<&str>>()[..] {
            [rangestr, amountstr] => Ok(PageIter{ 
                range:  if let Some(r) = parse_range_and_step(rangestr) { r } else { return Err(Error::Parse {offset: 0}) },
                amount: match amountstr.parse::<u32>() {
                    Ok(i) if i > 0 => i,
                    _ => return Err(Error::Parse {offset: 0}),
                }}
            ),
            [rangestr] => Ok(PageIter{
                range:  if let Some(r) = parse_range_and_step(rangestr) { r } else { return Err(Error::Parse {offset: 0}) },
                amount: 1 }
            ),
            _ => Err(Error::Parse {offset: 0})
        }
    }
}

fn parse_range_and_step(s: &str) -> Option<RangeIter> {
    match &s.split('.').collect::<Vec<&str>>()[..] {
        ["", stepstr, ""] => Some(
            RangeIter::RFrom(
                (0u32..).step_by(
                    if stepstr == &"" { 1usize } else if let Ok(i) = stepstr.parse::<usize>() { i } else { return None }
                )
            )
        ),
        [begstr, stepstr, ""] => Some(
            RangeIter::RFrom(
                ((if let Ok(i) = begstr.parse::<u32>() { i } else { return None })..).step_by(
                    if stepstr == &"" { 1usize } else if let Ok(i) = stepstr.parse::<usize>() { i } else { return None }
                )
            )
        ),
        ["", stepstr, endstr] => Some(
            RangeIter::RNormal(
                (0u32..(if let Ok(i) = endstr.parse::<u32>() { i } else { return None })).step_by(
                    if stepstr == &"" { 1usize } else if let Ok(i) = stepstr.parse::<usize>() { i } else { return None }
                )
            )
        ),
        [begstr, stepstr, endstr] => Some(
            RangeIter::RNormal(
                ((if let Ok(i) = begstr.parse::<u32>() { i } else { return None })..(if let Ok(i) = endstr.parse::<u32>() { i } else { return None }))
                    .step_by(
                        if stepstr == &"" { 1usize } else if let Ok(i) = stepstr.parse::<usize>() { i } else { return None }
                    )
            )
        ),
        [numstr] => Some(RangeIter::Once(std::iter::once(
            if let Ok(i) = numstr.parse::<u32>() { i } else { return None }
        ))),
        _ => return None
    }
}

Seems like the perfect candidate for using a regex (playground).

Thank you very much for your help; your code feels a lot more rusty :) Also, the inline function is a neat way to simplify the code.

Looking at it, I have two more questions:

  • Shouldn't the lazy static be outside the function, so that it is not built every time?
  • Doesn't the regex have a little too much overhead for such a simple pattern? Or is it actually less than collecting and matching?

These questions are more general and not specific to this program, since parsing a few command-line arguments doesn't need much optimisation, whereas readability is a lot more desirable.

I don't understand that concern. A lazy_static is initialized once, upon the first access. That is the whole point.
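
To illustrate the point, here is a minimal sketch that uses std::sync::OnceLock from the standard library as a stand-in for lazy_static (an assumption made so the example needs no external crate). The function separators and the counter INIT_CALLS are hypothetical names used only for this demonstration. A static declared inside a function body is still a single global value, so its initializer runs on the first call only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how often the (pretend-expensive) initializer actually runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for compiling a regex: builds a separator list.
fn separators() -> &'static [char] {
    // The static is scoped to the function, but it is one global value;
    // the closure passed to `get_or_init` runs only on the first call.
    static SEPARATORS: OnceLock<Vec<char>> = OnceLock::new();
    SEPARATORS.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        vec!['.', '*']
    })
}

fn main() {
    for _ in 0..3 {
        let _ = separators();
    }
    println!("initializer ran {} time(s)", INIT_CALLS.load(Ordering::SeqCst));
}
```

Placing the static inside the function only restricts where its name is visible; it does not change how often it is initialized.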

I have no idea. You could measure it though. But I doubt that the parsing of a 10-20 character pattern, however slow the regex might be, would make any difference when you set out to process dozens of PDF pages.


If the regex API is used correctly, then parsing the regex and constructing its corresponding pattern matcher is only ever done once, so I'm also not sure what overhead you're worried about. Regexes tend to pay for themselves whenever you want to do a large number of matches against the same non-trivial pattern. It's certainly very unlikely to have noticeably more overhead than all of the non-regex code you just wrote.

Plus, I wouldn't necessarily call this a "simple" pattern. It looks simple when written as a regex (which shows why regex is a good DSL for pattern matching), but look how much code you had to write to achieve its behavior. A truly trivial regex would be something like "abc" or "a|b" or "a*", and for patterns like that there often is a str method that gets the same job done.

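
As a rough illustration of that last remark, the sketch below shows standard-library str methods that cover such trivial patterns. The helper split_amount is a hypothetical name introduced only for this example; it mirrors the optional '*n' suffix from the question.

```rust
// Hypothetical helper: splits "range*amount" at the first '*',
// defaulting the amount to "1" when no '*' is present.
fn split_amount(arg: &str) -> (&str, &str) {
    arg.split_once('*').unwrap_or((arg, "1"))
}

fn main() {
    // A literal pattern such as "abc" needs only `contains`.
    println!("{}", "xxabcxx".contains("abc"));

    // An alternation of single characters such as "a|b" can be a closure pattern.
    println!("{}", "cab".contains(|c: char| c == 'a' || c == 'b'));

    // A leading run such as "a*" can be stripped with `trim_start_matches`.
    println!("{}", "aaab".trim_start_matches('a'));

    // A single separator is handled by `split_once`.
    println!("{:?}", split_amount("3.2.15*3"));
}
```

Once several optional parts have to be combined, as in 'a.b.c*n', these methods start to pile up, which is where a regex tends to be more readable.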

Okay, thank you both for your answers; they really helped me see how to handle such situations.

I was just asking about regex because I once wrote a LaTeX tool that parsed with regexes, but I always thought of it as just a simple but slow solution.
