A bit of help with this parser?

So I'm trying to port the parse_inputs function from RISC-V's parse_opcodes script to Rust so I can use it in a build system without involving Python (and learn something along the way). I'm running into a weird problem where the hi and lo variables are never read, but the Python code does something that I know Rust wouldn't allow, so I've had to compensate for that. I'm also (somehow) shifting far enough to overflow, which also shouldn't be happening. Here's the code where this falls apart (the Python original comes first):

      for token in tokens[1:]:
        if len(token.split('=')) == 2:
          tokens = token.split('=')
          if len(tokens[0].split('..')) == 2:
            tmp = tokens[0].split('..')
            hi = int(tmp[0])
            lo = int(tmp[1])
            if hi <= lo:
              sys.exit("%s: bad range %d..%d" % (name,hi,lo))
          else:
            hi = lo = int(tokens[0])

          if tokens[1] != 'ignore':
            val = int(tokens[1], 0)
            if val >= (1 << (hi-lo+1)):
              sys.exit("%s: bad value %d for range %d..%d" % (name,val,hi,lo))
            mymatch = mymatch | (val << lo)
            mymask = mymask | ((1<<(hi+1))-(1<<lo))

          if cover & ((1<<(hi+1))-(1<<lo)):
            sys.exit("%s: overspecified" % name)
          cover = cover | ((1<<(hi+1))-(1<<lo))

        elif token in arglut:
          if cover & ((1<<(arglut[token][0]+1))-(1<<arglut[token][1])):
            sys.exit("%s: overspecified" % name)
          cover = cover | ((1<<(arglut[token][0]+1))-(1<<arglut[token][1]))
          arguments[name].append(token)

        else:
          sys.exit("%s: unknown token %s" % (name,token))

      if not (cover == 0xFFFFFFFF or cover == 0xFFFF):
        sys.exit("%s: not all bits are covered" % name)

      if pseudo:
        pseudos[name] = 1
      else:
        for name2,match2 in match.items():
          if name2 not in pseudos and (match2 & mymask) == mymatch:
              sys.exit("%s and %s overlap" % (name,name2))

And here's my best attempt at an idiomatic Rust conversion:

            for token in tokens[1..].iter() {
                let (mut hi, mut lo) = (0u32, 0u32);
                if token.split('=').count() == 2 {
                    let tokens = token.split('=').collect::<Vec<_>>();
                    if tokens[0].split("..").count() == 2 {
                        let tmp = tokens[0].split("..").collect::<Vec<_>>();
                        hi = tmp[0].parse::<u32>().unwrap();
                        lo = tmp[1].parse::<u32>().unwrap();
                        if hi <= lo {
                            eprintln!("{}: bad range {}..{}", name, hi, lo);
                            exit(1)
                        }
                    } else {
                        hi = tokens[0].parse::<u32>().unwrap();
                        lo = tokens[0].parse::<u32>().unwrap();
                    }
                    if tokens[1] != "ignore" {
                        let val = tokens[1].parse::<u32>().unwrap();
                        if val >= (1 << (hi - lo + 1)) {
                            eprintln!("{}: bad value {} for range {}..{}", name, val, hi, lo);
                            exit(1)
                        }
                        my_match |= val << lo;
                        my_mask |= (1 << (hi + 1)) - (1 << lo);
                    }
                    if cover & ((1 << (hi + 1)) - (1 << lo)) > 0 {
                        eprintln!("{}: overspecified", name);
                        exit(1)
                    }
                    cover |= (1 << (hi + 1)) - (1 << lo);
                } else if arglut.contains_key(token) {
                    if cover & ((1 << (arglut[token].0 + 1)) - (1 << arglut[token].1)) > 0 {
                        eprintln!("{}: overspecified", name);
                        exit(1)
                    }
                    cover |= (1 << (arglut[token].0 + 1)) - (1 << arglut[token].1);
                    arguments
                        .entry(token.to_string())
                        .and_modify(|v| v.push(token.to_string()));
                } else {
                    eprintln!("{}: unknown token {}", name, token);
                    exit(1)
                }
            }
            if !(cover == 0xFFFFFFFF || cover == 0xFFFF) {
                eprintln!("{}: not all bits are covered", name);
                exit(1)
            }

I'm not really sure what I'm doing wrong, and I could probably do this a lot better with something like a parser combinator or a PEG (but I'm not sure how I would do that at the moment, and I'm horrible with parser combinators). What am I missing? I've tried moving the hi and lo variables to the top of the loop because they should be getting read, but that doesn't appear to change anything, and Clippy doesn't raise any red flags.
Edit: fixed spelling.
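A parser combinator may be more than this needs: each token is either `hi..lo=value`, `bit=value`, or a bare argument name. Below is a minimal sketch of a lighter alternative built on `str::split_once`, which avoids the `split(..).count()` followed by `collect` pattern. The `Field` enum and `parse_token` function are hypothetical names for illustration, and converting the value string to a number is left to the caller:

```rust
/// One token after the instruction name (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum Field<'a> {
    /// `hi..lo=value` or `bit=value`; `value` is None for `ignore`.
    Range { hi: u32, lo: u32, value: Option<&'a str> },
    /// A bare name such as `rd`, to be looked up in arglut by the caller.
    Arg(&'a str),
}

fn parse_token(token: &str) -> Result<Field<'_>, String> {
    // No '=' at all: treat the token as an argument name.
    let Some((bits, value)) = token.split_once('=') else {
        return Ok(Field::Arg(token));
    };
    // Python sends tokens with more than one '=' to the arglut lookup,
    // where they end up reported as unknown tokens.
    if value.contains('=') {
        return Err(format!("unknown token {token}"));
    }
    let num = |s: &str| s.parse::<u32>().map_err(|e| format!("{token}: {e}"));
    let (hi, lo) = match bits.split_once("..") {
        Some((h, l)) => {
            let (hi, lo) = (num(h)?, num(l)?);
            if hi <= lo {
                return Err(format!("bad range {hi}..{lo}"));
            }
            (hi, lo)
        }
        None => {
            let bit = num(bits)?;
            (bit, bit)
        }
    };
    // `ignore` means "covered, but no fixed value".
    let value = (value != "ignore").then_some(value);
    Ok(Field::Range { hi, lo, value })
}

fn main() {
    for token in ["31..25=0x20", "14..12=ignore", "7=1", "rd", "3..5=1"] {
        println!("{token} -> {:?}", parse_token(token));
    }
}
```

Returning a `Result` instead of calling `exit` keeps the parsing testable; the caller can still print the message and exit.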

If you just mean this:

warning: value assigned to `hi` is never read
 --> src/lib.rs:8:18
  |
8 |         let (mut hi, mut lo) = (0u32, 0u32);
  |                  ^^
  |
  = note: `#[warn(unused_assignments)]` on by default
  = help: maybe it is overwritten before being read?

Then, as the help says, it's because the compiler can tell that you never read the initial values before overwriting them:

  • You only read them in the first if branch
  • In that branch, there's another if/else, and you set the values without reading them in both branches (before any reads)
  • You also never modify them after that

So you can

-        let (mut hi, mut lo) = (0u32, 0u32);
         if token.split('=').count() == 2 {
+            let (hi, lo);
             let tokens = token.split('=').collect::<Vec<_>>();

to get rid of that warning.
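The `let (hi, lo);` in that diff is deferred initialization: Rust accepts a `let` without a value as long as every path assigns each variable before it's read, and it checks this at compile time. A standalone sketch of the same pattern (`range_of` is a hypothetical helper, not from the original code):

```rust
/// Splits "hi..lo" or a single "bit" into (hi, lo) (hypothetical helper).
fn range_of(bits: &str) -> Option<(u32, u32)> {
    // No initializer, so there are no dummy values that get overwritten
    // unread; the compiler checks that every path assigns both before use.
    let (hi, lo);
    if let Some((h, l)) = bits.split_once("..") {
        hi = h.parse::<u32>().ok()?;
        lo = l.parse::<u32>().ok()?;
    } else {
        hi = bits.parse::<u32>().ok()?;
        lo = hi;
    }
    Some((hi, lo))
}

fn main() {
    println!("{:?} {:?}", range_of("31..25"), range_of("7"));
}
```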

It's hard to say what's going on with your shifts without knowing more about the values involved. It wouldn't surprise me if Python and Rust differ in default behavior. Rust's is sort of odd IMO: if you try to << a u32 by 34, for example, you'll shift by 2 (34 % 32) in release mode (and panic in debug mode). Maybe you need lhs.checked_shl(rhs).unwrap_or(0), or whatever reproduces Python's behavior.
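Concretely, the mask expression `(1 << (hi + 1))` needs a shift by 32 whenever `hi` is 31, which is exactly the out-of-range case for a `u32`. A small sketch comparing the standard `checked_shl` and `wrapping_shl` methods with the same shift done in `u64` (`shift_variants` is a hypothetical name):

```rust
/// Computes 1 << (hi + 1) three ways (illustrative helper).
fn shift_variants(hi: u32) -> (Option<u32>, u32, u64) {
    (
        // None when the shift amount is >= 32 instead of panicking.
        1u32.checked_shl(hi + 1),
        // Amount masked to (hi + 1) % 32; in practice this is what a plain
        // `<<` does in a release build without overflow checks.
        1u32.wrapping_shl(hi + 1),
        // Widened to u64: fine for any hi < 63.
        1u64 << (hi + 1),
    )
}

fn main() {
    for hi in [30, 31] {
        println!("hi = {hi}: {:?}", shift_variants(hi));
    }
}
```

With `hi = 31`, the checked version reports `None`, the wrapping version silently produces 1, and only the `u64` shift gives the value Python would compute.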

In Python 3, int has arbitrary precision, so you can left-shift it by any amount until the process dies from running out of memory.
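Rust's standard library has no arbitrary-precision integer type, but the masks in this parser never need more than 33 bits, so doing the arithmetic in `u64` is enough. A minimal sketch of the recurring `(1<<(hi+1))-(1<<lo)` expression, assuming bit indices below 63; `bit_range_mask` is a hypothetical name:

```rust
/// Mask with bits lo..=hi set, i.e. Python's (1 << (hi + 1)) - (1 << lo).
/// Computed in u64 so hi == 31 doesn't overflow; assumes lo <= hi < 63.
fn bit_range_mask(hi: u32, lo: u32) -> u64 {
    (1u64 << (hi + 1)) - (1u64 << lo)
}

fn main() {
    let mask = bit_range_mask(31, 25);
    // Narrow back to 32 bits once the arithmetic is done; try_from fails
    // instead of silently truncating if a bit above 31 were set.
    let mask32 = u32::try_from(mask).expect("mask fits in 32 bits");
    println!("{mask32:#010x}");
}
```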

Figured it out: I had to widen the integers to 64 or 128 bits to get it to work. It works now, thanks for the help.
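A minimal sketch of the whole loop with the arithmetic widened to `u64`, along the lines described above. It also changes two things that look like mismatches with the Python: `int(tokens[1], 0)` accepts prefixes such as `0x`, which `str::parse` rejects, and Python appends to `arguments[name]`, whereas `.entry(token.to_string()).and_modify(..)` only touches an entry that already exists under the token's key. The `parse_fields` signature, the `arglut` map type, returning `Err` instead of calling `exit`, and the sample table entries are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Approximation of Python's int(s, 0): 0x/0o/0b prefixes or decimal.
/// Simplified: no underscores or '-', and leading zeros are accepted.
fn parse_int_auto(s: &str) -> Option<u64> {
    let lower = s.to_ascii_lowercase();
    let (digits, radix) = match lower.get(..2) {
        Some("0x") => (&lower[2..], 16),
        Some("0o") => (&lower[2..], 8),
        Some("0b") => (&lower[2..], 2),
        _ => (lower.as_str(), 10),
    };
    u64::from_str_radix(digits, radix).ok()
}

/// Bits lo..=hi set; computed in u64 so hi == 31 cannot overflow (assumes hi < 63).
fn bit_range_mask(hi: u32, lo: u32) -> u64 {
    (1u64 << (hi + 1)) - (1u64 << lo)
}

/// Marks `mask` as covered, failing if any of its bits already were.
fn claim(cover: &mut u64, mask: u64, name: &str) -> Result<(), String> {
    if *cover & mask != 0 {
        return Err(format!("{name}: overspecified"));
    }
    *cover |= mask;
    Ok(())
}

/// Hypothetical signature. `fields` excludes the leading instruction name
/// (Python's tokens[1:]). Returns (match, mask).
fn parse_fields(
    name: &str,
    fields: &[&str],
    arglut: &HashMap<String, (u32, u32)>,
    arguments: &mut HashMap<String, Vec<String>>,
) -> Result<(u64, u64), String> {
    let (mut my_match, mut my_mask, mut cover) = (0u64, 0u64, 0u64);
    for &token in fields {
        let num = |s: &str| s.parse::<u32>().map_err(|_| format!("{name}: bad token {token}"));
        if let Some((bits, value)) = token.split_once('=') {
            let (hi, lo) = match bits.split_once("..") {
                Some((h, l)) => {
                    let (hi, lo) = (num(h)?, num(l)?);
                    if hi <= lo {
                        return Err(format!("{name}: bad range {hi}..{lo}"));
                    }
                    (hi, lo)
                }
                None => {
                    let bit = num(bits)?;
                    (bit, bit)
                }
            };
            if value != "ignore" {
                let val = parse_int_auto(value)
                    .ok_or_else(|| format!("{name}: bad value {value}"))?;
                if val >= 1u64 << (hi - lo + 1) {
                    return Err(format!("{name}: bad value {val} for range {hi}..{lo}"));
                }
                my_match |= val << lo;
                my_mask |= bit_range_mask(hi, lo);
            }
            claim(&mut cover, bit_range_mask(hi, lo), name)?;
        } else if let Some(&(hi, lo)) = arglut.get(token) {
            claim(&mut cover, bit_range_mask(hi, lo), name)?;
            // Keyed by instruction name, like Python's arguments[name].append(token).
            arguments.entry(name.to_string()).or_default().push(token.to_string());
        } else {
            return Err(format!("{name}: unknown token {token}"));
        }
    }
    if !(cover == 0xFFFF_FFFF || cover == 0xFFFF) {
        return Err(format!("{name}: not all bits are covered"));
    }
    Ok((my_match, my_mask))
}

fn main() {
    // Illustrative arglut entries (field bit positions for rd, rs1, imm12).
    let arglut: HashMap<String, (u32, u32)> =
        [("rd", (11, 7)), ("rs1", (19, 15)), ("imm12", (31, 20))]
            .into_iter()
            .map(|(k, v)| (k.to_string(), v))
            .collect();
    let mut arguments = HashMap::new();
    let fields = ["rd", "rs1", "imm12", "14..12=0", "6..2=0x04", "1..0=3"];
    let result = parse_fields("addi", &fields, &arglut, &mut arguments);
    println!("{result:x?} {arguments:?}");
}
```

Keeping everything in `u64` until the end means only the final match and mask need narrowing, if the rest of the build wants `u32`.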
