File Parsing With Enums, Abuse?

I've been learning Rust by building a simple file parser. I'm not using ASTs because I think it's a simple enough syntax to just build the final result while parsing.

There are a few different types of "blocks" which I've modeled with an enum. I have several structs to represent the data because they're not all exactly the same.

  1. Given the below example code, is this a bad use of enum?
  2. Is there a more elegant way?
  3. Does all the pattern matching and if let statements get optimized out for enum variants, being more performant than it looks?
enum Block {
  TypeA(BlockData),
  TypeB(BlockData),
  TypeC(SlightlyDifferentBlockData)
}

struct BlockData {
  key: String,
  settings: Vec<String>
}

struct SlightlyDifferentBlockData {
  key: String,
  settings: Vec<String>,
  modifier: usize
}

This has worked okay so far. However, as the program's gotten more complex I notice the enum usage is causing deeply nested conditionals and pattern matching just to get the value.

For example, to compare blocks from two files I'm doing something like:

let blocks1: Vec<Block> = parse("./path/to/file1");
let blocks2: Vec<Block> = parse("./path/to/file2");

for block_a in blocks1 {
  if let Block::TypeA(a) = block_a {
    for block_b in blocks2 {
      if let Block::TypeA(b) = block_b {
        // finally compare inner data for `a` and `b`
        if a.key = b.key {
          // ...
        }
      }
    }
  }
}

Additionally, to set a property on the inner value:

impl Block {
  fn set_settings(&mut self, input_settings: Vec<String>) {
    match self {
      Block::TypeA(BlockData {settings, ..})
      | Block::TypeB(BlockData {settings, ..})
      | Block::TypeC(BlockData {settings, ..})
      // ... keep going for all variants that have `settings`
       => {*settings = input_settings;}
     _ => ()
    };
  }
  fn set_key(&mut self, value: String) {
    // pretty much repeat the previous function, but for `key`
  }
}

This looks a little messy, I think you're right to question it. I'm afraid I don't know the best alternative.

It depends on why TypeA and TypeB are needed when their data inside is the same. If it's only to help parse two different representations of the same conceptual thing, then something feels off.
But if they are conceptually different (and you would later rely on being able to tell them apart) then I think you're approach is good.

My guess is that if let won't hurt performance.

The alternative to enums is probably something using trait objects, and that would probably slow it down a little bit.

To clean it up, you could try to use iterators. I think you could get somewhere with map and zip that is a bit cleaner. Oh hang on! Add methods on the Block enum to get key, setting and modifier (Option). I think that would help a lot too :slight_smile:

For the impls, I don't see any way that avoids a good long match. That's the way to go I think.

Turn the problem around 90 degrees:

struct Block {
  key: String,
  settings: Vec<String>,
  type: BlockType,
}

enum BlockType {
  TypeA,
  TypeB,
  TypeC { modifier: usize },
  // Or you can use a separate struct:
  TypeD(DataForTypeD),
  // ...
}

struct DataForTypeD {
  // ...
}
3 Likes

Sometimes it might, like in your set_settings, but the only real way to know for sure is to compile it and see. @jdahlstrom's suggestion looks promising. You can also reduce nesting by making methods to abstract away your matching and e.g. fn modifier(&self) -> Option<usize> or the like.

For your nested loop you could do something like

let blocks1: Vec<Block> = parse("./path/to/file1");
let blocks2: Vec<Block> = parse("./path/to/file2");

// This could be a method on `Block` 
let filter = |block| if let Block::TypeA(a) = block { Some(a) } else { None };

for block1 in blocks1.iter().filter_map(filter) {
  for block2 in blocks2.iter().filter_map(filter).filter(|b| block1.key == b.key) {
    // ...
  }
}

IterTools also has a cartesian_product function.

Thanks for all the responses!

@drmason13

These variants happen to reuse the same struct, but they may be treated differently when parsing depending on the outer type.

e.g. A ':' char may mean something different to TypeA than it does for TypeB

I thought about this, but it seems like traits are more meant for behavior. I wish I could share state the same way, without writing a bunch of "get/set" methods for each field.

Would be cool to instead of this
"loop through all instances with some trait"
do this
"loop through all instances with some field"

Sorry, off topic..

Thanks for the suggestion, I'll look into this option as well.

What do you mean?

@jdahlstrom
I thought about that, but your example brings more clarity. Sometimes each inner struct might have additional fields, but I still want to loop through them and assign to the common fields they share. With your example though it looks like that would be accounted for.

@quinedot
Thanks for the example on abstracting the nesting, that does look cleaner. I feel like it'd be a bit slower, but you're right, I really need to learn Assembly to compare.

I followed the link and read what it does, and the example... Still I'm lost on what this would do with that nested loop :sweat_smile:

These do pretty much the same thing:

for x in iter1 {
    for y in iter2.clone() {
        let x = x.clone();
        do_stuff(x, y);
    }
}
for (x, y) in iter1.cartesian_product(iter2) {
    do_stuff(x, y);
}

Applied to my earlier comment, it would go something like:

// n.b. both cloning one of these iterators and cloning the references
// which they produce are negligibly cheap
let iter1 = blocks1.iter().filter_map(filter);
let iter2 = blocks2.iter().filter_map(filter);
let pairs = iter1.cartesian_product(iter2).filter(|&(b1, b2)| b1.key == b2.key);

for (block1, block2) in pairs {
    // ...
}

(Maybe modulo some reference manoeuvring, I didn't test run it.)

Wow, that looks pretty neat; Thanks for clarifying. These look like some good options along with the above suggestions. Much to think about.

I was thinking something like this: Rust Playground