Using yaml_rust to parse a simple file

#1

Hello,

I am new to Rust and new to this forum. As a first learning project I am writing a simple CFD solver in Rust. I decided to use YAML for the input file format, like this:

nodes:
 1: [1.0, 2.0, 3.0]
 2: [4.0, 3.0, 2.0]
 3: [5.0, 4.0, 3.0]
 4: [6.0, 3.0, 2.0]

I want to load this into a vector of structs like below:

[
    Point {
        id: 1,
        x: 1.0,
        y: 2.0,
        z: 3.0
    },
    Point {
        id: 2,
        x: 4.0,
        y: 3.0,
        z: 2.0
    },
    Point {
        id: 3,
        x: 5.0,
        y: 4.0,
        z: 3.0
    },
    Point {
        id: 4,
        x: 6.0,
        y: 3.0,
        z: 2.0
    }
]

After much time banging my head against the wall I came up with this working code:

use yaml_rust::{Yaml, YamlLoader};

#[derive(Debug)]
struct Point {
    id: i64,
    x: f64,
    y: f64,
    z: f64
}

fn parse_deck(doc: &Yaml) -> Vec<Point> {
    let mut res = Vec::new();
    match *doc {
        Yaml::Hash(ref h) => {
            for (k, v) in h {
                match k.as_str() {
                    Some("nodes") => match v {
                        Yaml::Hash(ref g) => {
                            for (nid, xyz) in g {
                                let id = nid.as_i64().unwrap();
                                let w = match xyz {
                                    Yaml::Array(ref v) => v,
                                    _ => panic!("malformed input file: coordinates should be a list [x, y, z]")
                                };
                                let t = w.iter().map(|x| x.as_f64().unwrap()).collect::<Vec<f64>>();
                                let p = Point {id:id, x:t[0], y:t[1], z:t[2]};
                                res.push(p);
                            }
                        },
                        _ => panic!("malformed input file: nodes should be in node-id: [x, y, z] format")
                    },
                    _ =>  panic!("malformed input file: unsupported section.")
                };
            }
        }
        _ => {
            panic!("malformed input file: top level must be in key: value format");
        }
    };
    res
}


fn main() {
    let s = "
nodes:
 1: [1.0, 2.0, 3.0]
 2: [4.0, 3.0, 2.0]
 3: [5.0, 4.0, 3.0]
 4: [6.0, 3.0, 2.0]
";
    let docs = YamlLoader::load_from_str(&s).unwrap();
    let doc = &docs[0];
    dbg!(parse_deck(&doc));

}

I am not at all happy with this code. There has got to be a more elegant way of doing this. Can you give me some pointers in the right direction?

EDIT: I streamlined my working example a bit to make it shorter and clearer.

1 Like

#2

If panicking is the way you want to treat errors, I think expect() is something you should look at. That would avoid having so many nested blocks. By the way, following your logic, you should also match the result of as_i64(), or use expect there too :)

0 Likes

#3

Thanks. Good idea with the expect. I do intend to use Result<T, Error> and do proper error handling in the final version.
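
For what it's worth, here is a minimal sketch of what the Result-based version could look like for a single node. It uses only the standard library and assumes the coordinates have already been read into a slice. The names DeckError, point_from_parts and points_from_parts are hypothetical and do not come from yaml_rust or from the code above.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
struct Point {
    id: i64,
    x: f64,
    y: f64,
    z: f64,
}

/// Hypothetical error type; the single variant is only illustrative.
#[derive(Debug, PartialEq)]
enum DeckError {
    /// A node did not have exactly three coordinates.
    WrongCoordinateCount { id: i64, found: usize },
}

impl fmt::Display for DeckError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DeckError::WrongCoordinateCount { id, found } => {
                write!(f, "node {}: expected 3 coordinates, found {}", id, found)
            }
        }
    }
}

impl std::error::Error for DeckError {}

/// Builds a `Point` from an id and already-parsed coordinates,
/// returning an error instead of panicking on bad input.
fn point_from_parts(id: i64, coords: &[f64]) -> Result<Point, DeckError> {
    match *coords {
        [x, y, z] => Ok(Point { id, x, y, z }),
        _ => Err(DeckError::WrongCoordinateCount { id, found: coords.len() }),
    }
}

/// Converts many nodes; collecting into `Result` stops at the first error.
fn points_from_parts(nodes: &[(i64, Vec<f64>)]) -> Result<Vec<Point>, DeckError> {
    nodes
        .iter()
        .map(|(id, coords)| point_from_parts(*id, coords))
        .collect()
}
```

The caller can then decide whether to report the error, skip the node, or abort, instead of the parser panicking on the first malformed line.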

0 Likes

#4

To deserialize a file, a good starting point is the serde library: https://serde.rs/
From the docs:
Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.

The Serde ecosystem consists of data structures that know how to serialize and deserialize themselves along with data formats that know how to serialize and deserialize other things. Serde provides the layer by which these two groups interact with each other, allowing any supported data structure to be serialized and deserialized using any supported data format.

So all your checks and error handling would be significantly shorter.

0 Likes

#5

I looked into using serde_yaml. I can see that it would be very straightforward if the input format mirrored the Point struct more closely, like so:

Point:
    id: 1
    x: 1.0
    y: 2.0
    z: 3.0
Point:
    id: 2
    x: 4.0
    y: 3.0
    z: 2.0
...

Would serde be beneficial even if I keep the input format as is?

0 Likes

#6

Yes, if you absolutely need to keep the format, it may be more complicated.
I see two approaches:

  1. When the file structure is free: I start by creating the Rust structs that fit my needs, then write a serialization function and look at how serde serializes them. It’s a bit of a lazy way, but it works fine for me.
  2. When the file structure is fixed: I would probably use an intermediate struct and convert it to the final struct using one of Rust’s conversion traits.
1 Like

#7

Super. Thanks for this advice. In my case I am much more committed to the struct than to the file format. So I am in case 2 for sure.

However, I wanted to follow through on my original idea first, though. Once the project is complete, the file format will be the user interface to the software, so it is important that it is user friendly. And there can be millions of nodes, so it is an advantage if the file format is relatively compact. But I digress.

0 Likes

#8

Hey again!

I followed the advice of @kurdy and switched to serde_yaml. I created a data structure adapted to my input format and then did a conversion from this format to my vector of points. The end result is much simpler and cleaner than before. Thank you very much!

use std::collections::HashMap;
use serde::{Deserialize};

#[derive(Debug)]
struct Point {
    id: i64,
    x: f64,
    y: f64,
    z: f64
}

#[derive(Debug,Deserialize)]
struct Inputdeck {
    nodes: HashMap<i64,(f64,f64,f64)>,
}

impl Inputdeck {
    fn to_points(&self) -> Vec<Point> {
        let mut res = Vec::new();
        for (k, v) in &self.nodes {
            let (x, y, z) = *v;
            let id = *k;
            let p = Point {id, x, y, z};
            res.push(p);

        }
        res.sort_by_key(|x| x.id);
        res
    }
} 

fn main() {
    let s = "
nodes:
 1: [1.0, 2.0, 3.0]
 2: [4.0, 3.0, 2.0]
 3: [5.0, 4.0, 3.0]
 4: [6.0, 3.0, 2.0]
";

    let deck: Inputdeck  = serde_yaml::from_str(&s).unwrap();
    dbg!(deck.to_points());

}
1 Like

#9

Two more pieces of advice. First, you don’t have to push each value into the vector: you can build it directly from the input map using iterators. Second, you’d probably want to consume Inputdeck when converting it into points, so that you won’t run into borrow checker issues later on.

I.e., I’d use the following:

impl Inputdeck {
    fn into_points(self) -> Vec<Point> {
        let mut res: Vec<_> = self.nodes
            .into_iter()
            .map(|(id, (x, y, z))| { Point { id, x, y, z } })
            .collect();
        res.sort_by_key(|x| x.id);
        res
    }
} 
1 Like

#10

Very nice with the map. I will use that. Also, now I finally understand the difference between .iter() and .into_iter().
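
To spell out the difference as I understand it, here is a small sketch using only the standard library. The function names sum_x_borrowed and ids_owned are made up for illustration.

```rust
use std::collections::HashMap;

/// Borrowing: `.iter()` yields `(&i64, &(f64, f64, f64))`, so the map is
/// still usable by the caller afterwards.
fn sum_x_borrowed(nodes: &HashMap<i64, (f64, f64, f64)>) -> f64 {
    nodes.iter().map(|(_id, (x, _y, _z))| *x).sum()
}

/// Consuming: `.into_iter()` on an owned map yields `(i64, (f64, f64, f64))`
/// and moves the map, so the caller cannot use it again.
fn ids_owned(nodes: HashMap<i64, (f64, f64, f64)>) -> Vec<i64> {
    let mut ids: Vec<i64> = nodes.into_iter().map(|(id, _xyz)| id).collect();
    // HashMap iteration order is unspecified, so sort for a stable result.
    ids.sort();
    ids
}
```

Calling ids_owned(nodes) and then touching nodes again is a compile error, which is exactly the "consume Inputdeck" behaviour suggested in the previous post.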

1 Like

#11

@davidosterberg Or you could have written your own deserialize function to deserialize directly into Point, without .to_points(). An example of custom deserialization for chrono time can be seen at https://serde.rs/custom-date-format.html

0 Likes

#12

The more idiomatic way in Rust would be to implement the From trait, e.g.

impl From<Inputdeck> for Vec<Point> {}

playground
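
The playground contents are not reproduced here. As a rough sketch of what a filled-in impl could look like, assuming the Point and Inputdeck structs from post #8 and the iterator chain from post #9 (the serde derive is left out so this compiles with the standard library alone):

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Point {
    id: i64,
    x: f64,
    y: f64,
    z: f64,
}

// In the thread this struct also derives serde's `Deserialize`; that derive
// is omitted here so the sketch has no external dependencies.
#[derive(Debug)]
struct Inputdeck {
    nodes: HashMap<i64, (f64, f64, f64)>,
}

impl From<Inputdeck> for Vec<Point> {
    fn from(deck: Inputdeck) -> Self {
        // Consume the map and turn each (id, (x, y, z)) entry into a Point.
        let mut points: Vec<Point> = deck
            .nodes
            .into_iter()
            .map(|(id, (x, y, z))| Point { id, x, y, z })
            .collect();
        // HashMap order is unspecified, so sort by id as in the earlier posts.
        points.sort_by_key(|p| p.id);
        points
    }
}
```

With this in place the conversion reads as let points: Vec<Point> = deck.into(); or Vec::<Point>::from(deck), and the deck is consumed in the process.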

1 Like

#13

Thanks, I missed this (I was trying to optimize the existing code, but yes, it’ll be clearer this way).

0 Likes