REview this small prog please

Dear All,
This is far fom pretty, even for my first bit of rustling. My little snippet reads some lines from a file and creates a collection of structs representing each record.

I know the logic is not pretty, but my approach was chosen to help me explore certain aspects of rust, not to create an efficient parser :slight_smile:

Please can you review the rust aspects of this? What could I improve and why?
In particular, I really don't like the nest of if/for/if/if starting with if let Ok(lines) ... somethng really feels wrong there.

use regex::Regex;
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;

fn main() {
    let path = Path::new("./Data/cameras-defb.csv");
    let display = path.display();
    println!("Reding file: {}", display);
    let mut v: Vec<Camera> = Vec::new();

    if let Ok(lines) = read_lines(path) {
        for line in lines {
            if let Ok(l) = line {
                let fields: Vec<&str> = l.split(';').collect();
                if fields.len() == 3 && fields.get(0).unwrap().trim() != "Camera".to_string() {
                    let cam: Camera = Camera::from_vector(fields);
                    cam.dump();
                    v.push(cam);
                }
            }
        }
    }
}

fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where
    P: AsRef<Path>,
{
    let file = File::open(filename)?;
    Ok(io::BufReader::new(file).lines())
}

struct Camera {
    id: u16,
    name: String,
    lat: f32,
    lng: f32,
}

impl Camera {
    fn from_vector(fields: Vec<&str>) -> Camera {
        let re = Regex::new("(\\d{3})").unwrap();
        let first_field = fields.get(0).unwrap().to_string();
        let newid: u16 = match re.find(&first_field) {
            Some(x) => first_field[x.start()..x.end()].parse().unwrap(),
            _ => 0,
        };

        Camera::from_fields(
            newid,
            fields.get(0).unwrap().to_string(),
            fields.get(1).unwrap().to_string().trim().parse().unwrap(),
            fields.get(2).unwrap().to_string().trim().parse().unwrap(),
        )
    }
}

impl Camera {
    fn from_fields(id: u16, name: String, lat: f32, lng: f32) -> Camera {
        Camera {
            id: id,
            name: name,
            lat: lat,
            lng: lng,
        }
    }
}

impl Camera {
    fn dump(&self) -> () {
        println!(
            "{} | {} | {} | {}",
            self.id.to_string(),
            self.name,
            self.lat.to_string(),
            self.lng.to_string()
        );
    }
}

You could start by using cargo clippy :wink:


I would probably inline the definition of display, i.e. just write

println!("Reading file: {}", path.display());

I’m feeling like the error “handling” of just ignoring all the errors that you’re doing with the if let is not optimal. You can use a Result-returning main function if you like and just propagate the errors with ?. Or you could handle them if there’s a sensible way to do so. Silently ignoring them feels bad though.

I guess the same holds for the if statement checking fields.len() == 3 && fields.get(0).unwrap().trim() != "Camera", though I’m not quite sure what this case of the field being "Camera" means in the context of your program. On the other hand, Camera::from_vector uses unwrap a lot; in this case it seems more appropriate to make Camera::from_vector return a Result-type, too. Error handling in Rust is not trivial if you want to deal with different error types. For an application such as your program, using anyhow might be a straightforward approach.

I haven’t used anyhow myself yet, but I guess this (playground link) is how it could be used…

(I fixed all the remaining clippy lints in the playground, too; there’s no point in explaining those, clippy does a good job on that by itself.)

I also changed things for the regex in that playground. You’ll usually want to make sure that regexes aren’t re-built unnecessarily. The typical solution is to use a lazily initialized static variable

static RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"(\d{3})").unwrap());

This code also uses raw string literals to avoid the need for typing \\.

Your use of first_field[x.start()..x.end()] could be replaced by x.as_str().

In general, AFAIK it’s often good practice to only use unwrap in places where you know for some reason that the unwrapping will never fail, so in particular the .parse().unwrap() that you had in Camera::from_vector were problematic as they could make your program panic depending on the input. I left it in for the regex initialization and for parsing the digits after the regex already confirmed they’re digits. Actually… on the latter, the regex \d{3} is unicode-aware and supports digits other than [0-9]. I’m suspecting that for those, .parse() might still fail, I’m not quite sure though. So adding an a non-panicking error for the x.as_str().parse().unwrap() or changing the regex to use [0-9]{3} might be a sensible thing to do.

2 Likes

So here’s a list of all the characters that match the \d regex:

List of Unicode Characters of Category “Decimal Number”

Edit: Looked it up, seems like .parse() indeed only supports ASCII digits.

Thank you!
I'll read it more in depth when I'm properly awake (it's 05:20 and I couldn't sleep).
For now I've pretty much ignored error handling, just to get a grip on some of the other basics, but naturally I will be adding that in.
Re the len() == 3 and the "Camera", don't worry about the logic :slight_smile: I deliberately didn't include the data file to avoid people popping up with better ways to structure the logic and the parsing, for now, I need to focus on the bare rust and improve the problem solving approach in a later post.

I'll follow your suggestions this evening and post an update.

Thanks again
Tony

It's just a detail: I would merge the three impl Camera into one impl Camera.

2 Likes

I've had some time to alter my code a bit more then came back and read your reply in full. Thank you for the playground link, I looked at that after having made my changes and I'm glad I did it that way around as I've learned more along the way :slight_smile:

I've not made all the changes you've suggested just yet, partly because I want to re-read the chapter on error handling again, with me being one of those annoying people that likes to understand what I'm doing rather than just blindly applying advice.

In my current version, I've also tidied up the logic a bit. I still have a long way to go before I'm happy with this, but I'm really pleased with how much I'm learning!

The story so far... no need to comment, just added for info. Remember, I've not yet applied all the suggestions.

use lazy_static::lazy_static;
use regex::Regex;
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;

fn main() {
    let path = Path::new("./Data/cameras-defb.csv");
    println!("Reding file: {}", path.display());
    let mut v: Vec<Camera> = Vec::new();

    for line in read_lines(path).unwrap().flatten().filter(|x| x.starts_with("UTR")) {
        let fields: Vec<&str> = line.split(';').collect();
            let cam: Camera = Camera::from_vector(fields);
            cam.dump();
            v.push(cam);
    }
}

fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where
    P: AsRef<Path>,
{
    let file = File::open(filename)?;
    Ok(io::BufReader::new(file).lines())
}

struct Camera {
    id: u16,
    name: String,
    lat: f32,
    lng: f32,
}

impl Camera {
    fn from_vector(fields: Vec<&str>) -> Camera {
        lazy_static! {
            static ref RE: Regex = Regex::new(r"[0-9]{3}").unwrap();
        }

        let first_field = fields.get(0).unwrap().to_string();
        let newid: u16 = match RE.find(&first_field) {
            Some(x) => x.as_str().parse().unwrap(),
            _ => 0,
        };

        Camera::from_fields(
            newid,
            fields.get(0).unwrap().to_string(),
            fields.get(1).unwrap().parse().unwrap_or(0.0), 
            fields.get(2).unwrap().parse().unwrap_or(0.0),
        )
    }

    fn from_fields(id: u16, name: String, lat: f32, lng: f32) -> Camera {
        Camera { id, name, lat, lng }
    }
}

impl Camera {
    fn dump(&self) {
        println!(
            "{} | {} | {} | {}",
            self.id.to_string(),
            self.name,
            self.lat.to_string(),
            self.lng.to_string()
        );
    }
}

I'd be tempted to use an impl Trait here to avoid exposing the type of the iterator. It's not very important given that this isn't a public API, but it makes your code easier to read and understand.

fn read_lines<P>(filename: P) -> io::Result<impl Iterator<Item=String>>
where
    P: AsRef<Path>,
{

I'm not sure what the precise Item is that you need, and that is kind of the point. Your function doesn't say what it does in the type, unless you know what io::Lines is. In top of that, the function exposes its implementation in its type, which would be a serious issue in a public API, but is just a little ugly here.

1 Like