Proper Error propagation in Rust

I am trying to write a function that reads a file's content and returns a vector of the relevant data, but I am not sure how to return an error when the input file is not found.

I am not sure the code below is correct, but I hope someone can make sense of it, as I do not know how to handle returning errors.

pub fn func2_read_file(max_line:usize) -> Result<Vec<DataStruct>, io::Error>{
    let file = File::open(filename).unwrap(); // Ignore errors.
    let reader = BufReader::new(file);

    let mut vec_data:Vec<DataStruct>=Vec::new();
    for (index, line) in reader.lines().enumerate() {
        let line = line.unwrap(); // Ignore errors.       

        let split = line.split("\t");
        let vec = split.collect::<Vec<&str>>();
        vec_data.push(DataStruct {
            id: vec[0].parse::<i32>().unwrap(), // Ignore errors.
            name_acsii: vec[2].to_string(),
            latitude: vec[4].parse::<f32>().unwrap(), // Ignore errors.
            longitude: vec[5].parse::<f32>().unwrap(), // Ignore errors.
        });        
        if index>max_line{
            break;
        }
    }
    return vec_data; // Definitely not going to work here
}

So, as you can see, the code above uses a lot of .unwrap() calls, which I assume means lots of errors are being ignored.
In Go, I can just return each error as nil or as various strings for the various errors.
I don't even know what types of errors Rust has. I just put io::Error because it's related to file reading, but obviously there are other errors, like converting a string to f32. I want to catch the error in other functions and handle it accordingly.
So, for the code above, what is the Rust way to return errors properly, and maybe catch them in the main function? I am new to Rust, so I am not 100% sure of what I am doing. I know there are long and short routes for returning errors, but since I am still learning, I would prefer the long method so I can understand everything clearly.
Or if I am doing something in a not-very-Rust way, please enlighten me.
Thank you

This seems to be a simple program, where most of the errors should result in an error message, and that's mostly it. It's not a library that you expect hundreds of people to use and send you feedback about.

Given that assumption, there are two easy recommendations that will at least help you get the job done, and advance you further in your learning of Rust.

  1. Rust has the ? operator for "if this is an error, return it." I personally consider this one of its best features. Several of the places you have unwrap can use it directly.
  2. std::io::Error has an error kind called InvalidInput, and io::Error::new lets you attach an error message to it. This seems like what you would want if a parse fails. The error types generated by a failure to parse integers and floats are not the same as io::Error, but they are easy to "convert" with an error message that suits your purpose.

With these two suggestions applied, your code would look like this. It is not "idiomatic" per se, but it does handle the errors more cleanly.

use std::fs::File;
use std::io::{self, BufRead, BufReader};

// `filename` is passed in as a parameter here; the original snippet used it without declaring it.
pub fn func2_read_file(filename: &str, max_line: usize) -> Result<Vec<DataStruct>, io::Error> {
    let file = File::open(filename)?; // return the error, otherwise continue on.
    let reader = BufReader::new(file);

    let mut vec_data: Vec<DataStruct> = Vec::new();
    for (index, line) in reader.lines().enumerate() {
        let line = line.map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "failed to read a string on a line"))?;

        // note: indexing with vec[n] still panics if a line has fewer than six columns
        let split = line.split("\t");
        let vec = split.collect::<Vec<&str>>();
        vec_data.push(DataStruct {
            // map the result into a Result<T, io::Error>, and then perform the ? operator as above
            id: vec[0].parse::<i32>()
                .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "failed to parse id as i32"))?,
            name_acsii: vec[2].to_string(),
            latitude: vec[4].parse::<f32>()
                .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "failed to parse latitude as f32"))?,
            longitude: vec[5].parse::<f32>()
                .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "failed to parse longitude as f32"))?,
        });
        if index > max_line {
            break;
        }
    }
    Ok(vec_data) // idiomatic return style, plus it's a Result<Vec, E> not a Vec
}

With this design, however, you cannot do things like indicate line numbers of an error. You could format! an error message with index in it, but this type of information is usually conveyed better with a custom error type, or returning the original error types instead.
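As a rough illustration of the format! idea, here is a minimal sketch that puts the line number into the message. The helper name parse_ids and the choice of ErrorKind::InvalidData are assumptions made for this example, not part of the original code.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: parse the first tab-separated column of every line
/// as an i32 and report the 1-based line number when parsing fails.
fn parse_ids<R: BufRead>(reader: R) -> Result<Vec<i32>, io::Error> {
    let mut ids = Vec::new();
    for (index, line) in reader.lines().enumerate() {
        let line = line?; // already an io::Error, so `?` works directly
        let first = line.split('\t').next().unwrap_or("");
        let id = first.parse::<i32>().map_err(|e| {
            io::Error::new(
                io::ErrorKind::InvalidData,
                format!("line {}: cannot parse {:?} as i32: {}", index + 1, first, e),
            )
        })?;
        ids.push(id);
    }
    Ok(ids)
}
```

Because the function accepts any BufRead, it can be called with a BufReader over a File or with an in-memory std::io::Cursor for quick experiments.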

For returning multiple error types, you can use the anyhow crate. It will "fold together" all your errors, returning a single structure that makes operations like pretty-printing easy.

I would also recommend a recent thread on understanding errors in Rust.

As a broader statement, I would also suggest looking into the nom crate for parsing things from a stream or file. It is somewhat verbose, but it structures your code to be efficient (as a recursive descent parser based on rules) and handles errors gracefully.

Hope this helps!

EDIT: clarity and extra suggestion


It is not related to your question, but forcing you to handle all errors is a great Rust feature. Never use unwrap unless you deliberately want the program to panic on error. You can always use match to catch the error and handle it properly, or use unwrap_or/unwrap_or_default to provide a default value.
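A small sketch of those three alternatives side by side. The function name parse_three_ways and the fallback value -1 are made up for illustration only.

```rust
/// Hypothetical helper showing three non-panicking alternatives to `unwrap`.
fn parse_three_ways(input: &str) -> (i32, i32, i32) {
    // 1. `match`: inspect the error and decide what to do with it.
    let via_match = match input.parse::<i32>() {
        Ok(n) => n,
        Err(e) => {
            eprintln!("could not parse {:?}: {}", input, e);
            -1 // fallback chosen for this example
        }
    };
    // 2. `unwrap_or`: fall back to an explicit value, discarding the error.
    let via_unwrap_or = input.parse::<i32>().unwrap_or(-1);
    // 3. `unwrap_or_default`: fall back to `i32::default()`, which is 0.
    let via_default = input.parse::<i32>().unwrap_or_default();
    (via_match, via_unwrap_or, via_default)
}
```

Only the match version gets to look at the error; the other two silently replace it, so they fit best when a default really is a valid value.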

As to your question, here is what I would do:

  1. Handle the error where it happens. Since you have to handle the errors anyway, why delay it? For instance, you can write a function to handle the parse errors:
fn func2_parse_line(vec_data: &mut Vec<DataStruct>, line: String) {
    let split = line.split("\t");
    let vec = split.collect::<Vec<&str>>();
    let id = match vec[0].parse::<i32>() {
        Ok(n) => n,
        Err(e) => {
            // handle parse id error here
            0 // or provide a default value
        }
    };

    let latitude = match vec[4].parse::<f32>() {
        Ok(n) => n,
        Err(e) => {
            // handle parse latitude error here
            0.0 // or provide a default value
        }
    };

    let longitude = match vec[5].parse::<f32>() {
        Ok(n) => n,
        Err(e) => {
            // handle parse longitude error here
            0.0 // or provide a default value
        }
    };
    vec_data.push(DataStruct {
        id,
        name_acsii: vec[2].to_string(),
        latitude,
        longitude
    });
}
  2. If you really want to return those errors and handle them elsewhere, you can define your own error type to cover all kinds of errors you might encounter in this function. Something like this:
enum FieldParseError {
    IdParseError(/* some information the handler needs */),
    LatitudeParseError(/* some information the handler needs */),
    LongitudeParseError(/* some information the handler needs */)
}
enum UnifiedError {
    FileOpenError(/* some information the handler needs */),
    FileReadError(/* some information the handler needs */),
    LineParseError(FieldParseError)
}

But, to return UnifiedError, you still need to handle the original errors.

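let id = match vec[0].parse::<i32>() {
    Ok(n) => n,
    Err(e) => {
        // wrap the value in Err: the function returns a Result whose error type is UnifiedError
        return Err(UnifiedError::LineParseError(FieldParseError::IdParseError(/* some information the handler needs */)));
    }
};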
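If the match boilerplate gets repetitive, one common option is to implement From for the custom error so that ? performs the conversion. The sketch below uses a simplified, hypothetical FieldError enum (not the UnifiedError above) that stores the original parse errors.

```rust
use std::fmt;
use std::num::{ParseFloatError, ParseIntError};

/// Hypothetical error type that keeps the underlying parse error.
#[derive(Debug)]
enum FieldError {
    Id(ParseIntError),
    Coordinate(ParseFloatError),
}

impl fmt::Display for FieldError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FieldError::Id(e) => write!(f, "invalid id: {}", e),
            FieldError::Coordinate(e) => write!(f, "invalid coordinate: {}", e),
        }
    }
}

impl std::error::Error for FieldError {}

// These impls are what allow `?` to convert the std errors automatically.
impl From<ParseIntError> for FieldError {
    fn from(e: ParseIntError) -> Self {
        FieldError::Id(e)
    }
}

impl From<ParseFloatError> for FieldError {
    fn from(e: ParseFloatError) -> Self {
        FieldError::Coordinate(e)
    }
}

/// Hypothetical parser for two fields; each `?` picks the matching `From` impl.
fn parse_id_and_latitude(id: &str, latitude: &str) -> Result<(i32, f32), FieldError> {
    let id = id.parse::<i32>()?;
    let latitude = latitude.parse::<f32>()?;
    Ok((id, latitude))
}
```

The caller can then match on FieldError::Id or FieldError::Coordinate to tell the failures apart.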

I hope this helps.


So I suppose I should be handling errors ASAP instead of propagating them? I will take that advice. Thank you.

No, that is not the case! You can if that is the proper solution — but often it is not. The safer, more correct default is to propagate errors using ?, rather than blindly converting them to a value like zero or an empty string.

I'm not sure why @asingingbird assumed that a "default value" of 0 is appropriate for an arbitrary ID. I think that's a big red flag and I would definitely not recommend doing this in the name of handling the error "right away". In this case, this doesn't really handle the error, it hides it instead.
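To make the difference concrete, here is a minimal sketch of propagating with ? and deciding what to do only at the call site. The names parse_id, collect_ids, and report are hypothetical and exist only for this example.

```rust
use std::num::ParseIntError;

/// Hypothetical low-level parser: it does not guess a default, it reports failure.
fn parse_id(field: &str) -> Result<i32, ParseIntError> {
    field.trim().parse::<i32>()
}

/// Propagates the first failure to the caller with `?`.
fn collect_ids(fields: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    let mut ids = Vec::new();
    for field in fields {
        ids.push(parse_id(field)?);
    }
    Ok(ids)
}

/// The caller is the place that decides how to react to the error.
fn report(fields: &[&str]) -> String {
    match collect_ids(fields) {
        Ok(ids) => format!("parsed {} ids", ids.len()),
        Err(e) => format!("bad input: {}", e),
    }
}
```

With this shape, a bad ID never turns into a silent 0; it surfaces where the program has enough context to log it, skip the record, or abort.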

Did you notice there are several different error types in this function? What return type should it be?
The default value is just an example. 0 might represent an invalid id, but you can choose any value that suits your program, or just panic with an error message.

As I wrote here, you can handle the error right there in your own way: call a handler function to panic, or provide a default value for the field to ignore the error (if that suits your case). These are the two options a parser would usually take, and I don't think that hides the error.

@asingingbird Considering all the available options, this is how I'd write the function:

fn func2_parse_line(line: &str) -> Result<DataStruct, IncompleteDataStruct> {
    let mut line_itr = line.split("\t");

    // `nth(1)` skips one column and returns the next one without consuming the
    // iterator itself (`skip(1)` takes it by value, so it could not be reused).
    let id = line_itr.next().and_then(|value| value.parse::<i32>().ok());
    let name_ascii = line_itr.nth(1).map(ToString::to_string);
    let latitude = line_itr.nth(1).and_then(|value| value.parse::<f32>().ok());
    let longitude = line_itr.next().and_then(|value| value.parse::<f32>().ok());

    match (id, name_ascii, latitude, longitude) {
        (Some(id), Some(name_ascii), Some(latitude), Some(longitude))
            => Ok(DataStruct { id, name_ascii, latitude, longitude }),
        (id @ _, name_ascii @ _, latitude @ _, longitude @ _)
            => Err(IncompleteDataStruct { id, name_ascii, latitude, longitude })
    }
}

struct IncompleteDataStruct {
    id: Option<i32>,
    name_ascii: Option<String>,
    latitude: Option<f32>,
    longitude: Option<f32>,
}

EDIT: However, I do think the marked solution does it better overall, because handling it at the file level allows printing the accurate error location, which makes debugging considerably easier. I would still like to point out that my solution avoids collecting into a new vector, which I'd recommend regardless of the error handling approach.

Hmm... What is all that (id @ _, name_ascii @ _, latitude @ _, longitude @ _) about? I have not seen such a thing before.

It can be any type that implements From<E> for each of the different error types. Even something like Box<dyn Error>; that's roughly how the anyhow crate works.
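A minimal std-only sketch of the Box<dyn Error> variant, where ? converts both an io::Error and a ParseFloatError into the boxed type. The function sum_first_column is a hypothetical example and is not how anyhow itself is implemented.

```rust
use std::error::Error;
use std::io::BufRead;

/// Hypothetical helper: sum the first tab-separated column of every line.
/// Two different error types flow through the same `?` operator.
fn sum_first_column<R: BufRead>(reader: R) -> Result<f32, Box<dyn Error>> {
    let mut total = 0.0;
    for line in reader.lines() {
        let line = line?; // io::Error -> Box<dyn Error>
        let first = line.split('\t').next().unwrap_or("");
        total += first.parse::<f32>()?; // ParseFloatError -> Box<dyn Error>
    }
    Ok(total)
}
```

The caller can print the boxed error directly, or use downcast_ref to check for a specific concrete error type.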

See https://doc.rust-lang.org/book/appendix-02-operators.html (search the page for @) and https://stackoverflow.com/a/49906536/3816796.


Thanks. So @ is "pattern binding".
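For reference, a tiny sketch of what @ does in a pattern: it binds the matched value to a name while still testing it against the pattern on the right. The function describe is made up for this example.

```rust
/// Hypothetical example of `name @ pattern` bindings.
fn describe(n: u32) -> String {
    match n {
        // Binds the value to `small` only if it falls in the range 1..=9.
        small @ 1..=9 => format!("single digit {}", small),
        // `other @ _` matches anything; it behaves the same as writing just `other`.
        other @ _ => format!("something else: {}", other),
    }
}
```

So in the tuple pattern earlier in the thread, each `x @ _` could be shortened to plain `x` with the same meaning.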

The examples look messy. Isn't there a more readable way to do that?
