I have written a function that reads a file and converts its lines to Vec<String>. It works fine for UTF-8 files but not for ANSI ones. My text files are user-generated, so I can't guarantee the encoding. From a quick search, I understand that String is always UTF-8. Is it possible to convert this function to support both ANSI and UTF-8 efficiently?
use std::fs;
use std::io::{BufRead, BufReader};

pub fn read_file(path: &String) -> Vec<String> {
    let f = fs::File::open(path).expect("no file found");
    let br = BufReader::new(f);
    // lines() yields io::Result<String> and fails on invalid UTF-8
    let lines: Vec<String> = br
        .lines()
        .collect::<Result<_, _>>()
        .unwrap_or_else(|_| panic!("Failed converting file into lines. Path: {}", path));
    return lines;
}
You can't use String or String-based methods then; you'll have to use Vec<u8>, like so:
use std::fs;
use std::io::{BufRead, BufReader};

pub fn read_file(path: &String) -> Vec<Vec<u8>> {
    let f = fs::File::open(path).expect("no file found");
    let br = BufReader::new(f);
    let lines: Vec<Vec<u8>> = br
        .split(b'\n')
        // split() yields io::Result<Vec<u8>>, so map over the Ok value
        .map(|line| {
            line.map(|mut line| {
                // Remove the CR from CRLF Windows-style line breaks
                if line.ends_with(b"\r") {
                    line.pop();
                }
                line
            })
        })
        .collect::<Result<_, _>>()
        .unwrap_or_else(|_| panic!("Failed converting file into lines. Path: {}", path));
    return lines;
}
Small nit: You should never use &String; take &str instead. &String is far more restrictive and incurs a double indirection. Also, since it's the last statement you can replace return lines; with just lines.
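To illustrate the &str point, here is a minimal sketch. The helper `file_extension` is hypothetical and only exists to show the signature: a function taking &str accepts string literals and borrowed Strings alike, whereas one taking &String would force callers to allocate a String first.

```rust
/// Hypothetical helper: returns the text after the last '.' in `path`, if any.
/// Taking `&str` lets callers pass a literal or a borrowed `String`.
pub fn file_extension(path: &str) -> Option<&str> {
    // The last expression is the return value, so no `return` is needed.
    path.rsplit_once('.').map(|(_, ext)| ext)
}
```

Calling it as `file_extension("notes.txt")` or as `file_extension(&owned)` with `owned: String` both compile, because `&String` coerces to `&str` automatically.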
One caveat: with an arbitrary Windows code page, you can't assume anything about what the individual bytes represent. You can't even assume that the encoding is an ASCII superset or that it is byte-based.
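As a concrete case of a non-byte-based encoding, consider UTF-16LE, which some Windows tools produce. The sketch below uses two hypothetical helpers, `utf16le_bytes` and `split_on_lf`, to show that splitting on the byte 0x0A cuts a two-byte code unit in half, so the resulting "lines" are misaligned.

```rust
/// Hypothetical helper: encode `s` as UTF-16LE bytes (no byte-order mark).
pub fn utf16le_bytes(s: &str) -> Vec<u8> {
    s.encode_utf16().flat_map(|unit| unit.to_le_bytes()).collect()
}

/// Hypothetical helper: naive byte-wise split on LF, as a byte-oriented
/// line reader would do.
pub fn split_on_lf(bytes: &[u8]) -> Vec<Vec<u8>> {
    bytes.split(|&b| b == b'\n').map(|part| part.to_vec()).collect()
}
```

For the input "A\nB", the UTF-16LE bytes are 41 00 0A 00 42 00. Splitting on 0A gives 41 00 and 00 42 00: the second piece starts with the high byte of the newline and is no longer valid UTF-16LE.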
Decoding to Unicode, if possible, makes it easier to handle these strings consistently, unless what you're doing really is encoding-agnostic.
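A minimal sketch of such decoding, using only the standard library: try UTF-8 first and, if that fails, fall back to a single-byte encoding. The function name `decode_line` is hypothetical, and the fallback here is ISO-8859-1 (Latin-1), which is an assumption. "ANSI" on Windows usually means a code page such as Windows-1252, which differs from Latin-1 in the 0x80..0x9F range, so a dedicated decoding crate may be the better choice for real files.

```rust
/// Hypothetical helper: decode one line of raw bytes into a String.
/// Valid UTF-8 is used as-is; anything else is treated as ISO-8859-1,
/// where each byte value equals its Unicode code point.
/// Assumption: the non-UTF-8 input really is Latin-1 compatible.
pub fn decode_line(bytes: Vec<u8>) -> String {
    match String::from_utf8(bytes) {
        // No copy is made when the bytes are already valid UTF-8.
        Ok(s) => s,
        // Recover the original bytes from the error and map byte -> char.
        Err(e) => e.into_bytes().iter().map(|&b| b as char).collect(),
    }
}
```

This could be applied to each Vec<u8> line from the byte-based reader, for example with `lines.into_iter().map(decode_line)`. Note that the UTF-8 check is a heuristic: a non-UTF-8 file whose bytes happen to form valid UTF-8 will be decoded as UTF-8.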