I am new to Rust and to programming, and I am writing some simple programs to get a feel for the language. I am trying to read a file, in this case an XML file, and I want to write match expressions that find the angle brackets in each line. My idea was to take a slice of the [u8] that as_bytes() returns. I'm sure there is a better way to do this, but I get a "mismatched types" error when I try to compare values in the match arms.
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

fn main() {
    let file_path = Path::new(r"C:\Users\user\RustProjects\myproject\test.xml");
    let reader = get_buf_reader(file_path);
    for line in reader.lines() {
        // read lines as bytes
        let line = &line.unwrap().as_bytes();
        // get slice
        match lines[..0] {
            // mismatched type error expected [u8] found u8
            60u8 => println!("Open angle bracket found"),
            _ => println!("No bracket found"),
        }
    }
}

// Opens the file and wraps it in a buffered reader; panics if the file cannot be opened.
fn get_buf_reader(file_path: &Path) -> BufReader<File> {
    let file = File::open(file_path);
    let file_success = match file {
        Ok(file) => file,
        Err(error) => panic!("Problem opening file: {error:?}"),
    };
    let reader = BufReader::new(file_success);
    reader
}
A slice of bytes contains zero or more bytes, but it looks like you want to match on a single byte. To do that, index into the slice of bytes like this: lines[0]. Note that indexing panics if the line is empty.
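As a minimal sketch of matching on one byte instead of a slice, the helper below (the name `starts_with_open_bracket` is hypothetical, not from the question) uses `first()`, which returns an `Option<&u8>` and so avoids the panic that indexing with `[0]` causes on an empty line:

```rust
/// Returns true if the first byte of `line` is `<` (byte value 60).
/// `starts_with_open_bracket` is a hypothetical name used for illustration.
fn starts_with_open_bracket(line: &str) -> bool {
    let bytes: &[u8] = line.as_bytes();
    // `first()` yields None for an empty slice instead of panicking like `bytes[0]`.
    match bytes.first() {
        Some(&b'<') => true, // b'<' is the byte literal for 60u8
        _ => false,
    }
}
```

Each match arm here compares a `u8` against a `u8`, so the types agree.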
But since Strings contain Unicode chars, it is better to match on the first char rather than the first byte. You can use the chars method to get an iterator over the chars in the string, and then call next to get the first one. Note that the string may be empty, in which case next returns None.
Also, lines[..0] creates an empty slice, because the end of a range written with .. is exclusive: a..b covers the indices from a up to, but not including, b. To get a slice containing only the first byte, use lines[..1].
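To illustrate the difference, here is a small sketch (the function names are made up for this example) that compares the two ranges and shows that a one-byte slice has to be matched with a slice pattern, not with a `u8` value:

```rust
/// Returns the lengths of `bytes[..0]` and `bytes[..1]`.
/// Assumes `bytes` is non-empty: `bytes[..1]` panics on an empty slice.
fn prefix_lengths(bytes: &[u8]) -> (usize, usize) {
    let empty: &[u8] = &bytes[..0]; // end index 0 is excluded, so nothing is selected
    let one: &[u8] = &bytes[..1]; // selects index 0 only
    (empty.len(), one.len())
}

/// Matches a one-byte slice against a slice pattern (a `[u8]`, not a `u8`).
/// Also assumes `bytes` is non-empty.
fn slice_is_open_bracket(bytes: &[u8]) -> bool {
    match &bytes[..1] {
        [b'<'] => true,
        _ => false,
    }
}
```

Even when the slice holds exactly one byte, its type is still `[u8]`, which is why comparing it to `60u8` produces the mismatched types error.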
Here is a version with these changes that compiles:
for line in reader.lines() {
    // ignore errors for now
    let line = line.unwrap();
    // match on first char
    match line.chars().next() {
        Some('<') => println!("Open angle bracket found"),
        _ => println!("No bracket found"),
    }
}
A single Unicode char can be represented by more than one byte, so the first byte of a string is not always a complete char.
Rust strings are in UTF-8 format, where each char is represented by one to four bytes. Only ASCII chars (code points 0 through 127) take exactly one byte.
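A quick way to see this is to ask each char how many bytes it needs. The helper name `utf8_lengths` below is an assumption for this sketch; `char::len_utf8` is the standard library method that does the work:

```rust
/// Returns the number of UTF-8 bytes used by each char in `s`, in order.
/// `utf8_lengths` is a hypothetical helper name used for illustration.
fn utf8_lengths(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}
```

For a string such as "<\u{e9}\u{20ac}\u{1f600}" (an angle bracket, an accented e, a euro sign, and an emoji) the lengths differ per char, which is why indexing by byte position and indexing by char are not the same thing.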
EDIT:
ASCII is a subset of Unicode. So you can compare char to ASCII values, and in fact '<' is an ASCII char. I just used the char literal syntax: '<'. I could have also used the hex syntax: '\x3c'.
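To connect this back to the 60u8 in the question, here is a small sketch (the name `ascii_byte` is hypothetical) that converts a char to its byte value only when the char is ASCII:

```rust
/// Returns the byte value of `c` if it is an ASCII char, otherwise None.
/// `ascii_byte` is a hypothetical helper name used for illustration.
fn ascii_byte(c: char) -> Option<u8> {
    if c.is_ascii() {
        Some(c as u8) // lossless: ASCII code points are 0 through 127
    } else {
        None
    }
}
```

The literals '<' and '\x3c' are the same char, and both map to the byte 60, which can also be written as the byte literal b'<'.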