[Solved] Latin1 versus utf8 testing


#1

I am a new Rust programmer. I have programmed before in other languages. My problem is with a small test for which I would like your advice.

I am writing a program to convert latin1 to utf8. I got this to work, and checking the encoding of the output file shows that it is converted correctly. The code of the function byte2utf8 is not relevant to my query (I use “as char” to convert).

However, I want a simple test in my lib so that any regression gets noticed. The test, however, doesn’t seem to test anything; probably the “latin1” string I created is not really latin1. Can anyone shed some light on what I am doing wrong?

This is the test code:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn latin_string() {
        let a: &[u8] = r"\xe1\xe2".as_bytes();
        let mut b = Vec::new();

        println!("a = {:x?}", &a);
        println!("b = {:x?}", &b);
        match byte2utf8(a, &mut b) {
            Ok(arg) => arg,
            Err(e) => {
                eprintln!("{}", e);
                std::process::exit(1);
            }
        }

        println!("a = {:x?}", &a);
        println!("b = {:x?}", &b);

        assert_eq!(&a, &b.as_slice());
        assert_eq!(false, true);
    }
}

The output I get is:

running 1 test
test tests::latin_string ... FAILED

failures:

---- tests::latin_string stdout ----
a = [5c, 78, 65, 31, 5c, 78, 65, 32]
b = []
a = [5c, 78, 65, 31, 5c, 78, 65, 32]
b = [5c, 78, 65, 31, 5c, 78, 65, 32]
thread 'tests::latin_string' panicked at 'assertion failed: `(left == right)`
  left: `false`,
 right: `true`', src/lib.rs:137:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.


failures:
    tests::latin_string

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

error: test failed, to rerun pass '--lib'

I made the test fail on purpose so I could compare the original string a with its hex output. The weird thing is that println shows 8 bytes (2x4) where I expected 2. So my suspicion is that a is not a latin1 string at all.

  1. Am I correctly coding a latin1 string in a?
  2. How to code a latin1 string if this is not the way?
  3. Any other comments?

#2

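All string literals in Rust are utf8. The r prefix makes a raw string, where backslash escapes are not processed, so r"\xe1\xe2" is the eight literal characters \ x e 1 \ x e 2. That is exactly the hex output you got.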

To create byte-strings, you want the b prefix, not r. And yes, you can combine them if you like.
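
To make the difference concrete, here is a minimal sketch comparing the three literal forms. The helper name literal_bytes is made up for illustration; only the r, b and br prefixes come from the language itself.

```rust
/// Hypothetical helper: returns the bytes behind each literal form.
fn literal_bytes() -> (&'static [u8], &'static [u8], &'static [u8]) {
    // Raw string: no escape processing, so this is 8 ASCII characters.
    let raw: &[u8] = r"\xe1\xe2".as_bytes();
    // Byte string: \x escapes are processed and may exceed 0x7f, giving 2 bytes.
    let bytes: &[u8] = b"\xe1\xe2";
    // Raw byte string: both prefixes combined, again 8 unprocessed bytes.
    let raw_bytes: &[u8] = br"\xe1\xe2";
    (raw, bytes, raw_bytes)
}

fn main() {
    let (raw, bytes, raw_bytes) = literal_bytes();
    println!("r  = {:x?}", raw);
    println!("b  = {:x?}", bytes);
    println!("br = {:x?}", raw_bytes);
}
```

Note that a plain "\xe1" (no prefix) does not compile, because \x escapes in ordinary string literals are limited to the ASCII range.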


#3

Thank you very much! The results are much better now:

a = [e1, e2]
b = []
a = [e1, e2]
b = [c3, a1, c3, a2]

A bit embarrassed by how simple this was: I will never forget the difference between r and b again!
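
For anyone finding this later, a regression check along these lines might look roughly like the sketch below. The code of byte2utf8 is not shown in this thread, so latin1_to_utf8 is a hypothetical stand-in with a simpler signature; it only assumes the “as char” approach mentioned in the first post.

```rust
/// Hypothetical stand-in for the conversion function (not the real byte2utf8).
/// Each latin1 byte maps to the Unicode code point with the same value,
/// so `b as char` followed by UTF-8 encoding does the conversion.
fn latin1_to_utf8(input: &[u8]) -> Vec<u8> {
    input
        .iter()
        .map(|&b| b as char) // u8 -> char is always valid (U+0000..=U+00FF)
        .collect::<String>() // String stores its contents as UTF-8
        .into_bytes()
}

fn main() {
    // Byte-string literal: really two latin1 bytes (á and â).
    let a: &[u8] = b"\xe1\xe2";
    let b = latin1_to_utf8(a);

    println!("a = {:x?}", a);
    println!("b = {:x?}", b);

    // Compare against the expected UTF-8 bytes rather than against the input.
    assert_eq!(b, vec![0xc3, 0xa1, 0xc3, 0xa2]);
}
```

The point of the design is that the assertion checks the output against known UTF-8 bytes; comparing the output to the input only passes for pure ASCII data.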


#4

How does one set this post to solved?


#5

You can just edit the title. (There’s a pencil icon near it)