[solved] rmp-serde and invalid UTF-8 strings

I am trying to use rmp-serde (0.13) to decode msgpack data written by the Python app Borg Backup (the repository keys, to start with). It seems that Borg (and/or Python's msgpack library) is perfectly happy to encode binary data as the msgpack "string" type rather than as the binary type.

If I decode using rmpv to a Value, these fields decode as a string, and the bytes can be extracted with the as_bytes() method.

However, if I use rmp-serde, I can't come up with a field type that can possibly decode these. Even if I specify the type of the field as Vec<u8>, it still fails with a Utf8Error.

Any recommendations on how this should be handled? The rmpv documentation suggests that strings may indeed be invalid, so is this a problem with rmp-serde?

Check out serde_bytes::ByteBuf.

extern crate rmpv;
extern crate rmp_serde;
extern crate serde_bytes;

use std::io::Cursor;
use serde_bytes::ByteBuf;

fn main() {
    let bytes = [162, 255, 255];

    let de = rmpv::decode::read_value(&mut Cursor::new(bytes)).unwrap();
    println!("{:#?}", de);
    match de {
        rmpv::Value::String(s) => {
            assert!(s.is_err());
            assert_eq!(s.as_bytes(), b"\xff\xff");
        }
        _ => {
            panic!("expected string");
        }
    }

    let de: ByteBuf = rmp_serde::from_slice(&bytes).unwrap();
    println!("{:#?}", de);
    assert_eq!(Vec::from(de), b"\xff\xff");
}
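
For context on the input above: the first byte, 162 (0xa2), is a msgpack "fixstr" header. The top three bits (101) mark the string type and the low five bits give the length, here 2, so the payload is the two 0xff bytes, which are not valid UTF-8. The following is only a rough std-only sketch of that layout; fixstr_payload is a hypothetical helper written for illustration, not part of rmp, rmpv, or rmp-serde.

```rust
use std::str;

/// Hypothetical helper (not a library API): returns the payload bytes of a
/// msgpack fixstr item, or None if the first byte is not a fixstr header or
/// the input is shorter than the declared length.
fn fixstr_payload(input: &[u8]) -> Option<&[u8]> {
    let (&header, rest) = input.split_first()?;
    // fixstr headers are 0b101x_xxxx (0xa0..=0xbf); the low 5 bits hold the length.
    if header & 0xe0 != 0xa0 {
        return None;
    }
    let len = (header & 0x1f) as usize;
    rest.get(..len)
}

fn main() {
    let bytes = [162u8, 255, 255];
    let payload = fixstr_payload(&bytes).expect("expected a fixstr");
    assert_eq!(payload, &b"\xff\xff"[..]);
    // 0xff never occurs in valid UTF-8, so a strict string decode fails,
    // while the raw bytes remain perfectly usable.
    assert!(str::from_utf8(payload).is_err());
    println!("payload = {:?}", payload);
}
```

A decoder that insists on a valid str hits the failing from_utf8 step here, whereas one that accepts raw bytes only needs the payload slice.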

Thanks for the suggestion. It seems to work well to do something like:

#[derive(Deserialize)]
struct Item {
    #[serde(with = "serde_bytes")]
    field: Vec<u8>,
}

The field can still be used as a Vec<u8>, and it now decodes properly even if the msgpack data mistakenly represents it as a string.