[solved] Rmp_serde and invalid utf-8 strings

d3zd3z · September 4, 2017, 5:58pm

I am trying to use rmp-serde (0.13) to decode a msgpack data item written by the Python application Borg Backup (the repository keys, to start with). It seems that Borg (and/or Python's msgpack library) is perfectly happy to encode binary data as a "string" type in msgpack, rather than as binary data.

If I decode using rmpv to a Value, these fields decode as a string, and the bytes can be extracted with the as_bytes() method.

However, if I use rmp-serde, I can't come up with a field type that can possibly decode these. Even if I specify the type of the field as Vec<u8>, it still fails with a Utf8Error.

Any recommendations on how this should be handled? The rmpv documentation suggests that strings may indeed be invalid, so is this a problem with rmp-serde?

dtolnay · September 4, 2017, 9:21pm

Check out serde_bytes::ByteBuf.

extern crate rmpv;
extern crate rmp_serde;
extern crate serde_bytes;

use std::io::Cursor;
use serde_bytes::ByteBuf;

fn main() {
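    // 162 (0xa2) is a msgpack fixstr marker with length 2; the two
    // payload bytes that follow are not valid UTF-8.
    let bytes = [162, 255, 255];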

    let de = rmpv::decode::read_value(&mut Cursor::new(bytes)).unwrap();
    println!("{:#?}", de);
    match de {
        rmpv::Value::String(s) => {
            assert!(s.is_err());
            assert_eq!(s.as_bytes(), b"\xff\xff");
        }
        _ => {
            panic!("expected string");
        }
    }

    let de: ByteBuf = rmp_serde::from_slice(&bytes).unwrap();
    println!("{:#?}", de);
    assert_eq!(Vec::from(de), b"\xff\xff");
}
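
For reference, here is how the three bytes in the example break down. This is only a std-only sketch, assuming the MessagePack fixstr layout (marker byte 0xa0 through 0xbf, with the low five bits giving the payload length). `split_fixstr` is a hypothetical helper written for illustration; it is not part of rmp, rmpv, or rmp-serde, and it ignores the longer str8/str16/str32 forms.

```rust
use std::str;

/// Hypothetical helper: returns the payload bytes of a MessagePack fixstr.
/// Yields None if the first byte is not a fixstr marker or the input is
/// shorter than the length the marker announces.
fn split_fixstr(input: &[u8]) -> Option<&[u8]> {
    let (&marker, rest) = input.split_first()?;
    // fixstr markers have the bit pattern 101xxxxx.
    if marker & 0xe0 != 0xa0 {
        return None;
    }
    // The low five bits hold the payload length in bytes.
    let len = (marker & 0x1f) as usize;
    rest.get(..len)
}

fn main() {
    let bytes = [162u8, 255, 255];
    let payload = split_fixstr(&bytes).expect("expected a fixstr marker");

    // The payload is tagged as a string on the wire...
    assert_eq!(payload, &[0xffu8, 0xff][..]);
    // ...but it cannot be turned into a &str.
    assert!(str::from_utf8(payload).is_err());

    println!("payload length: {}", payload.len());
}
```

So the data is a "string" as far as the wire format is concerned, yet a decoder that insists on producing a Rust str or String has to reject it. A byte-oriented target type such as ByteBuf sidesteps the UTF-8 check.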

d3zd3z · September 7, 2017, 1:43am

Thanks for the suggestion. It seems to work well to do something like:

#[derive(Deserialize)]
struct Item {
    // Route this field through serde_bytes so it is decoded as raw bytes.
    #[serde(with = "serde_bytes")]
    field: Vec<u8>,
}

The field can still be used as a Vec<u8>, and it now decodes properly even if the msgpack data mistakenly represents it as a string.
