[Solved] Serde deserialize str containing special chars

#1

I found this (to me) unexpected behaviour in the glorious serde/serde_json :heart: crates.

A struct has a field of type &'a str, and the struct is serialized and then deserialized. If the str initially contains special escaped characters such as \" or \n, example 3 below ends with a runtime error:
Err value: Error("invalid type: string \"hel\\\"lo\", expected a borrowed string"

Example 2 also fails at runtime during deserialization, even though its input should be valid JSON.

It seems like the characters in ex3 are serialized one by one, so \ is serialized to \\ and " is serialized to \", but I expected the str itself to be unaffected and the deserializer to be the inverse of the serializer for a str.

I'd just like to understand :flushed: Are there special characters that are forbidden in a str when using serde?
Any clue why this happens and how to work around it? Is it more related to str in Rust? Or is it just a bug…

```rust
// Needs serde (with the "derive" feature) and serde_json in Cargo.toml.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct A<'a> {
    pub v: &'a str,
}

impl<'a> std::fmt::Display for A<'a> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "A{{v: {}}}", self.v)
    }
}

fn main() {
    // ex 1 works fine
    let a1 = A { v: "hello" };
    let v1 = serde_json::to_string(&a1).unwrap();
    let deser_value1 = serde_json::from_str::<A>(&v1).unwrap();
    println!("deser1: {}", deser_value1);

    // ex 2 runtime error, although the input is valid JSON
    let v2 = r#"{"v":"hel\"lo"}"#;
    let deser_value2 = serde_json::from_str::<A>(v2).unwrap();
    println!("deser2: {}", deser_value2);

    // ex 3 runtime error
    let a3 = A { v: "hel\"lo" };
    let v3 = serde_json::to_string(&a3).unwrap();
    let deser_value3 = serde_json::from_str::<A>(&v3).unwrap();
    println!("deser3: {}", deser_value3);
}
```


#2

The problem is that if you deserialize into a borrowed string, the lifetime of that borrowed string is tied to the input JSON string, and the &str must point at a substring of the original JSON text. Since your JSON string contains an escape sequence, that isn't possible. For example, the JSON string "hel\"lo" should deserialize to hel"lo in your Rust &str, but hel"lo is not a substring of "hel\"lo", so there is nothing to borrow.

It sounds like you can fix this by using a Cow<'a, str> instead.
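To make the substring argument concrete, here is a std-only sketch with no serde involved. `borrow_from` is a hypothetical helper invented for illustration: it hands out a borrowed slice of the input when the decoded text literally appears there, and falls back to an owned `String` otherwise. That is roughly the choice a `Cow<'a, str>` lets a deserializer make, whereas a plain `&'a str` only has the first option.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of serde): return `decoded` as a borrow of
/// `input` if it literally occurs there, otherwise fall back to an owned copy.
fn borrow_from<'a>(input: &'a str, decoded: &str) -> Cow<'a, str> {
    match input.find(decoded) {
        // The decoded text is a substring of the input: borrow it, zero-copy.
        Some(start) => Cow::Borrowed(&input[start..start + decoded.len()]),
        // Not a substring (an escape was removed while decoding): someone must own it.
        None => Cow::Owned(decoded.to_owned()),
    }
}

fn main() {
    // No escapes: `hello` sits verbatim inside the JSON text.
    let plain = r#"{"v":"hello"}"#;
    if let Cow::Borrowed(s) = borrow_from(plain, "hello") {
        let offset = s.as_ptr() as usize - plain.as_ptr() as usize;
        println!("borrowed at byte offset {offset}"); // prints "borrowed at byte offset 6"
    }

    // With an escape: the JSON text contains `hel\"lo`, the value is `hel"lo`.
    let escaped = r#"{"v":"hel\"lo"}"#;
    match borrow_from(escaped, "hel\"lo") {
        Cow::Borrowed(_) => println!("borrowed"),
        Cow::Owned(s) => println!("had to allocate: {s}"), // prints `had to allocate: hel"lo`
    }
}
```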

#3

Ah, thank you very much! I updated the example to a working version for documentation:

```rust
use serde::{Deserialize, Serialize};
use std::borrow::Cow;
#[derive(Serialize, Deserialize)]
pub struct A<'a> {
    pub v: Cow<'a, str>,
}
```
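A std-only sketch of the same struct (serde derives dropped so it runs with the standard library alone) suggests that switching the field to `Cow<'a, str>` needs no change to the original `Display` impl, because `Cow<str>` implements `Display` and can hold either a borrowed or an owned string. One caveat worth checking against serde's documentation on borrowing: unlike `&str`, a `Cow<'a, str>` field is deserialized as `Cow::Owned` unless it is annotated with `#[serde(borrow)]`, so without that attribute the struct above works but always allocates.

```rust
use std::borrow::Cow;
use std::fmt;

// Same shape as the struct above, without the serde derives.
pub struct A<'a> {
    pub v: Cow<'a, str>,
}

impl<'a> fmt::Display for A<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Cow<str> implements Display, so the original body works unchanged.
        write!(f, "A{{v: {}}}", self.v)
    }
}

fn main() {
    let json_text = String::from("hello");
    // Borrowed: points into `json_text`, no extra allocation.
    let borrowed = A { v: Cow::Borrowed(json_text.as_str()) };
    // Owned: holds the unescaped text in its own String.
    let owned = A { v: Cow::Owned(String::from("hel\"lo")) };
    println!("{borrowed}"); // prints A{v: hello}
    println!("{owned}"); // prints A{v: hel"lo}
}
```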


#4

This is yet another ownership issue. You have a JSON string with the value {"v":"hel\"lo"}, so your &str field should point to some memory location whose contents are hel"lo. But who owns that memory? String literals are defined at compile time and embedded directly in the executable binary, so they are owned by the binary itself. A String manages its own heap allocation, so it owns that memory. The Cow<'a, str> suggested above is an enum that is EITHER a &str or a String, so it can own the unescaped data when the original input contains an escape sequence and nobody else actually has hel"lo in memory.
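A rough sketch of where that owned data comes from: `unescape` below is a hypothetical, heavily simplified decoder for the text between the quotes of a JSON string, not serde_json's implementation (it skips `\uXXXX` and error reporting). When there is no backslash it returns a borrow of the input; otherwise the decoded text exists nowhere in memory yet, so it builds a new `String` and the `Cow::Owned` variant becomes its owner.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified decoder for the contents of a JSON string.
/// Only \" \\ \/ \n \t are handled; anything else is kept verbatim.
fn unescape(raw: &str) -> Cow<'_, str> {
    // No backslash: the decoded value equals the input, so just borrow it.
    if !raw.contains('\\') {
        return Cow::Borrowed(raw);
    }
    // Otherwise build the decoded text in a fresh String that the Cow will own.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            Some('/') => out.push('/'),
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            // Unsupported escape: keep it as-is (a real parser would report an error).
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Trailing lone backslash: keep it as-is.
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}

fn main() {
    for raw in ["hello", r#"hel\"lo"#] {
        match unescape(raw) {
            Cow::Borrowed(s) => println!("borrowed from input: {s}"), // prints "borrowed from input: hello"
            Cow::Owned(s) => println!("owned by the Cow: {s}"), // prints `owned by the Cow: hel"lo`
        }
    }
}
```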
