How to avoid additional char escape from stdin?

Hello,

I have a file (actually, many files) containing serialized JSON objects, one object per line, in a format like this:

"{\"tm\":1540400396,\"b\":\"s297188\",\"i\":17776}"

Then I try to pass them to a Rust program (this is only part of it):

use std::io::{self, BufRead};

fn main() {
    let stdin = io::stdin();
    for line in stdin.lock().lines().filter_map(Result::ok) {
        print!("{:?}", line); // Actually call to deserialization fn here
    }
}

I get a String in which some characters are additionally escaped, like this:

"\"{\\\"tm\\\":1540400396,\\\"b\\\":\\\"s297188\\\",\\\"i\\\":17776}\""

To parse that line I now have to make two calls to serde_json::from_str: first to get the unescaped String, and then to deserialize that String into a struct. Using serde just to undo the extra escaping doesn't feel natural to me. Is there another (faster or better) way to avoid the extra escape characters in the input?

What you're seeing here is just your input string, printed the way you would write it as a Rust string literal. It still contains exactly what you showed in your first code block, i.e. a JSON object inside a (probably also JSON) string. Since the input is in that format (it looks like someone messed up when producing that file), you can't deserialize it in one step.

If you change your print line to use {} (Display) formatting instead of {:?} (Debug) formatting, it will not escape special characters:

print!("{}", line);
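As a small illustration of the difference, here is a sketch that formats the same text both ways. The helper names `as_display` and `as_debug` are made up for this example, and the sample line is the one from the question. Only Debug formatting adds the surrounding quotes and the extra backslashes; the String itself is identical in both cases.

```rust
/// Hypothetical helper: formats the text with Display, i.e. as-is.
fn as_display(s: &str) -> String {
    format!("{}", s)
}

/// Hypothetical helper: formats the text with Debug, i.e. as a Rust
/// string literal, with quotes and backslashes escaped.
fn as_debug(s: &str) -> String {
    format!("{:?}", s)
}

fn main() {
    // One line exactly as it appears in the input file.
    let line = r#""{\"tm\":1540400396,\"b\":\"s297188\",\"i\":17776}""#;

    // Display prints the line unchanged.
    println!("Display: {}", as_display(line));
    // Debug prints the same contents with an additional layer of escaping.
    println!("Debug:   {}", as_debug(line));

    // Display does not alter the text.
    assert_eq!(as_display(line), line);
}
```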

Making two calls seems perfectly reasonable to me, assuming that the string itself really is encoded as a JSON string (i.e. that the escape syntax used precisely matches the escape codes available in JSON).

To my understanding, serde_json is highly optimized, and deserializing into a String does very little work beyond validating the outer " and unescaping the contents. (It might parse an integer or a bool if you give it one, just for better diagnostics, but that's about it.)

Yes, but with a slightly modified version (with serde_json added) I get:

Err(Error("invalid type: string \"{\\\"tm\\\":1540400396,\\\"b\\\":\\\"s297188\\\",\\\"i\\\":17776}\", expected struct T", line: 1, column: 51))

So it seems the extra escape characters are present but not displayed, right?

extern crate serde_json;
#[macro_use]
extern crate serde_derive;

use std::io::{self, BufRead};

#[derive(Debug, Deserialize)]
struct T {
    tm: u32,
    b: String,
    i: u32,
}

fn main() {
    let stdin = io::stdin();
    for line in stdin.lock().lines().filter_map(Result::ok) {
        print!("{:?}", serde_json::from_str::<T>(&line));
    }
}

Agreed, but I was talking about code like this:

extern crate serde_json;
#[macro_use]
extern crate serde_derive;

use std::io::{self, BufRead};

#[derive(Debug, Deserialize)]
struct T {
    tm: u32,
    b: String,
    i: u32,
}

fn main() {
    let stdin = io::stdin();
    for line in stdin.lock().lines().filter_map(Result::ok) {
        let tmp: String = serde_json::from_str::<String>(&line).unwrap();
        print!("{:?}", serde_json::from_str::<T>(&tmp));
    }
}

So if I remove tmp here and pass line to the second serde_json::from_str call, I get this error:

Err(Error("invalid type: string \"{\\\"tm\\\":1540400396,\\\"b\\\":\\\"s297188\\\",\\\"i\\\":17776}\", expected struct T", line: 1, column: 51))

which is not what I expect. Can this code be simplified? I'd prefer to deserialize the data in one pass, not two.

Your data is simply double-serialized, so you need two deserialization passes. Serde doesn't support this out of the box. One could probably hack something up, but it certainly won't be any simpler than a second call to from_str.

I see, thank you.