[serde]: Deserialization Lifetimes

Hello all,

I am trying to 'manually' parse HTTP requests, using nothing more than serde_json and std. I am doing this for learning purposes; I am not looking for crate suggestions.

Issue

How can I handle the lifetimes in the following code, without just placing everything in the same scope?

Relevant Code

pub fn parse_request_header(request_string: String) -> Result<Request, ()> {
    if let Some(line) = request_string.lines().next() {
        match handle_method(line) {
            Methods::Get => {
                let route = get_route(line);
                if let Some(ref params) = get_query_params(line) {
                    match route.as_str() {
                        "/test" => {
                            Ok(Request::Test(deserialize::<Test>(params)))
                        }
...
fn get_query_params<'a, T>(s: &'a str) -> Option<T>
where
    T: Deserialize<'a>,
{
    if let Some(i) = s.find('?') {
        let json_string = create_json_from_query(&s[i + 1..]);
        match serde_json::from_str(json_string.as_str()) {
            Ok(serd) => Some(serd),
            Err(_) => None,
        }
    } else {
        None
    }
}
fn create_json_from_query(query: &str) -> String {
    let kv = query
        .split('&')
        .map(|q| {
            let mut x = q.split('=');
            let k = x.next().expect("can parse key");
            let v = x.next().expect("can parse value");
            // JSON object keys must be quoted strings; the value is passed through as-is.
            format!("\"{k}\": {v}")
        })
        .reduce(|acc, curr| format!("{acc}, {curr}"))
        .expect("can join iter");
    format!("{{ {kv} }}")
}

Error

`json_string` does not live long enough
borrowed value does not live long enough

Example Request

GET /test?my_key=1234 HTTP

I understand why the error exists, but I am unsure how to do what I want to do.

Link to Rust Playground Example

Any help is appreciated.

If you're deserializing from temporary data [1], you should use the DeserializeOwned trait in your trait bound, instead of Deserialize.

Changing the signature of get_query_params to this makes your code compile:

fn get_query_params<T>(s: &str) -> Option<T>
where
    T: DeserializeOwned,

There's more info on the difference between the two traits here: Deserializer lifetimes · Serde


  1. Which you are, in this case, because create_json_from_query allocates a new String, which is then dropped at the end of get_query_params.
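
The same ownership rule can be seen without serde. Below is a minimal std-only sketch, offered as an analogy rather than as serde's mechanism: `query_value` is a hypothetical helper, and std's `FromStr` stands in for `DeserializeOwned` because it has no lifetime parameter, so its output can never borrow from the input.

```rust
use std::str::FromStr;

// Hypothetical helper (not from the thread): extracts one query value from a
// request line such as "GET /test?my_key=1234 HTTP".
fn query_value<T: FromStr>(request_line: &str, key: &str) -> Option<T> {
    let target = request_line.split_whitespace().nth(1)?;
    let (_path, query) = target.split_once('?')?;
    for pair in query.split('&') {
        if let Some((k, v)) = pair.split_once('=') {
            if k == key {
                // Temporary, locally owned data: dropped when this function returns.
                let decoded: String = v.replace('+', " ");
                // Fine: `T: FromStr` cannot hold a reference into `decoded`.
                return decoded.parse::<T>().ok();
            }
        }
    }
    None
}

fn main() {
    let line = "GET /test?my_key=1234 HTTP";
    assert_eq!(query_value::<u32>(line, "my_key"), Some(1234));
    assert_eq!(query_value::<u32>(line, "missing"), None);
}
```

Returning something that borrows from `decoded` (for example a `&str` slice of it) would be rejected for the same reason as the original `json_string` error.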

@17cupsofcoffee Thank you very much.

I had read most of that chapter in the Serde book, but clearly misunderstood the use of the DeserializeOwned trait.

I do wonder if this is the "recommended" approach?

I am still trying to wrap my head around how serde can provide strict typing while still referencing the slice data... one day.

Yes.

I don't understand why those two would be in tension with each other. Rust is a statically-typed language, so all code, including libraries, has to be statically typed, and the conventions and idioms of the ecosystem are such that the type system is used to catch as many mistakes as possible.

This by no means contradicts borrowing from a slice if that is possible. (It is not always possible.)

I suppose my awe comes from the ability to perform post-compilation type checks, as the data serde is getting is dynamically generated. I realise this is not entirely accurate, as the program still panics, when given data that does not match the given types.

Specifically, it is a mystery to me how serde_json::from_str does this:

Zero-copy deserialization means deserializing into a data structure, like the User struct above, that borrows string or byte array data from the string or byte array holding the input. This avoids allocating memory to store a string for each individual field and then copying string data out of the input over to the newly allocated field. Rust guarantees that the input data outlives the period during which the output data structure is in scope, meaning it is impossible to have dangling pointer errors as a result of losing the input data while the output data structure still refers to it.

A high level (probably slightly oversimplified) explanation - the 'de lifetime on Deserialize is tied to the input data, via some clever trait bounds.

When you derive Deserialize on a type that contains references, the generated implementation will have lifetime bounds on it to ensure that 'de outlives them - for example:

// This is a simplified version of an example from the Serde docs:

#[derive(Deserialize)]
struct S<'a> {
    a: &'a str,
}

// The generated code: ("'de: 'a" means "'de outlives 'a")

impl<'de: 'a, 'a> Deserialize<'de> for S<'a> {
    /* ... */
}

serde_json::from_str just calls T::deserialize under the hood, with 'de tied to the lifetime of the input &str.
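
The `'de: 'a` relationship can be illustrated with a hand-written, std-only sketch. Everything here is an assumption made for illustration: `parse_s` and its toy `a=...` format are hypothetical and only mimic the shape of the bound that the derive generates.

```rust
// Same shape as the struct above: it holds a borrowed string slice.
#[derive(Debug, PartialEq)]
struct S<'a> {
    a: &'a str,
}

// `'de: 'a` reads "'de outlives 'a": the input must stay alive at least as
// long as the `S` that points into it.
fn parse_s<'de: 'a, 'a>(input: &'de str) -> Option<S<'a>> {
    // Hypothetical toy format: whatever follows "a=" becomes the field.
    let value = input.strip_prefix("a=")?;
    Some(S { a: value }) // a sub-slice of `input`, no allocation
}

fn main() {
    let input = String::from("a=hello");
    let s = parse_s(&input).expect("input starts with a=");
    assert_eq!(s.a, "hello");
    // Same address as the bytes inside `input`: nothing was copied.
    assert_eq!(s.a.as_ptr(), input[2..].as_ptr());

    // Dropping `input` while `s` is still in use is rejected at compile time:
    // drop(input);
    // println!("{}", s.a);
    println!("{:?}", s);
}
```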

1 Like

I'm still not sure I understand the question. The lifetime of the string is determined at compile time by the compiler. The contents of the string are checked by the deserializer, at runtime. Neither of these depends on the other. The lifetime of a value is determined by the lexical structure of the code (what scope it is declared in, what other variables it borrows from, etc), and this is not influenced by what exact bits happen to make it up once the program is running.
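
A small std-only sketch of that separation, with a hypothetical `key_of` function: the borrow relationship is fixed by the signature and checked at compile time, while the contents are validated at runtime and reported as an `Err` value.

```rust
// The signature alone fixes the borrow relationship: the returned `&str`
// borrows from `pair`. The compiler checks this without knowing the contents.
fn key_of(pair: &str) -> Result<&str, String> {
    // Whether `pair` really looks like "key=value" is only known at runtime.
    match pair.split_once('=') {
        Some((key, _value)) if !key.is_empty() => Ok(key),
        _ => Err(format!("malformed pair: {pair:?}")),
    }
}

fn main() {
    assert_eq!(key_of("my_key=1234"), Ok("my_key"));
    assert!(key_of("no-equals-sign").is_err());
}
```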

For the most part, I have the answer to my questions.

Thank you, both.


Ignore my original statements. For the most part, I am impressed by Serde's "zero-copy deserialization" - I did not even think this was a thing, until I read the docs, and realised why I was getting the lifetime error.

I just assumed that, like my create_json_from_query implementation, Serde simply copied the data, and would only need the &str to live as long as the "scope" of the serde_json::from_str function.

Indeed! I personally would have preferred if 'de had conventionally been named 'input:

…<'input> … <Output> …
where
    // an instance of type `Output` can be deserialized off input borrowed for the `'input` duration.
    Output : Deserialize<'input>,

In this example, @Sky020, since <'input> is in scope at <Output>'s introduction, Output can name 'input.
That is, Output may, itself, be of the form &'input str:

  • impl<'input> Deserialize<'input> for &'input str {
    

Whereas DeserializeOwned is a way to require that the impl for Output necessarily be of the form:

  • //           Don't. Care.     And thus, `MyThing` is guaranteed
    //               vv           not to borrow from the `'input`.
    impl Deserialize<'_> for MyThing {
    

That is, you could see DeserializeOwned as Deserialize<'_>, with an explicitly "discarded" / dismissed mention of Deserialize's lifetime parameter.

  1. T : Deserialize<'_> is not valid Rust (and even if it were, it would be a bit ambiguous, exact-semantics-wise);
  2. But for our desired semantics, it can actually be written as:
    T : for<'any> Deserialize<'any>
    
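  3. Since that looks a bit complex for less seasoned rustaceans, serde offers what is effectively a convenience trait alias (pseudo-code, since trait aliases are not stable Rust; serde gets the same effect with a supertrait bound plus a blanket impl):
    trait DeserializeOwned = for<'any> Deserialize<'any>;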
    
    So that T : DeserializeOwned stands for T : for<'any> Deserialize<'any>, i.e.,
    "T : Deserialize<'_>" in pseudo-code.

Then, back to your example, as @17cupsofcoffee put it, since create_json_from_query yields a (locally-)owned String, which is thus locally dropped before the function returns, that 'input(-to-the-deserialization) lifetime we were talking about has necessarily expired by the time the function returns.

Thus, your T needs to "deserialize from the input in a lifetime-agnostic / non-borrowing fashion", i.e., it needs to be "T : Deserialize<'_>".
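
To make the `for<'any>` bound concrete, here is a self-contained sketch that uses hypothetical stand-ins rather than serde itself: `Decode<'input>` plays the role of `Deserialize<'de>`, and `DecodeOwned` is built with a supertrait bound plus a blanket impl, the same way serde builds `DeserializeOwned`. The names and the toy "decoding" are assumptions for illustration only.

```rust
/// Hypothetical stand-in for serde's `Deserialize<'de>`.
trait Decode<'input>: Sized {
    fn decode(input: &'input str) -> Option<Self>;
}

// Borrowing impl: the output is tied to `'input`.
impl<'input> Decode<'input> for &'input str {
    fn decode(input: &'input str) -> Option<Self> {
        Some(input)
    }
}

// Owned impls: valid for any input lifetime, since nothing is borrowed.
impl<'input> Decode<'input> for u32 {
    fn decode(input: &'input str) -> Option<Self> {
        input.trim().parse().ok()
    }
}

impl<'input> Decode<'input> for String {
    fn decode(input: &'input str) -> Option<Self> {
        Some(input.to_owned())
    }
}

/// Stand-in for `DeserializeOwned`: "decodable from input of any lifetime".
trait DecodeOwned: for<'any> Decode<'any> {}
impl<T> DecodeOwned for T where T: for<'any> Decode<'any> {}

// The input is a temporary `String`, so `T` must not borrow from it.
fn decode_from_temp<T: DecodeOwned>(query: &str) -> Option<T> {
    let temp: String = query.replace('+', " ");
    T::decode(&temp)
}

fn main() {
    assert_eq!(decode_from_temp::<u32>("42"), Some(42));
    assert_eq!(decode_from_temp::<String>("a+b"), Some(String::from("a b")));

    // Borrowing still works when the input outlives the output.
    let input = String::from("hello");
    let borrowed: Option<&str> = Decode::decode(input.as_str());
    assert_eq!(borrowed, Some("hello"));

    // Does not compile: `&str` implements `Decode<'input>` for one specific
    // lifetime only, so it is not `DecodeOwned`.
    // let bad: Option<&str> = decode_from_temp("x");
}
```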
