Zero-copy deserialization in generic function

I'm trying to deserialize a generic type T in a function without requiring T: DeserializeOwned, only T: Deserialize; that is, I would like T to be allowed to borrow from the input. However, I don't see a way to make this work. Consider this code:

use std::fmt;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Foo<T> {
    #[allow(dead_code)]
    bar: T,
}

fn main() {
    deserialize_and_print();
    deserialize_generic_and_print::<&str>();
}

fn deserialize_and_print() {
    let s = r#"{ "bar": "baz" }"#.to_string();
    let foo: Foo<&str> = serde_json::from_str(&s).unwrap();
    println!("{foo:?}");
}

fn deserialize_generic_and_print<T: fmt::Debug + Deserialize>() {
    let s = r#"{ "bar": "baz" }"#.to_string();
    let foo: Foo<T> = serde_json::from_str(&s).unwrap();
    println!("{foo:?}");
}

The non-generic function deserialize_and_print works perfectly fine, but this code does not compile because the Deserialize bound in deserialize_generic_and_print is missing a lifetime.

But how can I fill out this lifetime? I don't want the caller to choose a lifetime and it certainly can't just be any lifetime. The way I think about it, I want to specify the lifetime of a local variable of the function, i.e. the lifetime of s. But I'm not aware of any way to do that. Conceptually I want to write this basically:

// Note the lack of a generic lifetime parameter.
fn deserialize_generic_and_print<T: fmt::Debug + Deserialize<'s>>() {
    's: {
        let s = r#"{ "bar": "qux" }"#.to_string();
        let foo: Foo<T> = serde_json::from_str(&s).unwrap();
        println!("{foo:?}");
    }
}

If I try to just fill in a generic lifetime like this...

fn deserialize_generic_and_print<'s, T: fmt::Debug + Deserialize<'s>>() {
    let s = r#"{ "bar": "qux" }"#.to_string();
    let foo: Foo<T> = serde_json::from_str(&s).unwrap();
    println!("{foo:?}");
}

I get "borrowed value does not live long enough" and "s dropped here while still borrowed" (at the end of the function), because the caller could in principle choose a lifetime that is longer than the lifetime of s. But is there no way to restrict the caller so that it cannot choose any lifetime longer than that of a local variable of the function?

I'd like to know whether I am running up against some kind of misunderstanding or fundamental limitation of the borrow checker, or whether the issue is a lack of a way to specify that "local variable lifetime".

I think you understand the problem correctly.

The only way for the caller to specify "local variable lifetimes" is via a higher-ranked trait bound that says something about all lifetimes.[1]

where
    for<'a> [Some bound that can involve 'a]

If you limit yourself to a particular type constructor -- say &_ -- then you can write the required bounds directly...

fn deserialize_generic_and_print<U>() 
where
    U: ?Sized + fmt::Debug,
    for<'a> &'a U: Deserialize<'a>,
{ /* ... */ }

// This could probably be done with having a reference as a field,
// but I went with `&Foo<_>` being the target transient type instead
#[repr(transparent)]
struct Foo<T: ?Sized> {
    bar: T,
}

impl<'a, T: ?Sized> From<&'a T> for &'a Foo<T> {
    // requires a dash of unsafe, see playground
}

impl<'a, U> Deserialize<'a> for &'a Foo<U> where
    U: ?Sized, &'a U: Deserialize<'a>,
{ /* ... */ }

...but this is not a complete solution, because it only works with references -- not other type constructors, and also not DeserializeOwned types like String.
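
The reference-only bound can be tried out without serde. The following is a minimal sketch, assuming a hypothetical `Parse<'de>` trait as a stand-in for `Deserialize<'de>`; the trait, its `&str` impl, and `describe_ref_from_local` are illustrative names, not serde APIs. It suggests why the higher-ranked bound helps: the caller names only `U`, and the concrete lifetime is chosen inside the function body, so borrowing from a local is accepted.

```rust
use std::fmt;

/// Hypothetical stand-in for `serde::Deserialize<'de>`: builds a value
/// that is allowed to borrow from `input`.
trait Parse<'de>: Sized {
    fn parse(input: &'de str) -> Option<Self>;
}

/// Zero-copy impl: the result is a sub-slice of the input.
impl<'de> Parse<'de> for &'de str {
    fn parse(input: &'de str) -> Option<Self> {
        Some(input.trim())
    }
}

/// The caller names only `U`. The bound has to hold for every `'a`,
/// which includes the lifetime of the local `s` below.
fn describe_ref_from_local<U>() -> String
where
    U: ?Sized + fmt::Debug,
    for<'a> &'a U: Parse<'a>,
{
    let s = String::from("  baz  ");
    // `value` borrows from `s`; no caller-chosen lifetime is involved.
    let value: &U = Parse::parse(&s).expect("input should parse");
    let described = format!("{value:?}");
    described
}
```

With these assumptions, `describe_ref_from_local::<str>()` compiles and borrows from the local, while `describe_ref_from_local::<String>()` is rejected because `&String` has no `Parse` impl here, which mirrors the limitation described above.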


If Rust had generic type constructors, you could write something like

//    vvvvvvvvvv hypothetical "generic type constructor" feature
fn ex<TyCons<'*>>() // ' (just fixing syntax highlighting)
where
    for<'a> TyCons<'a>: Deserialize<'a>,

But it does not. Instead we emulate them with GATs or parameterized traits.

trait DeserializeMarker {
    type Ty<'a>: Deserialize<'a>;
}

impl DeserializeMarker for str {
    type Ty<'a> = &'a str;
}

fn deserialize_generic_and_print<Marker>() 
where
    for<'a> Marker: DeserializeMarker<Ty<'a>: fmt::Debug>,
{
    let s = r#"{ "bar": "baz" }"#.to_string();
    let foo: Foo<Marker::Ty<'_>> = serde_json::from_str(&*s).unwrap();
    println!("{foo:?}");
}

Unfortunately this also is not a complete solution, as there's no way to generically implement DeserializeMarker for all applicable type constructors. You can extend the playground to handle all references without nested lifetimes and owned types. But every new type constructor you want to support would need a new implementation similar to that of Ref.

But it's the closest thing to a solution I could think of.
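
To make the marker idea concrete without pulling in serde, here is a small sketch under stated assumptions: `Parse<'de>` is a hypothetical stand-in for `Deserialize<'de>`, and `ParseMarker`, `Owned`, and `describe_from_local` are illustrative names. Two details differ from the snippet above: the `Debug` bound is placed on the associated type itself to keep the function signature short, and the marker parameter is `?Sized` so that `str` can serve as a marker.

```rust
use std::fmt;
use std::marker::PhantomData;

/// Hypothetical stand-in for `serde::Deserialize<'de>`.
trait Parse<'de>: Sized {
    fn parse(input: &'de str) -> Option<Self>;
}

impl<'de> Parse<'de> for &'de str {
    fn parse(input: &'de str) -> Option<Self> {
        Some(input.trim())
    }
}

impl<'de> Parse<'de> for String {
    fn parse(input: &'de str) -> Option<Self> {
        Some(input.trim().to_owned())
    }
}

/// A marker stands for the type constructor `'a -> Ty<'a>`.
trait ParseMarker {
    type Ty<'a>: Parse<'a> + fmt::Debug;
}

/// `str` marks the constructor `'a -> &'a str`.
impl ParseMarker for str {
    type Ty<'a> = &'a str;
}

/// Hypothetical marker for owned types: `'a -> T`, ignoring `'a`.
#[allow(dead_code)]
struct Owned<T>(PhantomData<T>);

impl<T> ParseMarker for Owned<T>
where
    T: for<'a> Parse<'a> + fmt::Debug,
{
    type Ty<'a> = T;
}

/// The caller picks the constructor; the lifetime is filled in locally.
fn describe_from_local<M: ParseMarker + ?Sized>() -> String {
    let s = String::from("  baz  ");
    let value: M::Ty<'_> = Parse::parse(&s).expect("input should parse");
    let described = format!("{value:?}");
    described
}
```

Under these assumptions, both `describe_from_local::<str>()` and `describe_from_local::<Owned<String>>()` are accepted. Any other borrowing type constructor still needs its own marker impl, which is the incompleteness noted above.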


More far-future day-dreaming

Another future possibility is if functions could declare "existential lifetimes", where the caller can't name the concrete lifetime and instead the compiler must try to infer it post-monomorphization...

// This hypothetical version only works with reference types though
fn ex_1<'?x, U>() // '
where
    U: ?Sized,
    &'?x U: Deserialize<'?x>
{}

// So it's another version of "can't work with `DeserializeOwned` types"
fn caller_1() { 
    ex_1::<str>();

    // Doesn't work
    // ex_1::<String>();
}

...but to support non-references we need to also have existential types...

fn ex_2<'?x, ?T>() // '
where
    ?T: Deserialize<'?x>

...but that is useless for this use case without some way for the caller to drive inference, even though they can't name the concrete lifetime...

fn caller_2() {
    ex_2::<Foo<&str>>();
    // ?T = Foo<&'?x str>

    // Would also work (?T = String)
    ex_2::<String>();
}

...so I think that just brings us back to another form of generic type constructors.


  1. One could argue "specify" is no longer the correct word, since we're no longer talking about a single concrete lifetime at the call site.


Nit: not only is that not possible, the opposite is currently always assumed. That is, lifetime parameters of a function are always assumed to last for at least the function call itself, which is obviously longer than any of its local variables.


So if I understand correctly, this doesn't sound like an inherent limitation, but just something that Rust doesn't currently allow or provide any mechanism for. Would it be possible to add a new special lifetime variable name (like how 'static is special) to cover this use case? For example it could be called 'local. That lifetime would last until the end of the function, so I could define the function like so:

fn deserialize_generic_and_print<T: fmt::Debug + Deserialize<'local>>() {
    let s = r#"{ "bar": "qux" }"#.to_string();
    let foo: Foo<T> = serde_json::from_str(&s).unwrap();
    println!("{foo:?}");
}

Would there be some issue with that, or could that work? Would it be sensible?


As a followup, what would be a good workaround? The only realistic option I see right now is to manually monomorphize the function, i.e. basically write it for each type that I need. I guess I could use a macro to cut down on the repetition, but it doesn't feel great.

Without knowing what the actual code looks like, it's going to be difficult to provide a "good" solution. The general solution is simple: the input needs to live at least as long as the lifetime in Deserialize; thus something like this will work:

fn foo() {
    deserialize_generic_and_print::<'_, '_, &str>(r#"{ "bar": "baz" }"#.to_owned().as_str());
}
fn deserialize_generic_and_print<'a: 'b, 'b, T: fmt::Debug + Deserialize<'b>>(input: &'a str) {
    let foo: Foo<T> = serde_json::from_str(input).unwrap();
    println!("{foo:?}");
}

Yeah, that's fair. In my situation, I am doing a network call to fetch the JSON, so I am not taking the input as an argument to the function. I am creating (fetching) the input string inside the function (i.e. same as that .to_string call in the dummy code).

So define a separate function that calls deserialize_generic_and_print as my revised example shows.

The specifics of this again depend on the actual structure of the code, but the general idea is that you must have a function/method that takes the input at least indirectly.
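
As a rough illustration of that split, here is a sketch that avoids serde by assuming a hypothetical `Parse<'de>` trait in place of `Deserialize<'de>`; `fetch_body` is a made-up stand-in for the network call, and `describe_parsed` and `fetch_and_describe` are illustrative names. It uses a single lifetime parameter rather than `'a: 'b`, which is enough here because `&'a str` can be shortened at the call site.

```rust
use std::fmt;

/// Hypothetical stand-in for `serde::Deserialize<'de>`.
trait Parse<'de>: Sized {
    fn parse(input: &'de str) -> Option<Self>;
}

impl<'de> Parse<'de> for &'de str {
    fn parse(input: &'de str) -> Option<Self> {
        Some(input.trim())
    }
}

/// Generic part: borrows from `input`, which the caller keeps alive.
fn describe_parsed<'a, T>(input: &'a str) -> String
where
    T: fmt::Debug + Parse<'a>,
{
    let value = T::parse(input).expect("input should parse");
    let described = format!("{value:?}");
    described
}

/// Hypothetical stand-in for the network call that produces the body.
fn fetch_body() -> String {
    String::from("  baz  ")
}

/// Wrapper that owns the buffer and names the concrete target type, so
/// the lifetime of `body` never has to appear in a signature.
fn fetch_and_describe() -> String {
    let body = fetch_body();
    describe_parsed::<&str>(&body)
}
```

The trade-off is that the function owning the buffer has to name the concrete borrowing type itself; only the part that receives the input stays generic.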

For example, I have an HTTP/2 server based on hyper that defines a type JsonData that contains a Bytes. I have another type PostJson that implements Future and returns a JsonData. JsonData has two methods:

    pub fn json<'a: 'b, 'b, T: Deserialize<'b>>(&'a self) -> Result<T, JsonErr> {
        serde_json::from_slice(&self.body)
    }
    pub fn json_owned<T>(&self) -> Result<T, JsonErr>
    where
        for<'a> T: Deserialize<'a>,
    {
        serde_json::from_slice(&self.body)
    }

Here the input is passed (indirectly) via &self.
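
The difference between the two methods can be sketched without hyper or serde. This assumes a hypothetical `Parse<'de>` trait standing in for `Deserialize<'de>` and a made-up `Body` type standing in for JsonData; none of these names come from the real server code. The borrowed variant ties the result to `&self`, while the `for<'a>` variant only accepts types that never borrow, so its result may outlive the body.

```rust
/// Hypothetical stand-in for `serde::Deserialize<'de>`.
trait Parse<'de>: Sized {
    fn parse(input: &'de str) -> Option<Self>;
}

impl<'de> Parse<'de> for &'de str {
    fn parse(input: &'de str) -> Option<Self> {
        Some(input.trim())
    }
}

impl<'de> Parse<'de> for String {
    fn parse(input: &'de str) -> Option<Self> {
        Some(input.trim().to_owned())
    }
}

/// Made-up stand-in for a type that owns the received payload.
struct Body {
    text: String,
}

impl Body {
    /// The result may borrow from `self`, so it cannot outlive `self`.
    fn parse_borrowed<'a, T: Parse<'a>>(&'a self) -> Option<T> {
        T::parse(&self.text)
    }

    /// Only types that parse from any lifetime, i.e. owned types.
    fn parse_owned<T>(&self) -> Option<T>
    where
        for<'a> T: Parse<'a>,
    {
        T::parse(&self.text)
    }
}

/// The owned result is still usable after the body has been dropped.
fn owned_outlives_body() -> String {
    let body = Body { text: String::from("  baz  ") };
    let owned: String = body.parse_owned().expect("input should parse");
    drop(body);
    owned
}

/// The borrowed result has to be used while the body is still alive.
fn borrowed_len() -> usize {
    let body = Body { text: String::from("  baz  ") };
    let borrowed: &str = body.parse_borrowed().expect("input should parse");
    borrowed.len()
}
```

In this sketch, asking `parse_owned` for a `&str` would be rejected, since `&'x str` implements `Parse<'a>` only when `'a` equals `'x`, not for every `'a`.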

An async fn that uses this would look like:

pub async fn post_foo(method: PostJson) -> Result<(), E> {
    let req = method.await?;
    let foo = req.json::<'_, '_, Foo>()?;
    // ⋮
}

See the collapsed "far-future day-dreaming" part of my reply.

"Infer the lifetime" would be more useful than "lasts from first use through all reachable return paths of the function". The latter would be fine for your example... except that T is not nameable from the caller either, so (like I mention in my other reply) that's a new language capability too.

I don't think we're getting such a feature any time soon.