About Higher-Rank Trait Bounds

I've been writing Rust for several years, and I thought I understood how the hardest part of the language, lifetimes, works. But while developing complex projects with many files and modules, I sometimes ran into compilation errors I could not resolve, and the cause was always lifetime checking. At such moments I had to admit that I didn't understand the topic well enough. Surely it can't be that complicated? Here is a specific example; I tried to simplify my real case as much as possible while keeping the gist.

I want to write a generic loader that loads JSON files and caches the parsed structs in self so that the same files are not loaded again. Since it is already responsible for loading, why not also store the buffer that the files are read into? So the struct will be

struct Loader<A> {
    buf: String,
    vec: Vec<A>,
}

Since I'm going to read and parse files, I need an error type

#[derive(Debug)]
enum Error {
    Io(std::io::Error),
    Serde(serde_json::Error),
}

I'll write the load method. It doesn't cache anything yet; it only loads

impl<A> Loader<A> {
    fn load<'a, L, T>(&mut self, name: &str, loader: L) -> Result<A, Error>
    where
        // Higher-order function argument that abstracts the `T` -> `A` conversion logic
        L: FnOnce(T) -> Result<A, Error>,
        T: Deserialize<'a>,
        // Remark:
        // I do not write this explicitly, but I am sure that
        A: 'static,
        // This is important because I need to get the `Result` without any borrowing
    {
        let buf = &mut self.buf; // <- Compile error here
        buf.clear();

        // Build path and read json from file here
        buf.push_str(r#"{ "x": 1.0, "y": 2.0 }"#);
        serde_json::from_str(buf.as_str())
            .map_err(Error::Serde)
            .and_then(loader)
    }
}


Compilation fails because the compiler cannot infer an appropriate lifetime. The reason is easy to see if you look at the definition of serde_json::from_str

pub fn from_str<'a, T>(s: &'a str) -> Result<T>
where
    T: de::Deserialize<'a>,
// ..

OK. It looks like I just need to name my buffer's lifetime. But the buffer lives in self, so does that mean I need to put the lifetime on the self reference? I'll try

fn load<'a, L, T>(&'a mut self, name: &str, loader: L) -> Result<A, Error>
// ..

Now it works! For now...
Yes, this code compiles, but it can block all further development. What if I want to write a method that calls load internally? For example, now I'll use the vec

fn load_and_save<'a, L, T>(&'a mut self, name: &str, loader: L) -> Result<(), Error>
where
    L: FnOnce(T) -> Result<A, Error>,
    T: Deserialize<'a>,
{
    let a = self.load(name, loader)?;
    self.vec.push(a);
    Ok(())
}

Then I get the message "cannot borrow self.vec as mutable more than once at a time".

But why? I only called another method, one that returns a value with a static lifetime. Yet for some reason, after calling it, I can no longer use self. My intuition about lifetimes has failed me.

Thinking harder about this code, it seems I'm missing something. Going back to the load method: why specify a lifetime on its input arguments at all? In fact, it should accept input of any lifetime, because the output is just a value with a static lifetime. If you simplify the signature, you get something like

fn to_string(s: &str) -> String

Something that borrows data and returns data by value. The problem is that I have to tell the compiler this somehow, and I can't just omit the 'a parameter in

    T: Deserialize<'a>,

For example like this

    T: Deserialize<'_>,
    // or
    T: Deserialize,

In this case the compiler complains and suggests introducing a named lifetime parameter. If only I could create a temporary local parameter that applied just to the Deserialize bound. Come to think of it, I have seen something like this

    T: for<'a> Deserialize<'a>,

Then the load signature will be simply

fn load<L, T>(&mut self, name: &str, loader: L) -> Result<A, Error>
where
    L: FnOnce(T) -> Result<A, Error>,
    T: for<'a> Deserialize<'a>,
// ..

And it works great! It would have solved all my problems had I known about it earlier, and I suspect not only mine. I'm sure many people will find this feature useful, especially those who like to program in a functional style. As it happened, I didn't know about it; I had only occasionally seen it in random code snippets without understanding what it was or why it exists. The only short description I found of these "Higher-Rank Trait Bounds" was in the Rustonomicon. But I haven't read the Rustonomicon yet; I planned to read it only once I finally understood safe Rust. Why is HRTB in the Rustonomicon if it's regular code without a single line of unsafe? Is it really some kind of secret knowledge available only to those who know the true essence of Rust? It is an invaluable tool that puts a lot of things in their place, answers many questions, and solves many everyday problems. It's much better than writing &'a mut self and trapping yourself without understanding what's going on. Perhaps beginner books should teach it and pay more attention to it? Or could the compiler be taught to suggest this solution?
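To make the higher-rank bound concrete without depending on serde, here is a minimal std-only sketch. The `Parse<'a>` trait, the `Point` type, and the comma-separated input format are hypothetical stand-ins for `Deserialize<'a>`, a deserialized struct, and JSON; they are not part of the original code. The point it illustrates is that with `T: for<'a> Parse<'a>` the borrow of `self.buf` ends inside `load`, so `load_and_save` is free to touch `self.vec` afterwards.

```rust
// Hypothetical stand-in for serde's `Deserialize<'a>`: a type that can be
// built from a string slice borrowed for `'a`.
trait Parse<'a>: Sized {
    fn parse(input: &'a str) -> Result<Self, String>;
}

// An owned type: it can be parsed from input of any lifetime.
struct Point {
    x: f64,
    y: f64,
}

impl<'a> Parse<'a> for Point {
    fn parse(input: &'a str) -> Result<Self, String> {
        // Assumed toy format: "x, y"
        let mut parts = input.split(',').map(|p| p.trim().parse::<f64>());
        match (parts.next(), parts.next()) {
            (Some(Ok(x)), Some(Ok(y))) => Ok(Point { x, y }),
            _ => Err(format!("bad point: {}", input)),
        }
    }
}

struct Loader<A> {
    buf: String,
    vec: Vec<A>,
}

impl<A> Loader<A> {
    // No lifetime parameter on `self`: the HRTB says `T` can be parsed from
    // a borrow of any lifetime, including the short local borrow of `buf`.
    fn load<L, T>(&mut self, contents: &str, convert: L) -> Result<A, String>
    where
        L: FnOnce(T) -> Result<A, String>,
        T: for<'a> Parse<'a>,
    {
        self.buf.clear();
        self.buf.push_str(contents);
        T::parse(self.buf.as_str()).and_then(convert)
    }

    // Calling `load` and then mutating `self.vec` is accepted, because the
    // value returned by `load` does not keep `self` borrowed.
    fn load_and_save<L, T>(&mut self, contents: &str, convert: L) -> Result<(), String>
    where
        L: FnOnce(T) -> Result<A, String>,
        T: for<'a> Parse<'a>,
    {
        let a = self.load(contents, convert)?;
        self.vec.push(a);
        Ok(())
    }
}
```

As a usage example, calling `loader.load_and_save("1.0, 2.0", |p: Point| Ok(p.x + p.y))` on a `Loader<f64>` stores the sum in `vec`; the closure's argument annotation is what lets the compiler infer `T`.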

I admit I could easily have missed something. If there is a more suitable solution to this problem, I would be glad to see it. I really do understand those who find Rust difficult. After all, lifetimes are the hardest and most important topic in learning the language, and HRTB helps clarify a lot.

In Serde's case you can use DeserializeOwned as a shortcut for for<'a> Deserialize<'a>. And it's also worth mentioning that for<'a> Deserialize<'a> has limitations when deserializing data in a function and then storing it temporarily, as discussed in this other thread.

The Rustonomicon isn't just for unsafe code, despite what it advertises; it's more of a place for topics too advanced for the book, written in a tutorial style unlike the Reference. Most of that is unsafe code, but it also covers features like variance, dropck, the details of coercions, et cetera. This could be documented better though - maybe you could open an issue on rust-lang/nomicon?


The reference also has a brief section on HRTB, just not using that initialism:
https://doc.rust-lang.org/reference/trait-bounds.html#higher-ranked-trait-bounds


For the record, serde itself has pretty decent documentation about this. Look at the docs for Deserialize. There’s a big “Lifetime” heading that mostly just links to a dedicated page called “Understanding deserializer lifetimes”. On there, you’ll find the answer to your problem:

Trait bounds

There are two main ways to write Deserialize trait bounds, whether on an impl block or a function or anywhere else.

  • <'de, T> where T: Deserialize<'de> This means "T can be deserialized from some lifetime." The caller gets to decide what lifetime that is. Typically this is used when the caller also provides the data that is being deserialized from, for example in a function like serde_json::from_str. In that case the input data must also have lifetime 'de, for example it could be &'de str.
  • <T> where T: DeserializeOwned This means "T can be deserialized from any lifetime." The callee gets to decide what lifetime. Usually this is because the data that is being deserialized from is going to be thrown away before the function returns, so T must not be allowed to borrow from it. For example a function that accepts base64-encoded data as input, decodes it from base64, deserializes a value of type T, then throws away the result of base64 decoding. Another common use of this bound is functions that deserialize from an IO stream, such as serde_json::from_reader.

To say it more technically, the DeserializeOwned trait is equivalent to the higher-rank trait bound for<'de> Deserialize<'de>. The only difference is DeserializeOwned is more intuitive to read. It means T owns all the data that gets deserialized.

Note that <T> where T: Deserialize<'static> is never what you want. Also Deserialize<'de> + 'static is never what you want. Generally writing 'static anywhere near Deserialize is a sign of being on the wrong track. Use one of the above bounds instead.

Since your use case doesn’t have any &'de str input arguments (or anything comparable), it’s reasonable to try out the T: DeserializeOwned bound, even without understanding everything that this page tries to explain here. And it turns out it works for your example without any further complications or the need to even mention any lifetime explicitly.
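The difference between the two bounds quoted above can be sketched with a std-only example. Everything here is an assumption made for illustration: `Parse<'de>` is a hypothetical stand-in for `Deserialize<'de>`, `Word` plays the role of a borrowing type, `Len` plays the role of an owned type, and `from_borrowed` / `from_temporary` are made-up names mirroring the roles of `serde_json::from_str` and a function that decodes into a temporary buffer.

```rust
// Hypothetical stand-in for `Deserialize<'de>`.
trait Parse<'de>: Sized {
    fn parse(input: &'de str) -> Option<Self>;
}

// A borrowing type: it implements `Parse<'de>` only for its own lifetime,
// so it does NOT satisfy `for<'de> Parse<'de>`.
struct Word<'a>(&'a str);

impl<'de> Parse<'de> for Word<'de> {
    fn parse(input: &'de str) -> Option<Self> {
        input.split_whitespace().next().map(Word)
    }
}

// An owned type: it implements `Parse<'de>` for every `'de`.
struct Len(usize);

impl<'de> Parse<'de> for Len {
    fn parse(input: &'de str) -> Option<Self> {
        Some(Len(input.len()))
    }
}

// "Some lifetime": the caller supplies the data and picks `'de`,
// so the result may borrow from the input.
fn from_borrowed<'de, T>(input: &'de str) -> Option<T>
where
    T: Parse<'de>,
{
    T::parse(input)
}

// "Any lifetime": the callee parses from a temporary buffer that is dropped
// before returning, so `T` must not be able to borrow from it.
fn from_temporary<T>(raw: &str) -> Option<T>
where
    T: for<'de> Parse<'de>,
{
    let decoded = raw.trim().to_uppercase(); // stands in for base64 decoding
    T::parse(decoded.as_str())
}
```

With these definitions, `from_borrowed::<Word>` and `from_temporary::<Len>` are accepted, while `from_temporary::<Word>` is rejected by the compiler because `Word` cannot outlive the temporary buffer.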


Thank you, this is what I needed. The reason I didn't use DeserializeOwned is that I sometimes need to deserialize non-owning structures. But now I understand that it is the same as for<'a> Deserialize<'a>.

I realized that the problem is different. I'll create a new thread.

