Trouble with lifetimes in serde format implementation

Hello. I'm implementing the Redbin format for serde. So far it has gone fairly well, even though I'm new to Rust. However, I've been struggling with lifetimes for four days now while trying to implement deserialization into &str. I'd be grateful for any guidance.

My idea is to pass an empty Vec<u8> to the deserializer so that it can outlive deserialization, together with the slices created from the input. I cannot just return slices of the input, because it has to be transcoded. Is there perhaps a better approach?

I managed to distill a minimal example from my code that illustrates the error (run cargo test to see it):


////////////////////////////////////////////////////////////////////////////////
/// Redbin format minimal example:

pub struct Deserializer<'de> {
    input: &'de [u8],
    buf: &'de mut Vec<u8>,
}

impl<'de> Deserializer<'de> {

    fn parse_string(&mut self) -> Result<String> {
        Ok(String::from("transcoded from Deserializer::input"))
    }
}


impl<'de, 'a: 'de> DeDeserializer<'de> for &'a mut Deserializer<'de> {

    fn deserialize_str<V>(self, visitor: V) -> Result<V::Value>
    where
        V: Visitor<'de>,
    {
        let start = self.buf.len();
        let mut bytes = self.parse_string()?.into_bytes();
        self.buf.append(&mut bytes);
        let b: &'de [u8] = &self.buf[start..];
        visitor.visit_borrowed_str(unsafe { std::str::from_utf8_unchecked(b) })
    }

    fn deserialize_string<V>(self, visitor: V) -> Result<V::Value>
    where
        V: Visitor<'de>,
    {
        visitor.visit_string(self.parse_string()?)
    }

    fn deserialize_map<V>(self, visitor: V) -> Result<V::Value>
    where
        V: Visitor<'de>,
    {
        let value = visitor.visit_map(BlockData::new(self, 0))?;
        Ok(value)
    }

}

struct BlockData<'a, 'de: 'a> {
    de: &'a mut Deserializer<'de>,
    elements: i32,
}

impl<'a, 'de> BlockData<'a, 'de> {
    fn new(de: &'a mut Deserializer<'de>, len: i32) -> Self {
        BlockData { de, elements: len }
    }
}

impl<'de, 'a: 'de> MapAccess<'de> for BlockData<'a, 'de> {

    fn next_value_seed<V>(&mut self, seed: V) -> Result<V::Value>
    where
        V: DeserializeSeed<'de>,
    {
        let v = seed.deserialize(&mut *self.de)?;
        Ok(v)
    }
}

////////////////////////////////////////////////////////////////////////////////
/// Extracted from serde sourcecode:

pub trait DeDeserializer<'de>: Sized {

    fn deserialize_str<V>(self, visitor: V) -> Result<V::Value>
    where
        V: Visitor<'de>;

    fn deserialize_string<V>(self, visitor: V) -> Result<V::Value>
    where
        V: Visitor<'de>;

    fn deserialize_map<V>(self, visitor: V) -> Result<V::Value>
    where
        V: Visitor<'de>;
}

pub trait MapAccess<'de> {

    fn next_value_seed<V>(&mut self, seed: V) -> Result<V::Value>
    where
        V: DeserializeSeed<'de>;
}

pub trait Visitor<'de>: Sized {
    type Value;

    fn visit_borrowed_str(self, v: &'de str) -> Result<Self::Value> {
        Err(Error::Dummy)
    }

    fn visit_string(self, v: String) -> Result<Self::Value> {
        Err(Error::Dummy)
    }

    fn visit_map<A>(self, map: A) -> Result<Self::Value> {
        let _ = map;
        Err(Error::Dummy)
    }
}

pub trait DeserializeSeed<'de>: Sized {
    type Value;

    fn deserialize<D>(self, deserializer: D) -> Result<Self::Value>
    where
        D: DeDeserializer<'de>;
}

pub trait Deserialize<'de>: Sized {

    fn deserialize<D>(deserializer: D) -> Result<Self>
    where
        D: DeDeserializer<'de>;
}

pub type Result<T> = std::result::Result<T, Error>;

pub enum Error {
    Dummy
}

My understanding is that the visit_borrowed_str() method is used when the visitor can borrow directly from the original input, therefore letting it store a &'de str instead of needing to make a new allocation. That's obviously incompatible with transcoding because you need to give the visitor access to a mutated string.

As you've mentioned, you are trying to work around this by first parsing into a newly allocated string then copying the string into a buffer and giving the visitor references to part of that buffer. However, what happens when the buffer reaches its capacity and needs to grow? Your visitor would be left with a reference that no longer points to anything useful, which is not going to end well.
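To illustrate the hazard, here is a minimal sketch using only the standard library. The helper append_range is a made-up name for this example and is not part of your code or of serde. It records byte offsets instead of references, then grows the Vec past the capacity it was created with. Whether the allocation actually moves is up to the allocator, so the sketch only prints what it observes.

```rust
use std::ops::Range;

/// Appends `bytes` to `buf` and returns the byte range they occupy.
/// (Hypothetical helper: offsets stay meaningful even if `buf` reallocates.)
fn append_range(buf: &mut Vec<u8>, bytes: &[u8]) -> Range<usize> {
    let start = buf.len();
    buf.extend_from_slice(bytes);
    start..buf.len()
}

fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(4);
    let first = append_range(&mut buf, b"abcd");
    let before = buf.as_ptr();

    // This exceeds the requested capacity, so the Vec most likely has to grow.
    let second = append_range(&mut buf, b"efghijkl");
    let after = buf.as_ptr();

    // If the allocation moved, any slice taken earlier would now dangle.
    println!("buffer moved: {}", before != after);

    // The recorded offsets still resolve to the right bytes either way.
    assert_eq!(&buf[first], &b"abcd"[..]);
    assert_eq!(&buf[second], &b"efghijkl"[..]);
}
```

Note that the borrow checker would reject holding a real &[u8] into buf across the second append, which is essentially the error you are fighting.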

You should actually call visitor.visit_str() instead of visitor.visit_borrowed_str() because there is no way for the string in the deserialized value to keep a reference to the original input.
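Roughly, that could look like the sketch below. The Visitor trait here is a simplified stand-in in the spirit of the traits you extracted, not the real serde trait; the String error type and the transcode function (which just upper-cases ASCII) are placeholders I made up for illustration.

```rust
// Simplified stand-in for serde's Visitor; not the real trait.
trait Visitor<'de>: Sized {
    type Value;

    // Receives a string that is only guaranteed to live for this call.
    fn visit_str(self, v: &str) -> Result<Self::Value, String>;

    // Receives a string borrowed from the input; by default it forwards
    // to the transient version.
    fn visit_borrowed_str(self, v: &'de str) -> Result<Self::Value, String> {
        self.visit_str(v)
    }
}

// A visitor that produces an owned String by copying what it is given.
struct StringVisitor;

impl<'de> Visitor<'de> for StringVisitor {
    type Value = String;

    fn visit_str(self, v: &str) -> Result<String, String> {
        Ok(v.to_owned())
    }
}

// Placeholder for the real transcoding step: upper-cases ASCII bytes.
fn transcode(input: &[u8]) -> String {
    input.iter().map(|b| b.to_ascii_uppercase() as char).collect()
}

fn deserialize_str<'de, V: Visitor<'de>>(
    input: &'de [u8],
    visitor: V,
) -> Result<V::Value, String> {
    // The transcoded string is a local; it is dropped when this function returns.
    let transcoded = transcode(input);
    // That is fine for visit_str, which only borrows it for the duration of the call.
    visitor.visit_str(&transcoded)
}

fn main() {
    let out = deserialize_str(b"hello", StringVisitor).unwrap();
    assert_eq!(out, "HELLO");
}
```

No extra buffer and no lifetime relation between the transcoded string and 'de is needed, because the visitor never gets to keep the reference.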

You might be able to make this work by giving your Deserializer two lifetimes such that buf outlives input. However, if we take a step back we see these buf shenanigans will probably make using the deserializer quite awkward (you need to allocate a Vec on the stack and make sure the deserialized data never outlives it), and it was all pointless anyway because in order to call visitor.visit_borrowed_str() we needed to make two extra allocations/copies (one when transcoding in parse_string(), then again when using buf.append() to copy into the buffer). So by trying to take a zero-copy approach we actually made more allocations and copies.


Thanks a lot for your detailed review! That's the best I could have hoped for. The Rust community is great, despite what people write in many places.

That had crossed my mind, but I ran into problems there too. It seems like the better way to go, so I'll try it.

I thought that Rust handled this somehow, so that ownership would be transferred to the new range automagically.

I actually wrote a version of parse_string() that transcodes directly into buf, but I abandoned it to avoid extra complexity that could make the error harder to pin down.

You were right. But there were also problems with the lifetimes. I had declared relations between lifetimes in three places, and it seems I messed them up: 'a: 'de and 'de: 'a. After removing them and using visitor.visit_str(), everything works in this minimal example. Now I'll try it with the full version.


Nah, Rust goes to great lengths to avoid that sort of magic.

At the end of the day, references are just implemented as pointers and there is no runtime (or code injected at compile time) for tracking who references what or patching up references when something is moved.
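You can see this for yourself with a quick check of the sizes involved. The function name reference_sizes is just something I made up for the example; the calls are all from std::mem.

```rust
use std::mem::size_of;

/// Returns the size in bytes of a thin reference and of a `&str`.
fn reference_sizes() -> (usize, usize) {
    (size_of::<&u8>(), size_of::<&str>())
}

fn main() {
    let (thin, fat) = reference_sizes();

    // A reference to a sized value is exactly one raw pointer wide.
    assert_eq!(thin, size_of::<*const u8>());

    // A &str is a pointer plus a length; there is no room for any
    // bookkeeping about who else points at the data.
    assert_eq!(fat, 2 * size_of::<usize>());

    println!("&u8: {} bytes, &str: {} bytes", thin, fat);
}
```

So when a Vec moves its contents, there is simply nothing in a &str that could be found and updated.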

Actually, it seems it's not possible:

It looks like it's not possible to deserialize into borrowed strings (or byte arrays) if you cannot refer directly to the input bytes. Therefore, I'm leaving it at that. Thanks a lot for your effort!

Implemented only for ASCII strings:

https://github.com/loziniak/redbin/commit/e12301896be66c4b1eb521bac823f1de97cee658
