"Owning" structure of a parsed type. Do I need Pin? How do I get rid of the lifetime?

Hey,

This problem, while not hugely popular, seems common enough, but I couldn't find a satisfying answer online. Maybe I didn't search for the right terms, or I couldn't understand the answers. My goal is the following: I have a byte buffer that I parse into a bunch of slices, similar to the snippet below.

pub struct Parsed<'a> {
    a: &'a [u8],
    b: &'a [u8],
}

pub fn parse<'a>(bytes: &'a [u8]) -> Parsed<'a> {
    Parsed { a: &bytes[..1], b: &bytes[1..] }
}

This is fairly typical when parsing binary formats, since it allows zero-copy parsing. Now I want to combine this Parsed structure with ownership of the bytes so that I can store it, say, in a Vec. I looked at different possibilities and found the owning_ref crate.

I couldn't get the crate to work with my struct, but I also wanted to understand how to actually do it myself. So I came up with:

pub struct OwningParsed<'a> {
    owner: Box<[u8]>,
    reference: *const Parsed<'a>,
}

pub fn parse_owned<'a>(owner: Box<[u8]>) -> OwningParsed<'a> {
    let ptr: *const [u8] = &*owner;
    OwningParsed {
        owner,
        reference: &parse(unsafe { &*ptr }),
    }
}

impl<'a> Deref for OwningParsed<'a> {
    type Target = Parsed<'a>;

    fn deref(&self) -> &Parsed<'a> {
        return unsafe { &*self.reference };
    }
}

Question 1: Do I need Pin<Box<[u8]>>?
From my understanding it's not needed, because OwningParsed is not self-referential (I can move it without problems since the slice is on the heap), and even if the type were Pin<Box<[u8]>>, I could still move it since it's Unpin.

Question 2: Can I define OwningParsed without the lifetime?
The reason is that in this context the lifetime is somewhat meaningless, and it forces anybody who uses this structure to also declare a lifetime. I tried setting it to 'static, which seems a bit hacky, but I had issues implementing Deref.

Thank you in advance for your replies.

No.

As you've noticed with how Unpin would undermine your OwningParsed, Pin doesn't help you here. It's mainly for self-referential types and other very exotic use cases.
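To make the Unpin point concrete, here is a minimal sketch (the function name unpin_round_trip is made up for illustration). Because [u8] is Unpin, safe code can wrap the box in a Pin and take it straight back out, so the Pin adds no guarantee:

```rust
use std::pin::Pin;

/// Hypothetical helper: pins a boxed slice and immediately unpins it again.
/// Both steps are safe code because `[u8]: Unpin`.
pub fn unpin_round_trip(owner: Box<[u8]>) -> Box<[u8]> {
    // `Pin::new` is only available when the pointee is `Unpin`.
    let pinned: Pin<Box<[u8]>> = Pin::new(owner);
    // `Pin::into_inner` has the same requirement; the box comes back
    // out and can be moved or mutated freely afterwards.
    Pin::into_inner(pinned)
}
```

Calling unpin_round_trip(vec![1u8, 2, 3].into_boxed_slice()) hands back the same bytes, with nothing having been "pinned" in any useful sense.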

You could, but there is a better way.

The key insight is that you don't want to be using references in this case, instead you want something where you've got shared access to subsections of a larger owned buffer.

The bytes crate is designed for exactly this sort of thing. It provides a bytes::Bytes type which is kinda like an Arc<[u8]>, except slice operations give you another Bytes which points into part of the original buffer and shares the reference count.

Here is an example:

use bytes::Bytes;

#[derive(Debug)]
pub struct Parsed {
    first: Bytes,
    second: Bytes,
}

pub fn parse(input: impl Into<Bytes>) -> Parsed {
    let input = input.into();
    let first = input.slice(..10);
    let second = input.slice(10..);
    
    Parsed { first, second } 
}
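If it helps to see the idea without the dependency, here is a rough std-only sketch of the same shape. SharedSlice is a made-up stand-in, not how bytes::Bytes is actually implemented: it simply pairs an Arc<[u8]> with a range, so slicing clones the Arc instead of copying bytes.

```rust
use std::ops::{Deref, Range};
use std::sync::Arc;

/// Hypothetical stand-in for `bytes::Bytes`: a shared buffer plus the
/// range of it that this particular view covers.
#[derive(Clone, Debug)]
pub struct SharedSlice {
    buffer: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedSlice {
    pub fn new(data: Vec<u8>) -> Self {
        let buffer: Arc<[u8]> = Arc::from(data);
        let range = 0..buffer.len();
        SharedSlice { buffer, range }
    }

    /// Returns a sub-view; `range` is relative to this view.
    /// Only the reference count is bumped, no bytes are copied.
    pub fn slice(&self, range: Range<usize>) -> Self {
        assert!(range.start <= range.end && range.end <= self.range.len());
        let start = self.range.start + range.start;
        let end = self.range.start + range.end;
        SharedSlice { buffer: Arc::clone(&self.buffer), range: start..end }
    }
}

impl Deref for SharedSlice {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.buffer[self.range.clone()]
    }
}

/// Same split as the original `Parsed`, but with no lifetime parameter.
#[derive(Debug)]
pub struct Parsed {
    pub first: SharedSlice,
    pub second: SharedSlice,
}

pub fn parse(input: SharedSlice) -> Parsed {
    let first = input.slice(0..1);
    let second = input.slice(1..input.len());
    Parsed { first, second }
}
```

Since Parsed owns its views here, it can go into a Vec<Parsed> directly. The real crate is the better choice in practice; this is only meant to show why the lifetime disappears.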

You can use the bytestring crate if you need some sort of string that is backed by a Bytes buffer.

If you need to ask whether you need Pin, then you don't.

Oh, it absolutely is, as far as the borrow checker is concerned. Since memory can be anywhere, and references can point anywhere, borrowck doesn't stand a chance of differentiating between heap allocations and other regions. For the borrow checker, it's as if Box stored its referent inline. Thus, your structure is considered self-referencing.
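A small sketch of the gap between what happens at runtime and what the borrow checker accepts (the function name address_survives_move is made up for illustration). Moving the Box leaves the heap address alone, yet the commented-out lines are still rejected:

```rust
/// Hypothetical helper: checks that moving a `Box<[u8]>` does not move
/// the heap data it points to.
pub fn address_survives_move(owner: Box<[u8]>) -> bool {
    let before: *const u8 = owner.as_ptr();
    // Moving the box copies only the pointer and length.
    let moved = owner;
    let after: *const u8 = moved.as_ptr();
    before == after
}

// The borrow checker does not reason about heap vs. inline storage, so a
// borrow of the box's contents still blocks a move of the box itself:
//
//     let owner: Box<[u8]> = vec![1, 2, 3].into_boxed_slice();
//     let view = &owner[..1];
//     let moved = owner; // error[E0505]: cannot move out of `owner`
//                        // because it is borrowed
//     println!("{:?}", view);
```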

Thanks a lot for the answer.

The bytes crate looks quite interesting and might work in my case, but it seems there are a few problems. Maybe the solution already exists in the crate (I would need to look a bit more), but from the premise of the crate, I don't think so.

The reason is that in this case Parsed now places a requirement on the input type, which is roughly "on the heap and maybe an Arc". Moreover, although this isn't my case, I think it's quite realistic that this structure comes from a library you use. For instance, parsing an X.509 certificate with webpki.

To some extent, the bytes crate lets you create a separate owning version of Parsed alongside the zero-copy one, but at the cost of duplicating the parsing code. That is where it seems to really shine, because it avoids having a bunch of Vec<u8> fields that would each require their own memory allocation.

If that's the case, there is no way to avoid the lifetime and you'll need to create a self-referential type. I would use a crate like ouroboros to define OwningParsed instead of writing your own unsafe code.

#[ouroboros::self_referencing]
struct OwningParsed {
    buffer: Vec<u8>,
    #[borrows(buffer)]
    parsed: Parsed<'this>,
}

fn main() {
  let buffer = vec![..];
  let parsed = OwningParsed::new(buffer, |buf: &Vec<u8>| parse(buf));
}

Indeed, I was a bit too quick to say "if it can move with a memcpy then it's not self-referencing". Of course, Box could (theoretically, at least) change its implementation and break that, so it has to be considered self-referencing.

The ouroboros crate looks appropriate. I wanted to look at its internals, but cargo expand gives an error, so I still need some time.

It turns out that, after searching a bit more, I found this blog post, which has a lot of similarities and insights. In particular, there is a link at the end to a video that covers exactly my problem.

I mentioned something about 'static in my first post, and that is what the blog post linked above does. That said, I had a problem, especially with Deref. I don't think you can implement the Deref trait (maybe someone will correct me here), but I came up with this (start from the code in my first message and add the following):

// replace OwningParsed from the first message with
pub struct OwningParsed {
    // The order of the fields matters here, see blog post.
    reference: Parsed<'static>,
    owner: Box<[u8]>,
}

impl<'a> Parsed<'a> {
    pub fn get_a(&self) -> &'a [u8] {
        self.a
    }
}

impl OwningParsed {
    fn my_deref<'a>(&'a self) -> &'a Parsed<'a> {
        &self.reference
    }
}

That works well. For instance, the following fails to compile, as it should:

let a = {
    let parsed = parse_owned(Box::new([1, 2, 3, 4, 5]));
    parsed.my_deref().get_a()
};

As far as I could tell, the reason the Deref trait doesn't work is that Deref::Target would have to be Parsed<'static>, which would then allow a dangling pointer in the previous example. That is, it would be similar to implementing my_deref as:

// Left the 'a notation for comparison, but they are not needed here.
fn my_deref<'a>(&'a self) -> &'a Parsed<'static> {
    &self.reference
}

Anyway thanks a lot for your help, it was quite insightful.

Yeah, if you are using transmute() to cast the lifetime away it's really important that you limit the scope as much as possible and that the 'static lifetime never makes its way into the public API. A coworker made that mistake and we ended up publishing several versions of code which could trigger a trivial use-after-free segfault.
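For reference, here is one way the pieces discussed so far could fit together while keeping 'static out of the public API. Treat it as a sketch, not a vetted implementation. It deliberately differs from the code above in one respect: instead of an owner: Box<[u8]> field it stores the raw pointer from Box::into_raw and frees it in Drop, so no Box is moved while slices point into it (that design choice is my assumption, not something from the posts above). The accessor names get and get_b are made up.

```rust
pub struct Parsed<'a> {
    a: &'a [u8],
    b: &'a [u8],
}

impl<'a> Parsed<'a> {
    pub fn get_a(&self) -> &'a [u8] {
        self.a
    }

    pub fn get_b(&self) -> &'a [u8] {
        self.b
    }
}

pub fn parse(bytes: &[u8]) -> Parsed<'_> {
    Parsed { a: &bytes[..1], b: &bytes[1..] }
}

pub struct OwningParsed {
    // Private: the fake 'static lifetime must never be handed out as-is.
    reference: Parsed<'static>,
    // Obtained from `Box::into_raw`; released exactly once in `Drop`.
    owner: *mut [u8],
}

pub fn parse_owned(owner: Box<[u8]>) -> OwningParsed {
    let owner: *mut [u8] = Box::into_raw(owner);
    // SAFETY: the allocation stays alive until `OwningParsed` is dropped,
    // it is never mutated, and the 'static lifetime is shortened again
    // before anything leaves this type (see `get`).
    let bytes: &'static [u8] = unsafe { &*owner };
    OwningParsed { reference: parse(bytes), owner }
}

impl OwningParsed {
    /// Shortens 'static to the borrow of `self`, so nothing obtained
    /// through this method can outlive the `OwningParsed`.
    pub fn get<'s>(&'s self) -> &'s Parsed<'s> {
        &self.reference
    }
}

impl Drop for OwningParsed {
    fn drop(&mut self) {
        // SAFETY: `owner` came from `Box::into_raw` and is freed only here.
        // `reference` holds plain shared slices and is not read afterwards.
        unsafe { drop(Box::from_raw(self.owner)) };
    }
}
```

With this, parse_owned(...).get().get_a() borrows from the OwningParsed value, so the "return the slice out of the block" example is still rejected by the compiler. Note that the raw pointer field makes the type neither Send nor Sync. A crate such as ouroboros or yoke remains the safer option.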

In order to have the needed stable addresses, every solution to self-reference in Rust will involve one of:

  • a heap allocation
    • that is owned by the same struct that owns the references
    • or that is leaked
  • Pin (but you would have to write unsafe code to take advantage of Pin's guarantee; I'm not aware of a library that provides Pin-based self-reference safely)

/me tries to figure out how “leaking memory” could ever happen without also involving “a heap allocation”.

:sweat_smile: I was trying to be thorough and the case of a struct owning a Box-like and references into it (what ouroboros does) is different from the case of just using Box::leak and taking references into the result, but yes, both involve heap allocations.
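For completeness, the Box::leak variant needs no unsafe code at all. A minimal sketch, reusing the Parsed and parse from the first post (parse_leaked is a made-up name), with the obvious cost that the buffer is never freed:

```rust
pub struct Parsed<'a> {
    a: &'a [u8],
    b: &'a [u8],
}

impl<'a> Parsed<'a> {
    pub fn get_a(&self) -> &'a [u8] {
        self.a
    }

    pub fn get_b(&self) -> &'a [u8] {
        self.b
    }
}

pub fn parse(bytes: &[u8]) -> Parsed<'_> {
    Parsed { a: &bytes[..1], b: &bytes[1..] }
}

/// Hypothetical helper: gives up ownership of the buffer for good, which
/// makes a genuine `&'static [u8]` available to borrow from.
pub fn parse_leaked(owner: Box<[u8]>) -> Parsed<'static> {
    // `Box::leak` returns a mutable reference valid for any lifetime;
    // here it is downgraded to a shared 'static slice.
    let bytes: &'static [u8] = Box::leak(owner);
    parse(bytes)
}
```

The result is a real Parsed<'static>, so it can be stored in a Vec or exposed publicly without any soundness concern. That is only reasonable when the number of leaked buffers is small and bounded.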

Edited to clarify.

Since nobody's mentioned it yet: this shape (owning version of zero-copy parsed views) is essentially exactly what the yoke crate exists to enable. (Although it's more aimed/directed at the Cow-like usage for zerovec/icu4x, it also trivially works for always-borrowed views. And honestly, maybe more easily, since IIRC you might need to request a "manual" lifetime covariance proof when using Cow, but don't for &_.) With yoke, you'd have type OwningParsed = Yoke<Parsed<'static>, Box<[u8]>> (or whatever[1] owning cart type you want to use).

If you can get away with using Bytes (or Cow and a fn into_owned(self) -> Self<'static>, or just having a Path/PathBuf style pair), though, doing so is much simpler. You should also consider the tradeoff involved: if you keep the whole source buffer around to borrow from, you're keeping the whole source buffer in memory, whereas if you make new, smaller owning buffers for only the parts your view actually needs, you can release the larger source buffer at the cost of copying to the new buffer(s) and some potential memory fragmentation caused by the smaller allocations.
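A minimal sketch of the Cow option, assuming you control the definition of Parsed (the field layout mirrors the first post; the accessor names are made up). Parsing stays zero-copy, and into_owned copies only the parts the view needs so the source buffer can be released:

```rust
use std::borrow::Cow;

pub struct Parsed<'a> {
    a: Cow<'a, [u8]>,
    b: Cow<'a, [u8]>,
}

/// Zero-copy parse: both fields borrow from `bytes`.
pub fn parse(bytes: &[u8]) -> Parsed<'_> {
    Parsed {
        a: Cow::Borrowed(&bytes[..1]),
        b: Cow::Borrowed(&bytes[1..]),
    }
}

impl<'a> Parsed<'a> {
    /// Copies each borrowed part into its own `Vec<u8>`, which removes
    /// the dependency on the source buffer.
    pub fn into_owned(self) -> Parsed<'static> {
        Parsed {
            a: Cow::Owned(self.a.into_owned()),
            b: Cow::Owned(self.b.into_owned()),
        }
    }

    pub fn a(&self) -> &[u8] {
        &self.a
    }

    pub fn b(&self) -> &[u8] {
        &self.b
    }
}
```

This is the copying side of the tradeoff described above: one allocation per field, in exchange for not keeping the whole input alive.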


  1. The yoke crate currently allows &mut T as a cart type, but the use of such is to be discouraged because of potential aliasing issues, pending further understanding of the Rust Abstract Machine's requirements, and language extensions to manipulate them.
