Is this deserialization API safe?

I'm designing a low-overhead serialization and deserialization library, similar to Flatbuffers or FIDL, but rust native and without an external schema definition language.

In my current design, the user would annotate an empty struct and an impl block with empty methods:

#[data::structure]
struct Foo;

#[data::structure_impl]
impl Foo {
  fn get_u32(&self) -> u32;
}

What would then expand to something like this:

use std::{mem, ptr, pin::Pin, marker::PhantomPinned};

struct Foo {
    _phantom_pinned: PhantomPinned,
}

impl Foo {
  fn new(data: &[u8]) -> Pin<&Foo> {
    if data.len() != mem::size_of::<u32>() {
        panic!();
    }
    unsafe {
        Pin::new_unchecked(mem::transmute(data.as_ptr()))
    }
  }

 #[inline(never)]
  fn get_u32(self: Pin<&Foo>) -> u32 {
    unsafe {
        let mut value: u32 = 0;
        let src: *const u8 = mem::transmute(self.get_ref());
        let dst: *mut u8 = mem::transmute(&mut value);
        ptr::copy_nonoverlapping(src, dst, mem::size_of::<u32>());
        value
    }
  }
}

(Not shown is code that would swap the bytes in value for big-endian platforms, but which would compile into a no-op on little-endian platforms.)

Foo::get_u32 is unsafe if it can be called on arbitrary &Foo, but I think is safe if called on a Pin<&Foo> from Foo::new, because Foo::new checks that the data slice is large enough to make the ptr::copy_nonoverlapping safe.

Is this safe, and is this the right way to do this?

In addition to safety concerns, is Pin the right way to do this? All I need to do is make sure that users can't call Foo::get_u32 without going through Foo::new or an unsafe block.

It seems like &Path does what I want &Foo to do, i.e. I can't construct or copy a Path directly, but it doesn't use Pin. And, Pin seems to usually be used for self-referential things, which this isn't. However, since this is a proc-macro that expands in the users's crate, I can't use visibility to prevent Foo from being constructed, since they could write code alongside the generated code in the same module.

(Here's a playground link where I was experimenting with this: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=42ff3b35294907291d1e7e54601fe3b5)

Your code seems to use Pin and transmute while it could have been accomplished with simpler operations. Maybe it just makes it harder to analyze :slight_smile:

Using &Foo to store what's in reality a *const u8 doesn't look right. And it should not be necessary to use Pin to just accomplish what you wanted - just use privacy rules to avoid direct construction of Foo; maybe for example generate an inline module and let have Foo have private fields and public methods.

You could let Foo just be masquerading as a u8 pointer, for example like this:

    #[derive(Copy, Clone)]
    pub struct Foo<'a> {
        ptr: *const u8,
        life: PhantomData<&'a u8>,
    }

that seems like the simplest approach, with a thin pointer?

2 Likes

There's ptr.cast() that's a little bit safer than transmute.

Casting u8 pointer to some other struct type is risky.

  • I'm not sure if zero-sized types even guarantee having unique pointers (what if the optimizer decides that you can't ever read anything from behind that pointer, so the pointer value can be ignored?)

  • If the struct contained some other fields, it could require bigger alignment. It's not safe to make incorrectly-aligned reference, even if you don't plan to dereference it (the optimizer could make assumptions about the "unused" bits of the pointer).

I think using a *const u8 is definitely the sensible approach. Although, I think that Foo<'a> is slightly less nice type to expose to users than &'a Foo, but if I can't get that to work, I'll definitely go with the *const u8 approach.

Ahh, thanks! I didn't know about that.

Those are definitely good points, and it seems like the compiler would be allowed to do a lot behind the scenes to break this. I did find this thread: https://github.com/rust-lang/rfcs/pull/2040 and a lot of the comments seem to indicate that they're aware of the use of references to zero sized types as opaque pointers, so it seems like it might continue working.

If you absolutely want a &Foo it seems like Foo should be #[repr(transparent)] struct Foo { value: u8, whatever_markers: PhantomData } i.e. you can interchange a pointer to Foo and u8 that way. But I can't recommend it; the USG seem to suggest that this will not properly give you access to the whole slice, just one u8 location!

With that said, are there more interesting parts of your deserialization API that have not been shown? Does it support more than just reading single values, like the u32 shown here?

1 Like

I think using something like Foo<'a> is completely idiomatic. It's less so to have something pretending to be a reference when it's really not.

1 Like

I'm definitely curious what the USG have to say about situations like this, do you have a link?

Yeah, I'm working a derive-like proc macro that allows low overhead serialization and deserialization of user-defined structs.

I don't have a link - read from conversations across the forums (there aren't that much formalized rules yet, it's a work in progress). I think it's related to miri and how it would treat the validity of the pointer after a reference-to-pointer cast.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.