Reader from *const u8?

I'm trying to call this function:

pub fn from_read<R, T>(rd: R) -> Result<T, Error>
where R: Read,
      T: DeserializeOwned
{
    Deserialize::deserialize(&mut Deserializer::new(rd))
}

where my input is a *const u8. Is there any way to construct a reader from a raw pointer? (The buffer points to MessagePack data, which can be read in a streaming fashion, which is why I don't want to pass in the length.)

The approach you're describing sounds like it's going to be difficult or impossible to implement soundly. Supposing you could construct such a reader, what happens if Deserialize::deserialize wants to read more bytes than are currently in the streaming buffer? How are you going to stop it from reading past the end of the buffer and causing undefined behavior?

(Normally this would be handled by Read::read returning Ok(0) when the whole buffer has been read, and then from_read returning an Err if not enough input was available, but if you're not passing in the buffer length then your reader can't tell when to stop reading bytes.)
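To make the point about Ok(0) concrete, here is a minimal sketch using only the standard library. A byte slice (`&[u8]`) already implements `Read` and knows its own length, so asking for more bytes than remain produces an error instead of reading out of bounds. The helper name `read_n` is hypothetical and only stands in for what a deserializer does when a header says "n bytes follow".

```rust
use std::io::{self, Read};

/// Hypothetical helper: read exactly `n` bytes from any reader, the way a
/// deserializer would after seeing a header that announces `n` more bytes.
fn read_n<R: Read>(rd: &mut R, n: usize) -> io::Result<Vec<u8>> {
    let mut out = vec![0u8; n];
    // `read_exact` keeps calling `read` until the buffer is full. If the
    // reader signals end of input first (`Ok(0)`), it returns an
    // `UnexpectedEof` error instead of touching memory past the end.
    rd.read_exact(&mut out)?;
    Ok(out)
}
```

With `let mut rd: &[u8] = &[0x93, 0x01, 0x02];`, calling `read_n(&mut rd, 2)` succeeds, while a following `read_n(&mut rd, 4)` returns an error of kind `io::ErrorKind::UnexpectedEof`. A reader built from a bare pointer has no way to produce that error.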

Wrap your pointer in your own type, and then you'll be able to implement any trait you want for it.

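use std::io::{self, Read};
pub struct MsgPackPtr(*const u8);

impl Read for MsgPackPtr {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> { todo!("needs a length to be sound") }
}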

As pointed out in other replies, there's no way to implement this soundly without knowing the length of the allocation, but this should get you started.
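For comparison, this is roughly what a sound version of the wrapper could look like once a length is available. It is a sketch, not a drop-in solution: the `remaining` field and the `unsafe fn new` constructor are assumptions added here, and the safety contract is pushed onto whoever constructs the reader.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper: a raw pointer plus the number of readable bytes left.
pub struct MsgPackPtr {
    ptr: *const u8,
    remaining: usize,
}

impl MsgPackPtr {
    /// # Safety
    /// `ptr` must be valid for reads of `len` bytes for as long as the
    /// returned reader is in use, and that memory must not be written to
    /// in the meantime.
    pub unsafe fn new(ptr: *const u8, len: usize) -> Self {
        MsgPackPtr { ptr, remaining: len }
    }
}

impl Read for MsgPackPtr {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Copy no more than the caller's buffer holds or the input has left.
        let n = buf.len().min(self.remaining);
        if n > 0 {
            // SAFETY: `new` promises `ptr` is valid for `remaining` bytes and
            // `n <= remaining`; `buf` is an exclusive borrow, so it cannot
            // overlap memory the caller promised not to write to.
            unsafe {
                std::ptr::copy_nonoverlapping(self.ptr, buf.as_mut_ptr(), n);
                self.ptr = self.ptr.add(n);
            }
            self.remaining -= n;
        }
        // Returns Ok(0) once `remaining` reaches zero, which signals end of input.
        Ok(n)
    }
}
```

In practice `std::slice::from_raw_parts(ptr, len)` gives the same behaviour with less code, since `&[u8]` already implements `Read`; the wrapper mainly shows where the length is needed.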

I think the idea is that I assume the pointer points to MessagePack data. Because MessagePack data describes its own length, I can trust the MessagePack deserializer to do the right thing.

The problem is that, since you've mentioned the data arrives in a streaming fashion, it's possible that only part of the msgpack data has been received. In that case even a correct msgpack parser will try to read beyond the initialized part of the buffer.
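A small sketch of why self-describing data does not help with a partial message. In MessagePack, a `bin 8` value is the marker byte 0xc4, a one-byte length, then that many payload bytes. The function names below are hypothetical, and only this one format is handled to keep the example short.

```rust
/// Returns the payload length declared by a MessagePack `bin 8` header
/// (marker 0xc4 followed by a one-byte length), or `None` if the input is
/// too short to contain the header or starts with a different marker.
fn declared_bin8_len(bytes: &[u8]) -> Option<usize> {
    match bytes {
        [0xc4, len, ..] => Some(*len as usize),
        _ => None,
    }
}

/// True when the whole `bin 8` value (2 header bytes plus payload) is present.
/// This check is only possible because the slice carries its own length.
fn bin8_is_complete(bytes: &[u8]) -> bool {
    match declared_bin8_len(bytes) {
        Some(len) => bytes.len() >= 2 + len,
        None => false,
    }
}
```

If only the first three bytes of `[0xc4, 3, 0xaa, 0xbb, 0xcc]` have arrived, the header still declares a three-byte payload, so a parser that trusts the header and has no buffer length would read two bytes that are not there.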

I don't know how MessagePack works, but what if the size of the buffer happens to be zero? The parser is going to try to read something, even if it's just one byte, and that's UB.

If Deserialize::deserialize stops after reading one message, and if you can guarantee that the buffer always contains (at least) one complete message, then it would be possible to do this soundly, but it seems fragile. (What if you get an ill-formed message over the wire?)

This is for C FFI (passing complex / nested data), so I can trust the caller will give a valid buffer of MessagePack data.

But in the end, I decided the risk of unsoundness / UB if the caller passed an invalid buffer wasn't worth it, so I made a length parameter mandatory. Thanks for explaining the tradeoffs.
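For readers landing here later, a rough sketch of what an FFI entry point with a mandatory length might look like. Everything named here is an assumption: `msgpack_consume` is a hypothetical export, and `consume` is a stand-in for `from_read` that only counts bytes so the example needs no external crates. A real export would also need the appropriate `no_mangle` attribute for the Rust edition in use.

```rust
use std::io::{self, Read};
use std::slice;

/// Stand-in for `from_read`: drains the reader and reports how many bytes
/// it saw. A real implementation would hand `rd` to the deserializer.
fn consume<R: Read>(mut rd: R) -> io::Result<usize> {
    let mut sink = Vec::new();
    rd.read_to_end(&mut sink)
}

/// Hypothetical C entry point. Returns the number of bytes consumed,
/// or -1 if `ptr` is null or reading fails.
///
/// # Safety
/// If `ptr` is non-null, it must be valid for reads of `len` bytes for the
/// duration of the call, and `len` must not exceed `isize::MAX`.
pub unsafe extern "C" fn msgpack_consume(ptr: *const u8, len: usize) -> isize {
    if ptr.is_null() {
        return -1;
    }
    // SAFETY: the caller guarantees `ptr` is valid for `len` bytes.
    let bytes: &[u8] = unsafe { slice::from_raw_parts(ptr, len) };
    // `&[u8]` implements `Read` and stops at `len`, so the reader can
    // never run past the end of the caller's buffer.
    match consume(bytes) {
        Ok(n) => n as isize,
        Err(_) => -1,
    }
}
```

The caller can still pass a wrong length, but the contract is now explicit, and a truncated or ill-formed message within a correctly sized buffer becomes an error rather than undefined behavior.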
