Am I triggering undefined behavior here?

One of the cases that has been brought forward for why we should require &T to point to a valid T is that people want &! to be uninhabited so that functions taking &! can be just removed from the binary.

Ah, that's a good point! ! has size 0, so my current API would happily create a &Bytes<!> reference, which isn't good. It's a pity the size of ! couldn't be defined as, say, usize::MAX instead. Though I can already think of a lot of reasons why that would cause problems. :confused:
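To make that concrete, here's a tiny sketch of the problem (fits is a made-up helper, and I'm using Infallible as a stable stand-in for !, since it's likewise uninhabited and zero-sized): any purely size-based check accepts an uninhabited type without complaint.

use std::convert::Infallible; // stable stand-in for `!`: uninhabited and size 0
use std::mem::size_of;

// Made-up helper standing in for a size-based validity check.
fn fits<T>(bytes: &[u8]) -> bool {
    bytes.len() >= size_of::<T>()
}

fn main() {
    // size_of::<Infallible>() == 0, so even an empty slice "fits",
    // even though no valid value of the type can ever exist.
    assert!(fits::<Infallible>(&[]));
}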

Unfortunately, I think that issue alone is a good enough reason to just stick with the obviously correct implementation where Bytes<T> et al. are just wrappers around a pointer.
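Concretely, I'm thinking of something shaped roughly like this (a rough sketch with made-up field names, not a final definition): the wrapper never materializes a &T, so an uninhabited T can't give rise to a reference that shouldn't exist.

use std::marker::PhantomData;
use std::ptr::NonNull;

// Rough sketch of the "wrapper around a pointer" shape: only a raw pointer
// and a length are stored, so no reference to T is ever created.
struct Bytes<T> {
    ptr: NonNull<u8>,
    len: usize,
    _marker: PhantomData<*const T>,
}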

Also, you seem to assume that a &[u8] can point to any data.

Actually I don't: note how my example at the top makes Bytes::from_ref() unsafe because padding bytes don't have defined values. It's only creation from a slice that's safe.
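To spell out the padding point with a hypothetical type (the layout comment assumes a typical target where u32 is 4-byte aligned):

use std::mem::size_of;

#[repr(C)]
struct Padded {
    a: u8,
    // on a typical target there are 3 padding bytes here
    b: u32,
}

fn main() {
    // 8 > 1 + 4: the extra bytes are padding, and padding bytes don't have
    // defined values, which is why from_ref can't be a safe fn.
    assert_eq!(size_of::<Padded>(), 8);
}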

At this point I have to insert the mandatory warning that mem-mapping interacts very badly with Rust's reference guarantees. See this discussion for further details.
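To illustrate the hazard (a sketch using the memmap2 crate, with error handling trimmed): the moment the mapping is exposed as a &[u8], Rust assumes those bytes won't change for the lifetime of the borrow, and another process writing to the file breaks exactly that assumption.

use memmap2::Mmap;
use std::fs::File;

// `Mmap::map` is unsafe precisely because the file contents can change
// underneath the mapping while the &[u8] view promises they won't.
fn first_byte(path: &str) -> std::io::Result<Option<u8>> {
    let file = File::open(path)?;
    let map = unsafe { Mmap::map(&file)? };
    let bytes: &[u8] = &map;
    Ok(bytes.first().copied())
}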

I think for my specific use-case I'm OK with that, as this would basically be for an object database where other processes writing to the file is already a disaster.

FWIW the big-picture motivation for mem-mapping is predictable memory usage: I want it to be easy to write code that accesses persistently stored data with "normal" Rust APIs based on returning references and the like. If deserialization creates a copy of the data being deserialized, you can't easily do that without also risking unbounded memory usage.

For a simplified example, imagine an API for a persistently stored key-value tree:

impl<T> KeyValueTree<T> {
    fn get(&self, idx: u64) -> Option<&T>;
}

Who owns the &T if it had to be deserialized? You can put Ref<T> wrappers everywhere that deallocate the deserialized T when it goes out of scope, but that gets ugly fast in my particular use-case (I've tried a prototype along those lines before). Mem-mapping appears to have the potential to neatly side-step this problem, essentially letting the OS's page cache do all the work. (In fact, I suspect memmap will be a bit slower than the alternative for various reasons, but I'm more concerned about having a persistent data API that's easy to use and "just works".)
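For reference, the wrapper approach I mean is roughly this (Ref is my own name here, not an existing library type); get would then have to return Option<Ref<T>> instead of Option<&T>.

use std::ops::Deref;

// Rough sketch: the wrapper owns the freshly deserialized value and
// frees it when it goes out of scope.
struct Ref<T> {
    value: Box<T>,
}

impl<T> Deref for Ref<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.value
    }
}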
