Designing API based on slices of mmap


#1

I am using a mmap’ing crate to rewrite some file system code and I need help planning out my data structures.

If you want to follow along, the applicable docs: https://danburkert.github.io/mmap/mmap/struct.Mmap.html

mmap’ing a file yields an Mmap value. This value encapsulates a pointer into virtual memory, the length of the mapped data, and operations for flushing changes to disk. The value also implements Deref with a [u8] target, so it can be borrowed as a &[u8], which is what I really want to leverage.

Here’s the struct I’m proposing:

pub struct WhisperFile<'a> {
    pub path: PathBuf,
    pub mmap: Mmap,
    pub header: Header,
    pub archives: Vec< Archive<'a> >,
}

I want the WhisperFile to own the Mmap so I can flush. But I want to split out ownership of the mmap data to slices held by Archive. I know the slices are non-overlapping, so maybe something like split_at_mut would be useful? The problem is that I can’t get the slice deref to live long enough (which makes sense). Should I be using runtime reference counting to make sure slices are attached to their mmap? If so, are there any examples of this long-lived slice that people can point me to?
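
For reference, split_at_mut does work as long as the slices are plain borrows of a buffer that lives outside the struct holding them. Here is a minimal sketch of that, assuming a Vec<u8> stands in for the mapped bytes; `carve` is a made-up helper name, not something from the mmap crate:

```rust
/// Splits `buf` into consecutive, non-overlapping mutable slices with the
/// given lengths. Hypothetical helper for illustration only.
/// Panics if the lengths add up to more than `buf.len()`.
fn carve<'a>(mut buf: &'a mut [u8], lens: &[usize]) -> Vec<&'a mut [u8]> {
    let mut out = Vec::with_capacity(lens.len());
    for &len in lens {
        // Move the remaining slice out of `buf` so both halves keep the
        // full lifetime 'a instead of a shorter reborrow.
        let (head, tail) = std::mem::take(&mut buf).split_at_mut(len);
        out.push(head);
        buf = tail;
    }
    out
}
```

This only helps while the buffer outlives the slices and is not stored next to them; it does not let one struct own both the Mmap and slices borrowed from it, which is the situation described above.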


#2

You can just put a Rc<Mmap> in Archive.

If you’re just splitting the file up (and thus wouldn’t waste too much memory this way), another possibility would be modifying the mmap library to allow (a) mapping directly from a File rather than always opening a path, and (b) mapping part of a file rather than the whole thing, neither of which should be a problem with the relevant underlying OS calls.


#3

I’m not sure I could just mmap parts of the file. Reading over some documentation for mmap (Beej’s guide on mmap), the signature looks like:

void *mmap(void *addr, size_t len, int prot,
           int flags, int fildes, off_t off);

where off must be a multiple of the page size (usually 4 KiB). I could be dealing with multiple Archives inside of one page. It really depends on the layout of the WhisperFile, which can vary wildly. Fine-grained borrowing from the one mmap memory region may be a requirement.
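
To make the alignment constraint concrete, here is a rough sketch of the arithmetic a partial mapping would need: round the wanted offset down to a page boundary, map from there, and skip the leftover bytes. The function name and the 4096-byte page size used in the comment are assumptions for illustration; real code should ask the OS for the page size.

```rust
/// Given a desired byte range (`offset`, `len`) and a page size, returns
/// `(map_offset, map_len, delta)`:
/// - `map_offset`: `offset` rounded down to a multiple of `page_size`
/// - `map_len`: number of bytes to map so the whole range is covered
/// - `delta`: where the wanted data starts inside the mapping
/// Hypothetical helper; `page_size` must be non-zero.
fn aligned_window(offset: usize, len: usize, page_size: usize) -> (usize, usize, usize) {
    let delta = offset % page_size;
    let map_offset = offset - delta;
    let map_len = delta + len;
    (map_offset, map_len, delta)
}

// Example with an assumed 4096-byte page: a 100-byte archive at offset 5000
// needs a mapping that starts at 4096, is 1004 bytes long, and holds the
// archive at bytes 904..1004 of that mapping.
```

Two archives that fall in the same page would end up with overlapping mappings under this scheme, which is why slicing a single mapping looks more attractive here.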

Edit: another issue is how to store the slice &[u8], which carries the offsets into the mmap. Even if I use an Rc<Mmap> in the Archive, it won’t have the offset and length of the slice available. Will I need to write my own Deref impl that tracks those values and hides them behind the scenes?

&archive[..] should yield the same byte slice as &mmap[ archive_start .. archive_end].


#4

Given this limitation on page offsets, mapping each archive separately won’t work (and even if it did, making the mmap syscall repeatedly would be expensive). I thought I could do some lifetime annotation magic, but Rust can’t track the borrow the way I had hoped. Rc with a custom Deref impl on Archive that yields &[u8] will work.

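use std::ops::Deref;
use std::rc::Rc;
// Mmap comes from the mmap crate linked in the first post.

pub struct Archive {
	seconds_per_point: u32,
	points: usize,

	// Shared handle; the mapping stays alive as long as any Archive holds it.
	mmap: Rc<Mmap>,
	// Byte offsets of this archive within the mapping; None means "to the end".
	begin: usize,
	end: Option<usize>,
}

impl Deref for Archive {
	type Target = [u8];

	// Re-slices the shared mapping on every call; no borrow is stored.
	fn deref(&self) -> &[u8] {
		match self.end {
			Some(end) => &self.mmap[self.begin..end],
			None => &self.mmap[self.begin..],
		}
	}
}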

Gets me what I want!

This could probably be encapsulated as a “MmapView” or similar idea.
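
A rough sketch of what such a view might look like, generic over anything that derefs to [u8] so it can be tried with a Vec<u8> instead of a real mapping. The name `View`, its fields, and the bounds check in `new` are all assumptions for illustration, not part of the mmap crate:

```rust
use std::ops::{Deref, Range};
use std::rc::Rc;

/// A reference-counted window into a shared byte buffer.
/// Hypothetical type; `B` could be an Mmap or, for testing, a Vec<u8>.
pub struct View<B> {
    backing: Rc<B>,
    range: Range<usize>,
}

impl<B: Deref<Target = [u8]>> View<B> {
    /// Returns None if `range` does not fit inside the backing buffer,
    /// so `deref` below can never index out of bounds.
    pub fn new(backing: Rc<B>, range: Range<usize>) -> Option<Self> {
        let total = (**backing).len();
        if range.start <= range.end && range.end <= total {
            Some(View { backing, range })
        } else {
            None
        }
    }
}

impl<B: Deref<Target = [u8]>> Deref for View<B> {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        // Rc<B> -> B -> [u8], then take the stored sub-range.
        let bytes: &[u8] = &**self.backing;
        &bytes[self.range.clone()]
    }
}
```

With this, `&view[..]` yields the same bytes as `&backing[start..end]`, which is the property asked for earlier in the thread. Checking the range once at construction keeps `deref` panic-free, assuming the backing buffer never changes length.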