Hi!
I am still learning Rust and am currently working on an ndarray + mmap example to get something similar to what NumPy provides out of the box: an mmap-backed ndarray. I'm doing this for learning, so please don't argue about the use case.
The following code creates a memmap2 mapping and wires it to a mutable ndarray view so that I can access the mapped memory through the ndarray API.
So far, so good: this code works. But I am wondering whether my approach of having a struct with two fields is the best one here.
I need to carry the `_mmap` instance around because I haven't found another way to tie its lifetime directly to the ndarray view.
Either way, the downside is that I currently have to write `array.view.*` to access the array, and I would like to hide this implementation detail from the user.
All ideas are welcome; this is really about brainstorming how to solve such cases. Maybe there is a smart pointer that already solves this trivially?
Thanks in advance!
Chris
```rust
use memmap2::MmapMut;
use memmap2::MmapOptions;
use ndarray::prelude::*;
use std::fs::OpenOptions;
use std::path::Path;

pub struct MmapArray1<'a> {
    _mmap: MmapMut,
    pub view: ArrayViewMut1<'a, u8>,
}

impl<'a> MmapArray1<'a> {
    pub fn open(p: impl AsRef<Path>, l: usize) -> anyhow::Result<Self> {
        let file = OpenOptions::new()
            .create(true)
            .read(true)
            .write(true)
            .open(p)?;
        file.set_len(l as u64)?;
        Ok(unsafe {
            let mut mmap = MmapOptions::new().map_mut(&file)?;
            let view = ArrayViewMut1::from_shape_ptr((mmap.len()).f(), mmap.as_mut_ptr());
            Self { _mmap: mmap, view }
        })
    }
}
```
I saw that you suggested using `Deref` and `DerefMut` before you edited the post. I hadn't tried that, and indeed it removes the indirection. Thanks! If you have any further ideas for improvement, let me know, but this already gets rid of the extra field accessor :-).
EDIT: I just realized that operators like `+=` no longer work. Probably because `deref_mut()` is not compatible with them?
```rust
use memmap2::MmapAsRawDesc;
use memmap2::MmapMut;
use memmap2::MmapOptions;
use ndarray::prelude::*;
use std::fs::OpenOptions;
use std::io;
use std::ops::Deref;
use std::ops::DerefMut;
use std::path::Path;

pub struct MmapArray1<'a> {
    mmap: MmapMut,
    view: ArrayViewMut1<'a, u8>,
}

impl<'a> Deref for MmapArray1<'a> {
    type Target = ArrayViewMut1<'a, u8>;

    fn deref(&self) -> &Self::Target {
        &self.view
    }
}

impl<'a> DerefMut for MmapArray1<'a> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.view
    }
}

impl<'a> MmapArray1<'a> {
    pub fn new(r: impl MmapAsRawDesc) -> io::Result<Self> {
        Ok(unsafe {
            let mut mmap = MmapOptions::new().map_mut(r)?;
            let view = ArrayViewMut1::from_shape_ptr((mmap.len()).f(), mmap.as_mut_ptr());
            Self { mmap, view }
        })
    }

    pub fn open(path: impl AsRef<Path>, length: u64) -> io::Result<Self> {
        let file = OpenOptions::new()
            .create(true)
            .read(true)
            .write(true)
            .open(path)?;
        file.set_len(length)?;
        Self::new(&file)
    }

    pub fn flush(&self) -> io::Result<()> {
        self.mmap.flush()
    }
}
```
There is a hazard here, as safe code could replace the memory map and cause the view to point at invalid memory. Array views are cheap to create, so the "proper" approach would be to give the container that holds the memory map a `view_mut` method which creates a new view each time you call it. This won't give you the API that you want, however.
If ndarray let you create custom storage types for arrays, then you could do this by creating a new storage type backed by mmap-ed memory. Unfortunately, its `RawData` storage trait is not intended to be implemented outside the crate.
> There is a hazard here as safe code could replace the memory map and cause the view to point at invalid memory.
Edit: I've realized this will actually happen already when your struct is dropped, because fields are dropped in the order they are declared, and the mmap field comes before the view.
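The drop order can be checked with a small std-only sketch. `Noisy` and `Pair` are hypothetical stand-ins that only record when they are dropped; the fields of `Pair` are declared in the same order as `mmap` and `view` in the struct above:

```rust
use std::cell::RefCell;

// Hypothetical value that appends its name to a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Same field order as MmapArray1: the mapping is declared before the view.
#[allow(dead_code)]
struct Pair<'a> {
    mmap: Noisy<'a>,
    view: Noisy<'a>,
}

// Returns the names in the order the fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _pair = Pair {
            mmap: Noisy { name: "mmap", log: &log },
            view: Noisy { name: "view", log: &log },
        };
    } // `_pair` goes out of scope here; its fields drop in declaration order
    log.into_inner()
}
```

Swapping the two field declarations would make `view` drop first, but that only changes the drop order; the struct would still be self-referential.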
Ah, the drop order. Good catch! Alright, this helps, and I think I will go with the create-a-view-on-demand approach you proposed. Thanks!
I deleted my comment. There are two problems. The first is that you have a self-referential struct. There shouldn't be a `view` field; there should be a `view` method that creates a view and takes `&self` (or `&mut self` for a mutable view). This would tie the lifetimes together in the correct way. The other problem is that if you want to avoid writing `.view()`, then `Deref` isn't going to work, because it needs to return a reference to something, but the ndarray view you'd create inside the method wouldn't live long enough.