Storing reference together with owner

My application memory-maps a large block of binary data. That data consists of fixed-size records of 20 bytes each. I have a function that performs all the unsafe reinterpreting:

use std::{mem, slice};

fn reinterpret_hashblock<'a>(s: &'a [u8]) -> &'a [[u8; 20]] {
    const ELEM_SIZE: usize = 20;
    assert_eq!(ELEM_SIZE, mem::size_of::<[u8; 20]>());
    let elem_count = s.len() / ELEM_SIZE;
    if s.len() % ELEM_SIZE != 0 {
        panic!("input is not a multiple of the hash size");
    }
    // SAFETY: `[u8; 20]` has alignment 1, and the check above guarantees that
    // `elem_count` arrays cover exactly the bytes of `s`.
    unsafe { slice::from_raw_parts(s.as_ptr() as *const _, elem_count) }
}

If I just call reinterpret_hashblock() on the Mmap, the returned slice borrows from the Mmap and everything is fine. But I cannot encapsulate this in a struct: I cannot return hash_block, because it borrows from hash_block_data, which is "owned" by the local function:

struct HashSet<'a> {
    mmap: Mmap,
    sorted_hashes: &'a [[u8; 20]],
}

impl HashSet<'_> {
    fn new(file: &File) -> Self {
        let hash_block_data: Mmap =
            unsafe { Mmap::map(&file).expect("failed to map hash block into memory") };
        let hash_block = reinterpret_hashblock(&hash_block_data);
        Self {
            mmap: hash_block_data,
            sorted_hashes: hash_block,
        }
    }
    // other accessor functions working with sorted_hashes
}

Basically, what should end up happening is that Rust keeps the Mmap alive for as long as the HashSet is kept around. Is there any way to solve this nicely?

This is usually called a self-referential type, which Rust doesn't safely support, but it is possible with the help of something like rental or owning_ref.

I've seen owning_ref (I probably should have mentioned that in my initial post), but I was put off by the fact that it doesn't seem to be maintained, and that people have found a lot of unsoundness in it that nobody seems to have dealt with. rental seems abandoned (its GitHub repository is archived and the author says he won't maintain it anymore).

The best way to fix this issue is to restructure your code. For example:

struct HashSet<'a> {
    mmap: &'a Mmap,
    sorted_hashes: &'a [[u8; 20]],
}

impl<'a> HashSet<'a> {
    fn new(hash_block_data: &'a Mmap) -> Self {
        let hash_block = reinterpret_hashblock(hash_block_data);
        Self {
            mmap: hash_block_data,
            sorted_hashes: hash_block,
        }
    }
    // other accessor functions working with sorted_hashes
}

This moves the ownership question to the level where it can be solved.
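A minimal sketch of what the call site might look like under this design. To keep it self-contained, a `Vec<u8>` stands in for the `Mmap` (both dereference to `[u8]`), and `load_bytes` and `count_records` are hypothetical names that do not appear in the thread:

```rust
use std::slice;

/// Same reinterpretation as in the question, for 20-byte records.
fn reinterpret_hashblock(s: &[u8]) -> &[[u8; 20]] {
    assert_eq!(s.len() % 20, 0, "input is not a multiple of the hash size");
    // SAFETY: `[u8; 20]` has alignment 1, and the check above guarantees that
    // `s.len() / 20` arrays cover exactly the bytes of `s`.
    unsafe { slice::from_raw_parts(s.as_ptr() as *const [u8; 20], s.len() / 20) }
}

/// Borrows the backing bytes; it never owns them.
struct HashSet<'a> {
    data: &'a [u8],
    sorted_hashes: &'a [[u8; 20]],
}

impl<'a> HashSet<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, sorted_hashes: reinterpret_hashblock(data) }
    }

    fn len(&self) -> usize {
        self.sorted_hashes.len()
    }

    fn byte_len(&self) -> usize {
        self.data.len()
    }
}

/// Hypothetical stand-in for mapping the file: three zeroed 20-byte records.
fn load_bytes() -> Vec<u8> {
    vec![0u8; 60]
}

/// The caller owns the bytes and declares them before the set, so the
/// borrow checker can see that they outlive the `HashSet` built from them.
fn count_records() -> usize {
    let owner = load_bytes(); // the owner lives in the caller's scope
    let set = HashSet::new(&owner); // the set only borrows from it
    set.len()
    // `set` is dropped before `owner` (locals drop in reverse declaration order)
}
```

The cost of this layout is that a `HashSet<'a>` cannot be returned from the function that creates the owner; the owner has to live at least one level further up the call stack.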

Another way is to separate the checks from the creation of the slice, something like:

// This can be used in `[u8; HASHBLOCK_ELEM_SIZE]` too, to elide some checks
// against `[u8; 20]`
pub const HASHBLOCK_ELEM_SIZE: usize = 20;

pub struct HashSet {
    mmap: Mmap,
    count: usize,
}

// TODO: Proper error type
fn hashblock_count(s: &[u8]) -> Result<usize, ()> {
    match s.len() % HASHBLOCK_ELEM_SIZE {
        0 => Ok(s.len() / HASHBLOCK_ELEM_SIZE),
        _ => Err(()),
    }
}

impl HashSet {
    // Only create `HashSet` via this function to preserve invariants
    pub fn new(file: &File) -> Self {
        let mmap: Mmap = unsafe { Mmap::map(&file).expect("failed to map hash block into memory") };
        let count = hashblock_count(&mmap).expect("input is not a multiple of the hash size");
        Self { mmap, count }
    }

    // This is cheap, just call it when you need the blocks
    pub fn blocks(&self) -> &[[u8; HASHBLOCK_ELEM_SIZE]] {
        // Safety: Invariants enforced at struct creation
        unsafe { slice::from_raw_parts(self.mmap.as_ptr() as *const _, self.count) }
    }
}
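As a rough sketch of how this pattern could be exercised without a real file, the variant below swaps the `Mmap` for a `Vec<u8>` and returns an error instead of panicking. `OwnedHashSet`, `from_bytes`, and `BadLength` are hypothetical names introduced for illustration; they are not part of the code above or of any crate:

```rust
use std::{fmt, slice};

pub const HASHBLOCK_ELEM_SIZE: usize = 20;

/// Hypothetical error type: the input length is not a multiple of the
/// record size. Carries the offending length.
#[derive(Debug, PartialEq)]
pub struct BadLength(pub usize);

impl fmt::Display for BadLength {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} bytes is not a multiple of {}", self.0, HASHBLOCK_ELEM_SIZE)
    }
}

impl std::error::Error for BadLength {}

/// Owns its bytes; a `Vec<u8>` stands in for the `Mmap` here.
pub struct OwnedHashSet {
    data: Vec<u8>,
    count: usize,
}

impl OwnedHashSet {
    /// The only constructor, so `count * HASHBLOCK_ELEM_SIZE == data.len()`
    /// holds for every value of this type.
    pub fn from_bytes(data: Vec<u8>) -> Result<Self, BadLength> {
        if data.len() % HASHBLOCK_ELEM_SIZE != 0 {
            return Err(BadLength(data.len()));
        }
        let count = data.len() / HASHBLOCK_ELEM_SIZE;
        Ok(Self { data, count })
    }

    /// Rebuilds the typed view on demand; no reference is stored in the struct.
    pub fn blocks(&self) -> &[[u8; HASHBLOCK_ELEM_SIZE]] {
        // SAFETY: `[u8; N]` has alignment 1, and the constructor guarantees
        // that `count` arrays cover exactly the bytes of `data`, which is
        // never modified after construction.
        unsafe {
            slice::from_raw_parts(
                self.data.as_ptr() as *const [u8; HASHBLOCK_ELEM_SIZE],
                self.count,
            )
        }
    }
}
```

Because the struct stores only the owner and a count, it has no lifetime parameter and can be moved or returned freely; the borrow exists only for as long as the result of `blocks()` is in use.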

Thanks everyone for your answers. I ended up moving the Mmap out of the structure entirely, since I only really needed the sorted_hashes field; the mmap field was only there because I thought I needed it in that struct for lifetime purposes. I decided against enforcing the invariants at creation time: separating the invariants from their point of use, just to avoid one additional call at the call site, doesn't seem worth it to me in this case.
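A sketch of what that slimmed-down struct might look like. The `contains` method is an assumption: it guesses, from the field name, that the records are sorted in ascending byte order and that lookups are the main use. The reinterpretation is assumed to happen at the call site before `new` is called:

```rust
/// Holds only the borrowed, already reinterpreted view; whoever created the
/// backing bytes (the `Mmap` in this thread) keeps owning them.
struct HashSet<'a> {
    sorted_hashes: &'a [[u8; 20]],
}

impl<'a> HashSet<'a> {
    fn new(sorted_hashes: &'a [[u8; 20]]) -> Self {
        Self { sorted_hashes }
    }

    /// Assumes the records are sorted in ascending byte order.
    fn contains(&self, hash: &[u8; 20]) -> bool {
        self.sorted_hashes.binary_search(hash).is_ok()
    }
}
```

At the call site this would be used as `HashSet::new(reinterpret_hashblock(&mmap))`, with `mmap` declared in the enclosing scope so that it outlives the set.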

Out of curiosity: Is this something Rust eventually wants to support or is this in some way fundamentally incompatible with Rust's way of doing lifetimes?

Self-referential types are difficult to reconcile with Rust's current best-effort aliasing model, Stacked Borrows (see why here).

As far as I know there is interest, but there is no dedicated team and it would be a large effort. That is, if it does come to be, it won't be anytime soon.

Rental isn't maintained anymore; Ouroboros is its successor.

See also Saving variables interwoven by reference in one structure
