Manual allocation timing for arrays

Hi there,

I'm currently developping a little in-RAM storage driver for a project. I want to be able to represent a storage of several gigabytes even when there's not enough RAM for that.
In order to make this work, I represent the storage as a list of 256-bytes sector, which are not allocated in memory until they are used.
Here the current implementation:

struct RAMStorage {
    sectors: usize,
    content: Vec<Option<[u8; 256]>>
}

When one wants to read a sector, if it's a None value a zeroed array. This means that, until a sector is written, it is not allocated in memory.
The problem is that, the way Rust's enums work, each entry in the vector takes 257 bytes, as the size of an enum is (approximately) the size of the largest field in it.

What I'd like is a way to store an array of 256 bytes which is by default not allocated in memory (so we can't access the data in it) but can be allocated whenever I want (and then we can access the data in it).

The main problem is that I don't want to use any unsafe code. The crate I'm working on has #[forbid(unsafe_code)] and I don't want to remove it just for this part.

So, is there any native way to achieve this without using unsafe code, and if possible without any external crate?

Thanks in advance for your help :smiley:

EDIT 1 : I've thought about using a HashMap this way:

use std::collections::HashMap;

struct Storage {
    sectors: usize,
    content: HashMap<u64, [u8; 256]>
}

Where an u64 key is the sector's number. But it's a lot slower to use an HashMap than a simple Vec.

You could add a field called initialized that used one bit per sector to store if it's initialized or not. Note that with the current design, only the sectors at the end of the vector can be uninitialized without using up memory. Similarly resizing the vector will be expensive and involve a full copy that requires double the amount of memory of the size of the vector during the resize.

The problem is that I need to be able to keep any sector unitialized (the storage can be highly fragmented).

Also I've thought about this structure:

struct Storage {
    sectors: usize,
    content: Vec<Sector>
}

struct Sector {
    address: u64,
    content: [u8; 256]
}

But the problem is that sectors will be disordered, which would result in very high latency.
And if I use a HashMap to store the sectors depending on their addresses, there will be a cost for both allocating new sectors and accessing them, which I'd like to avoid.

I mean at this point you're implementing a memory allocator.

So there is no way to make this using std?
I was thinking of a native type like this:

impl<T> TheNativeType<T> {
  pub fn allocate(&mut self, value: T) { ... }
  pub fn unallocate(&mut self) { ... }
  pub fn get_content_if_allocated(&self) -> Option<T> { ... }
}

Such a type would be perfectly safe, and just perform internal (unsafe) allocation.

You can just use the inbuilt allocator for that

struct TheNativeType<T> {
    inner: Option<Box<T>>,
}
impl<T> TheNativeType<T> {
    pub fn allocate(&mut self, value: T) {
        self.inner = Some(Box::new(value));
    }
    pub fn unallocate(&mut self) {
        self.inner = None;
    }
    pub fn get_content_if_allocated(&self) -> Option<&T> {
        self.inner.as_ref().map(|inner| *inner)
    }
}

That's exactly what I was looking for - I totally forgot Box<T> was performing allocations on the heap and so we can use it for manual allocation.

Thanks a lot for your answer :smiley:

1 Like