Using ManuallyDrop to avoid copying

I am having an enum variant like,

Decode {
    #[allow(dead_code)] // block is only meant to drop offsets and entries
    block: Vec<u8>,
    offsets: ManuallyDrop<Box<[u8]>>, // unsafe ownership into `block`
    entries: ManuallyDrop<Box<[u8]>>, // unsafe ownership into `block`
},

which is constructed using a block of bytes, that is made up of offsets
followed by entries.

On way to do is:

let offsets = block[..n].to_vec();
let entries = block[n..].to_vec();

which involves copying. To avoid this, I am doing

let (offsets, entries) = unsafe {
    let ptr = block.as_mut_ptr();
    let len = block.len();
    let cap = block.capacity();

    let offsets = Vec::from_raw_parts(ptr, n, n).into_boxed_slice();
    let p = ptr.add(adjust);
    let entries = Vec::from_raw_parts(p, l-n, c-n).into_boxed_slice();
    (ManuallyDrop::new(offsets), ManuallyDrop::new(entries))
};

I want to know whether it is okay to it this way. Note that along with
offsets and entries, the original block is owned by the
type. It is my understanding that when the type's value go out of
scope, drop will be called only for block.

into_boxed_slice will trim excess capacity, which doesn't matter for your first Box, but I would worry about confusing the allocator on the second.

You also have a possibility of mutable aliasing between the Vec and the two boxes, which would be undefined behavior.

Instead, how about storing just one Box and the first length? This will be smaller overall, and you can add trivial methods for offsets and entries.

struct Decode {
    blocks: Box<[u8]>,
    n: usize,
}
2 Likes

"by confusing the allocator" is because of the following realloc() call
inside RawVec::shrink_to_fit() ?

    match self.a.realloc(NonNull::from(self.ptr).cast(),
                         old_layout,
                         new_size) {

I will try suggested shape for my type. I think having accessor methods should
make it clean and simple.

Btw, after constructing the value, above code path is entirely readonly.

Yes, realloc is the problem. Even the first one would be a problem if RawVec didn't bother checking that the capacity already matches the length. I wouldn't recommend relying on such implementation details.

1 Like

Instead of ManuallyDrop<Box<_>>, to afterwards never drop, you should use &'static _ or NonNull<_> to express the borrowing idea (forbidden without unsafe because Rust sees it as self-referential).

Here is an implementation using Pin to enforce the implicit invariant: the bytes on the heap will not move / be mutated as long as block is not dropped.

This is very related to this other thread, by the way, except for the fact that you have two borrows instead of one.

2 Likes

Thanks @cuviper, for pointing out the mistake with my into_boxed_slice() code. Meanwhile I will try @Yandros suggestion.

Definition:

Decode {
    block: Vec<u8>,
    count: usize,
    offsets: &'static [u8], // point into block
}

Value construction:

let offsets = &block[4..adjust] as *const [u8];
Ok(MBlock::Decode {
    block,
    count: count.try_into().unwrap(),
    offsets: unsafe { offsets.as_ref().unwrap() },
})

If you only have one self-reference, ::owning_ref allows not having to use unsafe:

mod lib {
    use ::core::{
        convert::TryInto,
        fmt,
    };
    use ::owning_ref::{ // 0.4.0
        BoxRef,
    };

    pub
    struct Decode {
        block_offsets: BoxRef<[u8]>,
        count: usize,
    }
    
    pub
    struct DecodeNew {
        pub block: Vec<u8>,
        pub count: u32,
        pub adjust: usize,
    }
    
    impl From<DecodeNew> for Decode {
        fn from (DecodeNew { block, count, adjust }: DecodeNew) -> Self
        {
            let block = BoxRef::from(block.into_boxed_slice());
            let block_offsets = block.map(|block| {
                &block[4 .. adjust]
            });
            Self {
                block_offsets,
                count: count.try_into().unwrap(),
            }
        }
    }
    
    impl Decode {
        #[inline]
        pub
        fn block (self: &'_ Self) -> &'_ [u8]
        {
            self.block_offsets.as_owner()
        }
        
        #[inline]
        pub
        fn offsets (self: &'_ Self) -> &'_ [u8]
        {
            self.block_offsets.as_ref()
        }
    }

    impl fmt::Debug for Decode {
        fn fmt (
            self: &'_ Self,
            stream: &'_ mut fmt::Formatter<'_>,
        ) -> fmt::Result
        {
            stream
                .debug_struct("Decode")
                .field("count", &self.count)
                .field("offsets", &self.offsets())
                .field("block", &self.block())
                .finish()
        }
    }
}

fn main ()
{
    use lib::*;

    let decode: Decode = DecodeNew {
        block:  (0 .. 4).chain((0 ..= 42).rev()).collect(),
        adjust: 12,
        count:  27,
    }.into();
}
  • Playground

  • I have decided to use a "named args" constructor pattern because why not :slight_smile:

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.