Understanding slices, iterators and pointers

In an effort to understand how Rust vectors, slices and iterators might work behind the scenes, I’ve tried to make my own readonly versions using raw pointers. This is what I’ve got so far (Playground link):

mod my {
    use std::isize; // for isize::MAX
    use std::marker::PhantomData;

    pub struct ByteMem {
        start: *const u8,
        end: *const u8,
    }
    impl ByteMem {
        pub fn from_vec(data: Vec<u8>) -> Option<Self> {
            // Because I'm lazy I'll punt on the edge cases for now.
            if data.is_empty() || data.capacity() != data.len() {
                None
            } else {
                // Rip apart the vec and get some pointers.
                let ptr = data.as_ptr();
                let size = data.len();
                assert!(size < isize::MAX as usize);
                
                std::mem::forget(data);
                unsafe {
                    Some(Self {
                        start: ptr,
                        end: ptr.add(size),
                    })
                }
            }
        }
        // Get an arbitrary byte
        pub fn get(&self, index: usize) -> u8 {
            unsafe {
                let ptr = self.start.add(index);
                if ptr < self.end {
                    *ptr
                } else {
                    panic!("Out of bounds");
                }
            }
        }
        // Create a cursor for reading the bytes sequenctially
        pub fn cursor(&self) -> Cursor {
            Cursor {
                ptr: self.start as *mut _,
                end: self.end,
                _phantom: PhantomData,
            }
        }
    }
    impl Drop for ByteMem {
        // Rebuild the vec and let it drop.
        fn drop(&mut self) {
            unsafe {
                let len = self.end as usize - self.start as usize;
                Vec::<u8>::from_raw_parts(self.start as *mut _, len, len);
            }
        }
    }
    pub struct Cursor<'a> {
        ptr: *mut u8,
        end: *const u8,
        _phantom: PhantomData<&'a u8>,
    }
    impl<'a> Cursor<'a> {
        // Gets the item that's currently being pointed at.
        pub fn get(&self) -> u8 {
            unsafe { *self.ptr }
        }
        // Acts like an iterator
        pub fn next(&mut self) -> Option<u8> {
            if !std::ptr::eq(self.ptr, self.end) {
                unsafe {
                    let val = *self.ptr;
                    self.ptr = self.ptr.add(1);
                    Some(val)
                }
            } else {
                None
            }
        }
    }
}

fn main() {
    let v = vec![0u8, 1, 2, 3];
    let mymem = my::ByteMem::from_vec(v).unwrap();
    println!("{}", mymem.get(3));

    let mut cursor = mymem.cursor();
    println!("{}", cursor.get());
    for _ in 0..6 {
        println!("{:?}", cursor.next());
    }
}

Is this horribly unsound or have I got the right idea?

You’ve got the right idea :slight_smile: … but Cursor::get is unsound :sweat_smile:

It assumes ptr points to valid memory, but this is not the case once ptr == end. Come to think about it, your get is actually peeking into the future, and as such should return an Option<u8>.

by the way, you do not need the pointer to be *mut _, since you are only read-dereferencing it (your .next() mutates the pointer itself, not the pointee).

Finally, in case you are curious, Rust’s ByteMem's equivalent, that is, a fat pointer to a slice of bytes, uses a length as the 2nd field instead of an end address, but the iterator version does use an end pointer (better symmetry for a DoubleEndedIterator)

You may be interested in Example: implementing Vec from the Rustonomicon.

2 Likes

Thanks for the pointers! I’m glad I’m on the right track at least because I feel more confident using slices etc when I have some idea of what’s going on.

I’m not sure how I missed get's unsoundness :disappointed:. But you’re right, if I’d thought of it as “peek” then I might not have made that error.

Thanks for the pun :smile:

3 Likes

Lol, agreed!

add safety rule is being broken.

1 Like

Ah, it took my a minute to work out why. So it looks like I either need to use wrapping_offset or be checking the index before I call add. Otherwise it’s UB.

In case anyone is following along, here’s an updated version that makes the suggested changes (hopefully): Link

Now I’m off to read the nomicon and learn how to do it properly.