Unsafe Code Review for Growable Non-Moving Array

lalaithion · January 18, 2022, 6:44pm

I've written a data structure that behaves like a Vector, but never moves its underlying storage or allows modification of existing indices, so it's legal to grow it while iterating over it, for example.

However, this is my first time writing unsafe Rust, and so I'm not quite sure I've correctly maintained the invariants Rust expects. I'm looking for unsafe Rust code review for safety and correctness first and foremost. I'm also happy to receive generic code review, but I'm not really looking for API design critiques. I know the API of this structure is really weird, but I think it's pretty minimal and would be adapted by a wrapper type for specific use cases.

Code: NonMovingGrowableVec - Rust Repl - Replit

kornel · January 19, 2022, 12:40am

I'm not a repl person, so I'm not sure how am I supposed to view the code.

But try cargo miri test to check if your tests don't invoke obvious UB, and cargo fuzz to shake bugs out of it.

lalaithion · January 19, 2022, 2:13am

The code is a bit long, but I've copied and pasted it here. You should also be able to hit "show files" and then navigate to "src/main".

use std::ops::Index;
use std::cell::UnsafeCell;
use std::result::Result;
use std::fmt;

/// UnsafeOption is a less-optimized but safer version
/// of MaybeUninit, because whether or not it is initialized
/// can be inspected.
struct UnsafeOption<T>(UnsafeCell<Option<T>>);

impl<T: fmt::Debug> fmt::Debug for UnsafeOption<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        unsafe{
          match self.as_ref() {
            Some(x) => f.debug_tuple("UnsafeSome")
              .field(&x)
              .finish(),
            None => f.debug_tuple("UnsafeNone")
              .finish(),
          }
        }
    }
}

impl<T> UnsafeOption<T> {
  fn none() -> Self {
    UnsafeOption(UnsafeCell::new(None))
  }

  unsafe fn as_ref(&self) -> Option<&T> {
    (&*self.0.get() as &Option<T>).as_ref()
  }

  unsafe fn set(&self, val: Option<T>) {
    *self.0.get() = val;
  }
}

/// NonMovingGrowableVec<T> is a variant of Vec<T> which does
/// not ever move it's contents, meaning that it's safe to hold
/// interior references while appending to it.
///
/// The performance guarantees of this structure are asymptotically
/// equivalent to a standard resizing vector using size doubling, except
/// that indexing requires two pointer dereferences rather than one, and
/// growing the vector only requires allocation, not copying.
#[derive(Debug)]
struct NonMovingGrowableVec<T> {
  // the capacities of the vectors in this array are
  // 1, 1, 2, 2^1, 2^2, ..., 2^63, giving it a total
  // potential capacity of 2^64, and the capacity of
  // vectors before each index 2^(i-1)
  data: [UnsafeOption<Vec<UnsafeOption<T>>>; 65],
}

fn items_before_outer_index(index: usize) -> u64 {
  if index == 0 {
    0
  } else {
    2_u64.pow((index - 1) as u32)
  }
}

fn items_at_outer_index(index: usize) -> u64 {
  if index == 0 {
    1
  } else {
    2_u64.pow((index - 1) as u32)
  }
}

impl<T> NonMovingGrowableVec<T> {
  const NONE: UnsafeOption<Vec<UnsafeOption<T>>> = UnsafeOption(UnsafeCell::new(None));

  const fn new() -> Self {
    NonMovingGrowableVec {
      data: [Self::NONE; 65],
    }
  }

  fn get(&self, index: u64) -> Option<&T> {
    let outer_index = (u64::BITS - index.leading_zeros()) as usize;
    let inner_index = (index - items_before_outer_index(outer_index)) as usize;

    unsafe {
      match self.data[outer_index].as_ref() {
        Some(v) => v.index(inner_index).as_ref(),
        None => None,
      }
    }
  }

  fn set_silent(&self, index: u64, item: T) {
    let _ignored = self.try_set(index, item);
  }

  fn try_set(&self, index: u64, item: T) -> Result<(), &T> {
    let outer_index = (u64::BITS - index.leading_zeros()) as usize;
    let inner_index = (index - items_before_outer_index(outer_index)) as usize;

    unsafe {
      match self.data[outer_index].as_ref() {
        Some(v) => match v[inner_index].as_ref() {
          Some(x) => return Result::Err(x),
          None => v[inner_index].set(Some(item)),
        }
        None => {
          self.data[outer_index].set({
            let size = usize::try_from(items_at_outer_index(outer_index)).expect("index too big for architecture!");
            let mut v = Vec::with_capacity(size);
            v.resize_with(size, || UnsafeOption::none());
            Some(v)
          });
          self.data[outer_index].as_ref().unwrap()[inner_index].set(Some(item));
        }
      }
    }

    Result::Ok(())
  }

  fn iter<'a>(&'a self) -> NonMovingGrowableVecIterator<'a, T> {
    return NonMovingGrowableVecIterator {
      underlying: self,
      index: 0,
    }
  }
}

struct NonMovingGrowableVecIterator<'a, T> {
  underlying: &'a NonMovingGrowableVec<T>,
  index: u64,
}

impl<'a, T> Iterator for NonMovingGrowableVecIterator<'a, T> {
  type Item = &'a T;

  fn next(&mut self) -> Option<Self::Item> {
    let it = self.underlying.get(self.index);
    self.index += 1;
    return it;
  }
}

fn main() {
  println!("main");

  let vec = NonMovingGrowableVec::new();
  vec.set_silent(0, 10);
  vec.set_silent(1, 8);
  vec.set_silent(2, 3);
  vec.set_silent(3, 18);
  let mut end = 4;

  for x in vec.iter() {
    if x % 2 == 0 {
      vec.set_silent(end, x / 2);
      end += 1;
    }

    println!("{}", x);
  }

  // dbg!(vec);
}

kornel · January 19, 2022, 3:25am

~~You're allowing concurrent access due to combination of &self and .set on the UnsafeCell, but you don't have any synchronization primitive. That's UB.~~

Nevermind, I forgot that UnsafeCell isn't Send, so it won't allow concurrent access. It's fine.

alice · January 19, 2022, 10:34am

I agree that this is ok.

Yandros · January 19, 2022, 5:36pm

Starting Playground

I've reduced the unsafe boundary, and thus the necessary cognitive burden to review the invariants, down to an UnsafeOption<T> which only features a .get_or_insert_with() and a as_ref() getters:

#![deny(unsafe_code)]

use safety_boundary::UnsafeOption;
mod safety_boundary {
    #![allow(unsafe_code)] // <- Safety boundary
    use super::*;

    /// UnsafeOption is a less-optimized but safer version
    /// of MaybeUninit, because whether or not it is initialized
    /// can be inspected.
    pub
    struct UnsafeOption<T> (
        UnsafeCell<Option<T>>, // Never `!Sync`
    );
    
    /// The following could be added, although it's not necessary.
    unsafe
    impl<T : Send> Send for UnsafeOption<T>
    {}
    
    impl<T> UnsafeOption<T> {
        pub
        const
        fn none ()
          -> UnsafeOption<T>
        {
            UnsafeOption(UnsafeCell::new(None))
        }
    
        pub
        fn as_ref (self: &'_ UnsafeOption<T>)
          -> Option<&'_ T>
        {
            unsafe {
                (&*self.0.get() as &Option<T>)
                    .as_ref()
            }
        }
        
        pub
        fn try_init_with (
            self: &'_ UnsafeOption<T>,
            f: impl FnOnce() -> T,
        ) -> Result<&'_ T, &'_ T>
        {
            // SAFETY & invariants:
            //   - The moment `.as_ref()` yields a `Some`, it will never be mutated ever again.
            //   - Lack of `Sync` guarantees lack of parallelism, here.
            //   - We need to be mindful of re-entrancy, though. Here, the calls to `f()` or `drop(f)`.
            if let Some(already_present) = self.as_ref() {
                // We witnessed a `Some`: no longer mutated, all is fine.
                drop(f); // <- do not forget about this side-effect.
                return Err(already_present);
            }
            // We may be tempted to do `….write(f())`, but that
            // `f()` could re-entrantly init the value, and let
            // the obtained `&T` escape. It is thus better to just delegate
            // to a by-value try_init:
            self.try_init(f())
                .map_err(|(my_val, already_present)| {
                    // warn!("re-entrant init!");
                    drop(my_val); // <- do not forget about this side-effect.
                    already_present
                })
        }
        
        pub
        fn try_init (
            self: &'_ UnsafeOption<T>,
            value: T,
        ) -> Result<&'_ T, (T, &'_ T)>
        {
            if let Some(already_present) = self.as_ref() {
                // We witnessed a `Some`: no longer mutated, all is fine.
                return Err((value, already_present));
            }
            // Up until this point, no `Some` can have been witnessed,
            // so there is no `&T` yet: we can finally write the value
            Ok(unsafe {
                let p = self.0.get();
                p.write(Some(value)); // <- The only write happens here.
                (&*p).as_ref().unwrap_unchecked()
            })
        }
    }
}

Playground

This is easier to review than something involving nested vecs and indexing and complex branches. And it is indeed sound: the "astute reader" will recognize here:

`::once_cell::unsync::OnceCell<T>`

.as_ref() as their .get();
.try_init() as their try_insert();
.try_init_with() similar to their .get_or_insert_with().

So, after the learning exercise of reimplementing unsync::OnceCell ourselves, it is important to trash that code and use the one from that very popular library: indeed, there is always a chance to have made a mistake in any kind of code, but when unsafe is involved, such mistakes can lead to UB/memory corruption, which in turn can way more easily make your code suffer from security vulnerabilities.

Thus, it is critical to reduce the chances of an allow(unsafe_code) codebase to feature bugs , which can be achieved by having as many pairs of eyes reviewing the same "canonical" implementation, here: ::once_cell::unsync::OnceCell<T>

In this instance, the try_set function then ends up rewritten as:

    fn try_set (
        self: &'_ NonMovingGrowableVec<T>,
        index: u64,
        item: T,
    ) -> Result<(), &'_ T>
    {
        let outer_index =
            (u64::BITS - index.leading_zeros())
               as usize
        ;
        let inner_index =
           (index - items_before_outer_index(outer_index))
               as usize
        ;
        let at_outer =
            self.data[outer_index]
                .get_or_init(|| {
                    let size =
                        usize::try_from(
                            items_at_outer_index(outer_index)
                        )
                        .expect("index too big for architecture!")
                    ;
                    (0.. size).map(|_| UnsafeOption::new()).collect()
                })
        ;
        match at_outer[inner_index].try_insert(item) {
            | Ok(_) => Ok(()),
            | Err((already_present, _item)) => Err(already_present),
        }
    }

This is % unsafe-free (but for stdlib and once_cell, of course), so that we can proudly tweak the beginning of our src/{lib,main}.rs with:

+ #![forbid(unsafe_code)] // ✅

Final Playground

system · April 19, 2022, 5:36pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Unsafe Array Indexing help	7	565	November 8, 2023
Unsafe code review request with UnsafeCell	13	556	March 14, 2023
Unsafe Rust is... tough? help	11	2387	December 27, 2020
Write only buffer code review	2	446	August 6, 2021
Trying to remove unsafe from the code help	4	678	January 10, 2021

Unsafe Code Review for Growable Non-Moving Array

::once_cell::unsync::OnceCell<T>

Final Playground

Related Topics

`::once_cell::unsync::OnceCell<T>`