Requesting review for `unsized-vec`

unsized-vec provides UnsizedVec<T>, which is like Vec<T>, except that T has no Sized bound. This is meant as an experiment to play with unsized types and see how far I can push them. I would appreciate feedback on:

  • Soundess issues—unfortunately, the crate uses unstable features that Miri doesn't support
  • API suggestions—how would you see yourself using a crate like this? What APIs would make those things easier?
  • Performance—any tricks to make it faster or use less memory?
  • Code style

I tried taking a look, and my main impression is that this code appears overly complicated.

1 Like

There is a lot of complexity inherent to the problem. unsized-vec aims to support three cases:

  • T: Sized
  • T: ?Sized + Aligned
  • T: ?Sized + ?Aligned

The goal is to have one Vec type that does the optimal thing for all three cases. The way I've chosen to do this is to have each individual method of UnsizedVec use specialization internally. That leads to complicated code. Alternatively, I could write 3 separate Vec types with completely independent implementations, and use specialization to choose between the 3. That would mean simpler code, but come at the cost of lots of duplication.

How does it work? What's the overhead? Could you give a quick overview?

The crate docs give a quick overview of the memory layout. As for implementation, it's simplest to explain by starting with the T: ?Aligned case (when T 's alignment isn't known at compile time; for example, T = dyn Debug). In this case, the UnsizedVec looks more or less like this:

// Following is somewhat simplified

pub struct UnsizedVec<T: ?Sized> {
    // Pointer to start of heap allocation
    // (dangling if `cap` is 0)
    ptr: NonNull<()>,
    // length of heap allocation (0 if we haven't allocated)
    cap: usize,
    // See later explanation for why this is needed.
    // `alloc::vec::Vec` is the standard library's `Vec`.
    // This is distinct from the heap allocation
    // used to store the elements of the` UnsizedVec`.
    metadata_list: alloc::vec::Vec<MetadataStorage<T>>,
    // See later explanation for why this is needed.
    align: usize,
}

struct MetadataStorage<T: ?Sized> {
    metadata: <T as Pointee>::Metadata, 
    offset_next: usize,
}

The elements of the UnsizedVec are laid out sequentially in memory, just as with normal Vec. The one caveat is that every element is padded so that the size it takes up in the vec, in bytes, is a multiple of the number in the align field. align is equal to the largest alignment (as returned by core::mem::align_of_val) out of all the elements in the vec. It can increase when a new element is added, which triggers re-padding and reallocation. The UnsizedVec's backing allocation is itself aligned to align.

The metadata field stores a list of MetadataStorage structs, one for each element of the vec, and laid out in the same order. This MetadataStorage struct has two fields. For the element at index i of vec v, v.metadata_list[i].metadata stores the pointer metadata of the element, needed to reconstruct a wide pointer to it. The offset_next field stores the offset of the byte one past the end of the element, compared to the start of the vec (but not padded to a multiple of align). For example, if the third element occupies bytes 4 to 7 of the UnsizedVec's heap allocation, v.metadata_list[i].offset_next will equal 8.

This allows indexing into the UnsizedVec in O(1) time. To get the offset of the start of the element at index index of vec myvec, we do index.checked_sub(1).map_or(0, |prev_index| myvec.metadata_list[prev_index].offset_next.next_multiple_of(myvec.align)). And to get the offset of one past the end of the element, we do unsized_vec.metadata_list[index].offset_next. This gives us enough information to find our element in the unsized_vec's heap allocation, and copy it out or return a reference to it.

To get the length of myvec, we just do myvec.metadata_list.len().

Now, when T: ?Sized + Aligned (for example, T = [u32]), we do mostly the same thing, except we don't need to worry about storing an align field or padding to it. So we use specialization to store a () in that field instead, and skip all the extra padding logic.

When T: Sized, we use more specialization so that MetadataStorage's two fields both become (). This makes MetadataStorage a ZST, so metadata_list never allocates and is basically just a usize length field. This allows UnsizedVec to behave basically identically to the standard library Vec, and to be trivially convertible to and from it.

2 Likes

emplacer::box_new_with() does not appear to handle allocation failure correctly. To illustrate, this program attempts to write to a null pointer, despite satisfying all the safety preconditions:

// unsized-vec v0.0.1-alpha.7

use std::{
    alloc::{GlobalAlloc, Layout, System},
    ptr,
    sync::atomic::{AtomicBool, Ordering},
};
use unsized_vec::emplace;

struct ToggleAlloc(AtomicBool);

// SAFETY: Wraps `System`'s methods, possibly indicating failure.
unsafe impl GlobalAlloc for ToggleAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if self.0.load(Ordering::Relaxed) {
            System.alloc(layout)
        } else {
            ptr::null_mut()
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static ALLOC: ToggleAlloc = ToggleAlloc(AtomicBool::new(true));

fn main() {
    ALLOC.0.store(false, Ordering::Relaxed);
    emplace::box_new_with::<u8>(|emplacer| {
        // SAFETY: The closure writes a valid `u8` to its pointer.
        let emplacer = unsafe { emplacer.into_inner() };
        emplacer(Layout::new::<u8>(), (), &mut |ptr| {
            // SAFETY: `ptr` is valid to write a `u8` into.
            unsafe { *ptr.cast::<u8>() = 0 };
        });
    });
    // Segmentation fault (core dumped)
}

A few more issues:

  • If the allocation fails, and closure provided to box_new_with() panics, PointerDeallocer's destructor will call alloc::dealloc() with a null pointer, which is UB.
  • If the layout has size 0, box_new_with()'s closure will call alloc::alloc() with size 0, which is UB.
  • If the user provides the wrong layout to the Emplacer's closure, it will not create an allocation with the correct layout for T. The safety preconditions for Emplacer::into_inner() say nothing about the layout or metadata, only that the pointer must be initialized by the inner closure. (Arguably, this means that box_new_with() is violating Emplacer::new()'s safety precondition.)
  • Not a soundness issue, but if the closure provided to box_new_with() calls the Emplacer's closure more than once, the previous values will be leaked.
2 Likes

Good catch, thank you! I've pushed updates to fix the soundness problems.

I've done a complete rewrite of the crate, that separates the three Vec implementations instead of mashing them all together. Hopefully it should make the code more readable.

Emplacable::into_closure() does not correctly handle the closure under Stacked Borrows: if it contains a Unique tag (from a captured &mut or Box), then mem::forget(self); will invalidate the tag in closure. To illustrate, Miri produces an error on this program:

// emplacable v0.1.0-alpha.2
// MIRIFLAGS=-Zmiri-retag-fields cargo +nightly miri run

use emplacable::{Emplacable, Emplacer};

fn main() {
    let mut hello = "Hello, world!";
    // SAFETY: The closure always diverges.
    let emplacable = unsafe {
        Emplacable::from_closure(|_: &mut Emplacer<'_, ()>| {
            let _ = &mut hello;
            panic!();
        })
    };
    let _ = emplacable.into_closure();
}
error: Undefined Behavior: trying to retag from <3063> for Unique permission at alloc1594[0x0], but that tag does not exist in the borrow stack for this location
   --> /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.2/src/lib.rs:439:9
    |
439 |         closure
    |         ^^^^^^^
    |         |
    |         trying to retag from <3063> for Unique permission at alloc1594[0x0], but that tag does not exist in the borrow stack for this location
    |         this error occurs as part of retag at alloc1594[0x0..0x10]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3063> was created by a Unique retag at offsets [0x0..0x10]
   --> src/main.rs:15:13
    |
15  |     let _ = emplacable.into_closure();
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^
help: <3063> was later invalidated at offsets [0x0..0x10] by a Unique retag
   --> src/main.rs:15:13
    |
15  |     let _ = emplacable.into_closure();
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: BACKTRACE:
    = note: inside `emplacable::Emplacable::<(), [closure@src/main.rs:10:34: 10:60]>::into_closure` at /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.2/src/lib.rs:439:9
note: inside `main` at src/main.rs:15:13
   --> src/main.rs:15:13
    |
15  |     let _ = emplacable.into_closure();
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error

One solution is to use an outer ManuallyDrop in place of mem::forget():

    #[must_use]
    #[inline]
    pub fn into_closure(self) -> C {
        let mut this = ManuallyDrop::new(self);
        // SAFETY: `this` is within a `ManuallyDrop`, so no double drop.
        unsafe { ManuallyDrop::take(&mut this.closure) }
    }
2 Likes

There seem to be a couple issues with the safety preconditions for Emplacer::new().

    /// Also, `e` can run that closure at most once, and must do so before returning,
    /// if it returns. It is allowed to unwind or otherwise diverge, but once it runs
    /// the closure it is no longer allowed to unwind.

The forgetting_emplacer_closure closure in Emplacable::forget() returns without running the closure it is given, violating this precondition. However, the documentation for Emplacer itself clarifies that the emplacer is permitted to do nothing. It would make sense to update the Emplacer::new() precondition accordingly.

    /// The `Emplacer` can't assume that is has received full ownership of
    /// the value written to the pointer of the inner closure, until the moment it returns.
    /// Specifically, it is not allowed to drop the value if it unwinds.

It is somewhat ambiguous what "it" is referring to, between the Emplacer and the inner closure. Does it mean that the Emplacer must wait until the inner closure returns before modifying the value, or until the Emplacer itself returns?

3 Likes

@LegionMammal978 I've pushed an update that fixes the mem::forget issue you found, and rewrote the safety docs to hopefully make them more clear. Thanks for the feedback!

    /// `emplacer_fn` is allowed to unwind or otherwise diverge. But if it runs the closure it receives as its third argument,
    /// then once it does so it is no longer allowed to unwind.s

Do you mean to say here that the emplacer_fn must not unwind after the inner closure returns? The current wording could suggest that the emplacer_fn must not unwind even if the inner closure unwinds.

EDIT: Another discrepancy. On Emplacer::from_fn():

    /// The `emplacer_fn`, if it runs the closure that it recieves as a third argument, must
    /// pass in a pointer that is ether null, or has the alignment of the `Layout` passed to it,
    /// and is valid for writes of the number of bytes corresponding to the `Layout`.

This suggests that the pointer passed to the inner closure must meet at least the received Layout's size and alignment. However, the preconditions for Emplacer::into_fn() contradict this, saying that the pointer must only be valid for the layout of T with the received metadata:

    /// If `T`'s size or alignment can be known at compile-time, or can be determined from the
    /// pointer metadata alone, and you pass in an oversized/overaligned `Layout`, then you are not guaranteed
    /// to get an allocation that matches the `Layout`'s stronger guarantees.
1 Like

There appears to be a typo in the IntoEmplacable<str> for &str impl: instead of copying bytes from addr_of!(*self), it copies them from addr_of!(self), where self is the &str reference. To illustrate, Miri reports an out-of-bounds read for strings longer than 16 bytes:

// emplacable v0.1.0-alpha.4
// cargo +nightly miri run

use emplacable::Emplacable;

fn main() {
    let s = "Hello, world! Hello, Rust!";
    let emplacable: Emplacable<str, _> = s.into();
    let _ = emplacable.into_string();
}
error: Undefined Behavior: memory access failed: alloc1607 has size 40, so pointer to 26 bytes starting at offset 24 is out-of-bounds
   --> /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.4/src/lib.rs:820:25
    |
820 | /                         ptr::copy_nonoverlapping(
821 | |                             addr_of!(self).cast::<u8>(),
822 | |                             out_ptr.cast(),
823 | |                             self.len(),
824 | |                         );
    | |_________________________^ memory access failed: alloc1607 has size 40, so pointer to 26 bytes starting at offset 24 is out-of-bounds
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE:
    = note: inside closure at /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.4/src/lib.rs:820:25
    = note: inside closure at /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.4/src/lib.rs:1428:13
    = note: inside closure at /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.4/src/lib.rs:816:13
    = note: inside `<[closure@<&str as emplacable::IntoEmplacable<str>>::into_emplacable::{closure#0}] as std::ops::FnOnce<(&mut emplacable::Emplacer<'_, str>,)>>::call_once - shim` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:251:5
    = note: inside `emplacable::box_new_with::<str, [closure@<&str as emplacable::IntoEmplacable<str>>::into_emplacable::{closure#0}]>` at /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.4/src/lib.rs:1445:5
    = note: inside `emplacable::<impl std::convert::From<emplacable::Emplacable<str, [closure@<&str as emplacable::IntoEmplacable<str>>::into_emplacable::{closure#0}]>> for std::string::String>::from` at /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.4/src/lib.rs:1538:9
    = note: inside `<emplacable::Emplacable<str, [closure@<&str as emplacable::IntoEmplacable<str>>::into_emplacable::{closure#0}]> as std::convert::Into<std::string::String>>::into` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:726:9
    = note: inside `emplacable::Emplacable::<str, [closure@<&str as emplacable::IntoEmplacable<str>>::into_emplacable::{closure#0}]>::into_string` at /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.4/src/lib.rs:1527:9
note: inside `main` at src/main.rs:9:13
   --> src/main.rs:9:13
    |
9   |     let _ = emplacable.into_string();
    |             ^^^^^^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error

EDIT: The same typo exists in IntoEmplacable<CStr> for &CStr, IntoEmplacable<OsStr> for &OsStr, and IntoEmplacable<Path> for &Path. Also, I don't think the standard library guarantees that the metadata of these types will always match the byte length, which these impls assume.

1 Like

IntoEmplacable<[T]> for Vec<T> does not appear to correctly forget the original Vec, causing an unconditional double free.

// emplacable v0.1.0-alpha.4
// cargo +nightly miri run

use emplacable::Emplacable;

fn main() {
    let v = vec!["Hello, world!"];
    let emplacable = Emplacable::from(v);
    let _ = emplacable.into_vec();
}
error: Undefined Behavior: pointer to alloc1556 was dereferenced after this allocation got freed
   --> /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:117:14
    |
117 |     unsafe { __rust_dealloc(ptr, layout.size(), layout.align()) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ pointer to alloc1556 was dereferenced after this allocation got freed
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE:
    = note: inside `std::alloc::dealloc` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:117:14
    = note: inside `<std::alloc::Global as std::alloc::Allocator>::deallocate` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:254:22
    = note: inside `<emplacable::alloc::raw_vec::RawVec<&str> as std::ops::Drop>::drop` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:479:22
    = note: inside `std::ptr::drop_in_place::<emplacable::alloc::raw_vec::RawVec<&str>> - shim(Some(emplacable::alloc::raw_vec::RawVec<&str>))` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
    = note: inside `std::ptr::drop_in_place::<std::vec::Vec<&str>> - shim(Some(std::vec::Vec<&str>))` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
    = note: inside `std::ptr::drop_in_place::<[closure@<std::vec::Vec<&str> as emplacable::IntoEmplacable<[&str]>>::into_emplacable::{closure#0}]> - shim(Some([closure@<std::vec::Vec<&str> as emplacable::IntoEmplacable<[&str]>>::into_emplacable::{closure#0}]))` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
    = note: inside `<[closure@<std::vec::Vec<&str> as emplacable::IntoEmplacable<[&str]>>::into_emplacable::{closure#0}] as std::ops::FnOnce<(&mut emplacable::Emplacer<'_, [&str]>,)>>::call_once - shim` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:251:5
    = note: inside `emplacable::box_new_with::<[&str], [closure@<std::vec::Vec<&str> as emplacable::IntoEmplacable<[&str]>>::into_emplacable::{closure#0}]>` at /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.4/src/lib.rs:1445:5
    = note: inside `emplacable::<impl std::convert::From<emplacable::Emplacable<[&str], [closure@<std::vec::Vec<&str> as emplacable::IntoEmplacable<[&str]>>::into_emplacable::{closure#0}]>> for std::vec::Vec<&str>>::from` at /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.4/src/lib.rs:1471:9
    = note: inside `<emplacable::Emplacable<[&str], [closure@<std::vec::Vec<&str> as emplacable::IntoEmplacable<[&str]>>::into_emplacable::{closure#0}]> as std::convert::Into<std::vec::Vec<&str>>>::into` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:726:9
    = note: inside `emplacable::Emplacable::<[&str], [closure@<std::vec::Vec<&str> as emplacable::IntoEmplacable<[&str]>>::into_emplacable::{closure#0}]>::into_vec` at /home/lm978/.cargo/registry/src/github.com-1ecc6299db9ec823/emplacable-0.1.0-alpha.4/src/lib.rs:1460:9
note: inside `main` at src/main.rs:9:13
   --> src/main.rs:9:13
    |
9   |     let _ = emplacable.into_vec();
    |             ^^^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error
1 Like

@LegionMammal978 I've added some Miri testing to the emplacable CI for the issues you found. Thanks!

I don't see how the impls assume anything at all about the metadata? They use Layout::for_value and then Layout::size to get the byte length.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.