Is my use of std::mem::zeroed() sound?

I have this trait implementation for arrays in my code:

impl<T, const SIZE: usize> FromLeStream for [T; SIZE]
where
    T: FromLeStream,
{
    fn from_le_stream<I>(bytes: &mut I) -> Option<Self>
    where
        I: Iterator<Item = u8>,
    {
        #[allow(unsafe_code)]
        // SAFETY: We will initialize all elements of the array in the for loop below.
        let mut result: [T; SIZE] = unsafe { zeroed() };

        for item in &mut result {
            // Initialize all elements of the array with valid values of `T`
            // as returned by `FromLeStream::from_le_stream`.
            // If `FromLeStream::from_le_stream` returns an error, we will return early
            // and discard the uninitialized array.
            *item = <T as FromLeStream>::from_le_stream(bytes)?;
        }

        // At this point the array is fully initialized by the for loop above,
        // so it's safe to return it.
        Some(result)
    }
}

Is this code sound? Are my safety considerations correct? Or may it cause undefined behavior?

At first look, probably not; if FromLeStream is an unsafe trait that requires the all-zeroes bit pattern to be valid for T: FromLeStream, then this is safe, otherwise it is possible that the all-zeroes bit pattern is not valid, and this code can cause UB as a result (for example, when result is dropped on an error).

6 Likes

I see a couple of issues with this:

  1. It assumes that 0 is a valid value of T, but I don't see whether that's explicitly required by the trait, nor is the trait unsafe to implement.
  2. If there's an error, the uninitialized array elements will run their Drop functions, but they aren't necessarily in a valid state.

You may want to check out MaybeUninit if you haven't.

Edit: In fact, there's even an example of this use case MaybeUninit in std::mem - Rust

7 Likes

Thank you both. I know why I try to avoid unsafe {} in my code. It's so easy to get wrong.
After reading a bit more of MaybeUninit I now updated the implementation to:

impl<T, const SIZE: usize> FromLeStream for [T; SIZE]
where
    T: FromLeStream,
{
    fn from_le_stream<I>(bytes: &mut I) -> Option<Self>
    where
        I: Iterator<Item = u8>,
    {
        let mut result: [MaybeUninit<T>; SIZE] = [const { MaybeUninit::uninit() }; SIZE];

        for elem in &mut result[..] {
            elem.write(<T as FromLeStream>::from_le_stream(bytes)?);
        }

        #[allow(unsafe_code)]
        // SAFETY: At this point the array is fully initialized by the for loop above,
        // so it's safe to return it.
        Some(unsafe { result.as_ptr().cast::<[T; SIZE]>().read() })
    }
}

Which should avoid using uninitialized memory.

Note that I could not use std::mem::transmute() as in the example, because of the generic T in the array.

Here’s a possible implementation using arrayvec.

impl<T, const SIZE: usize> FromLeStream for [T; SIZE]
where
    T: FromLeStream,
{
    fn from_le_stream<I>(bytes: &mut I) -> Option<Self>
    where
        I: Iterator<Item = u8>,
    {
        std::iter::from_fn(|| T::from_le_stream(bytes))
            .take(SIZE)
            .collect::<arrayvec::ArrayVec<_, SIZE>>()
            .into_inner()
            .ok()
    }
}
4 Likes

Thanks. I actually use heapless::Vec in the crate behind a feature flag, so I could've used this as well. But I wanted to challenge myself to get it done without third-party crates.

Note (and I don't know if it's important in your use case) that if any of the individual <T as FromLeStream>::from_le_stream(bytes) returns an error, you will leak all the previous elements in the array without calling their Drop implementations.

This may, or may not, matter, but it's one of the things that arrayvec handles for you correctly.

3 Likes

Fixed it as per the MaybeUninit docs:

impl<T, const SIZE: usize> FromLeStream for [T; SIZE]
where
    T: FromLeStream,
{
    fn from_le_stream<I>(bytes: &mut I) -> Option<Self>
    where
        I: Iterator<Item = u8>,
    {
        let mut result = [const { MaybeUninit::uninit() }; SIZE];

        for (index, item) in result.iter_mut().enumerate() {
            if let Some(elem) = T::from_le_stream(bytes) {
                item.write(elem);
            } else {
                // Drop all elements that we've initialized so far.
                for item in &mut result[0..index] {
                    #[allow(unsafe_code)]
                    // SAFETY: We counted the number of initialized elements,
                    // so it's safe to drop those.
                    unsafe {
                        item.assume_init_drop();
                    };
                }

                // We cleaned up the partially initialized array and can nor return early.
                return None;
            }
        }

        #[allow(unsafe_code)]
        // SAFETY: At this point the array is fully initialized by the for loop above,
        // so it's safe to return it.
        Some(unsafe { result.as_ptr().cast::<Self>().read() })
    }
}

Next challenge in doing this without relying on other crates: what happens if a T::from_le_stream call panics?

4 Likes

This appears to be quite challenging, since I cannot use catch_unwind(), because I cannot transfer the mutable reference to the Iterator across an unwind boundary.

PS: That would require additional trait bounds... What a rabbit hole to go down into.

If you want to see how the standard library does this, there's an unstable method https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.next_chunk that does essentially what you're doing here.

ArrayVec is definitely the best answer on stable for this.

2 Likes

You could probably wrapp each T::from_le_stream call in catch_unwind. That said, for such code, using destructors is usually preferred over catch_unwind, as it doesn’t need to interrupt and later resume the unwinding process.

A common approach here is to store handles that have access to stuff that needs clean-up in a struct with a destructor, at some level, and possibly add into this struct additional information needed to do the clean-up correctly (you “only” need to make sure that at any point that can panic).

W.r.t. interacting with the iterator, I can imagine this can be difficult to keep within the loop, but outside of the loop, you can have a level of re-borrowed mutable. Access. Making a guard that holds a &mut [MaybeUninit<T>; SIZE] field, then let the for-loop re-borrow from that field. As additional information, you also need some usize-type field that’s kept up-to-date appropriately to tell the destructor the amount of items that need to be destructed. Finally, you also need to consider how to “defuse” the guard on the successful path. (Either by making the Drop implementation effectively a no-op, e.g. setting the number-of-initialized-items back to zero, or by using something like mem::forget.)

For inspiration, feel free to check out some code in the standard library for comparison, in implementing array::from_fn and the unstable array::from_fn_mut. (They have a Guard struct that they use in two different places, which is why it’s not defined right next to – or even inside – the relevant function.)

Of course note that the any mechanism that handles your panic-case can easily handle your None case as well, so you won’t need some code you already have.


A last follow-up question/challenge: What’s the intended behavior if some of the T’s being dropped in the None case has a panicing destructor? It’s a very niche question, but still interesting IMO. Very niche because really destructors are generally not actually supposed to panic. One could consider if not perhaps the remaining items in the array should still be dropped (giving more chance to a double-panic abort), or if one wants to leak them (risking that the overall program continues running with an avoidable data leak, which could accumulate over a long time).

If you do want to handle by dropping the remaining items, it’s also a challenge not to accidentally introduce any possibility for double-free to happen. (E.g. also the item whose destructor caused the panic must not be re-executed.) To make your life simpler, if you manage to call ptr::drop_in_place, it automatically does the right thing of continuing to drop further items if earlier ones panicked, and it’s what the standard library uses. It’s also what arrayvec uses and what Drop for Vec<…> uses, and it has the additional benefit of skipping the whole potential loop reliably if the type in question doesn’t have any destruction logic (nor any of its fields).

3 Likes

Sorry if this has been mentioned. I didn't read the whole thread, but it doesn't matter what you do with the zeroed values. As long as 0 is not a valid bit pattern for T: FromLeStream then this is immediate UB at the callsite of zeroed().

From the documentation of the zeroed()

There is no guarantee that an all-zero byte-pattern represents a valid value of some type T . For example, the all-zero byte-pattern is not a valid value for reference types (&T , &mut T ) and functions pointers. Using zeroed on such types causes immediate undefined behavior because the Rust compiler assumes that there always is a valid value in a variable it considers initialized.

So the following line would already be UB for some types

1 Like

It was a fun learning experience, but I took the safe route and let heapless::Vec do the heavy lifting:

impl<T, const SIZE: usize> FromLeStream for [T; SIZE]
where
    T: FromLeStream,
{
    fn from_le_stream<I>(bytes: &mut I) -> Option<Self>
    where
        I: Iterator<Item = u8>,
    {
        let mut vec = heapless::Vec::<T, SIZE>::new();
        (0..SIZE)
            .map_while(|_| T::from_le_stream(bytes))
            .for_each(|item| vec.push(item).unwrap_or_else(|_| unreachable!()));
        vec.into_array().ok()
    }
}
5 Likes

One slight criticism of your final code (and it's minor); I put a unique string inside unreachable! invocations for two purposes:

  1. If I'm wrong, and that code is reachable at runtime, the unique string makes it very easy to see what's panicked, without firing up a code editor.
  2. If it's important that this code cannot panic at runtime, a user can search the binary for that unique string and become confident that the optimizer agrees that this code does not panic.
2 Likes

Thanks. In the meantime I already refactored it to:

impl<T, const SIZE: usize> FromLeStream for [T; SIZE]
where
    T: FromLeStream,
{
    fn from_le_stream<I>(mut bytes: I) -> Option<Self>
    where
        I: Iterator<Item = u8>,
    {
        iter::from_fn(|| T::from_le_stream(&mut bytes))
            .take(SIZE)
            .collect::<heapless::Vec<_, SIZE>>()
            .into_array()
            .ok()
    }
}

(Stolen from @steffahn) :wink:

1 Like

TBH, I wouldn't bother doing this. If I think it's unreachable and it's not, then it's a bug, and that's going to need opening a code editor to fix. And if it's really unreachable, then the extra noise of the custom string is just distracting.

(The other reason makes more sense.)

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.