How to create a long array with non-copyable elements?

I'm trying to do this:

let dict: [Vec<usize>; 26] = [vec![]; 26];

However, it doesn't work because Vec does not implement Copy.
So I have to write it like this:

let dict: [Vec<usize>; 26] = [vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![]];

Is there a better way? Thanks.

The answer probably depends on your context, but by far the easiest way is to just use a Vec<Vec<usize>> with 26 elements and initialize it in a loop. Arrays in Rust aren't as useful as in other languages.
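To make the Vec<Vec<usize>> suggestion concrete, here is a minimal sketch. The function names are made up for illustration. The second function assumes a toolchain of Rust 1.63 or newer, where std::array::from_fn is available and builds the fixed-size array in safe code; on older compilers only the first one applies.

```rust
/// Builds the 26 buckets as a Vec of Vecs, filled in a plain loop.
fn dict_as_vec() -> Vec<Vec<usize>> {
    let mut dict = Vec::with_capacity(26);
    for _ in 0..26 {
        dict.push(Vec::new());
    }
    dict
}

/// Assumption: Rust 1.63+ (std::array::from_fn).
/// Calls the closure once per index, so the element type does not need Copy.
fn dict_as_array() -> [Vec<usize>; 26] {
    std::array::from_fn(|_| Vec::new())
}
```

Both give 26 empty vectors; the Vec version costs one extra heap allocation for the outer container.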

I think the problem is that there's no (safe) way to create an array of Vec<usize> on the stack in one operation... Normally to initialize a [T; n] array it'd use a memcpy to create n copies of a valid bit pattern of T, but because Vec<usize> isn't Copy, that memcpy operation wouldn't be sound. That means you've got to initialize your array of items one-by-one which isn't really a safe operation to do (i.e. safe Rust assumes all variables contain a valid copy of the type they contain, but if only half your array is initialized then the other half isn't, and that assumption is invalid).

Another answer is to use std::mem::MaybeUninit and some unsafe code. This is my attempt (playground) at initializing an array using non-Copy items.
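For readers without access to the playground link, this is a minimal sketch of the element-by-element pattern, not necessarily the linked code. The name make_dict is made up for illustration, and the sketch assumes Rust 1.55 or newer for MaybeUninit::write.

```rust
use std::mem::{self, MaybeUninit};

/// Builds [Vec<usize>; 26] one element at a time (hypothetical helper).
fn make_dict() -> [Vec<usize>; 26] {
    // SAFETY: an array of MaybeUninit slots needs no initialization itself,
    // so "assuming" the outer array is initialized is fine.
    let mut slots: [MaybeUninit<Vec<usize>>; 26] =
        unsafe { MaybeUninit::uninit().assume_init() };

    // Fill every slot. `write` overwrites without dropping the old contents,
    // which matters because the old contents are uninitialized.
    for slot in slots.iter_mut() {
        slot.write(Vec::new());
    }

    // SAFETY: every slot has been written, and MaybeUninit<T> has the same
    // layout as T, so the two array types have the same layout.
    unsafe { mem::transmute::<[MaybeUninit<Vec<usize>>; 26], [Vec<usize>; 26]>(slots) }
}
```

Vec::new() cannot panic, so there is no half-initialized state to clean up here. With an initializer that can panic, the already-written elements would be leaked rather than dropped.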

I know @nikomatsakis is coordinating the unsafe code workgroup, so hopefully he may know a better way of doing this or be able to point you in the right direction, because my answer using unsafe code kinda sucks...

1 Like

There’s the array-init crate.

6 Likes

Use mem::uninitialized and fill in the array properly.

Sadly, there is no good way to do it otherwise, other than writing a macro.

That's what I was getting at with MaybeUninit. From what I've heard mem::uninitialized() is insta-UB...

3 Likes

That's enlightening!
I can't use third-party libraries right now, though.
Thanks anyway.

::array-init crate is not sound, since it relies on a generic and unbounded mem::uninitialized.

Now that MaybeUninit is stable, working with generic uninit buffers can finally be done soundly (although it still requires great care).

See the following stand-alone solution:

macro_rules! array {(
    $closure:expr; $N:expr
) => ({
    use ::core::{
        mem::{
            forget,
            MaybeUninit,
        },
        ptr,
        slice,
    };

    const N: usize = $N;
    
    #[inline(always)]
    fn gen_array<T> (mut closure: impl FnMut(usize) -> T) -> [T; N]
    {
        unsafe {
            let mut array = MaybeUninit::uninit();
    
            struct PartialRawSlice<T> {
                ptr: *mut T,
                len: usize,
            }
            
            impl<T> Drop for PartialRawSlice<T> {
                fn drop (self: &'_ mut Self)
                {
                    unsafe {
                        ptr::drop_in_place(
                            slice::from_raw_parts_mut(
                                self.ptr,
                                self.len,
                            )
                        )
                    }
                }
            }

            let mut raw_slice = PartialRawSlice {
                ptr: array.as_mut_ptr() as *mut T,
                len: 0,
            };
            
            (0 .. N).for_each(|i| {
                ptr::write(raw_slice.ptr.add(i), closure(i));
                raw_slice.len += 1;
            });
    
            forget(raw_slice);
            array.assume_init()
        }
    }

    gen_array($closure)
})}


fn main ()
{
    // init by providing a FnMut closure mapping each index to each value
    let dict: [Vec<usize>; 26] = array![|idx| vec![]; 26];
    dbg!(&dict[..]);
}

AFAIK, as long as you correctly use ptr::write to write the initial value, it is safe. The same goes for MaybeUninit.
It's just that MaybeUninit makes a clear distinction between an uninitialized value and a regular one, since you have to call assume_init to get the initialized T.

But both require careful writes, as you must not replace a value inside uninitialized memory by assignment (i.e. you have to write through pointers).

mem::uninitialized will in fact be insta-UB, if the type in question has incorrect values, regardless of how carefully you're initializing this memory afterwards.

1 Like

Insta UB! No matter what. That's the reason it has been deprecated in favor of MaybeUninit.

1 Like

Then initialize it with a correct value; I don't understand the problem.

You can wrongly initialize even a MaybeUninit, then just assume it is correct, and get the same UB.
So to me it is the same shit: you need to write through raw pointers to initialize such memory.

It is only a question of safer semantics with MaybeUninit; its nature is the same.

P.S. I come from a C background, so don't scare me with non-issue UB.

Compared to C, Rust imposes far stricter requirements for the sake of aggressive optimization. In safe code the compiler proves them automatically, but in unsafe code it's your responsibility to uphold all of them. Unsafe Rust is more unsafe than C.

2 Likes

@RalfJung posted a very neat article about this. Maybe give it a try :wink:

Look, pal, I started with C and C++, so I'm pretty well aware of the risks of uninitialized memory, especially in C++ with its object model.
So you don't need to explain it to me. I just want people to stop disregarding any solution that involves unsafe or UB.

P.S. Especially when there is no safe option.

"pal"... :roll_eyes:

You can't tell a person new to Rust: "Don't worry about unsafe, just use this mem::uninitialized and you'll be fine". That's not how it works. That's not how it ever should work!
Sure, you might have a C background, but OP doesn't. He may now think: unsafe? Seems like it is the solution to all my problems! Let's get started!
Hell no! unsafe should only be the solution if you know how to deal with a summoned daemon.
Also: Always disregard a solution that has UB, because its behavior is not defined. Look at the very first example in Ralf's article:

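use std::mem;

fn always_returns_true(x: u8) -> bool {
    // Every initialized u8 is below 150 or above 120, so this looks like a tautology.
    x < 150 || x > 120
}

fn main() {
    // Deliberate UB for demonstration: `x` is never initialized.
    let x: u8 = unsafe { mem::uninitialized() };
    assert!(always_returns_true(x));
}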

This always returns false, no matter what. You can't explain this to a person new to Rust. (It returns false every time in this specific environment with the exact same compiler.) This is UB. Period.
When you are using unsafe, you have to make certain guarantees to the compiler, and if you fail to do so, it is UB. There is no such thing as: let's take this UB and use it to our advantage. That's not how it works (not in C either).
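For contrast, a minimal sketch of the same check with the guarantee actually upheld. initialized_byte is a made-up helper, and the sketch assumes Rust 1.55 or newer for MaybeUninit::write. Once the slot is written before assume_init, the function really does return true for every input.

```rust
use std::mem::MaybeUninit;

fn always_returns_true(x: u8) -> bool {
    // True for every initialized u8: each value is below 150 or above 120.
    x < 150 || x > 120
}

/// Hypothetical helper: round-trips a byte through a MaybeUninit slot.
fn initialized_byte(value: u8) -> u8 {
    let mut slot = MaybeUninit::<u8>::uninit();
    slot.write(value);
    // SAFETY: the slot was written on the previous line.
    unsafe { slot.assume_init() }
}
```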

5 Likes

Between two solutions, which are both unsafe, but:

  • the first can trigger insta-UB in several cases, and it's not entirely clear when it will and when it won't,
  • the second can be proved (not by compiler, but by programmer) not to trigger UB at all,

I think the second is clearly preferable, and that's the point of insisting on MaybeUninit over mem::uninitialized.

The problem is that Vec implements Drop, so if the program stops for whatever reason (multithreading), your array of uninitialized Vecs will put your program into a UB state. MaybeUninit is a tool that prevents the Drop implementation from being called. That's why it's much better and not UB.
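A small sketch of the Drop point, under the assumption that counting destructor calls is an acceptable way to observe it. Noisy, DROPS and drop_counts are names made up for this illustration.

```rust
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times Noisy's destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns (drops seen after dropping the wrapper, drops seen after dropping the value),
/// both relative to the counter at the start of the call.
fn drop_counts() -> (usize, usize) {
    let before = DROPS.load(Ordering::SeqCst);

    // Dropping a MaybeUninit never runs the destructor of its contents,
    // so this Noisy is simply forgotten.
    let slot = MaybeUninit::new(Noisy);
    drop(slot);
    let after_wrapper = DROPS.load(Ordering::SeqCst) - before;

    // Only after assume_init does the value behave like a normal Noisy again.
    let slot = MaybeUninit::new(Noisy);
    // SAFETY: the slot was initialized by MaybeUninit::new.
    let value = unsafe { slot.assume_init() };
    drop(value);
    let after_value = DROPS.load(Ordering::SeqCst) - before;

    (after_wrapper, after_value)
}
```

So if filling a partially initialized buffer is interrupted by a panic, the wrapper is dropped without touching the garbage slots. The price is that elements already written are leaked unless you add a guard like PartialRawSlice in the macro above.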

Sure, you might have a C background, but OP doesn’t. He may now think: unsafe? Seems like it is the solution to all my problems! Let’s get started!

People learn from their mistakes; eventually we will all need to use unsafe code.

Hell no! unsafe should only be the solution, if you know how to deal with a summoned daemon

If that's the case, then Rust's unsafe is one big flaw and not a feature.
But that's false, and you know it.
So suggesting unsafe is fine.

this always returns false no matter what. You can’t explain this to a person new to rust.

Not true?
UB means it can be false or true.
But that's not actually the problem, because the code is perfectly safe with an uninitialized integer, as it doesn't have a destructor.

@Cerber-Ursi

Between two solutions, which are both unsafe , but:

If you mean MaybeUninit, then yes, of course.

It has better semantics to avoid obvious pitfalls.

@Stargateur

Sure, but you can initialize it properly and there will be no problem.
MaybeUninit is not meant to be used as a long-lived type, since it is needed only for the one-time initialization.
So the use case would be to initialize a MaybeUninit<[T; N]> and then call assume_init to get a proper array.
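A minimal sketch of that use case, assuming const generics are available (Rust 1.51 or newer). init_array is a name made up here, not an existing API.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: builds [T; N] by calling `make` once per index.
fn init_array<T, const N: usize>(mut make: impl FnMut(usize) -> T) -> [T; N] {
    let mut array = MaybeUninit::<[T; N]>::uninit();
    // Pointer to the first element; elements are written through it.
    let first = array.as_mut_ptr() as *mut T;

    for i in 0..N {
        // SAFETY: i < N, so the pointer stays inside the array. `write` does
        // not read or drop the previous (uninitialized) contents.
        unsafe { first.add(i).write(make(i)) };
    }

    // SAFETY: all N elements were written in the loop above.
    unsafe { array.assume_init() }
}
```

Usage would look like `let dict: [Vec<usize>; 26] = init_array(|_| Vec::new());`. If `make` panics partway through, the elements written so far are leaked, which is wasteful but not UB; the macro earlier in the thread adds a drop guard to avoid that leak.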

UB means that there has been a contract violation with the compiler; there is no such thing as "non-issue UB". Maybe some compiler does not exploit something resulting in an implementation-based platform-based definition of behavior. Meaning that you would have a "safe" crate for a specific version of rustc and a specific architecture. In other words: you might as well just be sharing a binary release of the program.

Taking the uninitialized integer example, maybe on some version of the compiler and some architecture the function always returns true, because the compiler does not exploit uninitialized integers even though it could. Later on, a new version of the compiler realises it can exploit this for more efficient binaries, resulting in the always_returns_true function breaking. Whose fault is that? The programmer's.


Now, coming back to mem::uninitialized, when the type is inhabited, using only ptr::write on it could be seen as fine (e.g., for integer types this is being deliberated),

but there are cases when even this is clearly not fine.

Generic mem::uninitialized<T> is unsound

Take, for instance, ::array-init, with a generic (over Array and its Array::Item) usage of mem::uninitialized:

  • EDIT: this comment targeted the version of ::array-init as of its writing: 0.0.4.
    ::array-init has since been patched to correctly use MaybeUninit :slight_smile:
pub fn array_init<Array, F> (mut initializer: F) -> Array
where
    Array : IsArray,
    F : FnMut(usize) -> Array::Item,
{
    let mut ret: NoDrop<Array> = NoDrop::new(unsafe { mem::uninitialized() });
    // <At this point Rust knows that **we have elements of type Array::Item**>
    for i in 0 .. Array::len() {
        Array::set(&mut ret, i, initializer(i));
    }
    ret.into_inner()
}
  • Knowing that in Rust it is perfectly valid to define:

    enum Uninhabited {}
    

    then there is literally no value that can be of type Uninhabited (this is not something you can know coming from a C background, since C does not have uninhabited types).
    Meaning that if some code were to witness such an element, then that code cannot possibly be reached. So, if reaching that branch was based on some condition, then Rust is allowed to skip checking the condition altogether.

But the code shown above is able to create values of type Uninhabited:

enum Uninhabited {}

fn trust_me_this_cannot_be_false (condition: bool)
{
    if !condition {
        let unreachable: [Uninhabited; 1] = ::array_init::
            array_init(|_| -> Uninhabited {
                loop {} // an infinite loop typechecks with everything
            })
        ;
    }
}

If the above function is given a false condition, you might think that it will loop indefinitely. But it so happens that Rust is allowed to instead assume that the condition is never false, without even checking it (cf. the code of array-init: the closure is called after the uninitialized inhabitants have been created, i.e., too late).

And now we can have memory unsafety:

fn main ()
{
    let slice: &mut [u8] = &mut [];
    trust_me_this_cannot_be_false(slice.len() == usize::MAX);
    for i in 0 .. usize::MAX {
        // Since slice.len() == usize::MAX, and i < usize::MAX, bounds checking can be skipped
        slice[i] = 0x42; // memory corruption
    }
}

I am not saying that the above program will always corrupt the memory (it could loop indefinitely, abort, or whatever), I am just saying that it would be legal for the compiler to corrupt the memory with it.

All this just because mem::uninitialized was used on a generic type. I wouldn't call this "non-issue UB" :wink:


The difference with MaybeUninit, by the way, is that MaybeUninit<Uninhabited> is inhabited.
Only when calling assume_init, after the closure is called, would we have unreachable code.
Which is fine, since a closure forging an element of an uninhabited type cannot possibly return (it must loop {} indefinitely, or die / end the thread of execution (e.g., aborting)).
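To make the inhabited/uninhabited distinction tangible, here is a small safe sketch. The function names are made up for illustration. It shows that holding a MaybeUninit<Uninhabited> is unproblematic, and that code which holds an actual Uninhabited value is treated by the compiler as unreachable (the empty match needs no arms).

```rust
use std::mem::MaybeUninit;

enum Uninhabited {}

/// Creating the wrapper is fine: MaybeUninit<Uninhabited> has a valid value
/// (the uninitialized one) even though Uninhabited itself has none.
fn uninit_slot() -> MaybeUninit<Uninhabited> {
    MaybeUninit::uninit()
}

/// The Err branch can never run, because no Uninhabited value can exist.
/// The empty match type-checks as returning u32 for exactly that reason.
fn unwrap_infallible(result: Result<u32, Uninhabited>) -> u32 {
    match result {
        Ok(value) => value,
        Err(never) => match never {},
    }
}
```

Calling assume_init on the slot from uninit_slot would be the UB step, which is why it is never done here.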

11 Likes

I would like to point out the obvious (I don't mean to offend, it just seemed to get lost in this discussion). Unsafe Rust is not the same as C. So even if some things look similar (uninitialized memory, pointers), that doesn't mean that what applies to C also applies to Rust. Rust makes more guarantees about its types to allow more aggressive optimizations. Since you seem to be new to Rust, you will need to learn some more about Rust to understand the differences from C and how they help. For example, how uninhabited types interact with control flow, as @Yandros pointed out.

2 Likes