How to create a long array with non-copyable element?

pearzl · July 14, 2019, 9:11am

I'm trying to do this:

let dict: [Vec<usize>; 26] = [vec![]; 26];

However, it doesn't work because Vec does not implement Copy.
So I have to write it like this:

let dict: [Vec<usize>; 26] = [vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![],vec![]];

Is there a better way? Thanks.

Michael-F-Bryan · July 14, 2019, 9:57am

The answer probably depends on your context, but by far the easiest way is to just use a Vec<Vec<usize>> with 26 elements and initialize it in a loop. Arrays in Rust aren't as useful as in other languages.
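A minimal sketch of that Vec-based approach, assuming a fixed size of 26 as in the question (the function name `make_dict` is only illustrative):

```rust
/// Builds 26 empty inner vectors, one per push, so `Copy` is never needed.
fn make_dict() -> Vec<Vec<usize>> {
    let mut dict = Vec::with_capacity(26);
    for _ in 0..26 {
        dict.push(Vec::new());
    }
    dict
}

fn main() {
    let mut dict = make_dict();
    dict[2].push(7);
    println!("{} buckets, bucket 2 = {:?}", dict.len(), dict[2]);
}
```

Since Vec implements Clone, `vec![Vec::new(); 26]` should give the same result in one expression; the restriction in the question only applies to the `[expr; N]` array syntax.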

I think the problem is that there's no (safe) way to create an array of Vec<usize> on the stack in one operation... Normally to initialize a [T; n] array it'd use a memcpy to create n copies of a valid bit pattern of T, but because Vec<usize> isn't Copy, that memcpy operation wouldn't be sound. That means you've got to initialize your array of items one-by-one which isn't really a safe operation to do (i.e. safe Rust assumes all variables contain a valid copy of the type they contain, but if only half your array is initialized then the other half isn't, and that assumption is invalid).

Another answer is to use std::mem::MaybeUninit and some unsafe code. This is my attempt (playground) at initializing an array using non-Copy items.
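For readers without access to the playground link, here is a rough sketch of one way this can look. It follows the element-by-element pattern from the MaybeUninit documentation and is not necessarily what the linked playground contains; `make_dict` is a hypothetical name.

```rust
use std::mem::{self, MaybeUninit};

fn make_dict() -> [Vec<usize>; 26] {
    // SAFETY: an array of `MaybeUninit` slots needs no initialization itself,
    // so calling `assume_init` on the outer array is sound.
    let mut slots: [MaybeUninit<Vec<usize>>; 26] =
        unsafe { MaybeUninit::uninit().assume_init() };

    // Overwriting a `MaybeUninit` slot never drops the old (garbage) contents.
    for slot in slots.iter_mut() {
        *slot = MaybeUninit::new(Vec::new());
    }

    // SAFETY: every slot was written above, and `MaybeUninit<T>` has the
    // same layout as `T`.
    unsafe { mem::transmute::<[MaybeUninit<Vec<usize>>; 26], [Vec<usize>; 26]>(slots) }
}

fn main() {
    let mut dict = make_dict();
    dict[3].push(42);
    println!("{:?}", &dict[..5]);
}
```

If the per-element initializer could panic, the already-written elements would be leaked rather than dropped; that is a leak, not UB, and `Vec::new()` does not panic anyway.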

I know @nikomatsakis is coordinating the unsafe code workgroup, so hopefully he may know a better way of doing this or be able to point you in the right direction, because my answer using unsafe code kinda sucks...

steveklabnik · July 14, 2019, 2:21pm

There’s the array-init crate.

DoumanAsh · July 14, 2019, 3:02pm

Use mem::uninitialized and properly fill in the array.

Sadly, there is no good way to do it otherwise, other than writing some macro.

Michael-F-Bryan · July 14, 2019, 3:06pm

That's what I was getting at with MaybeUninit. From what I've heard mem::uninitialized() is insta-UB...

pearzl · July 14, 2019, 3:39pm

It's enlightening!
Although I can't use third-party libs right now.
Thanks anyway.
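For the no-third-party-crate constraint, newer toolchains offer safe, std-only options. A minimal sketch, assuming Rust 1.63 or later for `std::array::from_fn` (which did not exist when this thread was written); the function names are hypothetical:

```rust
/// Calls the closure once per index, so no `Copy` or `Clone` is required.
fn make_dict_from_fn() -> [Vec<usize>; 26] {
    std::array::from_fn(|_| Vec::new())
}

/// Alternative: a `const` item may be repeated even if its type is not `Copy`,
/// because each element is a fresh evaluation of the constant.
fn make_dict_const() -> [Vec<usize>; 26] {
    const EMPTY: Vec<usize> = Vec::new();
    [EMPTY; 26]
}

fn main() {
    let mut a = make_dict_from_fn();
    let b = make_dict_const();
    a[0].push(1);
    println!("{} / {}", a.len(), b.len());
}
```

The const-item variant only works when the initial value can be computed at compile time, which is the case for `Vec::new()`.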

Yandros · July 14, 2019, 7:27pm

::array-init crate is not sound, since it relies on a generic and unbounded mem::uninitialized.

Now that MaybeUninit is stable, working with generic uninit buffers can finally be done soundly (although it still requires great care).

See the following stand-alone solution:

macro_rules! array {(
    $closure:expr; $N:expr
) => ({
    use ::core::{
        mem::{
            forget,
            MaybeUninit,
        },
        ptr,
        slice,
    };

    const N: usize = $N;
    
    #[inline(always)]
    fn gen_array<T> (mut closure: impl FnMut(usize) -> T) -> [T; N]
    {
        unsafe {
            let mut array = MaybeUninit::uninit();
    
            struct PartialRawSlice<T> {
                ptr: *mut T,
                len: usize,
            }
            
            impl<T> Drop for PartialRawSlice<T> {
                fn drop (self: &'_ mut Self)
                {
                    unsafe {
                        ptr::drop_in_place(
                            slice::from_raw_parts_mut(
                                self.ptr,
                                self.len,
                            )
                        )
                    }
                }
            }

            let mut raw_slice = PartialRawSlice {
                ptr: array.as_mut_ptr() as *mut T,
                len: 0,
            };
            
            (0 .. N).for_each(|i| {
                ptr::write(raw_slice.ptr.add(i), closure(i));
                raw_slice.len += 1;
            });
    
            forget(raw_slice);
            array.assume_init()
        }
    }

    gen_array($closure)
})}


fn main ()
{
    // init by providing a FnMut closure mapping each index to each value
    let dict: [Vec<usize>; 26] = array![|idx| vec![]; 26];
    dbg!(&dict[..]);
}

Playground

DoumanAsh · July 14, 2019, 8:52pm

AFAIK, as long as you correctly use ptr::write for the initial write of each value, it is safe. The same goes for MaybeUninit.
MaybeUninit just makes a clear distinction between an uninitialized value and a regular one, since you use assume_init to retrieve the initialized T.

But both require careful writes, as you must not replace a value inside uninitialized memory (i.e. you need to write through pointers).
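To illustrate the "write through pointers" point: a plain assignment through a dereferenced pointer first drops the old value, which would be garbage here, while a pointer write does not. A small sketch using MaybeUninit (the function name `init_string` is made up for the example):

```rust
use std::mem::MaybeUninit;

fn init_string() -> String {
    let mut slot = MaybeUninit::<String>::uninit();
    unsafe {
        // `*slot.as_mut_ptr() = ...` would drop the uninitialized old String
        // first, which is UB. `write` overwrites without reading or dropping.
        slot.as_mut_ptr().write(String::from("hello"));
        // SAFETY: the slot was fully initialized by the write above.
        slot.assume_init()
    }
}

fn main() {
    println!("{}", init_string());
}
```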

Cerber-Ursi · July 15, 2019, 2:44am

mem::uninitialized will in fact be insta-UB if the type in question has invalid values, regardless of how carefully you initialize the memory afterwards.

hellow · July 15, 2019, 5:38am

Insta UB! No matter what. That's the reason it has been deprecated in favor of MaybeUninit.

DoumanAsh · July 15, 2019, 5:59am

Then initialize it with a correct value; I don't understand the problem.

You can wrongly initialize even MaybeUninit and then just assume it is correct, and get the same UB.
So to me it is the same shit: you need to write through raw pointers to initialize such memory.

It is only a question of safer semantics with MaybeUninit; its nature is the same.

P.s. I'm coming from C background so don't scare me with non-issue UB

Hyeonu · July 15, 2019, 6:35am

Compared to C, Rust imposes far stricter requirements for the sake of aggressive optimization. These are proved automatically by the compiler in safe code, but in unsafe code it's your responsibility to uphold all of them. Unsafe Rust is more unsafe than C.

hellow · July 15, 2019, 7:25am

@RalfJung posted a very neat article about this. Maybe give it a try

DoumanAsh · July 15, 2019, 8:28am

Look, pal, I started with C and C++, so I'm pretty well aware of what the risks of uninitialized memory are, especially in C++ with its object model.
So you don't need to explain it to me. I just want people to stop disregarding any solution that has unsafe or UB

P.s. especially when there is no safe option

hellow · July 15, 2019, 8:52am

"pal"...

You can't tell a person new to Rust: "Don't worry about unsafe, just use this mem::uninitialized and you'll be fine". That's not how it works. That's not how it should ever work!
Sure, you might have a C background, but OP doesn't. He may now think: unsafe? Seems like it is the solution to all my problems! Let's get started!
Hell no! unsafe should only be the solution if you know how to deal with a summoned daemon.
Also: always disregard a solution that has UB, because it is not defined behavior. Look at the very first example of Ralf's code:

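use std::mem;

fn always_returns_true(x: u8) -> bool {
    // Holds for every initialized u8: each value is below 150 or above 120
    x < 150 || x > 120
}

fn main() {
    // UB: `x` is read without ever being initialized
    let x: u8 = unsafe { mem::uninitialized() };
    assert!(always_returns_true(x));
}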

This always returns false, no matter what. You can't explain this to a person new to Rust. (This returns false every time in this specific environment with the exact same compiler.) This is UB. Period.
When you are using unsafe you have to make certain guarantees to the compiler, and if you fail to do so, it is UB. There is no such thing as: let's take this UB and use it to our advantage. That's not how it works (not in C either).

Cerber-Ursi · July 15, 2019, 8:59am

Between two solutions, which are both unsafe, but:

the first can trigger insta-UB in several cases, and it's not entirely clear when it will and when it won't,
the second can be proved (not by compiler, but by programmer) not to trigger UB at all,

I think the second is clearly preferable, and that's the point of insisting on MaybeUninit over mem::uninitialized.

Stargateur · July 15, 2019, 9:02am

The problem is that Vec implements Drop, so if the program is stopped for whatever reason (e.g. multithreading) while the array still holds uninitialized Vecs, your program enters a UB state. MaybeUninit is a tool that prevents the Drop implementation from being called. That's why it's much better and not UB.

DoumanAsh · July 15, 2019, 9:15am

Sure, you might have a C background, but OP doesn’t. He may now think: unsafe? Seems like it is the solution to all my problems! Let’s get started!

People learn from their mistakes; eventually we all need to use unsafe code.

Hell no! unsafe should only be the solution, if you know how to deal with a summoned daemon

If that's the case, then Rust's unsafe is one big flaw and not a feature.
But that's false, and you know it.
So suggesting unsafe is fine.

this always returns false no matter what. You can’t explain this to a person new to rust.

Not true?
UB means it can be false or true.
But that's not actually the problem, because the code is perfectly safe with an uninitialized integer, as it doesn't have a dtor.

@Cerber-Ursi

Between two solutions, which are both unsafe , but:

If you mean between MaybeUninit then yes, of course.

It has better semantics to avoid obvious pitfalls

@Stargateur

Sure, but you can properly initialize it and there will be no problem.
MaybeUninit is not good for use as a long-lived type, since it is only needed for one-time initialization.
So the use case would be to initialize a MaybeUninit<[T; N]> and then call assume_init to get a proper array

Yandros · July 15, 2019, 10:25am

UB means that there has been a contract violation with the compiler; there is no such thing as "non-issue UB". Maybe some compiler does not exploit something resulting in an implementation-based platform-based definition of behavior. Meaning that you would have a "safe" crate for a specific version of rustc and a specific architecture. In other words: you might as well just be sharing a binary release of the program.

Taking the uninitialized integer example, maybe on some version of the compiler and on some architecture the function always returns true because the compiler does not exploit uninitialized integers even though it could. Later on, a new version of the compiler realises it can exploit it for more efficient binaries, resulting in the always_returns_true function breaking. Whose fault is that? The programmer's.

Now, coming back to mem::uninitialized, when the type is inhabited, using only ptr::write on it could be seen as fine (e.g., for integer types this is being deliberated),

but there are cases when even this is clearly not fine.

Generic `mem::uninitialized<T>` is unsound

Take, for instance, ::array-init, with a generic (over Array and its Array::Item) usage of mem::uninitialized:

EDIT: this comment targeted the version of ::array-init as of its writing: 0.0.4.
::array-init has since been patched to correctly use MaybeUninit

pub
fn array_init<Array, F> (mut initializer: F) -> Array
where
    Array : IsArray,
    F : FnMut(usize) -> Array::Item,
{
    let mut ret: NoDrop<Array> = NoDrop::new(unsafe { mem::uninitialized() });
    // <At this point Rust knows that **we have elements of type Array::Item**>
    for i in 0 .. Array::len() {
        Array::set(&mut ret, i, initializer(i));
    }
    ret.into_inner()
}

Knowing that in Rust it is perfectly valid to define:
```
enum Uninhabited {}
```
then there is literally no value that can be of type Uninhabited (this is not something you can know coming from a C background, since C does not have uninhabited types).
Meaning that if some code were to witness such an element, then that code cannot possibly be reached. So, if reaching that branch was based on some condition, then Rust is allowed to skip checking the condition altogether.

But the code shown above is able to create values of type Uninhabited:

enum Uninhabited {}

fn trust_me_this_cannot_be_false (condition: bool)
{
    if !condition {
        let unreachable: [Uninhabited; 1] = ::array_init::
            array_init(|_| -> Uninhabited {
                loop {} // an infinite loop typechecks with everything
            })
        ;
    }
}

If the above function is given a false condition, you may think that it will loop indefinitely. But it so happens that Rust is allowed to instead assume that the condition is never false, without even checking it (cf. the code of array-init: the closure is called after having created uninitialized inhabitants, i.e., too late).

And now we can have memory unsafety:

fn main ()
{
    let slice: &mut [u8] = &mut [];
    trust_me_this_cannot_be_false(slice.len() == usize::MAX);
    for i in 0 .. usize::MAX {
        // Since slice.len() == usize::MAX, and i < usize::MAX, bound checking can be skipped
        slice[i] = 0x42; // memory corruption
    }
}

I am not saying that the above program will always corrupt the memory (it could loop indefinitely, abort, or whatever), I am just saying that it would be legal for the compiler to corrupt the memory with it.

All this just because mem::uninitialized was used on a generic type. I wouldn't call this "non-issue UB"

The difference with MaybeUninit, by the way, is that MaybeUninit<Uninhabited> is inhabited.
Only when calling assume_init, after the closure is called, would we have unreachable code.
Which is fine, since a closure forging an element of an uninhabited type cannot possibly return (it must loop {} indefinitely, or die / end the thread of execution (e.g., aborting)).
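As a small illustration of how uninhabited types interact with control flow in safe code (a sketch with made-up names, not taken from array-init): an empty match on an uninhabited value type-checks because the compiler knows that branch can never run, and creating a MaybeUninit<Uninhabited> is fine because it does not claim any value exists.

```rust
use std::mem::MaybeUninit;

enum Uninhabited {}

/// The `Err` arm needs no body: no value of `Uninhabited` can ever exist,
/// so an empty `match` covers all (zero) cases.
fn unwrap_infallible(r: Result<u32, Uninhabited>) -> u32 {
    match r {
        Ok(v) => v,
        Err(never) => match never {},
    }
}

fn main() {
    // Safe: the wrapper is inhabited even though `Uninhabited` is not.
    // Calling `assume_init` on it would be UB, so we never do.
    let _slot: MaybeUninit<Uninhabited> = MaybeUninit::uninit();

    println!("{}", unwrap_infallible(Ok(7)));
}
```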

RustyYato · July 15, 2019, 10:50am

I would like to point out the obvious (I don't mean to offend, it just seemed to get lost in this discussion). Unsafe Rust is not the same as C. So even if some things look similar (uninitialized memory, pointers), that doesn't mean that what applies to C also applies to Rust. Rust makes more guarantees about its types to allow more aggressive optimizations. Since you seem to be new to Rust, you will need to learn some more about Rust to understand the differences from C and how they help. For example, how uninhabited types interact with control flow, as @Yandros pointed out.
