"array lengths can't depend on generic parameters" with const generics... bug or expected behavior?

Posting this here because I honestly can't tell. The following code:

#![feature(const_generics)]

struct StaticVec<T: Copy + Default, const N: usize> {
    data: [T; N],
}

impl<T: Copy + Default, const N: usize> StaticVec<T, {N}> {
    fn new() -> Self {
        Self {
            data: [Default::default(); {N}],
        }
    }
}

fn main() {}

gives:

error: array lengths can't depend on generic parameters
  --> src/main.rs:10:40
   |
10 |             data: [Default::default(); {N}],
   |                                        ^^^

which doesn't make a lot of sense to me, because array lengths clearly can depend on generic parameters as evidenced by the fact the "data" field is able to exist at all.

Is this just a current implementation limitation of const generics? Or is there something else I'm missing here?

Don't use curly braces for arrays

Changing the struct initialization to:

data: [Default::default(); N],

gives the exact same "can't depend on" error, at the same place.
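
For readers on a newer toolchain: the sketch below assumes a stable compiler where const generics no longer need a feature gate (Rust 1.51 or later, as far as I know). Under that assumption, the repeat expression from the original post compiles as written, without the braces. It is only an illustration of the intended syntax, not a statement about the nightly used in this thread.

```rust
// Sketch for stable Rust with const generics (no feature gate assumed).
struct StaticVec<T: Copy + Default, const N: usize> {
    data: [T; N],
}

impl<T: Copy + Default, const N: usize> StaticVec<T, N> {
    fn new() -> Self {
        // The repeat expression copies one default value into all N slots,
        // which is why the `Copy` bound is needed here.
        Self {
            data: [T::default(); N],
        }
    }
}
```

Nothing here needs a non-zero capacity, so `StaticVec::<u8, 0>::new()` is also accepted.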

I don't know if the array syntax works with const-generics yet, but with a little unsafe you can do this which compiles and does exactly what you want.

I found while playing around with it some more that this also seems to work:

impl<T: Copy + Default, const N: usize> StaticVec<T, {N}> {
    fn new() -> Self {
        // The const_assert macro from the static_assertions crate doesn't
        // work in this context yet, but it'd obviously be preferable here.
        assert!(
            N > 0,
            "StaticVec instances must have a length greater than zero."
        );
        unsafe {
            let mut _data: [T; N] = std::mem::zeroed();
            for i in 0..N {
                _data[i] = Default::default();
            }
            Self { data: _data }
        }
    }
}

This is instant UB if T has NonZero* types inside of it. Don't use mem::zeroed or mem::uninitialized; instead, use MaybeUninit::zeroed or MaybeUninit::uninit (these don't suffer from the same problem). Moreover, you are trying to drop invalid data: with String, for example (which is already instant UB from the mem::zeroed), you would be dropping a null pointer when you overwrite it with _data[i] = Default::default().

Before you write any more unsafe code on your own, please read the Rustonomicon.

Also, I'm not sure why you are trying to prevent 0 length. Doesn't seem necessary.


Following up about using unsafe, I suggest reading the first of the Rust koans and meditating on what it teaches.


Well, I'd rather have not used any of the unsafe at all TBH.

I'd still like to just do what was in my original comment (and at this point, I do really think the fact it doesn't work currently is simply a const generics bug, or at least just a syntactic area where they're not "recognized" yet, so to speak.)

Even with the for loop solution, I wouldn't have used the zeroed() if it was possible to tell the compiler that the for-loop is what's intended to initialize the array.

Also, String isn't Copy and wouldn't compile with my struct at all. I get your overall point though.
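
On "telling the compiler that the loop is what initializes the array": newer standard libraries have `std::array::from_fn` (stable since Rust 1.63, if I remember correctly), which builds each element from a closure with no unsafe at all. A minimal sketch, assuming a stable toolchain and dropping the `Copy` bound because this approach does not need it:

```rust
use std::array;

// Same shape as the struct in this thread, minus the `Copy` bound.
struct StaticVec<T, const N: usize> {
    data: [T; N],
}

impl<T: Default, const N: usize> StaticVec<T, N> {
    fn new() -> Self {
        // `from_fn` calls the closure once per index and assembles the array,
        // so no element is ever observed in an uninitialized state.
        Self {
            data: array::from_fn(|_| T::default()),
        }
    }
}
```

Because each element is produced separately, this also works for non-Copy types such as String.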

Right, I forgot about the Copy bound. Then the writes should be fine, but the zeroed() still isn't. By the way, a few weeks ago I made a crate, array-vec, that you can use to initialize an array with completely safe code. You can use it like so:

use array_vec::ArrayVec;
use std::convert::TryInto;

impl<T: Copy + Default, const N: usize> StaticVec<T, {N}> {
    fn new() -> Self {
        // The const_assert macro from the static_assertions crate doesn't
        // work in this context yet, but it'd obviously be preferable here.
        assert!(
            N > 0,
            "StaticVec instances must have a length greater than zero."
        );
        
        let mut array = ArrayVec::<T, {N}>::default();
        
        array.extend(std::iter::repeat(T::default()));
        
        Self {
            data: array.into_array()
        }
    }
}

edit: please use array-vec version 0.1.3 (I made a mistake in 0.1.2 and marked into_array unsafe even though it should be safe)

Thanks, I'll look into it.

In general though, would something like this (which appears to work fine) then be a more "correct" way to do it, in a scenario where you actually had to use unsafe for whatever reason:

impl<T: Copy + Default, const N: usize> StaticVec<T, {N}> {
    fn new() -> Self {
        // The const_assert macro from the static_assertions crate doesn't
        // work in this context yet, but it'd obviously be preferable here.
        assert!(
            N > 0,
            "StaticVec instances must have a capacity greater than zero."
        );
        let mut _data = unsafe { MaybeUninit::assume_init(MaybeUninit::<[T; N]>::uninit()) };
        for i in 0..N {
            _data[i] = Default::default();
        }
        Self {
            data: _data,
            length: 0,
        }
    }
}

That's what was in my playground link before :)

Oh, yeah, it is exactly the same (mostly), haha.

Was there a particular reason you did this part:

data: unsafe { (&data as *const _ as *const [T; N]).read() }

versus just using the variable?

This is identical to calling mem::uninitialized. (i.e. just as bad)

The original playground posted earlier only used that pattern to construct [MaybeUninit<T>; N], not [T; N].
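
To make the distinction concrete, here is a rough sketch of that pattern as I understand it. The function name `default_array` is made up for illustration, and it assumes a stable toolchain with const generics. The only `assume_init` is on an array of `MaybeUninit<T>`, which has no validity requirements of its own; the cast to `[T; N]` happens only after every slot has been written.

```rust
use std::mem::MaybeUninit;

// Hypothetical helper, not taken from the playground link in this thread.
fn default_array<T: Default, const N: usize>() -> [T; N] {
    // Sound: an array of `MaybeUninit<T>` is allowed to be uninitialized.
    let mut data: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };

    for slot in data.iter_mut() {
        // Assigning a `MaybeUninit` never drops the old (uninitialized) contents.
        *slot = MaybeUninit::new(T::default());
    }

    // SAFETY: every element was initialized by the loop above, and
    // `MaybeUninit<T>` has the same layout as `T`. `transmute` rejects
    // arrays whose size depends on a generic, hence the pointer read.
    unsafe { (&data as *const [MaybeUninit<T>; N]).cast::<[T; N]>().read() }
}
```

One caveat with this sketch: if `T::default()` panics partway through, the elements already written are leaked rather than dropped. That is not UB, but it is something a real implementation might want to handle.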

Are you sure? A version of the last new() I posted, modified so that it also only calls default() once seems to generate literally the same assembler code as their original new() from the playground link.

Looking at assembly code tells you nothing about whether something is undefined behavior. As the comp.lang.c FAQ says about UB (which aligns with its meaning in Rust):

Anything at all can happen; the Standard imposes no
requirements. The program may fail to compile, or it
may execute incorrectly (either crashing or silently
generating incorrect results), or it may fortuitously do
exactly what the programmer intended.

This is why const generics give an un-suppressable warning every time you enable them. They're not done yet; they're called unstable features for a reason.


Yes, UB has nothing to do with the generated assembly and everything to do with the possible optimizations allowed. For example if you had,

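use std::mem::MaybeUninit;
use std::num::NonZeroU8;

struct Foo(NonZeroU8);

impl Default for Foo {
    fn default() -> Self {
        Self(NonZeroU8::new(13).unwrap())
    }
}

// UNDEFINED BEHAVIOR
// (assume_init is an unsafe fn, so the call needs an unsafe block)
let x = unsafe { MaybeUninit::assume_init(MaybeUninit::<Foo>::uninit()) };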

This would be UB. Why? Because the compiler is allowed to assume that every bit of an uninitialized Foo is uninitialized, so any bit pattern can show up each time it is read, even different bit patterns across different reads with no writes in between.
This means it may treat Foo as having the bit pattern 0b0000_0000, which is undefined behaviour because NonZeroU8 can never be 0. Even for types where every bit pattern is valid, like u8, reading uninitialized memory is UB (technically the jury is still out on this, but it is better to stay safe and assume that it is UB).

See this blog for more info
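
A small safe sketch of why zero is a forbidden bit pattern for NonZeroU8: the compiler reuses that pattern to represent `None` in `Option<NonZeroU8>`, so the option costs no extra space. The helper names below are made up for illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

// Hypothetical helper: size in bytes of `Option<NonZeroU8>`.
fn option_nonzero_size() -> usize {
    size_of::<Option<NonZeroU8>>()
}

// Hypothetical helper: the safe constructor refuses zero instead of
// producing an invalid value.
fn try_make(raw: u8) -> Option<NonZeroU8> {
    NonZeroU8::new(raw)
}
```

Since the all-zero byte already means `None`, a `NonZeroU8` that actually contains zero would be indistinguishable from it, which is the kind of assumption the optimizer is entitled to rely on.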

Fair enough. I really couldn't tell whether it was supposed to be like that or not, though.

I guess what's confusing me here is, why would I be concerned about uninitialized memory in this case at all, when I have a for loop that explicitly does initialize all of the values in the array, properly, with Default?

Because for the brief moment that you are in the process of initializing the array, LLVM (which Rust uses to generate machine code) could decide that you are invoking UB, and make arbitrary transformations to your code which may lead to very hard to find bugs.

But how exactly is your version doing anything fundamentally different than the last one I posted? It's a static array. The memory is allocated in advance. It's not clear at all to me why

let mut data: [MaybeUninit<T>; N] = unsafe {
    MaybeUninit::uninit().assume_init()
};

would or should be interpreted any differently than

MaybeUninit::assume_init(MaybeUninit::<[T; N]>::uninit())

in both cases the compiler is told to "assume the array is initialized", which it is, immediately afterwards.