Help in undestanding how a const generic function outputting an array works

Hello everyone !

After a few attempts, I finally managed to create a const generic function that the Rust compiler was okay with:

fn array_create<F, T, const N: usize>(a0: &T, f: F) -> [T; N]
where
    T: Copy,
    F: Fn(T) -> T {
    let mut a: [T; N] = [*a0; N];
    for i in 1..N {
        a[i] = f(a[i-1]);
    }
    a
}

fn main() {
    const N: usize = 10;
    let a0: f64 = 0.5;
    let a: [f64; N] = array_create(&a0, |x| x * (1.0 - x));
}

What I don't understand about this code is how is the array_create function able to infer the size of the array it must output when the size is not passed as an argument to the function ? What kind of desugaring is happening in the background ?

Thank you in advance !

A separate function is created for each unique N that is used in your code.

In your example, you assign the result of the function to an array of size 10, so a function with N=10 is created.

see monomorphization

When you call array_create(&a0, |x| x * (1.0 - x)), the compiler knows that a0 is a f64 and the type of the closure, therefore it is able to deduce that you are calling the function array_create::<{closure type}, f64, _>. At this point, it doesn't yet know what the value of the const parameter is so the return value can be thought of as something like [f64; _] where _ means it's currently unknown.

However, you then assign the result to a which has the type [f64; 10] (because [f64; N] is [f64; 10]). At this point, the compiler tries to unify the types [f64; 10] and [f64; _] and learns that the unknown const parameter's value is 10.

This is essentially the same process that allows this code to work ("bidirectional type inference"):

let mut v = Vec::new(); // at this point, the compiler knows v has type `Vec<_>`

v.push(1u32); // because the argument to push is the same type as the type parameter
              // to `Vec`, the compiler is able to infer above that `v` is 
              // of type `Vec<u32>`
4 Likes

That wasn't what I was asking about. I do understand how generics work in Rust. Guess my question wasn't clear enough.

The thing I don't understand about the behavior of the example I shared is: How is it possible for the array_create function to know the value of N when N is declared outside of its scope and its value never passed to it ? (N is not an explicit argument of the function.)

Closures, for example, are explicitly documented as being able to "capture their environment", which is - as far as I know - not a feature of regular functions, hence my confusion.

Does it mean that array_create::<{closure type}, f64, _> sort of works like an iterator on which the compiler calls .take(N) once it has been able to infer the value of N ?

I'm not sure I understand your confusion. Nothing from the calling environment is being captured like in a closure. The return type of the function is being determined by the type of the variable a in the same way any generic type parameter might be and the compiler reasons that N is 10 from that.

Edit: It may be helpful to rename N in main to something else to see that it isn't being captured:

If you're asking, how can the function know the value of N as a number as opposed to a parameter for [T; N], then it's important to note that the N-as-const-parameter in the function and N-as-specific-number in main are essentially unrelated entities and you're just confusing yourself by giving them the same name, as m51 pointed out.

But also, when you say "the value is never passed", that's not strictly true - although N is not a (runtime) parameter of the function, it is still part of the environment that the (monomorphized) function is compiled in, like all const items are. The only difference is that it is in "a deeper scope", specific to each monomorphization, which means different instances of the function can "see" it with different values. However, it is still part of the compilation environment rather than a runtime parameter, and there will be no storage/lifetime/etc associated with N in the generated code. It will be substituted at the use-site and then constant-folded in exactly the same way as constants you'd declare with specific values.

2 Likes

You misunderstand me. I never said that N was being captured. I was just making an analogy with closures based on the fact that a variable can be declared outside of a closure yet be used inside that closure without being explicitly passed to the closure.

I was basing my analogy on the fact that N is declared outside the function's scope yet the function still knows that N=10 even though we have not explicitly passed this value to the function. Unless I'm mistaken, Rust functions should not be able to have access to information outside of their scope.

My question can be rephrased as: how come array_create knows that N=10 ? That seems to contradict how functions normally work. Renaming N to Q just changes the question to: then how come array_create knows the value of Q ?

Same as my answer to @m51, renaming N to Q does not answer the question: at what point is the value of N or Q communicated to array_create ?

So what you are saying is that the Rust compiler uses const generics as a sort of macro system, using the size of the declared array to compile a version of the function specifically for it. Reminds me a little of C preprocessor macros.

Do I get it, or am I completely wrong again ? :sweat_smile:

You basically got it, but you're really seeing a distinction where there isn't one - all generic parameters work by creating separate monomorphizations of the item (type, function, or anything else) with specific instances of the parameters substituted in, whether those parameters are typenames or values. It is conceptually similar to C macros, but unlike C, it's all type safe and integrated into the language proper rather than doing raw text substitution on top of it. (So no issues with having to surround your macro output with brackets for precedence reasons)

Before generic consts were a thing, where the only non-lifetime you could use as a parameter was another type name, we produced the same effect it by hacking it with associated constants. You could write this:

trait Storage {
    const CAPACITY: usize;
}

struct Small;
impl Storage for Small {
    const CAPACITY: usize = 10;
}

struct Big;
impl Storage for Big {
    const CAPACITY: usize = 10000;
}

fn do_something<T: Storage>() {
    for _ in 0..(T::CAPACITY) {}
}

Which meant if you then called do_something<Small>, the compiler would see that and resolve it into something equivalent to:

fn do_something_small() {
    for _ in 0..Small::CAPACITY {}
}

And Small::CAPACITY, individually, is a known constant that can be subtituted at the point of use:

fn do_something_small() {
    for _ in 0..10 {}
}

And from there, it's no different than if you'd hard-coded the value in the original source.

2 Likes

For an example of this inference in action with a generic type parameter that is not passed instead of a const parameter, consider

fn empty_array_create<T>() -> [T; 0] {
    []
}

fn main() {
    // works
    let _arr: [Vec<String>; 0] = empty_array_create();
    // fails
    // let _arr = empty_array_create();
}

You can think about them as desugaring to

empty_array_create::<Vec<String>>();
array_create::<_, f64, N>(&a0, |x| x * (1.0 - x));

However, there is no actual desugaring -- there are situations where you can't use turbofish, but inference will still work.

1 Like

This is a very interesting example, but what would be the proper syntax for calling them in main ? Is it:

fn main() {
    do_something<Big>();
}

or is it a usecase for the turbofish operator:

fn main() {
    do_something::<Big>();
}

?

It's the latter (the compiler will catch if you type the former and tell you to use the latter).

1 Like

Thank you everyone !

Things are a lot more clear like this.