Possible to merge static arrays at compile time?

I'm trying to build up a flat static array from smaller arrays. The goal is to have something like

pub struct Store<const LEN: usize> {
    names: [&'static str; LEN],
}

fn main() {
    let mut name_list = Store::new(["a", "b"]);
    name_list.names(["c", "d", "e"]);

    // The below should be in the binary, everything else compiled out
    // names_list = Store { names: ["a", "b", "c", "d", "e"] };

    // ...
}

I first turned to const fn. In numerous attempts, the goal was something like:

impl<const LEN: usize> Store<LEN> {
    pub const fn names<const ADDITIONAL_LEN: usize>(
        mut self,
        more_names: Store<LEN>,
    ) -> Store<LEN + ADDITIONAL_LEN> {
        // let mut i = 0;
        // let mut names = [""; LEN + ADDITIONAL_LEN];
        // while i < LEN {
        //   names[i] =
        //     i += 1;
        // }

        Store {
            names: [self.names, more_names.names].concat(),
        }
    }
}

It doesn't compile, but it's more concise to explain than all my other attempts.

I found some other ideas from reddit, and read through Generating static arrays during compile time in Rust - DEV Community 👩‍💻👨‍💻 for guidance too.

I also spent several days learning macro_rules! and procedural macros until it clicked that you can only alter syntax, not values. And, I'm trying to avoid macros in the public API:

make_store!(Store::new(["a", "b"]).names(["c", "d", "e"]));

Is this possible?

I want it to be so bad I considered contributing to the language somehow, but I wouldn't know where to begin, and I don't wanna damage the language..

I don't know if you can get the addition to work on stable Rust, but this compiles:

pub struct Store<const LEN: usize> {
    names: [&'static str; LEN],
}


impl<const LEN: usize> Store<LEN> {
    pub const fn names<const ADD: usize, const SUM: usize>(
        &self,
        more_names: Store<ADD>,
    ) -> Store<SUM> {
        let mut result = Store {
            names: [""; SUM]
        };
        
        let mut i = 0;
        while i < LEN {
            result.names[i] = self.names[i];
            i += 1;
        }
        
        while i < SUM {
            result.names[i] = more_names.names[i - LEN];
            i += 1;
        }
        
        result
    }
}
1 Like

Thanks Alice, that does compile! The issue I'm noticing is that ADD and SUM are the same length, which for the example is 3. So, I end up with Store {names: ["a", "b", "c"]}, "d" and "e" are left out.

I also had to change it a bit to get it to work like the interface in the example without errors or having to specify the generics.

so for

let mut name_list = Store::new(["a", "b"]);
name_list.names(["c", "d", "e"]);

I had to alter it as follows, but I don't understand where the SUM is getting calculated, so I don't know why it's cutting the remaining strings out.

impl<const LEN: usize> Store<LEN> {
    // Alice's `names()` became `push()`
    // This was necessary to allow taking in an array parameter
    // without having to specify the length
    pub fn names<const ADD: usize>(&self, names: [&'static str; ADD]) -> Store<ADD> {
        // temp Store to satisfy `push()` method's signature
        let new = Store { names };

        self.push(new)
    }
    pub fn push<const ADD: usize, const SUM: usize>(&self, more_names: Store<ADD>) -> Store<SUM> {
        let mut result = Store { names: [""; SUM] };

        let mut i = 0;
        while i < LEN {
            result.names[i] = self.names[i];
            i += 1;
        }

        while i < SUM {
            result.names[i] = more_names.names[i - LEN];
            i += 1;
        }

        result
    }
}

@slanden in your last snippet you are not computing the sum of the arrays lengths.

The ideal solution

In the future / in nightly Rust we could write:

#![feature(generic_const_exprs)] // <- Allows computing `LEFT + RIGHT`

use lib::Store;
fn main ()
{
    let name_list = Store::new(["a", "b"]).names(["c", "d", "e"]);
    dbg!(name_list);
    // Can be embedded within the binary!
    static NAME_LIST: Store<5> = Store::new(["a", "b"]).names(["c", "d", "e"]);
    dbg!(&NAME_LIST);
}

mod lib {
    #[derive(Debug)]
    pub
    struct Store<const LEN: usize> {
        names: [&'static str; LEN],
    }
    
    impl<const LEN: usize> Store<LEN> {
        pub
        const
        fn new (names: [&'static str; LEN])
          -> Self
        {
            Self { names }
        }
    }
    
    impl<const LEFT: usize> Store<LEFT> {
        pub
        const
        fn names<const RIGHT: usize> (
            self: Store<LEFT>,
            right_names: [&'static str; RIGHT],
        ) -> Store<{ LEFT + RIGHT }>
        {
            Store {
                names: concat_strs_arrays(self.names, right_names),
            }
        }
    }

    const
    fn concat_strs_arrays<const LEFT: usize, const RIGHT: usize> (
        left: [&'static str; LEFT],
        right: [&'static str; RIGHT],
    ) -> [&'static str; LEFT + RIGHT]
    {
        let mut ret = [""; LEFT + RIGHT];
        const_for!(i in 0 .. LEFT {
            ret[i] = left[i];
        });
        const_for!(i in 0 .. RIGHT {
            ret[LEFT + i] = right[i];
        });
        ret
    }

    /* `const_for!` definition */
}

Now, the issue is that in current stable Rust (1.54.0 as of this writing), we cannot write a function signature with <const LEFT: usize, const RIGHT: usize>… -> …{ LEFT + RIGHT }.

Indeed, in the pathological case where LEFT and RIGHT would be huuge, the LEFT + RIGHT sum could overflow the backing usize storage, which is an invalid operation in const contexts.

This means that a function generic over those two LEFT and RIGHT const usize parameters can technically not handle the whole range of pairs of usize values, which means the function would have to be constrained. And it is not yet totally clear how to express this, API-wise, in a way that isn't error-prone, semver-wise. There may also be other subtleties or technical considerations involved, all of which make these sums be gated under an unstable nightly feature for the moment: generic_const_exprs (a few months ago this was const_evaluatable_checked and required extra where clauses; proof that the whole thing is unstable / in the works).


But what about stable Rust?

A stable workaround, aimed at using statics

  • (This is what @alice's solution was about)

The trick / idea is that rather than computing a sum of two const generic usizes, which is not allowed on stable Rust, we will have the function take yet another const generic usize parameter, SUM, which thus needs to be provided by the caller, and which will be expected / trusted by the callee to be equal to the sum of the first two parameters.

const
fn concat_strs_arrays<
    const LEFT: usize,
    const RIGHT: usize,
    const SUM: usize, // expected to be `LEFT + RIGHT`
>
(
    left: [&'static str; LEFT],
    right: [&'static str; RIGHT],
) -> [&'static str; SUM]
{
    // `const`-friendly assertion of equality between the two
    // (nicer error messages than out-of-bounds indexing or div by zero)
    let () = [(); ::core::usize::MAX][(LEFT + RIGHT) - SUM]; // errors if `LEFT + RIGHT < SUM`
    let () = [(); ::core::usize::MAX][SUM - (LEFT + RIGHT)]; // errors if `LEFT + RIGHT > SUM`

    let mut ret = [""; SUM];
    const_for!(i in 0 .. LEFT {
        ret[i] = left[i];
    });
    const_for!(i in 0 .. RIGHT {
        ret[LEFT + i] = right[i]; // errors if `SUM < LEFT + RIGHT`
    });
    ret
}

And you then add a SUM generic const parameter to .names() that gets propagated to this internal concat_strs_array call.

As you can see in that Playground, there is a small issue with this approach, as showcased in the body of the main() function:

use lib::Store;
fn main ()
{
    // "Type inference" error: missing sum!
    // let name_list = Store::new(["a", "b"]).names(["c", "d", "e"]);
    // dbg!(name_list);
    
    // Can be embedded within the binary!
    static NAME_LIST: Store<5> = Store::new(["a", "b"]).names(["c", "d", "e"]);
    dbg!(&NAME_LIST);
}

Indeed, now that we are taking an extra SUM parameter, we (the callers) are (expected) to provide the sum of the lengths!

This thus causes an error for the basic let name_list = Store::names(…) call, since in that case we are not providing it and Rust can't guess it for us.

But it doesn't cause a problem when defining a static (or a const, for that matter), since for those cases Rust requires that the type of the const/static be explicitly spelled out anyways.

Hence why this solution works quite well for your use case:

  • Indeed, you explicitly mention wanting that the resulting Store

    And the only way to guarantee that property is by putting the value in a static! (and if the value is not made of all zero bytes, otherwise a basic compression can happen by using the .bss section for ELF binaries, for instance).


Macro ergonomics ftw when just requiring compile-time computation (≠ static)

Imho, your requirement of having the Store be embedded in the binary is too restrictive. Having a static, that is, an explicit global value is only strictly needed when the static is (interiorly) mutable, or when doing advanced FFI / linking shenanigans, which are very niche.

I'll, thus, XY-guess your requirement: for the sake of performance, you want the concatenation to happen / to be pre-computed at compile-time, rather than runtime. And while a static is indeed a way to guarantee this, it's actually a bit more strict than the actual construct which expresses the "computed at compile-time" property: a const.

Indeed, by using a const, you are freeing Rust from the shackles of using statics, and thus, from the shackles of depending on linker stuff, which complicates things a bit (that is, there are different stages of "compile-time", and the earlier the stage, the more flexible we are, and a const is resolved at an earlier stage than a static. This is why consts can refer to other consts, but neither const nor statics can refer to other statics). Moreover, by using a const, you are allowing Rust to inline "copies of a theoretical global", which can thus sometimes yield better performance than using a static to begin with!

Relatedly, by using a const, you can pass the element by value even when it's not Copy, which thus allows you, through a macro, for instance, to bind the constant's value to a let binding, which thus allows you to skip that type annotation requirement.

And, related to using a macro, we are suddenly dealing with duck-typed / hacked "generics" (à la C++ templates), which thus removes that whole layer of theoretical concerns about generically constrained APIs such as the ones relating to generic_const_exprs.

That is, while <const X: usize, const Y: usize> … -> … { X + Y } cannot be computed in stable Rust, if you have const X: usize = 42;, const Y: usize = ["a", "b"].len();, for instance, that is, concrete constants rather than generic ones, then you can perfectly compute X + Y in a const context.

Which means that a macro can take two array expressions, for instance, and use $left.len() + $right.len() in a const context without problems.

Which thus yields the following API (look at how it has no type / length annotations):

fn main ()
{
    let name_list = Store![
        ["a", "b"],
        ["c", "d", "e"],
    ];
    assert_eq!(name_list, Store::new([
        "a", "b", "c", "d", "e",
    ]));
}

And it is indeed computed at compile-time, since the macro funnels the expression through an "anonymous" const:

pub fn _look_at_the_assembly () {
    let name_list = Store![
        ["a", "b"],
        ["c", "d", "e"],
    ];
    unsafe { ::std::ptr::read_volatile(name_list.names.as_ptr()); }

    // It is `const`-compatible!
    const _CONST_CONTEXT: () = {
        let _ = Store![
            ["h", "i"],
            ["h", "e", "l", "l", "o"],
        ];
    };
}
  • which yields:

    playground::_look_at_the_assembly:
    	subq	$24, %rsp
    	movups	.L__unnamed_1(%rip), %xmm0
    	movaps	%xmm0, (%rsp)
    	movq	(%rsp), %rax
    	movq	8(%rsp), %rax
    	addq	$24, %rsp
    	retq
    
    .L__unnamed_2:
    	.byte	97 // b'a'
    
    .L__unnamed_3:
    	.byte	98 // b'b'
    
    .L__unnamed_4:
    	.byte	99 // b'c'
    
    .L__unnamed_5:
    	.byte	100 // b'd'
    
    .L__unnamed_6:
    	.byte	101 // b'e'
    
    .L__unnamed_1:
        // &str
    	.quad	.L__unnamed_2 // .ptr
    	.asciz	"\001\000\000\000\000\000\000" // .len: 1_usize
        // &str, etc.
    	.quad	.L__unnamed_3
    	.asciz	"\001\000\000\000\000\000\000"
    	.quad	.L__unnamed_4
    	.asciz	"\001\000\000\000\000\000\000"
    	.quad	.L__unnamed_5
    	.asciz	"\001\000\000\000\000\000\000"
    	.quad	.L__unnamed_6
    	.asciz	"\001\000\000\000\000\000\000"
    

Playground (and implementation of the macro)

4 Likes

@dtolnay has written some experimental crates that do something similar to this, albeit using completely different ways to accomplish it. For those, look at linkme and inventory.

1 Like

Yeah along the lines of linkme is definitely what I would pursue here, instead of const.

3 Likes

Indeed; based on link sections / linker shenanigans. This has the advantage of featuring a maximally versatile call site; mainly, that of being able to provide the different array elements scattered across the code base (although the order of the elements is thus obviously not guaranteed). This can be incredibly powerful for, for instance, featuring a function annotation / (proc-macro) attribute that will yield the list of the so-annotated function names bundled within some static array . Very useful for self-description / reflection.

It does have the disadvantage of involving static rather than consts, with the caveats I've mentioned above (so, on this regard, it does offer a less flexible output), and, more importantly, the linker shenanigans involved are not 100% portable: while they work, for instance, on the three main platforms, for people targeting Wasm or more exotic platforms this is not an option either.

  • (and in the case of inventory, it also has the drawback of involving a tiny bit of life-before-main runtime cost, although it ought to be negligible compared to the cost of spawning the very process)

That's why, in order to be minimally restrictive, unless there was a need to provide the elements of some single array scattered across the code base, I recommend that the const-based approach, be it through manually written sums, or through a macro that computes those for us, be the one taken.

1 Like

Thank you, @Yandros, for that thorough walk-through. Describing that difference with static and const was enlightening, and you're right; What I actually care about is that the given API, and its structs and functions, will be thrown out during compilation after being transformed into another form.

The example I've given was the simplest I could think of that would still require what I need to do. A tad bit more realistically (trying to avoid complicating the example), it might look like

let store1 = Store::new(["a", "b"]).names([
    "c",
    Store::new(["d", "e"]), // .names(...)...
    "f"
]);

I'm trying to build up the array, but the nesting is meaningful. I think the macro could still work, but if there are macros, I'd strongly prefer to hide them as an implementation detail. And that's where I was running into the issue that you can't work on values with macros.

Trying to hide the macro (using your code as example) like

impl<const LEFT: usize> Store<LEFT> {
    pub const fn names<const RIGHT: usize, const SUM: usize> (
            self: Store<LEFT>,
            right_names: [&'static str; RIGHT],
    ) -> Store<SUM> {
        Store {
            // `build_arr!` couldn't do anything useful because all the
            // macro had was the literal string value "self.names", so I
            // couldn't get the data, and I couldn't build out the array
            // as a string to then parse, e.g., `format!("{:?}", $names)`
            names: build_arr!(self.names, right_names)
        }
    }
}

fn main() {
    let store = Store::new(["a", "b"]);
    store.names(["c", "d", "e"]);
}

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.