Does a type like `[[i32]]` need to be forbidden?

Is it really the best way of thinking about it? It's very easy to imagine Swift-style “dynamic Rust” where [[i32]] is easy, simple, and entirely mundane.

Real Rust have picked a different style compilation way which made [[i32]] hard to implement in it and thus made [[i32]] illegal, but I still prefer to think about things like that as about “something that is excluded from the compiler to not repeat ALGOL68 mistake of producing no implementations and no users”.

I know that for me that explanation is much better than simple assertion that “void is not a type” and “you can't have variable of type void” that my teachers gave me so many years ago and which always sounded hollow to me (and which Rust brilliantly disproves because in Rust () is a perfectly valid type and you can create a variable of type ()).

Because “this would just have made compiler too complicated” is much easier to swallow than “this is not a type, but we wouldn't tell you why”. Again, at least for me.

[[i32]] is definitely an impossible type no matter what ABI you invent. One of the main reasons the slice type exists is because it is always indexable with pointer math. You can't do that if the elements are different sizes, hence the Sized requirement.

Rust could make [i32] ownable by changing the ABI, but they wouldn't ever make [[i32]] possible.

3 Likes

Oh well… :sweat_smile:

The possibility that one might take a remark such as

and follow up with the question/discussion of “what could be the meaning of [[i32]]”, even though that’s entirely besides the point of the technical remark @CAD97 was trying to make[1], was exactly the motivation for my clarifying message that


Edit: This seems sufficiently off-topic from the original question of whether a generic “T” includes “[T]” that I’ve split the topic.


  1. a remark which exclusively addressed niche cases where the compiler would currently kind-of accept “[[i32]]” as a type but did never imply there could ever be any value of type [[i32]] ↩︎

So what? Add some table to [[i32]] and use another level of indirection.

It maybe a good enough reason not to implement it in the compiler, it's not good enough reason to declare that it's not a type. void is “not a type” in C for similar reason, but later Rust did a different choice.

Why not? Array of pointers to slices which live after these pointers would work perfectly fine for it.

That’s [&[i32]]. If [[i32]] meant the former, it would break the compositionality of the type algebra. [T] is an array without indirection, so [[T]] is an array of arrays without indirection. But if your arrays have indirection intrinsically, then it’s of course okay. int[][] is a perfectly valid type in Java.

5 Likes

I don't think it's much different from a type like std::iter::Flatten<String>. The type has a where bound requiring that the generic parameter implements Iterator, but String doesn't.

Similarly, [T] has the where bound T: Sized.

4 Likes

One of the things that I like about Rust is that there aren't invisible layers of indirection.

If you want [&[T]], Rust has that.
If you want [Vec<T>], Rust has that.
If you want [impl AsRef<[T]>], Rust has that.
If you want [&dyn AsRef<[T]>], Rust has that.

"No, that's not a legal type, so say what you want here" is a great answer.

15 Likes

This implies some other place where & points to. I'm talking about self-referential type.

Yes, Rust doesn't like them, but again, that's something of an arbitrary decision, C++ support them.

That's good answer, but this doesn't explain “why” it's not a legal type.

“We don't want to add it to the language, because it would make implementation of the language too complex for no good reason” is a valid answer, “no, that's not a type because I said so” is not.

I don’t believe [[i32]] support is really achievable, but it could be interesting to discuss some prior steps.

For example, ([i32], [i32]) - why doesn’t that work currently? I’d say two reasons. First, unsized fields currently must be the last field in a struct. Arguably, that rule could be changed though. (I’m not even 100% sure why it technically exists.) But then, second:multiple unsized fields would still be a significant problem, since you’d need to handle the metadata for two types.

That is, assuming that ([i32], [i32]) or struct Foo<S: ?Sized, T: ?Sized>(S, T) does mean that the two slices have different lengths. (Would that also apply to struct Foo<T: ?Sized>(T, T)? That seems… well… complicated, too.) The main issue with that is that pointer metadata currently all should be one usize-word large. Though that needn’t be a law of nature either.

So if we had all of (), ([i32],), ([i32], [i32],), ([i32], [i32], [i32],), ([i32], [i32], [i32], [i32],), … then [[T]; N] isn’t far off:

If we allow arbitrarily large metadata, and a one-metadata-per-value approach, then something like [[T]; N] might be possible, where &[[T]; N] would essentially have a metadata of type [usize; N] encoding the lenghts. It’s debatable whether that should necessarily be the true array type [_; N], or better its own thing, so arrays can always keep the assumption that “offset-calculation is a simple ‘base + size * index’ calculation”. (This also, once again, runs into the design question of whether perhaps we want a [[T]; N]-style type to actually mean that all contained slices can have different lengths, or if they should all be the same. Maybe we’d want to allow both possibilities – well yay, that’s two new ‘array’-like types then!)

[[i32]] is still problematic then, though. (Except in the “this means all contained slices have the same size” interpretation.) If – due to metadata – the pointer types &[[i32]; 0], &[[i32]; 1], &[[i32]; 2], &[[i32]; 3], … all have different sizes, you cannot create values of a single type “&[[i32]]” that unifies them all. Interestingly enough, if you consider something like ThinBox, I’d say that - arguably - since in this world ThinBox<[[i32]; 0]>, ThinBox<[[i32]; 1]>, ThinBox<[[i32]; 2]>, ThinBox<[[i32]; 3]>, etc should all still have the same size, it’s not entirely unreasonable that perhaps ThinBox<[[i32]]> could be made to work, somehow. That’s a lot of ifs… but maybe, maybe, this way [[i32]] could have a (very limited) meaning after all? I don’t believe that this is something that we would ever want to pull off in Rust proper (and even if, I also think it should be its own kind of type, not literally using the existing “slice” type[1]). Too complicated for too little benefit. Here’s my thoughs anyways:

It’d still require some additional language mechanisms. ThinBox<Ty>’s API works in practice by creating &Ty or &mut Ty references for you (by loading the metadata from memory), but &[[i32]] didn’t quite work, as discussed above. Well more technically, converting &[[i32]; N] to some &[[i32]] types doesn’t work. Maybe &[[i32]] has a representation with metadata-behind-a-reference, i.e. somewhat of a &[usize] metadata. This would preclude &'a [[i32]; N]to &'a [[i32]] conversions entirely; however it would still allow &'b &'a [[i32]; N] to &'b [[i32]] re-borrows, and also &'b ThinBox<[[i32]]> to &'b [[i32]] refererencing, I suppose.

There’s likely endless alternative language designs in this kind of direction of “assign some meaning to [[i32] (or [[i32]; N] or ([i32], [i32]))”, and in any case, I’d say even support for something like ([i32], [i32]) is sufficiently nontrivial that we would likely want to focus entirely on that alone before generalizing any further.


Notably, ([i32], [i32]) seems somewhat desirable to me on its own, because with unsized-type function arguments (which already have unstable language support), a fn foo(x: [i32], y: [i32]) becomes possible but fn foo(x_and_y: ([i32], [i32])) doesn’t, which is somewhat at odds with how we’d like to represent function argument lists as tuples in the Fn traits.


  1. for the same reasons already mentioned before, that index calculations should not implicitly become linear-time by doing anything more complicated than a “base + size * index”-style calculation at the core of it ↩︎

3 Likes

I'm having trouble following your complaint here.

Would you be happy with "instantiations that don't pass their where bounds checks aren't real types"? Because slices aren't really any different here from struct Foo<T>(T); not allowing Foo<[i32]>.

I do agree with this part of the post that's not in this new thread's OP

It's not "never talk about this" or "nobody should be told why". It's "for right here, it's not helpful".

2 Likes

Which is precisely my objection. When someone told me something like this in school that I would follow them for days with demands to explain me “why”.

If it's “this is arbitrary decision that we made” then it's one thing (you don't explain arbitrary coin toss, do you?), if it's “this is something that fundamentally couldn't exist” then it's different kettle of fish.

Yes. Saying [[i32]] doesn't exist because developers of Rust made arbitrary decision to not support that type is exactly the same as saying “instantiations that don't pass their where bounds checks aren't real types”. These types would have existed if someone decided to allow them. But someone tossed the coin and did different choice. Fine.

But saying that it doesn't exist and implying that there are fundamental reason for it… that's different.

Note that such language actually exists and you even know it. It's called, surprisingly enough, C. Specifically C99.

E.g. you may write something like this:

int foo(int x, int y, int z) {
    struct Foo {
        int bar[x];
        int baz[y];
    };
    struct Foo arr[z];
    …
}

It was added in C99 and I assume there were plans to go further. But compiler developers revolved: many compilers don't properly support such types and C11 made them optional. Which is why MSVC supports C11 and even C17 but there are no plans to add C99 support to it.

2 Likes

A type [T] means (unless changed by compiler optimization, but optimization generally respects the semantics of the "intended" memory layout) "some number of T layed out back-to-back, without any additional metadata as part of the [T] itself."

An example `[T]`:          ... 1  5  9  2  6 ... (other RAM before and after)
                               ^
                               |
A `&[T]` that references it:   \-- pointer
                                   length = 5

     (The metadata storing its length it stored in the `&[T]`,
      not in the `[T]` itself.)

The metadata necessary to make a [T] useable is its length. When you have a &[T] (generally, whenever you have a pointer to an unsized type), it works by being a "fat pointer": wherein it automatically becomes two 64-bit values (assuming a 64-bit system) rather than just one: the first is the actual pointer, and the other is another 64-bits of metadata (in the case of a &[T], the metadata is the length).

So a [[i32]] would mean, some number of groups layed out back-to-back of some number of i32 layed out back-to-back without any metadata as part of the [[i32]] itself.

Hypothetical `[[T]]`:      ... 9  3  7  2  5  8  4 ...  (other RAM before
                               ^                         and after)
                               |
A hypothetical `&[[T]]`        \-- pointer
    that references it:            But then how do we store the metadata?

In that example, that sequence of i32 in RAM could be bracketed into many (infinitely so) numbers of double-nested arrays, such as:

  • [[9, 3, 7, 2, 5, 8, 4]]
  • [[9, 3, 7], [2, 5, 8, 4]]
  • [[9], [3], [7], [2], [5], [8], [4]]
  • [[9, 3], [], [], [], [7, 2], [], [], [], [], [], [], [2, 5, 8, 4], []]
  • Etcetera...

But the information about how a hypothetical [[T]] would be bracketed into sub-arrays as such could not necessarily be represented in 64-bits of information (or in any finite number of bits of information, since empty arrays are a thing), so there wouldn't be a reasonable way to represent a &[[T]], which makes [[T]] a concept that fundamentally just doesn't work with the concept of what unsized types in Rust are in a useful way.

And yes, it's true that the compiler can optimize memory layout and change it to "whatever" it wants, but generally it's still the case that language constructs are created with more or less a "default" way of layout out memory implicitly in mind and then let the compiler transform it into alternative layouts that are equivalent in high-level properties but faster. Having the compiler automatically insert some sort of dynamically-sized heap-allocated structure to make &[[T]] work would be far out of the scope of what the compiler is expected to do (and for good reason).

6 Likes

If that’s your interpretation of assigning meaning to “[[i32]]” note that that interpretation would be one where the [i32] elements of the [[i32]] value must all have the same length.

I’d say, answering the question of whether or not that’s the best way to do it is a highly nontrivial design question.

1 Like

This compiles in Rust:

fn foo<const X: usize, const Y: usize, const Z: usize>() {
    struct Foo<const X: usize, const Y: usize> {
        bar: [i32; X],
        baz: [i32; Y],
    }
    let arr: [Foo<X, Y>; Z] = todo!();
}

I think it could mean a list of fat pointers. Then you have a fat pointer that describes the whole thing. Or perhaps I am thinking about this wrong.

( Ok, I am thinking about it wrong, that would be a datatype with references in it, that is [&[i32]] )

1 Like

Yes, but C99 you may call function that I wrote before like this:

    int x, y, z;
    while (scanf("%d%d%d", &x, &y, &z) == 3) {
        foo(x,y,z);
    }

That's produces data structures with arrays of variable sizes which are defined at runtime and is entirely different kettle of fish from what you have in your generic.

You are right. C99 haven't quite achieved [[i32]]. They have come close but haven't actually made it.

I’d say, answering the question of whether or not that’s the best way to do it is a highly nontrivial design question.

Sure, but first we need to decide whether [i32], itself, is even type that exists. You may not create a variable of that type, you couldn't pass it around, etc.

Similarly to how void acts in C. And yet it Rust () is just a regular type, no special treatment.

You may decide that [i32] includes it's metadata before actual content (where else may you put it if variable of type [i32] is made as local variable in function?) but then &[i32] is still a fat pointer. Then layout of [[i32]] would immediately become obvious and the only question to decide would be to see if &[[i32]] is fat pointer or very fat pointer (and thus DST, itself).

I don't think adding [[i32]] to the language is too hard, it's mostly of “what do you plan to do with it”. What kind of task do you plan to use it for?

If [[i32]] were to become a legal type, I would personally expect it to be [[i32; N]; M] where N and M are dynamically stored in the wide pointer.

This construction comes by modelling slice references as hoisting const generics to runtime dynamic values. [T; N] is essentially syntax sugar for some struct slice<T: Sized, const N: usize>. If a const parameter is only used in limited ways (i.e. for runtime computation and not feeding into other const computation), it's possible to imagine instantiating the type as slice<T, dyn usize>, with the const value is instead deferred to be dynamic at runtime (cf. dyn Trait) and passed as an implicit extra parameter to methods (i.e. as fat pointer metadata).

I like this model because it unlocks a reasonably straightforward way to a useful subset of custom DSTs. However, a full "matrix slice" also wants to have a dynamic stride, such that it's possible to e.g. slice to a 2×2 segment of an N×N matrix. For this reason among others (e.g. expecting jagged instead of square), while I think [[i32; dyn usize]; dyn usize] could be a somewhat plausible type, [[i32]] seems much more unlikely. (It also avoids potential issues around unifying symbols between [T; N] and [T] if [T] were to be an alias for [T; dyn usize].)

New nit: perhaps interestingly, unsized function parameters (unstable, but IIUC is the magic behind Box<dyn FnOnce>: FnOnce which is stable) result in "invisible" indirection. However, since this indirection is generally part of the ABI[1], it's arguably not "extra" indirection.


  1. There's three ways arguments can be passed at the ABI level. They can be passed in registers (used for primitive scalar types), at a known stack offset (used when the argument passing registers are exhausted), or by passing a pointer to the actual data with one of the prior methods (used for user defined aggregate types). The unstable Rust ABI can play some tricks (e.g. by decomposing simple user types and passing them as a pair of scalars instead) but this is the underlying foundation. Using by-reference passing for nontrivial types instead of fixed-offset by-value passing is generally preferred, despite introducing dynamic indirection, since it reduces the stack shuffling that would be necessary to get data at the correct fixed offsets.

    Generally, this is entirely transparent to the user. Unfortunately, address disjointedness guarantees can complicate this. My "favorite" example is that since Copy values are copied and this is a by-ref usage of the source place, rustc never "moves ownership" of large Copy values into a called function, even if it's the last usage of the place, and always makes a defensive copy instead, as references to the place have not been invalidated yet, and if no copy is done, a pointer that escaped analysis could validly inspect the place even as the called function thinks it uniquely owns the place.

    This (and other related missed optimizations) is the reason you'll sometimes see large data/buffer types that aren't Copy despite there being no semantic issue with copying, as well as sometimes seeing such types passed around behind explicit &mut despite semantically transferring ownership of the data. ↩︎

7 Likes

I think whether something is "arbitrary" lies in the eyes of the beholder. Was it arbitrary when Rust adopted the borrow checker over GC/RC? How about when we chose stackless coroutines over green threads? How about our decision to use <> to delimit generics instead of []?

Each of those decisions was deliberate, and the same is true for the [[T]] situation.

We decided that [U] is generally unsized regardless of U. In our case, U=[T]. We have also decided that the metadata for unsized types must itself be sized. So for something like &[[T]], where would you store the metadata for the inner slices?

3 Likes

I don't think that decision was arbitrary: for &mut arr[x..y] to return reference to slice of array said slice have to be unsized and layout of &mut [x..y] have to have metadata in the reference itself. And the ability to take slice from an array is pretty fundamental to Rust.

But that decision is not limited by anything else in the language design. And can be lifted at any time if there would be enough reasons to do so.

In the reference. Similarly &[[T]] would have all the metadata in the reference. As C99 example shows such type is perfectly possible to place such thing on stack (at least with some compilers), thus it's not clear what's the problem.

It's actually very objective: decision is arbitrary if both current approach and it's negation both are compatible with the rest of the language. At it's not so hard to see how you are conflating different decisions, some arbitrary, some not:

If you want to have both RAII with destructors which are run at predictable time and memory safety then you need some kind of lifetime tracking and then you automatically get borrow checker.

Whether to have GC/RC in addition to that is, to some degree, arbitrary: you may provide language with not GC/RC at all, with RC, or with GC and RC. There are tradeoffs involved, sure, but all possibilities are possible. Except language with GC would require runtime while RC and no GC/RC options may live without it.

Green threads are incompatible with “no runtime” decision. But if that decision would be reversed then choice between stackless coroutines over green threads would become arbitrary.

I guess the final decision which we arrive is: in Rust where [T] type can only exist in some special places (at the end of struct, never on stack, etc) the decision not to have [[i32]] type is not arbitrary (so I was wrong, after all).

But that decision, itself, is quite problematic (and leads to strange proposals like dyn*) and, more importantly, it's perfectly arbitrary in the language itself.

I guess it was dictated by the limitations of LLVM, but these are outside of language (even if, practically speaking, they are quite important, because language without a working compiler is not too much useful).

P.S. It's interesting that decisions form some kind of pyramid. Where entirely arbitrary decisions (like the decision to keep metadata for unsized types sized) are supported by things which are not part of the language (may be limitation of compiler or target enviroment, etc), then you have first-level non-arbitrary decisions (like the availability of [[i32]] hinges on the question about whether unsized type metadata have to sized or not) and so on. Some decisions on top of that pyramid affect so many other things that it's almost impossible to reverse them without ending up with entirely dissimilar language, while some may be reversed surprisingly simply (even if their “dependency chain” it very tall it's also narrow so you don't hit the cascade effect where reversal of one decision leads to need to reverse another one, that one reverses third and in the end you end up with half of initial decisions reversed, like would happen if borrow checker and lifetime system would be removed in favor if GC).

It means that we couldn't have a function signature like this one, so that would need to be reworked. How would you propose that we generically interact with metadata?

That is one possible implementation, and it would be gated on custom DSTs, so I think that might be your answer right there. Custom DSTs need to land first before we can have [[T]].

(C99 gets away with this because it doesn't make the same safety guarantees we do for buffer overflows)

1 Like