Type of array index

Hi there,
this code does not compile:

    let mut arr1 = [1, 2, 3, 4];
    let x: i32 = 2;
    arr1[x] = 8;

OK, correct, the index must be a usize.
But this is OK:

    let mut arr1 = [1, 2, 3, 4];
    let x = 2;
    arr1[x] = 8;

Why is this OK? Isn't i32 the default type for an integer literal?

thx.

i32 is the default for (integer) numbers, yes, but this default only applies if the type checker cannot determine what the actual type is supposed to be. This might feel a bit surprising, since information seems to flow backwards from later instructions into earlier ones, but it can be quite useful. This works reliably for the first non-generic use of the variable. In this case, x is being used as an array index, and the type checker also knows that the Index trait on arrays is only implemented for a single integer type, namely usize.

This is enough to make the second code example essentially equivalent to writing let x = 2usize in the first place.
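To make that concrete, here is a small sketch (variable names y and z are mine, added for illustration) showing the inferred version, the explicitly annotated version, and what you have to do if you insist on an i32 index:

```rust
fn main() {
    let mut arr1 = [1, 2, 3, 4];

    // Inferred: the only constraint on `x` comes from the indexing below,
    // and arrays only implement Index<usize>, so `x: usize`.
    let x = 2;
    arr1[x] = 8;

    // Explicit version: exactly equivalent to the above.
    let y: usize = 2;
    arr1[y] = 9;

    // With an i32 index you must convert at the use site.
    let z: i32 = 2;
    arr1[z as usize] = 10;

    assert_eq!(arr1, [1, 2, 10, 4]);
}
```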

For further demonstration, see how e.g.

fn f() {
    let mut arr1 = [1, 2, 3, 4];
    let x = 10000000000;
    //arr1[x] = 8;
}

does not compile, with the error literal out of range for `i32`, but

fn f() {
    let mut arr1 = [1, 2, 3, 4];
    let x = 10000000000;
    arr1[x] = 8;
}

does compile (e.g. in the playground), though of course the out-of-bounds index will panic at runtime.

Or how

fn main() {
    let mut arr1 = [1, 2, 3, 4];
    let x = 2;
    print_type(&x);
    arr1[x] = 8;
}

fn print_type<T>(_: &T) {
    println!("{}", std::any::type_name::<T>());
}

prints usize.


Really very interesting. I've a lot to learn about inference in Rust.
Thx.

To elaborate on the "information flowing backwards" part:

The key thing to remember about type inference is that it's "global" within a function (it is intentionally restricted to single functions for reasons I'm not going to reiterate here). Type inference is not particularly concerned with the actual execution order of statements and expressions at runtime; instead, it is a purely compile-time constraint solving system.

You can think of it very roughly as if the compiler threw all the expressions in a function into a big hat, along with constraints on the types arising out of the interaction of the expressions. By "interactions" and "constraints", I mean judgements like:

  • if a function fn foo(argument: T) is called with variable x as its argument, then x must have a type convertible to T
  • if two expressions x and y are compared for equality like x == y, then the type of x must implement the PartialEq<Type of y> trait
  • the two arms of an if expression must have the same type

etc. The set of all such constraints is not unlike a huge system of equations and inequalities that the compiler knows how to solve through a mixture of formal and heuristic methods.

If you consider an ordinary system of algebraic equations like

2x + y = 4
y - 3x = -1

then the solution (which is x = 1, y = 2) does not depend on the order of the individual equations. Similarly, the solution to the set of type constraints (which is the ascription of a concrete or generic type to each of the expressions in the program) does not depend on the order in which said constraints are imposed, i.e. the order they appear in the source text.

(This is mostly true – there are some exceptions which usually emerge from the heuristics the compiler has to use in order for type inference and checking to finish in a reasonable amount of time, but the mental image I depicted above is a good start.)
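As a small sketch of this order-independence (the helper takes_u16 is a hypothetical function I made up for illustration): a single constraint near the end of the function fixes the types of everything that came before it.

```rust
fn takes_u16(_: u16) {}

fn main() {
    let x = 300;   // no suffix: the integer type is still open here
    let y = x + 1; // `y` shares whatever type `x` ends up with

    // This one constraint fixes both `x` and `y` to u16,
    // regardless of how far down in the function it appears.
    takes_u16(y);

    assert_eq!(x, 300u16);
    assert_eq!(y, 301u16);
}
```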


This is a very impressive part of Rust. Not so intuitive coming from other languages.
thx

This is a great point that I hadn't thought about before. I'd add that Rust's type inference is modeled after Haskell and related languages that are much more declarative (as opposed to procedural) in nature, and in which making use of "order of appearance in the source code" during inference would be even less natural for that reason.

Most languages with type systems use declarative syntax for the type system. I certainly wouldn't consider C related to Haskell in any way... Unless you're comparing to something like Python, where you have to run meta-object constructors to initialize object constructors to initialize objects and such things, but that isn't really a "type system" in the sense of a system of logical constraints, but more like an ad-hoc convention for metaprogramming in a dynamic environment.

I'm talking about type inference, which was first introduced (in the form I'm familiar with) in ML and is associated with ML-like languages, including OCaml and Haskell.

When I say "declarative" I'm not referring to the type system, but to the fact that e.g. a function definition in Haskell does not explicitly describe control flow in the way that a Rust function definition does. All expressions, no statements.

Well, Haskell does describe control flow, because it has case … of and if, and on top of all that, it's lazily evaluated. What it doesn't have is loops and early returns, which are sometimes convenient. You could omit explicit loops and explicit returning from Rust as well, and it would remain just as usable, albeit probably a lot less convenient.

So in this regard Rust and Haskell aren't very different, and as far as I know (but I may well be wrong), both use a variant of Damas & Hindley & Milner's algorithm for inference. The point is not whether control flow is explicit (and if it is, how rich it is) in the language – what is essential is that as long as control flow results in well-defined constraints, the order in which these constraints appear does not affect the overall types.

E.g. consider these two definitions of the same Haskell function:

classify :: Int -> String
classify x = if x `mod` 2 == 0
             then "even"
             else "odd"

classify' :: Int -> String
classify' x = if even then "even" else "odd"
  where
    even = x `mod` 2 == 0

The only difference between these functions is the placement of the mod-and-compare expression that checks for parity: in classify' it is bound to a name in a where clause that appears after its use. Yet the control flow is exactly the same, even though the discriminant of the if is syntactically in different places in the two function definitions.
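A rough Rust analogue of the where-clause version is to factor the check into an item rather than a let binding, since items (unlike let bindings) may be declared after their use; the function names here are mine, mirroring the Haskell example:

```rust
fn main() {
    assert_eq!(classify(4), "even");
    assert_eq!(classify(7), "odd");
}

fn classify(x: i32) -> &'static str {
    // `is_even` is defined below its first use: the order of item
    // declarations within a module is irrelevant, much like the
    // Haskell `where` binding in `classify'`.
    if is_even(x) { "even" } else { "odd" }
}

fn is_even(x: i32) -> bool {
    x % 2 == 0
}
```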

At some point, the syntax has to imply the semantics, otherwise it wouldn't be possible to precisely elaborate the meaning and intent of a program on its own, solely by analyzing the program text.

