Why is indexing by isize disallowed?

Why did Rust decide to only allow usize as index values and not accept isize?

After having read multiple threads about the topic here, I still don't understand the benefit of it.

In most cases, usize indexes are perfectly usable. However, some algorithms require signed arithmetic to compute an index, for example vector traversal or some convolution filters. In that case, I need to use isize to compute the value of the index.

Usually, relevant code is inside a loop and looks something like this:

let i: isize = /* some signed integer arithmetic */;
v[i as usize] = 42;
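For concreteness, here is a minimal sketch of the kind of loop described above. It assumes a hypothetical 1D three-tap filter (the name `smooth` and the kernel are illustrative, not from the post) that sums each element with its neighbours at signed offsets -1, 0 and +1, skipping neighbours that fall outside the slice:

```rust
/// Hypothetical 1D filter: each output is the sum of the input element and
/// its in-range neighbours at signed offsets -1, 0 and +1.
fn smooth(v: &[i32]) -> Vec<i32> {
    let len = v.len() as isize;
    let mut out = vec![0; v.len()];
    for i in 0..len {
        let mut acc = 0;
        for k in -1isize..=1 {
            let j = i + k; // signed arithmetic: j may be -1 or len
            if j >= 0 && j < len {
                acc += v[j as usize]; // the `as usize` the question complains about
            }
        }
        out[i as usize] = acc;
    }
    out
}

fn main() {
    println!("{:?}", smooth(&[1, 2, 3, 4]));
}
```

Every read and write needs its own `as usize`, which is where the boilerplate accumulates as the loop body grows.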

The as usize part can really get out of hand. As algorithmic complexity increases, the number of indexing operations grows, and the as usize part only adds boilerplate to the code without any clear benefit that I can see.

What if Rust accepted isize as an index?

1. The same number of runtime checks would be required.
If Rust accepted isize as an index (v[i] = 42;), then the indexing operation's range check would have to perform a negativity test. That is the same test the explicit as usize conversion performs, so there is no performance benefit to converting to usize at all.

2. There would be the same number of compile-time checks.
If the compiler can deduce that the value is negative, that affects the usize conversion and the indexing range check equally. So there should be the same warnings and errors in either case.

3. The code would be easier to read and to write.
Many have brought up this question before, and I wouldn't be writing this lengthy post if the amount of as usize boilerplate weren't a concern to us. It is not a serious issue, but it definitely hinders the readability of the code.

Sometimes signed arithmetic is necessary.
Please help me understand this design decision.

I always assumed that it is because negative indices into arrays that start at 0 do not make any sense.

There are no checks in that conversion; it's your responsibility to ensure that the value is not negative (or you may want to convert a negative value into a large positive value, in which case no checks are needed).
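A quick way to see that the cast itself checks nothing: a negative isize cast to usize wraps around to a huge value, and only the later indexing operation (or a `get` call) rejects it. A minimal sketch:

```rust
fn main() {
    let v = vec![10, 20, 30];
    let i: isize = -1;

    // `as` never panics: -1 is reinterpreted as usize::MAX (two's complement).
    let idx = i as usize;
    assert_eq!(idx, usize::MAX);

    // The bounds check happens at indexing time. `v[idx]` would panic here;
    // `get` reports the same out-of-bounds condition as None instead.
    assert_eq!(v.get(idx), None);
    println!("{} -> {}", i, idx);
}
```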

This invalidates your other points.

usize is used as the array index type simply because this reduces the complexity of the compiler; there is no other reason.

Additional thingie: on some platforms (e.g. 32-bit Linux) you can actually create an array which is larger than isize::MAX. Using usize makes it easier to deal with that.

You can implement your own wrapper types that implement Index<isize>.
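A minimal sketch of such a wrapper, assuming a hypothetical `SignedVec<T>` newtype over `Vec<T>` that treats negative indices as out of bounds (panicking, just like a too-large usize index does):

```rust
use std::convert::TryFrom; // already in the prelude on edition 2021
use std::ops::{Index, IndexMut};

/// Hypothetical wrapper around Vec<T> that can be indexed directly with isize.
struct SignedVec<T>(Vec<T>);

/// Convert a signed index, panicking on negative values like an out-of-bounds access.
fn to_usize(i: isize) -> usize {
    usize::try_from(i).unwrap_or_else(|_| panic!("negative index {}", i))
}

impl<T> Index<isize> for SignedVec<T> {
    type Output = T;
    fn index(&self, i: isize) -> &T {
        &self.0[to_usize(i)] // the Vec still performs the upper bounds check
    }
}

impl<T> IndexMut<isize> for SignedVec<T> {
    fn index_mut(&mut self, i: isize) -> &mut T {
        &mut self.0[to_usize(i)]
    }
}

fn main() {
    let mut v = SignedVec(vec![0; 4]);
    let i: isize = 3 - 1; // stands in for "some signed arithmetic"
    v[i] = 42; // no `as usize` at the call site
    println!("{}", v[i]);
}
```

The conversion now lives in one place instead of at every indexing site; the trade-off is that the wrapper has to forward whatever other Vec methods the algorithm needs.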

One reason not to accept isize on stdlib types today is that (1) no longer accepting usize would be a breaking change and (2) inference would fail and a lot of annotations would be needed if both usize and isize were possible. Example.
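To illustrate point (2) in miniature, here is a sketch with a hypothetical type `Both` that implements `Index` for both `usize` and `isize`. With explicitly typed indices everything works; the commented-out line with a bare integer literal is, as I understand it, the kind of code that stops compiling, because the literal no longer matches a single impl and falls back to `i32`, for which no impl exists:

```rust
use std::ops::Index;

/// Hypothetical type indexable by both usize and isize.
struct Both(Vec<u8>);

impl Index<usize> for Both {
    type Output = u8;
    fn index(&self, i: usize) -> &u8 {
        &self.0[i]
    }
}

impl Index<isize> for Both {
    type Output = u8;
    fn index(&self, i: isize) -> &u8 {
        // Negative values wrap to huge usizes and fail the Vec bounds check.
        &self.0[i as usize]
    }
}

fn main() {
    let b = Both(vec![7, 8, 9]);
    let u: usize = 1;
    let s: isize = 2;
    println!("{} {}", b[u], b[s]); // fine: the index types are known
    println!("{}", b[0usize]); // literals now need a suffix or an annotation
    // println!("{}", b[0]); // expected to fail: no single impl for the literal
}
```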

I don't think either of these really requires isize. It would be useful if you posted example code; maybe you can just stick to usize indexing.
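One possible way to stick to usize indexing, sketched under the assumption that only the offsets need to be signed: keep positions as usize and combine them with isize offsets via `usize::checked_add_signed` (stable since Rust 1.66), which returns None instead of going below zero. The function name `neighbour_sums` is hypothetical:

```rust
/// Hypothetical neighbour sum using only usize positions: for each index,
/// add the elements at signed offsets -1, 0, +1 that fall inside the slice.
fn neighbour_sums(v: &[i32]) -> Vec<i32> {
    (0..v.len())
        .map(|i| {
            [-1isize, 0, 1]
                .iter()
                // None if i + k would be negative (or overflow usize)
                .filter_map(|&k| i.checked_add_signed(k))
                // `get` returns None past the end, so no explicit upper check
                .filter_map(|j| v.get(j))
                .sum::<i32>()
        })
        .collect()
}

fn main() {
    println!("{:?}", neighbour_sums(&[1, 2, 3, 4]));
}
```

No `as` casts appear at all, and both bounds are handled without a separate negativity test in the loop body.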

This works fine for isize to usize conversions here, but in general I try to avoid the as operator because it truncates without any range check (even in debug mode), which I consider really dangerous and inconsistent with other operators such as +, which at least in debug mode perform overflow checks. So I would instead write this:

v[usize::try_from(i).unwrap()]

which, I know, is even more verbose...
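To make the difference concrete: `as` silently truncates out-of-range values, while `try_from` reports them as an error. A small sketch; the helper `to_index` is hypothetical and simply wraps the `usize::try_from` call quoted above so it returns an Option:

```rust
use std::convert::TryFrom; // already in the prelude on edition 2021

/// Hypothetical helper: convert a signed index, returning None if it is negative.
fn to_index(i: isize) -> Option<usize> {
    usize::try_from(i).ok()
}

fn main() {
    // `as` never fails: 300 does not fit in u8, so it is truncated to 300 - 256.
    assert_eq!(300i32 as u8, 44);

    // try_from checks the range instead of truncating.
    assert!(u8::try_from(300i32).is_err());
    assert_eq!(to_index(5), Some(5));
    assert_eq!(to_index(-1), None);
    println!("ok");
}
```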

It's not the as usize that would perform the bounds check but the subsequent indexing operator. But yes, your point stands: the isize bounds check would be exactly the same machine code as the usize bounds check.

The reference claims that it's not possible:

The theoretical upper bound on object and array size is the maximum isize value.

However, I have just made an array larger than that and it worked, so something is wrong: playground.

But if you also implement Index<i32> the example works again.

It does, but it then infers a different type than before, which will change the semantics of some programs and break others. I believe it's technically an allowed change to inference, but the practical fallout would be so great I don't think it'd ever be accepted.

Your object has zero size, so there is no contradiction. And while Vec has special code which ensures that a standard vector never grows beyond isize::MAX bytes, nothing prevents you from calling mmap directly.

But yeah, most of the time you can rely on indexes being less than isize::MAX, which makes the fact that you cannot use an isize index quite mysterious.

I would assume that if I have a random i: usize, then the indexing operation s[i] is cheaper than with a random i: isize, because in the first case I need one comparison and in the second case two.

Edit: Wrong, see response below.

This also holds if I previously convert from a smaller type:

fn main() {
    let ary: [&'static str; 2] = ["Alice", "Bob"];
    let n: u16 = 1;
    let i: usize = n as usize; // no boundary check needed
    println!("{}", ary[i]); // only one boundary check (>= 2) needed
}

Note that in this case not even one check might be needed, as the compiler should be able to deduce at compile time that the bound is met (try setting n=2 in Playground).

Nonetheless, if n were random or the result of a complex (unsigned) calculation, then only using unsigned integers would allow skipping the lower check (the run-time test that it is >= 0), because they are implicitly guaranteed to be non-negative. Isn't that right?

Edit: Wrong, see response below.

You only really need one comparison, because x >= 0 && x < n can be compiled to x as usize < n. And compilers do this.
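The trick works because casting a negative isize to usize yields a value of at least 2^63 on 64-bit targets (more generally `isize::MAX as usize + 1`), which cannot be below the length as long as n is no larger than that, as holds for slices of non-zero-sized elements. A small sketch checking that the two formulations agree; the function names are illustrative:

```rust
/// Two-sided check written out explicitly.
fn in_bounds_two_checks(x: isize, n: usize) -> bool {
    x >= 0 && (x as usize) < n
}

/// Single unsigned comparison: a negative x wraps to a value >= isize::MAX + 1.
/// Only equivalent while n <= isize::MAX as usize + 1.
fn in_bounds_one_check(x: isize, n: usize) -> bool {
    (x as usize) < n
}

fn main() {
    let n = 10;
    for x in [-5isize, -1, 0, 3, 9, 10, 42] {
        assert_eq!(in_bounds_two_checks(x, n), in_bounds_one_check(x, n));
    }
    // The equivalence breaks only for huge n, e.g. arrays of zero-sized elements.
    assert!(in_bounds_one_check(-2, usize::MAX) && !in_bounds_two_checks(-2, usize::MAX));
    println!("ok");
}
```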

Oh right, of course! They can simply do an unsigned comparison regardless of the declared type. Disregard my post then, I was wrong.

Indeed it does.
I wrongly assumed 'as usize' performs a runtime check for negativity.

Thank you all for the explanation. :slight_smile:

Maybe there is still an advantage when calculating indices/lengths in some cases, as there is a guarantee that the result of a pointer subtraction fits into a usize?

I have always wondered whether there is a theoretical risk of integer overflow when I use ssize_t in C.


Reading this, I would answer my question with "yes":

When I'm in C, I'm always very careful if I use anything other than (unsigned) size_t to store sizes.
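For comparison, Rust expresses pointer differences as a signed value: `<*const T>::offset_from` returns the distance in elements as an isize, which can be negative. A minimal sketch; `offset_from` is unsafe because both pointers must point into (or one past the end of) the same allocation, and it is not meant for zero-sized element types:

```rust
fn main() {
    let a = [10u32, 20, 30, 40];
    let first: *const u32 = &a[0];
    let last: *const u32 = &a[3];

    // SAFETY: both pointers point into the same array `a`.
    let forward = unsafe { last.offset_from(first) };
    let backward = unsafe { first.offset_from(last) };

    // The difference is counted in elements, not bytes, and has type isize.
    assert_eq!(forward, 3);
    assert_eq!(backward, -3);
    println!("{} {}", forward, backward);
}
```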

Well… Rust's Vec guarantees that its size will never be larger than isize::MAX, but that's a limitation of the Vec implementation, nothing else.

It's easy to create a bigger data structure if you use raw syscalls, but knowing that a regular vector cannot grow beyond that is still nice.
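The Vec limit can be observed without allocating anything: asking a `Vec<u8>` to reserve more than isize::MAX bytes is expected to fail with an error rather than attempt the allocation. A minimal sketch using `Vec::try_reserve` (stable since Rust 1.57):

```rust
fn main() {
    let mut v: Vec<u8> = Vec::new();

    // isize::MAX + 1 bytes exceeds Vec's maximum capacity in bytes, so the
    // request is expected to be rejected as a capacity overflow.
    let too_big = isize::MAX as usize + 1;
    let result = v.try_reserve(too_big);
    assert!(result.is_err());
    println!("{:?}", result);
}
```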

Yes, makes sense, but the reference is somewhat confusing: when it says "array size" I thought it was talking about array length, but it's really talking about array size in bytes. If so, why does it even separate "object" and "array" size? Don't arrays count as objects?

Umm… The answer is literally in the very next sentence:

This ensures that isize can be used to calculate differences between pointers into an object or array and can address every byte within an object along with one byte past the end.

With zero-sized objects, the difference between pointers is always zero and the address of any object in such an array is always the same, so there is no need to limit the length of such an array.

Certain calculations may fail for such an array (basically any calculation where you divide by the size of the element), but they would fail for any length, so limiting it doesn't help.
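A small sketch of the zero-sized case: an array of `()` can have usize::MAX elements while occupying zero bytes, and its elements all sit at the same address. This reflects current compiler behaviour for zero-sized types:

```rust
use std::mem::size_of;

fn main() {
    let a = [(); usize::MAX];

    // Length is usize::MAX, but the size in bytes is 0: the isize::MAX limit is on bytes.
    assert_eq!(a.len(), usize::MAX);
    assert_eq!(size_of::<[(); usize::MAX]>(), 0);

    // All elements share one address, so pointer differences are always zero.
    assert!(std::ptr::eq(&a[0], &a[usize::MAX - 1]));
    println!("len = {}, size = {} bytes", a.len(), size_of::<[(); usize::MAX]>());
}
```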

That allows for a nice demonstration of why "as _" can be dangerous (regarding .len in this example, not regarding pointer comparison of elements in a Vec):

fn main() {
    let a = [(); usize::MAX];
    println!("{}", a.len() as isize);
}

Output:

-1
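The checked counterpart of that cast reports the problem instead of printing -1. A minimal sketch with `isize::try_from`:

```rust
use std::convert::TryFrom; // already in the prelude on edition 2021

fn main() {
    let a = [(); usize::MAX];

    // `as isize` wraps usize::MAX to -1; try_from refuses the out-of-range value.
    assert_eq!(a.len() as isize, -1);
    match isize::try_from(a.len()) {
        Ok(n) => println!("{}", n),
        Err(e) => println!("length does not fit in isize: {}", e),
    }
}
```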