Indexing Vec with i32 is necessary

Hi. I'm new to rust. It's a interesting language and I learned a lot from it ^-^.

Recently, I found that rust restricts users to index vectors with unsigned numbers only. However, it's not plausible at all.

During programming, we often need to add an index with a negative offset. Therefore, we should use signed integers to store the index number. Rust programmers frequently use codes like arr[index as usize] to index vectors with integers, which is not safe. Such kinds of casting could lead to underflow if the integer is negative.

Instead of leaving the responsibility of type casting to the user, it's better to allow users to index vectors with signed integers and throws an IndexError at runtime if the number is negative.

But if an underflow occurs you'll most likely get a panic anyway because the number you'll get is too big. Not to mention you should use .try_into() if you're not sure your number is positive.

The problem with this is that vec[0] becomes ambiguous. Is it an i32 or a usize?

Nitpick: Rust doesn't have exceptions, so it won't throw anything, instead you'll get a panic.

8 Likes

In Rust there's no concept of negative offset when accesing vector elements by index. That's what the type system is telling you by using usize rather than a signed integer.

6 Likes

Not stable yet, but to me that sounds like https://doc.rust-lang.org/std/primitive.usize.html#method.checked_add_signed, not "indexing should accept negative numbers".

(That said, I do personally think that array indexing should be allowed with any type -- after all, -1 is no more out of bounds than 18446744073709551615 -- but that can't be done right now because of inference implications.)

As an aside, it's clearly not "necessary" to index by i32, since you can always add .try_into().unwrap().

7 Likes

In addition, conceptually it doesn't matter whether the indices are signed or not: in both cases you have some N cells (for a Vec of length N), and the indices are nothing more than a way to tell them apart. Conceptually it doesn't matter whether that is done with usize, isize, &str or what have you.

1 Like

Well, not &str. Let's not get too crazy. If you went that far you'd just have a map. Arrays' having numeric indices lets you compute relative offsets, pigeonhole hash codes to randomly access bins in a hash table, find the midpoint between two slots for binary search, etc.

I fail to see how negative indexes are useful at all. It you need to compute a negative offset, just subtract the absolute value of the offset and test for wrapping (which you need to do regardless).

5 Likes

Of course it matters in practice, if only because using a key of type &str adds a factor of either an O(log n) (the BTreeMap route) or a constant factor for hashing (HashMap) on top of whatever else access is happening. Fun fact, with hashes you can also compute relative offsets - that's (very roughly) how HashMaps work. Which means you can do it with arbitrary values of type T: Hash, including strings.

But it's precisely the kind of detail that doesn't matter conceptually, which is what I was talking about. At that level, performance considerations are not relevant. What is relevant is keeping each mapped value identifiable. Which is also why I italicized the word conceptually.

As for the problem, if @anti-destiny is ok with a little overhead, they could just use a HashMap<i32, V> instead of a Vec<V>.

I've got bad news. Rust does have exceptions, they're called "unwindable panics".

These are typically not called exceptions because panics are supposed to be used for unrecoverable errors and catching them is not intended for error handling but for FFI and threading.

4 Likes

I think the whole thing is caused by the notion that there are, somehow, signed and unsigned numbers. It reality it's an illusion, figment of our imagination, in reality these don't exist. Not in hardware, anyway.

Today's computers don't distinguish between -1i32 and 4294967295u32. It's the exact same bit pattern and, importantly, all operations (+, -, *, /) behave identically, too (CPUs don't have separate instructions which deal with signed or unsigned numbers).

The only thing which distinguishes them is overflow checking.

And that doesn't happen with values, it happens with operations on values.

But of course Rust can not expose that. The whole story of Rust is the same thing repeated bazillion times: take simple and beautiful (but unfamiliar for most people!) mathematics concept and turn into ugly (yet familiar to most people!) form.

When Rust deals with integers it takes a simple ℤ ₂ᴺ ring and turns it into a couple of types, one signed, one unsigned.

Given that fact I don't see how allowing to use signed integers as indexes would help anyone: by the time when you have and i32 or i64 memory cell it's too late to check for overflow.

Something like -42i32 can be an overflow or [incorrectly typed] 4294967254, you just never know at the point when it's used as index, you need to look on the place where is it calculated. If something like that is presented as array index it's programmer's error and programmers responsibility to fix the issue.

Current approach makes it very clear and usually leads to prominent crash which programmer would have to resolve anyway. I don't see how adding the ability to use isize as index would help. It would just add more confusion IMO.

5 Likes

Incorrect. +/-/* are the same, but signed (x86 idiv) and unsigned (x86 div) / are different:

dbg!((-1_i32 as i32 / 2) as i32);
dbg!((-1_i32 as u32 / 2) as i32);
[src/main.rs:2] (-1_i32 as i32 / 2) as i32 = 0
[src/main.rs:3] (-1_i32 as u32 / 2) as i32 = 2147483647
4 Likes

Your comments are helpful. However, I still believe it would be more convenient if rust could support indexing vectors with negative numbers, although it's not "necessary".

My implemented version of Vec is attached below. It checks the index before converting it to usize. (And everyone should do this checking, unless he/she is very confident about the correctness of the code, right?)

use std::ops::{Deref, DerefMut, Index, IndexMut};
//  Modified from "https://ideone.com/4HZTOr"


#[derive(Debug)]
struct MyVec<T>(Vec<T>);
 
impl<T> Deref for MyVec<T> {
    type Target = Vec<T>;
    fn deref(&self) -> &Vec<T> {
        &self.0
    }
}
 
impl<T> DerefMut for MyVec<T> {
    fn deref_mut(&mut self) -> &mut Vec<T> {
        &mut self.0
    }
}
 
impl<T> Index<i32> for MyVec<T> {
    type Output = T;
    fn index(&self, index: i32) -> &T {
        if index < 0 || (index as usize) >= self.len() {
            panic!("Index out of bound!");
        }
        &self.0[index as usize]
    }
}
 
impl<T> IndexMut<i32> for MyVec<T> {
    fn index_mut(&mut self, index: i32) -> &mut T {
        if index < 0 || (index as usize) >= self.len() {
            panic!("Index out of bound!");
        }
        &mut self.0[index as usize]
    }
}

impl<T> IntoIterator for MyVec<T> {
  type Item = T;
  type IntoIter = <Vec<T> as IntoIterator>::IntoIter;

  fn into_iter(self) -> Self::IntoIter {
    self.0.into_iter()
  }
}

 
fn main() {
    let mut u = [0, 1, 2, 3].to_vec();
    for i in &u {
        println!("{}", i);
    }
    u.push(4);
    
    let mut v = MyVec(Vec::<i32>::new());
 
    v.push(10);
    v.push(20);
    println!("{:?}", v);
    // for i in &v {
    //     println!("{}", i);
    // }
 
    v.pop();
    println!("{:?}", v);
    for i in v.iter() {
        println!("{}", i);
    }
 
    v.push(30);
    let mut i :i32 = 0;
    v[i] = -78;
    println!("{:?}", v);
    while i < v.len() as i32 {
        println!("{}", v[i]);
        i += 1;
    }
    
    // panic if v < 0
    // println!("{}", v[-1]);  //Line 76, Char 20: Index out of bound! (solution.rs)
}

A disadvantage of this design is that it may make the reader confused about the data type of the index. To address this issue, maybe we can introduce a new function. For example, we can use "vec.at(index)" to index the vector with negative indices. As for vec[index], it still only accepts unsigned numbers.

Why should negative values be accepted at compile time only to be rejected at runtime with a panic?

6 Likes

Ugh. Yeah, you are right. Division is different, of course. Thanks for pointing that out.

My main point still stands: by the time you try to use value as index it's too late to try to catch errors. Either the value was calculated correctly and x as usize would be correct index (even if x itself may not be correct) or it was calculated incorrectly but then simple if x < 0 check doesn't guarantee anything!

You should have done your checks before, when you were doing computations.

1 Like

If you're going to accept negative indices, at least make them somewhat meaningful, like python's slice syntax. This is trading a compile time check against a runtime crash.

Rust sometimes does things for convenience but what you are proposing is “convenient if you don't care about correctness”. Rust doesn't do these.

Granted, it's actually works fine on 64bit platforms for obscure technical reasons: sign bit of an address is used to distinguish kernel pointers and userspace pointers thus you can not create array with indexes larger than valid isize value (except for ZST, but then you don't care about index at all).

This may mean that, technically, it may be possible to add it to the language and it would even work… but then, on these same platforms simple x as usize is guaranteed to work, too (for the exact same reason) thus it makes no sense to add such capability.

It's not guaranteed to work and is, actually, dangerous on 32 bit platforms and these as breadcrumbs are working as reminders that you are doing something dangerous (if not necessarily incorrect).

Indeed. Everyone should do the checking to ensure that such code would never be used!

If you ever see if index < 0 in the code then it's time to ask the guy who wrote this to fix it!

Depending on the platform this check is either useless or wrong. In the former case it must be removed. In the latter case it must be removed and some other code must be rewritten.

Adding useless/or dangerous code automatically is just not something I see Rust would do.

3 Likes

It's worse. On 32bit platform that check is just wrong. Granted it's unusual to see 3000000000 index on 32bit platform but I have seen programs where these made sense, they are not impossible.

On 64bit platform this check is useless, instead, and these can be made useful, but it sounds like a bad idea to divide Rust into two radically incompatible languages just for the sake of “convenience”.

3 Likes

You don't need to because vectors do the check automatically.

If you remove this from your code:

You get exactly the same panics.

IMO there is no need to implement indexing with i32 because you get it by just adding as usize and the check of invalid index is done at runtime.

    let v = [0, 1, 2, 3].to_vec();
    let i: i32 = -1;
    println!("{}", v[i as usize]);

generates:

thread 'main' panicked at 'index out of bounds: the len is 4 but the index is 18446744073709551615'

(well, you don't get that index is negative, but you at least you get that it is wrong)

I'll note that we also allow indexing [u8; 256] by usize. So it's not at all clear to me that a[-1] is so much worse than a[18446744073709551615] that the negative needs to be rejected by the type system when the way-too-big value isn't.

Sure, it's wrong, but for every slice or array I've ever seen in real code, there's always far more incorrect indexes than valid ones.

(And for slices of non-ZSTs, there's not even a perf cost to allowing indexing by signed numbers, because objects can't be more than isize::MAX bytes anyway, thus any index that'd be negative in as usize is out of bounds anyway.)

I'm actually strongly opposed to that. I want off-by-one errors like that to be visible, rather than cause strange surprises later when I unexpectedly got the last element.

Having a convenient way to do modular indexing sounds good, but that should be an opt-in that's more obvious than just using a signed type. (Especially if indexing by i32 worked, because integer fallback means it's easy to get that type.)

6 Likes