Question to the official Rust book (vectors)

Hello,

I have one more question related to the official book. I am currently reading chapter 8.1 about Rust vectors, which are very similar to the sequence type in Nim. In the text we have

let v = vec![1, 2, 3, 4, 5];

    let does_not_exist = &v[100];
    let does_not_exist = v.get(100);

Listing 8-5: Attempting to access the element at index 100 in a vector containing five elements

When we run this code, the first [] method will cause the program to panic because it references a nonexistent element.

So this program fragment would successfully compile? I had the impression, that Rust does strong compile time checking, and for a vec initialized with 5 elements, and then accessing position 100, it should be able to detect a range error at compile time? Note, the 5 elements, and the index value 100 are both compile time constants. Is this check at compile time not done by intent, or is Rust currently not smart enough to detect the error at compile time?

Another question: Why does the second line use a reference to access the value? The vector contains plain integers, so I see no advantage for reference use. For other element data, like large structs, a ref would avoid the copy effort. But for an integer, a ref is an indirection -- well the compiler may optimize it. But a plain value copy should be fine in my view, I can see no real benefit for use of a reference in this concrete situation.

And finally, a related question to arrays and vectors in Rust: From what I read, indices have to start always at zero for both. But there are situations, where non-zero start indices (positive or negative) makes sense. For example some textbook algorithm may use one as start index, and sometimes negative indices are useful as well. Adding an offset manually is not that nice. Has Rust a good solution for this?

Best regards,

Stefan Salewski

The compiler can't determine the length of the vector based solely on the type information, since a vector can grow or shrink at runtime. A whole different story would be with the array type, which has the size information available for the compiler to check.

Another thing that you might find useful is that you can use checked access to a vector element with the get method which returns an Option.

2 Likes

As for this, I don't know where you read it, but arrays and vectors in Rust are 0 index based. That's pretty much the norm on most programming languages.

Note that this is about these data structures which are provided by the language itself. There could be libraries (called crates in Rust) which could provide their own collection data structures and use a different index, but then we are not talking about arrays and vectors anymore.

1 Like

A whole different story would be with the array type,

Yes, that came to my mind a few minutes after posting this question.

checked access to a vector element with the get method

Yes, that is explained well in the book.

Another question/remark to the same section:

    let v = vec![100, 32, 57];
    for i in &v {
        println!("{i}");
    }

Listing 8-7: Printing each element in a vector by iterating over the elements using a for loop

We can also iterate over mutable references to each element in a mutable vector in order to make changes to all the elements. The for loop in Listing 8-8 will add 50 to each element.

    let mut v = vec![100, 32, 57];
    for i in &mut v {
        *i += 50;
    }

Listing 8-8: Iterating over mutable references to elements in a vector

Both code fragments use references to access the vec elements, but for the second case with the mutable reference we have to use the deref operator * to modify the value. But in the first example, we can print the value without deref.

That is really surprising for me.

arrays and vectors in Rust are 0 index based. That's pretty much the norm on most programming languages.

I think it is 1 for Julia. Which has the advantage that length and highest index position are equal. Nim allows arbitrary index ranges for arrays (fixed size, stack allocated). Some algorithm in textbooks use 1 based arrays, as I student I had some trouble porting them to C. You may say, just let index position 0 unused. But that time, computers had a few MB of RAM, I think we had 8 MB ram available for our data, and had to do a fast fourier transformation. I think the algorithm was from the book of Press, "Numerical Recipes".

References explicitly implement Display, so they can be printed as their inner value.

You're right that this is unnecessary here, but it was probably done to match get, which returns a reference inside an Option.

This is done because it's most faithful to the hardware: an index operation will compile into an assembly instruction that has a base address and an offset, and that offset will be zero-based.

You can easily make your own type that indexes differently, but if you use rust even a little bit, you just get used to it and it doesn't really matter. It won't make anything faster to have 1-based indexes.

There's two other benefits: ranges are upper-exclusive, so you can write something like:

let v = vec![0; len];
for index in 0..len {
    v[index];
}

which perfectly indexes every element. And since you can iterate over Vec directly, it's unlikely you'll need to specify a length anyway. Methods like pop, push, last, and indexing with half-open ranges [n..] let you do things with the last element without writing the length.

And you can use the modulo operator to ensure an index is inside a Vec[1].

let random = rand::random();
let index = random % len;
v[index];

This would be fully possible, but it's probably not done because it's not that useful:

  • This panic will always happen every time the code is run, so there's little chance of it being missed.
  • Most cases where you statically know the length of a Vec, you should use an array instead.
  • Most cases where you use a Vec, you won't statically know the length.
  • You don't index a Vec or even arrays with a constant that often in real code.

And for your specific situation, if you wanted one of the values from the Vec, you just wrote it, so you can write it again or make it a variable. This is even more clear with vec![n; 5] syntax. The only case where you need to retrieve it from the Vec is when the Vec has been modified, which means you would need a lint that can tell which methods of Vec change the length.

Rust also avoids making special cases for non-core types and traits. It keeps the complexity of the language down. If there was a lint for this, people would want a way to add it to their own collection types. That could happen by adding another attribute similar to #[must_use] for constructor functions. But this attribute would only be useful for this specific type of data structure where the valid indices are a known specific range, and only when the range is given as a constant.

This might be a good lint to add to clippy.


  1. there's better ways to get a random item, but you get the idea ↩︎

2 Likes

Because Julia is designed for implementation of algorithms from textbooks.

Yes, but that's pretty much the only place where you want this. Otherwise the answer from Dijkstra is as valid today as it was valid 40+ years ago when it was written.

It's unfortunate that there are lots of books which use 1-based indexing. That's unfortunate, but there's not much we can do.

All the APIs designed by software-developers for software-developers are using zero-based indexing and doing anything else would just make them harder to use.

3 Likes

As a general rule, many traits are implemented to work transparently through references, including those for printing. This remains unambiguous because, by convention, we always choose that instead of choosing, say, printing the address instead (unless you explicitly request that with "{i:p}").

However, assignment is a basic operation that doesn't have an obviously sensible way to “overload” it, and there are a lot of things that you could meaningfully assign to. So, when you use =, you have to specify exactly what place you want overwritten; leaving off the * would mean replacing the value of i itself with a different reference, which is valid but not what you want here.

3 Likes

As a counterpoint when I finally understood the way the FFT works well enough to write my own from scratch I used arrays starting at zero. I'm not the smartest coder on the block and that was the most obvious thing to do. GitHub - ZiCog/fftbench: A parallel integer only FFT for multi-core micro-controllers like the Parallax Propeller or XMOS devices.

It's not clear to me how memory limitations affect one choice of array base or why zero should be worse. 8MB of RAM is absolutely huge. My FFT runs in the 32Kb of RAM of a Propeller Microcontroller and uses only integer maths to save code space and for performance.

In four decades of programming I have never felt the need for other than zero based arrays.

2 Likes

FFT is easier to think of in terms of 0-based indices. The Cooley-Tukey FFT algorithm for power-of-2 lengths can be expressed with the bit-reversal permutation, which is natural to think about in terms of n-bit indices 0 ≤ i < 2^n.

0-based indices work well with Rust's half open ranges, 0..n which combine nicely. For example slice.split_at(a) will split your range of indices 0..n into 0..a and a..n.

For mathematically-minded people: in set theory, the ordinal numbers (of which the natural numbers are a subset) are most naturally defined starting from 0, so that's what's always done. The standard (von Neumann's) definition is that the ordinal number n is the n-element set of numbers smaller than n, e.g. 5 = {0, 1, 2, 3, 4}. This matches the indices of an n-element slice in Rust.

3 Likes

Starting from zero really is better. Even in real life it's old things that start from 1, with newer things starting from 0.

For example, what times are there around 2 o'clock? That's right, 01:59:59 and 02:00:00 and 02:00:01. Minutes and seconds start from zero because humanity got smarter and realized that's better than having the clock go from 01:60:60 to 02:01:01. (Many countries have fixed hours to also be 00, 01, 02, ..., 23, though of course the US still thinks that the day being 12A, 1A, .., 11A, 12P, 1P, .., 11P is a good idea for some reason.)

Here's a nice worked example not using computers at all that also shows starting from zero working better:

3 Likes

Well, all that was around 1995, when I built a fourier spectrometer as part of my diploma thesis in physics. We had an old Windows 3.1 computer in our lab, with a measurement card with a storing capacity of five million values max. Each data point was a 12-bit int. That old PC had 16 or 18 MB RAM total. For fast fourier transformation (FFT), we typical use powers of two for data size. As Windows 3.1 needed some memory itself, we were able to allocate only a block of 8MB continues RAM for our FFT data. We wrote our algorithm in C, based on a textbook example in Pascal or Fortran. I think it was Numerical Recipes from Press, who also published a book version for C later. To really use all the available storage, I had to convert the 1-based FFT algorithm to zero-based C.

Generally, I think that 1-based indices as Julia use them are more naturally, as humans typically start counting at one, not at zero. But technically, for systems programming languages, starting at zero avoids offset, increasing performance. One use case for arbitrary start indices was in my toy chess game, see GitHub - StefanSalewski/salewski-chess: The Salewski Chess Engine, where negative array indices are used a few times for convenience. Of course, using constant offsets would have worked as well. Perhaps I will once port that toy chess game from Nim to Rust. But that is too early for me, I have to spent this winter to learn Rust first. And then, maybe I could port the GTK GUI to a native Rust GUI, maybe for ICED or the Druid/Xilem. Would be a nice exercise, as I initially created that chess game as my first exercise to learn Nim in 2016.

[EDIT]

Sorry, my memory was wrong. I think the measurement card in the Windows 3.1 PC had a max sampling rate of five million samples per second (5 MS/s) and could record 2M samples max. These 2 MB samples needed a 8MB data buffer for FFT, as C float size is 4 byte. All the details can be found in my diploma thesis, but unfortunately only in German language: https://ssalewski.de/Diploma-Thesis.html.en

If you really want to, you can define an array-like data structure with arbitrary indexing. For example, here is an array type that allows indexing with any base:

1 Like

Many times they do not...

In the UK the "first floor" of a building is one up from the "ground floor" one entered it on. Clearly the ground floor is the "zeroth floor" (This seems not to be the case in places like Finland where elevator buttons get numbered from 1).

On my phone I can set an alarm at some hour and 00 minutes. Indeed the 24 hour clock starts at hour 00.

Measuring instruments generally start at zero. From my bathroom scales, thermometer... to my digital micrometer.

Seems to me the problem of feeling the need for 1 based arrays starts in infancy when we are first taught about numbers and counting. Children's counting books universally start at 1 and never mention zero. (Anyone have an example of an exception here?). One would hope ones arithmetic and mathematical skills have moved on from infant level by the time one starts programming.

1 Like

Regarding floors, the American dislike for 0 is shared with the post-USSR countries (and Finland):


Blue countries start floors at 0, red countries start at 1 (source).

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.