Is there any performance difference between &Vec<f32> and &[f32]?

Hi,
Is there any performance difference (in speed) between &Vec<f32> and &[f32]?

I am not an expert, so I am not sure, but I can still share my a priori answer while waiting for someone else to reply. Since a Vec is essentially a wrapper around a [f32] buffer, I would say no, also because the compiler will probably handle both the same way. But there could potentially be a difference between a [f32; N] and a Vec<f32>, since one can live on the stack but not the other.

So if your &[f32] comes straight out of a Vec, probably not; but if it comes out of a [f32; N], then maybe, and even there I am not really sure.

1 Like

A Vec<f32> stores a pointer to the memory it manages, which is a sort of [f32]. Every &T is also a pointer to T.

Therefore, &Vec<f32> is a pointer to a pointer to [f32] — meaning that accessing the items in it requires the CPU to dereference two pointers, one after the other. (And one pointer must be dereferenced to even find its length.) Thus, it will often be slower to access, at least when the relevant data is not in cache.

Of course, this may still be a very small effect not worth thinking about, depending on what kind of code you're writing.
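To make the representation difference concrete, here is a small check of what each reference type actually carries; the exact byte counts assume a typical 64-bit target:

```rust
use std::mem::size_of;

fn main() {
    // &Vec<f32> is a thin pointer to the Vec's own (pointer, capacity, length) triple.
    println!("&Vec<f32>: {} bytes", size_of::<&Vec<f32>>());
    // &[f32] is a fat pointer: data pointer and length travel together.
    println!("&[f32]:    {} bytes", size_of::<&[f32]>());
    // The Vec itself holds pointer + capacity + length.
    println!("Vec<f32>:  {} bytes", size_of::<Vec<f32>>());
}
```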

However, there's a better reason to prefer to avoid &Vec<f32>, at least if you're writing a function: if your function takes a &Vec<f32> then the caller must supply a reference to an entire Vec<f32>, whereas if the function takes a slice reference, &[f32], the function can work with any sort of [f32] — e.g. a slice of part of a Vec<f32>, or an Arc<[f32]>, or a slice of an array [f32; 1024]. It's both more flexible/general and more efficient, so it should be preferred whenever possible.

Generally, you might end up with the type &Vec<f32> due to generic code (e.g. in iterators) but you shouldn't deliberately continue to use it. The Rust compiler's auto-dereferencing will automatically convert &Vec<f32> to &[f32] whenever it sees that one is needed for a function parameter or method receiver.
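As a sketch of that flexibility, a single slice-taking function can serve all of those storage types (the mean helper here is only an illustrative example, not anything from the thread's code):

```rust
use std::sync::Arc;

// One function signature, many backing storages.
fn mean(xs: &[f32]) -> f32 {
    if xs.is_empty() {
        0.0
    } else {
        xs.iter().sum::<f32>() / xs.len() as f32
    }
}

fn main() {
    let v: Vec<f32> = vec![1.0, 2.0, 3.0];
    let a: [f32; 3] = [4.0, 5.0, 6.0];
    let shared: Arc<[f32]> = Arc::from([7.0_f32, 8.0, 9.0]);

    println!("{}", mean(&v));      // &Vec<f32> coerces to &[f32]
    println!("{}", mean(&a));      // &[f32; 3] coerces too
    println!("{}", mean(&v[1..])); // a slice of part of a Vec
    println!("{}", mean(&shared)); // Arc<[f32]> derefs to [f32]
}
```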

28 Likes

Thank you so much.

1 Like

Yes, that is a good point about flexibility and why you should always prefer to work with &[f32]. But regarding the pointer dereferencing, I would be very surprised if the compiler did not optimize it away.

The compiler can optimize away unneeded references in inlined code (which is why it's often okay that double references arise in generic code) but it cannot optimize them away when they appear in a function signature and the function isn't inlined, since that would be a different ABI. (At least, that's my mental model; would be interested to hear corrections.)
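One way to check this on https://rust.godbolt.org/ is to compare two non-inlined versions of the same function (sum_vec and sum_slice are made-up names for this sketch); in the &Vec version the extra load is baked into the calling convention, so the optimizer cannot remove it:

```rust
// Compare the generated assembly of these with optimizations on.
// sum_vec receives a pointer to the Vec and must first load the data
// pointer through it; sum_slice receives pointer + length directly.
#[inline(never)]
pub fn sum_vec(v: &Vec<f32>) -> f32 {
    v.iter().sum()
}

#[inline(never)]
pub fn sum_slice(v: &[f32]) -> f32 {
    v.iter().sum()
}

fn main() {
    let data = vec![1.0_f32, 2.0, 3.0];
    // Deref coercion lets the same Vec feed both signatures.
    println!("{} {}", sum_vec(&data), sum_slice(&data));
}
```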

6 Likes

Oh ok, I can see this happening now; I did not know. I love trying things out here: https://rust.godbolt.org/

(and in fact, even in optimized builds, you still see the two dereferencing steps)

1 Like

Thank you so much. It is very clear.
I am writing some optimization algorithms (genetic algorithms and others) and I want the code to run as fast as possible. So I think it matters, because arrays are used intensively in these kinds of routines.
Thank you.

Since I tried with public functions, this is obvious, as you said. But in a whole program, I guess this will always be optimized. SaadD, usually you don't need to worry too much about the way you write code; one of Rust's goals is to optimize it for you, so readable code usually ends up being fast code too.

1 Like

Thank you so much. I have switched to Rust because it offers techniques and tools to write highly optimized code.
I really appreciate your assistance. Kind regards :grinning: :grinning:. :+1:

1 Like

I think the only time to use a reference to a Vec is when you need to change its size (inserting or deleting items); then you need an &mut Vec. Sometimes it may matter that an &Vec<T> is just one pointer, while an &[T] is a pointer and a length. But then, the &Vec<T> is a pointer to a pointer, a length, and an allocated capacity.
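A sketch of the one case where &mut Vec really is the right parameter type, because the function needs to change the length (dedup_push is a made-up helper):

```rust
// Only takes &mut Vec<i32> because it may grow the vector;
// a read-only version should take &[i32] instead.
fn dedup_push(v: &mut Vec<i32>, x: i32) {
    if v.last() != Some(&x) {
        v.push(x);
    }
}

fn main() {
    let mut v = vec![1];
    dedup_push(&mut v, 1); // duplicate of the last element: skipped
    dedup_push(&mut v, 2); // new value: pushed
    println!("{:?}", v);
}
```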

Also note that [T; N] is not a pointer, so if a function takes a [u64; 1024] by value, you will actually copy eight kilobytes of data when calling it. On the other hand, a [u8; 4] is just four bytes, while an &[u8] is 16 bytes on most modern desktop computers (an eight-byte pointer and an eight-byte length). An &[u8; 4] is eight bytes, just a pointer (since the length is known from the type). So for small arrays of small values, copying the full array can be more efficient than passing a reference/slice.
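These sizes can be verified directly with std::mem::size_of; the exact numbers below assume a 64-bit target:

```rust
use std::mem::size_of;

fn main() {
    // The array itself: four bytes of payload, no pointer.
    println!("[u8; 4]:     {} bytes", size_of::<[u8; 4]>());
    // Thin pointer: the length 4 is part of the type.
    println!("&[u8; 4]:    {} bytes", size_of::<&[u8; 4]>());
    // Fat pointer: data pointer plus runtime length.
    println!("&[u8]:       {} bytes", size_of::<&[u8]>());
    // Passed by value, all of this gets copied.
    println!("[u64; 1024]: {} bytes", size_of::<[u64; 1024]>());
}
```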

3 Likes

Thank you for the answer.
The code I work on uses fixed-size arrays (where the size is defined by the user), but the arrays' sizes stay unchanged until the end. Also, these arrays are read from and written to thousands of times. So slices, &[T], are preferred, no?

The compiler is (last I checked, at least) better at removing bounds checks from indexing in &[T] than in &Vec<T>.

That will probably get fixed at some point, but for now it's another good reason to do the thing you should anyway.
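For what it's worth, iterating instead of indexing sidesteps the bounds-check question entirely; a small sketch (scale is a hypothetical helper, not from the thread's code):

```rust
// Iterators walk the slice without a per-element bounds check,
// unlike xs[i] in a manual 0..xs.len() loop on a &Vec<f32>.
fn scale(xs: &mut [f32], factor: f32) {
    for x in xs.iter_mut() {
        *x *= factor;
    }
}

fn main() {
    let mut v = vec![1.0_f32, 2.0, 3.0];
    scale(&mut v, 2.0); // &mut Vec<f32> coerces to &mut [f32]
    println!("{:?}", v);
}
```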

2 Likes

I would also like to point out that technically, a Vec is not a "wrapper around a slice". This in turn implies that the claim about &Vec requiring double indirection is usually wrong in practice.

What a Vec actually is: a pointer to a heap-allocated buffer, along with its length and total capacity. Ultimately, it contains a raw pointer: if you follow the types inside Vec (e.g. RawVec, Unique/NonNull; I forget exactly which), there is technically no slice to be found anywhere in its implementation. However, the value formed from the (pointer, length) pair has the same representation as a slice reference.

However, unless you are performing an operation that changes the length/capacity of the vector, you will usually interact with methods that defer to a slice, because of the impl Deref<Target = [T]> for Vec<T>. Thus, the overwhelming majority of the time, if you are not inserting into or deleting from the vector, deref coercions kick in, and a &Vec<T> is converted directly into a &[T]. In other words, the slice is made up on the spot, out of thin air, or more precisely, out of the ripped-out guts of the Vec. It simply doesn't exist until the Deref impl is invoked.

At the assembly level, the pointer-and-length information of the vector will be read, and it will be used for forming the &[T] which, as explained above, bears the exact same representation. At no point will there be a &&[T] involved, and all further access of individual elements going through the Deref impl will work as if there really was a &[T], with its single level of indirection.
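A tiny demonstration that the coerced slice borrows the Vec's own buffer directly, rather than adding a level of indirection:

```rust
fn main() {
    let v: Vec<i32> = vec![10, 20, 30];
    // Deref coercion: &Vec<i32> -> &[i32]. The (pointer, length) pair is
    // read out of the Vec on the spot; no &&[i32] is ever created.
    let s: &[i32] = &v;
    // Both point at the same heap buffer.
    assert_eq!(s.as_ptr(), v.as_ptr());
    println!("{} elements at {:p}", s.len(), s.as_ptr());
}
```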

The point here is that in the usual scenario, when you pass a local variable (or a transitive borrow thereof) of type &Vec<T> to a function, either the (pointer, capacity, length) triple has to be read from a stack slot, in which case it would work identically even if you had an owned local variable, or it will be optimized (likely) and be read from registers anyway. Of course there are exceptions especially if you are doing things the optimizer can't see through (eg. trait objects), but this is the typical behavior.

Therefore, in the majority of cases pertaining to "normal" (non-weird, idiomatic) code, I would expect zero difference between taking a vector by-ref and taking a ref-to-slice directly.

3 Likes

That's true, but I wouldn't say it's fundamental to being a Vec, more an artifact of what library functionality was available when it was first written.

These days, if I were going to implement a Vec<T, A>, I'd do it with a field of type Box<[MaybeUninit<T>], A>. That way it would better emphasize that the purpose of the vector is to keep track of which parts of the allocated memory are initialized, and that allocating/freeing the memory is the box's job.

(The current implementation invented the internal RawVec type largely because the distinction is useful for writing sound code.)
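As a rough sketch of that design (MiniVec is invented here, not the std implementation; for brevity it never grows and never drops its elements, so treat it as suitable only for Copy types):

```rust
use std::mem::MaybeUninit;

// Minimal vector built on Box<[MaybeUninit<T>]>: the Box owns the
// allocation, and `len` tracks how many leading slots are initialized.
struct MiniVec<T> {
    buf: Box<[MaybeUninit<T>]>,
    len: usize,
}

impl<T> MiniVec<T> {
    fn with_capacity(cap: usize) -> Self {
        // Allocate `cap` uninitialized slots; freeing them is the Box's job.
        let buf: Box<[MaybeUninit<T>]> =
            (0..cap).map(|_| MaybeUninit::uninit()).collect();
        MiniVec { buf, len: 0 }
    }

    fn push(&mut self, value: T) {
        assert!(self.len < self.buf.len(), "capacity exceeded (this sketch never grows)");
        self.buf[self.len].write(value);
        self.len += 1;
    }

    fn as_slice(&self) -> &[T] {
        // SAFETY: exactly the first `len` slots were initialized by `push`.
        unsafe { std::slice::from_raw_parts(self.buf.as_ptr().cast(), self.len) }
    }
}

fn main() {
    let mut v = MiniVec::with_capacity(4);
    v.push(1);
    v.push(2);
    v.push(3);
    println!("{:?}", v.as_slice());
}
```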

5 Likes

Indeed, but then the argument carries over to the representation of Box, as the same points apply to &Box<[T]> as well.

Note that for multi-dimensional arrays there can be a more significant performance difference (over, say, a Vec of Vecs ...). Knowing the dimensions of the array often allows the compiler to calculate the address associated with a set of indices (e.g. array[i][j][k]) more efficiently, especially where dimensions other than the most major are powers of 2.
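A small illustration of the two layouts (get_fixed and get_nested are illustrative helpers); both return the same value, but the fixed-size version indexes one contiguous block:

```rust
const ROWS: usize = 4;
const COLS: usize = 8; // power of two: the index multiply becomes a shift

// Contiguous matrix: element (i, j) lives at base + (i * COLS + j).
fn get_fixed(m: &[[f32; COLS]; ROWS], i: usize, j: usize) -> f32 {
    m[i][j]
}

// Vec of Vecs: each row is a separate allocation, so m[i][j] must first
// load the row's pointer, then index into that row's buffer.
fn get_nested(m: &[Vec<f32>], i: usize, j: usize) -> f32 {
    m[i][j]
}

fn main() {
    let fixed = [[1.5_f32; COLS]; ROWS];
    let nested = vec![vec![1.5_f32; COLS]; ROWS];
    println!("{} {}", get_fixed(&fixed, 2, 3), get_nested(&nested, 2, 3));
}
```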

4 Likes

When I run the cargo clippy command, the following warning is shown:
writing &Vec<_> instead of &[_] involves one more reference and cannot be used with non-Vec-based slices.

Well, again, I'm not sure whether the "involves one more reference" part is universally (or even mostly) true. However, the "cannot be used with non-Vec-based slices" part is correct: if you have something other than a vector, e.g. a VecDeque, an array, another reference-to-slice that you got from a deref coercion, or a single-element slice from slice::from_ref, then you can't pass any of those to a function expecting &Vec except by allocating a whole new Vec and cloning all of its elements. That is of course much slower than just passing along a slice, but more importantly, it might not even be possible if the elements are not cloneable. Therefore, you should indeed avoid taking &Vec and accept slices instead.
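Two examples of such non-Vec-based slices that a slice-taking function accepts without any allocation (total is a made-up helper):

```rust
use std::collections::VecDeque;

fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let x = 5;
    // slice::from_ref: a one-element slice over a single value, no allocation.
    println!("{}", total(std::slice::from_ref(&x)));

    let mut dq: VecDeque<i32> = (1..=4).collect();
    // A VecDeque can hand out its contents as one contiguous slice.
    println!("{}", total(dq.make_contiguous()));
}
```

Neither of these could be passed to a function that insists on &Vec<i32>.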

3 Likes

Thanks so much.