Rust has zero-cost abstractions. What does this mean in a practical sense?

Let's say I have a function which takes in 5 one-dimensional vectors, performs operations like square roots, multiplication, etc. on them by accessing each element, and then returns a vector. It calls no other functions.
There is another program which combines three elements (one from each of the vectors above) into a struct. This struct overloads the add operation and also has methods. This struct then goes into another struct. We pass a vector of these structs to a function whose body only consists of calling other methods. The dyn keyword is not used here, so nothing is determined at runtime. Things once calculated are not calculated again, but there is a lot of passing of f64 values to functions.
Will both programs be equally efficient in Rust?

This has a lot of spaghetti code, but as an example: GitHub - Prashant-Shekhar-Rao/special_relativity_ray_caster: This program allows you to view relativistic effects, such as Terrell rotation, on 3D models through ray tracing. It works on OBJ files and doesn't have any requirements except having the executable.
Everything important happens in camera. I know passing a three-dimensional vector is inefficient, but it only happens once, so I am not changing it. On a side note, I also know that, practically speaking, using multi-threading and other rendering techniques would benefit far more in this specific case, but I am asking this question more generally.

What are the things which are actually zero cost?

This did not solve the exact problem I asked the question for, but it is a pretty great answer to the title question.

1 Like

You can use rust.godbolt.org (add -O to flags!) or cargo asm to see what code is generated.

Rust's zero-cost abstractions are meant to mean that you can use higher-level abstractions and still get the same performance as if you wrote lower-level code optimized by hand.

In practice, it means you get a lot of function inlining :slight_smile: If your vectors are fixed-size arrays (not heap-allocated Vec), then the optimizer will be able to optimize them well and eliminate redundant operations. Vec is harder, since the size is variable at run time and allocations are side effects, so it's zero-cost only if you carefully structure your code.
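
For a rough sketch of what I mean (made-up function names, not from your code):

// With fixed-size arrays the length is known at compile time, so the
// optimizer can keep everything in registers and often auto-vectorize.
fn combine(a: [f64; 5], b: [f64; 5]) -> [f64; 5] {
    let mut out = [0.0; 5];
    for i in 0..5 {
        out[i] = (a[i] * b[i]).sqrt();
    }
    out
}

// With Vec the length is only known at run time and the result needs a
// heap allocation, so the same logic leaves the optimizer less to work with.
fn combine_vec(a: &[f64], b: &[f64]) -> Vec<f64> {
    a.iter().zip(b).map(|(x, y)| (x * y).sqrt()).collect()
}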

3 Likes

I can't read assembly (yet). My question is more about whether passing around values a lot significantly affects code performance.

I really don't like the term "zero cost abstraction". I think the C++ community made that up, but I'm not sure. There's nothing that is truly "zero cost"; it's just a bad phrase for saying "I know what the compiler actually did to my code!".

To me, the important thing is: do you understand the language features well enough to use them correctly?

4 Likes

For example, it's well known that when you use a lot of abstractions in other programming languages (especially when using the functional programming paradigm), it comes at the expense of performance.

Let's take the declarative vs. imperative way of transforming the elements of a collection in JavaScript:

// Declarative way:
const my_collection = [1, 2, 3, 4, 5];
const my_collection_doubled = my_collection.map(el => el * 2);

// Imperative way:
const my_collection = [1, 2, 3, 4, 5];
let my_collection_doubled = [];
for (const element of my_collection) {
  my_collection_doubled.push(element * 2);
}

This is a naive example, but it's well known that in JavaScript you need to write hand-written loops if iterating through a collection is causing performance issues.

An example of a "zero cost abstraction" in Rust in this case is that you can write your iterations in a declarative way, and the compiler will optimize the code as if you had written it imperatively.
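
The same example translated to Rust, just as a sketch; in a release build both versions should optimize down to roughly the same code:

fn main() {
    // Declarative way: iterator adapters.
    let my_collection = [1, 2, 3, 4, 5];
    let doubled: Vec<i32> = my_collection.iter().map(|el| el * 2).collect();

    // Imperative way: a hand-written loop.
    let mut doubled_loop = Vec::with_capacity(my_collection.len());
    for element in my_collection {
        doubled_loop.push(element * 2);
    }

    assert_eq!(doubled, doubled_loop);
}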

So you mean it's raw number crunching?

Aggregating data into a struct can be optimized away, but it depends on many things, so you'd better show a concrete example.

operator "overload" is purely syntactical sugar for trait function calls, and function calls can be inlined, there's good chance it's optimized away, but it's not guaranteed. you may add #[inline(always)], but I'd suggest to resist the urge of "micro-optimization". optimization should be based on measures.

This, however, might make a difference in terms of performance, but the real answer is, as always: it depends.

An array of structs is typically thought to be less performant than multiple parallel arrays, mainly due to cache efficiency, but it also depends on the algorithm you are using to process the data. Also, for a small data set, I don't think it makes any difference.
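
Roughly, the two layouts look like this (made-up types, just to show the difference):

// Array of structs (AoS): each element's fields sit next to each other.
struct Particle {
    position: [f64; 3],
    velocity: [f64; 3],
}
struct SceneAos {
    particles: Vec<Particle>,
}

// Struct of arrays (SoA): all positions together, all velocities together.
// A hot loop that only reads positions then pulls less unused data into cache.
struct SceneSoa {
    positions: Vec<[f64; 3]>,
    velocities: Vec<[f64; 3]>,
}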

What do you mean by "a lot of"? Primitive types are usually passed by value, but due to architectural ABI restrictions, if a function has too many parameters, it might overflow the available registers and certain parameters are passed on the stack. But it really doesn't matter in most cases, unless your measurements say otherwise. If you mean the data set is large, then it doesn't matter either: even if the call isn't inlined, the overhead of a function call is negligible compared to the actual computation work.

I don't really have concrete examples in mind; I was just thinking of cases where it might matter.

This is part of the code I sent.
Just to get a feel.

let d = dot(
    [
        v1[0] - point_on_ray[0],
        v1[1] - point_on_ray[1],
        v1[2] - point_on_ray[2],
    ],
    cross_product,
) / vec3_dot;
if smallest_distance_of_polygon_from_eye > d && d > 0f64 {
    let a = i / 3 * 2;
    let mut v1_texture = [textures[a], textures[a + 1]];
    let mut v2_texture = [textures[a + 2], textures[a + 3]];
    let mut v3_texture = [textures[a + 4], textures[a + 5]];
    let dd12 = V::vec3_square_len(V::vec3_sub(v1, v2));
    let dd31 = V::vec3_square_len(V::vec3_sub(v1, v3));
    let dd23 = V::vec3_square_len(V::vec3_sub(v2, v3));

I am taking elements from a vector and putting them into an array, then calling functions on them to get an f64.
If instead of calling functions like vec3_sub I had manually subtracted element by element, would it have made a difference?
Also, does taking elements from a vector and putting them into an array before passing it to a function make a difference?
This code is executed about 1024 * 1024 times.

If it were executed even more times, would it have made a real difference, for either using the sub function or converting to an array?

My main problem is I don't know how to learn which practices are efficient and which are not in Rust code.

Most probably it'll be the same; small functions like vec3_sub are high-priority candidates for inlining.
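
I don't know what your actual vec3_sub looks like, but assuming it's roughly this, a release build should generate the same code for the call as for writing the three subtractions out by hand at the call site:

// A tiny helper like this is a prime inlining candidate.
fn vec3_sub(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}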

Again, most likely they are the same. For function arguments, an array of 3 floating-point numbers is the same as 3 individual floating-point numbers.

In release builds, I doubt it makes any difference at all. In debug builds, it might incur some overhead, but that's to be expected.

If you're not sure, just use your intuition and prefer simpler code. Most of the time, simpler is faster, but there are always exceptions.

The whole point of the "zero cost abstraction" pitch is to prevent people from worrying about performance all the time. So please don't overthink it; use what you feel comfortable with. Unless the measured performance doesn't meet your goal, you should prioritize productivity first.

Just my opinion.

2 Likes

If the goals are not being met, is there some sort of checklist of low-hanging fruit for performance gains? Like turning the program multi-threaded so it uses all CPU cores (does multi-threading make your code use more CPU cores? Preferably the checklist could also answer questions like this), seeing if something can be computed at compile time instead of at runtime, and so on.

Not sure if I should make this a separate question or not.

There's no such checklist in general; performance tuning is mostly a black art.

However, if you are not too far away from your goal, for example less than 10 percent, you might try little tweaks here and there. I would first check memory allocations, and synchronization points like mutex contention.
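
As a contrived sketch of what I mean by checking allocations (made-up functions, not your code):

// Allocating a scratch buffer inside the hot loop...
fn total_length_naive(rays: &[[f64; 3]]) -> f64 {
    let mut total = 0.0;
    for ray in rays {
        // a fresh allocation on every iteration
        let squares: Vec<f64> = ray.iter().map(|c| c * c).collect();
        total += squares.iter().sum::<f64>().sqrt();
    }
    total
}

// ...versus reusing one buffer, keeping its capacity across iterations.
fn total_length_reuse(rays: &[[f64; 3]]) -> f64 {
    let mut total = 0.0;
    let mut squares: Vec<f64> = Vec::new();
    for ray in rays {
        squares.clear();
        squares.extend(ray.iter().map(|c| c * c));
        total += squares.iter().sum::<f64>().sqrt();
    }
    total
}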

But don't spend too much time on guesswork; always measure.

Not necessarily. It depends on what type of computation you are doing. If the algorithm is inherently serial, or if your program is IO-bound, there's nothing to gain from using multiple CPU cores.

In practice, most data processing can be done in parallel, at least partially, so if you are comfortable writing multi-threaded code, it is usually at least no worse than single-core performance.

But if you are not very experienced with concurrency and parallelism, you'll very likely make things worse. The bottom line is: don't think of multi-threading as an "easy" way to make your code faster; it's actually harder to make your program multi-threaded than to optimize your data structures and algorithms.
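
For example, with the rayon crate (assuming the per-pixel work is independent; this is just a sketch, not your actual rendering code):

use rayon::prelude::*;

// Switching a sequential range to a parallel one spreads the per-pixel
// work across all CPU cores; everything else stays the same.
fn render(width: usize, height: usize) -> Vec<f64> {
    (0..width * height)
        .into_par_iter()
        .map(|i| {
            let (x, y) = (i % width, i / width);
            // stand-in for the real per-pixel ray casting work
            ((x * y) as f64).sqrt()
        })
        .collect()
}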

1 Like

Welp, I guess the best way would just be to look at the source code and explanations of programs which do what I am doing but more efficiently, and learn from there, mainly looking at the algorithm and data structure side of things.

It means that you're generally not penalized for using an abstraction. It might be easier to illustrate with some examples.

In Java, if you make a class MyId that wraps an int, it now needs to be allocated, thus MyId[] is not a zero-cost abstraction, since int[] doesn't need the allocation per element. In Rust, a #[derive(Copy, Clone)] struct MyId(u32); is a zero-cost abstraction because &[u32] and &[MyId] (or Vec<u32> and Vec<MyId>, or ...) have the same performance characteristics.
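
A quick sketch of that (nothing beyond what's described above):

#[derive(Copy, Clone)]
struct MyId(u32);

fn main() {
    // MyId is just a u32 at run time: same size, no indirection.
    assert_eq!(std::mem::size_of::<MyId>(), std::mem::size_of::<u32>());
    // A Vec<MyId> therefore has the same layout and performance
    // characteristics as a Vec<u32>, with no per-element allocation.
    let ids: Vec<MyId> = (0..10u32).map(MyId).collect();
    assert_eq!(ids.len(), 10);
}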

In C#, if you make a function that takes an IEnumerable<string> and pass in a string[], that's usually slower than if you'd written the function to take a string[] specifically. Whereas in Rust if you write a function that takes impl Iterator<Item = &str> and pass in an iterator you got from a &[&str], it's just as fast (at runtime) as if you'd written the function to take the reference-to-slice specifically.
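
A sketch of the Rust side (made-up function name): the generic function is monomorphized for the concrete iterator type, so there is no virtual dispatch at run time.

fn total_len<'a>(words: impl Iterator<Item = &'a str>) -> usize {
    words.map(|w| w.len()).sum()
}

fn main() {
    let words: &[&str] = &["zero", "cost"];
    // After inlining, this is as fast as a function written
    // against &[&str] directly.
    assert_eq!(total_len(words.iter().copied()), 8);
}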

I like how "Why I Think Rust Is The Way Forward" by Rico Mariani (on Medium) put it:

I don’t have to avoid patterns in the book. I use the book. The book works.

TL;DR: It means that high-performance code looks like normal code, not some verbose be-extra-careful dialect that can't use typical features.

3 Likes

It does not seem such an unworthy phrase to me. The way I see it, everything in a high-level language is an abstraction over something; that is the whole point of an HLL. For example:

z = x + y 

Is an abstraction of:

LOAD R0, [x]
LOAD R1, [y]
ADD R0, R1
STORE [z], R0 

In some hypothetical assembler language.

Then we have abstractions over conditions, loops, function calls, structures, arrays, etc., etc., etc.

The notion of "zero cost" is simply that one should be able to use an abstraction like z = x + y and have it generate code that is as efficient as writing the assembler by hand.

How well compilers meet that goal is another matter...

The C++ guys extend the idea of zero-cost abstraction to the higher levels of abstraction that C++ offers: one should be able to use smart pointers, iterators or whatever with no penalty over achieving the same results with raw pointers, for loops and so on. Sounds fair enough to me.

Again, how well compilers meet that goal is up for debate and measurement. It's a goal worth aiming for.

I don't understand that. One typically has very little idea what compilers do to whatever you think the code ought to be, especially after their optimisers have been at it. That is true even before we start using the higher-level abstractions to which the term "zero cost" is typically applied.

As noted by someone above, "zero cost abstraction" can hardly apply to languages like JavaScript or Python, where everything has a huge cost from the get-go and later abstractions added to the language just make it worse.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.