What is being optimized with Vec here?

Please have a look at the two functions here on the Playground.
They only differ around lines 23 and 63:

// let len = if size > 4096 { 4096 } else { size };
let mut buf = vec![0u8; size];


let len = if size > 4096 { 4096 } else { size };
let mut buf = vec![0u8; len];

But the elapsed times in nanoseconds to run them are quite different:

limit_vec_1_bench: 490
limit_vec_2_bench: 2051449

What is being optimized, or what is happening, in limit_vec_1_bench?

I increased the benchmark iteration counts and limit_vec_1_bench's time didn't change. This, along with the time of just a few nanoseconds, is a sign that the function is being optimized into almost nothing at all. Benchmarks like yours run on constant input and throw away their output, which makes it possible for the compiler to realize it can skip running the loops entirely. In this case the compiler happened to detect this in case 1 and not in case 2, probably because without the if condition the innermost loop always runs exactly once, since the buffer is already right-sized.
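A minimal sketch of that failure mode (the `naive_bench` function here is a hypothetical stand-in for the Playground code, not the original): constant input, discarded output, so in release mode the loop body is eligible for complete removal and the reported time can stay tiny no matter the iteration count.

```rust
use std::time::Instant;

// Hypothetical stand-in for the Playground benchmark: constant input,
// result never observed.
fn naive_bench(iters: u32, size: usize) -> u128 {
    let start = Instant::now();
    for _ in 0..iters {
        // `_buf` is dropped immediately and never read, so the
        // optimizer is free to delete the allocation and the loop.
        let _buf = vec![0u8; size];
    }
    start.elapsed().as_nanos()
}

fn main() {
    // In release mode both lines can report near-identical, tiny times,
    // because the loop body may be optimized away entirely.
    println!("1k iters:   {} ns", naive_bench(1_000, 10_000));
    println!("100k iters: {} ns", naive_bench(100_000, 10_000));
}
```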

You can usually (not absolutely always!) fix this by using std::hint::black_box() on the input and the output, which tells the optimizer to not consider the data flow across that point. Specifically, do it to your input data, so the optimizer doesn't see the input being constant:

let readers: [&mut [u8]; 9] = [...];
let readers = std::hint::black_box(readers);

and do it to the output you are currently discarding in the loop, so the optimizer doesn't conclude it doesn't need to be computed:

let mut buf = vec![0u8; len];
// ...reading loop here...
std::hint::black_box(&buf);

With these changes the results seem to be more or less equal.
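Putting both together, a minimal sketch of the fixed measurement pattern (the `limit_vec` function here is a simplified stand-in for the Playground code, not the original):

```rust
use std::hint::black_box;
use std::time::Instant;

// Simplified stand-in for the benchmarked function.
fn limit_vec(size: usize) -> usize {
    let len = if size > 4096 { 4096 } else { size };
    let buf = vec![0u8; len];
    buf.len()
}

fn main() {
    // Hide the constant input from the optimizer...
    let size = black_box(10_000);
    let start = Instant::now();
    let mut sink = 0usize;
    for _ in 0..1_000 {
        sink += limit_vec(size);
    }
    // ...and keep the output observably "used".
    black_box(sink);
    println!("{} ns", start.elapsed().as_nanos());
}
```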

When writing benchmarks, it's wise to use a benchmark framework like criterion to handle many setup and measurement details that can otherwise throw off your benchmark. criterion automatically black_boxes the outputs and (where applicable) inputs of benchmarked functions, and also takes more care with the time measurement and does statistics to check how noisy the results are.
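For illustration, a minimal criterion sketch of the same benchmark (assumes `criterion` as a dev-dependency in Cargo.toml and this file placed under `benches/` with `harness = false`; `limit_vec` is again a stand-in, not the original code):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Stand-in for the function being measured.
fn limit_vec(size: usize) -> Vec<u8> {
    let len = if size > 4096 { 4096 } else { size };
    vec![0u8; len]
}

fn bench_limit_vec(c: &mut Criterion) {
    // criterion keeps the closure's return value alive for the optimizer;
    // black_box on the input hides the constant as well.
    c.bench_function("limit_vec", |b| {
        b.iter(|| limit_vec(black_box(10_000)))
    });
}

criterion_group!(benches, bench_limit_vec);
criterion_main!(benches);
```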


I originally used Rust's nightly #[bench] test feature and changed it to this main()-based Instant elapsed-time measurement for the Playground, because there is no bench runner available on the Playground.
At least with the nightly bench runner the results were comparable before.

The purpose of all this was to decide whether it makes sense to use stack slices in the cases where the data is very small, and a Vec only for bigger data. The latter occurs very rarely in the application. With this compiler optimization I had trouble measuring the point where Vec becomes faster.
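For context, a sketch of the kind of choice being weighed (assumed shape, not the actual Playground code): borrow a stack buffer when the data fits, allocate a Vec only for the rare large case.

```rust
// Assumed shape of the stack-vs-heap choice; 4096 is an arbitrary cutoff.
fn with_buffer(size: usize) -> usize {
    let mut stack_buf = [0u8; 4096];
    let mut heap_buf;
    let buf: &mut [u8] = if size <= 4096 {
        // Small data: borrow from the stack, no allocation.
        &mut stack_buf[..size]
    } else {
        // Rare large data: allocate on the heap.
        heap_buf = vec![0u8; size];
        &mut heap_buf
    };
    // ...read into `buf` here...
    buf.len()
}
```

Crates like smallvec package this pattern up if the manual branch gets unwieldy.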

Also, thanks for the hint about black_box(); I didn't remember it, although I've used it some time ago.

The Playground will also give you fairly noisy benchmarks, because it's running on a shared machine that's constantly running other people's code too.

It's worth setting up a local scratch Rust project — both for more stable results and so you can use more tools.

Yes, sure, but it might be difficult to link the local project code in the question :wink:

You can still post the code in a code block in your post. "You can run the code unedited on the Playground" is a good feature, but not always the top priority.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.