My program runs slow on windows

lemon37564 · June 11, 2021, 12:31pm

I'm facing a performance issue on my laptop.
my othello(aka reversi) AI program takes around 2.6 seconds to solve a move on kde neon, while it takes 8.8 seconds on windows (both program runs on the same laptop).
I'm pretty sure that it wasn't the antivirus program or windows defender interfering my program, because my program start only once and have a loop input, I think it will be faster on the second input if it was the antivirus scanning the program, but it didn't.
I've tried both cargo build --release and rustc -C opt-level=3, but it performed as a same result.
This is my program https://github.com/lemon37564/othello-board/blob/main/builtinai/ai6.rs if needed.
The input will be "++++++++++++++OX++++XO++++++++++++++ 1"
I have to submit the executable since this is my project in university, can anyone help me?

wesleywiser · June 11, 2021, 2:02pm

It looks like your program spends a lot of time allocating memory (the call to ntdll.dll!RtlpAllocateHeapInternal near the top left of the flamegraph):

The system allocators on Windows and Linux have different performance characteristics and since your program runtime is dominated by memory allocation, it's likely the reason for the performance difference as well.

On a hunch, I guessed the issue was the call to Vec::with_capacity(32) on line 91 which is called from sorted_valid_nodes() on line 401. After changing this to simply Vec::new(), the program ran noticeably faster.

lemon37564 · June 11, 2021, 2:16pm

Thanks for replying!
I've just tried the Vec::new() and the speed has a notably difference!
But I'm little bit curious about the reason. Theoretically, I think it should be slower if my vector is appended dynamically than allocated the space at first.
Why would this happened?
BTW, may I ask what's the profiling tool in the picture? I liked the pprof in golang a lot, but I didn't found a similar tool on rust (I am new to rust).

wesleywiser · June 11, 2021, 2:28pm

You're welcome!

I don't have a good grasp of your program so these are just general guesses that may not be valid for this particular situation:

If nothing is ever actually pushed to the Vec, Vec::new() does not actually allocate any memory and so it will be much faster than than Vec::with_capacity().
Many allocators have optimizations for small sized allocations. 32 * sizeof<Node>() might be large enough that it by passes that fast path. If you only ever end up pushing a few items to the Vec in most instances, this could make a difference.
Depending on the distribution of data, it could be that you never need any more than the initial allocation Vec::new() gives you once an item is pushed into the Vec. In that case, there are no reallocations going on which is a situation where Vec::with_capacity() makes a lot of sense.

The tool is Windows Performance Analyzer. It's part of the Windows SDK and is free to use. It works pretty well with Rust code but you do need to make sure you always tell it to Load Symbols which will resolve function symbols from the PDBs (under the Trace menu) since Windows does not know how to demangle Rust symbols.

lemon37564 · June 11, 2021, 2:51pm

It seems to be the reason in my opinion. I didn't thought of the condition that the content of vector will be a small size or even empty. I should modify the code to avoid problems like this.

Cool. I'll give it a try. Thanks a lot.

Hyeonu · June 11, 2021, 2:56pm

If the allocator dominated the performance, why don't you use faster allocator instead? You can replace the global allocator with the mimalloc. But never optimize without measuring, as there's no silver bullet which is guaranteed to make everything actually faster.

lemon37564 · June 11, 2021, 3:22pm

Interesting, This is the first time I investigate the allocator that deep.
I think the details in allocator are too hard to me for now, but I'll try to simply use it and compare the performance.
But I think what I need to do now is to improve my awful algorithm, it should impact the performance the most.(lol)

mejrs · June 11, 2021, 6:58pm

If you have a lot of small vecs, consider using a stack-based vec variant. There are crates that provide this. That way you can avoid heap allocations.

lemon37564 · June 12, 2021, 3:49am

I've search stack-based vec for a while, but it seems that they all need a constant capacity when declared.(It makes sense because basically it is an array?)
So I just tried to use a larger capacity when initialized, and it is 5-10% faster than the original one.
(might be a measurement error, idk)
But this is also very helpful, Thanks!

Gilnaa · June 12, 2021, 4:55am

smallvec, for example, will store up to a certain amount of items on the stack before spilling to the heap.

lemon37564 · June 12, 2021, 6:36am

Well, I'm using arrayvec right now but I also tested for smallvec. After some experimentation, I cannot found significant speed difference between these two.
I don't know how they implemented and which one will be better.(maybe they have a similar approach?)
For me this is a kind of dilemma.

bjorn3 · June 12, 2021, 9:28pm

Arrayvec has an upper limit on the array size and will never use heap allocations. Smallvec uses the stack up to the specified limit and then switches to a heap allocation once the limit is exceeded. Use arrayvec when there is a hard limit on the size and use smallvec when the size is unbounded but below a small amount most of the time.

lemon37564 · June 13, 2021, 7:52am

That's a clear explanation, thanks!
Now I think smallvec will fit better in my case, I'll consider change to use that later.

system · September 11, 2021, 7:53am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
[resolved] Hello_world compiles very slowly on Windows 7 help	36	1698	January 12, 2023
Speed issue with program on Windows help	6	1119	June 11, 2021
Performance of OrbTK help	2	629	October 2, 2020
Why this simple piece of code is slower than python? What am I missing? help	13	540	February 23, 2023
Beginner Q: Why so slow? help	6	560	March 19, 2021

My program runs slow on windows

Related Topics