Normally when using spawn functions that run something concurrently, the variables borrowed by the closure must live for the 'static lifetime to avoid issues with the spawned thread living longer than the reference.
However, the rayon::scope function that I called in Tinv guarantees that all threads spawned inside the scope (using the provided scope variable) will have finished by the time the call to rayon::scope returns. Thus it is okay for those threads to borrow things from outside the scope, because e.g. gMin cannot be dropped while rayon::scope is still running.
In the type &Scope<'scope> we are saying that we are dealing with a scope, and the lifetime 'scope represents the lifetime of everything inside the surrounding rayon::scope call. Thus if we have a reference with the lifetime 'scope, we require that this reference is valid for the entirety of the scope. Threads spawned by scope.spawn will finish before the rayon::scope call returns, so such a thread is included in this lifetime, and gMin is guaranteed to be valid for the duration of that thread.
So by adding 'scope to Tqueue we're basically saying “I want to be able to use the reference gMin inside threads spawned from the scope”, and the compiler guarantees you can only call Tqueue if gMin actually is valid for that long.
Thanks for that great explanation, Alice. It might have taken me a long while to glean all that from the docs. I must say that all this "scope: &Scope<'scope>" business is a bit mind-numbing.
The upshot is that I now have your suggestion in place and it all builds without complaint. It does not run, though. It seems to start threads and run work on them, then crashes out with a billion "overflow on subtract" errors.
I'm not at all clear about that atomic exchange thing. In the C code I found that this worked fine:
Basically the atomic swap is trying to perform this operation:
gMin = min(gMin, xp.s);
but atomically. The while loop works this way:
The smin variable contains the last seen value of gMin (but some other thread may have changed it since then).
We know that no other thread will ever increase the value of gMin, so if smin <= xp.s, we can stop as we then have gMin <= smin <= xp.s, thus our operation would leave gMin unchanged.
Swapping unconditionally like you do in your new Rust code would not work, because what if we have this situation: gMin < xp.s < smin ? This can happen if some other thread decreased gMin since we last saw it, and if we replace the value at gMin with xp.s, we performed our operation incorrectly (we increased gMin).
However if we are in this situation: xp.s < smin = gMin, then replacing gMin with xp.s is fine, because that definitely decreases it. Luckily the atomic compare_and_swap method does exactly this:
fn compare_and_swap(&self, current: u32, new: u32, order: Ordering) -> u32 {
    // this, but atomically
    if *self == current {
        *self = new;
        return current;
    } else {
        return *self;
    }
}
The point is that self is only changed if it is equal to what we thought it was. So in this case we say: replace it with xp.s, but only if it's currently equal to smin, which is the last value we saw. If gMin is different from smin, we just update smin, reevaluate if we still need to update it, and try again if so.
This does not deadlock, because if one thread fails to update, it means some other thread succeeded.
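The loop described above can be sketched as a standalone function. Note that compare_and_swap has since been deprecated in favor of compare_exchange/compare_exchange_weak, which this sketch uses; the names g_min and s stand in for gMin and xp.s and are illustrative, not from the actual program:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Atomically perform g_min = min(g_min, s), as described above.
fn atomic_min(g_min: &AtomicU32, s: u32) {
    // smin: the last value of g_min we observed.
    let mut smin = g_min.load(Ordering::SeqCst);
    // If smin <= s, g_min can only have decreased since we read it,
    // so min(g_min, s) == g_min and there is nothing to do.
    while s < smin {
        // Replace g_min with s, but only if it still equals smin.
        match g_min.compare_exchange_weak(smin, s, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(_) => break,                 // we won the race
            Err(current) => smin = current, // g_min changed; re-check and retry
        }
    }
}

fn main() {
    let g = AtomicU32::new(100);
    atomic_min(&g, 42);
    assert_eq!(g.load(Ordering::SeqCst), 42);
    atomic_min(&g, 70); // a larger value leaves g unchanged
    assert_eq!(g.load(Ordering::SeqCst), 42);
    println!("ok");
}
```

On current Rust the whole loop can also be replaced by the single call g_min.fetch_min(s, Ordering::SeqCst), which does the same thing.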
That said, your panic is probably from something else, so hopefully Rust can tell you which line underflowed, and maybe you can find your mistake using that.
Rust is trying to tell me something with a few hundred lines of messages like this:
thread 'thread 'thread 'thread 'thread '<unnamed><unnamed><unnamed><unnamed>thread '<unnamed>' panicked at 'thread '' panicked at '' panicked at '' panicked at '<unnamed>' panicked at 'attempt to subtract with overflow<unnamed>attempt to subtract with overflowattempt to subtract with overflowattempt to subtract with overflow' panicked at 'attempt to subtract with overflow', ' panicked at '', ', ', attempt to subtract with overflow', src/queue.rsattempt to subtract with overflowsrc/queue.rssrc/queue.rssrc/queue.rs', src/queue.rs:', :::src/queue.rs:46src/queue.rs464646:46:::::46:2146212121:21
:
21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
21
thread 'thread 'thread 'thread 'thread '<unnamed>thread 'thread '<unnamed><unnamed><unnamed><unnamed>' panicked at '<unnamed><unnamed>' panicked at '' panicked at '' panicked at '' panicked at 'attempt to subtract with overflow' panicked at '' panicked at 'attempt to subtract with overflowattempt to subtract with overflowattempt to subtract with overflowattempt to subtract with overflow', attempt to subtract with overflowattempt to subtract with overflow', ', ', ', src/queue.rs', ', src/queue.rssrc/queue.rssrc/queue.rssrc/queue.rs:src/queue.rssrc/queue.rs::::46::46464646:4646::::21::21212121
2121
I gather it is complaining about the subtraction in queue.rs line 46. That is in a harmless helper function, which is good; it's the same as in the non-threaded version.
Enabling RUST_BACKTRACE produces 20 thousand lines of output. This could take some time....
Great tip. Unfortunately it removes useful information.
In the full trace I can make out the following in the noise, repeated many times:
14: tatami_rust::queue::tfree
at src/queue.rs:46
15: tatami_rust::queue::T
at src/queue.rs:79
16: tatami_rust::queue::Twork
at src/queue.rs:111
17: tatami_rust::queue::Twork
at src/queue.rs:134
18: tatami_rust::queue::Twork
at src/queue.rs:122
19: tatami_rust::queue::Twork
at src/queue.rs:134
20: tatami_rust::queue::Twork
at src/queue.rs:122
21: tatami_rust::queue::Twork
at src/queue.rs:122
22: tatami_rust::queue::Tqueue::{{closure}}
at src/queue.rs:162
Using abort, the only line from my code I see is:
14: 0x7fc2a01f83e6 - tatami_rust::queue::tfree::he9ddb25726b4164c
at src/queue.rs:46
Either way it does not help much. It just shows that the threads start, run through the expected chain of calls, and then fail on a subtraction in a little leaf function.
Clearly I have something wrong in that chain, just got to find it...
You are computing (k - 1) * (n + 1), which panics when k = 0. You may need to special-case that situation, for instance doing
k.checked_sub(1)
    .map(|it| it * (n + 1))
    .unwrap_or(value_when_k_is_0)
or just k.saturating_sub(1) * (n + 1) if value_when_k_is_0 == 0
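A small runnable sketch of the difference, assuming value_when_k_is_0 == 0 (the names n and k are illustrative):

```rust
fn main() {
    let n: u32 = 5;

    for k in [0_u32, 3] {
        // (k - 1) * (n + 1) would panic in debug builds when k = 0.

        // checked_sub returns None instead of panicking:
        let safe = k
            .checked_sub(1)
            .map(|it| it * (n + 1))
            .unwrap_or(0); // value_when_k_is_0 assumed to be 0 here

        // Equivalent when the k = 0 result should be 0:
        let saturating = k.saturating_sub(1) * (n + 1);

        assert_eq!(safe, saturating);
        println!("k = {k}: {safe}");
    }
}
```

Note that in release builds the plain subtraction would silently wrap around instead of panicking, which is arguably worse.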
EDIT: I just noticed there is a division by k just before, so never mind this post. Still, it seems that, for different reasons, the case k = 0 ought to be handled.
Hell will freeze over before I want to replace a simple arithmetic expression with all that indecipherable map and lambda functional programming stuff.
I'm not sure if it is by accident or design, but the case k = 0 never happens when things are working right. It might as well just panic and make it known that something is wrong.
It's reported that this solution now runs on a Pi 4 in almost exactly the same time as the winning C solution in the performance chart here: Tatami Pi Fun. - Page 10 - Raspberry Pi Forums
pi@debian-buster-64:~/tatami-rust $ time target/release/tatami_rust 200
T(85765680)=200
real 0m0.549s
user 0m2.137s
sys 0m0.004s
Scaling up the original Project Euler problem #256 (Tatami-Free Rooms - Project Euler) to solve T(S) = 1000 requires moving to 64-bit primes. The Rust solution using a rayon thread pool handily beats the current champion written in C and using pthreads. On my x86_64 PC:
pqplum64 (C + pthreads)
T(63405342000)=1000
real 1m49.223s
user 12m14.781s
sys 0m0.219s
tatami_rust_threaded64 (Rust + rayon)
Using 64 bit integers.
PNUM = 40000
FNUM = 20
SMAX = 100000000000
Pr(40000)=479909
Running Rust translation of queue.c...
Using 8 cores.
T(63405342000)=1000
real 1m36.383s
user 11m53.078s
sys 0m0.281s
Which again is about 3.5 times faster than the fastest single-threaded C code we have.
Once again, thanks for the input everybody.
However, the situation on ARM is dire: there the threaded Rust solution is slower than the single-threaded one!