I tried Steve's version and that runs in about .23 seconds on my machine, vs. about .15 seconds for C.
I would also write this this way, though with more newlines
I get a much higher variance with the C version:
steve@warmachine:~/tmp$ time ./hello
31626
real 0m0.124s
user 0m0.124s
sys 0m0.000s
steve@warmachine:~/tmp$ time ./hello
31626
real 0m0.153s
user 0m0.148s
sys 0m0.004s
steve@warmachine:~/tmp$ time ./hello
31626
real 0m0.124s
user 0m0.120s
sys 0m0.000s
steve@warmachine:~/tmp$ time ./hello
31626
real 0m0.195s
user 0m0.192s
sys 0m0.000s
steve@warmachine:~/tmp$ time ./hello
31626
real 0m0.181s
user 0m0.176s
sys 0m0.000s
steve@warmachine:~/tmp$ time ./hello
31626
real 0m0.169s
user 0m0.168s
sys 0m0.000s
Same compiler flags. So, sometimes faster, sometimes slower, whereas the Rust is very consistent. I have no idea why this is.
Actually, sorry, that was gcc
. with clang -O3
:
steve@warmachine:~/tmp$ time ./a.out
31626
real 0m0.115s
user 0m0.112s
sys 0m0.000s
steve@warmachine:~/tmp$ time ./a.out
31626
real 0m0.119s
user 0m0.116s
sys 0m0.000s
steve@warmachine:~/tmp$ time ./a.out
31626
real 0m0.154s
user 0m0.152s
sys 0m0.000s
steve@warmachine:~/tmp$ time ./a.out
31626
real 0m0.168s
user 0m0.164s
sys 0m0.000s
Same deal.
By the way, I also edited the above message to use proper formatting too. It's markdown: use either four spaces, or triple backticks to show code.
Strange. I am not seeing that variance:
dca@igor:~/Euler/rust$ cd ../c
dca@igor:~/Euler/c$ ls
problem_20 problem_20.c problem_20_in_c
dca@igor:~/Euler/c$ time ./problem_20
31626
real 0m0.160s
user 0m0.157s
sys 0m0.000s
dca@igor:~/Euler/c$ time ./problem_20
31626
real 0m0.165s
user 0m0.163s
sys 0m0.000s
dca@igor:~/Euler/c$ time ./problem_20
31626
real 0m0.155s
user 0m0.153s
sys 0m0.000s
dca@igor:~/Euler/c$ time ./problem_20
31626
real 0m0.155s
user 0m0.157s
sys 0m0.000s
dca@igor:~/Euler/c$ time ./problem_20
31626
real 0m0.166s
user 0m0.163s
sys 0m0.000s
dca@igor:~/Euler/c$ time ./problem_20
31626
real 0m0.156s
user 0m0.153s
sys 0m0.000s
dca@igor:~/Euler/c$
Please, format your code and output... It's a lot easier to read for us.
Enclose your code in triple back ticks like this:
```
// Code goes here
```
For me there is almost no difference between C and Rust (with iterators) execution time:
C: 0.21s user 0.00s system 99% cpu 0.209 total
Rust: 0.20s user 0.01s system 98% cpu 0.212 total
C is a little faster between 0.205
and 0.209
than Rust between 0.210
and 0.216
. Doesn't seem really significant to me.
Ah yeah, I like to write if/else expressions on one line when they fit and it's for the purpose of returning a value depending on a condition. It's like the ternary operator but more explicit
Thanks for the formatting tip; I hadn't taken the time to figure out how to use this site and saw that the code was pretty ugly..
Even though the code I submitted earlier gets the right answer in all versions, there's some confusion, vestiges of changes in how I wrote it as I was solving the problem. For example, the test for < 10000 in the amicable function isn't necessary; it's never going to see an n>=10000 because of the how the main function is written.
Here are improved versions.
C:
#include <stdio.h>
#include <stdlib.h>
int d(int n) {
int n_half=n/2, m, result=1;
for (m=2; m<=n_half; m++)
if ((n % m) == 0)
result += m;
return result;
}
int amicable(int n) {
int dn=d(n), ddn=d(dn);
if ((ddn == n) && (n != dn))
return n;
else
return 0;
}
int main () {
int i, result=0;
for (i=1; i<10000; i++)
result += amicable(i);
printf("%d\n", result);
exit(0);
}
Rust:
fn amicable(n:i32) -> i32 {
let dn=d(n);
let ddn=d(dn);
if (ddn==n) && (n != dn) {
n
} else {
0
}
}
fn d(n: i32) -> i32 {
(2..(n / 2 + 1)).
filter(|m| n % m == 0).
fold(0, |result, i| result + i)
}
fn main () {
let mut result:i32=0;
for i in 1..10000 {
result = result + amicable(i);
}
println!("{}", result);
}
And since a couple of you have gotten Haskell-ey, here's a Haskell version:
d :: Int -> Int
d n = foldl (\acc m -> if (mod n m)==0; then acc+m; else acc) 1 [2..(div n 2)]
amicable :: Int -> Int
amicable n = if ((d (d n)) == n) && (n /= (d n)); then n ; else 0
main = do print (foldl (\acc n -> acc + (amicable n)) 0 [1 .. 9999])
On my machine, the C version runs in about .16 seconds. The Rust version runs in about .23 seconds, or a little under 44% slower than the C version. The Haskell version runs in about .84 seconds, or 425% slower than the C version.
As I said before, the machine is a Thinkpad X-250, 4 GB memory, 7200 rpm disk, Arch Linux, up-to-date.
In your C code you're using ints, while in the Rust version you're using i32, that is a 32 bit signed integer. Are those two types equal on your system? Above I've used int32_t in the C code to be sure it's a more correct comparison.
Yes, they're the same.
(A tip I didn't know until fairly recently is perf stat -r 10 ./program
, which will run a program repeatedly and summarize the time it takes.)
Nice! Okay, so here's a full run from me:
steve@warmachine:~/tmp$ cat hello.rs
fn d(n: i32) -> i32 {
(2..(n / 2 + 1)).
filter(|m| n % m == 0).
fold(0, |result, i| result + i)
}
fn amicable(n: i32) -> i32 {
let dn = d(n);
let ddn = d(dn);
if ddn == n && n != dn && n < 100_000 {
n
} else {
0
}
}
fn main() {
let mut result = 0;
for i in 1 .. 10_000 {
result += amicable(i);
}
println!("{}", result);
}
steve@warmachine:~/tmp$ rustc -O hello.rs
steve@warmachine:~/tmp$ perf stat -r 10 ./hello
41274
41274
41274
41274
41274
41274
41274
41274
41274
41274
Performance counter stats for './hello' (10 runs):
134.867537 task-clock (msec) # 0.998 CPUs utilized ( +- 0.37% )
2 context-switches # 0.016 K/sec ( +- 6.06% )
1 cpu-migrations # 0.004 K/sec ( +- 53.75% )
106 page-faults # 0.786 K/sec ( +- 0.24% )
435,105,586 cycles # 3.226 GHz ( +- 0.03% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
536,636,455 instructions # 1.23 insns per cycle ( +- 0.00% )
164,832,900 branches # 1222.184 M/sec ( +- 0.00% )
154,391 branch-misses # 0.09% of all branches ( +- 0.14% )
0.135123580 seconds time elapsed ( +- 0.38% )
steve@warmachine:~/tmp$ cat hello.c
#include <stdio.h>
#include <stdlib.h>
int d(int n) {
int n_half=n/2, m, result=1;
for (m=2; m<=n_half; m++)
if ((n % m) == 0)
result += m;
return result;
}
int amicable(int n) {
int dn=d(n), ddn=d(dn), result=0;
if ((ddn == n) && (n != dn))
if (n < 10000)
result += n;
return result;
}
int main () {
int i, result=0;
for (i=1; i<10000; i++)
result += amicable(i);
printf("%d\n", result);
exit(0);
}
steve@warmachine:~/tmp$ clang -O3 hello.c
steve@warmachine:~/tmp$ perf stat -r 10 ./a.out
31626
31626
31626
31626
31626
31626
31626
31626
31626
31626
Performance counter stats for './a.out' (10 runs):
115.504819 task-clock (msec) # 0.998 CPUs utilized ( +- 3.26% )
2 context-switches # 0.021 K/sec ( +- 16.67% )
1 cpu-migrations # 0.004 K/sec ( +- 53.75% )
46 page-faults # 0.398 K/sec ( +- 1.30% )
361,705,040 cycles # 3.132 GHz ( +- 0.02% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
340,614,308 instructions # 0.94 insns per cycle ( +- 0.00% )
10,571,430 branches # 91.524 M/sec ( +- 0.01% )
44,740 branch-misses # 0.42% of all branches ( +- 0.48% )
0.115752287 seconds time elapsed ( +- 3.27% )
steve@warmachine:~/tmp$ uname -a
Linux warmachine 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux
Neat!
@steveklabnik and @donallen just for the sake of correctness, in your last Rust implementations you get the wrong final answer printed because you start the fold accumulator at zero instead of one.
fold(0, |result, i| result + i)
gives 41274 as final result
fold(1, |result, i| result + i)
gives the original, I assume correct, value of 31626
Whoops! At least that shouldn't affect the speed
31626 is correct, confirmed by my submission to the Euler Project website last night.
Rust 1.8 nightly, Clang 3.7.1, Intel core i5:
C:
Performance counter stats for './c' (10 runs):
157.773517 task-clock (msec) # 0.983 CPUs utilized
18 context-switches # 0.113 K/sec
0 cpu-migrations # 0.000 K/sec
44 page-faults # 0.275 K/sec
460,557,670 cycles # 2.879 GHz (83.14%)
299,530,716 stalled-cycles-frontend # 64.80% frontend cycles idle (83.14%)
68,134,185 stalled-cycles-backend # 14.74% backend cycles idle (66.32%)
348,827,544 instructions # 0.75 insns per cycle
# 0.86 stalled cycles per insn (83.45%)
3,424,487 branches # 21.404 M/sec (83.79%)
53,505 branch-misses # 1.61% of all branches (83.66%)
0.160472723 seconds time elapsed ( +- 0.40% )
Rust:
Performance counter stats for 'target/release/rust' (10 runs):
161.201039 task-clock (msec) # 0.975 CPUs utilized
18 context-switches # 0.109 K/sec
0 cpu-migrations # 0.000 K/sec
81 page-faults # 0.492 K/sec
469,508,148 cycles # 2.850 GHz (82.89%)
254,773,475 stalled-cycles-frontend # 53.88% frontend cycles idle (83.28%)
78,178,590 stalled-cycles-backend # 16.53% backend cycles idle (67.17%)
455,225,406 instructions # 0.96 insns per cycle
# 0.57 stalled cycles per insn (83.63%)
123,972,610 branches # 752.610 M/sec (83.63%)
245,925 branch-misses # 0.20% of all branches (83.12%)
0.165250902 seconds time elapsed ( +- 0.62% )
The runtime is within 5 milliseconds. What I find Interesting is the number of branches in C vs Rust...
Hm, this is a pretty long thread about performance to have not spoken a word about cpu scaling or cpu affinity. I guess it's short running enough that affinity doesn't matter so much (as evidenced by the 0 cpu-migrations
), but please check that /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
is set to performance
(on Linux) if you want more consistent results. If you want cpu affinity, take a look at likwid
.
I've just discovered how immature LLVM/Clang was on ARM.
In the benchmark results posted above there doesn't seem to be much difference between C
and Rust
on x86_64
, however on my ARM
system it's like:
gcc's 1.5s
to rust's (or clang's 3.6) 3.4s
meaning the newer llvm 3.7
bundled with rust makes no difference.
A source of branches is the checking Rust does for the %
operator (just like for division). It often optimizes out, but here it looks like it doesn't, yet the impact of those branches is pretty minimal. I'd switch to unsigned integers to remove some of the panic cases that are checked for in the code.
Thanks, that's definitely true. Rewriting d
to use a for
loop reduces the number of branches from 125M to 80M, so there is still some optimization potential for the iterator based version. However, the branches are also well-predicted and therefore I see only a 1-2ms (<1%) difference.
An update on the ARM issue - it seems it was just a little regression in LLVM (quite mature otherwise) leading to the much slower modsi3
being chosen over divmod
on processors without hardware division support.
Soon to be fixed in 3.8
. (thanks ARM guys)
EDIT:
The patch is here which brings performance in line with gcc's (1.5s
vs 1.6s
).
A Happy End!
I was curious about the difference cited with switching to iterator adapters, so I looked at the LLVM IR for these versions of d
:
A (fast):
fn d(n: i32) -> i32 {
let mut result: i32 = 1;
for m in (2..(n / 2 + 1)).filter(|m| n % m == 0) {
if n % m == 0 {
result = result + m;
}
}
result
}
B (slow):
fn d(n: i32) -> i32 {
let mut result: i32 = 1;
for m in (2..(n / 2 + 1)) {
if n % m == 0 {
result = result + m;
}
}
result
}
The first one has the odd duplication in an attempt to isolate the difference, as on my machine it had the same performance as the version with fold
.
On my machine, A runs around 0.54s while B runs around 0.73s.
Here are graphs of the optimized IR:
http://dl.dropbox.com/u/1237941/esap-a.pdf
http://dl.dropbox.com/u/1237941/esap-b.pdf
As you can see, the slower one (B) is much simpler, while the faster one (A) has an additional panic check (in A, panic_loc3976
is "attempted remainder with a divisor of zero", and panic_loc3979
is "attempted remainder with overflow"; B has somehow gotten rid of the overflow one). The faster one also lacks a nsw
(no signed wrap) on the add of 1, which is supposed to allow optimization, while the slower one keeps it. Makes total sense, right?
Assembly:
http://dl.dropbox.com/u/1237941/esap-a.s
http://dl.dropbox.com/u/1237941/esap-b.s
Edit: One odd thing in the assembly is some redundant code in the slow version:
+0x42 cmpl %r10d, %edi
+0x45 setl %al
+0x48 movzbl %al, %eax
+0x4b addl %edi, %eax
+0x4d cmpl %r10d, %edi
+0x50 movl %edi, %esi
+0x52 movl %eax, %edi
+0x54 jl +0x30