Std library inclusion policy, and a data point of compilation times


#1

I’m now writing many small Rust programs, some of them are Project Euler solutions and similar things, and some of them are practical (sometimes they are Rust translations of small programs I wrote in Python, D and Haskell).

Writing so many small programs I’m seeing that the Rust standard library lacks some small functions/features (and most of them are already present in some external crates!) that I think are basic and useful in many different situations.

I don’t yet know what rules Rust follows to decide to keep something in external crates, or to add it to the standard library. So do you know what those rules are?


But I am not asking for std library features yet. In this post I prefer to talk about something more general.

This is the Euler Project Problem 61:
https://projecteuler.net/problem=61

This is a very nice solution in Haskell (the original code is not mine. But I’ve tried to improve this code in some ways):

import Data.List (permutations)

poly :: Int -> [Int]
poly x = takeWhile (< 10000) $ dropWhile (< 1000) $ scanl (+) 0 [1, x - 1 ..]

isRow :: Int -> Int -> Bool
isRow a b = mod a 100 == div b 100

solve :: [Int] -> [Int]
solve [x3, x4, x5, x6, x7, x8] =
    [sum [a, b, c, d, e, g] | a <- poly x3,
                              b <- filter (isRow a) $ poly x4,
                              c <- filter (isRow b) $ poly x5,
                              d <- filter (isRow c) $ poly x6,
                              e <- filter (isRow d) $ poly x7,
                              g <- filter (isRow e) $ poly x8,
                              isRow g a]

e61 = head $ concatMap solve $ map (8:) $ permutations [3 .. 7]
main = print e61 -- 28684

I have tried to translate it to Rust:

#![feature(iter_arith)]

fn concat<T: Clone>(v1: &[T], v2: &[T]) -> Vec<T> {
    let mut v1c = v1.to_vec();
    v1c.extend_from_slice(v2);
    v1c
}

fn permutations<T: Clone + Copy>(items: &[T]) -> Vec<Vec<T>> {
    fn perms<T: Clone + Copy>(s: &[T], prefix: Vec<T>, result: &mut Vec<Vec<T>>) {
        if s.is_empty() {
            result.push(prefix);
        } else {
            for (i, &c) in s.iter().enumerate() {
                perms(&concat(&s[..i], &s[i + 1..]), concat(&prefix, &[c]), result);
            }
        }
    }

    let mut result = vec![];
    perms(items, vec![], &mut result);
    result
}

fn poly(x: usize) -> Vec<u32> {
    let mut result = vec![];
    let mut n = 0;
    for i in  0 .. {
        n += i * (x - 2) + 1;
        if n < 1_000 { continue; }
        if n >= 10_000 { break; }
        result.push(n as u32);
    }
    result
}

fn is_row(a: u32, b: u32) -> bool { a % 100 == b / 100 }

fn solve(xs: &[usize], polys: &[Vec<u32>]) -> Vec<u32> {
    let (x3, x4, x5, x6, x7, x8) = (xs[0], xs[1], xs[2], xs[3], xs[4], xs[5]);

     polys[x3 - 3].iter().flat_map(|&a|
     polys[x4 - 3].iter().filter(move |&&b| is_row(a, b)).flat_map(move |&b|
     polys[x5 - 3].iter().filter(move |&&c| is_row(b, c)).flat_map(move |&c|
     polys[x6 - 3].iter().filter(move |&&d| is_row(c, d)).flat_map(move |&d|
     polys[x7 - 3].iter().filter(move |&&e| is_row(d, e)).flat_map(move |&e|
     polys[x8 - 3].iter().filter(move |&&g| is_row(e, g))
     .filter(move |&&g| is_row(g, a))
     .map(move |&g| [a, b, c, d, e, g].iter().sum()))))))
     .collect::<Vec<_>>()
}

fn main() {
    let polys = (3 .. 9).map(poly).collect::<Vec<_>>();

    let e61 = permutations(&[3, 4, 5, 6, 7])
              .iter()
              .flat_map(|p| solve(&concat(&[8], &p[..]), &polys))
              .next()
              .unwrap();
    println!("{}", e61); // 28_684
}

This Rust code compiles, runs very quickly, and gives the right answer (and in my opinion it shows some things that should be present in the Rust standard library, but I leave this to other posts), and if you have suggestions to improve the code they are welcome.

But on my fast enough computer it compiles without optimizations in about 12.4 seconds. I think that’s a bit too much for a 53 cloc lines long program :slight_smile:

I am aware that rewriting the program in a C-style way, the Rustc compiler is probably able to compile it on my pic in 1.5 seconds. All those iterators slow the compilation down a lot, I guess. If you use such iterator-heavy code for thousands of lines I think the compilation times become excessive, even on the fastest PCs.


#2

The excellent rustc developers just recently fixed a type checking slowdown in particular for such chained iterator cases. Maybe it would help your program? It should be fast again in nightly.

Rust doesn’t have a policy of adding every conceivable feature to libstd, but you are welcome to propose it. Minor things with a PR, major things with an RFC instead, so that it can be discussed. Here and on the internals forum is a good place to test out ideas for RFCs.

I’m not sure libstd wants to add functionality for permutations. It used to have it before 1.0, but it’s one of those things that was removed to have a well thought out features for 1.0.

For concatenation, there is already [a, b, c].concat() that concatenates a number of slices or vectors.


#3

The permutations iterator that I put into libstd (and was removed before Rust 1.0) was slow anyway. Happy it’s gone :smile:. The new impls in crate permutohedron are better.


#4

I was using a v1.9, but I’ve just updated the compiler to (64 Windows):
rustc 1.9.0-nightly (b678600ac 2016-03-29)

The compile-time, without optimizations, is now a bit higher, about 12.9 seconds :frowning:

Right, sorry! I forgot about this method, and I wasn’t able to find it again in the docs… Perhaps the Vec functions/algorithms could be grouped by purpose (searching, mutation, and so on) in the docs.

Beside that concat function useful for multiple slices to concat, I think I’d like to use an operator to concatenate one vector and a slice, like you do for strings:

fn main() {
    let s1 = String::from("Hello ");
    let s2 = "World";
    let s3 = s1 + s2;
}

I’m sure this was discussed a lot in past.


#5

Now with rustc 1.11.0-nightly (12238b984 2016-06-04) the compilation times are more reasonable:

-C opt-level=0 -Z no-trans: 0.62 seconds
-C opt-level=0 -C prefer-dynamic: 1.94 seconds
-C opt-level=3 -C target-cpu=native: 2.44 seconds

Now with -C opt-level=3 the Rust code compile time is just about twice the compilation time of the roughly equivalent Haskell code (1.11 seconds with GHC v. 7.10.2, -O3).


#6

Now with rustc 1.11.0-nightly (ad7fe6521 2016-06-23):

-C opt-level=0 -Z no-trans: 0.48 seconds
-C opt-level=0 -C prefer-dynamic: 1.69 seconds
-C opt-level=3 -C target-cpu=native: 2.24 seconds


#7

Now with rustc 1.12.0-nightly (0a3180baa 2016-08-03):

-C opt-level=0 -Z no-trans: 0.48 seconds
-C opt-level=0 -C prefer-dynamic: 1.93 seconds
-C opt-level=3 -C target-cpu=native: 2.40 seconds

The compilation times have recently increased (in other code I’ve seen even more increase).

Also note with -Zorbit=off:

-Zorbit=off -C opt-level=0 -C prefer-dynamic: 1.61 seconds
-Zorbit=off -C opt-level=3 -C target-cpu=native: 2.10 seconds


#8

Thanks for keeping these performance changes bumped @leonardo. I strongly encourage you to file bugs every time you see a reproducible regression as well.

cc @eddyb @nikomatsakis some perf regressions here.


#9

The biggest compilation time regression I’ve seen in the last 3 days is for the whole pack of my Euler problems solutions, from about 42 seconds to about 61 seconds (the binary is also a bit bigger and the run-time is a little longer, but this is a small change). The code in that single module stresses the compiler a lot, I presume. I think it could be used in some kind of regression suite for Rust, but it’s a little large module… it contains the efficient solutions to the first 210 problems.


#10

That does sound like a good candidate for a compile-time perf suite cc @nrc.


#11

Now with rustc 1.14.0-nightly (f09420685 2016-10-20):

-C opt-level=0 -Z no-trans: 0.26 seconds
-C opt-level=0 -C prefer-dynamic: 0.86 seconds
-C opt-level=3 -C target-cpu=native: 1.39 seconds

The compilation times have halved!? (It’s now a bit faster than the Haskell compiler, that takes about 1.4 seconds for the Haskell code with -O3, GHC version 8.0.1).


#12

Maybe it is partly thanks to this change by nnethercote, among many other incremental improvements lately https://github.com/rust-lang/rust/pull/36917


#13

With rustc 1.14.0-nightly (cae6ab1c4 2016-11-05) the timings are about unchanged:

rustc -C opt-level=0 -Z no-trans test.rs: 0.25 seconds
rustc -C opt-level=0 -C prefer-dynamic test.rs: 0.84 seconds
rustc -C opt-level=3 -C target-cpu=native test.rs: 1.38 seconds

But there’s a surprise for me:
rustc temp.rs: 1.41 seconds
rustc -O temp.rs: 1.83 seconds

Apparently -C target-cpu=native is making a lot of difference now, do you know why? I think in past it didn’t make much difference.

Edit: I’ve compiled some other programs, and -C target-cpu=native doesn’t change the compilation time much, so it’s not a general pattern.


#14

The timings have changed in an interesting way:

rustc -C opt-level=0 -Z no-trans test.rs: 0.22 seconds
rustc -C opt-level=0 -C prefer-dynamic test.rs: 0.76 seconds
rustc -C opt-level=3 -C target-cpu=native test.rs: 1.30 seconds
rustc test.rs: 1.49 seconds
rustc -O test.rs: 1.86 seconds

ghc test2.hs: 1.88
ghc -O3 test2.hs: 1.70

#15

Something stranger has happened in the meantime (rustc 1.16.0-nightly (83c2d9523 2017-01-24):

rustc -C opt-level=0 -Z no-trans test.rs: 0.48 seconds
rustc -C opt-level=0 -C prefer-dynamic test.rs: 1.08 seconds
rustc -C opt-level=3 -C target-cpu=native test.rs: 1.61 seconds
rustc test.rs: 1.28 seconds
rustc -O test.rs: 1.69 seconds

Ant the resulting binary sizes keep growing.


#16

Using rustc 1.17.0-nightly (4be034e62 2017-02-27):

rustc -C opt-level=0 -Z no-trans test.rs: 0.54 seconds
rustc -C opt-level=0 -C prefer-dynamic test.rs: 1.13 seconds, 191_040 bytes
rustc -C opt-level=3 -C target-cpu=native test.rs: 1.66 seconds, 1_097_582 bytes
rustc test.rs: 1.33 seconds, 1_218_762 bytes
rustc -O test.rs: 1.74 seconds, 1_097_582 bytes

The compilation time has grown up a bit. The binary sizes have kept slowly growing almost every day since months, and with the last rustup update (2017-02-27) then have had another increment.


#17

Yes, I’ve noticed a significant different between stable (1.15.1) and nightly. Enough to make me prefer stable!


#18

rustc nightly-x86_64-pc-windows-gnu 1.17.0-nightly (cab4bff3d 2017-03-21):

rustc -C opt-level=0 -Z no-trans test.rs: 0.18 seconds
rustc -C opt-level=0 -C prefer-dynamic test.rs: 0.81 seconds, 194_394 bytes
rustc -C opt-level=3 -C target-cpu=native test.rs: 1.32 seconds, 1_138_133 bytes
rustc test.rs: 0.99 seconds, 1_261_386 bytes
rustc -O test.rs: 1.41 seconds, 1_138_133 bytes

More generally, I think binary sizes have had a spike up, compilation times have decresed a little, run-times have decreased a bit.


#19

Thanks for continuing to track this @leonardo.


#20

rustc nightly-x86_64-pc-windows-gnu 1.18.0-nightly (7627e3d31 2017-04-16):

rustc -C opt-level=0 -Z no-trans test.rs: 0.18 seconds
rustc -C opt-level=0 -C prefer-dynamic test.rs: 0.79 seconds,191_709  bytes
rustc -C opt-level=3 -C target-cpu=native test.rs: 1.35 seconds, 1_132_923 bytes
rustc test.rs: 0.98 seconds, 1_250_669 bytes
rustc -O test.rs: 1.43 seconds, 1_132_923 bytes

Binary sizes are a little smaller, compilation time is almost the same. In Nightly the run-time of the compiled code has decreased for some benchmarks since the precedent day.