Benchmarking flatten vs concat and collect

I made these benchmarks and thought the results were interesting. Note that the difference in timing only grows with scale.

flatten_collect         time:   [27.625 µs 27.854 µs 28.147 µs]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

concat                  time:   [7.1218 µs 7.1610 µs 7.2107 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

nested_flatten_collect  time:   [31.480 µs 31.707 µs 31.940 µs]

nested_concat           time:   [6.3997 µs 6.4424 µs 6.5028 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

extend_flatten          time:   [27.459 µs 27.614 µs 27.856 µs]
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

result_flatten          time:   [27.401 µs 27.579 µs 27.742 µs]

result_collect          time:   [19.095 µs 19.312 µs 19.526 µs]

option_flatten          time:   [27.369 µs 27.624 µs 27.905 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

option_collect          time:   [20.255 µs 20.375 µs 20.498 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
use criterion::{black_box, criterion_group, criterion_main, Criterion};

const QUANTITY: usize = 100;

pub fn concat(v: Vec<Vec<usize>>) -> Vec<usize> {
    v.concat()
}

pub fn iter_flatten_collect(v: Vec<Vec<usize>>) -> Vec<usize> {
    v.into_iter().flatten().collect()
}

pub fn map_nested_flatten_collect() -> Vec<usize> {
    (0..QUANTITY).map(|i| vec![i; QUANTITY]).flatten().collect()
}

pub fn map_nested_concat() -> Vec<usize> {
    let v = (0..QUANTITY).map(|i| vec![i; QUANTITY]).collect::<Vec<_>>();
    v.concat()
}

pub fn extend_flatten(v: Vec<Vec<usize>>) -> Vec<usize> {
    let mut result = Vec::with_capacity(QUANTITY * QUANTITY);
    result.extend(v.into_iter().flatten());
    result
}

pub fn result_flatten() -> Vec<usize> {
    let v: Vec<Result<usize, ()>> = vec![Ok(1); QUANTITY * QUANTITY];
    v.into_iter().flatten().collect()
}

pub fn result_collect() -> Vec<usize> {
    let v = vec![Ok(1); QUANTITY * QUANTITY];
    v.into_iter().collect::<Result<_, ()>>().unwrap()
}

pub fn option_flatten() -> Vec<usize> {
    let v = vec![Some(1); QUANTITY * QUANTITY];
    v.into_iter().flatten().collect()
}

pub fn option_collect() -> Vec<usize> {
    let v = vec![Some(1); QUANTITY * QUANTITY];
    v.into_iter().collect::<Option<_>>().unwrap()
}

fn benchmark_concat_vs_flatten_collect(c: &mut Criterion) {
    let v: Vec<Vec<usize>> = vec![vec![1; QUANTITY]; QUANTITY];

    // compare flatten vs concat
    c.bench_function("flatten_collect", |b| {
        b.iter(|| black_box(iter_flatten_collect(v.clone())))
    });
    c.bench_function("concat", |b| b.iter(|| black_box(concat(v.clone()))));

    // compare flatten as part of an iterator chain building the vec vs
    // building the vec entirely, then concat
    c.bench_function("nested_flatten_collect", |b| {
        b.iter(|| black_box(map_nested_flatten_collect()))
    });
    c.bench_function("nested_concat", |b| {
        b.iter(|| black_box(map_nested_concat()))
    });

    // compare flatten when the final vec is preallocated
    c.bench_function("extend_flatten", |b| {
        b.iter(|| black_box(extend_flatten(v.clone())))
    });

    // compare flatten vs collect
    c.bench_function("result_flatten", |b| b.iter(|| black_box(result_flatten())));
    c.bench_function("result_collect", |b| b.iter(|| black_box(result_collect())));
    c.bench_function("option_flatten", |b| b.iter(|| black_box(option_flatten())));
    c.bench_function("option_collect", |b| b.iter(|| black_box(option_collect())));
}

criterion_group!(benches, benchmark_concat_vs_flatten_collect);
criterion_main!(benches);

Repo: GitHub - arifd/rust-flatten-vs-concat-and-collect: benchmarks

This doesn't mean that one should avoid flatten. Running time is not the only objective to optimize for, and collecting everything into a Vec is not always an option; sometimes you want or need to save memory.
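A minimal sketch of that trade-off (my own illustration, not from the benchmarks above): when you only need to stream over the elements, flatten never materializes the flattened Vec at all, whereas concat allocates it up front.

```rust
fn sums_match(v: &[Vec<usize>]) -> (usize, usize) {
    // Streaming: flatten yields elements lazily; no flattened Vec<usize>
    // is ever allocated.
    let streamed: usize = v.iter().flatten().sum();

    // Materializing: concat allocates the full flattened Vec before we
    // can do anything with it.
    let flat: Vec<usize> = v.concat();
    (streamed, flat.iter().sum())
}

fn main() {
    let v: Vec<Vec<usize>> = vec![vec![1; 100]; 100];
    let (a, b) = sums_match(&v);
    assert_eq!(a, b); // both are 10_000; only one path allocated
}
```

Both paths compute the same value, but the streaming path's peak memory is just the original nested Vec.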


That's a really good point! I have amended my post to be more neutral.

@H2CO3 I suspect there are a lot of into_iter().flatten().collect() use cases in the wild for which there is no reason to prefer flatten over concat. Maybe a good first issue for a Clippy lint?
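For concreteness, this is the rewrite such a hypothetical lint would suggest (my sketch; no such lint exists as far as I know):

```rust
fn flatten_way(v: Vec<Vec<u32>>) -> Vec<u32> {
    // the pattern the hypothetical lint would flag
    v.into_iter().flatten().collect()
}

fn concat_way(v: Vec<Vec<u32>>) -> Vec<u32> {
    // the equivalent it could suggest, which benchmarked faster above
    v.concat()
}

fn main() {
    let v = vec![vec![1, 2], vec![3]];
    assert_eq!(flatten_way(v.clone()), concat_way(v));
}
```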

Note that concat is much less flexible than flatten + collect:

  • it only works if the final item type is Clone
  • it only works for slices of AsRef<[T]> (think for example if you have a VecDeque<T> somewhere there)
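To illustrate the second point (my own sketch, using VecDeque as the example from the bullet above): flatten + collect works for any IntoIterator of IntoIterators, while concat is a slice method and simply isn't available on a VecDeque.

```rust
use std::collections::VecDeque;

fn flatten_deque(deque: VecDeque<Vec<usize>>) -> Vec<usize> {
    // Works because VecDeque and Vec both implement IntoIterator.
    // There is no `deque.concat()`: concat is defined on slices, so the
    // concat route would first require copying the VecDeque into a Vec.
    deque.into_iter().flatten().collect()
}

fn main() {
    let deque: VecDeque<Vec<usize>> = VecDeque::from(vec![vec![1, 2], vec![3, 4]]);
    assert_eq!(flatten_deque(deque), vec![1, 2, 3, 4]);
}
```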

Another option would be to improve the performance of .flatten().collect() for some common types (e.g. Vec<Vec<T>>) via specialization.

#110353 will optimize a bunch of flattens, but not Vec<Vec<T>>, because implementing an accurate size_hint for that would make size_hint slow: it would have to iterate over the outer vec every time it's called, and we prefer O(1) size hints. And I think a large fraction of the speedup is due to the size_hint / correct preallocation (the other part being the Copy specialization for appending slices).
Specializing flatten().collect::<Vec<_>>() would be possible, but of course that would break down as soon as you insert additional adapters between the flatten and the collect.
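The size_hint limitation can be observed directly (a quick illustration I'm adding, behavior as of current stable std): for Vec<Vec<T>>, Flatten reports a lower bound of 0 and no upper bound, so collect has nothing to preallocate from.

```rust
fn flatten_hint(v: &[Vec<u8>]) -> (usize, Option<usize>) {
    // Flatten can't know the total element count without walking every
    // inner Vec, so it reports (0, None); collect() therefore cannot
    // preallocate the result and must grow it incrementally.
    v.iter().flatten().size_hint()
}

fn main() {
    let v: Vec<Vec<u8>> = vec![vec![1; 100]; 100];
    // The outer iterator knows its own length exactly...
    assert_eq!(v.iter().size_hint(), (100, Some(100)));
    // ...but the flattened iterator gives up on bounds entirely.
    assert_eq!(flatten_hint(&v), (0, None));
}
```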

