Is String.splitn() missing a huge easy optimisation?

Looks like size_hint isn't implemented for SplitN

let mut split = line.splitn(6, ' ');
eprintln!("size hint: {:?}", split.size_hint()); // (0, None)

350MB/s - profiler shows a LOT of time is spent inside Vec::grow and such

let parts: Vec<&str> = line.splitn(6, ' ').collect();

450MB/s - setting the vec capacity for myself

let mut parts = Vec::with_capacity(6);
parts.extend(line.splitn(6, ' '));
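For anyone who wants to reproduce the comparison, here is a rough sketch of the two variants plus a tiny timing helper. The function names (`split_with_collect`, `split_preallocated`, `time_splits`) are made up for illustration, and a wall-clock loop like this is only a crude stand-in for a proper benchmark or the profiler mentioned above.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Split a line into at most six space-separated fields, relying on `collect`.
fn split_with_collect(line: &str) -> Vec<&str> {
    line.splitn(6, ' ').collect()
}

/// Same result, but reserve room for all six fields up front.
fn split_preallocated(line: &str) -> Vec<&str> {
    let mut parts = Vec::with_capacity(6);
    parts.extend(line.splitn(6, ' '));
    parts
}

/// Call `f` on `line` `iters` times and return the elapsed wall-clock time.
/// `black_box` keeps the optimizer from discarding the work.
fn time_splits(line: &str, iters: u32, f: fn(&str) -> Vec<&str>) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f(black_box(line)));
    }
    start.elapsed()
}
```

Calling `time_splits(line, 1_000_000, split_with_collect)` and `time_splits(line, 1_000_000, split_preallocated)` on a representative line should give a rough idea of the gap on your own machine; the numbers will vary with input and compiler version.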

If I'm understanding things correctly, implementing size_hint for SplitN so that it returns (n, Some(n)) would result in a ~25% speedup for collect(). I don't know how to actually make and test modifications to the standard library though. So... I guess I'm just throwing this out there, in the hope that somebody will either explain why this is a bad idea, or go ahead and implement it :smiley:

The highest “correct” lower bound is 1, though. (Or 0? I’m not 100% sure what it does for empty strings, but I think 1 is correct there, too.)
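The lower-bound question is easy to check directly. A minimal sketch, with `piece_count` as a made-up helper name: it just counts how many pieces `splitn` actually yields.

```rust
/// Count how many pieces `splitn(n, ' ')` actually yields for `line`.
fn piece_count(line: &str, n: usize) -> usize {
    line.splitn(n, ' ').count()
}
```

With this, `piece_count("a b c d e f g", 6)` is 6, `piece_count("a b", 6)` is 2, and `piece_count("", 6)` is 1, since the empty string yields a single empty piece. The one case that yields nothing is `n == 0`: `piece_count("a b", 0)` is 0.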

The effect of an incorrectly high lower bound would be that overly large Vecs could be allocated in cases where the bound, e.g. 6, is just an upper bound that's rarely reached, potentially with significant memory overhead.

Note that it's worse than that. A consumer of the iterator is allowed to misbehave (so long as it doesn't trigger UB) if the implementation doesn't respect the rules.

So, for example, if the size_hint is (n, Some(n)), it would be legal for collect to notice that and run

let mut v = Vec::with_capacity(n);
for _ in 0..n {
    v.push(iter.next().expect("look, you said it has this many"));
}

and thus end up panicking if the iterator didn't actually have that many items in it, or silently drop some if there were actually more in the iterator than the upper bound said.
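To make both failure modes concrete, here is a self-contained sketch. Everything in it is hypothetical: `Overstated` is a wrapper that reports whatever exact hint it is told to, and `trusting_collect` is a consumer that believes exact hints. It is not how the standard library's collect is implemented, and it returns `Err` at the point where the loop above would panic, so it can be run safely.

```rust
/// Hypothetical iterator that reports an exact `size_hint` of `claimed`,
/// regardless of how many items `inner` really holds.
struct Overstated<I> {
    inner: I,
    claimed: usize,
}

impl<I: Iterator> Iterator for Overstated<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.inner.next()
    }

    // Deliberately breaks the size_hint contract when `claimed` is wrong.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.claimed, Some(self.claimed))
    }
}

/// Hypothetical consumer that trusts an exact hint. Returns `Err` with the
/// number of items read at the point where a panicking version would abort.
fn trusting_collect<I: Iterator>(mut iter: I) -> Result<Vec<I::Item>, usize> {
    let (lower, upper) = iter.size_hint();
    // Only take the shortcut when the hint claims an exact length.
    let n = match upper {
        Some(u) if u == lower => u,
        _ => return Ok(iter.collect()),
    };
    let mut v = Vec::with_capacity(n);
    for read in 0..n {
        match iter.next() {
            Some(item) => v.push(item),
            None => return Err(read), // the hint promised more items
        }
    }
    // Anything still left in `iter` beyond `n` is silently dropped.
    Ok(v)
}
```

Wrapping `"a b".splitn(6, ' ')` with `claimed: 6` gives `Err(2)`, the "you said it has this many" case. Wrapping `"a b c".splitn(6, ' ')` with `claimed: 2` gives `Ok(vec!["a", "b"])`, with `"c"` quietly lost.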

(TBH, I don't think size_hint is the best API. filter would also be happier with one that's more like capacity_guess that's allowed to over- or under-shoot the real value arbitrarily. Especially since last I checked, size_hint().1 is never actually used for anything.)
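The filter remark can be seen in a couple of lines. A small sketch, with `filtered_hint` as an illustrative name: because the hint has to be a guaranteed bound, filter can only promise a lower bound of 0, even when roughly half the items will pass.

```rust
/// Return the size hint of `0..len` after keeping only the even numbers.
fn filtered_hint(len: u32) -> (usize, Option<usize>) {
    (0..len).filter(|x| x % 2 == 0).size_hint()
}
```

Here `(0..10u32).size_hint()` is `(10, Some(10))`, while `filtered_hint(10)` is `(0, Some(10))`, so a consumer that preallocates from the lower bound gets no help at all.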
