Is there a missing API for optimal &[u8] -> Result<String> conversion?

I want to go from a &[u8] to a Result<String,_>. Obviously I can use String::from_utf8(slice.into_vec()) or str::from_utf8(slice).map(|s| s.to_string()), but it seems to me both of those loop over the bytes twice --- once to copy to the heap, once to check UTF8 validity.

Shouldn't there be a String::from_utf8_slice(&[u8]) -> Result<String,_> function that copies to the heap and checks UTF8 validity with a single loop?

I don't think any such API exists. I would be quite curious to see a benchmark, to be honest. I would be surprised if it was easy to beat the existing two pass approach in common cases. It might make more of a difference if your string was very large though.

The issue is that while the existing APIs do imply two passes over the bytes, each of those passes is going to be exceptionally well optimized.

4 Likes

I guess you're right, although I expect for truly enormous strings there would be a measurable difference.

This can be done only with an algorithm greatly biased towards the happy path: a Vec big enough is allocated since the beginning, and while scanning for the UTF-8 validity of each byte, they get copied into this Vec.

Here are some very basic benchmarks (only tested with slices exclusively made of ASCII bytes), so take them with a grain of salt:

running 4 tests
test ascii::length_268435456::one_pass   ... bench: 100,125,048 ns/iter (+/- 101,487,589)
test ascii::length_268435456::two_passes ... bench: 137,491,704 ns/iter (+/- 19,751,579)
test ascii::length_32::one_pass          ... bench:          42 ns/iter (+/- 8)
test ascii::length_32::two_passes        ... bench:          47 ns/iter (+/- 4)
  • 268435456 bytes are 256MB

  • gist

4 Likes

Wow, that's amazing work. Thanks!

Very nice. One thing that immediately pops into my mind: doesn't char-by-char copying inhibit efficient memcpy'ing in chunks of size_of::<usize>() bytes (potentially even vectorized)?

I'm curious if that's possible to reintroduce, e.g. by buffering the actual copies into a #[repr(simd)] wrapper around [usize; 4], and if so, does it actually help gain some additional performance?

Perhaps TryFrom could have this optimization?

bstr does a little better here because it has a vectorized version of ASCII validation:

test ascii::length_268435456::one_pass        ... bench: 109,777,336 ns/iter (+/- 5,505,324)
test ascii::length_268435456::two_passes      ... bench: 128,788,444 ns/iter (+/- 4,066,359)
test ascii::length_268435456::two_passes_bstr ... bench: 124,390,484 ns/iter (+/- 9,790,566)
test ascii::length_32::one_pass               ... bench:          17 ns/iter (+/- 0)
test ascii::length_32::two_passes             ... bench:          23 ns/iter (+/- 0)
test ascii::length_32::two_passes_bstr        ... bench:          21 ns/iter (+/- 0)
2 Likes

Maybe you can try to split the string in page size, or larger. Like, verify 4KB, copy 4KB and repeat. Some larger chunking must be needed.

2 Likes

That should indeed be an interesting thing to try: it is thus left as an exercise to the reader :stuck_out_tongue:

3 Likes