Jetscii now works with (future) stable Rust 1.27.0


#1

After over 3 years of development, Jetscii will soon be usable on stable Rust thanks to the stabilization of SIMD!

Since docs.rs cannot build these docs at the moment, allow me to paste them in here to give an overview.

Happy to answer any questions you might have!


Jetscii

A tiny library to efficiently search strings for sets of ASCII characters or byte slices for sets of bytes.

Examples

Searching for a set of ASCII characters

#[macro_use]
extern crate jetscii;

fn main() {
    let part_number = "86-J52:rev1";
    let first = ascii_chars!('-', ':').find(part_number);
    assert_eq!(first, Some(2));
}

Searching for a set of bytes

#[macro_use]
extern crate jetscii;

fn main() {
    let raw_data = [0x00, 0x01, 0x10, 0xFF, 0x42];
    let first = bytes!(0x01, 0x10).find(&raw_data);
    assert_eq!(first, Some(1));
}

Using the pattern API

If this crate is compiled with the unstable pattern feature flag, AsciiChars will implement the Pattern trait, allowing it to be used with many traditional methods.

#[macro_use]
extern crate jetscii;

fn main() {
    let part_number = "86-J52:rev1";
    let parts: Vec<_> = part_number.split(ascii_chars!('-', ':')).collect();
    assert_eq!(&parts, &["86", "J52", "rev1"]);
}

What’s so special about this library?

We use a particular set of x86-64 SSE 4.2 instructions (PCMPESTRI and PCMPESTRM) to gain great speedups. This method stays fast even when searching for a byte in a set of up to 16 choices.

When the PCMPxSTRx instructions are not available, we fall back to reasonably fast but universally-supported methods.
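For readers curious what such a dispatch can look like, here is a minimal sketch using only the standard library. It is not Jetscii's actual implementation: the names find_any, find_any_sse42 and find_any_fallback are made up for illustration, and the SIMD path copies every block through a padded buffer, which keeps the sketch simple rather than fast.

```rust
/// Scalar fallback (hypothetical name): works on every target.
fn find_any_fallback(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    haystack.iter().position(|b| needles.contains(b))
}

/// SSE 4.2 path (hypothetical name): PCMPESTRI compares up to 16 haystack
/// bytes against a set of up to 16 needle bytes in one instruction.
/// The caller must guarantee `needles.len() <= 16` and SSE 4.2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.2")]
unsafe fn find_any_sse42(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    // Zero-padded copy of the needle set. The explicit length passed to the
    // instruction tells it how many of these bytes are meaningful.
    let mut set = [0u8; 16];
    set[..needles.len()].copy_from_slice(needles);
    let set = _mm_loadu_si128(set.as_ptr() as *const __m128i);
    let set_len = needles.len() as i32;

    for (i, chunk) in haystack.chunks(16).enumerate() {
        // Go through a padded buffer so the 16-byte load never reads past
        // the end of the haystack.
        let mut block = [0u8; 16];
        block[..chunk.len()].copy_from_slice(chunk);
        let block = _mm_loadu_si128(block.as_ptr() as *const __m128i);
        // Index of the first byte in `block` equal to any byte in `set`,
        // or 16 if there is none.
        let idx = _mm_cmpestri(set, set_len, block, chunk.len() as i32, _SIDD_CMP_EQUAL_ANY);
        if idx < 16 {
            return Some(i * 16 + idx as usize);
        }
    }
    None
}

/// Picks the SIMD path at runtime when it is available and applicable.
pub fn find_any(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if needles.len() <= 16 && is_x86_feature_detected!("sse4.2") {
            // Safe to call: the feature was just detected and the set fits.
            return unsafe { find_any_sse42(haystack, needles) };
        }
    }
    find_any_fallback(haystack, needles)
}
```

With the input from the first example, find_any(b"86-J52:rev1", b"-:") returns Some(2) on either path.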

Benchmarks

Single character

Searching a 5MiB string of a's with a single space at the end for a space:

Method                                          Speed
ascii_chars!(' ').find(s)                       5882 MB/s
s.as_bytes().iter().position(|&c| c == b' ')    1514 MB/s
s.find(" ")                                     644 MB/s
s.find(&[' '][..])                              630 MB/s
s.find(' ')                                     10330 MB/s
s.find(|c| c == ' ')                            786 MB/s

Set of 3 characters

Searching a 5MiB string of a's with a single ampersand at the end for <, >, and &:

Method                                        Speed
ascii_chars!(/* … */).find(s)                 6238 MB/s
s.as_bytes().iter().position(|&c| /* … */)    1158 MB/s
s.find(&[/* … */][..])                        348 MB/s
s.find(|c| /* … */)                           620 MB/s

Set of 5 characters

Searching a 5MiB string of a's with a single ampersand at the end for <, >, &, ', and ":

Method                                        Speed
ascii_chars!(/* … */).find(s)                 6303 MB/s
s.as_bytes().iter().position(|&c| /* … */)    485 MB/s
s.find(&[/* … */][..])                        282 MB/s
s.find(|c| /* … */)                           785 MB/s

#2

Now that’s a cool crate name if I ever saw one :+1:

Just now I’ve been thinking about the performance of a string parsing function I’m using; I’ll definitely check this out.

Question: Any additional gains to be had if not only the pattern, but the string itself is ascii (or to be treated as ascii, i.e. a byte slice)? Read first, ask second… sorry!


#3

Sadly, I can’t claim credit for it. I know that I solicited feedback on the name way back when I created it, but I can’t find where or who originally suggested it!


#4

Since the original post, I’ve re-introduced the Substring type. I haven’t published this version yet (my fingers are crossed for a docs.rs update shortly!).

Here are the relevant docs:

Searching for a substring

use jetscii::Substring;

let colors = "red, blue, green";
let first = Substring::new(", ").find(colors);
assert_eq!(first, Some(3));

Searching for a subslice

use jetscii::ByteSubstring;

let raw_data = [0x00, 0x01, 0x10, 0xFF, 0x42];
let first = ByteSubstring::new(&[0x10, 0xFF]).find(&raw_data);
assert_eq!(first, Some(2));

Using the pattern API

use jetscii::Substring;

let colors = "red, blue, green";
let colors: Vec<_> = colors.split(Substring::new(", ")).collect();
assert_eq!(&colors, &["red", "blue", "green"]);

Benchmarks

Substring

Searching a 5MiB string of a's with the string "xyzzy" at the end for "xyzzy":

Method                             Speed
Substring::new("xyzzy").find(s)    5680 MB/s
s.find("xyzzy")                    4440 MB/s

#5

Love this project. I’m using it to implement a fast-path optimization that requires scanning for the full set of 16 bytes that jetscii supports, as fast as possible.

Stable SIMD is going to enable a lot of great stuff in the ecosystem. There should be like a SIMD strike force that just goes around accelerating popular crates.

Next one I’m hoping gets the upgrade is the twoway crate @bluss.


#6

Yay!

Relatedly, yesterday I ported the SIMD UTF8 validation from this post to Rust but I have no idea what I’m doing: https://github.com/killercup/simd-utf8-check

Maybe someone in this thread wants to have a look at this and help get parts of this into std?


#7

Now I just wonder why the conventional method with a slice is slower than with a closure. Is that just from inlining?


#8

Cool!

It might be worthwhile to benchmark the code relative to UTF-8 validation in encoding_rs when encoding_rs is compiled with --features simd-accel.

The UTF-8 validation code in encoding_rs is a fork of the standard library code with the ASCII fast path replaced with a faster ASCII fast path. (I’ve targeted Wikipedia HTML as the benchmark.) The non-ASCII handling is unchanged. The faster ASCII fast path uses SIMD on x86_64, x86 and aarch64. On ARMv7+NEON, using SIMD for the ASCII fast path made the ASCII-only case faster but pessimized even German HTML: the markup is ASCII, and the German text itself contains non-ASCII characters infrequently enough that runs of 16 or more ASCII characters still occur. As a result, I’ve kept ARMv7 on ALU code, but it’s still faster ALU code on ARMv7 than what’s in the standard library. (The standard library autovectorizes on x86_64, so on x86_64, the standard library isn’t really ALU code. The encoding_rs ALU code that’s faster on ARMv7 doesn’t autovectorize on x86_64.)
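To make "ALU code for the ASCII fast path" concrete, here is a minimal sketch of one common word-at-a-time technique: load 8 bytes into a u64 and test all eight high bits with a single mask. This is not the encoding_rs code, and the function name ascii_prefix_len is made up for illustration.

```rust
/// Returns the length of the leading run of ASCII bytes in `bytes`
/// (hypothetical helper, not taken from encoding_rs or std).
pub fn ascii_prefix_len(bytes: &[u8]) -> usize {
    // One high bit per byte lane; any set bit means a non-ASCII byte.
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

    let mut offset = 0;
    // Fast path: test 8 bytes per iteration with ordinary integer ops.
    for chunk in bytes.chunks_exact(8) {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        let high = u64::from_le_bytes(word) & HIGH_BITS;
        if high != 0 {
            // Little-endian load: the lowest set bit belongs to the
            // earliest non-ASCII byte in the chunk.
            return offset + (high.trailing_zeros() / 8) as usize;
        }
        offset += 8;
    }
    // Tail: fewer than 8 bytes remain, check them one at a time.
    for &b in &bytes[offset..] {
        if b >= 0x80 {
            return offset;
        }
        offset += 1;
    }
    offset
}
```

For example, ascii_prefix_len("Grüße".as_bytes()) returns 2, the byte offset of the ü. A validator would run a loop like this until it hits a non-ASCII byte and only then switch to the slower multi-byte checks.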

Using SIMD for non-ASCII looks interesting. I haven’t yet figured out what it does exactly. I’m a bit concerned though about the code being Intel-specific, even though on cursory look it appears to do things that are expressible in portable SIMD. (This is why I’ve been worried about stabilizing std::arch ahead of std::simd. We need both, but we risk needless Intel-specificity in the ecosystem when std::arch gets used for stuff that could use std::simd.)


#9

I did that too this week :smiley:

@shepmaster I have an is_ascii(x: &[u8]) -> usize that returns the index for the first non-ascii character in x using SIMD that might be interesting for your crate. It’s about 1.7x faster than str::is_utf8 at validating whether an array of bytes is ascii or not (turns out that str::is_utf8 is pretty fast already when it comes to ascii at least).
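A function of that shape can be sketched with SSE2 alone, which is part of the x86_64 baseline. The code below is an illustration under assumptions, not the implementation from the post: the name first_non_ascii is made up, and returning x.len() for all-ASCII input is a choice made here that the post does not specify.

```rust
/// Index of the first non-ASCII byte in `x`, or `x.len()` if every byte is
/// ASCII (hypothetical name and return convention).
pub fn first_non_ascii(x: &[u8]) -> usize {
    #[allow(unused_mut)]
    let mut offset = 0;

    #[cfg(target_arch = "x86_64")]
    {
        use std::arch::x86_64::*;

        for chunk in x.chunks_exact(16) {
            // SAFETY: `chunk` is exactly 16 bytes long, the load is
            // unaligned, and SSE2 is always available on x86_64.
            let mask = unsafe {
                let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
                // Collects the high bit of each of the 16 bytes into bits 0..16.
                _mm_movemask_epi8(v)
            };
            if mask != 0 {
                // The lowest set bit marks the first non-ASCII byte.
                return offset + mask.trailing_zeros() as usize;
            }
            offset += 16;
        }
    }

    // Scalar tail on x86_64, or the whole input on other targets.
    let rest = &x[offset..];
    offset + rest.iter().position(|&b| b >= 0x80).unwrap_or(rest.len())
}
```

Returning an index rather than a bool lets a caller resume multi-byte validation at exactly the byte where the ASCII run ends.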


#10

The UTF-8 validation code in encoding_rs is a fork of the standard library code with the ASCII fast path replaced with a faster ASCII fast path.

I wrote this with std::simd and only managed to get it a bit faster than str::from_utf8 (about 1.2x faster). I’ve filed an LLVM bug since it appears that LLVM cannot optimize the core of it to a bunch of _mm_testz_si128 instructions. I then switched that to using std::arch and managed to make it 1.7x faster, but I still feel that it could be faster. Also, because it requires SSE4.1, we can’t really use it in std.
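As a rough illustration of the _mm_testz_si128 approach, here is a minimal sketch of an ASCII check that tests 16 bytes per instruction and is selected at runtime. It is not the code from the repository mentioned in this thread; the names all_ascii and all_ascii_sse41 are made up.

```rust
/// SSE4.1 path (hypothetical name). The caller must have verified that the
/// CPU supports SSE4.1.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.1")]
unsafe fn all_ascii_sse41(x: &[u8]) -> bool {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    // 0x80 in every lane selects the high bit of each byte.
    let high = _mm_set1_epi8(0x80u8 as i8);
    let mut chunks = x.chunks_exact(16);
    for chunk in &mut chunks {
        let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
        // Returns 1 when (v & high) is all zeros, i.e. all 16 bytes are ASCII.
        if _mm_testz_si128(v, high) == 0 {
            return false;
        }
    }
    // Fewer than 16 bytes remain: check them with scalar code.
    chunks.remainder().iter().all(|b| b.is_ascii())
}

/// Runtime-dispatched ASCII check with a scalar fallback.
pub fn all_ascii(x: &[u8]) -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse4.1") {
            // Safe to call: the feature was just detected.
            return unsafe { all_ascii_sse41(x) };
        }
    }
    x.iter().all(|b| b.is_ascii())
}
```

The runtime check is what lets a crate ship an SSE4.1 path without requiring every user to compile with that target feature enabled.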

EDIT: @killercup @hsivonen this is my repo: https://github.com/gnzlbg/is_utf8 . I haven’t finished the UTF-8 part yet. These are my benchmark results for a couple of cases on my laptop (“fail” means the input should fail validation quickly, which penalizes SIMD versions that process too much data in bulk; “pass” means the input is valid ASCII):

test large_ascii_fail_hoehrmann                ... bench:         375 ns/iter (+/- 99) = 8264 MB/s
test large_ascii_fail_is_ascii_scalar          ... bench:          91 ns/iter (+/- 37) = 34054 MB/s
test large_ascii_fail_is_ascii_vector128       ... bench:          14 ns/iter (+/- 2) = 221357 MB/s
test large_ascii_fail_is_ascii_vector128_sse41 ... bench:          52 ns/iter (+/- 28) = 59596 MB/s
test large_ascii_fail_is_ascii_vector256_avx   ... bench:          98 ns/iter (+/- 46) = 31622 MB/s
test large_ascii_fail_rustc                    ... bench:          21 ns/iter (+/- 12) = 147571 MB/s
test large_ascii_pass_hoehrmann                ... bench:      10,072 ns/iter (+/- 2,888) = 307 MB/s
test large_ascii_pass_is_ascii_scalar          ... bench:       2,259 ns/iter (+/- 886) = 1371 MB/s
test large_ascii_pass_is_ascii_vector128       ... bench:         200 ns/iter (+/- 136) = 15495 MB/s
test large_ascii_pass_is_ascii_vector128_sse41 ... bench:         158 ns/iter (+/- 115) = 19613 MB/s
test large_ascii_pass_is_ascii_vector256_avx   ... bench:         100 ns/iter (+/- 63) = 30990 MB/s
test large_ascii_pass_rustc                    ... bench:         180 ns/iter (+/- 137) = 17216 MB/s

(Note: the rustc version is compiled with -C target-feature=+avx here.)
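For anyone reading the two columns together: the MB/s figure follows from the ns/iter figure and the input size. Here is a small sketch of that arithmetic, assuming 1 MB = 10^6 bytes and truncating integer division. The 3099-byte input size used in the usage note is inferred from the numbers above, not stated in the post, and the function name is made up.

```rust
/// Throughput in MB/s (1 MB = 10^6 bytes) for `bytes` processed in `ns`
/// nanoseconds, truncated to an integer (hypothetical helper).
pub fn throughput_mb_per_s(bytes: u64, ns: u64) -> u64 {
    // bytes per nanosecond equals GB/s; multiplying by 1000 gives MB/s.
    bytes * 1000 / ns
}
```

With a 3099-byte input, throughput_mb_per_s(3099, 14) gives 221357 and throughput_mb_per_s(3099, 10072) gives 307, matching the vector128 "fail" row and the hoehrmann "pass" row. It also shows how coarse the fastest rows are: at 14 ns/iter, the reported +/- 2 ns is a swing of roughly 14% in throughput.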


#11

Very cool! I just refactored the benchmarking setup in https://github.com/killercup/simd-utf8-check to make it a two line change to bench a function against all the inputs I have – want to add a few? :slight_smile:


#12

Sorry for the off-topic question: are you back to Rust again? The last I heard, you had quit the Rust project and didn’t even know whether you would continue to develop software.