Rewriting a sha1 benchmark for wasm

I should mention that I initially started writing this thread looking for help getting a working rewrite, then accidentally managed to get it working while trying to document the issue. I'm learning both JavaScript and Rust while using them in a project. Here is the original, and here is my fork.

I'm trying to decide whether to use Neon or WASM, so I wanted a performance comparison, both in terms of speed and in terms of the underlying code complexity.

I cloned and ran the original repo, which compares short and long ops/sec across native JS, a Neon-based Rust project, and a WASM-based Rust project. The results were similar to the posted benchmark:

wasm#short x 77,917 ops/sec ±0.57% (82 runs sampled)
wasm#long x 34,726 ops/sec ±1.20% (82 runs sampled)
js#short x 275,189 ops/sec ±0.69% (92 runs sampled)
js#long x 21,498 ops/sec ±0.75% (89 runs sampled)
neon#short x 644,123 ops/sec ±0.34% (92 runs sampled)
neon#long x 182,730 ops/sec ±0.32% (91 runs sampled)

The WASM test results were much slower than I expected, so I started comparing the original Neon and WASM implementations.

The WASM implementation seemed far more complex than the Neon one, and had more of the work done in native JavaScript. From what I was reading, WASM previously didn't support string operations directly, which seems to no longer be the case, and the Neon implementation seemed closer to the kind of examples listed in the Rust WASM docs, so I've been trying to rewrite the WASM implementation. I made a lot of small changes to various non-Rust files; if you need me to list all of them, I can.

The test that calls all three implementations seemed to be passing strings directly, so I changed the argument type to &str, with the return type originally Result<String, JsValue>. Currently the function is this:

use sha1::Sha1;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn digest(string: &str) -> Option<String> {
    // wasm-bindgen copies the JS string into WASM memory as UTF-8,
    // so `string` arrives here as an ordinary Rust &str.
    let mut m = Sha1::new();
    m.update(string.as_bytes());
    let dgst = m.digest().to_string();

    Some(dgst)
}

It's built in the Makefile via:

cd wasm && wasm-pack build --release --target nodejs

I didn't change the actual test other than removing the ready.then block, which wasn't necessary anymore.
The new results are:

wasm#short x 229,630 ops/sec ±0.47% (85 runs sampled)
wasm#long x 114,446 ops/sec ±0.49% (89 runs sampled)
js#short x 275,967 ops/sec ±0.70% (90 runs sampled)
js#long x 20,944 ops/sec ±0.63% (89 runs sampled)
neon#short x 691,628 ops/sec ±0.35% (94 runs sampled)
neon#long x 188,252 ops/sec ±0.39% (91 runs sampled)

While this is definitely a major improvement, npm is spitting out a lot of complaints about unhandled promise rejections before the tests start, and I want to make sure this isn't affecting the actual runtime. I also want to make sure there is nothing in my WASM rewrite that would cause issues if someone actually tried to use the implementation as-is. While I doubt anyone will, I'm doing this to figure out both the fastest and the least complicated way of getting the two languages to interact (the two may or may not be mutually exclusive), and unintended problems at runtime definitely qualify as complicated.

There are a few problems:

  • JavaScript uses UTF-16 strings, while Rust uses UTF-8, so passing a string from one side to the other involves a conversion (see the sketch after these lists).
  • Rust running in WASM cannot access data that JS uses. All data needs to be copied into WASM memory first.

To improve things:

  • take &Uint8Array as input and get the data via to_vec()
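
To illustrate the first point, here's a tiny plain-Rust sketch (nothing WASM-specific) of why a JS string can't just be handed over as-is:

fn main() {
    let s = "naïve"; // five characters
    // Rust strings are UTF-8: 'ï' (U+00EF) takes two bytes.
    assert_eq!(s.chars().count(), 5);
    assert_eq!(s.len(), 6); // len() counts bytes, not characters
    // In JS, "naïve".length is 5 UTF-16 code units; the wasm-bindgen
    // glue has to re-encode those code units as UTF-8 and copy them
    // into WASM linear memory before Rust can see a &str.
}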

Sorry about not getting back to this sooner; I was basically out of commission yesterday.
I rewrote it as:

#[wasm_bindgen]
pub fn digest(input: &js_sys::Uint8Array) -> Option<String> {
    // to_vec() copies the Uint8Array's contents into WASM memory.
    let bytes = input.to_vec();

    let mut m = Sha1::new();
    m.update(&bytes);
    let dgst = m.digest().to_string();

    Some(dgst)
}

That actually made the short op slower by approximately 50,000 ops/sec, and the long op dropped all the way to 10,436 ops/sec.

I also tried skipping the variable declaration and doing the conversion inside the function call itself, and that slowed the code down further.
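
That inline variant looked roughly like this (a reconstructed sketch, not the exact code):

#[wasm_bindgen]
pub fn digest(input: &js_sys::Uint8Array) -> Option<String> {
    let mut m = Sha1::new();
    // The to_vec() copy still happens; it just isn't bound to a name.
    m.update(&input.to_vec());
    Some(m.digest().to_string())
}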

Is it possible that the conversion is happening implicitly via some code in wasm-bindgen that is fairly well optimized? I'm mainly trying to understand why explicitly converting actually slowed the code down.

Do you have to create the Uint8Array first?
I was assuming your input was already available with that type.

You could also try &[u8] as input; that should not be slower than &str.
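
Something like this, as a sketch (assuming the same sha1 crate API as your version):

#[wasm_bindgen]
pub fn digest(input: &[u8]) -> Option<String> {
    // The generated glue still copies the bytes into WASM memory,
    // but no UTF-8 re-encoding is needed if the caller already
    // has a Uint8Array.
    let mut m = Sha1::new();
    m.update(input);
    Some(m.digest().to_string())
}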

I changed the input to &[u8] and removed the intermediate conversion, passing the borrowed u8 slice straight to the Sha1 instance, with similar results (actually slightly slower, by about 10,000 ops/sec).

I also changed back to &str, removed the variable declaration, and passed it to Sha1 using &value.as_bytes(), with similar results to before (around 224k ops/sec for short).

Interestingly enough (at least to me, who's still figuring out a lot about the language), removing the second borrow operator (value.as_bytes() instead of &value.as_bytes()) caused a 4k increase in short ops/sec but a 4k drop in long ops/sec.
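
For reference, a sketch of that &str version; since Sha1::update takes &[u8], the extra borrow should be stripped by deref coercion, so the two forms ought to be equivalent:

#[wasm_bindgen]
pub fn digest(value: &str) -> Option<String> {
    let mut m = Sha1::new();
    // value.as_bytes() is already a &[u8]; the extra `&` makes a
    // &&[u8], which deref coercion turns back into &[u8], so both
    // forms below should compile to the same code.
    m.update(&value.as_bytes()); // with the second borrow
    // m.update(value.as_bytes()); // without it
    Some(m.digest().to_string())
}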

A 4k difference in 224k is about 2%. That might just be noise.

Passing Vec and &[u8] still involves copying data around.

Since I have no clue as to what is going on, I would just file it under "funky benchmark"…

The WASM version can't win against the JIT-compiled JS version due to the unavoidable copy.