How to outperform js with string interpolation?

Hello there!

I was trying to see how fast rust can be vs js, and to my surprise, it didn't really go well, so I must be doing something wrong.

Here's the rust version:

use std::time::Instant;

struct User {
    name: String,
    age: i32,
}

fn to_bench() -> i32 {
    let mut users: Vec<User> = vec![];

    for i in 1..=100 {
        users.push(User {
            name: format!("John Doe {}", i),
            age: i,
        })
    }

    let mut total_chars = 0;
    let mut total_ages = 0;

    for user in users.iter() {
        total_chars += user.name.chars().count();
        total_ages += user.age;
    }

    total_chars as i32 + total_ages
}

fn main() {

    let start = Instant::now();
    for _ in 0..1_000_000 {
        to_bench();
    }

    let duration = start.elapsed();
    println!("Total execution time: {:?}", duration);

}

Here's the js version of it:

const to_bench = () => {
  let users = [];

  for (let i = 1; i <= 100; i++) {
    users.push({ name: `John Doe ${i}`, age: i });
  }

  let total_chars = 0;
  let total_ages = 0;

  for (let i = 0; i < users.length; i++) {
    total_chars += users[i].name.length;
    total_ages += users[i].age;
  }

  return total_ages + total_chars;
};

const main = () => {
  console.time("Bench");

  for (let i = 0; i <= 1_000_000; i++) {
    to_bench();
  }
  console.timeEnd("Bench");
};

main();

Both codes are meaningless in logic, I just wanted to test somewhat of a real world example

I figured out that the format! call is the slowest part of the code. But while the rust version takes 5.8s in my machine, the js version takes about 4s. I don't know if it should be that slow, or maybe now js is fast for this particular test?

I tested the rust code against a release build.

So, what's the matter? I'm trying to build a compiler in rust, as I want it to be as fast as posible. I know I probably won't be doing the same ops as the example above, but I'll be doing a lot of string manipulation.

So how can I totally outperform js in this example?

I think you've found a case where having a garbage collector is actually a speedup. What's happening in the two pieces of code is that you create a really large number of very small objects, destroying them, and repeating this. With a garbage collector, it can destroy all of the values in one go instead of destroying them one-by-one. Another thing that could happen is that some of the objects are destroyed after your main loop, meaning that they're not included in the benchmark.

Here is the most well optimized version I was able to write:

use std::time::Instant;
use smallstr::SmallString;

struct User {
    name: SmallString<[u8; 32]>,
    age: i32,
}

fn to_bench() -> i32 {
    let mut users: Vec<User> = Vec::with_capacity(100);

    for i in 1..=100 {
        let mut name = SmallString::new();
        name.push_str("John Doe ");
        itoa::fmt(&mut name, i).unwrap();
        users.push(User {
            name,
            age: i,
        })
    }

    let mut total_chars = 0;
    let mut total_ages = 0;

    for user in users.iter() {
        total_chars += user.name.len();
        total_ages += user.age;
    }

    total_chars as i32 + total_ages
}

fn main() {
    let start = Instant::now();
    for _ in 0..1_000_000 {
        to_bench();
    }

    let duration = start.elapsed();
    println!("Total execution time: {:?}", duration);
}
[dependencies]
itoa = "0.4.8"
smallstr = "0.2.0"

The above uses two crates. The itoa crate has a very optimized string formatting implementation, which is faster than the one in the standard library. The smallstr crate lets you create small strings without doing any memory allocations at all, entirely removing that cost. The code also uses Vec::with_capacity, which makes a noticeable difference.

However, I think it is important to point out here that you've happened to find a problem that JS actually does really really well. Rust will still outperform JS in most use-cases without having to spend any time optimizing the code.

3 Likes

You're also not doing Rust any justice by only using one thread - try using Rayon, and compare its performance (along with the simplicity of use) to whatever will need to be written for JS to do anything remotely similar. That's what truly separates Rust from the rest of the pack.

Just to highlight a couple improvements Alice made that don't involve crates:

    // Pre-allocate known sizes.
    // Iterator map + collect will often have an analogous improvement.
    let mut users: Vec<User> = Vec::with_capacity(100);
        // This is faster (just read an integer) and returns the number
        // of UTF8 code units, i.e. bytes (Rust Strings are UTF8).
        //
        // The javascript version returns the number of UTF16 code
        // units. Javascript Strings are UTF16, so that's also just
        // reading an integer.
        //
        // The chars iterator version returns the number of Unicode
        // scalar values.  That requires examining the entire String
        // (in either language).
        total_chars += user.name.len();
2 Likes

One important thing I noticed is that just changing

     for user in users.iter() {
-        total_chars += user.name.chars().count();
+        total_chars += user.name.len();
         total_ages += user.age;
     }

sped it up 21% on my machine. (EDIT: I no longer trust that number. This is the problem with benchmarks that don't use something like Criterion.rs - Criterion.rs Documentation ...)

I would propose that the number of UTF-8 Code Units (like this change does) is just as useful as the number of UTF-16 Code Units (that the JS code does) or the number of Codepoints (that the original Rust version did) -- because either it's just as good, or they're all wrong since one should be counting grapheme clusters instead.

This is an important observation. I see about a 24% speedup by adding std::mem::forget(users); to the end of to_bench, and thus not counting the cost of destructing the Vec<User> and its elements. (Of course, that's a memory leak so probably not how you'd want to write things normally, but it's a useful measurement technique. And for short-lived programs one could always build the program with an allocator than never releases memory, to get GC-like memory behaviour.)

It might be interesting to see how the performance of the JS version is impacted if one makes the problem run for long enough that it needs to GC.

3 Likes

Thank you all!

I think @alice is in the right direction. As I've never programmed without a garbage collector, I didn't consider its side effect:

Running the code with 1000 iterations on the to_bench function in both js and rust gives me:

Rust (my naïve implementation): 9.4ms
Rust (@alice implementation): 4.2ms
Js version: around 20ms

If I ever finish my compiler, I probably won't be using this exact form of code, so probably isn't that bad?
Although I've seen other people (Rust vs Go: Which is best? THE Definitive Answer - YouTube) struggle with strings as well.

2 Likes

If I ever finish my compiler, I probably won't be using this exact form of code, so probably isn't that bad?

You don't need to worry about microoptimization of this type of code. Write the code; then, if you have a performance issue, you can try to optimize. Premature optimization is almost always counterproductive, especially in a language like Rust that is so easy to refactor (with the help of the compiler).

I think comparing single thread performance is a perfectly valid thing to do.

If ones single thread code is so poorly optimised then throwing more cores at it is just using more hardware, more energy, causing more global warming...

2 Likes