Focus on String, Rust vs Java, 5.23 seconds vs 464 milliseconds?

The Rust code takes 5.23 seconds with --release and some rustc optimizations (opt-level = 3, lto = true, codegen-units = 1):

fn id() -> &'static str {
    // with_capacity is better than new
    let mut ids_string = String::with_capacity(41);
    for shape_id in 0..20 {
        // &shape_id.to_string() is better than shape_id.to_string().as_str()
        ids_string.push_str(&shape_id.to_string());
        ids_string.push('/');
    }
    Box::leak(ids_string.into_boxed_str())
}

fn main(){
    let now = std::time::Instant::now();
    for _ in 0..80_0000 {
        let id=id();
    }
    let elapsed = now.elapsed();
    println!("{:?}", elapsed);
}

The side-by-side Java code takes only 464 milliseconds:

public class test2 {
	static String id() {
		StringBuilder sb = new StringBuilder(41);
		for (int shape_id = 0; shape_id < 20; shape_id++) {
			sb.append(shape_id);
			sb.append('/');
		}
		return sb.toString();
	}

	public static void main(String[] args) {
		long t1=System.currentTimeMillis();
		for(int i=0; i<80_0000; i++) {
			String id=id();
		}
		long t2=System.currentTimeMillis();
		System.out.println(t2 - t1);

	}
}

Hopefully some experienced folks can make Rust outperform Java on this; otherwise I think I will lose confidence in Rust.

Growing a String in a function and returning it as a &str as fast as possible is what I need, but so far I haven't seen any headroom to speed it up.


On my fairly old laptop (Intel Core i5-7200U), your Rust code runs in 832ms using cargo run --release with the profile settings you specified:

$ cargo run --release -v
   Compiling ids v0.1.0 (/home/mbrubeck/src/test/ids)
     Running `rustc --crate-name ids --edition=2018 src/main.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type bin --emit=dep-info,link -C opt-level=3 -C lto -C codegen-units=1 -C metadata=f4e77e0040159d95 -C extra-filename=-f4e77e0040159d95 --out-dir /home/mbrubeck/src/test/ids/target/release/deps -L dependency=/home/mbrubeck/src/test/ids/target/release/deps`
warning: unused variable: `id`
  --> src/main.rs:15:13
   |
15 |         let id=id();
   |             ^^ help: if this is intentional, prefix it with an underscore: `_id`
   |
   = note: `#[warn(unused_variables)]` on by default

warning: 1 warning emitted

    Finished release [optimized] target(s) in 4.33s
     Running `target/release/ids`
828.110454ms

If I change id to return String instead of &'static str (so we can skip into_boxed_str, which usually has to reallocate), it runs in about 700ms:

fn id() -> String {
    let mut ids_string = String::with_capacity(41);
    for shape_id in 0..20 {
        ids_string.push_str(&shape_id.to_string());
        ids_string.push('/');
    }
    ids_string
}

Changing the loop to avoid creating a temporary String reduces the time to 440ms:

use std::fmt::Write;

fn id() -> String {
    let mut ids_string = String::with_capacity(41);
    for shape_id in 0..20 {
        write!(&mut ids_string, "{}", shape_id).unwrap();
        ids_string.push('/');
    }
    ids_string
}

Using a larger initial capacity for the String speeds it up to 410ms:

    let mut ids_string = String::with_capacity(50);

The biggest speedup came from adding a dependency on the itoa crate. Now the Rust program runs in just 90ms on my laptop:

fn id() -> String {
    let mut ids_string = String::with_capacity(50);
    for shape_id in 0..20 {
        itoa::fmt(&mut ids_string, shape_id).unwrap();
        ids_string.push('/');
    }
    ids_string
}
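For reference, the speedup comes from itoa sidestepping the core::fmt machinery and writing decimal digits directly into a small stack buffer. Here is a stdlib-only sketch of the same idea; the helper push_u32 is hypothetical, not part of itoa's API:

```rust
// Format a non-negative integer into `out` without going through
// core::fmt -- a minimal sketch of what the itoa crate does.
fn push_u32(out: &mut String, mut n: u32) {
    let mut buf = [0u8; 10]; // u32::MAX has at most 10 decimal digits
    let mut i = buf.len();
    loop {
        i -= 1;
        buf[i] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 { break; }
    }
    // the digits are ASCII, so this byte slice is valid UTF-8
    out.push_str(std::str::from_utf8(&buf[i..]).unwrap());
}

fn id() -> String {
    let mut ids_string = String::with_capacity(50);
    for shape_id in 0..20u32 {
        push_u32(&mut ids_string, shape_id);
        ids_string.push('/');
    }
    ids_string
}

fn main() {
    println!("{}", id());
}
```

This avoids both the temporary String of the first version and the formatting-trait dispatch of the write! version.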

So I applied @mbrubeck's optimizations and adjusted the Java program to also start with a StringBuilder capacity of 50. With that, on my machine the Rust program runs in about 106ms while the Java program runs in 132ms.


A relevant discussion in the Rust issue tracker: format_args! is slow · Issue #76490 · rust-lang/rust · GitHub


Awesome, @mbrubeck, it's really fast now. One thing I would like to know is how to determine the capacity of a String; the same question applies to HashMap/Vec/HashSet etc. Apparently it's not a good strategy to set capacity = length + 1, so Rust must have some mechanism to decide when to expand the capacity as the length grows. This reminds me to optimize my whole program with this lesson. Great thanks to @mbrubeck.

The RawVec type (a resizeable array of items that underpins Vec and String) uses exponential growth when deciding how to expand capacity. Then whenever it runs out of space (i.e. self.len == self.capacity) it'll automatically trigger a resize.

In pseudocode this looks like:

impl<T> RawVec<T> {
  fn push(&mut self, item: T) {
    if self.capacity < self.len + 1 {
      // double the capacity (with a non-zero minimum for empty vecs)
      self.resize((self.capacity * 2).max(1));
    }

    unsafe {
      // write to the first unused slot, i.e. index `len`
      self.as_mut_ptr().add(self.len).write(item);
      self.len += 1;
    }
  }
}

Hashmaps are a bit different because lookups are faster when you resize at, for example, 75% capacity.

99.9% of the time you shouldn't care about any of this in the real world, though. The cost of resizing will almost certainly never be a bottleneck, especially if you ever need to do file or network IO, and it only really shows up when doing micro-benchmarks like this.


It is a good strategy to set capacity = length if you know the length beforehand. In your particular example, ids_string should have a length of exactly 50 at the end, since the numbers 0 ~ 9 take a single byte each in ASCII form, 10 ~ 19 take two, and each number is followed by the single ASCII char '/'.
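That arithmetic can be double-checked in a couple of lines:

```rust
fn main() {
    // 0..=9 contribute 1 byte each, 10..=19 contribute 2,
    // plus one '/' after every number: 10*2 + 10*3 = 50.
    let total: usize = (0..20).map(|n: u32| n.to_string().len() + 1).sum();
    assert_eq!(total, 10 * (1 + 1) + 10 * (2 + 1));
    println!("exact final length: {}", total);
}
```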

@Michael-F-Bryan The actual number currently is 87.5%.
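One way to observe those load-factor-driven resizes is to print HashMap::capacity() whenever it changes as the map grows; the exact jump points are an implementation detail of the hashbrown version in use, not a guarantee:

```rust
use std::collections::HashMap;

fn main() {
    let mut map: HashMap<u32, u32> = HashMap::new();
    let mut last = map.capacity();
    for i in 0..1_000 {
        map.insert(i, i);
        if map.capacity() != last {
            // capacity() reports usable slots, i.e. the raw bucket count
            // already scaled down by the load factor
            println!("len {:4} -> capacity {}", map.len(), map.capacity());
            last = map.capacity();
        }
    }
    assert!(map.capacity() >= map.len());
}
```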


Great thanks to @Hyeonu and @Michael-F-Bryan, learned a lot.

I wonder why the compiler didn't optimize it out. Do you have any thoughts?

The Rust compiler isn't currently supposed to optimize such allocations out.


Sorry if this sounds harsh, but you should really not "lose confidence" in a language simply because of a micro-benchmark like this.

If you're considering rewriting some Java code in Rust for performance reasons, I guess the better questions are: "is it possible?" (the answer is yes, because Rust does not have a garbage collector, among other things) and "is it easy?", which is a trickier question: learning Rust is hard, so you've got to ask "is the learning curve and the cost of a rewrite worth the performance benefit?".

Finally, a line-by-line rewrite may not be enough: sometimes Java's Garbage Collector is really smart, so you may also need to make some design changes.

PS: I can also recommend two very good blog posts which talk about Rust from a Java perspective:


Wow, the first one of those is so long I'm not sure I can live long enough to understand it all.

As it starts from throwing Java, C#, C, and C++ into the same bucket I'm not sure I even want to try. Those languages are so very obviously different in their approach to managing memory.

If you are really sure that the length will always be the same, you can allocate once, populate the data, and then call .len() to see what the length is and use that as the capacity from then on. If I am correct, this allows the capacity to equal the length; but if the data ever grows beyond that, it will allocate again, which will be slower.

Another way to speed things up is to use a BufWriter when writing: writing to standard output is slow since each write requires a syscall. One option is to lock stdout (this is possible via io::stdout().lock()), another is to wrap stdout like BufWriter::new(io::stdout()) and then write!(buf, ...), or you can do both (I only do the last one).

Maybe I'm misunderstanding, but you can get the capacity with String::capacity().
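To make that concrete, here is a short demonstration of len() versus capacity(); note that with_capacity only guarantees the capacity is at least what you asked for:

```rust
fn main() {
    let mut s = String::with_capacity(8);
    assert!(s.capacity() >= 8);
    assert_eq!(s.len(), 0);

    s.push_str("hello"); // fits in the existing allocation
    let before = s.capacity();
    s.push_str(" world, this overflows"); // exceeds capacity: triggers growth
    assert!(s.capacity() > before);
    assert!(s.capacity() >= s.len());
    println!("len = {}, capacity = {}", s.len(), s.capacity());
}
```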

String is just a wrapper around Vec<u8>, so String::push is a wrapper around Vec::push. Vec::push's code is this:

    pub fn push(&mut self, value: T) {
        // This will panic or abort if we would allocate > isize::MAX bytes
        // or if the length increment would overflow for zero-sized types.
        if self.len == self.buf.capacity() {
            self.reserve(1);
        }
        unsafe {
            let end = self.as_mut_ptr().add(self.len);
            ptr::write(end, value);
            self.len += 1;
        }
    }

As you can see, internally it uses Vec::reserve if the length has reached the capacity.

This ultimately ends up calling RawVec::grow_amortized, which implements the exponential growth strategy that allows Vec and String to have the expected amortized O(1) insertion at the end.
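The exponential growth is easy to observe by printing capacity() whenever it changes; the exact step sizes are an implementation detail, not a guarantee:

```rust
fn main() {
    let mut v: Vec<u8> = Vec::new();
    let mut last = v.capacity();
    for _ in 0..1_000 {
        v.push(0);
        if v.capacity() != last {
            // each jump roughly doubles, which is what makes
            // repeated pushes amortized O(1)
            println!("len {:4} -> capacity {}", v.len(), v.capacity());
            last = v.capacity();
        }
    }
    assert_eq!(v.len(), 1_000);
    assert!(v.capacity() >= 1_000);
}
```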
