Problem with C/Rust speed comparison result

Hello Folks,

As my first attempt at Rust, I took this buffer comparison example from this post on :

I only adapted it to perform 1000000000 iterations, so that the run times are long enough to measure.

Problem: on my Mac Studio M1 machine, using clang and rustc, the Rust version is 6 times slower than the C version. Both are arm64 executables.

Did I do anything wrong?
Thanks for the help!



jacquesmenu@macstudio:~/JMI_Developpement/Rust > clang --version
Apple clang version 15.0.0 (clang-1500.
Target: arm64-apple-darwin23.2.0
Thread model: posix
InstalledDir: /Applications/
jacquesmenu@macstudio:~/JMI_Developpement/Rust > rustc --version
rustc 1.75.0 (82e1608df 2023-12-21)


clang -o BufferExampleInC BufferExampleIn.c

time ./BufferExampleInC

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ParsedData {
  uint8_t header;
  char *payload;
};

void parse_buffer(uint8_t *buffer, struct ParsedData *parsed_data) {
  parsed_data->header = buffer[0];
  parsed_data->payload = (char *)&buffer[1];
}

void get_data(uint8_t *buffer) {
  const uint8_t data[] =
    {255, 't', 'e', 's', 't'};

  memcpy(buffer, data, sizeof(data));
}

int main(void) {
  // Alloc buffer for received data
  uint8_t *buffer = malloc(1024); // We'll ignore if the pointer is NULL

  // Simulate getting data from somewhere else (Ex: Socket)
  get_data(buffer);

  int iterations = 1000000000;

  for (int i = 1; i <= iterations; ++i ) {
    // Parse buffer into ParsedData struct
    struct ParsedData parsed_data;
    parse_buffer(buffer, &parsed_data);

    // Print payload content
//     printf("%s\n", parsed_data.payload);
  } // for

  // Release the buffer
  free(buffer);

  return 0;
}

rustc -o BufferExampleInRust

time ./BufferExampleInRust

pub struct ParsedData<'a> {
  pub header:  u8,
  pub payload: &'a str,
}

impl ParsedData<'_> {
  pub fn parse(data: &[u8]) -> ParsedData {
    let header = data[0];

    // Skip the header byte; from_utf8 validates the remaining bytes as UTF-8
    let payload = std::str::from_utf8(&data[1..]).unwrap();

    ParsedData {
      header,
      payload,
    }
  }
}

fn get_data() -> Vec<u8> {
  const DATA: [u8; 5] = [
      255,
      't' as u8, 'e' as u8, 's' as u8, 't' as u8
  ];

  DATA.to_vec() // Return dynamically allocated array (Vector)
}

fn main() {
  // Simulate getting data from somewhere else (Ex: Socket)
  let buffer = get_data();

  let iterations = 1000000000;

  // _i because this name is not used afterwards
  // Inclusive range, so the loop runs as many times as the C loop
  for _i in 1..=iterations {

    // Parse buffer into ParsedData struct
    // _parsed_data because this name is not used afterwards
    let _parsed_data = ParsedData::parse(&buffer);

    // Print payload content
    // println!("{}", _parsed_data.payload);

  } // for
}

jacquesmenu@macstudio:~/JMI_Developpement/Rust > ls -sal BufferExampleInC
72 -rwxr-xr-x 1 jacquesmenu staff 33616 Jan 14 23:29 BufferExampleInC
jacquesmenu@macstudio:~/JMI_Developpement/Rust > ls -sal BufferExampleInRust
824 -rwxr-xr-x 1 jacquesmenu staff 418784 Jan 14 23:29 BufferExampleInRust

jacquesmenu@macstudio:~/JMI_Developpement/Rust > time ./BufferExampleInC
./BufferExampleInC 2.09s user 0.00s system 99% cpu 2.105 total
jacquesmenu@macstudio:~/JMI_Developpement/Rust > time ./BufferExampleInRust
./BufferExampleInRust 11.97s user 0.00s system 99% cpu 11.990 total

Please format your post properly.

Thanks Alice, I missed that.

If these are the commands you compile with, it looks like you have not turned on compiler optimizations for either C or Rust.


@James_Rust how did you compile the Rust code? Did you use cargo build --release to compile with optimisations?

Once you do compile with optimizations turned on, the C code will likely optimize to doing nothing, and the Rust code might optimize this far, too, though it will have a slightly harder time because it contains more functionality to begin with. Your C code just increments a pointer to create char *payload whereas your Rust code will do a UTF-8 validation to create payload: &str.

Edit: Looking at results on, it seems that from_utf8 cannot be inlined, so it stays (and thus with it the whole loop). As expected, the C code optimizes to nothing.
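To make the extra work concrete, here is a minimal sketch of what `std::str::from_utf8` does with the buffer from this thread. The helper name `payload_checked` is made up for illustration. The point is only that the call has to inspect every payload byte before it can hand back a `&str`, which the C pointer cast never does.

```rust
use std::str;

/// Hypothetical helper: returns the payload as text.
/// from_utf8 scans every byte after the header to check it is valid UTF-8.
fn payload_checked(buffer: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(&buffer[1..])
}

fn main() {
    let buffer = [255u8, b't', b'e', b's', b't'];

    // The four bytes after the header are valid UTF-8.
    assert_eq!(payload_checked(&buffer), Ok("test"));

    // The header byte 255 (0xFF) never occurs in valid UTF-8,
    // so validating the whole buffer fails.
    assert!(str::from_utf8(&buffer).is_err());
}
```

This is also why the parse has to start at index 1: the header byte would make the validation fail.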


Even in light of the above, I'm amazed it is only 6 times slower given the code presented. Why are you using 'str' in your Rust version? There are no strings in the C version, only byte buffers.
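For a closer like-for-like comparison, a byte-oriented variant might look like the sketch below. `ParsedBytes` is a hypothetical name, not something from the original post; it keeps the payload as `&[u8]`, so parsing is just a bounds check and a slice, with no UTF-8 validation.

```rust
/// Hypothetical byte-oriented counterpart of `ParsedData`.
pub struct ParsedBytes<'a> {
    pub header: u8,
    pub payload: &'a [u8],
}

impl<'a> ParsedBytes<'a> {
    /// Splits off the first byte as the header.
    /// Returns None for an empty buffer instead of panicking.
    pub fn parse(data: &'a [u8]) -> Option<ParsedBytes<'a>> {
        let (&header, payload) = data.split_first()?;
        Some(ParsedBytes { header, payload })
    }
}

fn main() {
    let buffer = vec![255u8, b't', b'e', b's', b't'];
    let parsed = ParsedBytes::parse(&buffer).expect("buffer is not empty");

    assert_eq!(parsed.header, 255);
    assert_eq!(parsed.payload, b"test");
}
```

If text is needed later, the conversion to `&str` can be done once at the point of use rather than on every parse.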

Hello folks,

Thanks for the quick feedback!

Sorry for having newbie questions, I knew nothing about Rust and cargo some time ago.
Is there a way to get optimized code just with basic tools, as can be done in other contexts?

As to str, I just copied/pasted the examples from the original post.


You can pass the optimization level to rustc:

rustc -C opt-level=3
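One rough way to check from inside the program whether it was built without optimizations is to look at `debug_assertions`. As far as I know, rustc enables debug assertions by default only at `opt-level=0`, so this is a proxy rather than a direct reading of the optimization level, and it can be overridden with `-C debug-assertions`. The function name `build_kind` is made up for this sketch.

```rust
/// Hypothetical helper: guesses the build kind from the debug_assertions cfg.
/// This is only a proxy, since the flag can be set independently of opt-level.
fn build_kind() -> &'static str {
    if cfg!(debug_assertions) {
        "unoptimized (debug assertions on)"
    } else {
        "optimized (debug assertions off)"
    }
}

fn main() {
    // Print the guess before timing anything, to avoid measuring a debug build.
    println!("build looks {}", build_kind());
}
```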

With rustc and optimization as suggested, I get :

jacquesmenu@macstudio:~/JMI_Developpement/Rust > time ./BufferExampleInRust
./BufferExampleInRust  3.44s user 0.00s system 91% cpu 3.776 total


-C target-cpu=native might also give you better performance if the compiler can further optimize the code for your cpu.

No change actually:

jacquesmenu@macstudio:~/JMI_Developpement/Rust > rustc -C opt-level=3 -C target-cpu=native -o BufferExampleInRust

jacquesmenu@macstudio:~/JMI_Developpement/Rust > time ./BufferExampleInRust 
./BufferExampleInRust  3.44s user 0.01s system 91% cpu 3.768 total

Something is wrong there.

When I put your code into a Cargo project, compile it with cargo build --release and then run it as time target/release/junk, it runs in zero time. Why? Because your program has no output and hence the code is all optimised away.

That is a lot less than 3.44s, and this is on my MacBook Pro M1.
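If the goal is to time the parsing itself, one option is to make the result observable so the optimizer has a reason to keep the loop. The sketch below uses `std::hint::black_box` (stable since Rust 1.66) and accumulates the payload length. The function name `run` and the smaller iteration count are choices made for this example, and `black_box` is only a best-effort hint, so treat the timings as indicative.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Parses the buffer `iterations` times and returns the summed payload length.
fn run(buffer: &[u8], iterations: u64) -> u64 {
    let mut total = 0u64;
    for _ in 0..iterations {
        // Hide the input from the optimizer so the parse is redone each time.
        let data = black_box(buffer);
        let payload = std::str::from_utf8(&data[1..]).unwrap();
        // Use the result so the work cannot be discarded as dead code.
        total += black_box(payload).len() as u64;
    }
    total
}

fn main() {
    let buffer = vec![255u8, b't', b'e', b's', b't'];
    let iterations = 1_000_000; // deliberately smaller than the 1000000000 used above

    let start = Instant::now();
    let total = run(&buffer, iterations);
    println!("total = {total}, elapsed = {:?}", start.elapsed());
}
```

Printing `total` at the end gives the program an output that depends on every iteration.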

For completeness, what Rust version was that? Also 1.75.0?

~ rustc --version
rustc 1.77.0-nightly (ca663b06c 2024-01-08)