PHP vs Rust performance question

First of all, I'm not a programmer but only a sysadmin doing more complex tasks in PHP-CLI (forgive me :smile:).
I'm very interested in doing more stuff in Rust, so as an exercise I tried to "convert" a CSV file into a TAB separated file. This is easy enough in PHP:

csv.php:

#!/usr/bin/php
<?php
while ( ( $data = fgetcsv( STDIN ) ) !== FALSE ) {
        echo implode( "\t", $data ) . "\n";
}

After a lot of googling and going through some of the docs, I came up with this solution:
Cargo.toml:

[package]
name = "csv-test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
csv = "1.1.6"

src/main.rs:

use std::error::Error;
use std::io;
use std::process;


fn example() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        .has_headers(false)
        .from_reader(io::stdin());

    for result in rdr.records() {
        // The iterator yields Result<StringRecord, Error>, so we check the error here.
        let record = result?;
        // the following block appends a tab to the last field, which is not what I want
        /*for field in &record {
            print!("{}\t", field);
        }*/
        let record_vector: Vec<_> = record.iter().collect();
        print!("{}", record_vector.join("\t"));
        println!();
    }
    Ok(())
}

fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}

Surprisingly enough, this works! :slight_smile: But the PHP version yields the same result and is faster:

[pigpen@shuttle:~/rust_projects/csv-test]$ time cat test.csv | ./csv.php > /tmp/foo.php

real    0m2.198s
user    0m1.862s
sys     0m0.354s
[pigpen@shuttle:~/rust_projects/csv-test]$ time cat test.csv | ./target/debug/csv-test > /tmp/foo.rust

real    0m3.512s
user    0m3.120s
sys     0m0.415s

I'm sure I'm doing things all wrong, so here are my three questions:

What is the most idiomatic way to do this in Rust?
What is the most concise way to do it in Rust?
What is the most performant way?

Thank you in advance,
Marc

Try building with cargo build --release, then run ./target/release/csv-test. The target/debug binary you timed is an unoptimized debug build.

WOW! This surely makes a difference! :star_struck:

[pigpen@shuttle:~/rust_projects/csv-test]$ time cat test.csv | ./target/release/csv-test > /tmp/foo.rust

real    0m0.582s
user    0m0.220s
sys     0m0.381s

Thank you so much!!! :+1:
Any tips on how to improve my code?

It's pretty close: you can change that print! to a println! and drop the empty println!() that follows it. You could also try to avoid allocating a new Vec for every record, but honestly it doesn't seem worth it.

The bigger win is to just use csv to write out a TSV file properly: just set the delimiter and it will quote as needed and configured for you, which handles any edge cases your data has. Then you just feed the reader into the writer, and problem solved! In theory.

The part that collects the fields into a Vec and joins them could be faster:

let mut iter = record.iter();
if let Some(field) = iter.next() {
    // print the first field without a leading tab
    print!("{}", field);
    // print the remaining fields, each preceded by a tab
    for field in iter {
        print!("\t{}", field);
    }
}
println!();

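This avoids allocating the Vec and the joined String for every record.

You can also try replacing io::stdin() with std::io::BufReader::new(std::io::stdin().lock()), which might be faster, though the csv reader already buffers its input internally, so the gain may be small.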
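
One thing that may hold the loop above back: every print! call locks stdout again, and Rust's stdout is line-buffered, so each println! flushes. A minimal std-only sketch that writes to any Write instead, so the caller can lock stdout once and wrap it in a BufWriter. The name write_tsv_line is hypothetical.

```rust
use std::io::{self, Write};

/// Writes `fields` separated by tabs and terminated by a newline.
/// No tab is written after the last field.
/// `write_tsv_line` is a hypothetical helper name used for illustration.
fn write_tsv_line<'a, W, I>(out: &mut W, fields: I) -> io::Result<()>
where
    W: Write,
    I: IntoIterator<Item = &'a str>,
{
    let mut iter = fields.into_iter();
    if let Some(first) = iter.next() {
        out.write_all(first.as_bytes())?;
        // Remaining fields are each preceded by a tab.
        for field in iter {
            out.write_all(b"\t")?;
            out.write_all(field.as_bytes())?;
        }
    }
    out.write_all(b"\n")
}
```

In example() you would create let mut out = std::io::BufWriter::new(io::stdout().lock()); once before the loop, call write_tsv_line(&mut out, record.iter())? for each record, and finish with out.flush()?. Whether this beats the Vec and join version on your data is something to measure rather than assume.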

If you're looking to make the program faster, the csv crate tutorial has a section on it: csv::tutorial - Rust

(I think you only need the "amortizing allocs" section here.)

There is also an example showing how to use the csv writer: csv::tutorial - Rust

For writing, write_byte_record is the best you can do (while using a csv Writer): csv::Writer - Rust

You can also tune the release profile a little in Cargo.toml:

[profile.release]
opt-level = 3
debug = false
codegen-units = 1
lto = true
strip = true

Then build and run with cargo run -r, which is short for cargo run --release: it produces a release build too, but is shorter to type.

Avoiding the Vec and join actually made processing a bit slower. The lock() doesn't seem to make much of a difference.
Really appreciate your tips, though! :+1:
