Performance impact of Result vs Option


#1

I have a HashMap<String, HashMap<String, String>> of pairs, with roughly 50,000 items in the outer hashmap and between 1 and 20 values in each inner hashmap.

I call the following method 300 times

pub enum PdlEntry {
    COMMENT,
    NUM(f64),
    STR(String),
    VNUM(Vec<f64>),
    VSTR(Vec<String>),
}
pub fn get_field<'a, T: 'a>(name: &str, entries: &'a HashMap<String, PdlEntry>) -> result::Result<T, Error>
where
    &'a HashMap<String, PdlEntry>: GetPdlData<T>,
{
    entries
        .get_pdl_data(name)
        .ok_or(err_msg(format!("Could not find {} in entity {:?}", name, entries)))
}

And then repeat the exercise with this method

pub fn get_field2<'a, T: 'a>(name: &str, entries: &'a HashMap<String, PdlEntry>) -> Option<T>
where
    &'a HashMap<String, PdlEntry>: GetPdlData<T>,
{
    entries.get_pdl_data(name)
}

The overhead is quite dramatic. With the Result type the timing is ~5s; with the Option type it's closer to 0.01s.

Compiled in release mode with timings tested by

        let now = Instant::now();
        let cpdm: PowerDual = PowerDual::try_from(&pdl).unwrap();
        println!(
            "Getting fixings end {}.{:03}",
            now.elapsed().as_secs(),
            now.elapsed().subsec_millis()
        );

Should there be such an overhead when using Result<>? If so, I'm thinking my code should generally be constructed with Option return types, using Result only at a higher level, and especially not in inner loops.


#2

You’re formatting and creating a String each time here, even on success - you want ok_or_else(|| ...).
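To illustrate the difference with a minimal sketch (using a toy map rather than the original PdlEntry type): the argument to `ok_or` is built eagerly on every call, while the closure passed to `ok_or_else` only runs on the error path.

```rust
use std::collections::HashMap;

fn main() {
    let mut entries: HashMap<String, f64> = HashMap::new();
    entries.insert("rate".to_string(), 0.05);

    // Eager: format! allocates and Debug-formats the whole map
    // on every call, even though the lookup succeeds.
    let eager: Result<&f64, String> = entries
        .get("rate")
        .ok_or(format!("Could not find rate in {:?}", entries));

    // Lazy: the closure is only invoked when get() returns None,
    // so the success path does no allocation at all.
    let lazy: Result<&f64, String> = entries
        .get("rate")
        .ok_or_else(|| format!("Could not find rate in {:?}", entries));

    assert_eq!(eager, Ok(&0.05));
    assert_eq!(lazy, Ok(&0.05));
    println!("both succeed; only ok_or paid for the error string");
}
```

With ~50,000 entries and 300 calls, paying that formatting cost on every successful lookup adds up quickly.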


#3

That sped up everything in the code. I had ok_or() everywhere. Is there a way to profile allocations in Rust?


#4

ok_or is a perf footgun because of this. Even rustc devs got bit: https://github.com/rust-lang/rust/pull/50051. And its docs do mention:

> Arguments passed to `ok_or` are eagerly evaluated; if you are passing the result of a function call, it is recommended to use `ok_or_else`, which is lazily evaluated.

But it's somewhat easy to miss while slinging code around.

As for profiling, not sure there’s a great story right now. I think most people use the standard C/C++ tools, like valgrind, for alloc profiling. Take a look at https://internals.rust-lang.org/t/improve-the-heap-and-cpu-profiling-story/5996/3 and see if that helps.


#5

On Linux, perf has served me very well over the years. It has guided a large amount of the optimization work I’ve done in Rust.


#6

Use clippy; it warns you about uses of ok_or (and similar methods) whose argument performs a computation, and suggests switching to ok_or_else.
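As a sketch, this is the kind of pattern the relevant clippy lint (`clippy::or_fun_call`) flags; the function and names here are made up for illustration:

```rust
// clippy's `or_fun_call` lint warns here: the format! call runs
// (allocating a String) even when `opt` is Some, and suggests
// `opt.ok_or_else(|| format!("{} is missing", name))` instead.
fn describe(opt: Option<i32>, name: &str) -> Result<i32, String> {
    opt.ok_or(format!("{} is missing", name))
}

fn main() {
    assert_eq!(describe(Some(7), "x"), Ok(7));
    assert_eq!(describe(None, "y"), Err("y is missing".to_string()));
    println!("ok");
}
```

Running `cargo clippy` over a crate containing code like this points directly at the eager call site.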


#7

How well does perf work specifically for allocation tracking? I’ve mostly played with it for CPU perf counters purpose.


#8

I don’t use it for allocation tracking. I use it to tell me where time is being spent in my program. Allocations tend to show up there. :slight_smile:


#9

That’s fair :slight_smile:. Alloc tracking is slightly different but for @rusty_ron’s case it sounds like it’s one and the same in the end.


#10

5 seconds is still a suspiciously long time, even with extra allocations. Are you measuring with the --release flag? In generic code it can make Rust literally a thousand times faster.


#11

He mentioned it's release mode. Note that the error string includes Debug formatting of the entire HashMap that's passed in, so this is a combination of allocation and formatting cost.
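To see how large that formatting cost is on its own, here is a rough sketch (a toy map of f64s standing in for the real entries; the ~50,000 size is taken from the original post). Debug-formatting the whole map produces hundreds of kilobytes of text per call:

```rust
use std::collections::HashMap;
use std::time::Instant;

fn main() {
    // Stand-in for the ~50,000-entry map from the original post.
    let mut entries: HashMap<String, f64> = HashMap::new();
    for i in 0..50_000 {
        entries.insert(format!("key{}", i), i as f64);
    }

    let now = Instant::now();
    // This is what the `format!("... {:?}", entries)` inside ok_or does:
    // it walks and prints every entry, allocating as it goes.
    let s = format!("{:?}", entries);
    println!("formatted {} bytes in {:?}", s.len(), now.elapsed());

    assert!(s.len() > 100_000);
}
```

Doing this on every one of the 300 calls, success or failure, easily accounts for multi-second timings.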


#12

Ah, I’ve missed that. That makes sense.


#13

Absolutely using --release:

C:\>cargo test test_coupon_construction --release -- --nocapture

Sadly I'm on Windows, not Linux, so I'll have to stick to timings rather than a specific tool for now.


#14

AFAIK, the closest thing to perf on Windows is a tool called Windows Performance Analyzer (WPA). You may want to check it out. I have never tried WPA itself, but its predecessor xperfview helped me a lot back when I was doing performance work on this OS.


#15

You could try a tool like PerfView.