When do macros result in better performance than inline functions in Rust?

I saw some comments on macros defined in Rust std such as:

saying that they "make a huge performance difference" (compared to inline functions).

But as I understand it, macros and inline functions (small and simple enough to be inlined) result in roughly the same code after macro expansion and compilation, and the macros in question do seem small enough. (The iterator macro might be considered fairly large, but the other macros used there are small.)

For example, I thought an inline function like this one:

#[inline(always)]
fn is_empty<'a, T>(iter: Iter<'a, T>) -> bool {
    if T::IS_ZST { 
        iter.end.addr() == 0 
    } else { 
        iter.ptr.as_ptr() as *const _ == iter.end 
    }
}

would work just as well as:

macro_rules! is_empty {
    ($self: ident) => {
        if T::IS_ZST { $self.end.addr() == 0 } else { $self.ptr.as_ptr() as *const _ == $self.end }
    };
}

when used in, e.g., /src/core/slice/iter.rs.html#129-137 (sorry, I can't put more than 2 links)

So my question is: where could the "huge" difference come from?

It comes from inlining quirks:

So before you try to speed up your code with macros, benchmark it thoroughly and always look at the generated low-level code.
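To make the "benchmark it" advice concrete, here is a rough timing sketch. Everything in it is made up for illustration (`is_even!`, `is_even_fn`, the two `count_even_*` functions and `time_it` are hypothetical names, not anything from std), and a wall-clock loop like this is far cruder than a real benchmark harness, so treat the numbers as a hint at best.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Macro candidate (hypothetical): expanded in place at the call site.
macro_rules! is_even {
    ($x:expr) => {
        $x % 2 == 0
    };
}

// Function candidate (hypothetical): relies on the inliner.
#[inline(always)]
fn is_even_fn(x: u64) -> bool {
    x % 2 == 0
}

fn count_even_macro(data: &[u64]) -> usize {
    data.iter().filter(|&&x| is_even!(x)).count()
}

fn count_even_fn(data: &[u64]) -> usize {
    data.iter().filter(|&&x| is_even_fn(x)).count()
}

/// Calls `f` at least once and `iters` times in total when `iters > 1`;
/// returns the elapsed time and the last result.
/// `black_box` keeps the optimizer from discarding the computed values.
fn time_it<R>(iters: u32, mut f: impl FnMut() -> R) -> (Duration, R) {
    let start = Instant::now();
    let mut last = black_box(f());
    for _ in 1..iters {
        last = black_box(f());
    }
    (start.elapsed(), last)
}
```

You would call it as something like `time_it(1_000, || count_even_macro(black_box(&data[..])))` for each candidate, in a release build, and compare the two durations over several runs. For a case this trivial I would expect no measurable difference, which is exactly why you then need to look at the generated code.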


Implement nth_back for slice::{Iter, IterMut} by timvermeulen · Pull Request #60772 · rust-lang/rust · GitHub

Your guess is as good as mine. One hypothesis we had last time was that inlining happens after some other passes -- so manually inlined code basically is already present earlier in the pipeline and gets simplified better.
You could try to compile your benchmark to LLVM IR with the old and new version of libstd and compare. You'll see that the function did get inlined, but the IR is still vastly different.
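If you want to try the IR comparison suggested in that quote on something smaller than libstd, a sketch like the following might be a starting point. Note that `RawIter`, `is_empty!`, `is_empty_fn` and the two `count_with_*` functions are simplified stand-ins I made up, not the real `core::slice::Iter` internals, so whatever you see here will not necessarily match what happens in std.

```rust
// Hypothetical stand-in for a slice iterator; NOT the real core::slice::Iter.
pub struct RawIter {
    ptr: *const u32,
    end: *const u32,
}

impl RawIter {
    pub fn new(slice: &[u32]) -> Self {
        let range = slice.as_ptr_range();
        RawIter { ptr: range.start, end: range.end }
    }
}

// Macro version: expanded at the call site before any optimization pass runs.
macro_rules! is_empty {
    ($it:expr) => {
        $it.ptr == $it.end
    };
}

// Function version: the same check, left to the inliner.
#[inline(always)]
fn is_empty_fn(it: &RawIter) -> bool {
    it.ptr == it.end
}

// `inline(never)` keeps each function as a separate symbol in the emitted IR.
#[inline(never)]
pub fn count_with_macro(slice: &[u32]) -> usize {
    let mut it = RawIter::new(slice);
    let mut n = 0;
    while !is_empty!(it) {
        // SAFETY: `ptr` is strictly before `end`, so one step lands inside
        // the slice or exactly one past its end.
        it.ptr = unsafe { it.ptr.add(1) };
        n += 1;
    }
    n
}

#[inline(never)]
pub fn count_with_fn(slice: &[u32]) -> usize {
    let mut it = RawIter::new(slice);
    let mut n = 0;
    while !is_empty_fn(&it) {
        // SAFETY: same reasoning as in `count_with_macro`.
        it.ptr = unsafe { it.ptr.add(1) };
        n += 1;
    }
    n
}
```

Compiling this as a library with something like `rustc -O --crate-type=lib --emit=llvm-ir` and diffing the bodies of the two `count_with_*` functions in the resulting `.ll` file should show whether the two versions end up identical in a given case.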

This is interesting, thank you for the suggestions! I'll read the PRs and do some benchmarking next time!
