Does `#[inline]` always optimize away fixed parameter values?

I have a helper function that I wrote to make my code much DRYer. But since it's used in multiple contexts that need to query different struct fields and that either require an additional code block to be run or not, the function has a number of parameters that are fixed values at the call sites (Option<T> with varying T, bool, &'static str, char).

By inlining the helper function, I try to get the compiler to optimize away the pattern matching and destructuring checks that are associated with the fixed parameters.

Is my assumption correct that the function body will be simplified at the call sites, eliminating the fixed parameters/parameter parts? Should I make sure to use #[inline(always)] instead?

#[inline] won't always do anything. Neither will #[inline(always)] always inline functions, though it usually does. But I really think you shouldn't worry about it too much. I don't know if rustc is smart enough to evaluate a match expression for constant arguments at compile-time, especially if that match expression is in another function, but I promise you that any performance loss would not be measurable.

It's hard to make recommendations without seeing your code. Do you mind sharing the call site and function signature so I can understand what's happening?


It absolutely is smart enough to evaluate a match expression in another function, but there's a catch: when the compiler decides whether inlining your function is a good idea, it looks at the function's size before inlining.

If your function is huge but shrinks to just a dozen instructions after inlining, then #[inline] often refuses to work. #[inline(always)] does work, but it makes the calling function look huge from the compiler's point of view, which reduces its chances of being inlined, too!

That means that if you need #[inline(always)] to inline your lowest-level function, you would also need #[inline(always)] on the functions that call it, and maybe on the next level up, too…
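To make the multi-level case concrete, here is a minimal sketch. All names (slice_len, whole_plane_len, row_len) are hypothetical stand-ins, not the code from this thread, and whether the branches really disappear depends on the optimizer and the build profile, so check the generated assembly if it matters.

```rust
/// Hypothetical stand-in for the helper: at every call site, `check` and the
/// `Some`/`None` shape of `row` are fixed.
#[inline(always)]
fn slice_len(row: Option<u32>, check: bool, height: u32, items_per_row: u32) -> Option<u32> {
    match row {
        Some(r) => {
            if check && r >= height {
                return None;
            }
            Some(items_per_row)
        }
        None => Some(items_per_row * height),
    }
}

/// Also `#[inline(always)]`, so its own callers get the inlined body as well.
#[inline(always)]
fn whole_plane_len(height: u32, items_per_row: u32) -> u32 {
    // With `None` and `false` visible after inlining, the optimizer can drop
    // the `Some` arm and the bounds check.
    slice_len(None, false, height, items_per_row).unwrap()
}

/// Row variant: only the bounds check on `row` has to remain at run time.
#[inline(always)]
fn row_len(row: u32, height: u32, items_per_row: u32) -> Option<u32> {
    slice_len(Some(row), true, height, items_per_row)
}
```

The wrappers carry the attribute as well because, as described above, a caller that has absorbed a large body may itself look too big to inline.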

Macros work in a much more straightforward manner, if you can accept them (if you have a dozen tuning arguments and want to support all combinations, this may produce thousands of functions, which has issues of its own).
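A rough sketch of the macro route, assuming a simplified, hypothetical Surface with only two planes and invented field names. Each macro invocation stamps out a separate function, so nothing depends on the inliner:

```rust
/// Hypothetical, simplified surface with two planes.
struct Surface {
    y_pitch: u32,
    y_height: u32,
    u_pitch: u32,
    u_height: u32,
}

/// Generates one function per plane. The field names are substituted at
/// expansion time, so no `match` on a plane selector exists at run time.
macro_rules! plane_bytes_fn {
    ($name:ident, $pitch:ident, $height:ident) => {
        fn $name(surface: &Surface) -> usize {
            surface.$pitch as usize * surface.$height as usize
        }
    };
}

plane_bytes_fn!(y_plane_bytes, y_pitch, y_height);
plane_bytes_fn!(u_plane_bytes, u_pitch, u_height);
```

Every additional "tuning argument" multiplies the number of invocations, which is the combinatorial issue mentioned above.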

Sadly Rust doesn't yet have a good answer to the if constexpr which is used to solve that issue in C++.
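The closest thing available today is probably a plain `if` on a const generic parameter. A minimal sketch, with a hypothetical row_offset function: each instantiation is compiled separately with CHECK known, so the optimizer can remove the dead branch, but unlike if constexpr both branches still have to type-check for every instantiation.

```rust
/// Hypothetical helper: `CHECK` is fixed per instantiation, so
/// `row_offset::<false>` can be compiled without the comparison.
fn row_offset<const CHECK: bool>(pitch: usize, height: usize, row: usize) -> Option<usize> {
    if CHECK && row >= height {
        return None;
    }
    Some(pitch * row)
}
```

A call site would pick the variant explicitly, e.g. `row_offset::<true>(pitch, height, row)`.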


Once you've inlined to the level where most of the code can be culled out by constant folding, the general assumption is that any higher level function considering inlining will see the optimized version of the function which has been reduced to a small number of instructions.

It's not necessarily always the case[1], but most inlining is assumed to be done bottom-up and over the fully optimized version of the potential inlinee. (But before constant propagation from the outer function, and without doing recursive inlining, since that's assumed to already have been done.)

Top-down inlining can do better in some specific cases (e.g. a function which just calls another function after some trivial data shuffling), but bottom-up is by far the main form of inlining done by all optimizers.


  1. e.g. recursive functions obviously prevent this from happening perfectly


I receive a raw video frame buffer from AviSynth+ (which is 64-byte-aligned; but this thread isn't about alignment safety). The helper function is meant to simplify getting slices out of the raw buffer. A slice either covers the whole buffer or a single row.

I started with const generics, which don't allow &'static str, and then extended the signature with more fixed-at-call-site parameters. The current state is formally inconsistent. I'd like to use &'static str and get rid of const generics:

fn mut_slice_from_surface_etc<'a, B: Broker, I: MemItemType, const PLANE: char>(
    surface: &Surface<B>,
    row_index: Option<u32>,
    must_check_bounds: bool,
) -> Option<&'a mut [I]> {
    #![inline]
    //! Returns `None` only if `row_index` is `Some(_)` and out of bounds.

    let (mut ptr, pitch, mem_items_per_row, height) = match PLANE {
        'i' /* interleaved */ | 'r' | 'y' => (
            surface.i_r_or_y_plane_ptr,
            surface.i_r_or_y_plane_pitch,
            surface.i_r_or_y_plane_mem_items_per_row,
            surface.i_r_or_y_plane_height,
        ),
        'g' | 'u' => (
            surface.g_or_u_plane_ptr,
            surface.g_or_u_plane_pitch,
            surface.g_or_u_plane_mem_items_per_row,
            surface.g_or_u_plane_height,
        ),
        'b' | 'v' => (
            surface.b_or_v_plane_ptr,
            surface.b_or_v_plane_pitch,
            surface.b_or_v_plane_mem_items_per_row,
            surface.b_or_v_plane_height,
        ),
        'a' => (
            surface.a_plane_ptr,
            surface.a_plane_pitch,
            surface.a_plane_mem_items_per_row,
            surface.a_plane_height,
        ),
        _ => unreachable!(),
    };

    let len = if let Some(row_index) = row_index {
        // Bounds check.
        if must_check_bounds && row_index >= height {
            return None;
        }

        // Index.
        ptr = unsafe { ptr.offset(pitch as isize * row_index as isize) };
        mem_items_per_row as usize
    } else {
        // Whole plane. (It isn't expected that the same plane slice is requested repeatedly. So, this doesn't need to be optimized.)
        pitch as usize * height as usize / surface.bytes_per_mem_item as usize
    };

    Some(unsafe { slice::from_raw_parts_mut(ptr.cast::<I>(), len) })
}

Getting a row slice for an interleaved format like RGBA where all components of a pixel are stored in one location (for which I have a struct), with the pixels one after the other:

fn row_pixels_mut(&mut self, index: u32) -> Option<&mut [P]> {
    mut_slice_from_surface_etc::<_, _, 'i'>(self, Some(index), true)
}

Getting the whole buffer of the same frame format:

fn pixels_mut(&mut self) -> &'a mut [P] {
    mut_slice_from_surface_etc::<_, _, 'i'>(self, None, false).unwrap()
}

There are the similar functions row_components_mut() and components_mut() for planar frame formats where the pixel components are stored at separate memory locations (e.g., just red at one location). References to either the whole buffer or a single row are created like this:

impl<C: ComponentType> ComponentsMut<'_, C> for RgbComponentsMut<'_, C> {
    fn from_indexed_surface<B: Broker>(
        surface: &Surface<B>,
        row_index: Option<u32>,
        _chroma_row_index: Option<u32>,
    ) -> Option<Self> {
        #![inline]

        Some(Self {
            r: mut_slice_from_surface_etc::<_, _, 'r'>(surface, row_index, true)?,
            g: mut_slice_from_surface_etc::<_, _, 'g'>(surface, row_index, false).unwrap(),
            b: mut_slice_from_surface_etc::<_, _, 'b'>(surface, row_index, false).unwrap(),
            sig_bits: surface.sig_bits_per_component,
        })
    }
}

If you prefer, you could replace your const generics with trait generics. It should have equivalent performance, it might give you more flexibility, and it's arguably more idiomatic than encoding an enum as a generic char parameter.

trait Plane {
    fn get_ptr<B: Broker>(surface: &Surface<B>) -> *mut _;

    fn get_pitch<B: Broker>(surface: &Surface<B>) -> _;

    fn get_mem_items_per_row<B: Broker>(surface: &Surface<B>) -> _;

    fn get_height<B: Broker>(surface: &Surface<B>) -> u32;
}

struct IryPlane;

impl Plane for IryPlane {
    fn get_ptr<B: Broker>(surface: &Surface<B>) -> *mut _ {
        surface.i_r_or_y_plane_ptr
    }

    /* ... */
}

// TODO: impl. for the other planes...

fn mut_slice_from_surface_etc<'a, B: Broker, I: MemItemType, P: Plane>(
    surface: &Surface<B>,
    row_index: Option<u32>,
    must_check_bounds: bool,
) -> Option<&'a mut [I]> {
    let mut ptr = P::get_ptr(surface);
    let pitch = P::get_pitch(surface);
    let mem_items_per_row = P::get_mem_items_per_row(surface);
    let height = P::get_height(surface);
    
    todo!()
}

That looks interesting, and I hadn't thought about this. If I were to establish ...Plane structs though, I think I'd like to have them in Surface. But, as far as I can see, this would break being able to generically source the correct surface data for mut_slice_from_surface_etc() with your method, because the ...Plane struct, which would then have the respective fields, would need to be branched out of Surface at an earlier point. I don't see how this would still make it possible to write this generically:

fn row_components_mut(
    &mut self,
    index: u32,
) -> Option<<B::Components as Components<'a, C>>::Mut> {
    <B::Components as Components<'a, C>>::Mut::from_indexed_surface(
        self,
        Some(index),
        Some(index),
    )
}

Of the options, I still like best a function that gets condensed to something very short at the call sites. I'd now use #[inline(always)] on multiple levels, and a regular &'static str or char parameter instead of const generics. @CAD97's answer seems to convey that I can rely on the compiler optimizing this.

To be clear, the proposal is to directly replace the const generics with marker type generics without any runtime value. Essentially a const generic enum but without actually using const generics. For every "type" you define a trait, and every constant "value" you define a type. If you wanted to associate a surface with a certain set of planes, that can be achieved by more "strategy" type markers. If you want to bundle multiple "constant parameters" into a single "parameter", that requires defining a new type and trait impl for each set of parameters.
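As a small sketch of "a trait per type, a type per value", here is the must_check_bounds flag expressed as marker types. BoundsPolicy, Checked, Unchecked and row_start are hypothetical names made up for illustration:

```rust
/// The "type" of the constant parameter: one trait.
trait BoundsPolicy {
    const CHECK: bool;
}

/// The constant "values": one zero-sized marker type each.
struct Checked;
struct Unchecked;

impl BoundsPolicy for Checked {
    const CHECK: bool = true;
}

impl BoundsPolicy for Unchecked {
    const CHECK: bool = false;
}

/// Monomorphized once per policy, with `P::CHECK` known in each copy.
fn row_start<P: BoundsPolicy>(pitch: usize, height: usize, row: usize) -> Option<usize> {
    if P::CHECK && row >= height {
        return None;
    }
    Some(pitch * row)
}
```

No value of Checked or Unchecked is ever constructed; the type parameter alone selects the behavior, e.g. `row_start::<Checked>(pitch, height, row)`.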

But that's a lot of complexity for somewhat questionable gain. If you're absolutely certain that all "instantiations" with constant parameters are "small" after constant propagation, despite not being so before constant propagation, then that's what #[inline(always)] is for. This goes up the entire stack which stays generic, though; if any function "instantiation" is ever not "small", it shouldn't be #[inline(always)].

Generic functions are also an extra layer of fun, because generic instantiations are essentially always optimized in the context of whatever crate makes the root nongeneric instantiation, meaning they don't have the ability to inline any nongeneric functions which are not tagged #[inline] during primary compilation. (LTO offers a second chance where they can.)

Without knowing your full domain/design, I can't really say one way or the other on which I'd prefer. I'm a glutton for generic types, so I'd probably go with generics a bit earlier than probably advisable, at least if the generics can be encapsulated reasonably well. E.g., if at the public interface you have different functions for the different "constant parameters" and their use is just an implementation detail for deduplication. (But in that case, I'd also consider just using macros instead.) (Also, if they're not encapsulated, that's a big reason not to use #[inline(always)], because you no longer control the callsite to ensure that only constant parameters are used, so you don't get the constant propagation making your function "small" again.)

Defining "small" is the hard part. Inlining thresholds are a fairly arcane art, and in most cases the compiler can do better at it than you can; essentially the only exception is when you have a "large" function which eliminates a significant portion of itself via constant propagation of arguments to become "small" once more. (And it's possible to imagine an optimizing compiler which can handle this as well.)


Yeah, I've thought about the problem a little more and I agree. The // TODO comment is doing a lot of heavy lifting there; a working example would be far longer than what I provided. You could make it more manageable with a macro, but there are just better options available. For a function that's just an implementation detail, it would make more sense to provide a get_plane: impl FnOnce(&Surface<B>) -> (...) argument that extracts the necessary fields from surface. You could then wrap it in functions specific to each plane. Or, you could just pass an enum and hope the match gets evaluated at compile-time, which it sounds like it will if the function gets inlined.
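Roughly what I mean by the get_plane argument, as a sketch only: the Surface here is a simplified, hypothetical struct without the Broker parameter, and plane_bytes stands in for the real helper.

```rust
/// Hypothetical, simplified surface with two planes.
struct Surface {
    y_pitch: u32,
    y_height: u32,
    u_pitch: u32,
    u_height: u32,
}

/// Shared implementation detail: the closure picks the fields, so the helper
/// contains no `match` on a plane selector.
#[inline]
fn plane_bytes(surface: &Surface, get_plane: impl FnOnce(&Surface) -> (u32, u32)) -> usize {
    let (pitch, height) = get_plane(surface);
    pitch as usize * height as usize
}

/// Thin wrappers specific to each plane.
fn y_plane_bytes(surface: &Surface) -> usize {
    plane_bytes(surface, |s| (s.y_pitch, s.y_height))
}

fn u_plane_bytes(surface: &Surface) -> usize {
    plane_bytes(surface, |s| (s.u_pitch, s.u_height))
}
```

Each closure has its own anonymous type, so plane_bytes is monomorphized per wrapper, and the field selection is fixed at compile time in each copy.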

That works too!

I dislike that rust-analyzer in VS Code doesn't help anymore when writing macros.
