Joining two similar functions

I was wondering about this already a long time. Actually I wonder even more that this has not been asked here in the last 18 months since I am watching Rust closely. Maybe it is just me who has problems to see a good Rust solution. Well most people here are really smart, but maybe for some the answer might be useful, so I will post it here as a "weekend" question, see below.

I think one solution is using an additional generic closure parameter, which is then inlined by the compiler. That was proposed by an AI some months ago, and Claude 3.7 just confirmed it. First I had some doubts, and I consider it still a bit funny, but it seems to make sense. I will not post a link to the full Claude solution, as some people do not like AI that much, but the core of the suggestion was

fn walk_rook<F>(g: &Game, kk: KK, s: &mut KKS, filter: F) 
where 
    F: Fn(KK) -> bool,

Is this the best solution, or are there other ways to solve it? Here is the actual question:

I have the two Rust functions below, which are very similar.
The first one has the test "if wanted(kk)". When I know that this test
will evaluate to true, I call walk_rook2() for utmost performance.
This is fine, but the code duplication is a bit ugly. How can I avoid
the dublication and somehow join the two functions, while still obtaining
optimal performance?

fn walk_rook1(g: &Game, kk: KK, s: &mut KKS) {
    let mut i: usize = 0;
    let mut kk = kk;
    while {
        kk.di = g.rook_path[kk.si as usize][i].pos;
        kk.di
    } >= 0
    {
        if {
            kk.df = g.board[kk.di as usize];
            kk.df
        } == 0
        {
            i += 1;
        } else {
            i = g.rook_path[kk.si as usize][i].nxt_dir_idx;
        }
        if wanted(kk) {
            s.push(kk)
        }
    }
}


fn walk_rook2(g: &Game, kk: KK, s: &mut KKS) {
    let mut i: usize = 0;
    let mut kk = kk;
    while {
        kk.di = g.rook_path[kk.si as usize][i].pos;
        kk.di
    } >= 0
    {
        if {
            kk.df = g.board[kk.di as usize];
            kk.df
        } == 0
        {
            i += 1;
        } else {
            i = g.rook_path[kk.si as usize][i].nxt_dir_idx;
        }
        s.push(kk)
    }
}

Your code is pretty hard to read, since you aren't giving us any definition of the data types involved and you're using many single letter variables with single letter types and no further comments on what they mean.

That is true, but the actual code does not really matter. Indeed, first I was going to post a separate minimal, trivial example. But with such a trivial example, the reason for my question would be not that obvious. But well, I can do it. The code is from my tiny chess engine, e.g. tiny-chess at GitHub. Both AI's had no problem understanding my question, I think the first some months ago was DeepSeek or Quen 2.5, and I just asked Claude 3.7. Both suggested the closure solution, but I am always a bit sceptical if I can not easily verify an AI answer. I would have to look at the assembly, or benchmark.

OK, I will post a trivial example soon.

Here is a minimal example for my question:

/*
The first part of both functions are very similar.
Can we join them, without less performance.
E.g. with one single generic function with an additional const or closure parameter.
*/

fn prod1(a: i32, b: i32) -> i32 {
    let h = a * b; // some calculations
    h
}

fn prod2(a: i32, b: i32) -> i32 {
    let h = a * b; // some calculations
    if h < 100 { h + 1 } else { h }
}

fn main() {
    {
        let a = 7;
        let b = 13;
        let res1 = prod2(a, b); // call prod2 as we are not sure that result is larger than 100
    }

    {
        let a = 77;
        let b = 130;
        let res1 = prod1(a, b); // call faster prod1 as we are sure that result is larger than 100
    }
    println!("Done.");
}

I hope that makes it more clear.

1 Like

You want to take one piece of source code and encourage the compiler to produce two different optimized versions in machine code. The general technique is to write a generic function, which you have already done here — it is an appropriate solution. The main thing that you could do differently is use a custom trait other than Fn, but whether that is appropriate depends on what is convenient for the actual usage of walk_rook.

Another solution is to use a non-generic function and suggest to the optimizer that it should produce separately optimized versions of the common code, using inlining hints:

fn walk_rook1(g: &Game, kk: KK, s: &mut KKS) {
    walk_rook_shared(g, kk, s, true);
}
fn walk_rook2(g: &Game, kk: KK, s: &mut KKS) {
    walk_rook_shared(g, kk, s, false);
}
#[inline(always)]
fn walk_rook_shared(g: &Game, kk: KK, s: &mut KKS, use_wanted: bool) {
    ...
    if !use_wanted || wanted(kk) {
    }
    ...
}

This should almost always produce essentially the same final machine code as the generic version.

2 Likes

I am happy that you understood my actual issue. Actually, I was going to ask this since I ported my chess engine from Nim to Rust a year ago, but as the question is a bit difficult, I just waited, hoping someone other would ask a similar question.

This is the core! So the attribute #[inline(always)] and an additional bool parameter would produce two separately optimized versions? Really interesting -- I think the book of Jim Blandy has not mentioned this. But it makes sense, thanks.

I would say this is half of the entire point of inlining. Not that there are precisely identifiably “two versions” of some code, but that inlining enables the code of the callee to be optimized in context of how it is being used in the caller, which frequently enables radical simplifications.

(The other half is removing the overhead of function calls themselves — non-inlined function calls have to move data around to fit the function signature, and save register contents to the stack, but inlined functions can entirely avoid unnecessary moves.)

1 Like

inline(always) can have unintuitive effects by the way – maybe not in this particular use case necessarily, but in general it’s a complex situation (compare e.g. also recent discussions here).

In this case, I think the Fn-based solution seems quite natural. You "just" write

fn walk_rook(g: &Game, kk: KK, s: &mut KKS, cond: impl Fn(KK) -> bool) {
    let mut i: usize = 0;
    let mut kk = kk;
    while {
        kk.di = g.rook_path[kk.si as usize][i].pos;
        kk.di
    } >= 0
    {
        if {
            kk.df = g.board[kk.di as usize];
            kk.df
        } == 0
        {
            i += 1;
        } else {
            i = g.rook_path[kk.si as usize][i].nxt_dir_idx;
        }
        if cond(kk) {
            s.push(kk)
        }
    }
}

(IMO the impl Trait argument syntax is ideal/convenient for such use cases)

and call it either like walk_rook(…, wanted) and walk_rook(…, |_| false) directly, or define helper functions for these calls. Manual inlining annotations aren’t necessary in such cases, because generic functions are always inlineable anyway, and no-op wrappers (which helper functions for these calls are, because neither wanted as a function item nor |_| false as a closure have any run-time representation – they are just 0-sized types) are very clearly something the compiler/LLVM should always sufficiently optimize anyway.


As a side-note, another fun idea/alternative – in the direction of @kpreid’s bool-based approach, could be a const-generic argument, i.e. instead of

fn walk_rook_shared(g: &Game, kk: KK, s: &mut KKS, use_wanted: bool) {
    ...
    if !use_wanted || wanted(kk) {
    }
    ...
}

doing

fn walk_rook_shared<const USE_WANTED: bool>(g: &Game, kk: KK, s: &mut KKS) {
    ...
    if !USE_WANTED || wanted(kk) {
    }
    ...
}

Those also force separate monomorphization/compilation for each value (true and false in this case) by default anyway, so no inline would ever be necessary in this version, either. (That being said, even with the use_wanted: bool parameter, it’s still possible that LLVM’s optimizer decides that inlining is the correct approach anyway; it not actually definitely necessary in this case either.)

4 Likes

Thanks for your detailed reply. Actually I was not aware that a function signature like

fn walk_rook(g: &Game, kk: KK, s: &mut KKS, cond: impl Fn(KK) -> bool) {

is already enough to generate multiple instances in the machine code, as generics do. (monomorphization).

That is more compact than the generic solution suggested by Claude like

fn walk_rook<F>(g: &Game, kk: KK, s: &mut KKS, filter: F) 
where 
    F: Fn(KK) -> bool,

I will study your full text in more detail tomorrow. (Actually, the label for this thread was chosen by me as "tutorial", because this is more about learning Rust, and not an actual question for fast help. But well, somehow it has flipped to "help".)

Yes, the tutorial label is usually used for people who have written some tutorial-like content themself (either as a text here, or e.g. linked externally as an article, video, etc…) and want to share it here.

Don’t worry about it too much, it’s just changed to help other readers with their expectations; you can click the pencil icon in the top right corner to see that in this case change of category from tutorials to help was made by @kpreid, though I had already considered the same change myself (but then got distracted or something…) Recategorizing and renaming all topics is something that all our forum members of trust level 3 “Regular” or above can do.

4 Likes

To be precise, impl Fn in a function parameter list is syntax sugar for a generic parameter <F: Fn> (except for lacking a name). They are compiled exactly the same.

4 Likes

Another thing to point out & re-iterate is that a function call like

walk_rook(…, wanted)

(to a function that expects not an fn(…) pointer but any Fn(…)-trait implementor)

will be working with function items, not function pointers

E.g. if you have two separate function items fn wanted1(…) {} and fn wanted2(…) {}, then calls to walk_rook(…, wanted1) and walk_rook(…, wanted2) will still result in two separately monomorphized versions of walk_rook that already know – hard-coded – what actual function to call at run-time, not in one common one working with function pointers at run-time.

See also e.g.:

(this listing of videos is coming from this reddit comment, but I also recall having watched both of these videos myself at some point)

4 Likes