Difference between `Box<dyn Fn(arg) -> res>` and `fn(arg) -> res`

Hi,

Let's say I have a struct that holds a pointer to a function, which is set during initialization in the new() call.
This function is then called from another method in the implementation.

I found 2 ways to do it and I struggle to understand the difference between them.

The first way is to hold a fn(arg) -> res.
The second way is to hold a Box<dyn Fn(arg) -> res>.

If I understand correctly, the second way has more indirection because of the Box and the trait object with its vtable.

But besides that (and I would appreciate a confirmation from someone who knows), I don't understand the pros and cons of each way, or whether they permit different things.

Please enlighten me !

Below is a toy example (link to playground):

use std::collections::HashMap;

struct Container {
    field: HashMap<String, i32>,
    get_func_1: fn(&Self, &str) -> i32,
    get_func_2: Box<dyn Fn(&Self, &str) -> i32>,
}

fn regular_get(obj: &Container, key: &str) -> i32 {
    obj.field[key]
}

impl Container {
    fn new(val: HashMap<String, i32>) -> Container {
        Container {
            field: val,
            get_func_1: regular_get,
            get_func_2: Box::new(regular_get),
        }
    }

    fn get_1(&self, key: &str) -> i32 {
        (self.get_func_1)(self, key)
    }

    fn get_2(&self, key: &str) -> i32 {
        (self.get_func_2)(self, key)
    }
}

fn main() {
    let mut c:HashMap<String, i32> = HashMap::new();
    c.insert("dog".to_string(), 123);
    let s = Container::new(c);
    println!("{} {} {}", 123, s.get_1("dog"), s.get_2("dog")); // 123 123 123
}

If you just need to store a fn item e.g. fn() {} then you can just use the corresponding lowercase fn() type.

However a boxed function (e.g. Box<dyn Fn()> and friends, note the uppercase Fn) also allows storing a pointer to a closure (e.g. ||{ }) in addition to a fn item and so is more flexible at the cost of a heap allocation.
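As a minimal sketch of that difference (the names `double`, `make_adder`, `apply_ptr` and `apply_dyn` are made up for illustration): a function item fits both types, while a closure that captures a variable only fits the boxed one.

```rust
// A plain function item; it coerces to the `fn(i32) -> i32` pointer type.
fn double(x: i32) -> i32 {
    x * 2
}

// Builds a closure that captures `n`. It cannot become a `fn(i32) -> i32`,
// but it can be stored behind a `Box<dyn Fn(i32) -> i32>`.
fn make_adder(n: i32) -> Box<dyn Fn(i32) -> i32> {
    Box::new(move |x: i32| x + n)
}

// Accepts function pointers only.
fn apply_ptr(f: fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

// Accepts anything callable behind a trait object.
fn apply_dyn(f: &dyn Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn main() {
    // The function item works with both representations.
    let boxed_item: Box<dyn Fn(i32) -> i32> = Box::new(double);
    println!("{}", apply_ptr(double, 21));
    println!("{}", apply_dyn(&*boxed_item, 21));

    // The capturing closure only works with the boxed one.
    println!("{}", apply_dyn(&*make_adder(2), 40));
}
```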


"more flexible at the cost of a heap allocation."

Note that a Box<dyn Fn()> doesn't make a heap allocation if the trait object it's storing has zero size — which is the case for function items and for closures that don't capture any variables. So while it does have some performance disadvantages (it has an extra indirection through the vtable to get to the code, and it is larger) it isn't specifically incurring an unnecessary heap allocation, compared to using a function pointer.
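A rough way to check this yourself is to compare sizes. The sketch below uses made-up helper names (`zero`, `boxed_item`, `boxed_non_capturing`, `boxed_capturing`, `payload_size`) and only looks at the size of the value behind the box, which is what would need heap storage; it does not observe the allocator directly.

```rust
use std::mem::{size_of, size_of_val};

fn zero() -> u64 {
    0
}

// Size in bytes of the value stored behind the box (the would-be heap data).
fn payload_size(f: &dyn Fn() -> u64) -> usize {
    size_of_val(f)
}

fn boxed_item() -> Box<dyn Fn() -> u64> {
    Box::new(zero)
}

fn boxed_non_capturing() -> Box<dyn Fn() -> u64> {
    Box::new(|| 0u64)
}

fn boxed_capturing(n: u64) -> Box<dyn Fn() -> u64> {
    Box::new(move || n)
}

fn main() {
    // A function pointer is one word, the boxed trait object is two
    // (data pointer + vtable pointer).
    println!("fn pointer: {} bytes", size_of::<fn() -> u64>());
    println!("Box<dyn Fn>: {} bytes", size_of::<Box<dyn Fn() -> u64>>());

    // Function items and non-capturing closures are zero-sized,
    // a closure capturing a `u64` needs room for that `u64`.
    println!("item payload: {}", payload_size(&*boxed_item()));
    println!("non-capturing payload: {}", payload_size(&*boxed_non_capturing()));
    println!("capturing payload: {}", payload_size(&*boxed_capturing(7)));
}
```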


To be precise, note that fn(arg) -> res also supports closures, but only as long as the closure doesn’t capture anything.
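For illustration, a small sketch (the function name `non_capturing` is hypothetical): a closure without captures coerces to a function pointer, while the capturing variant, left commented out here, is rejected by the compiler.

```rust
// A non-capturing closure coerces to a plain function pointer.
fn non_capturing() -> fn(i32) -> i32 {
    |x| x + 1
}

// The capturing variant does not compile, because there is nowhere in a
// bare function pointer to keep `n`:
//
// fn capturing(n: i32) -> fn(i32) -> i32 {
//     move |x| x + n
// }

fn main() {
    let f = non_capturing();
    let g = f; // function pointers are `Copy`, so `f` stays usable
    println!("{} {}", f(1), g(2));
}
```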


I didn't know that it supports closures that don't capture anything. Somehow it feels weird.
Why wouldn't it accept capturing closures as well?

To sum it up, when would you use fn(arg) -> res and when would you use Box<dyn Fn(arg) -> res>? What are the best practices?

A fn(arg) -> res is a function pointer, i.e. a pointer to some compiled code in the original binary file.

A Box<dyn Fn(arg) -> res> consists of three parts: One pointer to a (potentially empty) heap allocation, and two function pointers (bundled in a static vtable) that implement

  • the call functionality (and has the heap data as an additional argument), as well as
  • what happens when the Box<dyn Fn(arg) -> res> is dropped.

If you're being really accurate, the vtable also contains additional layout information.

A closure can capture variables, but if it does so, you’ve got to handle this extra data. fn(arg) -> res is nothing but a simple pointer to code; there’s no way to store captured variables with it. Box<dyn Fn(arg) -> res> can store the values of the captured variables on the heap, and also properly drop them eventually when the closure is no longer needed.
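One way to picture the "extra data" is to write the closure out by hand. This is only a rough model of what the compiler generates, and `AddN` and `make_closure` are hypothetical names: the captured variable becomes a field, and the body becomes a method that takes that data as an extra argument.

```rust
use std::mem::size_of_val;

// Hand-written model of the closure `move |x| x + n`:
// the captured variable `n` becomes a field.
struct AddN {
    n: i32,
}

impl AddN {
    // The closure body; it needs `self` to reach the captured data.
    fn call(&self, x: i32) -> i32 {
        x + self.n
    }
}

// The real closure, for comparison.
fn make_closure(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

fn main() {
    let by_hand = AddN { n: 2 };
    let closure = make_closure(2);
    println!("{} {}", by_hand.call(40), closure(40));
    // Both carry one `i32` worth of state, which a bare code pointer cannot hold.
    println!("{} {}", size_of_val(&by_hand), size_of_val(&closure));
}
```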

It’s basically more-or-less equivalent to something like (pseudo-code)

struct Box<dyn Fn(Foo) -> Bar> {
    data: *const u8, // type erased, i.e. basically a "void pointer"
    //    ^^^ to be more precise this would need to be a non-null pointer,
    //        (and a unique pointer) but I'll ignore that for simplicity 
    vtable: &'static VtableOf<Box<dyn Fn(Foo) -> Bar>>,
}
struct VtableOf<Box<dyn Fn(Foo) -> Bar>> {
    on_drop: fn(*const u8),
    layout: Layout, // size and alignment of the heap data
    call: fn(*const u8, Foo) -> Bar,
    // to be precise, there’d also be two more function pointers
    // for `call_mut` and `call_once` (the `FnMut` and `FnOnce` methods)
    // included in this vtable, but I’ll ignore this for simplicity
}

and calling the closure, say as f(foo), does

(f.vtable.call)(f.data, foo)

while dropping f does essentially

(f.vtable.on_drop)(f.data);
if f.vtable.layout.size() > 0 {
    deallocate(f.data, f.vtable.layout);
}

Now, if the closure represented in this way doesn’t actually contain any captured data, then the layout size is zero, so there’s no allocated data and dropping it won’t deallocate anything. The on_drop function will also be a no-op, and the value of the data pointer is completely irrelevant.

This reduces the relevant parts of the Box<dyn Fn(Foo) -> Bar> to basically just the vtable.call function pointer (which still accepts an additional *const u8 data argument, but as I said, that pointer value is irrelevant i.e. it isn’t actually going to point anywhere, and it’s going to be ignored).

So in this case, there’s a small overhead compared to just using an fn(Foo) -> Bar because the data pointer itself still takes up some space, calling the closure will still have to dereference first the vtable reference, and then the call function pointer (that’s one more level of indirection), and dropping the thing will still call the on_drop function pointer (even though that doesn’t do anything [1]) and check the size to determine that nothing needs to be deallocated.
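For anyone who wants to play with this, here is a hand-rolled, runnable approximation of the pseudo-code, fixed to the signature Fn(i32) -> i32. It is a sketch under assumptions, not how the compiler actually lays things out: `ErasedFn`, `Vtable`, `VtableOf`, `drop_erased`, `call_erased` and `payload_size` are made-up names, the vtable only has the three entries discussed, and the allocation is done by going through an ordinary Box<F> first.

```rust
use std::alloc::{dealloc, Layout};
use std::marker::PhantomData;
use std::ptr;

// Hypothetical stand-in for the vtable of `dyn Fn(i32) -> i32`.
struct Vtable {
    on_drop: unsafe fn(*mut u8),
    layout: Layout, // size and alignment of the heap data
    call: unsafe fn(*const u8, i32) -> i32,
}

// Runs the destructors of the captured variables.
unsafe fn drop_erased<F>(data: *mut u8) {
    unsafe { ptr::drop_in_place(data as *mut F) }
}

// Calls the closure, passing the captured data as an extra argument.
unsafe fn call_erased<F: Fn(i32) -> i32>(data: *const u8, arg: i32) -> i32 {
    let f: &F = unsafe { &*(data as *const F) };
    f(arg)
}

// Provides one static vtable per concrete closure type `F`.
struct VtableOf<F>(PhantomData<F>);

impl<F: Fn(i32) -> i32 + 'static> VtableOf<F> {
    const VTABLE: Vtable = Vtable {
        on_drop: drop_erased::<F>,
        layout: Layout::new::<F>(),
        call: call_erased::<F>,
    };
}

// Hypothetical stand-in for `Box<dyn Fn(i32) -> i32>`.
pub struct ErasedFn {
    data: *mut u8, // type erased
    vtable: &'static Vtable,
}

impl ErasedFn {
    pub fn new<F: Fn(i32) -> i32 + 'static>(f: F) -> Self {
        // `Box::new` does not allocate for a zero-sized `F`; the pointer is
        // then dangling but non-null and well aligned.
        let data = Box::into_raw(Box::new(f)) as *mut u8;
        ErasedFn { data, vtable: &VtableOf::<F>::VTABLE }
    }

    pub fn call(&self, arg: i32) -> i32 {
        unsafe { (self.vtable.call)(self.data, arg) }
    }

    pub fn payload_size(&self) -> usize {
        self.vtable.layout.size()
    }
}

impl Drop for ErasedFn {
    fn drop(&mut self) {
        unsafe {
            (self.vtable.on_drop)(self.data);
            if self.vtable.layout.size() > 0 {
                dealloc(self.data, self.vtable.layout);
            }
        }
    }
}

fn main() {
    let offset = 10;
    let plain = ErasedFn::new(|x| x + 1); // captures nothing
    let capturing = ErasedFn::new(move |x| x + offset); // captures one i32
    println!("{} {}", plain.call(1), capturing.call(1));
    println!("{} {}", plain.payload_size(), capturing.payload_size());
}
```

The Drop impl is a direct transcription of the "dropping f" pseudo-code; for the non-capturing closure the size check skips the deallocation entirely.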


Regarding best practices, it’s really a question of use-cases. Do you need to support capturing closures? If that’s a definite no, then an fn pointer is a good choice. Also, fn(…) -> … pointers can be copied, which can be handy in some situations. Not sure which to choose, and Box<dyn Fn(…) -> …> works fine (because you don’t need to copy the thing)? Well, the overhead is minimal, so choosing Box<dyn Fn(…) -> …> is okay, too. There are even more choices with Box<dyn FnMut …> (if you don’t need to support multiple parallel calls) or even Box<dyn FnOnce …> if you only need to call the thing once. And then there are auto-traits… in a multi-threaded environment, you might need Box<dyn Fn(…) -> … + Send + Sync>. (For comparison, function pointers always support Send + Sync automatically, so there are no choices you need to make there.)
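To make those options a bit more concrete, here is a small sketch with one function per variant. All names (`shout`, `copy_twice`, `make_counter`, `make_greeting`, `run_on_thread`) are invented for the example.

```rust
use std::thread;

fn shout(s: &str) -> String {
    s.to_uppercase()
}

// Function pointers are `Copy`: `f` stays usable after being copied into `g`.
fn copy_twice(f: fn(&str) -> String) -> (String, String) {
    let g = f;
    (f("a"), g("b"))
}

// `FnMut`: the closure mutates its captured state, so it needs `&mut` access
// and can only be called by one caller at a time.
fn make_counter() -> Box<dyn FnMut() -> u32> {
    let mut n = 0;
    Box::new(move || {
        n += 1;
        n
    })
}

// `FnOnce`: the closure gives away its captured `String`, so it can be
// called only once.
fn make_greeting(name: String) -> Box<dyn FnOnce() -> String> {
    Box::new(move || name)
}

// The auto-trait bounds are what allow the boxed closure to cross threads.
fn run_on_thread(f: Box<dyn Fn(i32) -> i32 + Send + Sync>, x: i32) -> i32 {
    thread::spawn(move || f(x)).join().unwrap()
}

fn main() {
    println!("{:?}", copy_twice(shout));

    let mut counter = make_counter();
    counter();
    println!("{}", counter());

    println!("{}", make_greeting("hello".to_string())());
    println!("{}", run_on_thread(Box::new(|x: i32| x + 1), 41));
}
```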


  1. the case where on_drop would do something despite a zero-sized allocation (i.e. no allocation at all) is when the closure captures some zero-sized data that has a non-trivial destructor ↩︎
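That footnote case can be observed with a zero-sized type that has a Drop impl. In this sketch, `Noisy`, `DROPS` and `make_noisy_closure` are hypothetical names, and the counter is just a convenient way to see the destructor run.

```rust
use std::mem::size_of_val;
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

// Zero-sized, but with a destructor that has an observable side effect.
struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::Relaxed);
    }
}

fn make_noisy_closure() -> Box<dyn Fn()> {
    let noisy = Noisy;
    Box::new(move || {
        // Using `noisy` here makes the closure capture it (by value, due to `move`).
        let _keep = &noisy;
    })
}

fn main() {
    let f = make_noisy_closure();
    // Nothing to store on the heap...
    println!("payload size: {}", size_of_val(&*f));
    let before = DROPS.load(Ordering::Relaxed);
    drop(f);
    // ...and yet dropping the box still had to run a destructor.
    println!("destructor calls: {}", DROPS.load(Ordering::Relaxed) - before);
}
```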


Thank you so much for the detailed full explanation !
I learned so much from you.
It all makes sense now.

FWIW, if you are to rely on holding a zero-sized function item, then using:

&'static dyn Sync + Fn(...) -> ...
  • is clearer about its lack of heap allocation than Box<dyn Send + Sync + Fn...>,

  • is semantically very close to a fn(...) -> ..., but for having a less optimized layout (one more indirection).
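A minimal sketch of that idea, with made-up names (`StaticFn`, `DOUBLE`, `INCREMENT`, `Handler`). Note that the type needs parentheses once it is spelled out in full, i.e. &'static (dyn Fn(i32) -> i32 + Sync).

```rust
fn double(x: i32) -> i32 {
    x * 2
}

// The parentheses are required so that `+ Sync` applies to the trait object.
type StaticFn = &'static (dyn Fn(i32) -> i32 + Sync);

// Both a function item and a capture-less closure can live behind it,
// without any heap allocation.
static DOUBLE: StaticFn = &double;
static INCREMENT: StaticFn = &|x: i32| x + 1;

// Like a struct holding an `fn` pointer, this one is `Copy`.
#[derive(Clone, Copy)]
struct Handler {
    f: StaticFn,
}

impl Handler {
    fn call(self, x: i32) -> i32 {
        (self.f)(x)
    }
}

fn main() {
    let h = Handler { f: DOUBLE };
    let copy = h; // `h` is still usable afterwards
    println!("{} {}", h.call(21), copy.call(4));
    println!("{}", Handler { f: INCREMENT }.call(41));
}
```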


Finally, using an impl 'static + Send + Sync + Copy + Fn... would have the advantage of being truly zero-cost, but at the cost of infecting the API with that generic or existential type, so that instances using different functions have different types. (Indeed, the only way for the function information to take up no space and need no indirection at runtime is to encode it in the type system, so instances with different functions have to have different types.)
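Sketched with a generic struct (the names `Container`, `get_func`, `double` and `triple` are only for illustration), the trade-off looks like this:

```rust
use std::mem::size_of_val;

// The function is part of the type, not a runtime value.
struct Container<F: Fn(i32) -> i32> {
    get_func: F,
}

impl<F: Fn(i32) -> i32> Container<F> {
    fn get(&self, x: i32) -> i32 {
        (self.get_func)(x)
    }
}

fn double(x: i32) -> i32 {
    x * 2
}

fn triple(x: i32) -> i32 {
    x * 3
}

fn main() {
    let a = Container { get_func: double };
    let b = Container { get_func: triple };
    println!("{} {}", a.get(21), b.get(14));

    // Function items are zero-sized, so the field takes no space at all.
    println!("{}", size_of_val(&a));

    // The price: `a` and `b` have different types, so this would not compile:
    // let both = vec![a, b];
}
```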


Is there any case where &'static dyn Sync + Fn(...) has a practical advantage over fn(...)?

(Other than accepting custom impl Fn for MyType which isn't stable.)


Heh, besides Box::leaking, or using TAIT to get a const ...: impl Fn = ...; &... (or a custom impl Fn), not that I know of.

  • Although in the case of Box::leak, it does have the advantage of yielding a type-unified Copy + 'static callable that is nevertheless stateful, so it could have its niche uses. More seriously though, if downgrading the requirement from Copy to Clone, Arc<dyn Send + Sync + Fn...> does that job and without requiring leakage.
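Both variants, as a hedged sketch (`StaticFn`, `leak_adder` and `shared_adder` are hypothetical names): the leaked closure is Copy but its memory is never reclaimed, while the Arc version is only Clone but gets freed with the last handle.

```rust
use std::sync::Arc;

type StaticFn = &'static (dyn Fn(i32) -> i32 + Send + Sync);

// Leaks the allocation on purpose: the result is `Copy + 'static` and
// stateful, but the memory is never freed.
fn leak_adder(n: i32) -> StaticFn {
    let boxed: Box<dyn Fn(i32) -> i32 + Send + Sync> = Box::new(move |x: i32| x + n);
    Box::leak(boxed)
}

// Reference-counted alternative: `Clone` instead of `Copy`, no leak.
fn shared_adder(n: i32) -> Arc<dyn Fn(i32) -> i32 + Send + Sync> {
    Arc::new(move |x: i32| x + n)
}

fn main() {
    let f = leak_adder(2);
    let g = f; // plain copy of the reference
    println!("{} {}", f(40), g(1));

    let a = shared_adder(2);
    let b = Arc::clone(&a); // bumps the reference count
    println!("{} {}", a(40), b(0));
}
```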

For further illustration, it’s an interesting next step to look at some assembly.

Compiling

use std::sync::atomic::AtomicI32;
use std::sync::atomic::Ordering;

pub static X: AtomicI32 = AtomicI32::new(0);

pub fn foo() -> Box<dyn Fn()> {
    Box::new(move || { X.fetch_add(1, Ordering::Relaxed); })
}

Rust Playground <- (choose "SHOW ASSEMBLY" in the playground to generate it yourself)

produces (comments mine):

core::ops::function::FnOnce::call_once{{vtable.shim}}:
	movq	playground::X@GOTPCREL(%rip), %rax
	lock		addl	$1, (%rax)
	retq

// non-capturing closure, the on_drop is a no-op
core::ptr::drop_in_place<playground::foo::{{closure}}>:
	retq

playground::foo:
    // the Box<dyn Fn()> is returned in two parts in two registers
	leaq	.L__unnamed_1(%rip), %rdx // returns reference to the vtable
	movl	$1, %eax // and an irrelevant pointer (pointing to address 1, because it's non-null)
	retq

playground::foo::{{closure}}:
	movq	playground::X@GOTPCREL(%rip), %rax
	lock		addl	$1, (%rax)
	retq

// the static vtable
.L__unnamed_1:
    // on_drop
	.quad	core::ptr::drop_in_place<playground::foo::{{closure}}>
    // size 0 and alignment 1
	.asciz	"\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000"
    // call_once
	.quad	core::ops::function::FnOnce::call_once{{vtable.shim}}
    // call_mut
	.quad	playground::foo::{{closure}}
    // call
	.quad	playground::foo::{{closure}}

Compiling

pub fn foo(x: Box<&mut i32>) -> Box<dyn FnMut() + '_> {
    Box::new(move || { **x += 1; })
}

produces

// call_once executes the functionality (the addl increments,
// the other lines dereference a few times to get through the Box and the &mut)
// and also drops the captured variables (the Box<&mut i32>)
core::ops::function::FnOnce::call_once{{vtable.shim}}:
	movq	(%rdi), %rdi
	movq	(%rdi), %rax
	addl	$1, (%rax)
	movl	$8, %esi
	movl	$8, %edx
	jmpq	*__rust_dealloc@GOTPCREL(%rip)

// drops the captured variables (the Box<&mut i32>)
core::ptr::drop_in_place<playground::foo::{{closure}}>:
	movq	(%rdi), %rdi
	movl	$8, %esi
	movl	$8, %edx
	jmpq	*__rust_dealloc@GOTPCREL(%rip)

playground::foo:
	pushq	%rbx
	movq	%rdi, %rbx
	movl	$8, %edi
	movl	$8, %esi
    // allocates on the heap for the `Box<dyn FnMut>`
	callq	*__rust_alloc@GOTPCREL(%rip)
	testq	%rax, %rax
	je	.LBB2_1
    // move the `Box<&mut i32>` into the new allocation
	movq	%rbx, (%rax)
    // returns the vtable reference
	leaq	.L__unnamed_1(%rip), %rdx
	popq	%rbx
    // looks like %rax is used here for returning the data pointer,
    // so it already was in the right place from the __rust_alloc call
	retq

.LBB2_1:
	movl	$8, %edi
	movl	$8, %esi
	callq	*alloc::alloc::handle_alloc_error@GOTPCREL(%rip)
	ud2


// call_mut executes the functionality (the addl increments,
// the other lines dereference a few times to get through the Box and the &mut)
playground::foo::{{closure}}:
	movq	(%rdi), %rax
	movq	(%rax), %rax
	addl	$1, (%rax)
	retq

.L__unnamed_1:
	.quad	core::ptr::drop_in_place<playground::foo::{{closure}}>
    // size 8 and alignment 8 (`\b` is an escape for ASCII "backspace" and has ASCII code 8) 
	.asciz	"\b\000\000\000\000\000\000\000\b\000\000\000\000\000\000"
    // call_once
	.quad	core::ops::function::FnOnce::call_once{{vtable.shim}}
    // call_mut
	.quad	playground::foo::{{closure}}

Compiling

pub fn drop(_f: Box<dyn FnMut()>) {
    // just drops _f
}

pub fn call(f: &Box<dyn Fn()>) {
    f();
}

produces

alloc::alloc::box_free:
	movq	%rsi, %rax
	movq	8(%rsi), %rsi
	testq	%rsi, %rsi
	je	.LBB0_1
	movq	16(%rax), %rdx
	jmpq	*__rust_dealloc@GOTPCREL(%rip)

.LBB0_1:
	retq

// recall: 
// (f.vtable.on_drop)(f.data);
// if f.vtable.layout.size() > 0 {
//     deallocate(f.data, f.vtable.layout);
// }
playground::drop:
	pushq	%r15
	pushq	%r14
	pushq	%rbx
	movq	%rsi, %rbx
	movq	%rdi, %r14
    // calls f.on_drop
    // (%rsi contains f.vtable, and the on_drop is at offset zero)
    // I assume f.data is already in the right register (rdi) to be an argument
    // for this call (it's also been saved in %r14, to be used again later) 
	callq	*(%rsi)
    // prepare/pass f.layout.size to the dealloc call
	movq	8(%rbx), %rsi
    // if zero size, jump to returning below
	testq	%rsi, %rsi
	je	.LBB1_4
    // pass f.layout.alignment to the dealloc call
	movq	16(%rbx), %rdx
    // pass f.data to the dealloc call
	movq	%r14, %rdi
	popq	%rbx
	popq	%r14
	popq	%r15
    // call dealloc with tail call optimization
	jmpq	*__rust_dealloc@GOTPCREL(%rip)

.LBB1_4:
	popq	%rbx
	popq	%r14
	popq	%r15
	retq
    // I don't REALLY understand how the code below is supposed to be reachable...
    // but I also don’t really know how unwinding works in Rust at all,
    // so perhaps you somehow get here whenever the `f.on_drop` call above panics!?
	movq	%rax, %r15
	movq	%r14, %rdi
	movq	%rbx, %rsi
	callq	alloc::alloc::box_free
	movq	%r15, %rdi
	callq	_Unwind_Resume@PLT
	ud2

// recall: It's (f.vtable.call)(f.data, foo)
// where foo is nothing in this case (type `()`)
playground::call:
    // get f.data into rax
	movq	(%rdi), %rax
    // get f.vtable into rcx
	movq	8(%rdi), %rcx
    // put f.data into rdi to pass it as an argument to the call below
    // (compare previous assembly examples where the call_mut method did expect
    // data in rdi)
	movq	%rax, %rdi
    // call vtable.call
    // (remember from earlier assembly, it’s in the end after
    // on_drop, the layout, call_once, and call_mut
    // a total of 5 word-size things (layout has two parts)
    // and 5*8 == 40)
	jmpq	*40(%rcx)

This one has the downside that it always involves allocation, even for zero-sized types.


True. But if we want:

  • a 'static closure type (i.e., no borrows),

  • that is cheaply cloneable,

  • that does not incur a heap allocation for zero-sized callables such as capture-less closures or function items,

  • that does not involve double indirection for the Arc<{ closure }> case[1],

then hand-rolling such a trait object becomes necessary, which can get unwieldy.

  • Especially when wanting to support optional Send / Sync.


  1. There should otherwise be a way to unify impl Clone + Fn… and Arc<impl Fn…> under the same helper trait to then unify both behind a Box, without involving all the nitty-gritty details of a hand-rolled trait object. ↩︎

  2. I have used an unstable Unsize bound for convenience, but the four combinations of dyn 'usability $(+ Send)? + $(Sync)? can be covered through macros as well. ↩︎

underscore+number identifiers should be illegal

If only Arc didn’t expose the strong_count and weak_count. And Weak pointers in general, too, I guess (because those become observably unusable once all strong references are dropped). Then it could arguably be specialized to be allocation-free for zero-sized Copy types. Ah, and functionality like get_mut is problematic, too. But make_mut should be okay.
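To illustrate why the counts get in the way (a small sketch, with `counts` as a made-up helper name): even for a zero-sized closure, the strong count is observable and changes per clone, so it has to be stored somewhere shared.

```rust
use std::mem::size_of_val;
use std::sync::Arc;

// Returns the strong count before and after cloning an `Arc` that holds
// a zero-sized, capture-less closure.
fn counts() -> (usize, usize) {
    let a: Arc<dyn Fn() + Send + Sync> = Arc::new(|| {});
    let before = Arc::strong_count(&a);
    let b = Arc::clone(&a);
    let after = Arc::strong_count(&a);
    drop(b);
    (before, after)
}

fn main() {
    let a: Arc<dyn Fn() + Send + Sync> = Arc::new(|| {});
    // The closure itself needs no storage...
    println!("payload size: {}", size_of_val(&*a));
    // ...but the reference counts do.
    println!("{:?}", counts());
}
```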


Maybe that’s an interesting idea for a crate…


(Note that specializing on Copy is even possible on stable: playground)

