Should I borrow arrays, or pass them "by value"?

Say I've got a function that takes in several arrays (compile-time fixed length), most of which are not mutable:

use std::iter::zip;
use crate::types::*; // type FixedWeight = [u8; 870];

#[inline]
fn dec_write_e(
    out: &mut FixedWeight,
    m: u8,
    s: FixedWeight,
    e: FixedWeight
) {
    for (xi, (si, ei)) in zip(out.iter_mut(), zip(s.iter(), e.iter())) {
        *xi = (!m & si) | (m & ei)
    }
}

/* ... */

dec_write_e(
    (&mut preimage[1..1+FixedWeight::LEN]).try_into().unwrap(),
    m,
    s,
    e
);

This function seems to work just exactly the same (except that the signature changes) if I write it with borrows instead:

use std::iter::zip;
use crate::types::*; // type FixedWeight = [u8; 870];

#[inline]
fn dec_write_e(
    out: &mut FixedWeight,
    m: u8,
    &s: FixedWeight,
    &e: FixedWeight
) {
    for (xi, (si, ei)) in zip(out.iter_mut(), zip(s.iter(), e.iter())) {
        *xi = (!m & si) | (m & ei)
    }
}

/* ... */

dec_write_e(
    (&mut preimage[1..1+FixedWeight::LEN]).try_into().unwrap(),
    m,
    &s,
    &e
);

I'm wondering: what are the implications of choosing between these two ways of writing the function?

I assume they both compile down to the same thing, but I'm wondering what circumstances would make me wish I'd chosen one or the other.

The compiler is allowed to change what actually happens. The general rule of thumb is to only move it (pass by value) if that makes sense for the purpose of the function. If you don't need ownership of the array, you are probably better off with a reference. Otherwise, you force the user of the function to either give it away or clone it for no apparent reason.

I guess that in my case I don't "need ownership".

But... how would I know if I did? I think I might be misunderstanding the point of a non-mut borrow.

First, &s: FixedWeight should be s: &FixedWeight.

Passing the array by value means handing the function its own copy, typically on the stack, so taking a reference allows passing in a Box<FixedWeight> or Arc<FixedWeight> without copying. You can also pass things that need only a cheap type conversion, like Vec<u8> or Box<[u8]>, without copying. In this particular case, it also allows passing the same bytes for both s and e. It could also be cheaper when the function is not inlined, for example if it is converted to a function pointer. And for large arrays (usually >1000 bytes), taking a reference reduces the chance of overflowing the stack.
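
A minimal sketch of the corrected signature and the kinds of callers a reference parameter permits. FixedWeight is redefined locally as an assumption (the original imports it from crate::types), and from_heap_sources is a hypothetical helper used only for illustration.

```rust
use std::convert::TryInto; // in the prelude since the 2021 edition
use std::iter::zip;
use std::sync::Arc;

// Assumption: stands in for the alias the original imports from crate::types.
type FixedWeight = [u8; 870];

// Borrowing version: `s` and `e` are only read, so shared references suffice.
fn dec_write_e(out: &mut FixedWeight, m: u8, s: &FixedWeight, e: &FixedWeight) {
    for (xi, (si, ei)) in zip(out.iter_mut(), zip(s.iter(), e.iter())) {
        *xi = (!m & si) | (m & ei);
    }
}

// Hypothetical caller whose inputs are held behind heap-allocating containers.
fn from_heap_sources(m: u8) -> Box<FixedWeight> {
    let boxed: Box<FixedWeight> = Box::new([0x0F; 870]);
    let shared: Arc<FixedWeight> = Arc::new([0xF0; 870]);
    let bytes: Vec<u8> = vec![0xAA; 870];

    let mut out: Box<FixedWeight> = Box::new([0; 870]);

    // Deref coercion turns &Box<FixedWeight> and &Arc<FixedWeight> into &FixedWeight.
    dec_write_e(&mut out, m, &boxed, &shared);

    // A slice converts to an array reference with a length check and no copy.
    let from_vec: &FixedWeight = bytes.as_slice().try_into().unwrap();
    // The same shared borrow may be passed for both `s` and `e`.
    dec_write_e(&mut out, m, from_vec, from_vec);
    out
}
```

With a by-value signature, each of these callers would have to dereference and copy all 870 bytes into the argument first.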

&T references allow you to look at something, &mut T references allow you to modify it, and a move takes the value away (unless it's Copy).

If you can do what you need with &T and no cloning, then that's generally fine. You get to inspect it and they get to keep it. If you find yourself cloning it to have a standalone value, it's better to take it by value. That gives the caller the choice to clone it or pass ownership.

There is no general answer. Passing by reference makes the call cheap, but if the function does a lot of work (many iterations), the cost of the call may be insignificant and passing by value may be better. Also, the compiler may figure it all out for you anyway.

That's why it's usually more helpful to go for what's more meaningful for the semantics and purpose of the function than for what the compiler may or may not do. It signals intent to the user, and also to the compiler.

(Thanks all for your input!)

For this function, s and e will never be the same. Specifically,

  • s is a borrowed reference from the ultimate caller (public API, so could be stack, heap, or any other addressable memory)
  • I think e will be on the stack as it's declared in the calling function with non-mut let e: [u8; 870]; and then initialized with the return of another (private) function that declares it let mut and fills it in programmatically in a loop.

That's the bit I'm not sure about. I definitely think I don't need cloning (at least, my arrays are more than 500 bytes, and the compiler isn't yelling at me about implicit cloning), so if the choice is between clone and borrow then I'll choose borrow.

But I'm still wondering about the choice between borrowing and this mere pass-by-value.


Exactly, at this point and with this trivial case (two sub-1KiB arrays) I have no doubt that the compiler is going to emit the same machine code and wash out any differences, since a pass-by-value and a pass-by-borrow are pretty much the same semantically here.

But I am trying to learn the "best practices" to write this code correctly now, so that later on when I have to write more complex functions I know the correct semantics.


(Hah, you sniped the words right out of my mouth! I had finished writing the entire above post just before your post came through.)

Note that, as an implementation detail and not something guaranteed, normal fns in Rust will actually pass large arrays by pointer in LLVM, and small ones as primitives. For example, i32::from_ne_bytes actually doesn't do anything in LLVM (https://rust.godbolt.org/z/xGvYhs8Gb) because the array is passed the same as the i32.
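
As a small illustration of that point, here is a sketch with a hypothetical wrapper named bytes_to_i32. Whether it compiles down to nothing depends on the optimizer and target, so treat that as an observation rather than a promise of the language.

```rust
// Hypothetical wrapper: takes a 4-byte array by value and reinterprets it
// as an i32 using the platform's native byte order.
pub fn bytes_to_i32(bytes: [u8; 4]) -> i32 {
    i32::from_ne_bytes(bytes)
}
```

A four-byte array has the same size as the i32, so passing it by value costs nothing extra here; an 870-byte array is a different matter.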

So you should do what makes sense from an ownership perspective, because rustc will try to pass them a nice way. If you think of it as giving ownership of those weights to the function, then pass it by value. If you think of it as just reading weights owned by someone else, then pass a reference.

For something like [u8; 870], LLVM will probably see the same thing either way.

I'm still a bit new to Rust, so I'm trying to figure out what exactly that is.

I think that s (which is a uint8 array with a fixed Hamming Weight) is not being "given" to the function, as it's an arrayref of part of a larger blob borrowed from the caller.

However, that is the last place that e is referenced, and it is the only place that e is referenced aside from its initialization in the calling function. So would it be semantically appropriate to consider e as being consumed?

I would generally suggest not treating the last one specially without good reason.

For example, I'm a big fan of

foo()?;
bar()?;
Ok(())

even when it's shorter to write

foo()?;
bar()

because I'd rather things be the same as much as possible, especially to make the diff nicer if it later becomes

foo()?;
bar()?;
qux()?;
Ok(())

So unless there's a performance reason to treat the last one differently -- like maybe if you can avoid memory allocation by passing String ownership -- I would say not to pay attention to whether this is currently the only thing you're passing it to.
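
To illustrate the kind of performance reason meant here, a sketch using a hypothetical Label type that needs to own its text. Given ownership, the constructor can reuse the caller's String buffer; given only a &str, it has to allocate a new one.

```rust
// Hypothetical type that stores an owned string.
pub struct Label {
    text: String,
}

impl Label {
    // Takes ownership: the caller's heap buffer is moved in, no new allocation.
    pub fn from_owned(text: String) -> Self {
        Label { text }
    }

    // Borrows: storing the text requires copying it into a fresh allocation.
    pub fn from_borrowed(text: &str) -> Self {
        Label { text: text.to_owned() }
    }

    pub fn text(&self) -> &str {
        &self.text
    }
}
```

A fixed-size array like [u8; 870] has no heap buffer to hand over, so this particular benefit of passing ownership does not apply to it.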

Here's some literature:

Caller decides where to copy and place data (C-CALLER-CONTROL)

The API guidelines are specifically about public interfaces, but they're usually good advice for private interfaces as well.

If a function requires ownership of an argument, it should take ownership of the argument rather than borrowing and cloning the argument.

// Prefer this:
fn foo(b: Bar) {
    /* use b as owned, directly */
}

// Over this:
fn foo(b: &Bar) {
    let b = b.clone();
    /* use b as owned after cloning */
}

If a function does not require ownership of an argument, it should take a shared or exclusive borrow of the argument rather than taking ownership and dropping the argument.

// Prefer this:
fn foo(b: &Bar) {
    /* use b as borrowed */
}

// Over this:
fn foo(b: Bar) {
    /* use b as borrowed, it is implicitly dropped before function returns */
}

This is not really correct since you should still pass small Copy types by value even if you don't need ownership, but your type isn't small.

clippy::large_types_passed_by_value

What it does

Checks for functions taking arguments by value, where the argument type is Copy and large enough to be worth considering passing by reference. Does not trigger if the function is being exported, because that might induce API breakage, if the parameter is declared as mutable, or if the argument is a self.

Why is this bad?

Arguments passed by value might result in an unnecessary shallow copy, taking up more space in the stack and requiring a call to memcpy, which can be expensive.

Example

#[derive(Clone, Copy)]
struct TooLarge([u8; 2048]);

fn foo(v: TooLarge) {}

Use instead:

fn foo(v: &TooLarge) {}

Configuration

This lint has the following configuration variables:

  • avoid-breaking-exported-api: Suppress lints whenever the suggested change would cause breakage for other crates. (default: true)
  • pass-by-value-size-limit: The minimum size (in bytes) to consider a type for passing by reference instead of by value. (default: 256)

Clippy puts the cutoff at 256 bytes, which is of course not a precise number, but is definitely less than your type. It also brings up another reason: passing by reference will make it harder to accidentally copy a value. The other way is to wrap a Copy type in a non-Copy wrapper:

struct E(FixedWeight);

fn dec_write_e(e: E)

Then you can have the initialization return E instead of FixedWeight. This also helps you keep e and s from being accidentally swapped.
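
A fuller sketch of the wrapper idea, under some assumptions: FixedWeight is redefined locally, make_e is a hypothetical stand-in for the private initialization function, and the loop body is taken from the original question.

```rust
// Assumption: stands in for the alias the original imports from crate::types.
type FixedWeight = [u8; 870];

// Newtype that deliberately does not derive Copy or Clone.
pub struct E(FixedWeight);

// Hypothetical initializer that returns the wrapper rather than the bare array.
fn make_e(fill: u8) -> E {
    E([fill; 870])
}

// `s` is a plain array reference and `e` is the wrapper, so swapping the two
// arguments at a call site is a type error.
fn dec_write_e(out: &mut FixedWeight, m: u8, s: &FixedWeight, e: E) {
    for (xi, (si, ei)) in out.iter_mut().zip(s.iter().zip(e.0.iter())) {
        *xi = (!m & si) | (m & ei);
    }
    // `e` is dropped here; the caller can no longer use the value it passed.
}
```

Because E is not Copy, a second use of the same e after the call is rejected by the compiler as a use of a moved value instead of silently copying 870 bytes.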

You should think of taking ownership when e logically cannot be used twice, regardless of how it's actually being used. Usually this happens when you're modifying e, but since that's not the case, perhaps you'd take ownership if it's some one-time initialization seed like a UUID, or if you're transforming it into a different representation that should be used instead of the original. Even so, since it's Copy, taking a reference isn't that much different.

It sounds like taking s by reference doesn't have any downsides.

The callee will see the same thing, but the caller may have to copy this if you pass by value, no? The caller can't just pass a pointer to their own data if they want to make sure it is unchanged because the calling convention doesn't guarantee it won't get changed in the pass-by-value case.

So borrowing seems like the only sensible option for large arrays of Copy types that you aren't going to change.
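
The semantic difference behind that concern can be sketched as follows, with hypothetical functions sum_by_value and sum_by_ref and a locally redefined FixedWeight. A by-value parameter belongs to the callee, which may modify it, so the caller's original has to survive independently; the optimizer may or may not manage to elide the copy.

```rust
// Assumption: stands in for the alias the original imports from crate::types.
type FixedWeight = [u8; 870];

// By value: the callee owns a private copy and may overwrite it freely.
fn sum_by_value(mut w: FixedWeight) -> u32 {
    w[0] = 0; // only affects the callee's copy
    w.iter().map(|&b| u32::from(b)).sum()
}

// By shared reference: the callee reads the caller's bytes in place
// and cannot modify them.
fn sum_by_ref(w: &FixedWeight) -> u32 {
    w.iter().map(|&b| u32::from(b)).sum()
}
```

Since the array is Copy, the caller can keep using its own value after calling sum_by_value, and that value is unaffected by the write to w[0].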
