Questions about idiomatic memory management

Is there some sort of convention on when functions should return a heap-allocated value vs. writing their result to a mutable buffer passed to them as an argument? Here's a rather simple example.

fn some_computation(arguments: Args) -> Vec<u8> {
    let mut result: Vec<u8> = Vec::new();
    // Some computations mutating result
    result
}

vs.

fn some_computation(arguments: Args, buffer: &mut [u8]) {
    // Same computations, but mutating buffer
}

If the function some_computation is only being used a few times, it doesn't really matter, but if this function is being called a large number of times, and the resulting value is dropped after being read once, the second version should be more efficient, given that heap memory is only being allocated once. So from a systems programming perspective, I can imagine why the second version is more common, and that is in fact what I've observed.

However, I come from a functional programming background, and I try to avoid functions with side effects as much as possible, which means I gravitate towards the first option. On the other hand, I would also like to avoid writing inefficient (and worse, non-idiomatic) code. What I'd like to know is whether there's some sort of convention/rule of thumb systems programmers have when it comes to deciding which version to write.

I would also appreciate any other advice about idiomatic Rust code that programmers used to functional languages might not be aware of. Thanks!

One way of adapting from a functional mindset to a "rustic" mindset is to think of a function that takes &mut T as an alternate way of writing a function that takes a T and returns a T. Since &mut T is a borrowed exclusive reference, it has to be returned to the caller, and in the meantime, changes can't be observed through any other reference. This keeps side effects much more manageable.

Some people have noted that the biggest problems with procedural programming do not come from mutation, nor from aliasing, but from the combination of both. Functional programming languages avoid this by strictly controlling mutation. Rust avoids it by strictly controlling aliasing.
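
To make the "&mut T is like T -> T" reading concrete, here is a minimal sketch. The function names and the checksum logic are made up for illustration; the point is only that both versions compute the same thing and differ in how the result gets back to the caller.

```rust
// Hypothetical helpers, for illustration only.

/// "Functional" style: take ownership of the value and hand back the updated one.
fn push_checksum_owned(mut data: Vec<u8>) -> Vec<u8> {
    let sum = data.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    data.push(sum);
    data
}

/// "Rustic" style: borrow exclusively, mutate in place; the borrow ends on return.
fn push_checksum_in_place(data: &mut Vec<u8>) {
    let sum = data.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    data.push(sum);
}
```

While the &mut borrow is live, the caller cannot read or write the Vec through any other path, so the mutation is as local as the move in the first version.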

There's a third option of returning an iterator impl Iterator<Item=u8> which doesn't require any heap allocation, and allows the caller to chain it with other iterators, extend some collection, etc.
However, not everything can be easily expressed as an iterator.
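
A rough sketch of the iterator option, assuming the computation can be phrased as a per-byte transformation (the doubling here is just a placeholder, not anything from the original question):

```rust
/// Hypothetical computation: lazily yields each input byte doubled (wrapping).
/// Nothing is allocated; the caller decides where the bytes end up.
fn some_computation<'a>(input: &'a [u8]) -> impl Iterator<Item = u8> + 'a {
    input.iter().map(|b| b.wrapping_mul(2))
}
```

The caller can then .collect() into a fresh Vec<u8>, .extend() an existing collection, or chain further adapters without any intermediate buffer.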

I'd suggest writing the first version, profiling it, and optimizing only if it really matters. You should make it work, make it correct, and then make it fast. If it does matter, taking a &mut Vec<u8> lets the caller reuse the allocation and still lets the function produce results of different sizes.
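
Here is a small sketch of the &mut Vec<u8> variant, assuming the function replaces the buffer's contents on each call. The body of some_computation and the total_over_many_calls helper are invented for the example.

```rust
/// Hypothetical computation: writes the bytes 0..n into `out`, replacing
/// whatever was there. The output length can differ from call to call.
fn some_computation(n: u8, out: &mut Vec<u8>) {
    out.clear(); // drops the old contents but keeps the allocation
    out.extend(0..n);
}

/// Calls the computation repeatedly with one buffer and sums every byte produced.
fn total_over_many_calls(sizes: &[u8]) -> u32 {
    let mut buffer = Vec::new(); // grows on first use, then gets reused
    let mut total = 0u32;
    for &n in sizes {
        some_computation(n, &mut buffer);
        total += buffer.iter().map(|&b| u32::from(b)).sum::<u32>();
    }
    total
}
```

Because Vec::clear leaves the capacity alone, later calls only allocate when a result is larger than anything seen so far.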

As for your original question, I see it as a trade-off between convenience on one hand, and performance plus flexibility on the other.

Standard library functions almost always take &mut [u8]. This places more burden on callers, who are now responsible for allocating buffers, but (as you say) it improves performance for some use cases. It makes sense for the standard library to provide the flexible version as a low-level building block. The convenient version can be built on top of the flexible version if needed (but not vice-versa).
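
As a sketch of how the convenient version can sit on top of the flexible one: the names, the seed parameter, and the fill logic below are all hypothetical, chosen only to have something to compute.

```rust
/// Flexible, low-level version: fills the caller's buffer and returns
/// how many bytes were written. Works with stack arrays, reused Vecs, etc.
fn some_computation_into(seed: u8, buffer: &mut [u8]) -> usize {
    for (i, slot) in buffer.iter_mut().enumerate() {
        *slot = seed.wrapping_add(i as u8);
    }
    buffer.len()
}

/// Convenient version built on top: allocates and returns an owned Vec.
fn some_computation(seed: u8, len: usize) -> Vec<u8> {
    let mut result = vec![0u8; len];
    let written = some_computation_into(seed, &mut result);
    result.truncate(written);
    result
}
```

Going the other way does not work: a wrapper around the allocating version would still allocate on every call, which is exactly what the buffer-taking API is meant to avoid.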

On the other hand, if you are not writing a library that needs to serve every possible use case, you can tailor your API to a particular workload. In application code, the convenient version might make more sense.

Thanks! That makes a lot of sense, given that I was only reading library code, and not application code.

I don't come from a functional programming background but that always sounds like a good plan to me when I hear it.

However, I don't consider your second example to have side effects. Unless you want to be really anal about it. It's a function, it has input args and it has output in some buffer. It is not touching anything that is not specified in its signature and it is not keeping any internal state. It will always do the same thing every time you call it.

The only difference is how the output comes out.

Which is what mbrubeck was saying, I think.

The "anal" part here, to my mind, is that you don't know which parts of that "buffer: &mut [u8]" output actually get mutated.

Which is something I was thinking about as I wrote code like this today:

let mut buf = [0; 1024];
...
something.read(&mut buf);
...
something_else.write(&buf);
...

What if that read() did not read the full length of buf?

What if that buf had some old data in it I have read previously?

Oops... I have just leaked some potentially sensitive info to the writer and caused a security risk.

The correct call to write() should be:

.write_all(&buf[0..n])

Where "n" is the number of bytes obtained by the read() call.
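
Put together, a minimal sketch of the safe pattern could look like this. The copy_one_chunk helper is hypothetical, and the 0xAA fill just stands in for stale data left over from an earlier read.

```rust
use std::io::{self, Read, Write};

/// Copies one chunk from `reader` to `writer`, forwarding only the bytes
/// that `read` actually filled in. Returns that byte count.
fn copy_one_chunk<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<usize> {
    // Pretend the buffer still holds stale data from an earlier read.
    let mut buf = [0xAAu8; 1024];
    let n = reader.read(&mut buf)?; // may be less than buf.len()
    writer.write_all(&buf[..n])?; // never send buf[n..]
    Ok(n)
}
```

Keeping the count returned by read() and slicing with it is the only thing that stops the stale tail of buf from reaching the writer.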
