Thoughts about using function input parameters also as outputs?

finlaydotb · December 23, 2022, 12:12pm

A bit of background. I never programmed in C or C++. I have programmed a great deal in Scala, TypeScript/JavaScript, a bit of Python and dabbled a bit in Haskell due to the Scala influence. It was while using Haskell that it became ingrained in me that it is best to treat functions as mathematical functions (nothing religious here, just best practice), where you have an input and the only thing the function does is compute an output.

And from a cognitive load point of view, I agree with this. It is easier when you know an argument to a function is just an input, and if you want something out of the function, you get it via the return value.

But I see code in Rust, especially code that has to deal with reading data, where a buffer variable is passed into a function, and the effect of that function is to populate this buffer variable.

Sometimes this same function will also have a return value!

I later also saw this seems to be a common practice in C too.

I personally find this to be confusing...or maybe I am looking at it from the wrong perspective?

Why is such a pattern used instead of being considered bad practice? And are there specific cases where it makes the most sense?

stonerfish · December 23, 2022, 12:30pm

If you pass a pointer to a buffer into a function and inside that function modify the buffer then you only need to pass a pointer into the function.
If you pass the whole buffer into the function then read back the buffer from the function at the end, you are potentially passing a large chunk of buffer memory around.

VorfeedCanal · December 23, 2022, 12:37pm

Because Rust is a practical language and its goal is not to explore the theoretical possibilities of what can be done, but to deliver software which people can use to do real work.

Allocating memory for a buffer is expensive, so it's better to reuse it. It's as simple as that.

steffahn · December 23, 2022, 12:38pm

My personal view of this is that, in a “mathematical function” interpretation sense, a &mut T parameter to a function ought to be interpreted as passing a T in and getting a new T out. And in practice, it is almost equivalent, you can adapt either way, one of the adapters requiring a helper function as defined e.g. by the replace_with crate:

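// Takes the value by mutable reference and updates it in place.
fn foo<T>(r: &mut T) { todo!() }

// Adapter: expose `foo` as a by-value `T -> T` function.
fn foo_adapted<T>(x: T) -> T {
    let mut y = x;
    foo(&mut y);
    y
}

// Takes ownership of the value and returns the new value.
fn bar<T>(x: T) -> T { todo!() }

// Adapter: expose `bar` as an in-place `&mut T` function.
// Needs the `replace_with` crate; the process aborts if `bar` panics.
fn bar_adapted<T>(r: &mut T) {
    replace_with::replace_with_or_abort(r, bar)
}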


The only actual difference between the two signatures is the behavior when panicking. The &mut T will leave the T intact (though possibly in a weird / unexpected state) whilst in the T -> T version, the T would be dropped on panic. Since panics are often not caught (or only caught on a much higher level where the T would have been dropped either way), this distinction commonly doesn’t matter all that much.
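
The &mut T half of that panic behavior can be observed directly. The following is a minimal sketch, assuming a hypothetical function push_then_panic that mutates its argument and then panics; std::panic::catch_unwind stands in for the "higher level" that catches the panic. Neither helper name comes from the original post.

```rust
use std::panic::{self, AssertUnwindSafe};

// Hypothetical function: mutates its argument, then panics part-way through.
fn push_then_panic(v: &mut Vec<i32>) {
    v.push(1);
    panic!("something went wrong after the first push");
}

// Hypothetical helper: runs `push_then_panic`, catches the panic, and
// returns the vector that the caller still owns afterwards.
fn recover_after_panic() -> Vec<i32> {
    let mut v = vec![0];
    let result = panic::catch_unwind(AssertUnwindSafe(|| push_then_panic(&mut v)));
    assert!(result.is_err());
    // `v` is still here, in the half-updated state the callee left behind.
    v
}

fn main() {
    let v = recover_after_panic();
    println!("{v:?}");
}
```

Had push_then_panic taken the Vec by value and returned it, the vector would have been dropped during unwinding and there would be nothing left for the caller to inspect.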

Of course, &mut T return types are a bit harder to assign mathematical meaning [in terms of “pure” mathematical functions] to (though I’m certain someone has thought about how that might be possible). In fact, I believe that one should consider all rust functions that only do mutation through &mut T reference (or to local variables) as pure functions without side-effects. In my view, side-effects only come into play once IO is involved, or once “interior mutability” primitives are used.

Note that the behavior on panicking is really the only semantic difference for fn(&mut T) vs fn(T) -> T in Rust. This is unlike other languages like C++. It’s essential that &mut T promises exclusive access to its target, which allows us to interpret it as a side-effect-free operation. We don’t modify anything that anyone else could observe before the function returns, and this exclusive access is proven by static analysis… so, long story short, reasoning about &mut T parameters in Rust is just as easy as reasoning about passing around immutable values. In languages like C++, however, passing a pointer or reference always immediately introduces at least the possibility of shared mutable access, with side-effects affecting far-away parts of our program. It only stays easy to reason about if you yourself figure out that there was unique (un-shared) mutable access, but you’ll need to analyze your program yourself to come to that conclusion, and the compiler won’t help you. Which is a shame, because in practice you quite often do have exclusive access, so reasoning about programs becomes harder unnecessarily in such cases, just by the fact that the compiler doesn’t give you any certainty about whether or not you’re in the “easy to reason about” case. Maybe writing good comments in such cases can rectify the situation; that means more effort when writing the program, though it’s probably worth it.

The case of a &mut … argument being a buffer to be filled fits the easy-to-interpret case of &mut T being a parameter, not a return type, so the question that arises is: Why not pass a buffer in, and return it back modified? And maybe the first question before that: why pass in an empty buffer in the first place? The last question is easily answered: performance reasons. By passing in an existing buffer, one can possibly use existing capacity in the buffer (e.g. if it’s a String or a Vec<T>) and avoid re-allocations, particularly in case the buffer is cleared and re-used for multiple calls.
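
The clear-and-reuse pattern can be sketched with the standard library's BufRead::read_line, which appends into a caller-supplied String. The function name line_lengths is hypothetical and only there for illustration.

```rust
use std::io::BufRead;

// Hypothetical helper: returns the length of every line (without the line
// ending), reusing one `String` buffer for all calls to `read_line`.
fn line_lengths<R: BufRead>(mut reader: R) -> std::io::Result<Vec<usize>> {
    let mut line = String::new(); // allocated once, reused below
    let mut lengths = Vec::new();
    loop {
        line.clear(); // empties the string but keeps its capacity
        let n = reader.read_line(&mut line)?; // appends into the buffer
        if n == 0 {
            break; // end of input
        }
        lengths.push(line.trim_end().len());
    }
    Ok(lengths)
}

fn main() -> std::io::Result<()> {
    let lengths = line_lengths("ab\ncde\n".as_bytes())?;
    println!("{lengths:?}");
    Ok(())
}
```

Because read_line takes &mut String, the loop only allocates when a line is longer than any line seen before; an API returning a fresh String per line would allocate on every call.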

Why not pass it in and back out? I’d say there are four aspects I can come up with:

moving has some small overhead, too; &mut T is simply (slightly) more efficient than passing in and back out an owned T
convenience: mutable references are convenient to work with, especially when your functions are written in an imperative style and local variables shall be mutated. Then it’s convenient to be able to – say – call v.push(…) on your v: Vec<T> variable instead of needing some kind of v = v.push(…)
generality: there are the two alternatives T -> T and &mut T in the language; with that as a given, as you can witness above, it’s more straightforward (especially if you don’t like the possibility of aborting your program) to apply a fn(&mut T) in a situation where you need a T -> T transform than the other way. So APIs tend to use &mut T to be more generally useful
why have &mut T at all? As noted above, &mut T return types give functionality that’s hard to express differently. As far as I’m aware, in Haskell the lens package is able to provide somewhat comparable functionality to something like a &mut S -> &mut T function in Rust (and – admittedly – lots of other functionality that mutable references in Rust don’t provide), but that package is infamously hard to understand, so I believe the capabilities that mutable references provide are useful and easy to understand, so I’m glad we have them and use them. Examples for &mut S -> &mut T are e.g. accessor functions to private fields. Or e.g. array indexing (where an additional index argument is involved, too). I suppose the simplest equivalent of the fn(&mut Vec<T>, usize) -> &mut T indexing of Rust for a hypothetical Vec type in Haskell would need to look like Int -> (t -> t) -> Vec t -> Vec t? Ah wait, no, that one doesn't allow us to just read a value at that index, unlike the Rust function… guess I’ll have to re-read a lenses tutorial to freshen up on how to properly offer such an abstraction in Haskell….
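
The two &mut S -> &mut T examples named in the last point (a field accessor and indexing) might look roughly like this. Counter, hits_mut and nth_mut are hypothetical names chosen for the sketch.

```rust
// Hypothetical type with a private field.
struct Counter {
    hits: u32,
}

impl Counter {
    fn new() -> Self {
        Counter { hits: 0 }
    }

    // `&mut Counter -> &mut u32`: hands out mutable access to the field.
    fn hits_mut(&mut self) -> &mut u32 {
        &mut self.hits
    }

    fn hits(&self) -> u32 {
        self.hits
    }
}

// `fn(&mut Vec<T>, usize) -> &mut T`: indexing as a plain function.
// Panics if `i` is out of bounds, like `v[i]` does.
fn nth_mut<T>(v: &mut Vec<T>, i: usize) -> &mut T {
    &mut v[i]
}

fn main() {
    let mut c = Counter::new();
    *c.hits_mut() += 1; // write through the returned reference

    let mut v = vec![10, 20, 30];
    *nth_mut(&mut v, 1) += 5; // modify in place
    let second = *nth_mut(&mut v, 1); // the same function can also just read
    println!("{} {}", c.hits(), second);
}
```

The returned reference supports both reading and writing, which is the part that the Int -> (t -> t) -> Vec t -> Vec t shape cannot express on its own.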

There are certainly more aspects to how &mut T is useful, but I don’t have time to think them through entirely right now. One thing that sometimes comes up, and in particular often with the case of “buffers” you mentioned, is a parameter type like &mut [u8], which is a reference to an unsized type, so [u8] -> [u8] by-value is not even an option.
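
As a small illustration of the &mut [u8] point, here is a sketch of a function that fills whatever slice it is given. The name fill_counter and its behavior are made up for the example; what matters is that the same function works on a stack array, on part of a Vec, or on any other contiguous bytes, without owning any of them.

```rust
// Hypothetical helper: overwrites `buf` with 0, 1, 2, ... (wrapping at 256)
// and returns the number of bytes written.
fn fill_counter(buf: &mut [u8]) -> usize {
    for (i, byte) in buf.iter_mut().enumerate() {
        *byte = i as u8; // truncating cast, wraps after 255
    }
    buf.len()
}

fn main() {
    // A fixed-size array on the stack.
    let mut stack = [0u8; 4];
    fill_counter(&mut stack);

    // Only a sub-range of a heap-allocated Vec.
    let mut heap = vec![9u8; 6];
    fill_counter(&mut heap[2..5]);

    println!("{stack:?} {heap:?}");
}
```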

H2CO3 · December 23, 2022, 12:51pm

This is done for performance. If such a function always had to return a new allocation (eg. a Vec by value), then it would have to allocate on each call. Passing a buffer allows the caller to reuse the buffer across calls.

finlaydotb · December 23, 2022, 1:05pm

I think that captures it. Makes sense then.

m51 · December 23, 2022, 1:56pm

I've dabbled with Haskell as well. I loved the mathematical purity, but the reality is that not all problems can be efficiently modelled that way. FP/Immutability is a tool, but it isn't right for all jobs.

Trying to mimic mutable data structures via immutable structures leads to jumping through increasingly complex hoops and increasing performance penalties. A common requirement for a virtual machine is a large mutable array of memory. Modelling that in a purely functional immutable manner...yuck.

quinedot · December 23, 2022, 7:59pm

C has some type limitations about what is returned, or cultural limitations about what is idiomatic to return.

For instance you'll also see "out pointers", where you have something like a

int get_blob(blob ** ptr);

where the function allocates a new blob and alters *ptr to point at it and returns 0, or returns an error code. In Rust you would likely instead see

fn get_blob() -> Result<Blob, SomeErrorType>
// Or maybe `Option<Blob>`

but C doesn't have sum types.
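
A minimal sketch of what the Result-returning version could look like in full. Blob, BlobError and the source parameter are placeholders invented for this example (the signature above takes no arguments); the point is only that the value and the error travel through the return type, so no out pointer is needed.

```rust
// Placeholder types for the sketch.
#[derive(Debug, PartialEq)]
struct Blob {
    bytes: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum BlobError {
    EmptySource,
}

// Either a new `Blob` or an error; the caller cannot get at the blob
// without handling the error case.
fn get_blob(source: &[u8]) -> Result<Blob, BlobError> {
    if source.is_empty() {
        return Err(BlobError::EmptySource);
    }
    Ok(Blob { bytes: source.to_vec() })
}

fn main() {
    match get_blob(b"abc") {
        Ok(blob) => println!("got {} bytes", blob.bytes.len()),
        Err(e) => println!("failed: {e:?}"),
    }
}
```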

You might also see wonky C alternatives to

// Rust code
fn get_a_couple_thing(&self) -> (Foo, Bar) { /* ... */ }

struct InfrequentlyUsed(This, That);
fn get_things_another_way(&self) -> InfrequentlyUsed { /* ... */ }

due to C's lack of tuple types, and because there is less reluctance to create new types in Rust.
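
For a concrete (and made-up) instance of the tuple case: a C API would typically return the two results through two out pointers, while in Rust a function such as the hypothetical min_max below can simply return a pair.

```rust
// Hypothetical example: smallest and largest element, or `None` for an
// empty slice. Both results come back through the return value.
fn min_max(xs: &[i32]) -> Option<(i32, i32)> {
    let first = *xs.first()?;
    Some(
        xs.iter()
            .fold((first, first), |(lo, hi), &x| (lo.min(x), hi.max(x))),
    )
}

fn main() {
    if let Some((lo, hi)) = min_max(&[3, 1, 4, 1, 5]) {
        println!("min = {lo}, max = {hi}");
    }
}
```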

afetisov · December 24, 2022, 12:06am

There are major benefits from having a function behave like, well, a function, and return all output as its return value. You know that if you got the return value, then the function call didn't unwind and has successfully reached the end of execution. When reading a function's body, you know that you don't need to track all possible modifications of the input parameters; you just need to find all exit points to know what the function may return.

But it's also not practical to use in all circumstances. It kinda sorta works for Haskell, because Haskell doesn't care about efficiency, or access to low-level details, and because pure functions are all you have anyway. There is no point in religiously pursuing the same design in Rust: global effects are a way of life (including static variables, memory allocations, I/O etc), the &mut T references are a much better way to deal with mutable state than Haskell's StateT monad (which is just mutable state with extra steps), efficiency concerns mean that you can't always rely on the optimizer to do its magic and remove your tower of wrapping functions, you need to be able to directly encode the behaviour that you require.

Specifically for mutable output buffers, like in Read, it's not just a matter of improved efficiency. It may be literally impossible to provide a different API, because you may not own the buffer in the first place. Perhaps your function itself took it as an out parameter. Perhaps you got the buffer via FFI, so you can write to it, but you don't own it, and thus can't wrap it in a Vec or return a new buffer from the function. Perhaps the buffer is created by some custom hardened memory allocator, which is incompatible with Vec. Perhaps you are writing into the buffer self-referential types, which must never be default-moved after creation (otherwise their self-references would become invalidated), so you can't just put them into a Vec which can too easily reallocate its buffer. All of these concerns mean that out parameters are sometimes literally the only way to do what you want to do.
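
The FFI case can be sketched as follows, assuming a hypothetical C caller that hands over a pointer and a length. fill_greeting and write_greeting are invented names. The Rust side can write into the memory but never owns it, so wrapping it in a Vec or returning a new buffer is not an option.

```rust
use std::slice;

// Safe core: writes as much of the message as fits and returns the count.
fn write_greeting(out: &mut [u8]) -> usize {
    let msg = b"hello";
    let n = msg.len().min(out.len());
    out[..n].copy_from_slice(&msg[..n]);
    n
}

/// Hypothetical C-callable entry point.
///
/// # Safety
/// `ptr` must be null or valid for writes of `len` bytes, and nothing else
/// may access that memory during the call.
pub unsafe extern "C" fn fill_greeting(ptr: *mut u8, len: usize) -> usize {
    if ptr.is_null() {
        return 0;
    }
    // Borrow the caller's memory as a slice; ownership stays with the caller.
    let out = unsafe { slice::from_raw_parts_mut(ptr, len) };
    write_greeting(out)
}

fn main() {
    // Stand-in for a buffer owned by foreign code.
    let mut buf = [0u8; 8];
    let n = unsafe { fill_greeting(buf.as_mut_ptr(), buf.len()) };
    println!("{:?}", &buf[..n]);
}
```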

More generally, Rust is built around mutable references, and every &mut T is, in a certain sense, an out-parameter, because it is expected to get something new there at the end of the function. In that sense the design of Read::read isn't special in any way. The only difference between true out-parameters and general &mut T parameters is that out-parameters are expected to be passed in some dummy, likely unspecified state, which is not expected to be read, only overwritten with the real output. That is just a limitation of current Rust, and in the future there could be more direct ways to express the same logic.
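
Read::read is also an example of the "buffer parameter plus return value" combination from the original question: the bytes go into the buffer, and the return value says how many of them are valid. A minimal sketch, with count_bytes as a hypothetical helper and a deliberately tiny buffer:

```rust
use std::io::Read;

// Hypothetical helper: counts the bytes a reader produces, reusing one
// small fixed-size buffer for every call to `read`.
fn count_bytes<R: Read>(mut reader: R) -> std::io::Result<usize> {
    let mut buf = [0u8; 4];
    let mut total = 0;
    loop {
        // `read` overwrites (part of) `buf` and returns how many bytes it wrote.
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        // Only `buf[..n]` holds fresh data; the rest is left over from earlier calls.
        total += n;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    let total = count_bytes("hello world".as_bytes())?;
    println!("{total}");
    Ok(())
}
```

The buffer contents before the call are never read here, which matches the "dummy state that only gets overwritten" description of a true out-parameter.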

There is a bit of a middle ground, called placement by return (placement new). Its goal is to allow placing the output value of the function directly at the specified memory location, thus avoiding explicit out-parameters. It doesn't apply in all cases, and the most recent proposal has many outstanding issues, putting it in a bit of a limbo.

Overall, no, this pattern isn't bad practice, but it shouldn't be your first choice either. It does make your function a bit harder to use and reason about. Try to follow Haskell's functional design principles as much as is reasonable, but remember that Rust is a different language, which has different tools and idioms. Sometimes significantly better ones (like &mut T), and sometimes just different ones (like HKT vs GAT).

