API design for expensive allocations

I'm designing a library API and keep running into the same problem: a function needs to make an expensive allocation. I don't want to always require the user to pass a mutable borrow, since that is error-prone (the data might have the wrong size). The operation is not otherwise fallible, so I don't want to return a Result just for this kind of error (it should cause a panic instead).

If I allocate the data myself, the user doesn't risk causing a panic, so I have to offer an allocating version. But the user might need to call the function inside a loop, so allocating every time isn't an option either, not only because of the cost of calling the system allocator, but also because I might need to initialize the data's state.

My question is, if you were using the library, which of the following patterns would you prefer to use?

pub struct ExpensiveData(Vec<u8>);

Pattern 1 Have two functions and let the user choose between them:

fn compute(arg : i32) -> ExpensiveData;
fn compute_to(data : &mut ExpensiveData, arg : i32);

Pattern 2 Have a single function, and let the user decide by moving an Option into the function
(allocate internally if argument is None)

fn compute(data : Option<ExpensiveData>, arg : i32) -> ExpensiveData;

Call site would look like this (no allocation)

data = compute(Some(data), 1);

or like this (internal allocation):

let data = compute(None, 1);
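
For reference, here is a minimal sketch of how Pattern 2 could look when fleshed out. The constant LEN, the fill step standing in for the real computation, and the helper compute_in_loop are all assumptions made for illustration, not part of the original API.

```rust
/// Buffer length used for this illustration only (an assumption).
const LEN: usize = 1024;

pub struct ExpensiveData(Vec<u8>);

/// Pattern 2: take ownership of an optional buffer and hand it back.
///
/// Panics if a caller-supplied buffer has the wrong length.
pub fn compute(data: Option<ExpensiveData>, arg: i32) -> ExpensiveData {
    // Allocate only when the caller did not supply a buffer.
    let mut data = data.unwrap_or_else(|| ExpensiveData(vec![0; LEN]));
    assert_eq!(data.0.len(), LEN, "buffer has the wrong length");
    // Placeholder for the real computation.
    data.0.fill(arg as u8);
    data
}

/// Hypothetical caller: allocates once, then reuses the buffer each round.
pub fn compute_in_loop(rounds: i32) -> ExpensiveData {
    let mut data = compute(None, 0);
    for arg in 1..=rounds {
        // Move the buffer in and re-bind the returned value.
        data = compute(Some(data), arg);
    }
    data
}
```

The re-binding in the loop is the cost of this pattern: the caller has to remember to assign the result back to the same name.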

Pattern 3 Have a single function, and let the user decide by passing an optional mutable borrow
(allocate internally and return Some(data) if the argument is None; modify the argument and return None if it is Some(data)):

fn compute(data : Option<&mut ExpensiveData>, arg : i32) -> Option<ExpensiveData>;

Call site would look like this (internal allocation):

let data = compute(None, 1).unwrap();

or like this (no allocation):

compute(Some(&mut data), 1);

Pattern 1 seems fine to me. Rust uses it in the Clone trait.
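
To make the Clone comparison concrete: the trait has clone, which returns a new value, and clone_from, which overwrites an existing one. A small sketch using Vec<u8>, where clone_both_ways is a made-up helper name for illustration:

```rust
/// Hypothetical helper showing both forms of `Clone` side by side.
pub fn clone_both_ways(src: &Vec<u8>, reused: &mut Vec<u8>) -> Vec<u8> {
    // Allocating form: always produces a new Vec.
    let fresh = src.clone();
    // In-place form: overwrites `reused` and can keep its existing
    // allocation when that is possible.
    reused.clone_from(src);
    fresh
}
```

For Vec, the documentation recommends clone_from over assigning source.clone() because it avoids reallocation if possible, which is the same trade-off as compute vs. compute_to.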

Pattern 3 is odd. Returning Cow could make it a bit more useful.

And of course use these patterns only if you have to allocate the data all at once. Otherwise an Iterator gives users freedom in how to allocate the data.

I don't think returning an iterator would help unless we can split ExpensiveData up into smaller pieces. Otherwise you run into the same problem std::io::Lines has where it's allocating a new String for every line.

There's also a variation of pattern 1 where compute always takes &mut self, but you don't have to compute upon creation. Something like

impl ExpensiveData {
    pub fn compute(&mut self, input: i32) { /* ... */ }
    pub fn new() -> Self { Self::default() }
    pub fn new_with(input: i32) -> Self {
        let mut this = Self::new();
        this.compute(input);
        this
    }
}

That way if I only want to compute in a loop, I don't have to special-case the first iteration.
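
A rough sketch of what that loop could look like. The Default derive, the LEN constant, the resize-then-fill body, and the last_result helper are assumptions for illustration only.

```rust
/// Buffer length used for this illustration only (an assumption).
const LEN: usize = 1024;

#[derive(Default)]
pub struct ExpensiveData(Vec<u8>);

impl ExpensiveData {
    /// Creates an empty value; nothing is allocated yet.
    pub fn new() -> Self {
        Self::default()
    }

    pub fn compute(&mut self, input: i32) {
        // Grows the buffer on the first call; later calls reuse it.
        self.0.resize(LEN, 0);
        // Placeholder for the real computation.
        self.0.fill(input as u8);
    }
}

/// Hypothetical caller: every iteration looks the same, including the first.
pub fn last_result(inputs: &[i32]) -> ExpensiveData {
    let mut data = ExpensiveData::new();
    for &input in inputs {
        data.compute(input);
    }
    data
}
```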

This may not be a good fit if you need to support other operations that are only valid after compute.

You make a good point about the standard library using (1). But I'm using it a lot, to the point where I'm questioning if there isn't a better solution (using it everywhere means duplicating or cross-referencing documentation to the same functionality all the time).

The motivation for (3) is that I'd like to pass the argument inside an Option as in (2), but here the user isn't required to bind the moved variable to the same name after the function is called. By using a Cow instead, do you mean something like:

fn compute(data : Option<&mut ExpensiveData>, arg : i32) -> Cow<ExpensiveData>;

// Call site
compute(Some(&mut data), 1);
let out = compute(None, 1).into_owned();

That makes sense. The user could simply ignore the returned value if the data is passed as the &mut argument, and using into_owned() beats using unwrap() (no need to consider a panic). I'll consider it.
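
Here is one way the Cow variant might be written out, as a sketch only. It assumes ExpensiveData implements Clone (Cow requires it), and LEN and the fill helper are invented for the example.

```rust
use std::borrow::Cow;

/// Buffer length used for this illustration only (an assumption).
const LEN: usize = 1024;

// `Cow` needs `ToOwned`, which `Clone` provides.
#[derive(Clone, Debug, PartialEq)]
pub struct ExpensiveData(Vec<u8>);

/// Placeholder for the real computation; panics on a wrong-sized buffer.
fn fill(buf: &mut ExpensiveData, arg: i32) {
    assert_eq!(buf.0.len(), LEN, "buffer has the wrong length");
    buf.0.fill(arg as u8);
}

pub fn compute<'a>(data: Option<&'a mut ExpensiveData>, arg: i32) -> Cow<'a, ExpensiveData> {
    match data {
        Some(buf) => {
            fill(buf, arg);
            // Downgrade the mutable borrow to a shared one for the return value.
            Cow::Borrowed(&*buf)
        }
        None => {
            let mut buf = ExpensiveData(vec![0; LEN]);
            fill(&mut buf, arg);
            Cow::Owned(buf)
        }
    }
}
```

One thing to keep in mind: while the returned Cow is alive, the caller's buffer stays borrowed, so ignoring the result right away is the comfortable way to use the Some(&mut data) form.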

In my case I'm working with a fixed-size dynamically-allocated buffer, so the iterator API doesn't apply.

This looks ok to guarantee the desired operation is executed on initialization. My problem is slightly different: I would like to implement a function that calculates a value and may or may not allocate and initialize the output itself. Similar to borrow::Cow, but for an owned value vs. a mutable borrow instead of an owned value vs. an immutable borrow.

Why not implement your own CowMut, then? We can add From implementations which let you convert a &mut T to a borrowed CowMut and a T into an owned variant.

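fn compute<'a>(arg: i32, data: impl Into<CowMut<'a, ExpensiveData>>) -> CowMut<'a, ExpensiveData> {
  // `mut` is needed so the DerefMut reborrow below compiles.
  let mut maybe_borrowed: CowMut<ExpensiveData> = data.into();
  let data: &mut ExpensiveData = &mut *maybe_borrowed;
  ...

  maybe_borrowed
}

fn main() {
  // Passing an owned value yields CowMut::Owned.
  let owned = compute(42, ExpensiveData::new());
  // Passing a mutable borrow yields CowMut::Borrowed.
  let mut allocated = ExpensiveData::new();
  let borrowed = compute(42, &mut allocated);
}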

enum CowMut<'a, T> {
  Borrowed(&'a mut T),
  Owned(T),
}

impl<'a, T> From<&'a mut T> for CowMut<'a, T> {
  fn from(borrowed: &'a mut T) -> Self { CowMut::Borrowed(borrowed) }
}

impl<T> From<T> for CowMut<'_, T> {
  fn from(owned: T) -> Self { CowMut::Owned(owned) }
}

impl<'a, T> Deref for CowMut<'a, T> {
  type Target = T;
  fn deref(&self) -> &T { ... }
}

impl<'a, T> DerefMut for CowMut<'a, T> { ... }
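
With the elided bodies filled in, the whole thing might look roughly like this. The Deref and DerefMut bodies are my guess at the obvious implementation, and LEN, the Default derive, and the resize-then-fill step are assumptions standing in for the real computation.

```rust
use std::ops::{Deref, DerefMut};

/// Buffer length used for this illustration only (an assumption).
const LEN: usize = 1024;

#[derive(Default)]
pub struct ExpensiveData(Vec<u8>);

pub enum CowMut<'a, T> {
    Borrowed(&'a mut T),
    Owned(T),
}

impl<'a, T> From<&'a mut T> for CowMut<'a, T> {
    fn from(borrowed: &'a mut T) -> Self {
        CowMut::Borrowed(borrowed)
    }
}

impl<T> From<T> for CowMut<'_, T> {
    fn from(owned: T) -> Self {
        CowMut::Owned(owned)
    }
}

impl<T> Deref for CowMut<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        match self {
            CowMut::Borrowed(borrowed) => &**borrowed,
            CowMut::Owned(owned) => owned,
        }
    }
}

impl<T> DerefMut for CowMut<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        match self {
            CowMut::Borrowed(borrowed) => &mut **borrowed,
            CowMut::Owned(owned) => owned,
        }
    }
}

pub fn compute<'a>(
    arg: i32,
    data: impl Into<CowMut<'a, ExpensiveData>>,
) -> CowMut<'a, ExpensiveData> {
    let mut maybe_borrowed: CowMut<'a, ExpensiveData> = data.into();
    // Both variants are handled through the same `&mut` view.
    let data: &mut ExpensiveData = &mut *maybe_borrowed;
    data.0.resize(LEN, 0);
    data.0.fill(arg as u8); // placeholder for the real computation
    maybe_borrowed
}
```

The function body never needs to know which variant it got, which is the main appeal over matching on an Option by hand.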

Looks like another option!

Now that I think of it, I could simply do:

fn compute(a : i32, out : Option<&mut ExpensiveData>)->ExpensiveData;

When the user passes None, allocate. When the user passes Some(&mut data), mutate it and return a default ExpensiveData that hasn't allocated yet (similar to Vec::default()). The user can then just ignore the returned value in this case. Would this be considered a common/acceptable way to solve this?

I'm really not a fan of this flow because it's just asking for people to mess things up. There's no way to know this behaviour by just looking at the function signature so people will be surprised when passing in an existing ExpensiveData gives them a useless result (and who properly reads the docs for every function before they call them, anyway?).

It'd be much better to split it into two functions.

fn compute(a: i32) -> ExpensiveData  {
  let mut data = ExpensiveData::default();
  compute_inplace(a, &mut data);
  data
}

fn compute_inplace(a: i32, data: &mut ExpensiveData) { 
  ...
}
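
If the buffer has a fixed size, as in the original question, the split could also carry the panic contract explicitly. This is only a sketch: LEN and the fill step are assumptions, and the allocating function builds a correctly sized buffer instead of calling default().

```rust
/// Buffer length used for this illustration only (an assumption).
const LEN: usize = 1024;

pub struct ExpensiveData(Vec<u8>);

/// Allocating version: builds the buffer itself, so the size check
/// in `compute_inplace` cannot fail here.
pub fn compute(a: i32) -> ExpensiveData {
    let mut data = ExpensiveData(vec![0; LEN]);
    compute_inplace(a, &mut data);
    data
}

/// In-place version for callers that reuse a buffer.
///
/// # Panics
///
/// Panics if `data` does not hold exactly `LEN` bytes.
pub fn compute_inplace(a: i32, data: &mut ExpensiveData) {
    assert_eq!(data.0.len(), LEN, "buffer has the wrong length");
    // Placeholder for the real computation.
    data.0.fill(a as u8);
}
```

That keeps the non-panicking path obvious from the signature, and the panicking one documented where people will actually see it.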

True. Splitting into two functions could be the most intuitive way for users to understand what is going on. Right now, I'm considering mixing this option with pattern (2) (moving the value into an Option argument and re-binding the return value to the same name when required). I'm not worried about the cost of the move, since the stack-allocated part of the struct is lightweight. But I haven't seen this one used a lot in Rust. To me, it seems to be an acceptable way to minimize duplication of functionality, at the price of the small annoyance of having to re-bind the returned value to the same name as the passed argument.