API design: clean versus performant?

Hello,
I'm wondering how to design an internal API.
I have a function that creates data points and I have an object that keeps summary statistics based on data points. The clean way to implement this would be something like this:

struct Statistics { 
  n_data_points: usize,
  sums: Vec<f64>,
  /* etc */
}
struct DataPoint {
  values: Vec<f64>
}
impl Statistics {
  fn add_data_point(&mut self, data_point: &DataPoint) { /* ... */ }
}
fn create_data_point() -> DataPoint { /* ... */}

and then use like this:

statistics.add_data_point(create_data_point())

But, I suspect it may be more performant to skip the creation of a DataPoint and directly write to the Statistics, i.e. instead of using create_data_point and add_data_point, implement a single function that creates the data and adds it to the Statistics:

fn create_and_record_data_point(statistics: &mut Statistics)

I'm wondering how much of a difference this might make? Or is the compiler smart enough to optimize it such that it doesn't make a difference?
Thanks!

Well, you're (under the newtype) passing a &Vec<f64> here, which is a code smell, as &[f64] is strictly more general and more performant as a function argument.

So I think there's a big question about what it means to be a data point. Could you have an owned data point and a reference data point? Could add_data_point just take something like impl Iterator<Item = f64> instead of a specific type so people can pass in data that comes from all kinds of different places?

The compiler typically can't optimize away allocations. Sometimes it can, but if you're worried about perf it's generally worth trying to avoid them in some way.

But it's repeated allocations and deallocations that cause the issue. Exactly as you have it might be completely fine if the caller of add_data_point is calling it in a loop so it can use the same Vec for every call.

Alternatively, you could consider something SoA-style where you write statistics for a single f64, and one could use a Vec<Statistics> if there's multiple things that need to have statistics, rather than the current approach of pushing the Vec into the type.

Another question to ask is: what value is DataPoint providing to your API? What confusions does it prevent or convenience does it provide, over just passing &[f64] in the relevant locations?

There are ways to make this work without imposing an allocation, but first ask if you need it at all.

Thanks for the response!
When you say allocations, you mean heap allocations, right? So, for example, if a function returns something like

struct ABCDE { a: f64, b: f64, c: f64, d: f64, e: f64 }

might it inline the function and optimize that struct way?
In my real code, a DataPoint contains more than a Vec<f64>, so I can't really replace it by [f64]. Well, technically I could, but it would be a bit of an ugly hack.
It would be nice to have an object DataPoint that contains all values related to the same event and is immutable, because it is easy to understand and prevents accidentally mixing values from different events.
An Iterator per se wouldn't work well, either, but it points to a possible solution: provide an object DataPointGen that lazily calculates the needed values.
The other solution would be a DataPointBuf that is mutable.
Thanks!

My real DataPoint is more than just a Vec<f64>. Let's assume it is

struct DataPoint { a: f64, b: Vec<f64> }

Creating a new DataPoint value for every event would be the most simple and easy to understand and ensure that data relating to a single event is always complete and not mixed with data from other events.

One option is

struct DataPoint<'a> {
    a: f64,
    b: &'a [f64],
}

This way, there are no extra allocations. The cost is that a DataPoint has to be treated like a reference — it is hard to keep around. If more flexibility than that is required, then

struct DataPoint<B> {
    a: f64,
    b: B,
}

impl<B> DataPoint<B>
where
    B: Deref<Target = [f64]>,
{ ... }

This allows DataPoint to contain Vec<f64> or &[f64]. You will see this pattern often in libraries that add an interpretation to bulk numeric data, like multidimensional arrays and images.

3 Likes

Right. Optimizers do a thing called Scalar Replacement of Aggregates (SRoA) that will remove structs entirely where possible. That's a critical part of what makes iterators so fast, for example.

Thus there's a world of difference between

#[derive(Copy, Clone)]
struct DataPoint { category: CategoryId, value1: f64, other: f64, whatever: f64 }

which one generally treats like it's free, and

// can't be `Copy`
struct DataPoint { category: CategoryId, values: Vec<f64> }

where the costs of allocating and freeing the memory in the Vec are extremely important if it's used heavily.

2 Likes

Note that you shouldn't take from this that “structs are expensive but some can be removed”. Structs are just names for arrangements of data in memory; they cost nothing more to create or use than their contents do. (So, for example, it's perfectly fine to have a complex struct containing many more structs to organize its contents; modifying your program to use fewer structs will not usually make it faster.)

SRoA, specifically, means (if I understand correctly) the idea that the compiler doesn't even have to keep the specific memory layout of the struct around, if nothing about the program requires it. That is, the contents of a struct that's stored in a variable can be rearranged in all the ways multiple local variables can be rearranged — including putting them in registers or deleting them entirely if superfluous.

Don't worry about extra structs. Do worry about extra allocations.

3 Likes

Right! But some one has to own the data, so I'd still need a Vec<f64> somewhere, and if I want to avoid a new allocation, then I'd have to recycle it. Otherwise, it's fine it being a reference, since I'm not planning to keep it around for long.

Yes, thanks!

With B: ?Sized, you also have the option of DataPoint<[f64]> (note the absence of &). The nice thing about that is &DataPoint<[f64]> no longer has double indirection. The main drawback is that you may have a hard time figuring out how to actually create one; it requires some nuanced unsafe unless you always start from an array of compile-time-known size.

Another thought: fn borrow(&self) -> DataPoint<&[f64]> can be easily implemented generically for all the various flavors of DataPoint and might make them more convenient and uniform to work with.