Porting C++ code generator to Rust


#1

Hi,

I’m trying to write a Rust code generator, starting from the already existing C++ code generator.

This C++ code generator (for instance…) generates the following code, which ultimately operates on non-interleaved audio input/output buffers coming from a “native” layer:

virtual void compute(int count, float** inputs, float** outputs) {
	float* input0 = inputs[0];
	float* input1 = inputs[1];
	float* output0 = outputs[0];
	float* output1 = outputs[1];
	for (int i = 0; (i < count); i = (i + 1)) {
		output0[i] = float(float(input0[i]));
		output1[i] = float(float(input1[i]));
	}
}

Even though the inputs/outputs parameters will ultimately be connected to “native” audio buffers, I would like to generate correct Rust types for them.

What would be the equivalent Rust type for the float** inputs, float** outputs parameters? I tried types like inputs: &[&[f32]], outputs: &mut [&mut [f32]], without much success so far.

Should I use a Vec<T> kind of type? Fixed-size arrays? Existing types from libraries?

Thanks for any advice.


#2

I’m not sure how much I am allowed to read into your generated example code, but it looks like you are just using pairs of Vectors/Slices.

fn compute(inputs: (&[f32], &[f32]), outputs: (&mut [f32], &mut [f32])) {
    // Check if the input is valid
    assert!(inputs.0.len() == inputs.1.len(), "Inputs need the same size.");
    assert!(outputs.0.len() == outputs.1.len(), "Outputs need the same size.");
    assert!(inputs.0.len() == outputs.0.len(), "Input and Output need same size.");

    for i in 0..inputs.0.len() {
        // The C++ float(float(x)) casts are no-ops on f32 samples
        outputs.0[i] = inputs.0[i];
        outputs.1[i] = inputs.1[i];
    }
}

If you use more than two channels (but a fixed size, known at code generation time) you could just use a longer tuple. If your channels actually have a meaning, you could define a struct which gives them useful names. (But then, is anyone ever reading the generated code?)

More idiomatic (to me) would be fn compute (inputs: ..) -> (Vec<f32>, Vec<f32>) { .. } but since you already have buffers, you shouldn’t allocate new memory for your output.


#3

The float** type in C++ actually means “any number” of float* channels. The size of each channel (the float* type) can change at runtime (this is the reason for the “count” parameter…)

Actual data will be already allocated elsewhere (usually in the native unsafe domain).

We also know that “inputs” is read-only (this could also have been expressed in the C++ type…), which will surely be helpful in Rust, and that “outputs” is mutable.


#4

A float** is roughly a 2D matrix, so I would have thought &[&[f32]] should work for you.

You’ll probably find it a pain to get the inner types to line up if you are trying to pass in a Vec<Vec<f32>>. Typically you’d fix this by using generics (e.g. fn compute<F>(inputs: &[F]) ... where F: AsRef<[f32]>). That way your function can accept a wide variety of input types, such as both Vec<Vec<f32>> and &[&[f32]].

playpen link
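
For readers without the link, here is a minimal sketch of what such a generic signature might look like. It assumes the per-sample work is a plain copy, as in the C++ example; the names compute, I, O and example_generic are illustrative and not taken from the linked code. AsMut is used for the outputs, by analogy with AsRef for the inputs.

```rust
/// Copies the first `count` samples of each input channel into the
/// matching output channel.
/// `I` and `O` are illustrative type parameters: anything that can be
/// viewed as a slice of f32 (Vec<f32>, &[f32], &mut [f32], ...).
fn compute<I, O>(count: usize, inputs: &[I], outputs: &mut [O])
where
    I: AsRef<[f32]>,
    O: AsMut<[f32]>,
{
    for (input, output) in inputs.iter().zip(outputs.iter_mut()) {
        // Slicing panics if a channel is shorter than `count`.
        let input = &input.as_ref()[..count];
        let output = &mut output.as_mut()[..count];
        output.copy_from_slice(input);
    }
}

/// Hypothetical usage helper: calls `compute` with owned vectors.
fn example_generic() -> Vec<Vec<f32>> {
    let inputs: Vec<Vec<f32>> = vec![vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]];
    let mut outputs: Vec<Vec<f32>> = vec![vec![0.0; 3]; 2];
    compute(3, &inputs, &mut outputs);
    outputs
}
```

The same function should also accept borrowed channels, for example an array of &[f32] as inputs and an array of &mut [f32] as outputs.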

I’m not sure how much flexibility you have with the code generator, but it seems like that particular function would be much better suited to the functional style, especially seeing as this example is just begging for a map() and collect(), plus you can use rayon’s par_iter() for free parallelism. Usually in Rust you’ll prefer to take inputs as arguments and return the results instead of using out params like you do there. I’d return a Vec because generally you can’t know at compile time how big your output buffer needs to be.
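
A rough sketch of that functional style, using only the standard library. The function names compute_owned and process are made up for illustration, and process is a stand-in for whatever the generator would emit per sample (here the identity, like the C++ example).

```rust
/// Hypothetical per-sample operation; the identity, as in the C++ example.
fn process(sample: f32) -> f32 {
    sample
}

/// Takes the input channels and returns freshly allocated output channels
/// instead of writing into out-parameters.
fn compute_owned(inputs: &[&[f32]]) -> Vec<Vec<f32>> {
    inputs
        .iter()
        .map(|channel| channel.iter().map(|&s| process(s)).collect::<Vec<f32>>())
        .collect()
}
```

Note that this allocates one Vec per channel on every call, which is the cost mentioned in #2 when the buffers already exist.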

Your example is probably a vastly simplified version of the real thing but because it looks like you are only using the first two “rows” in the input and output arrays, you can probably change the function signature to take the two input rows as either a tuple, (&[f32], &[f32]), or a two-element array [&[f32]; 2]. That’ll give you extra bonuses like compile-time bounds checks.
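
As a sketch of the two-element array variant (the name compute_stereo is illustrative): the arrays are taken by value and destructured, so passing the wrong number of channels is a type error at the call site. The sample count is still checked at runtime by the slicing.

```rust
/// Stereo copy with the channel count fixed in the signature.
fn compute_stereo(count: usize, inputs: [&[f32]; 2], outputs: [&mut [f32]; 2]) {
    // Destructuring moves each slice reference out of its array.
    let [input0, input1] = inputs;
    let [output0, output1] = outputs;
    // Panics at runtime if any channel is shorter than `count`.
    output0[..count].copy_from_slice(&input0[..count]);
    output1[..count].copy_from_slice(&input1[..count]);
}
```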

Another thing is that even if this is in a tight loop performance should still be fairly decent since you are storing and accessing elements consecutively(-ish), which is the exact scenario the prefetcher is optimised for.


#5

Several things:

  • We use the float** type in the C++ version of the generated code because it is the natural way to represent “any number of float* channels”. But we actually know the exact number of input/output channels at compile time, so yes, we could generate (&[f32], &[f32]) when 2 channels are needed. But then the signature of “compute” would differ quite a bit depending on the compiled code… Or should we use Vec<&[f32]> to express “variable number of channels”?

  • About rayon and data-parallel libraries: even if the given example is easily data-parallel, real code usually does not have this property. We usually generate recursive equations. But as you say, it may be interesting to explore other code generation layouts better suited to the Rust model.
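
To make both points concrete, here is one possible shape, offered only as a sketch: a single compute signature based on slices of slices (so it does not change with the channel count), applied to a one-pole recursive equation. The struct Dsp, the field rec0 and the 0.5 coefficient are all invented for illustration; they are not what the actual generator produces.

```rust
/// Hypothetical generated DSP object; `rec0` holds the feedback state.
struct Dsp {
    rec0: f32,
}

impl Dsp {
    /// Same signature whatever the channel count. The generator knows
    /// that this particular DSP uses 1 input and 1 output.
    fn compute(&mut self, count: usize, inputs: &[&[f32]], outputs: &mut [&mut [f32]]) {
        let input0 = &inputs[0][..count];
        let output0 = &mut outputs[0][..count];
        for (out, &inp) in output0.iter_mut().zip(input0) {
            // y[i] = x[i] + 0.5 * y[i-1]: each sample depends on the
            // previous one, so the loop cannot be parallelised per sample.
            self.rec0 = inp + 0.5 * self.rec0;
            *out = self.rec0;
        }
    }
}
```

A slice parameter such as &[&[f32]] avoids requiring the caller to build a Vec<&[f32]>; a Vec can still be passed by reference where a slice is expected.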


#6

Would rust-ndarray fit your needs?


#7

In some situations it’s good to use &[&[f32]; N] where N is 2, or 3, …
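
One way this could look if N is left as a parameter, assuming a compiler with const generics (stable since Rust 1.51). The name compute_n is illustrative.

```rust
/// Copies `count` samples per channel for any fixed channel count `N`.
fn compute_n<const N: usize>(
    count: usize,
    inputs: &[&[f32]; N],
    outputs: &mut [&mut [f32]; N],
) {
    for (input, output) in inputs.iter().zip(outputs.iter_mut()) {
        // Panics at runtime if a channel is shorter than `count`.
        output[..count].copy_from_slice(&input[..count]);
    }
}
```

Here the compiler checks that inputs and outputs have the same number of channels, while N itself is inferred at each call site.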


#8

For outputs then we would need mutation, so &mut[&mut[f32]; 2], but then accessing individual outputs:

  let mut output0: &mut[f32] = outputs[0];
  let mut output1: &mut[f32] = outputs[1];

fails with this error:

106 |         let mut output0: &mut[f32] = outputs[0];
    |                                      ---------- first mutable borrow occurs here
107 |         let mut output1: &mut[f32] = outputs[1];
    |                                      ^^^^^^^^^^ second mutable borrow occurs here

#9

You can keep outputs[0] and outputs[1] in your code instead of using output0 and output1.

Otherwise you can use something like this; I don’t know if there are better solutions :)

fn foo(outputs: &mut [&mut [f32]; 2]) {
    // split_at_mut yields two disjoint halves, so both can be borrowed mutably
    let (output0a, output1a) = outputs.split_at_mut(1);
    let output0: &mut [f32] = output0a.first_mut().unwrap();
    let output1: &mut [f32] = output1a.first_mut().unwrap();
    println!("{:?}", output0);
    println!("{:?}", output1);
}

fn main() {
    foo(&mut [&mut [1.0], &mut [2.0]]);
}
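
Another option, assuming a compiler recent enough to support array patterns on references, is to destructure the array in a single pattern so that both channels are borrowed at once. This is only a sketch; the name copy_two is made up, and the loop mirrors the generated C++.

```rust
/// Copies two channels sample by sample, like the generated C++ loop.
fn copy_two(count: usize, inputs: &[&[f32]; 2], outputs: &mut [&mut [f32]; 2]) {
    // One pattern borrows both elements, so the borrows cannot overlap.
    let [input0, input1] = inputs;
    let [output0, output1] = outputs;
    for i in 0..count {
        output0[i] = input0[i];
        output1[i] = input1[i];
    }
}
```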

#10

It’s not nice looking, but LLVM is able to remove all the abstractions:

#[inline(never)]
fn foo_first(outputs: &mut [&mut [f32]; 2]) -> f32 {
    unsafe {
        *outputs.get_unchecked(0).get_unchecked(0) +
        *outputs.get_unchecked(1).get_unchecked(0)
    }
}

#[inline(never)]
fn foo_second(outputs: &mut [&mut [f32]; 2]) -> f32 {
    let (output0a, output1a) = outputs.split_at_mut(1);
    let mut output0: &mut [f32] = output0a.first_mut().unwrap();
    let mut output1: &mut [f32] = output1a.first_mut().unwrap();
    unsafe { *output0.get_unchecked(0) + *output1.get_unchecked(0) }
}

fn main() {
    println!("{}", foo_first(&mut [&mut [1.0], &mut [2.0]]));
    println!("{}", foo_first(&mut [&mut [2.0, 3.0], &mut [3.0, 4.0]]));

    println!("{}", foo_second(&mut [&mut [1.0], &mut [2.0]]));
    println!("{}", foo_second(&mut [&mut [2.0, 3.0], &mut [3.0, 4.0]]));
}

/*
_ZN6test6c9foo_first17h40dafab87ac561c8E:
    movq    (%rcx), %rax
    movq    16(%rcx), %rcx
    movss   (%rax), %xmm0
    addss   (%rcx), %xmm0
    retq

_ZN6test6c10foo_second17h42d3d659f3762e3bE:
    movq    (%rcx), %rax
    movq    16(%rcx), %rcx
    movss   (%rax), %xmm0
    addss   (%rcx), %xmm0
    retq
*/