Advice on API for functions taking/returning matrices?


#1

I’m working on a module that wraps a C machine learning library, but keep going back and forth between a few options in the API design.

Main thing I’m undecided on is how to pass/return matrices to/from functions. Under the hood, they all get passed to C as a flat array plus the number of rows/columns they contain, but there seem to be a bunch of different approaches that I could take on the Rust side of things.

One obvious option would be to use ndarray, e.g.:

fn transform(matrix: ndarray::Array2<f32>) -> ndarray::Array2<f32> { ... }

It’s very convenient to use, has lots of support for common array/matrix operations (and nice slicing macros etc.).
Main worry is that it forces users to use ndarray, and I know there are several alternatives out there (nalgebra, rulinalg, etc.) which might be preferred.

Another alternative would be to stick to the Rust standard libraries, e.g.:

fn transform(data: &[f32], num_rows: usize, num_cols: usize) -> (Vec<f32>, usize, usize) { ... }

This has the advantage of not tying users to a particular 3rd party linalg module, is closer to how the underlying C library takes/returns matrices, and might be easier to use for those not familiar with ndarray.
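To make the slice-based option concrete, here is a minimal sketch of what the wrapper could look like. The C entry point `c_transform` is a hypothetical stand-in written in Rust (the real library's signature isn't shown here, and this one just doubles each element), and returning a `Result` with a `String` error is my own assumption about how a length mismatch might be reported.

```rust
/// Hypothetical stand-in for the C function: takes a flat row-major
/// array plus its dimensions and writes the result into `output`.
/// Here it simply doubles every element.
unsafe extern "C" fn c_transform(
    input: *const f32,
    num_rows: usize,
    num_cols: usize,
    output: *mut f32,
) {
    unsafe {
        for i in 0..num_rows * num_cols {
            *output.add(i) = *input.add(i) * 2.0;
        }
    }
}

/// Safe wrapper: checks that the slice length matches the dimensions
/// before handing raw pointers to the C side.
pub fn transform(
    data: &[f32],
    num_rows: usize,
    num_cols: usize,
) -> Result<(Vec<f32>, usize, usize), String> {
    let expected = num_rows
        .checked_mul(num_cols)
        .ok_or_else(|| "num_rows * num_cols overflows usize".to_string())?;
    if data.len() != expected {
        return Err(format!(
            "expected {} elements for a {}x{} matrix, got {}",
            expected,
            num_rows,
            num_cols,
            data.len()
        ));
    }

    let mut out = vec![0.0f32; expected];
    // SAFETY: both buffers hold exactly `num_rows * num_cols` elements.
    unsafe { c_transform(data.as_ptr(), num_rows, num_cols, out.as_mut_ptr()) };
    Ok((out, num_rows, num_cols))
}
```

The length check is the main thing this style of API has to get right, since nothing in the types ties the slice to its dimensions.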

A 3rd alternative I’d considered was to define my own matrix structs (similar to how rusty_machine does things), which might look like:

fn transform(matrix: &MyMatrix) -> MyMatrix { ... }

One upside of this approach is that the same struct could be used to represent both dense/sparse matrices (ndarray doesn’t have sparse support), and would be easy enough to construct from a variety of sources.
Downside is that it’s yet another thing for users to learn.
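As a rough sketch of the "one struct for dense and sparse" idea: the field names, the private `Storage` enum, and the choice of a coordinate (row, col, value) list for the sparse case are all assumptions of mine for illustration, not something taken from rusty_machine or any other crate.

```rust
/// Internal storage; kept private so the representation can change later.
#[derive(Debug, Clone, PartialEq)]
enum Storage {
    /// Row-major dense data of length `num_rows * num_cols`.
    Dense(Vec<f32>),
    /// Sparse (row, col, value) triplets; unlisted entries are zero.
    Coo(Vec<(usize, usize, f32)>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct MyMatrix {
    num_rows: usize,
    num_cols: usize,
    storage: Storage,
}

impl MyMatrix {
    /// Build from row-major dense data; `None` if the length is wrong.
    pub fn from_dense(num_rows: usize, num_cols: usize, data: Vec<f32>) -> Option<Self> {
        if num_rows.checked_mul(num_cols)? != data.len() {
            return None;
        }
        Some(MyMatrix { num_rows, num_cols, storage: Storage::Dense(data) })
    }

    /// Build from triplets; `None` if any index is out of bounds.
    pub fn from_triplets(
        num_rows: usize,
        num_cols: usize,
        entries: Vec<(usize, usize, f32)>,
    ) -> Option<Self> {
        num_rows.checked_mul(num_cols)?;
        if entries.iter().any(|&(i, j, _)| i >= num_rows || j >= num_cols) {
            return None;
        }
        Some(MyMatrix { num_rows, num_cols, storage: Storage::Coo(entries) })
    }

    pub fn dims(&self) -> (usize, usize) {
        (self.num_rows, self.num_cols)
    }

    /// Flat row-major copy, e.g. for passing to a dense-only C function.
    /// If a position appears twice in the triplets, the last value wins.
    pub fn to_row_major(&self) -> Vec<f32> {
        match &self.storage {
            Storage::Dense(data) => data.clone(),
            Storage::Coo(entries) => {
                let mut out = vec![0.0; self.num_rows * self.num_cols];
                for &(i, j, v) in entries {
                    out[i * self.num_cols + j] = v;
                }
                out
            }
        }
    }
}
```

Keeping the storage private means callers only ever see the constructors and accessors, so the "yet another thing to learn" surface stays small.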

Any suggestions on which approach might be preferred? Any alternatives I’ve not considered?


#2

Another possible approach is to define a Matrix trait that contains all the functions required by your code, then make your function accept and/or return any type that implements this trait:

fn transform<M: Matrix>(data: &M) -> M { ... }

You can then implement your Matrix trait for each third-party matrix algebra type you want to support. You can declare third-party crates as optional dependencies to allow users to specify which crates they want to use and avoid unnecessary dependencies. If users want to use another matrix crate that you didn’t know of, they can create a wrapper type over a third-party matrix type and implement your Matrix trait for it.
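Here is a minimal sketch of the trait-plus-wrapper idea. The trait methods are my guess at the smallest useful set, `ColMajor` is a made-up stand-in for "some third-party type you didn't know about" (it is not a real crate's type), and the body of `transform` just doubles each element in place of the real C call.

```rust
/// Hypothetical trait: the minimum the wrapper needs from a matrix type.
pub trait Matrix: Sized {
    fn dims(&self) -> (usize, usize);
    /// Copy the contents out as flat row-major data.
    fn to_row_major(&self) -> Vec<f32>;
    /// Build a matrix from flat row-major data.
    fn from_row_major(num_rows: usize, num_cols: usize, data: Vec<f32>) -> Self;
}

/// Generic entry point; doubling stands in for the real C call.
pub fn transform<M: Matrix>(data: &M) -> M {
    let (num_rows, num_cols) = data.dims();
    let out: Vec<f32> = data.to_row_major().into_iter().map(|x| x * 2.0).collect();
    M::from_row_major(num_rows, num_cols, out)
}

/// Stand-in for a third-party matrix type that stores data column-major.
pub struct ColMajor {
    pub num_rows: usize,
    pub num_cols: usize,
    pub data: Vec<f32>,
}

/// User-side wrapper (newtype) so the trait can be implemented for a
/// type that neither the user nor this crate owns.
pub struct Wrapper(pub ColMajor);

impl Matrix for Wrapper {
    fn dims(&self) -> (usize, usize) {
        (self.0.num_rows, self.0.num_cols)
    }

    fn to_row_major(&self) -> Vec<f32> {
        let (r, c) = self.dims();
        let mut out = Vec::with_capacity(r * c);
        for i in 0..r {
            for j in 0..c {
                out.push(self.0.data[j * r + i]);
            }
        }
        out
    }

    fn from_row_major(num_rows: usize, num_cols: usize, data: Vec<f32>) -> Self {
        let mut col_major = Vec::with_capacity(num_rows * num_cols);
        for j in 0..num_cols {
            for i in 0..num_rows {
                col_major.push(data[i * num_cols + j]);
            }
        }
        Wrapper(ColMajor { num_rows, num_cols, data: col_major })
    }
}
```

Note that every call copies the data twice (once in, once out), which is part of the cost of going through a generic trait like this.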

However, this approach works best if there is a common trait that everyone else is already using (ideally, in the standard library). Otherwise it’s easy to end up in a situation where the required trait is not implemented by the type the user wants to use. Some crates (e.g. rulinalg) already contain a matrix trait but they only implement it for their own types.

I think it’s also fine to use any particular third-party matrix type in your crate, as long as it has what you need. Supporting different dependencies is harder and not always worth the effort. If you want to have as few dependencies as possible, you can go with your own matrix struct.


#3

fn transform<M: Matrix>(data: &M) -> M { ... }

This signature is a pipe dream. The number of different ways in which even the simplest piece of mathematical code may want to use matrices is astounding, and none of this is helped by the fact that a great deal of common patterns of “viewing” a matrix cannot be represented by &M.

Imagine living in a world without AsRef, Borrow, Cow, reborrowing of mutable references, and ToOwned. Matrices are often composed of smaller matrices arranged side-by-side, or straddled in alternating rows; your choices are to (a) do lots of difficult index transformations in iterative code (not recommended), (b) write helper functions to replace those index transformations with expensive and repeated clones, or (c) to define a family of types with varying modes of ownership so that you can factor out the index logic without excessive cloning.

Thankfully (c) has been done for you by ndarray, but ndarray is often a more powerful tool than you need, especially if your goal is mostly just wrapping an ffi function. I also do not think it is nice to force it upon users of your crate.

I suppose that if I ever were to use a Matrix trait, it’d be nothing more than glue code for a raw conversion into my own preferred family of Matrix types.

/// Trait for viewing a matrix stored in a dense format with arbitrary strides.
pub trait Strided2d {
    type Scalar;

    fn dims(&self) -> (usize, usize);
    fn strides(&self) -> (usize, usize);
    fn data(&self) -> *const Self::Scalar;

    // provided
    fn as_matrix(&self) -> MatrixRef<'_, Self::Scalar> { ... }
}

// Strided2d is a supertrait here so that Self::Scalar is in scope.
pub trait Strided2dMut: Strided2d {
    fn data_mut(&mut self) -> *mut Self::Scalar;

    // provided
    fn as_matrix_mut(&mut self) -> MatrixMut<'_, Self::Scalar> { ... }
}

pub trait FromCData<T> {
    /// Construct from owned, c-order contiguous data.
    fn from_c_data(width: usize, data: Box<[T]>) -> Self;
}

And if I was only just wrapping a C API, I’d probably reduce the traits further to just ask for a slice of C-contiguous data and a width.
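For what it's worth, the reduced version might look something like the sketch below. The trait name `CContiguous`, the `FlatMatrix` struct, and the `column_sums` function (a stand-in for whatever the C library actually computes) are all hypothetical names picked for illustration.

```rust
/// Hypothetical minimal trait: a slice of C-contiguous (row-major)
/// data plus the row width. The height is implied by the length.
pub trait CContiguous {
    fn width(&self) -> usize;
    fn c_data(&self) -> &[f32];
}

/// A simple owned matrix a user might already have lying around.
pub struct FlatMatrix {
    pub width: usize,
    pub data: Vec<f32>,
}

impl CContiguous for FlatMatrix {
    fn width(&self) -> usize {
        self.width
    }
    fn c_data(&self) -> &[f32] {
        &self.data
    }
}

/// A borrowed (slice, width) pair works just as well.
impl<'a> CContiguous for (&'a [f32], usize) {
    fn width(&self) -> usize {
        self.1
    }
    fn c_data(&self) -> &[f32] {
        self.0
    }
}

/// Stand-in for a wrapped C call: sums each column.
/// Returns `None` if the width is zero or does not divide the length.
pub fn column_sums<M: CContiguous>(m: &M) -> Option<Vec<f32>> {
    let width = m.width();
    let data = m.c_data();
    if width == 0 || data.len() % width != 0 {
        return None;
    }
    let mut sums = vec![0.0f32; width];
    for row in data.chunks_exact(width) {
        for (sum, x) in sums.iter_mut().zip(row) {
            *sum += *x;
        }
    }
    Some(sums)
}
```

Since the trait only hands out a borrowed slice, no copy is needed on the way in, and no third-party type appears anywhere in the public signature.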

My code may or may not internally use ndarray; but I wouldn’t put ndarray in the public signature or implement the trait on its types, as that would make it a public dependency, down to the specific version I use.


#4

Thanks for the suggestions and different viewpoints.

That’s an excellent point about adding ndarray to the public signature. I’m definitely veering towards keeping things simple and close to the C API for now; getting C-contiguous data is easy enough from any number of different data structures, without limiting the API to any specific one.