Padding a 2d vec so all columns are same length

Hi all,

I have a 2d array as a vector in a vector. The number of rows doesn't matter but what is in each row has to be the same length (ie equal columns). I'm using this to do some matrix math and the row holds all of the observations for one entity. Some rows will have missing observations and I would like to pad the start of those rows with zero's so that each row will have the same number of observations.

Here is an example working in which I'm creating a new Vec<Vec<_>> and iterating through the Vec a couple of times. Playground

I'm trying to see if there is a more "Rust" way to do this with iterators. Currently I have to create a vector that holds the number of missing columns and then I iterate through the original vector creating new rows with the padded zeros into a fresh Vec<Vec<_>>.

Could I do this in a way to modify the original data?
Is there a cleaner way to do the "for" loop as a chained iterator?

Vec has resize method for this, but that adds at the end. For Vec there's no efficient way of padding the front. VecDeque can pad front efficiently, but that's probably an overkill.

The thing you have is called "jagged array". It can't reliably guarantee that rows have equal lengths, and it's relatively inefficient for 2d data.

If you know the dimensions you're working with, you should use a single "1d" Vec that has width * height size, and address elements using [x + y * width]. For such vec, vec.chunks_exact(width) will give you an efficient iterator over rows.

3 Likes

Here's a function to create a chained iterator of the requested size.

/// Given a slice `row` and a default length `length`,
/// * Returns an iterator of length `length` exactly
/// * If `row` is at least as long as `length`, the first `length` items are returned
/// * If `row` is shorter than `length`, we first return some number of `T::default()`
fn row_of_length<T: Copy + Default>(row: &[T], length: usize) -> impl '_ + Iterator<Item=T> {
    // The number of elements needed to pad `row` up to length `length`
    let head_length = length.saturating_sub(row.len());
    // The number of elements to take from `row` so that we do not exceed length `length`
    let tail_length = length - head_length;

    // The two parts of the iterator
    let head = std::iter::repeat(T::default()).take(head_length);
    let tail = row.iter().copied().take(tail_length);

    // Combine them
    head.chain(tail)
}

Playground.

But as @kornel said, you may be better off with some sort of pre-flattened storage instead.

2 Likes

Funny thing is that right after I pad the columns, I create a flatten 1d Vec to do the math in ndarray crate. So I probably need to go back to the source of how the original 2d Vec is created to see if I can create it flat as a 1d array right from the start. I'll review the vec.chunks_exact(width) idea.

This is great. I may not use if for this purpose, if I can refactor to use 1d arrays to model 2d data. Maybe for now, until I refactor, this will be a good place holder. This helps me to understand how I can return an iterator from a function --- I didn't even think of the problem in this way. I may have other uses for this kind of thinking.... thanks.

Something I often end up writing in this sort of code is a bespoke Matrix type.

struct Matrix<T> {
  buffer: Box<[T]>,
  width: usize,
  height: usize,
}

From there, you can add constructors and getters and setters and whatever other domain-specific methods you need (e.g. an iterator over rows, padding a row, or some sort of View type which uses strides to work on a smaller section of the matrix).

impl<T> Matrix<T> {
  pub fn zeroed(width: usize, height: usize) -> Self {
    let len = width*height;
    let buffer = vec![0; len].into_boxed_slice();
    Matrix { buffer, width, height }
  }

  pub fn elements(&self) -> &[T] { &self.buffer }

  pub fn get(&self, row: usize, column: usize) -> &T {
    let index = self.index_of(row, column);
    &self.buffer[index]
  }

  pub fn set(&mut self, row: usize, column: usize, value: T) {
    let index = self.index_of(row, column);
    self.buffer[index] = value;
  }

  fn index_of(&self, row: usize, column: usize) -> usize {
    assert!(row < self.height);
    assert!(column < self.width);
    self.width * row + column
  }
}

(I'm not 100% sure my indexing math is correct, but you get the gist)

1 Like