I am writing a struct SparseMat<T: Numeric>
to store some large sparse matrix data (more than 10,000 rows and more than 10,0000 nonzero entries). SparseMat<T>
should allocate data on the heap, and will be initialized with no any data, meaning that the matrix is all zeros. Then I need to addassign some nonzero values to some entries of this matrix. Finally, I need to convert this matrix to CSR format like this.
I have come up with two different implementations of SparseMat<T>
. The first one is close to CSR format, storing data as Vec<BTreeMap<usize, T>>
. A nonzero element at index (i, j) can be indexed as self.data[i].get(&j)
.
// First implementation
pub struct SparseMat<T: Numeric> {
// A row-major sparse matrix that is easy to add values, and can convert to CSR format when fixed.
pub nrow: usize,
pub ncol: usize,
pub data: Vec<BTreeMap<usize, T>>,
}
impl<T: Numeric> SparseMat<T>
{
#[inline]
pub fn new( nrow: usize, ncol: usize ) -> Self {
let data = vec![BTreeMap::<usize, T>::new(); nrow];
Self { nrow, ncol, data }
}
#[inline]
pub fn addassign_at( &mut self, i: usize, j: usize, val: T ) {
match self.data[i].entry(j) {
Entry::Occupied(mut entry) => {*entry.get_mut() += val;},
Entry::Vacant(entry) => {entry.insert(val);},
}
}
}
The second one stores data as BTreeMap<RowMatIdx, T>
, where RowMatIdx is simply a struct of (i, j) that is ordered (impl Ord trait
) by i first then j. A nonzero element at index (i, j) can be indexed as self.data.get(&RowMatIdx{i,j})
.
// Second implementation
pub struct SparseMat<T: Numeric> {
// A row-major sparse matrix that is easy to add values, and can convert to CSR format when fixed.
pub nrow: usize,
pub ncol: usize,
pub data: BTreeMap<RowMatIdx, T>,
}
impl<T: Numeric> SparseMat<T>
{
#[inline]
pub fn new( nrow: usize, ncol: usize ) -> Self {
let data = BTreeMap::<RowMatIdx, T>::new();
Self { nrow, ncol, data }
}
#[inline]
pub fn addassign_at( &mut self, i: usize, j: usize, val: T ) {
match self.data.entry(RowMatIdx{i, j}) {
Entry::Occupied(mut entry) => {*entry.get_mut() += val;},
Entry::Vacant(entry) => {entry.insert(val);},
}
}
}
pub struct RowMatIdx {
i: usize,
j: usize,
}
impl Ord for RowMatIdx
{
#[inline]
fn cmp( &self, other: &Self ) -> Ordering {
match self.i.cmp(&other.i) {
Ordering::Less => Ordering::Less,
Ordering::Greater => Ordering::Greater,
Ordering::Equal => self.j.cmp(&other.j),
}
}
}
Comparing these two implementations, the first one makes it easy to iterate all the nonzero elements in a certain row, while the second one uses flattened rather than nested data structure (that may be more efficient?). Both has the advantage of convenient insertion, and can ensure the inserted nonzero entries are always ordered first by row then by column. Which one do you think is better, or are thery any better solutions? I have heard that too many (is this case, more than 10,000) nested heap allocations like Vec<Vec<T>>
, Vec<String>
, Vec<BtreeMap>
may be not efficient, is that true?