I'm not doing that; their sample code is. It's verbatim from the linked docs. If that's the sample they offer for processing a file, it's a bit weak.
And there is no way to get the number of rows in the group from a RowGroupReader.
If your inference is correct, it would seem something like this is what's needed:
row_group.get_row_iter(None).unwrap().for_each(|record| {
    let row: parquet::record::Row = record.unwrap();
    // Assumes ACompleteRecord implements From<Row>; the derive macros may not provide one.
    let record = ACompleteRecord::from(row);
    samples.push(record);
});
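The for_each/push/unwrap pattern above can also be written as a single collect into a Result<Vec<_>, _>, which stops at the first bad row instead of panicking. This is a std-only sketch: RawRow and Record are hypothetical stand-ins for parquet::record::Row and ACompleteRecord, and the From impl is an assumption you would have to write yourself.

```rust
// Hypothetical stand-in for parquet::record::Row: one row as (column, value) pairs.
#[derive(Debug, Clone)]
struct RawRow {
    fields: Vec<(String, String)>,
}

// Hypothetical stand-in for ACompleteRecord.
#[derive(Debug, PartialEq)]
struct Record {
    a_string: String,
}

// Assumed conversion; in real code this would be a hand-written From<Row> (or TryFrom) impl.
impl From<RawRow> for Record {
    fn from(row: RawRow) -> Self {
        let a_string = row
            .fields
            .into_iter()
            .find(|(name, _)| name.as_str() == "a_string")
            .map(|(_, value)| value)
            .unwrap_or_default();
        Record { a_string }
    }
}

// Convert every row, short-circuiting on the first error instead of unwrapping each one.
fn collect_records<I, E>(rows: I) -> Result<Vec<Record>, E>
where
    I: IntoIterator<Item = Result<RawRow, E>>,
{
    rows.into_iter().map(|row| row.map(Record::from)).collect()
}

fn main() {
    let rows: Vec<Result<RawRow, String>> = vec![
        Ok(RawRow { fields: vec![("a_string".to_string(), "first".to_string())] }),
        Ok(RawRow { fields: vec![("a_string".to_string(), "second".to_string())] }),
    ];
    let records = collect_records(rows).expect("all rows decoded");
    println!("{}", records.len());
}
```

Collecting into Result<Vec<_>, _> keeps the error instead of aborting the thread, which matters once the input is a real file rather than a test fixture.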
Just pass a large number for num_records: more than the maximum number of records you would expect, but not so large that it uses too much memory. It's just a safeguard.
It turns out this doesn't work, because if you pass a number greater than the number of rows in the file, you get this:
thread 'parquet_reader::test_read_records' panicked at services/src/parquet_reader.rs:10:10:
index out of bounds: the len is 66945 but the index is 66945
And, of course, there is no way to get the row count without iterating first.
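For what it's worth, the count may be available without iterating: in the parquet crate, a RowGroupReader's metadata() returns a RowGroupMetaData whose num_rows() gives that group's row count as an i64, so row_group.metadata().num_rows() as usize could be passed as num_records (worth checking against your crate version). The std-only sketch below shows why over-asking panics and how clamping to a known count avoids it; read_exact and read_up_to are hypothetical helpers, not the parquet_derive implementation.

```rust
// Naive reader: indexes rows[i] for every requested i, so asking for more
// records than exist panics with "index out of bounds: the len is N but the index is N".
fn read_exact(rows: &[u32], num_records: usize) -> Vec<u32> {
    (0..num_records).map(|i| rows[i]).collect()
}

// Safeguarded reader: clamp the request to the known row count first.
fn read_up_to(rows: &[u32], num_records: usize) -> Vec<u32> {
    let n = num_records.min(rows.len());
    rows[..n].to_vec()
}

fn main() {
    let rows: Vec<u32> = (0..5).collect();
    // Over-asking is harmless once the request is clamped.
    println!("{:?}", read_up_to(&rows, 1_000));
    // Asking read_exact for exactly the available count also works.
    println!("{:?}", read_exact(&rows, rows.len()));
}
```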
I think I'm just going to go back to the plain RowIter API:
while let Some(record) = row_iter.next() {
    println!("{}", format_row(&record.unwrap(), &delimiter));
}
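Streaming rows like this never needs the row count up front. Here is a std-only sketch of the same loop shape using a for loop and ? instead of unwrap. The format_row here is a hypothetical stand-in that joins field values with the delimiter (the real helper isn't shown), and rows are modeled as Result<Vec<String>, String> rather than parquet's Row.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for the poster's format_row: join field values with the delimiter.
fn format_row(row: &[String], delimiter: &str) -> String {
    row.join(delimiter)
}

// Write one line per row to any writer, returning how many rows were written.
// The first bad row (or I/O failure) is returned as an error instead of panicking.
fn write_rows<I, W>(rows: I, delimiter: &str, out: &mut W) -> Result<usize, String>
where
    I: IntoIterator<Item = Result<Vec<String>, String>>,
    W: Write,
{
    let mut count = 0;
    for record in rows {
        let row = record?;
        writeln!(out, "{}", format_row(&row, delimiter)).map_err(|e| e.to_string())?;
        count += 1;
    }
    Ok(count)
}

fn main() -> Result<(), String> {
    let rows: Vec<Result<Vec<String>, String>> = vec![
        Ok(vec!["1".to_string(), "a".to_string()]),
        Ok(vec!["2".to_string(), "b".to_string()]),
    ];
    let stdout = io::stdout();
    let written = write_rows(rows, ",", &mut stdout.lock())?;
    eprintln!("wrote {} rows", written);
    Ok(())
}
```

Taking a generic Write makes the loop easy to test against a Vec<u8> while still printing to stdout in normal use.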