Read a string attribute using hdf5-rust

I have an HDF5 file with the following structure (viewed with h5dump -n):

❯ h5dump -n GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5
HDF5 "GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5" {
FILE_CONTENTS {
 group      /
 group      /All_Data
 group      /All_Data/VIIRS-MOD-GEO-TC_All
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Height
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Latitude
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Longitude
 ...
 group      /Data_Products
 group      /Data_Products/VIIRS-MOD-GEO-TC
 dataset    /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Aggr
 dataset    /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0
 }
}

I am interested in using the hdf5-rust crate to read string attributes of both the root group /, and of the dataset /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0. The signature of the dataset attribute is

ATTRIBUTE "N_Granule_ID" {
   DATATYPE  H5T_STRING {
      STRSIZE 16;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
   DATA {
   (0,0): "NPP002194429582"
   }
}

I tried the following...

use anyhow::{Ok, Result};
use hdf5::File;
use ndarray::{Array, Array2};
use hdf5::types::VarLenUnicode;

fn main() -> Result<()> {

    let filename = "GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5".to_string();
    let file = File::open(filename)?;

    let dataset = file.dataset("Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0")?;
    let attribute = dataset.attr("N_Granule_ID")?;
    let datatype = attribute.dtype()?;
    let dims = attribute.ndim();

    let v_reader = attribute.as_reader();
    let v = v_reader.read::<VarLenUnicode, ndarray::Dim<[usize; 2]>>()?;

    Ok(())
}

at which point the .read() call fails with Error: no conversion paths found. I get the same error if I use

let v = attribute.read_2d::<VarLenUnicode>()?;

or

let v = attribute.read_2d::<FixedUnicode<16_usize>>()?;

Looking through the hdf5-rust examples and tests, I haven't been able to find any examples of reading a non-scalar string attribute with anything like a high-level interface. There was a previous topic about reading string attributes (Add string attribute using hdf5-rust), but I haven't been able to glean enough information from it to solve my problem, other than that it looks like group and dataset attributes need to be handled separately.

Did you try setting Conversion::Hard? From what I understand that should not be necessary but might be worth a shot.

EDIT:
Did you try the FixedAscii data type?

Thanks for your reply. Yes, I tried FixedAscii, same result. Can you provide more details about "setting Conversion::Hard"? Where would I set it?

As far as I understand, the hdf5 implementation defines hard, soft, and no-op (no conversion) levels. With Reader::conversion you can set which level of conversion the library is allowed to perform. The default is soft, as far as I can see.

What is the output of println!("{:?}", attribute.dtype()) and println!("{:?}", attribute.dtype()?.to_descriptor())?

Thanks, from attribute.dtype().unwrap().to_descriptor() I got Ok(FixedAscii(16)), which is neat. I think I was originally using FixedAscii wrong. I got the same-ish tip over at the hdf5-rust GitHub, which helped me solve my problem (which I'm posting below)

I got a tip from one of the hdf5-rust contributors that I should be using FixedAscii<size>. For an attribute attached to the root group

let root_attr = file.attr("Mission_Name")?;

I did

let v_reader = root_attr.as_reader();
let v = v_reader.read::<FixedAscii<4>, ndarray::Dim<[usize; 2]>>()?;
println!("\tv = {:?}", v);

or alternatively

let v = root_attr.read_2d::<FixedAscii<4>>()?;
println!("\tv = {:?}", v);

and they both gave the result

v = [["NPP"]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), const ndim=2

and I got to the attribute payload with

if let Some(x) = v.first() {
    print!("\tx = {:?}", x.to_string());
}

which is what I was after. For the dataset attribute referenced in the original question I used

let v = attribute.read_2d::<FixedAscii<16>>()?;
println!("\tv = {:?}", v);

giving

v = [["NPP002194429582"]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), const ndim=2

Luckily the attributes I am interested in have fixed sizes which I know ahead of time.
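
As a rough illustration of why the size parameter matters, here is a minimal std-only sketch of a fixed-width, null-terminated ASCII field like the STRSIZE 16 / H5T_STR_NULLTERM attribute above. It does not use hdf5-rust at all; encode_nullterm and decode_nullterm are hypothetical helper names, and the assumption is that one byte of the field is reserved for the terminator, which matches the 15-character granule ID stored with STRSIZE 16.

```rust
/// Decode a fixed-width, null-terminated ASCII field into a string slice.
/// Returns None if the bytes before the first NUL are not ASCII.
fn decode_nullterm(field: &[u8]) -> Option<&str> {
    // The string ends at the first NUL byte, or at the end of the field if there is none.
    let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    let bytes = &field[..end];
    if bytes.is_ascii() {
        std::str::from_utf8(bytes).ok()
    } else {
        None
    }
}

/// Pack an ASCII string into an N-byte field, padding the remainder with NUL bytes.
/// Returns None if the string is not ASCII or leaves no room for the terminator.
fn encode_nullterm<const N: usize>(s: &str) -> Option<[u8; N]> {
    if !s.is_ascii() || s.len() >= N {
        return None;
    }
    let mut field = [0u8; N];
    field[..s.len()].copy_from_slice(s.as_bytes());
    Some(field)
}

fn main() {
    // Granule ID taken from the h5dump output in the question.
    let id = "NPP002194429582";
    let field: [u8; 16] = encode_nullterm::<16>(id).expect("fits in 16 bytes");
    println!("{} chars stored in a {}-byte field", id.len(), field.len());
    println!("decoded: {:?}", decode_nullterm(&field));
}
```

The same reasoning would explain why the root-group attribute holding "NPP" was read with FixedAscii<4>: three characters plus a terminator, assuming its STRSIZE is 4.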

I was also able to read in a "vector" string attribute (something like a list of filenames), with the signature

ATTRIBUTE "N_Anc_Filename" {
   DATATYPE  H5T_STRING {
      STRSIZE 104;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 15, 1 ) / ( 15, 1 ) }
   DATA {
   (0,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0744_1.O.0.0",
   (1,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0745_1.O.0.0",
   (2,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0746_1.O.0.0",
   (3,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0776_1.O.0.0",
   (4,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0777_1.O.0.0",
   (5,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0778_1.O.0.0",
   (6,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0779_1.O.0.0",
   (7,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0780_1.O.0.0",
   (8,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0781_1.O.0.0",
   (9,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0810_1.O.0.0",
   (10,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0811_1.O.0.0",
   (11,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0812_1.O.0.0",
   (12,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0813_1.O.0.0",
   (13,0): "off_Planet-Eph-ANC_Static_JPL_000f_20151008_200001010000Z_20000101000000Z_ee00000000000000Z_np",
   (14,0): "off_USNO-PolarWander-UT1-ANC_Ser7_USNO_000f_20181005_201810050000Z_20181005000106Z_ee20181012120000Z_np"
   }
}

where STRSIZE=104 appears to be the length of the longest string plus one byte for the null terminator (the longest filename is 103 characters). The filenames have different lengths, but as long as the argument to FixedAscii<> is equal to or greater than the length of the longest filename, it works...

println!("\n\nReading dataset (15, 1) attribute...\n");

let dset_attr = dataset.attr("N_Anc_Filename")?;

let v = dset_attr.read_2d::<FixedAscii<104>>()?;

println!("\tv.shape() = {:?}", v.shape());
println!("\tv.strides() = {:?}", v.strides());
println!("\tv.ndim() = {:?}", v.ndim());

let arr = v.iter().collect::<Vec<_>>();

for (idx, val) in arr.iter().enumerate() {
    println!("\tarr[{:?}] = {:?} ({:?})", idx, val.to_string(), val.len());
}

giving

Reading dataset (15, 1) attribute...

v.shape() = [15, 1]
v.strides() = [1, 1]
v.ndim() = 2

arr[0] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0744_1.O.0.0" (74)
arr[1] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0745_1.O.0.0" (74)
arr[2] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0746_1.O.0.0" (74)
arr[3] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0776_1.O.0.0" (74)
arr[4] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0777_1.O.0.0" (74)
arr[5] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0778_1.O.0.0" (74)
arr[6] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0779_1.O.0.0" (74)
arr[7] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0780_1.O.0.0" (74)
arr[8] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0781_1.O.0.0" (74)
arr[9] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0810_1.O.0.0" (74)
arr[10] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0811_1.O.0.0" (74)
arr[11] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0812_1.O.0.0" (74)
arr[12] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0813_1.O.0.0" (74)
arr[13] = "off_Planet-Eph-ANC_Static_JPL_000f_20151008_200001010000Z_20000101000000Z_ee00000000000000Z_np" (94)
arr[14] = "off_USNO-PolarWander-UT1-ANC_Ser7_USNO_000f_20181005_201810050000Z_20181005000106Z_ee20181012120000Z_np" (103)

This basically covers the most complicated use case for the files I am reading.
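
For anyone curious what a (15, 1) array of STRSIZE 104 strings looks like as raw bytes, the following is a small std-only sketch, not hdf5-rust code. It assumes the elements are stored back to back as 104-byte, NUL-padded records; pack_records and split_records are hypothetical helpers made up for this illustration, and only three of the filenames from the dump are used.

```rust
/// Pack ASCII strings into consecutive fixed-width records, NUL-padded.
/// Panics if a string does not leave room for at least one terminator byte.
fn pack_records(names: &[&str], width: usize) -> Vec<u8> {
    let mut buf = Vec::with_capacity(names.len() * width);
    for name in names {
        assert!(name.len() < width, "record too long for field width");
        buf.extend_from_slice(name.as_bytes());
        // Pad the rest of this record with NUL bytes.
        buf.resize(buf.len() + (width - name.len()), 0);
    }
    buf
}

/// Split a flat buffer of fixed-width, NUL-padded records into owned strings.
/// `width` must be non-zero; trailing bytes that do not fill a record are ignored.
fn split_records(buf: &[u8], width: usize) -> Vec<String> {
    buf.chunks_exact(width)
        .map(|rec| {
            // Each string ends at the first NUL, or fills the whole record.
            let end = rec.iter().position(|&b| b == 0).unwrap_or(width);
            String::from_utf8_lossy(&rec[..end]).into_owned()
        })
        .collect()
}

fn main() {
    // STRSIZE from the h5dump output of N_Anc_Filename.
    const WIDTH: usize = 104;
    let names = [
        "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0744_1.O.0.0",
        "off_Planet-Eph-ANC_Static_JPL_000f_20151008_200001010000Z_20000101000000Z_ee00000000000000Z_np",
        "off_USNO-PolarWander-UT1-ANC_Ser7_USNO_000f_20181005_201810050000Z_20181005000106Z_ee20181012120000Z_np",
    ];
    let buf = pack_records(&names, WIDTH);
    for (idx, s) in split_records(&buf, WIDTH).iter().enumerate() {
        println!("arr[{}] = {:?} ({})", idx, s, s.len());
    }
}
```

Each record occupies the full width regardless of the string it holds, which is why a single size parameter chosen for the longest entry is enough to read all of them.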

Next time, please add cross-posts you made on other pages as links in your question to avoid duplicated effort by the community.
